From 15d469707bca783737bc2cace154e70d5fdca4b4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?K=D0=B8=D1=80=D0=BE=2EK=D1=80=D0=B8=D0=BA=D0=B0?= <95271587+Goshko812@users.noreply.github.com>
Date: Wed, 14 Aug 2024 21:23:25 +0300
Subject: [PATCH] Adding instructions

---
 README.md | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3ea159c..20f0f17 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 # Playwright_Scraper
-Playwright scraper and crawler
+Scraper and crawler built with Playwright and Cheerio
 
 # Versions and Differences
 
@@ -12,3 +12,35 @@ This pretty much lets the crawler to go wild (can't recommend)
 **Scrape Domain Scope only**
 
 Scrapes within the domain scope (worse BFS version as this goes in a straight line and doesn't scan everything)
+
+# Requirements
+first install npm
+
+**Arch**
+
+`sudo pacman -Sy nodejs`
+
+**Debian/Ubuntu**
+
+```bash
+curl -sL https://deb.nodesource.com/setup_18.x -o nodesource_setup.sh
+
+sudo bash nodesource_setup.sh
+
+sudo apt install nodejs
+```
+
+
+Then install Playwright and the other dependencies
+
+```bash
+npm init playwright@latest
+
+npm install path
+
+npm install url
+
+npm install cheerio
+
+npm install fs
+```