Scraper and crawler built with Playwright and Cheerio

Kиро.Kрика 0fa9f7003b Added everything to readme		2024-08-14 21:05:04 +03:00
bfs-scrape.js	Added the BFS version	2024-08-14 20:49:07 +03:00
LICENSE	Initial commit	2024-08-14 20:47:49 +03:00
README.md	Added everything to readme	2024-08-14 21:05:04 +03:00
scrape-everything.js	added the clusterfuck	2024-08-14 20:53:42 +03:00
scrape-within-domain-only.js	added domain scope only scraper	2024-08-14 20:54:57 +03:00

README.md

Playwright_Scraper

Playwright scraper and crawler

Versions and Differences

BFS version (bfs-scrape.js): Uses a breadth-first search approach. To explore pages more thoroughly, the crawler processes all immediate links (siblings) at the current depth level before moving on to deeper levels.
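
The level-by-level order described above can be illustrated with a minimal sketch. This is not the code from bfs-scrape.js: the `siteMap` object, the `bfsOrder` function, and the `maxDepth` parameter are hypothetical names chosen for illustration, and an in-memory link graph stands in for pages that a real crawler would load with Playwright and parse with Cheerio.

```javascript
// Hypothetical in-memory link graph standing in for real pages (assumption, not from the repo).
const siteMap = {
  '/': ['/a', '/b'],
  '/a': ['/a1', '/b'],
  '/b': ['/b1'],
  '/a1': [],
  '/b1': ['/'],
};

// Breadth-first traversal: every page at depth d is visited before any page at depth d + 1.
// `getLinks` is a hypothetical callback that returns the outgoing links of a page.
function bfsOrder(start, getLinks, maxDepth) {
  const visited = new Set([start]); // pages already queued, so no page is visited twice
  const order = [];
  let currentLevel = [start];

  for (let depth = 0; depth <= maxDepth && currentLevel.length > 0; depth++) {
    const nextLevel = [];
    for (const url of currentLevel) {
      order.push({ url, depth });
      for (const link of getLinks(url)) {
        if (!visited.has(link)) {
          visited.add(link);
          nextLevel.push(link); // deferred until the whole current level is done
        }
      }
    }
    currentLevel = nextLevel;
  }
  return order;
}

const result = bfsOrder('/', (url) => siteMap[url] ?? [], 2);
console.log(result.map((p) => `${p.depth}:${p.url}`).join(' '));
```

In an actual crawler, `getLinks` would be asynchronous (load the page, extract the anchors), so the loop body would `await` it. The ordering logic stays the same.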

Scrape Everything (scrape-everything.js): Pretty much lets the crawler go wild and follow whatever links it finds. Not recommended.

Scrape Domain Scope Only (scrape-within-domain-only.js): Scrapes only within the domain scope. It is a weaker alternative to the BFS version, because it follows links in a straight line and does not scan everything.
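
One possible way to keep a crawler within the domain scope is to compare hostnames before following a link. The sketch below uses the standard `URL` class. The `isInDomainScope` helper is a hypothetical name, and strict hostname equality (which treats subdomains as out of scope) is an assumption, not necessarily the rule used in scrape-within-domain-only.js.

```javascript
// Returns true when `link` (absolute or relative) resolves to the same hostname as `baseUrl`.
// Assumption: "domain scope" means an exact hostname match, so subdomains count as out of scope.
function isInDomainScope(link, baseUrl) {
  try {
    const resolved = new URL(link, baseUrl); // relative links are resolved against the base
    return resolved.hostname === new URL(baseUrl).hostname;
  } catch {
    return false; // links that cannot be parsed are treated as out of scope
  }
}

console.log(isInDomainScope('/about', 'https://example.com/start'));
console.log(isInDomainScope('https://other.org/', 'https://example.com/'));
```

Links with schemes such as `mailto:` have an empty hostname, so this check rejects them as well.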