Web Scraping in Node.js with Crawlee & Puppeteer
Web scraping in Node.js using Crawlee, the crawling library from Apify, with Puppeteer for browser automation.
This script finds pages that contain a given keyword, or any keyword from a list, across a list of websites. It prints the matching pages with the matched keywords in front of each link. The crawler runs through a proxy pool configured from a proxy list file.
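The matching and output step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names `matchKeywords` and `formatMatch`, the case-insensitive substring matching, the bracketed output format, and the treatment of "artificial intelligence" as a single keyword are all assumptions.

```javascript
// Sketch only: names, matching rule, and output format are assumptions.

// Returns the keywords that occur in `text`, ignoring case.
function matchKeywords(text, keywords) {
  const haystack = text.toLowerCase();
  return keywords.filter((keyword) => haystack.includes(keyword.toLowerCase()));
}

// Formats one output line with the matched keywords in front of the link.
function formatMatch(url, matched) {
  return `[${matched.join(', ')}] ${url}`;
}

const keywords = ['artificial intelligence', 'money', 'finance', 'investment'];
const pageText = 'How artificial intelligence is changing personal finance';

const matched = matchKeywords(pageText, keywords);
if (matched.length > 0) {
  // Prints: [artificial intelligence, finance] https://example.com/article
  console.log(formatMatch('https://example.com/article', matched));
}
```

Plain substring matching also hits keywords inside longer words, so a stricter word-boundary check may be preferable for short keywords.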
Follow the steps below to set up your scraping project.
npm init
npm install apify
npm install crawlee
npm install --save puppeteer
Create .txt files with the names used in the code, with one entry per line. (GitHub may not display the examples in this format, so be careful!)
Example entries for the keyword list:
artificial intelligence money finance investment
Example entries for the website list:
https://money.com/
https://easywithai.com/
https://openaimaster.com/
https://www.fool.com/
Example entries for the proxy list:
http://50.218.57.67:80
http://50.168.163.183:80
http://50.171.32.224:80
http://50.173.140.147:80
http://50.207.199.87:80
http://50.168.72.119:80
http://178.21.163.24:80
http://50.171.2.11:80
http://50.168.72.122:80
http://107.1.93.208:80
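Because the lists are read one entry per line, a small loader makes the expected format concrete. The sketch below is an assumption about how such files could be read, not the script's actual code. `parseList` and `loadList` are hypothetical helper names, and the file name in the comment is a placeholder for whatever name the code expects.

```javascript
// Sketch only: helper names and the file name in the comment are hypothetical.
const fs = require('node:fs');

// Splits file content into entries: one per line, trimmed, blank lines dropped.
function parseList(content) {
  return content
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Reads a list file from disk, e.g. loadList('websites.txt') (placeholder name).
function loadList(filePath) {
  return parseList(fs.readFileSync(filePath, 'utf8'));
}

const sample = 'https://money.com/\nhttps://easywithai.com/\n\n  https://www.fool.com/  \n';
// Prints the three URLs as an array, without the blank line or padding.
console.log(parseList(sample));
```

With this kind of parsing, entries placed on a single line separated by spaces would be read as one entry, which is why the one-entry-per-line format matters.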
node webscraping_nodejs_crawlee_puppeteer.js
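For orientation, the crawl started by this command could be wired roughly as shown below. This is a hedged sketch rather than the contents of the actual script. `buildRequestHandler`, `crawl`, the `onPageText` callback, and the request cap of 50 are assumptions made for illustration. `PuppeteerCrawler`, `ProxyConfiguration` with its `proxyUrls` option, and `enqueueLinks` come from Crawlee.

```javascript
// Sketch only: function names, the request cap, and the callback are assumptions.

// Builds the per-page handler. `onPageText(url, text)` is a hypothetical
// callback that receives each page's visible text, e.g. for keyword matching.
function buildRequestHandler(onPageText) {
  return async ({ request, page, enqueueLinks }) => {
    // Runs in the browser and returns the visible text of the page.
    const text = await page.evaluate(() => document.body.innerText);
    await onPageText(request.loadedUrl ?? request.url, text);
    // Queue further links found on the page (same hostname by default).
    await enqueueLinks();
  };
}

async function crawl(startUrls, proxyUrls, onPageText) {
  // Required lazily so the handler above can be exercised without Crawlee installed.
  const { PuppeteerCrawler, ProxyConfiguration } = require('crawlee');

  const crawler = new PuppeteerCrawler({
    // Crawlee rotates requests across the listed proxy URLs.
    proxyConfiguration: new ProxyConfiguration({ proxyUrls }),
    maxRequestsPerCrawl: 50, // assumed safety cap, adjust as needed
    requestHandler: buildRequestHandler(onPageText),
  });

  await crawler.run(startUrls);
}
```

A call might look like `crawl(['https://money.com/'], ['http://50.218.57.67:80'], (url, text) => console.log(url, text.length))`, with the two arrays taken from the website and proxy list files.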