A tool that scrapes emails, phone numbers, and links from a given URL, either passively from archived sources or actively by fetching the URL. This project is a Python rewrite of WebSift by s-r-e-e-r-a-j.
- Scraping Emails: Extract emails from visible text as well as from mailto: links.
- Scraping Phone Numbers: Extract phone numbers found in visible text and from tel: links.
- Scraping Links: Extract HTTP and HTTPS links from the page.
- Passive Recon: Fetch content from archived sources using Wayback Machine or archive.is.
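
The extraction features above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the project's actual implementation: the names `ContactParser` and `scrape` and both regular expressions are assumptions made for illustration, and the phone pattern in particular is deliberately loose.

```python
import re
from html.parser import HTMLParser

# Illustrative patterns only (assumptions, not the project's own regexes).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


class ContactParser(HTMLParser):
    """Hypothetical helper: collects emails, phone numbers, and links."""

    def __init__(self):
        super().__init__()
        self.emails, self.phones, self.links = set(), set(), set()
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("mailto:"):
            # Drop any "?subject=..." query part after the address.
            self.emails.add(href[len("mailto:"):].split("?")[0])
        elif href.startswith("tel:"):
            self.phones.add(href[len("tel:"):])
        elif href.startswith(("http://", "https://")):
            self.links.add(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return  # ignore text that is not visible on the page
        self.emails.update(EMAIL_RE.findall(data))
        self.phones.update(m.strip() for m in PHONE_RE.findall(data))


def scrape(html):
    """Return sorted emails, phones, and links found in an HTML string."""
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    return {
        "emails": sorted(parser.emails),
        "phones": sorted(parser.phones),
        "links": sorted(parser.links),
    }
```

Calling `scrape(page_html)` on HTML that has already been fetched returns a dictionary with the three result lists. Fetching the page, whether live or from an archive, is left out of the sketch.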
The project requires Python 3 and the packages listed in requirements.txt, which is provided for easy installation.
- Clone the repository:

      git clone https://github.com/yetanotherf0rked/waybackwebsift.git
      cd waybackwebsift

- Install the required packages:

      pip install -r requirements.txt

Run the main script:

    python waybackwebsift.py

Follow the interactive prompts to choose the URL, the archive source (if any), the data to scrape, and whether to save the results in a specified folder.
- Requests to archive.today return a 302 redirect with a timeout before the archived URL is available. The script does not support this yet.
- Links extracted when using an archive source include the archive's URL rather than only the original address.
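
As a possible workaround for the second issue, the sketch below recovers the original address from a Wayback Machine link. It assumes such links have the form `https://web.archive.org/web/<timestamp>/<original URL>`, optionally with a two-letter modifier such as `im_` after the timestamp. The function name `original_url` is hypothetical, the script does not currently do this, and archive.today links are not handled.

```python
import re

# Assumed link shape: https://web.archive.org/web/<digits>[xx_]/<original URL>
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/\d+(?:[a-z]{2}_)?/(.+)$"
)


def original_url(link):
    """Return the original URL embedded in a Wayback Machine link.

    Links that do not match the assumed shape are returned unchanged.
    """
    match = WAYBACK_RE.match(link)
    return match.group(1) if match else link
```

For example, `original_url("https://web.archive.org/web/20200101000000/https://example.com/page")` gives back `https://example.com/page`, while a link that was never rewritten by the archive passes through untouched.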
MIT
