Scrape emails, phone numbers and links from a URL or its archived versions.

yetanotherf0rked/WaybackWebSift

WaybackWebSift

A tool to scrape emails, phone numbers, and links from a given URL, either passively from archived sources or actively by fetching the URL. This project is a Python rewrite of WebSift by s-r-e-e-r-a-j.

Demo

Features

  • Scraping Emails: Extract emails from visible text as well as from mailto: links.
  • Scraping Phone Numbers: Extract phone numbers found in visible text and from tel: links.
  • Scraping Links: Extract HTTP and HTTPS links from the page.
  • Passive Recon: Fetch content from archived sources using the Wayback Machine or archive.is.
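
The extraction features above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the project's actual code: the names `SiftParser` and `sift`, as well as both regular expressions, are assumptions made for illustration, and the phone pattern in particular is deliberately loose.

```python
import re
from html.parser import HTMLParser

# Illustrative patterns (assumptions, not the tool's own regexes).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", then at least 8 digits/separators starting and ending with a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


class SiftParser(HTMLParser):
    """Hypothetical parser collecting emails, phone numbers, and links."""

    def __init__(self):
        super().__init__()
        self.emails, self.phones, self.links = set(), set(), set()
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("mailto:"):
            # Drop any "?subject=..." query part after the address.
            self.emails.add(href[len("mailto:"):].split("?")[0])
        elif href.startswith("tel:"):
            self.phones.add(href[len("tel:"):])
        elif href.startswith(("http://", "https://")):
            self.links.add(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth:
            return  # skip text that is not visible on the page
        self.emails.update(EMAIL_RE.findall(data))
        self.phones.update(PHONE_RE.findall(data))


def sift(html):
    """Return sorted emails, phones, and links found in an HTML string."""
    parser = SiftParser()
    parser.feed(html)
    parser.close()
    return {
        "emails": sorted(parser.emails),
        "phones": sorted(parser.phones),
        "links": sorted(parser.links),
    }
```

Calling `sift(page_html)` on a fetched or archived page returns a dictionary with one sorted list per data type. Relative links are ignored here because the feature list only mentions HTTP and HTTPS links.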

Requirements

The project requires Python 3. The required packages are listed in the provided requirements.txt file for easy installation.

Installation

  1. Clone the repository:
git clone https://github.com/yetanotherf0rked/waybackwebsift.git
cd waybackwebsift
  2. Install the required packages:
pip install -r requirements.txt

Usage

Run the main script:

python waybackwebsift.py

Follow the interactive prompts to choose the URL, the archive source (if any), the data to scrape, and whether to save the results in a folder you specify.

Known Issues

  • Requests to archive.today return a 302 redirect with a timeout before the archived URL is available. The script does not support this yet.
  • Links extracted when using archivers are suffixed by their archived URLs.
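
If the second issue refers to the Wayback Machine's usual link rewriting, where an address of the form `https://web.archive.org/web/<timestamp>/` is placed in front of the original URL, a possible workaround is to strip that part after extraction. The sketch below is an assumption about how this could be handled, not something the script currently does. The helper name `unwrap_wayback` is hypothetical, and archive.is links are not covered.

```python
import re

# Assumed Wayback Machine link shape:
#   https://web.archive.org/web/<timestamp>[modifier]/<original URL>
# where the optional modifier looks like "id_" or "im_".
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/\d{1,14}(?:[a-z]{2}_)?/(?P<original>.+)$"
)


def unwrap_wayback(link):
    """Return the original URL from a Wayback-rewritten link.

    Links that do not match the assumed shape are returned unchanged.
    """
    match = WAYBACK_RE.match(link)
    return match.group("original") if match else link
```

Applying `unwrap_wayback` to every extracted link leaves non-archived links untouched, so it can be run on results from both active and passive scraping.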

License

MIT
