OffInternet

A comprehensive tool to download websites and YouTube videos for offline use.

OffInternet is a Python-based tool designed to reliably save complex web content—including complete websites with all assets or YouTube videos with related content—to your local machine for offline access.

Features

Complete Website Archival: Downloads entire websites including HTML, CSS, JavaScript, images, and other assets
YouTube Video Downloader: Downloads YouTube videos and optionally fetches related/suggested videos
Offline Access: Download once, view anytime—perfect for archiving, research, or areas with poor connectivity
Cloudflare Bypass: Built-in support for websites protected by Cloudflare security
Smart Crawling: Intelligent URL filtering with relaxed rules to capture more content
Bundle Export: Saves all downloaded content as .page files (zipped archives with metadata)
Python-Powered: Built with 100% Python for transparency and easy customization

Getting Started

Prerequisites

Python 3.10+ installed on your system
Chrome/Chromium browser (for Cloudflare bypass and JavaScript rendering)
ChromeDriver (automatically managed by the script)

Installation & Usage

Clone the repository:

```
git clone https://github.com/lazerkatsweirdstuff/offinternet.git
cd offinternet
```

Install required dependencies:
```
pip install -r requirements.txt
```
Run the downloader:
```
python page-downloader.py
```
View downloaded content:
```
python page-browser.py
```
The browser is served at http://localhost:8000.

Advanced Usage

Download YouTube Videos

```
python page-downloader.py -y https://youtube.com/watch?v=VIDEO_ID -m 5
```

This downloads the video and up to 5 suggested videos. The -m flag sets the maximum number of videos to download.

Download from a List of URLs

```
python page-downloader.py -f urls.txt -m 10
```

Downloads websites from a text file (urls.txt), with a limit of 10 pages per site.
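The layout of urls.txt is not documented here. As a rough sketch, assuming one URL per line, a list file could be read like this; the `read_url_list` helper and the skipping of blank lines and `#` comments are illustrative assumptions, not the project's confirmed behavior.

```python
def read_url_list(text: str) -> list[str]:
    """Return the URLs found in the text of a list file.

    Assumption: one URL per line. Blank lines and lines starting
    with '#' are ignored (a convenience chosen for this sketch).
    """
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


sample = "https://example.com\n\n# mirror\nhttps://example.org\n"
for url in read_url_list(sample):
    print(url)
```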

Advanced Website Crawling

```
python page-downloader.py https://example.com -m 20 --skip-assets
```

Downloads up to 20 pages from a website but skips CSS, images, and other assets to save space. Crawl depth is controlled separately with -d.

Command Line Reference

| Argument | Description |
| --- | --- |
| `URL` | Direct URL to download (website or YouTube) |
| `-y`, `--youtube` | Force YouTube downloader mode |
| `-w`, `--website` | Force website downloader mode |
| `-f FILE`, `--file FILE` | File containing list of URLs |
| `-o DIR`, `--output DIR` | Custom output directory |
| `-m N`, `--max-pages N` | Maximum pages/videos to download (default: 2) |
| `-d N`, `--depth N` | Crawl depth for websites (default: 1) |
| `-q`, `--quiet` | Quiet mode (no interactive prompts) |
| `-v`, `--verbose` | Verbose output |
| `--skip-assets` | Skip downloading CSS/images/assets |
| `--yt-quality QUALITY` | YouTube quality: best, 720p, 480p, 360p, worst (default: 720p) |
| `--yt-format FORMAT` | Custom YouTube download format string |
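For readers who want to see how these options fit together, here is a minimal argparse sketch that mirrors the reference above. The flags and defaults come from the reference; the `build_parser` function name and the way the options are wired up are assumptions and may differ from the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line options."""
    parser = argparse.ArgumentParser(prog="page-downloader.py")
    parser.add_argument("url", nargs="?", metavar="URL",
                        help="Direct URL to download (website or YouTube)")
    parser.add_argument("-y", "--youtube", action="store_true",
                        help="Force YouTube downloader mode")
    parser.add_argument("-w", "--website", action="store_true",
                        help="Force website downloader mode")
    parser.add_argument("-f", "--file", metavar="FILE",
                        help="File containing list of URLs")
    parser.add_argument("-o", "--output", metavar="DIR",
                        help="Custom output directory")
    parser.add_argument("-m", "--max-pages", type=int, default=2, metavar="N",
                        help="Maximum pages/videos to download")
    parser.add_argument("-d", "--depth", type=int, default=1, metavar="N",
                        help="Crawl depth for websites")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Quiet mode (no interactive prompts)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Verbose output")
    parser.add_argument("--skip-assets", action="store_true",
                        help="Skip downloading CSS/images/assets")
    parser.add_argument("--yt-quality", metavar="QUALITY", default="720p",
                        choices=["best", "720p", "480p", "360p", "worst"],
                        help="YouTube quality")
    parser.add_argument("--yt-format", metavar="FORMAT",
                        help="Custom YouTube download format string")
    return parser


# Parse the website example from the Advanced Usage section.
args = build_parser().parse_args(
    ["https://example.com", "-m", "20", "--skip-assets"]
)
print(args.url, args.max_pages, args.depth, args.skip_assets)
```

Options that are not passed fall back to the documented defaults, so `args.depth` is 1 and `args.yt_quality` is `720p` in this run.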

Project Structure

```
offinternet/
├── page-downloader.py       # Main downloader
├── page-browser.py          # Local web server for viewing .page files
├── README.md                # This file
├── downloaded_sites/        # Default location for saved websites
│   └── youtube_videos/      # Default location for saved YouTube videos
└── executables/             # (Work in progress) Pre-built binaries
```

How It Works

For Websites:

Intelligent Crawling: Starts from a URL and follows internal links with configurable depth.
Asset Capture: Downloads HTML, CSS, JavaScript, images, fonts, and other resources.
Link Rewriting: Converts web URLs to local file paths for offline navigation.
Bundle Creation: Packages everything into a .page file (a ZIP archive with metadata).
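The crawling and link rewriting steps above can be illustrated with a short sketch built on the standard library. It is not the project's code: the helper names `is_internal` and `local_path`, and the archive layout of host name followed by URL path, are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def is_internal(link: str, base_url: str) -> bool:
    """True when the link resolves to the same host as base_url."""
    resolved = urljoin(base_url, link)
    return urlparse(resolved).netloc == urlparse(base_url).netloc


def local_path(url: str) -> str:
    """Map an absolute URL to a relative file path (hypothetical layout)."""
    parsed = urlparse(url)
    path = parsed.path or "/"
    if path.endswith("/"):
        # Directory-style URLs become an index page.
        path += "index.html"
    elif not posixpath.splitext(path)[1]:
        # Extensionless pages get an .html suffix so they open offline.
        path += ".html"
    return parsed.netloc + path


base = "https://example.com/"
for link in ["/docs/intro", "css/site.css", "https://other.org/page"]:
    if is_internal(link, base):
        print(link, "->", local_path(urljoin(base, link)))
    else:
        print(link, "-> skipped (external)")
```

A crawler along these lines would follow only the links for which `is_internal` is true, and would replace each such link in the saved HTML with the value returned by `local_path`.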

For YouTube:

Video Download: Downloads the specified video using yt-dlp.
Metadata Capture: Saves title, description, thumbnail, and video info.
Related Content: Optionally downloads suggested videos.
Embedded Player: Creates an HTML page with the video embedded for offline viewing.
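As an illustration of how the `--yt-quality` and `--yt-format` options could be translated for yt-dlp, the sketch below builds a format selector string. The `format_selector` helper and the exact selector it returns are assumptions; the tool's real mapping is not documented here.

```python
from typing import Optional


def format_selector(quality: str = "720p", custom: Optional[str] = None) -> str:
    """Build a yt-dlp format selector (hypothetical mapping).

    A custom format string, if given, takes precedence over the
    quality preset.
    """
    if custom:
        return custom
    if quality in ("best", "worst"):
        # yt-dlp accepts these selector names directly.
        return quality
    height = int(quality.rstrip("p"))  # "720p" -> 720
    # Prefer separate video and audio streams capped at the requested
    # height, falling back to a single combined file.
    return f"bestvideo[height<={height}]+bestaudio/best[height<={height}]"


print(format_selector("480p"))
```

The resulting string could then be passed as the `format` option when constructing `yt_dlp.YoutubeDL`.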

Technical Details

Format: Saved as .page files (ZIP archives containing HTML, assets, and metadata)
Viewing: Open .page files with the included page-browser.py or any ZIP file utility
Compatibility: Works on Windows, macOS, and Linux
Dependencies: See requirements.txt for complete list
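Because a .page file is a ZIP archive, it can be written and inspected with Python's standard `zipfile` module. The sketch below does a round trip in memory. The `metadata.json` entry name, the metadata keys, and the `write_bundle` and `read_bundle` helpers are assumptions for illustration; the actual internal layout of a .page file may differ.

```python
import io
import json
import zipfile

METADATA_NAME = "metadata.json"  # assumed entry name, not confirmed


def write_bundle(target, files, metadata):
    """Write files (archive path -> bytes) plus metadata into a ZIP bundle."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr(METADATA_NAME, json.dumps(metadata, indent=2))
        for name, data in files.items():
            bundle.writestr(name, data)


def read_bundle(source):
    """Return (metadata, content file names) from a ZIP bundle."""
    with zipfile.ZipFile(source) as bundle:
        metadata = json.loads(bundle.read(METADATA_NAME))
        names = [n for n in bundle.namelist() if n != METADATA_NAME]
    return metadata, names


# In-memory round trip, so the example touches no files on disk.
buffer = io.BytesIO()
write_bundle(
    buffer,
    {"index.html": b"<h1>Hello</h1>"},
    {"source_url": "https://example.com"},
)
buffer.seek(0)
metadata, names = read_bundle(buffer)
print(metadata["source_url"], names)
```

Both helpers accept either a file path or a file-like object, so `read_bundle` can also be pointed at a .page file on disk to list its contents.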

Project Status

The project is under active development. Pre-built executables may be available in the executables/ directory, but they are a work in progress, so running the Python scripts is recommended for reliability.

Contributing

Contributions are welcome! If you encounter a website that doesn't download correctly with OffInternet:

Submit an Issue: Open a GitHub Issue detailing the page you'd like to add support for

License

BSD-2-Clause license
