ScrapeIt Search

A self-hosted search engine that crawls websites you add and indexes them for search.

Features

Web search — full-text search across indexed pages
Image search — search images by alt text / description
Background crawler — recursively discovers and crawls linked pages
Light / dark mode
No external APIs — fully self-contained

Setup

Requirements

Node.js v18 or newer

Install

# 1. Navigate to the project folder
cd scrapeit

# 2. Install dependencies
cd backend && npm install

Run

# From the scrapeit/ root:
npm start

# Or with auto-restart on changes:
npm run dev

Open http://localhost:3001 in your browser.

Usage

Click Add Site in the header.
Enter any URL (e.g. https://example.com) and click Add & Crawl.
ScrapeIt will:
- Fetch and index the page (title, description, favicon, keywords, og:image)
- Extract all images and add them to the image index
- Find all links on the page and queue them for crawling
- Repeat for every discovered link, up to depth 5
Use the Search tab to search indexed pages.
Use the Images tab to search indexed images.
Check Stats for crawl progress and totals.
Click the Crawling button in the header to pause/resume the crawler.

Architecture

scrapeit/
├── backend/
│   ├── server.js    — Express API server
│   ├── db.js        — SQLite database setup
│   ├── crawler.js   — Background crawl engine (concurrent, depth-limited)
│   ├── scraper.js   — Page fetch + HTML parsing
│   └── data/        — SQLite database file (auto-created)
└── frontend/
    └── public/
        ├── index.html
        ├── css/style.css
        └── js/app.js

API Endpoints

Method	Path	Description
GET	/api/stats	Crawler stats and recent activity
GET	/api/search?q=	Web search
GET	/api/search/images?q=	Image search
GET	/api/sites	List all sites (with pagination & filter)
POST	/api/sites	Add a new seed URL
DELETE	/api/sites/:id	Delete a site
POST	/api/sites/:id/recrawl	Re-queue a site for crawling
GET	/api/crawler/status	Crawler status
POST	/api/crawler/pause	Pause crawler
POST	/api/crawler/resume	Resume crawler

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
backend		backend
frontend/public		frontend/public
.gitignore		.gitignore
README.md		README.md
update.sh		update.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ScrapeIt Search

Features

Setup

Requirements

Install

Run

Usage

Architecture

API Endpoints

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ScrapeIt Search

Features

Setup

Requirements

Install

Run

Usage

Architecture

API Endpoints

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages