Crawlith

Professional-grade SEO crawling and graph intelligence suite.



🚀 Overview

Crawlith is a high-performance, deterministic SEO intelligence engine built for serious structural analysis. Unlike traditional "flat" crawlers, Crawlith treats your website as a weighted directed graph, allowing you to identify not just broken links, but deep architectural flaws in authority distribution, content health, and technical infrastructure.

Whether you are performing a quick on-page audit or mapping a 100k-page spider-graph, Crawlith provides the precision and depth required for modern SEO professionals.
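To make the graph model concrete, here is a minimal sketch of how a site can be represented as a weighted directed graph. The types and class below are illustrative only, not the actual `@crawlith/core` API:

```typescript
// Hypothetical sketch: a site as a weighted directed graph of internal links.
type PageId = string;

interface Edge {
  to: PageId;
  weight: number; // e.g. 1 / (number of outlinks on the source page)
}

class SiteGraph {
  private adjacency = new Map<PageId, Edge[]>();

  addLink(from: PageId, to: PageId, weight = 1): void {
    const edges = this.adjacency.get(from) ?? [];
    edges.push({ to, weight });
    this.adjacency.set(from, edges);
  }

  outDegree(page: PageId): number {
    return this.adjacency.get(page)?.length ?? 0;
  }

  // Pages that no other page links to (candidate "orphans").
  orphans(pages: PageId[]): PageId[] {
    const linked = new Set<PageId>();
    for (const edges of this.adjacency.values()) {
      for (const e of edges) linked.add(e.to);
    }
    return pages.filter((p) => !linked.has(p));
  }
}
```

Once links live in a structure like this, questions such as "which pages receive no internal link equity?" become simple graph queries rather than spreadsheet work.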


✨ Key Features

  • 🧠 Graph Intelligence: Built-in algorithms for PageRank, HITS (Hubs/Authorities), and link-equity flow analysis.
  • 🕸️ High-Performance Crawler: BFS-based discovery engine with robots.txt compliance, rate limiting, and multi-threaded execution.
  • 🧩 Extensible Plugin System: A modular architecture with 15+ specialized plugins for Soft 404 detection, content clustering, orphan intelligence, and more.
  • 🖥️ Premium Dashboard: Launch a local React-based UI (crawlith ui) to explore your link graphs and metrics interactively.
  • 🛡️ Secure & Compliant: Enterprise-grade safety features including DNS-validated SSRF protection (IPGuard), redirect loop detection, and scope enforcement.
  • 📊 Unified Data Layer: Production-grade SQLite persistence enabling snapshot history, trend tracking, and incremental crawling.
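The core idea behind DNS-validated SSRF protection is to resolve a hostname first and reject any address that falls in a private, loopback, or link-local range before fetching. The function below is an illustrative sketch of that check for IPv4; the real IPGuard implementation may differ:

```typescript
// Illustrative SSRF guard (not the actual IPGuard code): refuse to fetch
// any resolved address in a private, loopback, or link-local range.
function isForbiddenIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => Number.isNaN(n) || n < 0 || n > 255)) {
    return true; // not a valid IPv4 address: refuse by default
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8 (RFC 1918)
    a === 127 ||                         // 127.0.0.0/8 loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 (RFC 1918)
    (a === 192 && b === 168) ||          // 192.168.0.0/16 (RFC 1918)
    (a === 169 && b === 254)             // 169.254.0.0/16 link-local / cloud metadata
  );
}
```

Checking the resolved IP (rather than the hostname string) is what defeats DNS-rebinding tricks where a public-looking hostname resolves to an internal address.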

🏗 Monorepo Architecture

Crawlith is organized as a pnpm-powered monorepo for maximum modularity:

| Package | Purpose |
| --- | --- |
| `@crawlith/core` | Headless engine: crawling, graph math, and the SQLite data layer. |
| `@crawlith/cli` | Premium terminal interface with color-coded reports and interactive commands. |
| `@crawlith/web` | React + Vite dashboard for visual site-graph exploration. |
| `@crawlith/server` | REST API bridge connecting the headless core to visual consumers. |
| `@crawlith/plugins` | Specialized intelligence modules (PageRank, Soft404, etc.). |
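A pnpm monorepo like this is typically wired together with a `pnpm-workspace.yaml` at the repository root. The layout below is an assumption for illustration; the actual directory names may differ:

```yaml
# pnpm-workspace.yaml (illustrative; actual layout may differ)
packages:
  - "packages/*"
```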

🚦 Quick Start

📦 Installation

To use Crawlith globally on your system:

```
npm install -g @crawlith/cli
# or
pnpm add -g @crawlith/cli
```

Or run it instantly without installation using npx:

```
npx crawlith --help
```

🛠 Usage

1. Crawl a Website

Build a full link graph and compute SEO metrics for a domain.

```
crawlith crawl https://example.com --limit 1000 --depth 10
```

2. Analyze a Single Page

Perfect for quick on-page SEO audits and content structure checks.

```
crawlith page https://example.com/blog/seo-guide
```

3. Start the UI Dashboard

Visualize your crawl snapshots in a beautiful, interactive interface.

```
crawlith ui
```

4. Probe Security

Inspect transport-layer headers, SSL/TLS status, and HTTP/2 support.

```
crawlith probe https://example.com
```

5. List Tracked Sites

View all sites currently stored in your local intelligence database.

```
crawlith sites
```

🔌 Intelligence Plugins

Crawlith ships with a suite of professional plugins:

  • pagerank: Measures the relative importance of every page in the link graph.
  • hits: Identifies "Hubs" (navigation) vs "Authorities" (content).
  • soft404-detector: Heuristic analysis to find 200 OK pages that are actually errors.
  • orphan-intelligence: Detects pages with zero internal inbound links.
  • pagespeed: Integration with Google PageSpeed Insights for Core Web Vitals and Lighthouse metrics.
  • snapshot-diff: Compare two crawl snapshots to see how metrics have evolved.
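For intuition on what the pagerank plugin computes, here is a condensed power-iteration PageRank over a page-to-outlinks map. This is a teaching sketch, not the plugin's actual code:

```typescript
// Power-iteration PageRank sketch (illustrative, not the pagerank plugin itself).
function pageRank(
  links: Map<string, string[]>, // page -> outbound internal links
  damping = 0.85,
  iterations = 50,
): Map<string, number> {
  const pages = [...links.keys()];
  const n = pages.length;
  let rank = new Map(pages.map((p) => [p, 1 / n]));

  for (let i = 0; i < iterations; i++) {
    // Every page starts each round with the "random jump" share.
    const next = new Map(pages.map((p) => [p, (1 - damping) / n]));
    for (const [page, outs] of links) {
      const share = (rank.get(page) ?? 0) / (outs.length || n);
      const targets = outs.length ? outs : pages; // dangling pages spread rank evenly
      for (const t of targets) {
        next.set(t, (next.get(t) ?? 0) + damping * share);
      }
    }
    rank = next;
  }
  return rank;
}
```

On a symmetric three-page cycle (a → b → c → a) every page ends up with rank 1/3, which matches the intuition that rank only concentrates where link structure is asymmetric.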

🛠 Development

We use pnpm for workspace management and vitest for testing.

```
# Run all tests with coverage
pnpm run test --coverage

# Clean and rebuild everything
pnpm run rebuild

# Lint the codebase
pnpm run lint
```

🛡 License & Safety

Crawlith is released under the Apache License 2.0.

IMPORTANT: Please ensure you have permission to crawl target domains. Crawlith respects robots.txt and rate limits by default. Do not use this tool for unauthorized scraping or density-testing.


Built with ❤️ by the Crawlith Team. Deterministic Crawl Intelligence.
