Crawlith is a high-performance, deterministic SEO intelligence engine built for serious structural analysis. Unlike traditional "flat" crawlers, Crawlith treats your website as a weighted directed graph, allowing you to identify not just broken links, but deep architectural flaws in authority distribution, content health, and technical infrastructure.
Whether you are performing a quick on-page audit or mapping a 100k-page spider-graph, Crawlith provides the precision and depth required for modern SEO professionals.
- 🧠 Graph Intelligence: Built-in algorithms for PageRank, HITS (Hubs/Authorities), and link-equity flow analysis.
- 🕸️ High-Performance Crawler: BFS-based discovery engine with `robots.txt` compliance, rate limiting, and multi-threaded execution.
- 🧩 Extensible Plugin System: A modular architecture with 15+ specialized plugins for Soft 404 detection, content clustering, orphan intelligence, and more.
- 🖥️ Premium Dashboard: Launch a local React-based UI (`crawlith ui`) to explore your link graphs and metrics interactively.
- 🛡️ Secure & Compliant: Enterprise-grade safety features including DNS-validated SSRF protection (`IPGuard`), redirect loop detection, and scope enforcement.
- 📊 Unified Data Layer: Production-grade SQLite persistence enabling snapshot history, trend tracking, and incremental crawling.
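To give a feel for the kind of graph math involved, here is a minimal PageRank power-iteration sketch over an adjacency list. This is illustrative only: Crawlith's actual implementation lives in `@crawlith/plugins` and its API may differ.

```typescript
type Graph = Record<string, string[]>; // page -> outbound internal links

function pageRank(graph: Graph, damping = 0.85, iterations = 50): Record<string, number> {
  const pages = Object.keys(graph);
  const n = pages.length;
  // Start with uniform rank across all pages.
  let rank: Record<string, number> = Object.fromEntries(
    pages.map((p) => [p, 1 / n] as [string, number]),
  );

  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = Object.fromEntries(
      pages.map((p) => [p, (1 - damping) / n] as [string, number]),
    );
    for (const page of pages) {
      const links = graph[page];
      if (links.length === 0) {
        // Dangling page: distribute its rank evenly to every page.
        for (const p of pages) next[p] += (damping * rank[page]) / n;
      } else {
        // Split this page's rank equally among its outbound links.
        for (const target of links) {
          if (target in next) next[target] += (damping * rank[page]) / links.length;
        }
      }
    }
    rank = next;
  }
  return rank;
}

// A tiny three-page site: home links everywhere, both subpages link home.
const ranks = pageRank({
  "/": ["/blog", "/about"],
  "/blog": ["/"],
  "/about": ["/"],
});
console.log(ranks); // "/" accumulates the most rank
```

Because every page links back to the homepage, `/` ends up with the highest score, which is exactly the kind of authority-distribution signal a crawl report surfaces.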
Crawlith is organized as a pnpm-powered monorepo for maximum modularity:
| Package | Purpose |
|---|---|
| `@crawlith/core` | Headless engine handling crawling, graph math, and the SQLite data layer. |
| `@crawlith/cli` | Premium terminal interface with color-coded reports and interactive commands. |
| `@crawlith/web` | React + Vite dashboard for visual site-graph exploration. |
| `@crawlith/server` | REST API bridge connecting the headless core to visual consumers. |
| `@crawlith/plugins` | Specialized intelligence modules (PageRank, Soft404, etc.). |
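To illustrate how a headless core and plugin modules can compose, here is a hypothetical sketch of the pattern: the core emits per-page crawl events, and plugins subscribe to compute metrics. All names here are illustrative assumptions, not the real `@crawlith/core` API.

```typescript
// Hypothetical event/plugin shapes -- not the actual Crawlith interfaces.
interface CrawlEvent {
  url: string;
  status: number;
  outlinks: string[];
}

interface Plugin {
  name: string;
  onPage(event: CrawlEvent): void;
  report(): unknown;
}

// Example plugin in the spirit of orphan-intelligence: find crawled pages
// that no other page links to.
class OrphanIntelligence implements Plugin {
  name = "orphan-intelligence";
  private seen = new Set<string>();
  private linkedTo = new Set<string>();

  onPage(e: CrawlEvent): void {
    this.seen.add(e.url);
    for (const link of e.outlinks) this.linkedTo.add(link);
  }

  report(): string[] {
    return [...this.seen].filter((u) => !this.linkedTo.has(u));
  }
}

const plugin = new OrphanIntelligence();
plugin.onPage({ url: "/", status: 200, outlinks: ["/blog"] });
plugin.onPage({ url: "/blog", status: 200, outlinks: ["/"] });
plugin.onPage({ url: "/landing-old", status: 200, outlinks: [] });
console.log(plugin.report()); // [ '/landing-old' ]
```

The payoff of this style is that the crawler stays metric-agnostic: each analysis concern lives in its own module and only sees the event stream.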
To use Crawlith globally on your system:
```bash
npm install -g @crawlith/cli
# or
pnpm add -g @crawlith/cli
```

Or run it instantly without installation using npx:
```bash
npx crawlith --help
```

Build a full link graph and SEO metrics for a domain:
```bash
crawlith crawl https://example.com --limit 1000 --depth 10
```

Perfect for quick on-page SEO audits and content structure checks:
```bash
crawlith page https://example.com/blog/seo-guide
```

Visualize your crawl snapshots in a beautiful, interactive interface:
```bash
crawlith ui
```

Inspect transport-layer headers, SSL/TLS status, and HTTP/2 support:
```bash
crawlith probe https://example.com
```

View all sites currently stored in your local intelligence database:
```bash
crawlith sites
```

Crawlith ships with a suite of professional plugins:
- `pagerank`: Measures the relative importance of every page in the link graph.
- `hits`: Identifies "Hubs" (navigation) vs. "Authorities" (content).
- `soft404-detector`: Heuristic analysis to find 200 OK pages that are actually errors.
- `orphan-intelligence`: Detects pages with zero internal inbound links.
- `pagespeed`: Integration with Google PageSpeed Insights for Core Web Vitals and Lighthouse metrics.
- `snapshot-diff`: Compare two crawl snapshots to see how metrics have evolved.
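As a rough idea of how soft-404 heuristics work, the sketch below scores a page from a few content signals. The phrases and thresholds are assumptions for illustration, not the logic of the actual `soft404-detector` plugin.

```typescript
// Illustrative soft-404 heuristic: flag pages that return 200 OK but
// look like error pages. Signals and thresholds are assumed, not
// Crawlith's real detector.
interface PageSample {
  status: number; // HTTP status code
  title: string; // <title> text
  wordCount: number; // words in the main body
}

const ERROR_PHRASES = ["not found", "page doesn't exist", "404", "no longer available"];

function isLikelySoft404(page: PageSample): boolean {
  if (page.status !== 200) return false; // hard errors are reported elsewhere
  const title = page.title.toLowerCase();
  const titleHit = ERROR_PHRASES.some((p) => title.includes(p));
  const thinAndUntitled = page.wordCount < 50 && title.trim() === "";
  return titleHit || thinAndUntitled;
}

console.log(isLikelySoft404({ status: 200, title: "Page Not Found", wordCount: 120 })); // true
console.log(isLikelySoft404({ status: 200, title: "SEO Guide", wordCount: 1800 })); // false
```

A production detector would weigh more signals (template similarity to the site's real 404 page, redirect history, thin-content ratios), but the shape is the same: 200-status pages scored by how error-like their content looks.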
We use pnpm for workspace management and vitest for testing.
```bash
# Run all tests with coverage
pnpm run test --coverage

# Clean and rebuild everything
pnpm run rebuild

# Lint the codebase
pnpm run lint
```

Crawlith is released under the Apache License 2.0.
IMPORTANT: Please ensure you have permission to crawl target domains. Crawlith respects robots.txt and rate limits by default. Do not use this tool for unauthorized scraping or density-testing.
