Crawls websites and generates standard-compliant sitemap.xml files. Uses Playwright for JavaScript rendering or httpx for fast HTTP crawling.
Linux / macOS:
curl -fsSL https://raw.githubusercontent.com/michaelblaess/sitemap-generator/main/install.sh | bash
Windows (PowerShell):
irm https://raw.githubusercontent.com/michaelblaess/sitemap-generator/main/install.ps1 | iex
# Simple crawl (httpx mode, fast)
sitemap-generator https://example.com
# With JavaScript rendering (Playwright)
sitemap-generator https://example.com --render
# Save sitemap directly
sitemap-generator https://example.com --output sitemap.xml
# Limit crawl depth
sitemap-generator https://example.com --max-depth 5
# More concurrency
sitemap-generator https://example.com --concurrency 16
# Ignore robots.txt
sitemap-generator https://example.com --ignore-robots
# With cookies (e.g. for login)
sitemap-generator https://example.com --cookie session=abc123

| Parameter | Description | Default |
|---|---|---|
| `URL` | Start URL of the website | - |
| `--output`, `-o` | Output path for sitemap.xml | `sitemap_<host>_<timestamp>.xml` |
| `--max-depth`, `-d` | Maximum crawl depth | 10 |
| `--concurrency`, `-c` | Parallel requests | 8 |
| `--timeout`, `-t` | Timeout per page (seconds) | 30 |
| `--render` | Render JavaScript with Playwright | off |
| `--no-headless` | Browser visible (debugging) | off |
| `--ignore-robots` | Ignore robots.txt | off |
| `--user-agent` | Custom User-Agent | Chrome 131 |
| `--cookie` | Set cookie (`NAME=VALUE`, can be given multiple times) | - |
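
The table states that `--cookie` takes `NAME=VALUE` and may be repeated. How the tool parses this internally is not documented here; the following is only a sketch of one common way to handle such a repeatable flag with Python's `argparse`. The names `parse_cookie`, `build_parser`, `cookies_from_args` and the program name `cookie-demo` are hypothetical.

```python
import argparse


def parse_cookie(pair: str) -> tuple[str, str]:
    """Split 'NAME=VALUE' on the first '=' so values may contain '='."""
    name, sep, value = pair.partition("=")
    if not sep or not name:
        raise argparse.ArgumentTypeError(f"expected NAME=VALUE, got {pair!r}")
    return name, value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real CLI; only URL and --cookie are modeled.
    parser = argparse.ArgumentParser(prog="cookie-demo")
    parser.add_argument("url")
    parser.add_argument(
        "--cookie",
        action="append",      # collect every occurrence of the flag
        type=parse_cookie,
        default=[],
        metavar="NAME=VALUE",
    )
    return parser


def cookies_from_args(argv: list[str]) -> dict[str, str]:
    """Return the cookies given on the command line as a name -> value dict."""
    args = build_parser().parse_args(argv)
    return dict(args.cookie)


if __name__ == "__main__":
    print(cookies_from_args(
        ["https://example.com", "--cookie", "session=abc123", "--cookie", "lang=en"]
    ))
```

Splitting on the first `=` only keeps values such as base64 tokens intact, and a later occurrence of the same name overrides an earlier one in this sketch.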

| Key | Function |
|---|---|
| `c` | Start crawl |
| `x` | Cancel crawl / JSON error report |
| `m` | Save sitemap |
| `s` | Settings |
| `g` | Export form report (JSON) |
| `j` | JIRA table to clipboard |
| `e` | Show errors only |
| `b` | Sitemap tree |
| `f` | Sitemap diff |
| `d` | Copy URL details |
| `l` | Toggle log |
| `h` | History |
| `i` | Info dialog |
| `q` | Quit |

To copy or export the log, right-click the log panel.
- Dual mode: httpx (fast, HTML only) or Playwright (JavaScript rendering)
- robots.txt: Respected by default; use `--ignore-robots` to disable
- Auto-split: For more than 50,000 URLs, a sitemap index with partial sitemaps is generated automatically
- Priority: Automatically based on crawl depth (home page = 1.0)
- lastmod: From HTTP Last-Modified header
- URL normalization: Duplicates avoided through normalization
- Form detection: `<form>` tags are detected, marked in the table and exportable as JSON
- Live TUI: Progress, statistics and URL details in real time
- Resizable panels: Splitters to freely resize the URL table, log and stats panels
- Log panel: Right-click context menu — copy, export to file, or hide
- Settings dialog: Language, robots.txt, Playwright, concurrency, timeout and crawl depth — persisted across runs
- Filter with history: Filter the URL table by URL/status; recent filter terms in a dropdown
- System Chrome preferred (faster startup, less memory)
- Bundled Chromium as fallback (included in standalone installation)
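
The normalization, priority, lastmod and auto-split features listed above can be illustrated with a short Python sketch. It is not the tool's actual implementation: the normalization rules, the linear priority decay and the helper names (`normalize_url`, `priority_for_depth`, `lastmod_from_header`, `split_into_sitemaps`) are assumptions made for illustration. Only the 50,000-URL limit and the home page priority of 1.0 come from the list.

```python
from email.utils import parsedate_to_datetime
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit mentioned in the feature list


def normalize_url(url: str) -> str:
    """Assumed rules: lowercase scheme and host, drop default port,
    fragment and userinfo, and use '/' for an empty path."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port is not None and parts.port != default_port:
        host = f"{host}:{parts.port}"
    return urlunsplit((scheme, host, parts.path or "/", parts.query, ""))


def priority_for_depth(depth: int) -> float:
    """Assumed decay: depth 0 gives 1.0, minus 0.1 per level, floor at 0.1."""
    return round(max(0.1, 1.0 - 0.1 * depth), 1)


def lastmod_from_header(last_modified: Optional[str]) -> Optional[str]:
    """Convert an HTTP Last-Modified value to a YYYY-MM-DD date string."""
    if not last_modified:
        return None
    try:
        return parsedate_to_datetime(last_modified).date().isoformat()
    except (TypeError, ValueError):  # unparsable header value
        return None


def split_into_sitemaps(urls: List[str], limit: int = MAX_URLS_PER_SITEMAP) -> List[List[str]]:
    """Chunk the URL list; more than one chunk means a sitemap index is needed."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


if __name__ == "__main__":
    seen = {normalize_url(u) for u in ["https://Example.com", "https://example.com:443/#top"]}
    print(seen)  # both inputs collapse to a single normalized URL
    print(priority_for_depth(0), lastmod_from_header("Wed, 21 Oct 2015 07:28:00 GMT"))
```

Deduplicating on the normalized form rather than the raw string is what keeps variants such as an explicit default port or a trailing fragment from appearing twice in the sitemap.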
Important: Crawling a website may be perceived as unusual traffic by the operator. Please note:
- Inform the website operator before crawling, especially for large websites
- Respect `robots.txt` (enabled by default)
- Use reasonable concurrency and timeout values
- This tool is intended for your own websites and authorized analyses
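
For readers who want to see what respecting `robots.txt` means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. It does not show how sitemap-generator itself evaluates robots.txt; the sample rules, the user agent `my-crawler` and the helpers `load_rules` and `is_allowed` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration; a real crawler would fetch /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    for url in ("https://example.com/", "https://example.com/admin/users"):
        print(url, is_allowed(rules, "my-crawler", url))
```

Checking each URL before it is queued, rather than after it has been fetched, avoids sending requests to paths the operator has excluded.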
git clone https://github.com/michaelblaess/sitemap-generator.git
cd sitemap-generator
# Windows
.\bootstrap.ps1
# Linux/macOS
./bootstrap.sh
# Windows
.\run.ps1 https://example.com
# Linux/macOS
./run.sh https://example.com
git tag vX.Y.Z
git push origin vX.Y.Z
GitHub Actions automatically builds executables for Windows, Linux and macOS.
Apache License 2.0 - see LICENSE


