Sitemap Generator

English · Deutsch

Crawls websites and generates standards-compliant sitemap.xml files. Uses Playwright when pages need JavaScript rendering, or httpx for fast plain-HTTP crawling.

Screenshots

Main View

Sitemap Tree

Crawl History

Installation

One-Liner (Standalone, no Python required)

Linux / macOS:

```bash
curl -fsSL https://raw.githubusercontent.com/michaelblaess/sitemap-generator/main/install.sh | bash
```

Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/michaelblaess/sitemap-generator/main/install.ps1 | iex
```

Usage

```bash
# Simple crawl (httpx mode, fast)
sitemap-generator https://example.com

# With JavaScript rendering (Playwright)
sitemap-generator https://example.com --render

# Save sitemap directly
sitemap-generator https://example.com --output sitemap.xml

# Limit crawl depth
sitemap-generator https://example.com --max-depth 5

# More concurrency
sitemap-generator https://example.com --concurrency 16

# Ignore robots.txt
sitemap-generator https://example.com --ignore-robots

# With cookies (e.g. for login)
sitemap-generator https://example.com --cookie session=abc123
```

CLI Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| `URL` | Start URL of the website | - |
| `--output`, `-o` | Output path for sitemap.xml | `sitemap_<host>_<timestamp>.xml` |
| `--max-depth`, `-d` | Maximum crawl depth | 10 |
| `--concurrency`, `-c` | Number of parallel requests | 8 |
| `--timeout`, `-t` | Timeout per page in seconds | 30 |
| `--render` | Render JavaScript with Playwright | off |
| `--no-headless` | Show the browser window (for debugging) | off |
| `--ignore-robots` | Ignore robots.txt | off |
| `--user-agent` | Custom User-Agent | Chrome 131 |
| `--cookie` | Set a cookie as NAME=VALUE (multiple allowed) | - |
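As an illustration of the `--cookie` row, the sketch below shows one way NAME=VALUE pairs could be collected and parsed. It assumes "multiple" means the flag may be repeated; `parse_cookies` and `build_parser` are hypothetical names, not the tool's actual code.

```python
import argparse


def parse_cookies(pairs):
    """Turn ["NAME=VALUE", ...] into a dict, splitting on the first '=' only."""
    cookies = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        cookies[name] = value
    return cookies


def build_parser():
    # Hypothetical parser mirroring only two rows of the table above.
    parser = argparse.ArgumentParser(prog="sitemap-generator")
    parser.add_argument("url")
    # action="append" collects every occurrence of --cookie into a list.
    parser.add_argument("--cookie", action="append", default=[], metavar="NAME=VALUE")
    return parser


args = build_parser().parse_args(
    ["https://example.com", "--cookie", "session=abc123", "--cookie", "theme=dark"]
)
print(parse_cookies(args.cookie))
```

Splitting on the first `=` only keeps values that themselves contain `=` (for example base64 padding) intact.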

Keyboard Shortcuts (TUI)

| Key | Function |
| --- | --- |
| `c` | Start crawl |
| `x` | Cancel crawl / JSON error report |
| `m` | Save sitemap |
| `s` | Settings |
| `g` | Export form report (JSON) |
| `j` | JIRA table to clipboard |
| `e` | Show errors only |
| `b` | Sitemap tree |
| `f` | Sitemap diff |
| `d` | Copy URL details |
| `l` | Toggle log |
| `h` | History |
| `i` | Info dialog |
| `q` | Quit |

To copy or export the log, right-click the log panel.

Features

Dual mode: httpx (fast, HTML only) or Playwright (JavaScript rendering)
robots.txt: Respected by default; pass --ignore-robots to disable
Auto-split: Above 50,000 URLs, the output is split into partial sitemaps referenced by a sitemap index
Priority: Derived automatically from crawl depth (home page = 1.0)
lastmod: Taken from the HTTP Last-Modified header
URL normalization: URLs are normalized so that duplicates are not listed twice
Form detection: <form> tags are detected, marked in the table and exportable as JSON
Live TUI: Progress, statistics and URL details in real time
Resizable panels: Splitters to freely resize the URL table, log and stats panels
Log panel: Right-click context menu — copy, export to file, or hide
Settings dialog: Language, robots.txt, Playwright, concurrency, timeout and crawl depth — persisted across runs
Filter with history: Filter the URL table by URL/status; recent filter terms in a dropdown
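The normalization, priority, lastmod and auto-split features above can be sketched with the standard library alone. This is a minimal illustration, not the tool's implementation: all function names are hypothetical, and the priority formula (minus 0.1 per depth level, floor 0.1) is an assumption, since only "home page = 1.0" is documented.

```python
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit, urlunsplit
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit of the sitemap protocol


def normalize(url):
    """Lowercase scheme and host, drop the fragment, turn an empty path into '/'."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def priority_for_depth(depth):
    """Assumed formula: 1.0 for the home page, minus 0.1 per level, floor 0.1."""
    return max(0.1, round(1.0 - 0.1 * depth, 1))


def lastmod_from_header(value):
    """Convert an HTTP Last-Modified value into a W3C date (YYYY-MM-DD)."""
    return parsedate_to_datetime(value).date().isoformat()


def split_entries(entries, size=MAX_URLS_PER_FILE):
    """Deduplicate by normalized URL, then cut into chunks of at most `size`."""
    unique = {}
    for url, depth, last_modified in entries:
        key = normalize(url)
        unique.setdefault(key, (key, depth, last_modified))
    items = list(unique.values())
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_urlset(entries):
    """Build a <urlset> from (url, depth, last_modified_header_or_None) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, depth, last_modified in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        if last_modified:
            ET.SubElement(node, "lastmod").text = lastmod_from_header(last_modified)
        ET.SubElement(node, "priority").text = f"{priority_for_depth(depth):.1f}"
    return urlset


def build_index(part_urls):
    """Build a <sitemapindex> that points at the partial sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for part_url in part_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = part_url
    return index


pages = [
    ("https://example.com", 0, "Wed, 21 Oct 2015 07:28:00 GMT"),
    ("https://EXAMPLE.com/#top", 1, None),  # duplicate of the home page
    ("https://example.com/about", 1, None),
]
for chunk in split_entries(pages):
    print(ET.tostring(build_urlset(chunk), encoding="unicode"))
```

With fewer than 50,000 unique URLs, `split_entries` returns a single chunk and no index is needed; with more, each chunk would be written to its own file and listed via `build_index`.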

Browser Strategy

System Chrome preferred (faster startup, less memory)
Bundled Chromium as fallback (included in standalone installation)
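A preferred-then-fallback launch of this kind might look like the sketch below. It assumes Playwright's Python API, where `chromium.launch(channel="chrome")` selects an installed Chrome and raises an error if none is found; `first_available` and `launch_browser` are hypothetical helper names, not the project's code.

```python
def first_available(launchers):
    """Try each (name, launch) pair in order and return the first that succeeds."""
    errors = []
    for name, launch in launchers:
        try:
            return name, launch()
        except Exception as exc:  # a missing browser surfaces as an exception
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no browser could be started: " + "; ".join(errors))


def launch_browser(playwright, headless=True):
    """Prefer the system Chrome, fall back to Playwright's bundled Chromium."""
    return first_available([
        ("system-chrome",
         lambda: playwright.chromium.launch(channel="chrome", headless=headless)),
        ("bundled-chromium",
         lambda: playwright.chromium.launch(headless=headless)),
    ])


# Usage (requires the playwright package and at least one installed browser):
#
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       name, browser = launch_browser(p)
#       print("started", name)
#       browser.close()
```

Keeping the ordering in a plain list makes it easy to add further candidates later without touching the retry logic.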

Privacy

Important: Crawling a website may be perceived as unusual traffic by the operator. Please note:

Inform the website operator before crawling, especially for large websites
Respect robots.txt (enabled by default)
Use reasonable concurrency and timeout values
This tool is intended for your own websites and authorized analyses
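To make "respect robots.txt" concrete, the following sketch filters candidate URLs with Python's standard `urllib.robotparser`. It is only an illustration of the check; the `allowed_urls` helper, the sample rules and the user-agent string are assumptions, and the tool itself may implement this differently.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that the given robots.txt text permits for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Sample rules for illustration; a crawler would fetch /robots.txt from the site.
ROBOTS = """\
User-agent: *
Disallow: /admin/
"""

candidates = [
    "https://example.com/",
    "https://example.com/admin/users",
]
print(allowed_urls(ROBOTS, "sitemap-generator", candidates))
```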

Development

Setup

```shell
git clone https://github.com/michaelblaess/sitemap-generator.git
cd sitemap-generator

# Windows
.\bootstrap.ps1

# Linux/macOS
./bootstrap.sh
```

Local Start

```shell
# Windows
.\run.ps1 https://example.com

# Linux/macOS
./run.sh https://example.com
```

Creating a Release

```bash
git tag vX.Y.Z
git push origin vX.Y.Z
```

GitHub Actions automatically builds executables for Windows, Linux and macOS.

License

Apache License 2.0 - see LICENSE
