boots-static

A small, dependency-free static site generator written from scratch in Python.

It reads Markdown files from content/, converts them into HTML using a hand-written Markdown parser, wraps them in an HTML template, and writes the finished site into docs/ (ready to be served by GitHub Pages or any static host).

This project was built as part of the Boot.dev "Build a Static Site Generator in Python" course. The live demo site (the "Tolkien Fan Club") is deployed to GitHub Pages at:

https://searse.github.io/boots-static/

What is a "static site generator"?

A static site generator (SSG) is a program that takes:

some easy-to-write source files (here, Markdown), and
a layout/template (here, a single template.html),

and produces a folder full of plain HTML/CSS/image files.

Those files are "static" because no server-side code runs when a visitor loads the page — the browser just downloads the pre-built HTML. That makes the site fast, cheap to host, and easy to deploy on services like GitHub Pages, Netlify, or Cloudflare Pages.

Popular SSGs include Hugo, Jekyll, and Eleventy. This project builds a tiny one from scratch so you can see exactly how each step works.

Features

Pure Python 3 — no third-party libraries required (uses only the standard library).
Hand-written Markdown parser that supports headings, paragraphs, bold, italic, inline code, code blocks, blockquotes, ordered/unordered lists, links, and images.
Recursively walks content/ so the site's URL structure mirrors the folder structure.
Copies a static/ folder (CSS, images) verbatim into the output.
Configurable basepath so the same code can be served at / locally and at /boots-static/ on GitHub Pages.
Full unit-test suite using Python's built-in unittest.

Project structure

boots-static/
├── build.sh              # Build for GitHub Pages (uses /boots-static/ basepath)
├── main.sh               # Build for local dev + start a local server on :8888
├── test.sh               # Run the unit-test suite
├── template.html         # The single HTML layout used for every page
│
├── content/              # Markdown source files (your site's pages)
│   ├── index.md
│   ├── contact/index.md
│   └── blog/
│       ├── glorfindel/index.md
│       ├── majesty/index.md
│       └── tom/index.md
│
├── static/               # Files copied verbatim into the output
│   ├── index.css
│   └── images/
│       ├── glorfindel.png
│       ├── rivendell.png
│       ├── tolkien.png
│       └── tom.png
│
├── docs/                 # Build output — deployed to GitHub Pages (auto-generated)
│
└── src/                  # The static site generator itself
    ├── main.py                 # Entry point: orchestrates the build
    ├── copystatic.py           # Recursively copies static/ -> docs/
    ├── gencontent.py           # Recursively converts content/*.md -> docs/*.html
    ├── textnode.py             # TextNode: an inline piece of text (bold, link, etc.)
    ├── htmlnode.py             # HTMLNode / LeafNode / ParentNode: the HTML tree
    ├── inline_markdown.py      # Inline parser: bold, italic, code, links, images
    ├── block_markdown.py       # Block parser: headings, lists, quotes, paragraphs
    └── test_*.py               # Unit tests for each module

Quick start

You need Python 3 installed (no pip install required — it's pure stdlib).

git clone https://github.com/searse/boots-static.git
cd boots-static

./main.sh

main.sh will:

1. Build the site into docs/.
2. Start a local web server in docs/ at http://localhost:8888.

Open the URL in your browser and you'll see the demo site.
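
How main.sh starts the server is not shown in this README. As a rough sketch of the same idea using only the standard library, the snippet below builds a server for the docs/ folder on port 8888. The make_server name is hypothetical and not part of this project.

```python
import functools
import http.server


def make_server(directory="docs", port=8888):
    """Build (but do not start) an HTTP server that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("localhost", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   make_server().serve_forever()
```

Passing port 0 asks the operating system for any free port, which is handy when 8888 is already taken.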

How it works (high-level data flow)

When you run python3 src/main.py, four things happen in order:

                  ┌───────────────┐
                  │  template.html│
                  └──────┬────────┘
                         │
   static/   ──copy──▶   docs/      (CSS + images)
                         ▲
   content/  ──parse──▶  ║   ──fill template──▶  docs/**/*.html
   (Markdown)            ║
                         ║
                 (Markdown → HTML
                  via tiny custom parser)

Step by step:

1. Wipe docs/ to guarantee a clean build.
2. Copy static/ → docs/. CSS and images are static assets and pass straight through (copystatic.py).
3. Walk content/ recursively. For every *.md file found, generate a matching *.html file at the same relative path inside docs/ (gencontent.py).
4. For each Markdown file:
   - Parse the Markdown into an in-memory tree of HTML nodes.
   - Render the tree to an HTML string.
   - Extract the first # Heading and use it as the page <title>.
   - Substitute {{ Title }} and {{ Content }} in template.html.
   - Rewrite root-relative URLs (/foo) to use the configured basepath (e.g. /boots-static/foo) so links work on GitHub Pages.
   - Write the result to disk.

Architecture in depth

The interesting part of this project is the Markdown-to-HTML pipeline. It's split into two layers — an HTML node model and a two-pass parser — that work together. Here is how each piece fits.

1. The HTML node model — `src/htmlnode.py`

Rather than building an HTML string character by character, the generator first builds a tree of objects that represents the page. This is essentially a tiny DOM.

There are three classes:

HTMLNode — the abstract base class. Stores a tag ("p", "h1", ...), an optional value (text content), optional children, and optional props (HTML attributes like href or src).
LeafNode — an HTML node with no children, only a text value. Used for <b>, <i>, <a>, <img>, or plain text.
ParentNode — an HTML node that contains other nodes as children. Used for <p>, <ul>, <h2>, <blockquote>, etc.

Every node knows how to render itself as HTML via to_html(). A ParentNode does this by recursively calling to_html() on each of its children and concatenating the results — a classic tree-walking pattern.

ParentNode("p", [
    LeafNode(None, "Hello, "),
    LeafNode("b", "world"),
    LeafNode(None, "!"),
]).to_html()
# -> "<p>Hello, <b>world</b>!</p>"
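
To make the tree-walking pattern concrete, here is a minimal sketch of the three classes. It is not the code from src/htmlnode.py: the props_to_html() helper and the error messages are assumptions made for this example.

```python
class HTMLNode:
    """Base class: holds a tag, an optional text value, children, and attributes."""

    def __init__(self, tag=None, value=None, children=None, props=None):
        self.tag = tag
        self.value = value
        self.children = children
        self.props = props

    def props_to_html(self):
        # Render attributes as ' key="value"' pairs, or "" when there are none.
        if not self.props:
            return ""
        return "".join(f' {key}="{val}"' for key, val in self.props.items())

    def to_html(self):
        raise NotImplementedError("subclasses implement to_html()")


class LeafNode(HTMLNode):
    def __init__(self, tag, value, props=None):
        super().__init__(tag, value, None, props)

    def to_html(self):
        if self.value is None:
            raise ValueError("a leaf node needs a value")
        if self.tag is None:
            return self.value  # plain text, no surrounding tag
        return f"<{self.tag}{self.props_to_html()}>{self.value}</{self.tag}>"


class ParentNode(HTMLNode):
    def __init__(self, tag, children, props=None):
        super().__init__(tag, None, children, props)

    def to_html(self):
        if self.tag is None:
            raise ValueError("a parent node needs a tag")
        # Recurse into every child and concatenate the rendered pieces.
        inner = "".join(child.to_html() for child in self.children)
        return f"<{self.tag}{self.props_to_html()}>{inner}</{self.tag}>"


link = LeafNode("a", "Boot.dev", {"href": "https://boot.dev"})
page = ParentNode("p", [LeafNode(None, "Visit "), link])
print(page.to_html())
# -> <p>Visit <a href="https://boot.dev">Boot.dev</a></p>
```

Because a ParentNode only ever calls to_html() on its children, parents can nest to any depth without extra code.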

2. The intermediate text representation — `src/textnode.py`

Inside a paragraph, text is a mix of plain words, bold spans, italics, code, links, and images. To make parsing tractable, the generator first represents these inline elements as TextNode objects — a flat list of typed text spans:

class TextType(Enum):
    TEXT, BOLD, ITALIC, CODE, LINK, IMAGE = ...

TextNode("hello", TextType.TEXT)
TextNode("world", TextType.BOLD)
TextNode("Boot.dev", TextType.LINK, url="https://boot.dev")

text_node_to_html_node() then converts each TextNode into the appropriate LeafNode (e.g. BOLD → <b>, LINK → <a href="…">). This separation keeps the parser simple: it only worries about what kind of span each piece of text is, not about HTML tags or attributes.
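
A self-contained sketch of that conversion might look like the following. The enum values, the dataclass-based TextNode, the stand-in LeafNode, and the SIMPLE_TAGS table are all assumptions for illustration; the real modules may be organized differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TextType(Enum):
    TEXT = "text"
    BOLD = "bold"
    ITALIC = "italic"
    CODE = "code"
    LINK = "link"
    IMAGE = "image"


@dataclass
class TextNode:
    text: str
    text_type: TextType
    url: Optional[str] = None


@dataclass
class LeafNode:
    """Stand-in for the real LeafNode: a tag, a text value, and attributes."""
    tag: Optional[str]
    value: str
    props: Optional[dict] = None


# Span types that map to a simple wrapper element (None means plain text).
SIMPLE_TAGS = {
    TextType.TEXT: None,
    TextType.BOLD: "b",
    TextType.ITALIC: "i",
    TextType.CODE: "code",
}


def text_node_to_html_node(node: TextNode) -> LeafNode:
    if node.text_type in SIMPLE_TAGS:
        return LeafNode(SIMPLE_TAGS[node.text_type], node.text)
    if node.text_type is TextType.LINK:
        return LeafNode("a", node.text, {"href": node.url})
    if node.text_type is TextType.IMAGE:
        # An image has no inner text; the node's text becomes the alt attribute.
        return LeafNode("img", "", {"src": node.url, "alt": node.text})
    raise ValueError(f"unknown text type: {node.text_type}")


print(text_node_to_html_node(TextNode("world", TextType.BOLD)))
```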

3. The inline parser — `src/inline_markdown.py`

This module turns a raw line of Markdown into a list of TextNodes by applying a series of splitters:

split_nodes_delimiter — splits by paired delimiters (**bold**, _italic_, `code`). It walks each existing text node, splits on the delimiter, and tags the odd-indexed chunks as the new text type.
split_nodes_image — uses a regex to find ![alt](url) and replaces them with IMAGE nodes.
split_nodes_link — uses a regex to find [text](url) (skipping ones preceded by !) and replaces them with LINK nodes.

text_to_textnodes() chains these together in the right order. Once a node has been tagged as anything other than TEXT, later splitters leave it alone, so each span ends up with exactly one type and nested formatting (for example, italic inside bold) is not parsed.
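
The splitting technique can be sketched as below. This is an illustrative version, not the project's code: text types are plain strings instead of the TextType enum to keep it short, and the regular expression is an assumption about how links might be matched.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextNode:
    text: str
    text_type: str  # "text", "bold", "italic", "code", "link", or "image"
    url: Optional[str] = None


def split_nodes_delimiter(nodes, delimiter, text_type):
    result = []
    for node in nodes:
        if node.text_type != "text":
            result.append(node)  # already tagged: leave it alone
            continue
        parts = node.text.split(delimiter)
        if len(parts) % 2 == 0:
            raise ValueError(f"unclosed {delimiter!r} in {node.text!r}")
        for index, part in enumerate(parts):
            if part == "":
                continue
            # Even-indexed chunks sit outside the delimiters, odd-indexed inside.
            kind = text_type if index % 2 == 1 else "text"
            result.append(TextNode(part, kind))
    return result


# [text](url) not preceded by "!", which would make it an image instead.
LINK_PATTERN = re.compile(r"(?<!!)\[([^\[\]]*)\]\(([^()]*)\)")


def split_nodes_link(nodes):
    result = []
    for node in nodes:
        if node.text_type != "text":
            result.append(node)
            continue
        position = 0
        for match in LINK_PATTERN.finditer(node.text):
            if match.start() > position:
                result.append(TextNode(node.text[position:match.start()], "text"))
            result.append(TextNode(match.group(1), "link", match.group(2)))
            position = match.end()
        if position < len(node.text):
            result.append(TextNode(node.text[position:], "text"))
    return result


nodes = [TextNode("A **bold** word and a [link](https://boot.dev).", "text")]
nodes = split_nodes_delimiter(nodes, "**", "bold")
nodes = split_nodes_link(nodes)
for node in nodes:
    print(node)
```

Each splitter takes a list of nodes and returns a new list, which is what lets them be chained one after another.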

4. The block parser — `src/block_markdown.py`

Markdown is also organized into blocks separated by blank lines (a paragraph, a list, a code fence, etc.). This module provides:

- markdown_to_blocks(), which splits the whole document on \n\n and trims whitespace.
- block_to_block_type(), which inspects each block and classifies it as one of HEADING, CODE, QUOTE, UNORDERED_LIST, ORDERED_LIST, or PARAGRAPH.
- A dedicated *_to_html_node function for each block type, which builds the right ParentNode (<h2>, <ul>, <pre><code>, <blockquote>, <p>, …), using the inline parser to fill in its children.
- markdown_to_html_node(), which collects all the block nodes under a single top-level <div> and returns it.

The end result of calling markdown_to_html_node(my_markdown).to_html() is a complete HTML fragment.
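
The block-level flow can be approximated in a few lines. The sketch below is a simplification made for illustration: it returns HTML strings directly instead of building ParentNode objects, skips inline parsing, and handles only headings, bullet lists, and paragraphs. The block_to_html and markdown_to_html names are hypothetical.

```python
def markdown_to_blocks(markdown):
    """Split a document on blank lines and drop empty blocks."""
    blocks = []
    for chunk in markdown.split("\n\n"):
        chunk = chunk.strip()
        if chunk:
            blocks.append(chunk)
    return blocks


def block_to_html(block):
    """Render one block as an HTML string (headings, bullet lists, paragraphs)."""
    if block.startswith("#"):
        hashes, _, text = block.partition(" ")
        if 1 <= len(hashes) <= 6 and set(hashes) == {"#"} and text:
            return f"<h{len(hashes)}>{text}</h{len(hashes)}>"
    lines = block.split("\n")
    if all(line.startswith("- ") for line in lines):
        items = "".join(f"<li>{line[2:]}</li>" for line in lines)
        return f"<ul>{items}</ul>"
    # Paragraph: join soft-wrapped lines with single spaces.
    return f"<p>{' '.join(lines)}</p>"


def markdown_to_html(markdown):
    inner = "".join(block_to_html(block) for block in markdown_to_blocks(markdown))
    return f"<div>{inner}</div>"


doc = "# Title\n\nFirst paragraph\nstill first.\n\n- one\n- two\n"
print(markdown_to_html(doc))
# -> <div><h1>Title</h1><p>First paragraph still first.</p><ul><li>one</li><li>two</li></ul></div>
```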

5. Page generation and the template — `src/gencontent.py` + `template.html`

template.html is a minimal layout with two placeholders:

<title>{{ Title }}</title>
...
<article>{{ Content }}</article>

For each content/**/*.md file:

extract_title() finds the first # H1 heading in the Markdown and uses it as the title (raising an error if missing — every page is required to have one).
markdown_to_html_node() produces the HTML body.
The placeholders are replaced.
All root-relative URLs (href="/...", src="/...") are rewritten to start with the configured basepath (e.g. /boots-static/...) so the site works under a GitHub Pages subpath. Locally, the basepath defaults to /.
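
The steps above can be sketched as follows. The apply_basepath and render_page helpers are hypothetical names, the regular expression is an assumption about how the rewrite might be done, and the HTML body is passed in ready-made because the parser lives in other modules.

```python
import re


def extract_title(markdown):
    """Return the text of the first '# ' heading, or raise if there is none."""
    for line in markdown.split("\n"):
        if line.startswith("# "):
            return line[2:].strip()
    raise ValueError("no h1 heading found")


def apply_basepath(html, basepath):
    """Prefix root-relative href/src values with basepath (which ends in '/')."""
    return re.sub(
        r'(href|src)="/',
        lambda match: f'{match.group(1)}="{basepath}',
        html,
    )


def render_page(template, markdown, html_body, basepath="/"):
    page = template.replace("{{ Title }}", extract_title(markdown))
    page = page.replace("{{ Content }}", html_body)
    return apply_basepath(page, basepath)


template = (
    "<title>{{ Title }}</title>"
    '<a href="/">Home</a>'
    "<article>{{ Content }}</article>"
)
markdown = "# Example page\n\nSome text."
body = '<div><h1>Example page</h1><img src="/images/tom.png" alt="Tom"></div>'
print(render_page(template, markdown, body, "/boots-static/"))
```

With the default basepath of / the rewrite leaves every URL unchanged, which is why the same code works for local development.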

generate_pages_recursive() walks the content/ tree, mirroring the directory layout into docs/, swapping .md for .html on the way out.
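
The path mirroring can be illustrated on its own. This sketch uses pathlib's rglob() in place of an explicit recursive function, so it shows the mapping rather than the project's actual traversal; plan_outputs is a hypothetical name.

```python
import tempfile
from pathlib import Path


def plan_outputs(content_dir, dest_dir):
    """Yield (source, destination) pairs for every Markdown file under content_dir."""
    content_dir, dest_dir = Path(content_dir), Path(dest_dir)
    for source in sorted(content_dir.rglob("*.md")):
        relative = source.relative_to(content_dir)
        # Same relative path, with .md swapped for .html.
        yield source, dest_dir / relative.with_suffix(".html")


with tempfile.TemporaryDirectory() as root:
    page = Path(root, "content", "blog", "tom", "index.md")
    page.parent.mkdir(parents=True)
    page.write_text("# Tom\n")
    for source, dest in plan_outputs(Path(root, "content"), Path(root, "docs")):
        print(source.relative_to(root).as_posix(), "->", dest.relative_to(root).as_posix())
# -> content/blog/tom/index.md -> docs/blog/tom/index.html
```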

6. Static assets — `src/copystatic.py`

Before content is generated, docs/ is wiped (the shutil.rmtree call in main.py) and copy_files_recursive() copies the entire static/ tree into it. Anything in static/ (CSS, images, fonts, downloads) lands at the same relative path in the output.
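
A recursive copy of this kind can be written in a few lines. The version below is a sketch that reuses the function name from this section; the body is an assumption and may differ from src/copystatic.py.

```python
import os
import shutil


def copy_files_recursive(source_dir, dest_dir):
    """Copy every file and subfolder of source_dir into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    for name in os.listdir(source_dir):
        source_path = os.path.join(source_dir, name)
        dest_path = os.path.join(dest_dir, name)
        if os.path.isfile(source_path):
            shutil.copy(source_path, dest_path)
        else:
            # A subfolder: recurse so the directory layout is mirrored.
            copy_files_recursive(source_path, dest_path)
```

The standard library's shutil.copytree() does the same job in one call; writing it by hand is mainly useful for seeing how the recursion works.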

7. The entry point — `src/main.py`

main.py orchestrates the whole build:

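import shutil
import sys
from copystatic import copy_files_recursive  # import paths assumed from the src/ layout
from gencontent import generate_pages_recursive

basepath = sys.argv[1] if len(sys.argv) > 1 else "/"

shutil.rmtree("./docs", ignore_errors=True)
copy_files_recursive("./static", "./docs")
generate_pages_recursive("./content", "./template.html", "./docs", basepath)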

That's the entire program in three lines of logic.

Supported Markdown syntax

The hand-written parser supports the following constructs:

Markdown	HTML output
`# Heading 1` … `###### Heading 6`	`<h1>…</h1>` … `<h6>…</h6>`
Paragraph text	`<p>…</p>`
`**bold**`	`<b>bold</b>`
`_italic_`	`<i>italic</i>`
`` `code` ``	`<code>code</code>`
Triple-backtick block	`<pre><code>…</code></pre>`
`> quoted line` (one or more lines)	`<blockquote>…</blockquote>`
`- item` lines	`<ul><li>…</li></ul>`
`1. item`, `2. item`, …	`<ol><li>…</li></ol>`
`[text](url)`	`<a href="url">text</a>`
`![alt](url)`	`<img src="url" alt="alt">`

Blocks are separated by blank lines. A heading starts with one to six # characters followed by a space. Ordered lists must start at 1. and increment by one on each line.
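
These rules translate into a short classifier. The sketch below is one way to express them, not the code from src/block_markdown.py: the enum values, the FENCE constant, and the exact checks are assumptions made for illustration.

```python
import re
from enum import Enum

FENCE = "`" * 3  # the triple-backtick marker that opens and closes a code block


class BlockType(Enum):
    PARAGRAPH = "paragraph"
    HEADING = "heading"
    CODE = "code"
    QUOTE = "quote"
    UNORDERED_LIST = "unordered_list"
    ORDERED_LIST = "ordered_list"


def block_to_block_type(block):
    lines = block.split("\n")
    # One to six hashes followed by a space.
    if re.match(r"#{1,6} ", block):
        return BlockType.HEADING
    if len(lines) > 1 and lines[0].startswith(FENCE) and lines[-1].startswith(FENCE):
        return BlockType.CODE
    if all(line.startswith(">") for line in lines):
        return BlockType.QUOTE
    if all(line.startswith("- ") for line in lines):
        return BlockType.UNORDERED_LIST
    # Ordered lists must count 1., 2., 3., ... with no gaps.
    if all(line.startswith(f"{n}. ") for n, line in enumerate(lines, start=1)):
        return BlockType.ORDERED_LIST
    return BlockType.PARAGRAPH


print(block_to_block_type("1. first\n2. second"))  # BlockType.ORDERED_LIST
print(block_to_block_type("1. first\n3. third"))   # BlockType.PARAGRAPH
```

Anything that fails every check falls through to PARAGRAPH, so a malformed heading or list is still rendered as ordinary text.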

This is a teaching project, so the parser is intentionally small. It does not implement the full CommonMark spec — features like nested lists, reference-style links, HTML passthrough, tables, or footnotes are not supported.

Adding your own content

1. Create a new folder under content/, e.g. content/about/.
2. Put an index.md inside it. The file must contain a top-level # Heading, which becomes the page title.
3. Link to your new page from any other Markdown file using a root-relative path:
```
[About me](/about)
```
4. Drop any images you want into static/images/ and reference them with ![alt](/images/foo.png).
5. Run ./main.sh and refresh your browser.

Folders become URL paths automatically. For example, content/blog/tom/index.md is rendered to docs/blog/tom/index.html, served at /blog/tom/.

Building and deploying to GitHub Pages

There are two build scripts:

main.sh — builds with the default basepath (/) and serves it locally on port 8888. Use this while developing.
build.sh — builds with basepath /boots-static/, which matches the URL prefix used by GitHub Pages for this repo (https://<user>.github.io/<repo>/). Use this before pushing for deployment.

This repo is configured to publish the docs/ folder on the main branch to GitHub Pages. Deploy flow:

./build.sh                                  # generates docs/ with the /boots-static/ basepath
git add docs
git commit -m "Deploy: rebuild site"
git push

Within a minute or so, GitHub Pages will pick up the new docs/ content and publish it.

If you fork this project under a different repo name, change the basepath in build.sh to match your own repo (/<your-repo-name>/), or leave it as / if you're deploying to a root domain.

Testing

All non-trivial modules have unit tests using Python's built-in unittest framework. To run them:

./test.sh
# equivalent to:
python3 -m unittest discover -s src

The test files (src/test_*.py) are great places to look if you want concrete examples of how each module is meant to be called.
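
For readers new to unittest, a test file in this style might look like the sketch below. The extract_title() body and the test names are written for this example and are not copied from the repository.

```python
import unittest


def extract_title(markdown):
    """Return the text of the first '# ' heading, or raise if there is none."""
    for line in markdown.split("\n"):
        if line.startswith("# "):
            return line[2:].strip()
    raise ValueError("no h1 heading found")


class TestExtractTitle(unittest.TestCase):
    def test_finds_first_h1(self):
        self.assertEqual(extract_title("intro\n# Hello\n# Again"), "Hello")

    def test_missing_h1_raises(self):
        with self.assertRaises(ValueError):
            extract_title("## only a subheading")
```

Saved as a file whose name starts with test_ inside src/, a class like this would be picked up by the discover command shown above.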

Glossary for beginners

Static site — A website made of pre-built HTML/CSS/JS files. No database or server-side code runs per request.
Markdown — A lightweight plain-text format that's easy to write and converts cleanly to HTML.
AST (abstract syntax tree) — An in-memory tree representation of parsed content. Here, the HTMLNode tree is the AST of a page.
Leaf node — A tree node with no children (just a value).
Parent node — A tree node that contains other nodes.
Recursion — A function calling itself to process tree-shaped or nested data. Used here both for walking the content/ folder and for rendering nested HTML nodes.
Template / templating — Filling placeholders in a layout file ({{ Title }}, {{ Content }}) with real values.
Basepath — A URL prefix the whole site lives under. GitHub Pages serves project sites from /<repo>/, so links need to be rewritten to include that prefix.
GitHub Pages — A free static-hosting service built into GitHub that serves files directly from a branch/folder of your repository.

Credits

Built for the Boot.dev course "Build a Static Site Generator in Python."
Demo content ("Tolkien Fan Club") is the course's sample site; feel free to replace it with your own.
