philipszdavido/HTMLParser-C-

πŸ› οΈ HTML Parser in C++

A lightweight HTML parser written entirely in C++, with no external dependencies.
Designed for speed, clarity, and flexibility, this parser can tokenize and extract HTML elements, attributes, and text content for further processing.


✨ Features

  • πŸ” Tokenization β€” Splits HTML into tags, attributes, and text nodes
  • 🏷 Element parsing β€” Identifies start tags, end tags, and self-closing tags
  • πŸ“œ Attribute handling β€” Reads attributes and values, supports quoted/unquoted values
  • πŸͺΆ Lightweight β€” Zero third-party dependencies, pure standard C++
  • πŸš€ Fast β€” Minimal allocations, stream-based parsing
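To make the tokenization and attribute-handling features above more concrete, here is a minimal, independent sketch of the general technique. It is not the repository's implementation: `SketchToken`, `parseAttributesSketch`, and `tokenizeSketch` are hypothetical names invented for this illustration, and the real `HtmlParser` class may work quite differently.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type for illustration only; not the repository's class.
struct SketchToken {
    enum class Kind { StartTag, EndTag, SelfClosingTag, Text };
    Kind kind;
    std::string name;  // tag name, or the raw text for Kind::Text
    std::vector<std::pair<std::string, std::string>> attributes;
};

inline bool isSpaceChar(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Reads `name`, `name=value`, `name="value"` and `name='value'` pairs.
std::vector<std::pair<std::string, std::string>>
parseAttributesSketch(const std::string& s) {
    std::vector<std::pair<std::string, std::string>> out;
    std::size_t i = 0;
    while (i < s.size()) {
        while (i < s.size() && isSpaceChar(s[i])) ++i;
        const std::size_t nameStart = i;
        while (i < s.size() && !isSpaceChar(s[i]) && s[i] != '=') ++i;
        if (i == nameStart) break;  // end of input or a stray '='
        std::string name = s.substr(nameStart, i - nameStart);
        std::string value;
        if (i < s.size() && s[i] == '=') {
            ++i;
            if (i < s.size() && (s[i] == '"' || s[i] == '\'')) {
                const char quote = s[i++];
                const std::size_t valueStart = i;
                while (i < s.size() && s[i] != quote) ++i;
                value = s.substr(valueStart, i - valueStart);
                if (i < s.size()) ++i;  // skip the closing quote
            } else {
                const std::size_t valueStart = i;
                while (i < s.size() && !isSpaceChar(s[i])) ++i;
                value = s.substr(valueStart, i - valueStart);
            }
        }
        out.emplace_back(std::move(name), std::move(value));
    }
    return out;
}

// Splits the input into tag tokens and non-blank text tokens.
std::vector<SketchToken> tokenizeSketch(const std::string& html) {
    std::vector<SketchToken> tokens;
    std::size_t pos = 0;
    while (pos < html.size()) {
        if (html[pos] != '<') {
            std::size_t next = html.find('<', pos);
            if (next == std::string::npos) next = html.size();
            const std::string text = html.substr(pos, next - pos);
            bool blank = true;
            for (char c : text) {
                if (!isSpaceChar(c)) { blank = false; break; }
            }
            if (!blank) tokens.push_back({SketchToken::Kind::Text, text, {}});
            pos = next;
            continue;
        }
        const std::size_t close = html.find('>', pos);
        if (close == std::string::npos) break;  // unterminated tag: stop
        std::string body = html.substr(pos + 1, close - pos - 1);
        pos = close + 1;

        SketchToken token{SketchToken::Kind::StartTag, "", {}};
        if (!body.empty() && body.front() == '/') {
            token.kind = SketchToken::Kind::EndTag;
            body.erase(0, 1);
        } else if (!body.empty() && body.back() == '/') {
            token.kind = SketchToken::Kind::SelfClosingTag;
            body.pop_back();
        }
        std::size_t nameEnd = 0;
        while (nameEnd < body.size() && !isSpaceChar(body[nameEnd])) ++nameEnd;
        token.name = body.substr(0, nameEnd);
        token.attributes = parseAttributesSketch(body.substr(nameEnd));
        tokens.push_back(std::move(token));
    }
    return tokens;
}
```

Calling `tokenizeSketch` on a string such as `<p id="intro" class=lead>Hi</p><br/>` yields a start tag with two attributes, a text token, an end tag, and a self-closing tag. The sketch deliberately ignores comments, doctype declarations, script content, and `>` characters inside quoted attribute values, all of which a production parser would need to handle.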

📦 Installation

Clone the repository:

git clone https://github.com/philipszdavido/HTMLParser-C-.git
cd HTMLParser-C-

Example usage:

#include "HtmlParser.hpp"
#include <iostream>
#include <string>

int main() {
    std::string html = R"(
        <html>
            <head><title>Hello</title></head>
            <body>
                <h1 class="title">Welcome!</h1>
                <p id="intro">This is a test.</p>
            </body>
        </html>
    )";

    HtmlParser parser(html);
    auto tokens = parser.parse();

    for (const auto& token : tokens) {
        std::cout << token << "\n";
    }

    return 0;
}

Example output:

Token(name="html", start=true, end=false, attributes=[])
Token(name="head", start=true, end=false, attributes=[])
Token(name="title", start=true, end=false, attributes=[])
...
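
The usage example streams each token with `std::cout << token`, which requires a stream insertion operator for the token type. The repository's actual token definition is not shown here, so the following is only a sketch under stated assumptions: `ExampleToken`, its fields, and `formatToken` are hypothetical, and the formatting of non-empty attribute lists is a guess, since the sample output only shows empty ones.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token layout, inferred from the sample output only.
struct ExampleToken {
    std::string name;
    bool start;
    bool end;
    std::vector<std::pair<std::string, std::string>> attributes;
};

// Writes a token in the `Token(name="...", start=..., end=..., attributes=[...])`
// shape seen in the sample output.
std::ostream& operator<<(std::ostream& os, const ExampleToken& token) {
    os << "Token(name=\"" << token.name << "\""
       << ", start=" << (token.start ? "true" : "false")
       << ", end=" << (token.end ? "true" : "false")
       << ", attributes=[";
    for (std::size_t i = 0; i < token.attributes.size(); ++i) {
        if (i != 0) os << ", ";
        // Assumed rendering for attributes: key="value"
        os << token.attributes[i].first << "=\""
           << token.attributes[i].second << "\"";
    }
    return os << "])";
}

// Convenience helper that returns the formatted token as a string.
std::string formatToken(const ExampleToken& token) {
    std::ostringstream out;
    out << token;
    return out.str();
}
```

With an operator like this in scope, a loop over a container of `ExampleToken` values can print one token per line, in the same way the usage example does with the parser's own tokens.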
