Version: 2.0 Date: 2026-03-17 Formats: 17
This document describes every text format supported by Yole's Kotlin Multiplatform (KMP) shared module.
All format parsers live in the shared KMP module at shared/src/commonMain/kotlin/digital/vasic/yole/format/. Each format is implemented once and runs on all platforms (Android, Desktop, iOS, Web/Wasm).
shared/src/commonMain/kotlin/digital/vasic/yole/format/
├── FormatRegistry.kt # Central registry: lazy-loaded, detection priority order
├── TextFormat.kt # Format metadata (id, name, extensions, detectionPatterns)
├── TextParser.kt # ParsedDocument class with lazy HTML caching
├── ParserInitializer.kt # Lazy and eager parser registration
├── DocumentCache.kt # LRU cache for ParsedDocument with hit/miss tracking
├── StyleSheets.kt # CSS generation with styleSheetCache
├── markdown/ # MarkdownParser.kt
├── todotxt/ # TodoTxtParser.kt
├── csv/ # CsvParser.kt
├── latex/ # LatexParser.kt
├── asciidoc/ # AsciidocParser.kt
├── orgmode/ # OrgModeParser.kt
├── wikitext/ # WikitextParser.kt
├── restructuredtext/ # RestructuredTextParser.kt
├── taskpaper/ # TaskpaperParser.kt
├── textile/ # TextileParser.kt
├── creole/ # CreoleParser.kt
├── tiddlywiki/ # TiddlyWikiParser.kt
├── jupyter/ # JupyterParser.kt
├── rmarkdown/ # RMarkdownParser.kt
├── plaintext/ # PlaintextParser.kt
└── keyvalue/ # KeyValueParser.kt
- Detection -- `FormatRegistry.detectByExtension()` (O(1) map lookup) or `detectByContent()` (regex patterns in priority order)
- Parsing -- Format-specific parser produces a `ParsedDocument` (raw content + parsed content + metadata + errors)
- HTML Generation -- `ParsedDocument.toHtml(lightMode)` with lazy caching (first call generates, subsequent calls return cached)
- Styling -- `StyleSheets.kt` generates CSS for light/dark themes with `styleSheetCache`
// Register all parsers lazily (done once at app startup)
ParserInitializer.registerAllParsersLazy()
// Detect format by file extension
val format = FormatRegistry.detectByExtension("notes.md")
// -> TextFormat(id = "markdown", ...)
// Get parser and parse content
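// Fail with a clear message instead of a bare NullPointerException from `!!`
val parser = format?.let { ParserRegistry.getParser(it) }
    ?: error("No parser available for notes.md")
val doc = parser.parse("# Hello\n\nThis is **bold** text.")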
// Access parsed content
println(doc.rawContent) // "# Hello\n\nThis is **bold** text."
println(doc.parsedContent) // Parsed representation
println(doc.metadata) // {"title": "Hello", ...}
println(doc.errors) // [] (empty if valid)

// Generate HTML (lazy: first call computes, subsequent calls return cached)
val htmlLight = doc.toHtml(lightMode = true)
val htmlDark = doc.toHtml(lightMode = false)
// Check cache state
println(doc.hasHtmlCached(lightMode = true)) // true
println(doc.hasHtmlCached(lightMode = false)) // true
// Free memory if needed
doc.clearHtmlCache()

val cache = DocumentCache(maxSize = 100)
// Parse with caching
val doc = FormatRegistry.parseWithCache(content, format, cache)
// Monitor cache effectiveness
println("Cache size: ${cache.size}")
println("Hit rate: ${cache.hitRate}")
println("Hits: ${cache.hits}, Misses: ${cache.misses}")

Parser: markdown/MarkdownParser.kt
Format ID: TextFormat.ID_MARKDOWN
Extensions: .md, .markdown, .mdown, .mkd
MIME Types: text/markdown, text/x-markdown
Capabilities:
- Full CommonMark specification support
- GitHub Flavored Markdown (GFM) extensions
- Tables with alignment
- Task lists with interactive checkboxes
- Code blocks with syntax highlighting (50+ languages)
- KaTeX math rendering (inline and block)
- Footnotes, abbreviations, definition lists
- YAML front matter parsing
- Table of contents generation
- Mermaid diagram rendering
- Emoji shortcodes
- Auto-link detection
Parsing Example:
val parser = MarkdownParser()
val doc = parser.parse("""
# Project Plan
## Tasks
- [x] Setup repository
- [ ] Write documentation
| Feature | Status |
|---------|--------|
| Parsing | Done |
""".trimIndent())
val html = doc.toHtml(lightMode = true)
// Produces full HTML with table, checkboxes, headings

Dependencies: Flexmark 0.64.8 with 16+ extensions
Parser: todotxt/TodoTxtParser.kt
Format ID: TextFormat.ID_TODOTXT
Extensions: .txt (detected by content pattern)
MIME Types: text/plain
Capabilities:
- Task management with completion tracking
- Priority levels (A-Z)
- Contexts (@context)
- Projects (+project)
- Due dates (due:YYYY-MM-DD)
- Creation dates
- Advanced query syntax with boolean operators
- Archive functionality
- Sorting and filtering
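The capabilities above can be illustrated with a minimal, hypothetical line parser. `TodoTaskSketch`, `parseTodoLineSketch`, and the regexes are assumptions made for illustration and are not taken from `TodoTxtParser.kt`. The 10K-character input guard noted for this parser is modeled here as simple truncation, which is also an assumption.

```kotlin
// Hypothetical model; Yole's real TodoTxtParser may expose different types.
data class TodoTaskSketch(
    val completed: Boolean,
    val priority: Char?,
    val contexts: List<String>,
    val projects: List<String>,
    val due: String?,
)

// Assumed guard size, mirroring the 10K limit noted for this parser.
const val MAX_REGEX_INPUT = 10_000

val TODO_PRIORITY = Regex("""^\(([A-Z])\) """)
val TODO_CONTEXT = Regex("""(?:^|\s)@(\S+)""")
val TODO_PROJECT = Regex("""(?:^|\s)\+(\S+)""")
val TODO_DUE = Regex("""(?:^|\s)due:(\d{4}-\d{2}-\d{2})""")

fun parseTodoLineSketch(line: String): TodoTaskSketch {
    // Truncate before any regex runs so a very long line cannot trigger heavy backtracking.
    val text = line.take(MAX_REGEX_INPUT)
    val completed = text.startsWith("x ")
    return TodoTaskSketch(
        completed = completed,
        // Completed tasks start with "x ", so the priority pattern only applies to open tasks.
        priority = if (completed) null else TODO_PRIORITY.find(text)?.groupValues?.get(1)?.first(),
        contexts = TODO_CONTEXT.findAll(text).map { it.groupValues[1] }.toList(),
        projects = TODO_PROJECT.findAll(text).map { it.groupValues[1] }.toList(),
        due = TODO_DUE.find(text)?.groupValues?.get(1),
    )
}
```

Creation and completion dates are left out of this sketch to keep it short.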
Parsing Example:
val parser = TodoTxtParser()
val doc = parser.parse("""
(A) Call dentist @phone +health due:2026-04-01
x 2026-03-15 2026-03-10 Submit report @work +quarterly
(B) Buy groceries @errands
""".trimIndent())

Note: Input guard limits regex processing to 10K characters to prevent backtracking on very long lines.
Parser: csv/CsvParser.kt
Format ID: TextFormat.ID_CSV
Extensions: .csv
MIME Types: text/csv, application/csv
Capabilities:
- Table preview with HTML table rendering
- Column-based syntax highlighting
- CSV parsing with quote handling
- Automatic header row detection
- Sorting and filtering
- Markdown rendering within cells
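To make "quote handling" concrete, here is a small sketch of splitting one CSV record while honoring double-quoted fields and doubled-quote escapes. The function name `splitCsvLineSketch` is hypothetical, and the logic is not taken from `CsvParser.kt`.

```kotlin
// Splits one CSV record into fields, honoring "quoted, fields" and "" escapes.
fun splitCsvLineSketch(line: String, delimiter: Char = ','): List<String> {
    val fields = mutableListOf<String>()
    val current = StringBuilder()
    var inQuotes = false
    var i = 0
    while (i < line.length) {
        val c = line[i]
        when {
            // A doubled quote inside a quoted field is a literal quote character.
            inQuotes && c == '"' && i + 1 < line.length && line[i + 1] == '"' -> {
                current.append('"')
                i++
            }
            // Any other quote toggles quoted mode and is not part of the value.
            c == '"' -> inQuotes = !inQuotes
            // A delimiter ends the field only when outside quotes.
            c == delimiter && !inQuotes -> {
                fields.add(current.toString())
                current.setLength(0)
            }
            else -> current.append(c)
        }
        i++
    }
    fields.add(current.toString())
    return fields
}
```

This sketch works on a single line, so quoted fields that contain line breaks would need a record-level reader in front of it.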
Parsing Example:
val parser = CsvParser()
val doc = parser.parse("""
Name,Age,City
Alice,30,New York
Bob,25,San Francisco
""".trimIndent())
val html = doc.toHtml(lightMode = true)
// Produces an HTML table with headers and rows

Parser: wikitext/WikitextParser.kt
Format ID: TextFormat.ID_WIKITEXT
Extensions: .wiki, .wikitext, .mediawiki
MIME Types: text/x-wiki
Capabilities:
- MediaWiki/Zim wiki format support
- Heading syntax (`== Heading ==`)
- Link resolution and validation
- Bold/italic formatting
- Lists (ordered and unordered)
- Table support
- Transclusion support
- Backlink detection
Parser: keyvalue/KeyValueParser.kt
Format ID: TextFormat.ID_KEYVALUE
Extensions: .properties, .ini, .env, .conf, .config, .cfg
MIME Types: text/x-java-properties, text/x-ini
Capabilities:
- Java properties format (`key=value`)
- INI format with sections (`[section]`)
- Environment variable files (`KEY=value`)
- Comment styles: `#`, `;`, `//`
- Structure validation
- Syntax highlighting for all sub-formats
- TOML basic support
- YAML basic support
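A minimal sketch of the INI and properties handling listed above, assuming sections in `[brackets]`, `key=value` pairs, and the three comment prefixes. `parseIniSketch` is a hypothetical name, and the real `KeyValueParser` may structure its result differently.

```kotlin
// Parses INI-style text into section -> (key -> value).
// Keys that appear before any [section] header are stored under the empty section name.
fun parseIniSketch(text: String): Map<String, Map<String, String>> {
    val result = linkedMapOf<String, MutableMap<String, String>>()
    var section = ""
    for (raw in text.lineSequence()) {
        val line = raw.trim()
        // Skip blank lines and the three comment styles.
        if (line.isEmpty() || line.startsWith("#") || line.startsWith(";") || line.startsWith("//")) {
            continue
        }
        if (line.startsWith("[") && line.endsWith("]")) {
            section = line.substring(1, line.length - 1).trim()
            continue
        }
        val idx = line.indexOf('=')
        if (idx > 0) {
            val target = result.getOrPut(section) { linkedMapOf() }
            target[line.substring(0, idx).trim()] = line.substring(idx + 1).trim()
        }
    }
    return result
}
```

Lines without `=` are silently ignored here; a validating parser would more likely record them as errors.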
Parsing Example:
val parser = KeyValueParser()
val doc = parser.parse("""
[database]
host=localhost
port=5432
# Connection pool
max_connections=10
""".trimIndent())

Parser: asciidoc/AsciidocParser.kt
Format ID: TextFormat.ID_ASCIIDOC
Extensions: .adoc, .asciidoc, .asc
MIME Types: text/asciidoc
Capabilities:
- Technical documentation format
- Document structure parsing (sections, chapters)
- Admonition blocks (NOTE, TIP, WARNING, CAUTION, IMPORTANT)
- Code blocks with syntax highlighting
- Cross-references and anchors
- Attributes and variables
- Table of contents generation
- Table support
- Include directives
Parser: orgmode/OrgModeParser.kt
Format ID: TextFormat.ID_ORGMODE
Extensions: .org
MIME Types: text/x-org
Capabilities:
- Emacs org-mode compatibility
- Heading hierarchy (`*`, `**`, `***`)
- TODO state tracking (TODO, DONE, custom keywords)
- Tags (`:tag1:tag2:`)
- Properties drawers (`:PROPERTIES:`)
- Timestamps (`<2026-03-17>`)
- Scheduling (SCHEDULED, DEADLINE)
- Org-mode tables
- Source code blocks (`#+BEGIN_SRC`)
- Links (`[[link][description]]`)
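The heading, TODO-state, and tag syntax above can be sketched with a single regular expression. `OrgHeadingSketch` and `parseOrgHeadingSketch` are hypothetical names, and only the default `TODO` and `DONE` keywords are recognized in this sketch (custom keywords would need a configurable list).

```kotlin
data class OrgHeadingSketch(
    val level: Int,
    val todo: String?,
    val title: String,
    val tags: List<String>,
)

// Stars, optional TODO/DONE keyword, lazy title, optional trailing :tag1:tag2: block.
val ORG_HEADING = Regex("""^(\*+)\s+(?:(TODO|DONE)\s+)?(.*?)(?:\s+:([\w@:]+):)?\s*$""")

fun parseOrgHeadingSketch(line: String): OrgHeadingSketch? {
    val match = ORG_HEADING.find(line) ?: return null
    // Unmatched optional groups come back as empty strings.
    val (stars, todo, title, tags) = match.destructured
    return OrgHeadingSketch(
        level = stars.length,
        todo = todo.ifEmpty { null },
        title = title,
        tags = if (tags.isEmpty()) emptyList() else tags.split(':'),
    )
}
```

A line such as `*bold*` is not treated as a heading because the stars must be followed by whitespace.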
Parser: latex/LatexParser.kt
Format ID: TextFormat.ID_LATEX
Extensions: .tex, .latex
MIME Types: application/x-latex, text/x-latex
Capabilities:
- LaTeX command highlighting
- Math expression parsing (inline `$...$` and display `$$...$$`)
- Document structure detection (sections, chapters, parts)
- Environment support (`\begin{...}` / `\end{...}`)
- Bibliography reference handling
- Cross-references (`\ref`, `\label`)
- Command auto-detection
Parser: restructuredtext/RestructuredTextParser.kt
Format ID: TextFormat.ID_RESTRUCTUREDTEXT
Extensions: .rst, .rest, .restx, .rtxt
MIME Types: text/x-rst, text/prs.fallenstein.rst
Capabilities:
- reStructuredText specification support
- Section headers with underline characters
- Directive support (code-block, image, note, warning)
- Role support (`:ref:`, `:doc:`, `:math:`)
- Field lists and option lists
- Table of contents generation
- Cross-references
- Grid and simple table formats
Parser: taskpaper/TaskpaperParser.kt
Format ID: TextFormat.ID_TASKPAPER
Extensions: .taskpaper, .todo
MIME Types: text/x-taskpaper
Capabilities:
- Project-based task organization (lines ending with `:`)
- Task items (lines starting with `-`)
- Tag support (`@tag`, `@tag(value)`)
- Note attachments
- Search and filtering
- Done tag tracking (`@done`)
- Archive functionality
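The three TaskPaper line kinds and the tag syntax above can be sketched as a small classifier. All names here (`TaskpaperLineSketch`, `classifyTaskpaperLineSketch`) are hypothetical, and the rules are a simplification rather than the logic in `TaskpaperParser.kt`.

```kotlin
sealed interface TaskpaperLineSketch {
    data class Project(val name: String) : TaskpaperLineSketch
    data class Task(
        val text: String,
        val tags: Map<String, String?>,
        val done: Boolean,
    ) : TaskpaperLineSketch
    data class Note(val text: String) : TaskpaperLineSketch
}

// Matches @tag and @tag(value); group 2 is absent when there is no value.
val TASKPAPER_TAG = Regex("""(?:^|\s)@(\w+)(?:\(([^)]*)\))?""")

fun classifyTaskpaperLineSketch(raw: String): TaskpaperLineSketch {
    val line = raw.trim()
    return when {
        line.startsWith("- ") -> {
            val tags = TASKPAPER_TAG.findAll(line).associate { m ->
                m.groupValues[1] to m.groups[2]?.value
            }
            TaskpaperLineSketch.Task(line.removePrefix("- "), tags, done = "done" in tags)
        }
        line.endsWith(":") -> TaskpaperLineSketch.Project(line.removeSuffix(":"))
        // Anything else is treated as a note attached to the preceding item.
        else -> TaskpaperLineSketch.Note(line)
    }
}
```

Project lines that carry trailing tags after the colon are not handled in this sketch.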
Parser: textile/TextileParser.kt
Format ID: TextFormat.ID_TEXTILE
Extensions: .textile, .txtl
MIME Types: text/x-textile
Capabilities:
- Textile markup language support
- Text formatting (`*bold*`, `_italic_`, `-strikethrough-`)
- Headings (`h1.`, `h2.`, etc.)
- Block quotes (`bq.`)
- Code blocks (`bc.`)
- Table support (`|cell|cell|`)
- Image embedding (`!url!`)
- Link formatting (`"text":url`)
Parser: creole/CreoleParser.kt
Format ID: TextFormat.ID_CREOLE
Extensions: .creole, .wiki
MIME Types: text/x-creole
Capabilities:
- Creole wiki markup standard
- Bold (`**bold**`) and italic (`//italic//`)
- Headings (`= H1`, `== H2`, etc.)
- Lists (unordered `*`, ordered `#`)
- Links (`[[link|text]]`)
- Images (`{{image.png|alt}}`)
- Table support (`|cell|cell|`)
- Horizontal rules (`----`)
- Nowiki/preformatted blocks (`{{{ ... }}}`)
Parser: tiddlywiki/TiddlyWikiParser.kt
Format ID: TextFormat.ID_TIDDLYWIKI
Extensions: .tid, .tiddler
MIME Types: text/x-tiddlywiki
Capabilities:
- Personal wiki format (tiddler structure)
- Metadata fields (created, modified, tags, type)
- Tagging system
- Bold, italic, underline, strikethrough
- Headings (`! H1`, `!! H2`, etc.)
- Lists (ordered and unordered)
- Transclusion
- Macros
Parser: jupyter/JupyterParser.kt
Format ID: TextFormat.ID_JUPYTER
Extensions: .ipynb
MIME Types: application/x-ipynb+json
Capabilities:
- Jupyter notebook JSON format parsing
- Cell-based structure (code, markdown, raw)
- Code cell output handling (text, images, HTML)
- Markdown cell rendering
- Cell metadata parsing
- Notebook metadata extraction (kernel, language)
- Source handling (array and string formats)
- Error output display
Parsing Example:
val parser = JupyterParser()
val doc = parser.parse("""
{
"cells": [
{
"cell_type": "markdown",
"source": ["# Analysis\n", "Results below."]
},
{
"cell_type": "code",
"source": ["import pandas as pd\n", "df = pd.read_csv('data.csv')"],
"outputs": [{"output_type": "stream", "text": [" Name Age\n0 Alice 30"]}]
}
],
"metadata": {"kernelspec": {"language": "python"}}
}
""".trimIndent())

Parser: rmarkdown/RMarkdownParser.kt
Format ID: TextFormat.ID_RMARKDOWN
Extensions: .Rmd, .rmd, .rmarkdown
MIME Types: text/x-r-markdown
Capabilities:
- R code chunk integration (```` ```{r} ... ``` ````)
- YAML front matter parsing
- Markdown rendering between code chunks
- Inline R expressions (`` `r expression` ``)
- Output format specification
- Bibliography support
- Cross-references
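As a rough illustration of the chunk and inline-expression syntax above, the following sketch extracts both with regular expressions. The names `RChunkSketch`, `extractRChunksSketch`, and `extractInlineRSketch` are hypothetical, and the patterns are assumptions rather than the ones used in `RMarkdownParser.kt`.

```kotlin
data class RChunkSketch(val options: String, val code: String)

// Three backticks, built programmatically so the pattern stays readable.
val R_FENCE = "`".repeat(3)

// Opening fence with {r options}, a lazily matched body, then the closing fence.
// [\s\S] is used so the body can span several lines without extra regex options.
val R_CHUNK = Regex(R_FENCE + """\{r([^}]*)\}\n([\s\S]*?)\n""" + R_FENCE)

// Inline expression: a backtick, the letter r, a space, then the expression.
val R_INLINE = Regex("""`r ([^`]+)`""")

fun extractRChunksSketch(text: String): List<RChunkSketch> =
    R_CHUNK.findAll(text)
        .map { RChunkSketch(options = it.groupValues[1].trim(), code = it.groupValues[2]) }
        .toList()

fun extractInlineRSketch(text: String): List<String> =
    R_INLINE.findAll(text).map { it.groupValues[1] }.toList()
```

The text between chunks would then be handed to the Markdown renderer. Chunks with an empty body are not matched by this pattern.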
Parser: plaintext/PlaintextParser.kt
Format ID: TextFormat.ID_PLAINTEXT
Extensions: .txt, .text, .log
MIME Types: text/plain
Capabilities:
- Universal text format (fallback)
- Language detection for code files
- Line numbering
- Search and replace
- Minimal processing overhead
Format ID: TextFormat.ID_BINARY
Extensions: All binary file types
MIME Types: Various binary types
Capabilities:
- Detects binary files by magic numbers (PDF: `%PDF-`, PNG: `89504E47`, etc.)
- Prevents editing of non-text files
- Displays file type information
- No parsing or HTML generation
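Magic-number detection amounts to comparing the first bytes of a file against known signatures. The sketch below covers only the two signatures named above. `MAGIC_SIGNATURES` and `detectBinaryTypeSketch` are hypothetical names, not Yole APIs.

```kotlin
// Known signatures; only the two named in this section. Real detection would list more.
val MAGIC_SIGNATURES: Map<String, ByteArray> = mapOf(
    "PDF" to "%PDF-".encodeToByteArray(),
    "PNG" to byteArrayOf(0x89.toByte(), 0x50, 0x4E, 0x47),
)

// Returns the matching type label, or null if the header matches no known signature.
fun detectBinaryTypeSketch(header: ByteArray): String? =
    MAGIC_SIGNATURES.entries.firstOrNull { (_, signature) ->
        header.size >= signature.size && signature.indices.all { header[it] == signature[it] }
    }?.key
```

Only the first few bytes of a file need to be read for this check, so it stays cheap even for large files.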
- User Override -- Manual format selection by the user
- File Extension -- Primary detection via `FormatRegistry.detectByExtension()` (O(1) map lookup)
- Content Analysis -- Fallback via `FormatRegistry.detectByContent()` (regex patterns in priority order)
- Default -- Plain text format
| Format | Detection Pattern |
|---|---|
| Markdown | Lines starting with # |
| Todo.txt | Priority pattern (A) at start of line |
| WikiText | == Heading == syntax |
| Org Mode | Lines starting with * followed by space |
| TiddlyWiki | Metadata fields: created:, modified:, tags: |
| Jupyter | JSON with "cells" array |
| R Markdown | YAML front matter + ```{r} code chunks |
| Binary | File header magic numbers |
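The detection order and content patterns above can be sketched as a single resolution function. Everything here is illustrative: the maps, format IDs, and regexes are assumptions standing in for `FormatRegistry` data, and only a few formats are included.

```kotlin
// Hypothetical stand-in for the extension map; a hash map gives the O(1) lookup.
val EXTENSION_TO_FORMAT: Map<String, String> = mapOf(
    "md" to "markdown",
    "csv" to "csv",
    "org" to "orgmode",
    "ipynb" to "jupyter",
)

// Ordered list: earlier entries win, so more specific patterns come first.
val CONTENT_PATTERNS: List<Pair<String, Regex>> = listOf(
    "todotxt" to Regex("""^\([A-Z]\) """, RegexOption.MULTILINE),
    "wikitext" to Regex("""^==.+==\s*$""", RegexOption.MULTILINE),
    "markdown" to Regex("""^#{1,6} """, RegexOption.MULTILINE),
)

fun resolveFormatSketch(fileName: String, content: String, userOverride: String? = null): String {
    // 1. An explicit user choice always wins.
    userOverride?.let { return it }
    // 2. File extension lookup.
    val extension = fileName.substringAfterLast('.', "").lowercase()
    EXTENSION_TO_FORMAT[extension]?.let { return it }
    // 3. Content analysis, first matching pattern in priority order.
    CONTENT_PATTERNS.firstOrNull { (_, pattern) -> pattern.containsMatchIn(content) }
        ?.let { return it.first }
    // 4. Fall back to plain text.
    return "plaintext"
}
```

Keeping the content patterns in a list rather than a map makes the priority order explicit.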
- Create parser directory in `shared/src/commonMain/kotlin/digital/vasic/yole/format/[name]/`
- Implement parser class that extends `TextParser` and produces `ParsedDocument`
- Add `TextFormat` entry to `FormatRegistry.formats` list (order matters -- more specific formats before general ones)
- Add format ID constant to `TextFormat.Companion`
- Register in `ParserInitializer.kt` (both eager and lazy paths)
- Add tests in `shared/src/commonTest/kotlin/digital/vasic/yole/format/[name]/`
- Add platform-specific code in `androidMain/`, `desktopMain/`, etc. if needed
- Update `FORMAT_SUPPORT_MATRIX.md`
Parsers are registered lazily via ParserInitializer.registerAllParsersLazy(), which stores factory lambdas instead of parser instances. Each parser is instantiated on first access, saving 30-50ms at startup.
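A minimal sketch of the factory-lambda idea, assuming a registry keyed by format ID. `LazyParserRegistrySketch` is a hypothetical class and does not reflect the actual shape of `ParserInitializer` or `ParserRegistry`.

```kotlin
// Hypothetical registry: stores factories at startup and builds parsers on demand.
class LazyParserRegistrySketch<P : Any> {
    private val factories = mutableMapOf<String, () -> P>()
    private val instances = mutableMapOf<String, P>()

    // Registration stores only the lambda, so no parser is constructed at startup.
    fun registerLazy(formatId: String, factory: () -> P) {
        factories[formatId] = factory
    }

    // The first lookup runs the factory; later lookups reuse the same instance.
    fun get(formatId: String): P? {
        instances[formatId]?.let { return it }
        val created = factories[formatId]?.invoke() ?: return null
        instances[formatId] = created
        return created
    }
}
```

This sketch is not thread-safe; a shared registry would need synchronization around `get`.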
ParsedDocument.toHtml() generates HTML only on first call and caches the result. Separate caches exist for light and dark modes.
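The per-mode caching can be pictured as two nullable fields that are filled on first use. `CachedHtmlDocumentSketch` and its `render` parameter are hypothetical; the method names mirror the usage shown earlier, but the internals are assumed.

```kotlin
// Hypothetical document type; `render` stands in for the real HTML generator.
class CachedHtmlDocumentSketch(
    val rawContent: String,
    private val render: (content: String, lightMode: Boolean) -> String,
) {
    private var lightHtml: String? = null
    private var darkHtml: String? = null

    // Renders on the first call for each mode, then returns the stored string.
    fun toHtml(lightMode: Boolean): String =
        if (lightMode) {
            lightHtml ?: render(rawContent, true).also { lightHtml = it }
        } else {
            darkHtml ?: render(rawContent, false).also { darkHtml = it }
        }

    fun hasHtmlCached(lightMode: Boolean): Boolean =
        (if (lightMode) lightHtml else darkHtml) != null

    // Drops both cached strings so the memory can be reclaimed.
    fun clearHtmlCache() {
        lightHtml = null
        darkHtml = null
    }
}
```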
An LRU cache (DocumentCache) stores recently parsed documents with hit/miss tracking. Use FormatRegistry.parseWithCache() for automatic caching.
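For readers unfamiliar with LRU eviction, here is a small sketch with hit/miss counters. `LruCacheSketch` is hypothetical. Its `size`, `hits`, `misses`, and `hitRate` properties mirror the names used earlier in this document, while the eviction mechanics are an assumption about how such a cache could work.

```kotlin
// Hypothetical LRU cache: a LinkedHashMap kept in least-to-most recently used order.
class LruCacheSketch<K : Any, V : Any>(private val maxSize: Int) {
    private val entries = LinkedHashMap<K, V>()

    var hits = 0
        private set
    var misses = 0
        private set

    val size: Int get() = entries.size
    val hitRate: Double
        get() = if (hits + misses == 0) 0.0 else hits.toDouble() / (hits + misses)

    fun getOrPut(key: K, compute: () -> V): V {
        val cached = entries.remove(key)
        if (cached != null) {
            hits++
            entries[key] = cached // re-insert to mark as most recently used
            return cached
        }
        misses++
        val value = compute()
        entries[key] = value
        if (entries.size > maxSize) {
            entries.remove(entries.keys.first()) // evict the least recently used entry
        }
        return value
    }
}
```

A hit rate that stays low over time would suggest that `maxSize` is too small for the working set or that documents are rarely reopened.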
- TodoTxtParser has a 10K character guard to prevent regex backtracking
- DocumentCache uses cooperative cancellation (`yield()`) for long operations
- Streaming is used for large file operations
Document Version: 2.0 Last Updated: 2026-03-17 Maintained By: Engineering Team