cfxmark

Bidirectional Markdown ↔ Confluence Storage XHTML converter — with lossless opaque preservation for everything cfxmark doesn't explicitly know how to convert.

import cfxmark

# Markdown → Confluence storage XHTML
result = cfxmark.to_cfx(markdown_text)
result.xhtml          # str    — ready for Confluence REST PUT
result.attachments    # tuple  — local file refs the caller should upload
result.warnings       # tuple  — human-readable conversion warnings

# Confluence storage XHTML → Markdown
result = cfxmark.to_md(xhtml_text)
result.markdown       # str    — canonical markdown
result.warnings       # tuple

# Markdown or Confluence XHTML → Jira wiki markup
result = cfxmark.to_jira_wiki(markdown_text)
result.jira_wiki      # str | None — Jira wiki markup

# Jira wiki markup → Markdown  (EXPERIMENTAL — v0.3+, enhanced v0.4)
from cfxmark.jira import from_jira_wiki
result = from_jira_wiki(jira_issue_description)
result.markdown       # str
result.attachments    # tuple  — filenames referenced via [^file] / !file!
result.warnings       # tuple  — unsupported user mentions, …

ConversionResult is the same dataclass for all directions — xhtml is populated for to_cfx, markdown for to_md / from_jira_wiki, and jira_wiki for to_jira_wiki.

Why another converter?

Two existing projects inspired this one — md2cf and md2conf — but both are one-directional (md → cf) and neither preserves unknown macros across a round trip. cfxmark fills both gaps:

Bidirectional. to_md(to_cfx(m)) is byte-identical to canonicalize(m) for every construct in the supported subset.
Opaque preservation. Confluence content cfxmark doesn't understand (custom plugins, drawio diagrams, exotic table cells) round-trips byte-for-byte, including the ac:macro-id UUID. Confluence treats the round-tripped macro as the same instance, so comments, attachments, and permissions stay attached.
Pure text-in / text-out. No Confluence API, no network, no attachment upload. The caller owns REST I/O. (See "Image assets" below for the helper function that lets the caller plug in network-bound logic without bloating cfxmark.)

Install

cfxmark ships in two modes:

# Core: Markdown ↔ Confluence XHTML converter + Jira wiki renderer
pip install cfxmark

# With uv (recommended):
uv add cfxmark

# With the optional Confluence REST client (zero additional deps —
# the extra is namespace-only and reserves a stable upgrade slot):
pip install 'cfxmark[confluence]'

The confluence extra declares zero third-party runtime dependencies — from cfxmark.confluence import ConfluenceClient works even without it. The extra exists to signal intent in requirements files and to reserve a stable upgrade slot for future convenience helpers.

cfxmark depends on lxml and mistletoe. Python 3.10+.

The contract

cfxmark grades every Confluence construct into one of three buckets:

Grade	Description	Behaviour
I — Native	Standard CommonMark / GFM (headings, lists, tables, code fences, links, images, blockquote, hr, inline emphasis)	Lossless round-trip after canonicalization.
II — Directive	Confluence macros with a known Markdown directive mapping (`info`, `note`, `warning`, `tip`, `jira`, `expand`, `toc`)	Lossless after canonicalization. Pluggable via `MacroRegistry`.
III — Opaque	Everything else	Captured byte-for-byte through cfxmark's opaque-block / inline-opaque mechanism. Never dropped, never rewritten.

See docs/SPEC.md for the full mapping table and docs/OPAQUE.md for the opaque-block format.

Usage

Round-trip a Confluence page through Markdown

import cfxmark

# Whatever fetched the page (REST API call, exported XML file, …)
xhtml = my_confluence_client.get_storage_format(page_id)

# Convert to Markdown
md_result = cfxmark.to_md(xhtml)
markdown = md_result.markdown

# … user edits the Markdown …

# Convert back to Confluence storage XHTML
cfx_result = cfxmark.to_cfx(markdown)
my_confluence_client.update_page(page_id, cfx_result.xhtml)

# Optionally upload any newly referenced local images
for filename in cfx_result.attachments:
    my_confluence_client.upload_attachment(page_id, filename)

Image assets

When you convert a Confluence page that references uploaded attachments, the resulting Markdown looks like this:

![](image-3.png#cfxmark:w=700)<!-- cfxmark:asset src="image-3.png" -->

The image link still points at the original Confluence filename (broken in any local Markdown viewer until you fetch the bytes), and the  HTML comment carries enough metadata for a follow-up step to fetch and embed.

cfxmark.resolve_assets is that follow-up step. You provide a fetcher callback that returns bytes for one filename at a time, and choose between two output strategies:

import cfxmark
from pathlib import Path

def fetcher(filename: str) -> bytes:
    # Whatever you use to download from Confluence:
    return my_confluence_client.download_attachment(page_id, filename)

# Strategy A — sidecar directory (recommended for git-tracked docs).
# Saves bytes to ./assets/ and rewrites links to relative paths.
md = cfxmark.resolve_assets(
    md_result.markdown,
    fetcher,
    mode="sidecar",
    asset_dir="docs/page-42/assets",
    md_path="docs/page-42.md",
)
Path("docs/page-42.md").write_text(md)
# docs/page-42/assets/image-3.png exists
# md link: ![](assets/image-3.png#cfxmark:w=700)<!-- cfxmark:asset src="image-3.png" -->

# Strategy B — inline data URIs (single self-contained file).
md = cfxmark.resolve_assets(md_result.markdown, fetcher, mode="inline")
# md link: ![](data:image/png;base64,iVBORw0K...)<!-- cfxmark:asset src="image-3.png" -->

The asset markers are preserved through both strategies, so resolve_assets is idempotent and a subsequent to_cfx call always recovers the original Confluence filename — even if the visible link target has been rewritten to a sidecar path or a data URI.

Mermaid diagrams

cfxmark maps Markdown's ```mermaid fenced code block to Confluence's code macro with language=mermaid. If your Confluence instance has a Mermaid plugin installed (e.g. Mermaid Diagrams for Confluence) it will render the diagram automatically; otherwise the content is shown as a syntax-highlighted code block.

```mermaid
graph LR
  A --> B --> C
```

Inline opaque references

Inline elements that have no native Markdown form — Confluence user mentions, inline Jira issue macros, custom widget invocations, … — become a short Markdown link with a cfx:op-... URL:

Contact the purchaser ([@user-2c9402cc](cfx:op-4fab0f8d))

The [label] is auto-derived from the underlying element type (@user-…, jira:PROJ-1, cfx:status, …) and the op-XXXXXXXX ID is a SHA-256 prefix of the original XML payload. The full XML lives in a cfxmark:payloads sidecar at the bottom of the same Markdown file:

<!-- cfxmark:payloads -->
<!-- op-4fab0f8d
<ac:link><ri:user ri:userkey="2c9402cc83d4bcc40183d976ef730001"/></ac:link>
-->
<!-- /cfxmark:payloads -->

The SHA-256 fingerprint means a user who types that exact link syntax in their own Markdown is not silently re-interpreted as an opaque payload — the verification fails and the region falls back to ordinary text.

Block opaque blocks

Block-level Confluence content cfxmark doesn't know how to convert (e.g. drawio diagrams, plantuml, complex tables) is wrapped in a fenced code block with sentinel comments:

<!-- cfxmark:opaque id="op-1188e2b4" -->
```cfx-storage
<ac:structured-macro ac:name="drawio" ac:macro-id="...">
  <ac:parameter ac:name="diagramName">flow</ac:parameter>
  ...
</ac:structured-macro>
```
<!-- /cfxmark:opaque -->

Editors render this as a clearly visible code block — a "do not touch" signal for human readers. The Markdown parser detects the sentinels first and round-trips the contents byte-for-byte, including the original ac:macro-id UUID that Confluence uses to identify macro instances.

Header notice

When a converted Markdown document contains any opaque or directive markers, cfxmark prepends a single-line HTML comment explaining the conventions to humans and AI agents:

<!-- cfxmark:notice Converted from Confluence storage format. Inline
[label](cfx:op-XXXXXXXX) references preserve Confluence content that
has no native Markdown form; the raw XML for each lives in the
cfxmark:payloads sidecar at the bottom of this file. Do not edit
those references or the sidecar — tampering invalidates a SHA-256
fingerprint and the round trip falls back to plain text. -->

The comment is invisible in any Markdown viewer.

Custom macros

Promote a Confluence macro from "opaque" to "directive" by registering a custom handler:

import cfxmark
from cfxmark.macros import MacroRegistry
from cfxmark.macros.builtins import AdmonitionHandler

# Start from the default registry and add your own.
my_registry = cfxmark.default_registry.copy()
# Built-in AdmonitionHandler accepts one of: "info", "note", "warning", "tip".
# To promote a previously-opaque macro, write a small MacroHandler subclass —
# see cfxmark/macros/builtins/admonition.py for a complete example.
my_registry.register(AdmonitionHandler("warning"))

result = cfxmark.to_md(xhtml, macros=my_registry)

Implementing a MacroHandler from scratch requires a small amount of lxml knowledge — see cfxmark/macros/builtins/admonition.py for a complete example. A higher-level handler API that hides lxml is planned for v0.3.

Canonicalization helpers

cfxmark ships two canonicalization helpers, one for each side of the pipeline. Both are idempotent: f(f(x)) == f(x).

`canonicalize_cfx(xhtml)` — compare two storage fragments

Two Confluence storage fragments are "the same" only after a deep normalization pass that strips volatile attributes, editor noise, and rendering hints. Use canonicalize_cfx to compare two snapshots:

import cfxmark

c1 = cfxmark.canonicalize_cfx(original_xhtml)
c2 = cfxmark.canonicalize_cfx(round_tripped_xhtml)
assert c1 == c2  # passes for any document in the supported subset

canonicalize_cfx is the same function the test suite uses to verify byte-identical round trips against real Confluence pages. A good push pipeline calls it before the REST PUT so an unchanged body is skipped entirely:

remote = cfxmark.canonicalize_cfx(my_client.get_page(page_id))
local = cfxmark.canonicalize_cfx(cfxmark.to_cfx(local_md).xhtml)
if remote != local:
    my_client.update_page(page_id, ...)

`normalize_md(markdown)` — converge hand-edited Markdown

normalize_md is the Markdown-side counterpart: it runs the document through parse_md → render_md so the output is exactly the form cfxmark would have produced. Applying it before push flattens any drift introduced by hand edits, a different editor's Markdown autoformatter, or a historical cfxmark version.

import cfxmark

# Pre-push recipe: normalize hand-edited Markdown so the canonical
# XHTML body is stable across authors and editor plugins.
clean_md = cfxmark.normalize_md(local_md_from_disk)
xhtml = cfxmark.to_cfx(clean_md).xhtml

The key property: a document produced by to_md is already a fixed point of normalize_md, so round-trippers pay nothing. Hand-edited documents converge in a single pass, and that pass is enough to eliminate the "local file drifted from the round-trip form" class of bug (for example, stray ** delimiters in positions where cfxmark would have emitted raw  HTML because of CommonMark's CJK word-boundary rule).

If you only push normalize_md(text) rather than raw hand-edits, the canonicalize_cfx diff above stays stable across collaborators.

Jira wiki output

to_jira_wiki converts Markdown or Confluence storage XHTML to Jira wiki markup. It accepts the same source formats as to_cfx / to_md and auto-detects which format it received.

import cfxmark

result = cfxmark.to_jira_wiki(markdown_text)
print(result.jira_wiki)   # h2. Heading\n\n*bold* text …

Two optional parameters cover common push-pipeline patterns:

import re

# Only render the body of the first H2 section titled "Summary".
result = cfxmark.to_jira_wiki(markdown_text, section="Summary")

# Drop a leading cfxmark:notice comment before rendering
# (useful when pushing a round-tripped Confluence page to Jira).
result = cfxmark.to_jira_wiki(
    markdown_text,
    drop_leading_notice=(re.compile(r"cfxmark:notice"),),
)

Code block language identifiers can be normalised for Jira Server compatibility:

result = cfxmark.to_jira_wiki(
    markdown_text,
    code_language_map={"ts": "javascript", "kotlin": "java"},
)

result.jira_wiki is None when section= is specified but not found in the document.

Confluence client (optional extra)

Install with the confluence extra to signal intent — the client itself is always importable because it is built on Python's standard library:

pip install 'cfxmark[confluence]'

The extra declares zero additional runtime dependencies. It exists to:

Signal the dependency in your requirements.txt / pyproject.toml so readers see that you rely on the optional subsystem.
Reserve a stable upgrade slot — if future convenience helpers (credential stores, rich CLI) gain third-party deps, the extra is the place they'll land.

from cfxmark.confluence import ConfluenceClient, BearerTokenFile

client = ConfluenceClient(
    host="https://confluence.example.com",
    auth=BearerTokenFile("~/.secrets/confluence_pat"),
    dialect="server",
)

# Canonical-aware push — skips the REST PUT entirely when the remote
# body is byte-equivalent to the rendered local Markdown.
result = client.push_markdown(
    page_id="12345",
    md_text=my_markdown,
    md_path="docs/my_page.md",
    on_conflict="abort",
)
if result.changed:
    print(f"Pushed. Uploaded {len(result.uploaded_attachments)} new attachments.")
    if result.has_partial_failure:
        for name, ex in result.failed_attachments:
            print(f"  ! attachment {name} failed: {ex!r}")
else:
    print("No-op; remote is already current.")

# Canonical-aware pull with resolved assets in a sidecar directory.
pull = client.pull_markdown(
    page_id="12345",
    md_path="docs/my_page.md",
    resolve_assets_mode="sidecar",
    asset_dir="docs/my_page-assets",
)

Logging. The client uses logging.getLogger("cfxmark.confluence") exclusively — no direct writes to sys.stdout or sys.stderr. Enable progress output with:

import logging
logging.getLogger("cfxmark").setLevel(logging.INFO)

Confluence dialect. The default is dialect="server" because Confluence Server / Data Center is the reference test target. Confluence Cloud users should pass dialect="cloud" — the X-Atlassian-Token: no-check XSRF bypass header (mandatory on Server, unsupported on Cloud) is gated on this setting. Cloud support is best-effort; if you hit a Cloud-only regression, please open an issue.

Jira wiki (experimental, lossy)

The Confluence round-trip (to_cfx / to_md) is lossless after canonicalization — every construct in the supported subset round-trips byte-for-byte, and everything else is preserved through the opaque-block mechanism. The Jira wiki direction is not. Jira wiki markup is a looser dialect without opaque-macro identity preservation, and several constructs have no equivalent on the Markdown side.

Keep the two contracts at different import sites so the asymmetry is visible:

from cfxmark import to_cfx, to_md           # lossless, Confluence side
from cfxmark.jira import from_jira_wiki, to_jira_wiki  # experimental

Contract

The strongest guarantee the Jira pipeline offers is:

from_jira_wiki(jira_wiki) produces a :class:ConversionResult whose Markdown representation reaches a fixed point after at most two wiki → md → wiki iterations. Pass 1 (wiki → md) is a one-way canonicalization; pass 2 (md → wiki → md) must be idempotent. The real-world fixture corpus (6 Jira issue descriptions drawn from production) is exercised in tests/unit/test_jira_wiki_parser.py to pin this contract.

Explicitly allowed canonicalization (not considered a diff):

Heading spacing, list indent, trailing whitespace normalisation
Soft-break inside a paragraph collapsed into a single line
Jira {panel} macro mapped to {note} admonition
_italic_ canonicalised to Markdown *italic*
-strike- canonicalised to Markdown ~~strike~~
~sub~ → , ^sup^ → , +ins+ → <ins> (v0.4+)
{color:#hex}text{color} → text (v0.4+)
??text?? → <cite>text</cite> (v0.4+)

Explicitly forbidden (would break the contract):

Content loss inside headings, paragraphs, list items, code blocks
Re-ordering of top-level blocks
URL rewriting in links
Renaming attachments (filenames in [^file] / !file! are preserved verbatim in ConversionResult.attachments)

Lossy mapping table

Jira wiki	Markdown	Note
`h1.`…`h6.`	`#`…`######`	1:1 identity (no promotion on parse)
`bold`	`bold`	boundary-aware
`_italic_`	`italic`
`-strike-`	`~~strike~~`	boundary-aware
`{{mono}}`	`mono`
`~sub~`	`<sub>sub</sub>`	Subscript node (v0.4+)
`^sup^`	`<sup>sup</sup>`	Superscript node (v0.4+)
`+ins+`	`<ins>ins</ins>`	Underline node (v0.4+)
`{color:#hex}x{color}`	`<span style="color:#hex">x</span>`	ColorSpan node (v0.4+)
`??text??`	`<cite>text</cite>`	Citation node (v0.4+)
`[url]`	`[url](url)` (empty label form)
`[label\|url]`	`[label](url)`	label may contain nested `[...]`
`[^file.png]`	`![](file.png)`	when extension is image-like
`[^file.msg]`	`[file.msg](attachment:file.msg)`	otherwise
`[~user]`	(dropped)	warning recorded
`{code:python}body{code}`	```python\nbody\n```
`{noformat}body{noformat}`	```\nbody\n```
`{quote}body{quote}`, `bq.`	`> body`
`{info}`/`{note}`/`{warning}`/`{tip}`	`> [!INFO]` callout	GitHub / Obsidian style
`{panel:title=X}body{panel}`	`> [!NOTE] X` + warning	D4 mapping
`h3. Title`	`### Title`	literal nested bold preserved
Multi-line table cell	GFM cell with `<br>` soft break
`\|\|h1\|\|h2\|\|` header row	GFM header row
`----`	`---`

Warnings accumulate on ConversionResult.warnings; fixture tests in tests/unit/test_jira_wiki_parser.py pin the behaviour of every entry in this table.

Heading promotion on the output side

to_jira_wiki has a heading_promotion keyword that controls the heading level mapping:

"confluence" (default) — Markdown H3 collapses to Jira h2 (and H4 → h3, …). Use when the Jira wiki output will be pushed to a Confluence page whose title already occupies the top slot.
"jira" — identity mapping. Use when the output is pushed to a Jira issue description, because the issue title lives in a separate field and the body can start at h1.
"none" — alias for "jira".

from cfxmark import to_jira_wiki

# Confluence push — historical default
to_jira_wiki(md)

# Jira issue description push
to_jira_wiki(md, heading_promotion="jira")

HTML comment passthrough (wrapper metadata)

If your wrapper embeds caller-owned metadata as HTML comments in the local Markdown file — for example a workflow manifest:

# My feature

<!-- workflow:meta
  key: TASK-42
  type: Story
  last_synced_version: 15
-->

body...

— opt in via ConversionOptions.passthrough_html_comment_prefixes. Matching comments are preserved verbatim across parse_md / render_md, and silently dropped by to_cfx / to_jira_wiki so they never leak to Confluence or Jira:

from cfxmark import ConversionOptions, to_cfx, to_md
from cfxmark.normalize import strip_passthrough_comments

opts = ConversionOptions(
    passthrough_html_comment_prefixes=("workflow:",)
)

# Push: comment is dropped on the way to Confluence
result = to_cfx(local_md, options=opts)

# Canonical compare: strip comments on both sides before diffing
left = strip_passthrough_comments(local_md, ("workflow:",))
right = strip_passthrough_comments(pulled_md, ("workflow:",))
assert left == right

cfxmark: prefixes are filtered out so cfxmark's own sentinel comments (cfxmark:opaque, cfxmark:notice) cannot be hijacked.

Security

cfxmark hardens its XML parser against XXE and billion-laughs attacks:

Inputs containing <!DOCTYPE> or <!ENTITY> declarations are rejected before lxml ever sees them.
The lxml parser is configured with no_network=True, load_dtd=False, and huge_tree=False.
Opaque-block sentinels are SHA-256 verified — accidental sentinel syntax in user-typed Markdown does not become a real opaque block.

If you find a security issue, please open a GitHub issue.

Stability contract

The following names are covered by semantic versioning and will not be removed or incompatibly changed without a major version bump:

cfxmark package — to_cfx, to_md, to_jira_wiki, from_jira_wiki, canonicalize_cfx, normalize_md, strip_passthrough_comments, resolve_assets, ConversionResult, ConversionOptions, DEFAULT_OPTIONS, AssetFetcher, ResolveMode, CfxmarkError, ConversionError, MacroError, ParseError, AssetSecurityError, MacroRegistry, default_registry.

cfxmark.jira — to_jira_wiki, from_jira_wiki. Experimental in v0.3: the Jira parser contract is "converges after at most three wiki → md → wiki iterations", not byte-identical. The top-level to_jira_wiki, from_jira_wiki, and ConversionResult.jira_wiki field ARE stable — only the quality of the round-trip itself is experimental. See the "Jira wiki (experimental, lossy)" section above for the full contract.

cfxmark.confluence — ConfluenceClient, PushResult, PullResult, Auth, BearerToken, BearerTokenFile, BasicAuth, EnvBearerToken, HTTPError, ConfluenceVersionConflict.

Guarantees:

Breaking changes bump the minor version for 0.x.y releases.
canonicalize_cfx normalization rules are cumulative — each release is a strict superset of the previous release's canonicalization.
Deprecations are announced one minor version before removal.

Not covered: underscore-prefixed symbols, parsers.* / renderers.* / ast.* internals, logging message wording, ConversionResult.document AST shape, warning message wording.

Note: 0.x.y versioning is looser than 1.x.y — minor version bumps may carry breaking changes as noted above.

Development

git clone https://github.com/eunsanMountain/cfxmark
cd cfxmark
uv sync --all-extras

# Run all tests
uv run pytest

# Type-check
uv run mypy src/

# Lint
uv run ruff check .

# Build
uv build

The corpus tests look for .cfx files in tests/corpus/ (gitignored to keep your own private samples out of version control). Drop your own Confluence storage XHTML there and they will be exercised by pytest tests/test_corpus.py.

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
docs		docs
src/cfxmark		src/cfxmark
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cfxmark

Why another converter?

Install

The contract

Usage

Round-trip a Confluence page through Markdown

Image assets

Mermaid diagrams

Inline opaque references

Block opaque blocks

Header notice

Custom macros

Canonicalization helpers

`canonicalize_cfx(xhtml)` — compare two storage fragments

`normalize_md(markdown)` — converge hand-edited Markdown

Jira wiki output

Confluence client (optional extra)

Jira wiki (experimental, lossy)

Contract

Lossy mapping table

Heading promotion on the output side

HTML comment passthrough (wrapper metadata)

Security

Stability contract

Development

License

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

cfxmark

Why another converter?

Install

The contract

Usage

Round-trip a Confluence page through Markdown

Image assets

Mermaid diagrams

Inline opaque references

Block opaque blocks

Header notice

Custom macros

Canonicalization helpers

canonicalize_cfx(xhtml) — compare two storage fragments

normalize_md(markdown) — converge hand-edited Markdown

Jira wiki output

Confluence client (optional extra)

Jira wiki (experimental, lossy)

Contract

Lossy mapping table

Heading promotion on the output side

HTML comment passthrough (wrapper metadata)

Security

Stability contract

Development

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`canonicalize_cfx(xhtml)` — compare two storage fragments

`normalize_md(markdown)` — converge hand-edited Markdown

Packages