Bidirectional Markdown ↔ Confluence Storage XHTML converter — with lossless opaque preservation for everything cfxmark doesn't explicitly know how to convert.
import cfxmark
# Markdown → Confluence storage XHTML
result = cfxmark.to_cfx(markdown_text)
result.xhtml # str — ready for Confluence REST PUT
result.attachments # tuple — local file refs the caller should upload
result.warnings # tuple — human-readable conversion warnings
# Confluence storage XHTML → Markdown
result = cfxmark.to_md(xhtml_text)
result.markdown # str — canonical markdown
result.warnings # tuple
# Markdown or Confluence XHTML → Jira wiki markup
result = cfxmark.to_jira_wiki(markdown_text)
result.jira_wiki # str | None — Jira wiki markup
# Jira wiki markup → Markdown (EXPERIMENTAL — v0.3+, enhanced v0.4)
from cfxmark.jira import from_jira_wiki
result = from_jira_wiki(jira_issue_description)
result.markdown # str
result.attachments # tuple — filenames referenced via [^file] / !file!
result.warnings # tuple: unsupported user mentions, …

ConversionResult is the same dataclass for all directions:
xhtml is populated for to_cfx, markdown for to_md /
from_jira_wiki, and jira_wiki for to_jira_wiki.
Two existing projects inspired this one — md2cf and
md2conf — but both are one-directional (md → cf) and
neither preserves unknown macros across a round trip. cfxmark fills
both gaps:
- Bidirectional. to_md(to_cfx(m)) is byte-identical to canonicalize(m) for every construct in the supported subset.
- Opaque preservation. Confluence content cfxmark doesn't understand (custom plugins, drawio diagrams, exotic table cells) round-trips byte-for-byte, including the ac:macro-id UUID. Confluence treats the round-tripped macro as the same instance, so comments, attachments, and permissions stay attached.
- Pure text-in / text-out. No Confluence API, no network, no attachment upload. The caller owns REST I/O. (See "Image assets" below for the helper function that lets the caller plug in network-bound logic without bloating cfxmark.)
cfxmark ships in two modes:
# Core: Markdown ↔ Confluence XHTML converter + Jira wiki renderer
pip install cfxmark
# With uv (recommended):
uv add cfxmark
# With the optional Confluence REST client (zero additional deps —
# the extra is namespace-only and reserves a stable upgrade slot):
pip install 'cfxmark[confluence]'

The confluence extra declares zero third-party runtime dependencies;
from cfxmark.confluence import ConfluenceClient works even without it.
The extra exists to signal intent in requirements files and to reserve a
stable upgrade slot for future convenience helpers.
cfxmark depends on lxml and mistletoe. Python 3.10+.
cfxmark grades every Confluence construct into one of three buckets:
| Grade | Description | Behaviour |
|---|---|---|
| I — Native | Standard CommonMark / GFM (headings, lists, tables, code fences, links, images, blockquote, hr, inline emphasis) | Lossless round-trip after canonicalization. |
| II — Directive | Confluence macros with a known Markdown directive mapping (info, note, warning, tip, jira, expand, toc) | Lossless after canonicalization. Pluggable via MacroRegistry. |
| III — Opaque | Everything else | Captured byte-for-byte through cfxmark's opaque-block / inline-opaque mechanism. Never dropped, never rewritten. |
See docs/SPEC.md
for the full mapping table and
docs/OPAQUE.md
for the opaque-block format.
import cfxmark
# Whatever fetched the page (REST API call, exported XML file, …)
xhtml = my_confluence_client.get_storage_format(page_id)
# Convert to Markdown
md_result = cfxmark.to_md(xhtml)
markdown = md_result.markdown
# … user edits the Markdown …
# Convert back to Confluence storage XHTML
cfx_result = cfxmark.to_cfx(markdown)
my_confluence_client.update_page(page_id, cfx_result.xhtml)
# Optionally upload any newly referenced local images
for filename in cfx_result.attachments:
    my_confluence_client.upload_attachment(page_id, filename)

When you convert a Confluence page that references uploaded attachments, the resulting Markdown looks like this:

<!-- cfxmark:asset src="image-3.png" -->

The image link still points at the original Confluence filename
(broken in any local Markdown viewer until you fetch the bytes), and
the <!-- cfxmark:asset --> HTML comment carries enough metadata for
a follow-up step to fetch and embed.
cfxmark.resolve_assets is that follow-up step. You provide a
fetcher callback that returns bytes for one filename at a time, and
choose between two output strategies:
import cfxmark
from pathlib import Path
def fetcher(filename: str) -> bytes:
    # Whatever you use to download from Confluence:
    return my_confluence_client.download_attachment(page_id, filename)
# Strategy A — sidecar directory (recommended for git-tracked docs).
# Saves bytes to ./assets/ and rewrites links to relative paths.
md = cfxmark.resolve_assets(
    md_result.markdown,
    fetcher,
    mode="sidecar",
    asset_dir="docs/page-42/assets",
    md_path="docs/page-42.md",
)
Path("docs/page-42.md").write_text(md)
# docs/page-42/assets/image-3.png exists
# md link: <!-- cfxmark:asset src="image-3.png" -->
# Strategy B — inline data URIs (single self-contained file).
md = cfxmark.resolve_assets(md_result.markdown, fetcher, mode="inline")
# md link: <!-- cfxmark:asset src="image-3.png" -->

The asset markers are preserved through both strategies, so
resolve_assets is idempotent and a subsequent to_cfx call always
recovers the original Confluence filename — even if the visible link
target has been rewritten to a sidecar path or a data URI.
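
As a rough illustration of what the inline strategy implies, the sketch below turns fetched bytes into a data URI using only the standard library. The helper name `to_data_uri` and the fallback MIME type are assumptions made for this example; it is not cfxmark's implementation of resolve_assets.

```python
import base64
import mimetypes


def to_data_uri(filename: str, data: bytes) -> str:
    """Encode attachment bytes as a data URI (hypothetical helper, not cfxmark API)."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"  # assumed fallback for unknown extensions
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# The 8-byte PNG signature stands in for real attachment bytes.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri("image-3.png", png_signature)
print(uri)
```

A data URI keeps the Markdown file self-contained at the cost of size, which is why the sidecar strategy tends to suit git-tracked documents better.
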
cfxmark maps Markdown's ```mermaid fenced code block to
Confluence's code macro with language=mermaid. If your Confluence
instance has a Mermaid plugin installed (e.g. Mermaid Diagrams for
Confluence) it will render the diagram automatically; otherwise the
content is shown as a syntax-highlighted code block.
```mermaid
graph LR
A --> B --> C
```

Inline elements that have no native Markdown form (Confluence user
mentions, inline Jira issue macros, custom widget invocations, …)
become a short Markdown link with a cfx:op-... URL:

Contact the purchaser ([@user-2c9402cc](cfx:op-4fab0f8d))

The [label] is auto-derived from the underlying element type
(@user-…, jira:PROJ-1, cfx:status, …) and the op-XXXXXXXX ID
is a SHA-256 prefix of the original XML payload. The full XML lives
in a cfxmark:payloads sidecar at the bottom of the same Markdown
file:
<!-- cfxmark:payloads -->
<!-- op-4fab0f8d
<ac:link><ri:user ri:userkey="2c9402cc83d4bcc40183d976ef730001"/></ac:link>
-->
<!-- /cfxmark:payloads -->

Because of the SHA-256 fingerprint, a link that a user happens to type with that exact syntax in their own Markdown is not silently reinterpreted as an opaque payload: verification fails and the region falls back to ordinary text.
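
The fingerprint check can be pictured with a small standard-library sketch. It assumes the ID is the first 8 hex characters of the SHA-256 digest of the UTF-8 payload (inferred from the op-XXXXXXXX shape); the exact bytes cfxmark hashes are not stated here, so treat `opaque_id` and `verify_payload` as hypothetical names for illustration only.

```python
import hashlib


def opaque_id(payload: str, prefix_len: int = 8) -> str:
    """Derive an op-XXXXXXXX style ID from a payload (assumed scheme)."""
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"op-{digest[:prefix_len]}"


def verify_payload(link_id: str, payload: str) -> bool:
    """Return True only when the ID in the link matches the payload's hash prefix."""
    return link_id == opaque_id(payload)


payload = '<ac:link><ri:user ri:userkey="example-key"/></ac:link>'
link_id = opaque_id(payload)

print(verify_payload(link_id, payload))            # untouched payload
print(verify_payload(link_id, payload + "<!-- edited -->"))  # tampered payload
```

Under this scheme any edit to the payload, or a hand-typed link with a made-up ID, fails verification and would be treated as ordinary text.
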
Block-level Confluence content cfxmark doesn't know how to convert (e.g. drawio diagrams, plantuml, complex tables) is wrapped in a fenced code block with sentinel comments:
<!-- cfxmark:opaque id="op-1188e2b4" -->
```cfx-storage
<ac:structured-macro ac:name="drawio" ac:macro-id="...">
<ac:parameter ac:name="diagramName">flow</ac:parameter>
...
</ac:structured-macro>
```
<!-- /cfxmark:opaque -->

Editors render this as a clearly visible code block, a "do not
touch" signal for human readers. The Markdown parser detects the
sentinels first and round-trips the contents byte-for-byte, including
the original ac:macro-id UUID that Confluence uses to identify
macro instances.
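
For tooling that only needs to locate these regions (for example, to warn before a hand edit touches one), a regex pass along the following lines may be enough. This is a sketch based on the sentinel layout shown above, not cfxmark's parser: it performs no SHA-256 verification, and `find_opaque_blocks` is a hypothetical name.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown

OPAQUE_RE = re.compile(
    r'<!-- cfxmark:opaque id="(?P<id>op-[0-9a-f]+)" -->\n'
    + FENCE + r"cfx-storage\n(?P<body>.*?)\n" + FENCE + r"\n"
    + r"<!-- /cfxmark:opaque -->",
    re.DOTALL,
)


def find_opaque_blocks(markdown: str) -> dict[str, str]:
    """Map each opaque block ID to the raw storage XML between its sentinels."""
    return {m["id"]: m["body"] for m in OPAQUE_RE.finditer(markdown)}


sample = "\n".join([
    "Intro paragraph.",
    '<!-- cfxmark:opaque id="op-1188e2b4" -->',
    FENCE + "cfx-storage",
    '<ac:structured-macro ac:name="drawio">',
    "</ac:structured-macro>",
    FENCE,
    "<!-- /cfxmark:opaque -->",
])
blocks = find_opaque_blocks(sample)
print(sorted(blocks))
```
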
When a converted Markdown document contains any opaque or directive markers, cfxmark prepends a single-line HTML comment explaining the conventions to humans and AI agents:
<!-- cfxmark:notice Converted from Confluence storage format. Inline
[label](cfx:op-XXXXXXXX) references preserve Confluence content that
has no native Markdown form; the raw XML for each lives in the
cfxmark:payloads sidecar at the bottom of this file. Do not edit
those references or the sidecar — tampering invalidates a SHA-256
fingerprint and the round trip falls back to plain text. -->

The comment is invisible in any Markdown viewer.
Promote a Confluence macro from "opaque" to "directive" by registering a custom handler:
import cfxmark
from cfxmark.macros import MacroRegistry
from cfxmark.macros.builtins import AdmonitionHandler
# Start from the default registry and add your own.
my_registry = cfxmark.default_registry.copy()
# Built-in AdmonitionHandler accepts one of: "info", "note", "warning", "tip".
# To promote a previously-opaque macro, write a small MacroHandler subclass —
# see cfxmark/macros/builtins/admonition.py for a complete example.
my_registry.register(AdmonitionHandler("warning"))
result = cfxmark.to_md(xhtml, macros=my_registry)

Implementing a MacroHandler from scratch requires a small amount
of lxml knowledge; see cfxmark/macros/builtins/admonition.py for
a complete example. A higher-level handler API that hides lxml is
planned for v0.3.
cfxmark ships two canonicalization helpers, one for each side of the
pipeline. Both are idempotent: f(f(x)) == f(x).
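
The idempotence property is easy to assert in a test. The sketch below uses a toy normalizer as a stand-in so that it runs without cfxmark installed; in a real suite one would presumably pass canonicalize_cfx or normalize_md instead. `is_idempotent` and `strip_trailing_whitespace` are hypothetical helpers.

```python
from typing import Callable


def is_idempotent(f: Callable[[str], str], sample: str) -> bool:
    """Check f(f(x)) == f(x) for one sample input."""
    once = f(sample)
    return f(once) == once


def strip_trailing_whitespace(text: str) -> str:
    """Toy normalizer standing in for a real canonicalization helper."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


print(is_idempotent(strip_trailing_whitespace, "title  \nbody\t\n"))
print(is_idempotent(lambda s: s + "!", "x"))  # a function that keeps changing its input
```
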
Two Confluence storage fragments are "the same" only after a deep
normalization pass that strips volatile attributes, editor noise,
and rendering hints. Use canonicalize_cfx to compare two snapshots:
import cfxmark
c1 = cfxmark.canonicalize_cfx(original_xhtml)
c2 = cfxmark.canonicalize_cfx(round_tripped_xhtml)
assert c1 == c2 # passes for any document in the supported subset

canonicalize_cfx is the same function the test suite uses to
verify byte-identical round trips against real Confluence pages.
A good push pipeline calls it before the REST PUT so an unchanged
body is skipped entirely:
remote = cfxmark.canonicalize_cfx(my_client.get_page(page_id))
local = cfxmark.canonicalize_cfx(cfxmark.to_cfx(local_md).xhtml)
if remote != local:
    my_client.update_page(page_id, ...)

normalize_md is the Markdown-side counterpart: it runs the document
through parse_md → render_md so the output is exactly the form
cfxmark would have produced. Applying it before push flattens any
drift introduced by hand edits, a different editor's Markdown
autoformatter, or a historical cfxmark version.
import cfxmark
# Pre-push recipe: normalize hand-edited Markdown so the canonical
# XHTML body is stable across authors and editor plugins.
clean_md = cfxmark.normalize_md(local_md_from_disk)
xhtml = cfxmark.to_cfx(clean_md).xhtml

The key property: a document produced by to_md is already a fixed
point of normalize_md, so round-trippers pay nothing. Hand-edited
documents converge in a single pass, and that pass is enough to
eliminate the "local file drifted from the round-trip form" class of
bug (for example, stray ** delimiters in positions where cfxmark
would have emitted raw <strong> HTML because of CommonMark's CJK
word-boundary rule).
If you only push normalize_md(text) rather than raw hand-edits,
the canonicalize_cfx diff above stays stable across collaborators.
to_jira_wiki converts Markdown or Confluence storage XHTML to Jira
wiki markup. It accepts the same source formats as to_cfx / to_md
and auto-detects which format it received.
import cfxmark
result = cfxmark.to_jira_wiki(markdown_text)
print(result.jira_wiki) # h2. Heading\n\n*bold* text …

Two optional parameters cover common push-pipeline patterns:
import re
# Only render the body of the first H2 section titled "Summary".
result = cfxmark.to_jira_wiki(markdown_text, section="Summary")
# Drop a leading cfxmark:notice comment before rendering
# (useful when pushing a round-tripped Confluence page to Jira).
result = cfxmark.to_jira_wiki(
    markdown_text,
    drop_leading_notice=(re.compile(r"cfxmark:notice"),),
)

Code block language identifiers can be normalised for Jira Server compatibility:

result = cfxmark.to_jira_wiki(
    markdown_text,
    code_language_map={"ts": "javascript", "kotlin": "java"},
)

result.jira_wiki is None when section= is specified but not
found in the document.
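
To make the section= behaviour concrete, here is one plausible reading of "the body of the first H2 section titled X", written against plain Markdown with the standard library. Where the section ends (at the next H1 or H2) is an assumption of this sketch, headings inside fenced code are not handled, and `extract_h2_section` is a hypothetical helper rather than cfxmark's implementation. It returns None for a missing section, mirroring the result.jira_wiki contract.

```python
import re
from typing import Optional


def extract_h2_section(markdown: str, title: str) -> Optional[str]:
    """Return the body under the first '## title' heading, or None if absent."""
    lines = markdown.splitlines()
    start = None
    for i, line in enumerate(lines):
        if start is None:
            if re.fullmatch(rf"##\s+{re.escape(title)}\s*", line):
                start = i + 1
        elif re.match(r"#{1,2}\s", line):
            # Assumed boundary: the next H1 or H2 closes the section.
            return "\n".join(lines[start:i]).strip()
    if start is None:
        return None
    return "\n".join(lines[start:]).strip()


doc = "# T\n\n## Summary\n\nBody text.\n\n### Detail\n\nMore.\n\n## Other\n\nx\n"
summary = extract_h2_section(doc, "Summary")
missing = extract_h2_section(doc, "Missing")
print(summary)
print(missing)
```

Callers of to_jira_wiki with section= would likewise want to check for None before pushing anything to Jira.
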
Install with the confluence extra to signal intent — the client
itself is always importable because it is built on Python's standard
library:
pip install 'cfxmark[confluence]'

The extra declares zero additional runtime dependencies. It exists to:
- Signal the dependency in your requirements.txt / pyproject.toml so readers see that you rely on the optional subsystem.
- Reserve a stable upgrade slot: if future convenience helpers (credential stores, rich CLI) gain third-party deps, the extra is the place they'll land.
from cfxmark.confluence import ConfluenceClient, BearerTokenFile
client = ConfluenceClient(
    host="https://confluence.example.com",
    auth=BearerTokenFile("~/.secrets/confluence_pat"),
    dialect="server",
)
# Canonical-aware push: skips the REST PUT entirely when the remote
# body is byte-equivalent to the rendered local Markdown.
result = client.push_markdown(
    page_id="12345",
    md_text=my_markdown,
    md_path="docs/my_page.md",
    on_conflict="abort",
)
if result.changed:
    print(f"Pushed. Uploaded {len(result.uploaded_attachments)} new attachments.")
    if result.has_partial_failure:
        for name, ex in result.failed_attachments:
            print(f" ! attachment {name} failed: {ex!r}")
else:
    print("No-op; remote is already current.")
# Canonical-aware pull with resolved assets in a sidecar directory.
pull = client.pull_markdown(
    page_id="12345",
    md_path="docs/my_page.md",
    resolve_assets_mode="sidecar",
    asset_dir="docs/my_page-assets",
)

Logging. The client uses logging.getLogger("cfxmark.confluence")
exclusively — no direct writes to sys.stdout or sys.stderr.
Enable progress output with:
import logging
logging.basicConfig()  # attach a root handler; without one, INFO records are not emitted
logging.getLogger("cfxmark").setLevel(logging.INFO)

Confluence dialect. The default is dialect="server" because
Confluence Server / Data Center is the reference test target.
Confluence Cloud users should pass dialect="cloud" — the
X-Atlassian-Token: no-check XSRF bypass header (mandatory on
Server, unsupported on Cloud) is gated on this setting. Cloud support
is best-effort; if you hit a Cloud-only regression, please open an
issue.
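
The dialect gate described above can be sketched as a small header builder. The function name, the bearer-token Authorization header, and the Content-Type value are assumptions for illustration; only the X-Atlassian-Token: no-check rule comes from the text.

```python
def build_write_headers(dialect: str, token: str) -> dict[str, str]:
    """Build headers for a write request, gating the XSRF bypass on the dialect."""
    if dialect not in ("server", "cloud"):
        raise ValueError(f"unknown dialect: {dialect!r}")
    headers = {
        "Authorization": f"Bearer {token}",    # assumed auth scheme for this sketch
        "Content-Type": "application/json",
    }
    if dialect == "server":
        # Mandatory on Server / Data Center, unsupported on Cloud.
        headers["X-Atlassian-Token"] = "no-check"
    return headers


server_headers = build_write_headers("server", "example-token")
cloud_headers = build_write_headers("cloud", "example-token")
print(sorted(server_headers))
print(sorted(cloud_headers))
```
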
The Confluence round-trip (to_cfx / to_md) is lossless after
canonicalization — every construct in the supported subset
round-trips byte-for-byte, and everything else is preserved through
the opaque-block mechanism. The Jira wiki direction is not. Jira
wiki markup is a looser dialect without opaque-macro identity
preservation, and several constructs have no equivalent on the
Markdown side.
Keep the two contracts at different import sites so the asymmetry is visible:
from cfxmark import to_cfx, to_md # lossless, Confluence side
from cfxmark.jira import from_jira_wiki, to_jira_wiki # experimental

The strongest guarantee the Jira pipeline offers is:

from_jira_wiki(jira_wiki) produces a ConversionResult whose Markdown representation reaches a fixed point after at most two wiki → md → wiki iterations. Pass 1 (wiki → md) is a one-way canonicalization; pass 2 (md → wiki → md) must be idempotent. The real-world fixture corpus (6 Jira issue descriptions drawn from production) is exercised in tests/unit/test_jira_wiki_parser.py to pin this contract.
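
A fixed-point contract of this kind can be checked with a generic helper that counts how many passes still change the text. The sketch below uses a toy whitespace normalizer so that it runs standalone; in practice the step would presumably wrap a to_jira_wiki / from_jira_wiki round trip. `passes_until_stable` and `collapse_whitespace` are hypothetical names.

```python
from typing import Callable


def passes_until_stable(step: Callable[[str], str], text: str, limit: int = 5) -> int:
    """Count how many applications of `step` change the text before it stabilises."""
    current = text
    for changed_passes in range(limit + 1):
        following = step(current)
        if following == current:
            return changed_passes
        current = following
    raise RuntimeError(f"no fixed point within {limit} passes")


def collapse_whitespace(text: str) -> str:
    """Toy one-way canonicalization standing in for a real round trip."""
    return " ".join(text.split())


print(passes_until_stable(collapse_whitespace, "a   b"))  # needs one canonicalizing pass
print(passes_until_stable(collapse_whitespace, "a b"))    # already a fixed point
```

A fixture test could then assert that the returned count stays within the bound the contract promises.
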
Explicitly allowed canonicalization (not considered a diff):
- Heading spacing, list indent, trailing whitespace normalisation
- Soft-break inside a paragraph collapsed into a single line
- Jira `{panel}` macro mapped to `{note}` admonition
- `_italic_` canonicalised to Markdown `*italic*`
- `-strike-` canonicalised to Markdown `~~strike~~`
- `~sub~` → `<sub>`, `^sup^` → `<sup>`, `+ins+` → `<ins>` (v0.4+)
- `{color:#hex}text{color}` → `<span style="color:#hex">text</span>` (v0.4+)
- `??text??` → `<cite>text</cite>` (v0.4+)
Explicitly forbidden (would break the contract):
- Content loss inside headings, paragraphs, list items, code blocks
- Re-ordering of top-level blocks
- URL rewriting in links
- Renaming attachments (filenames in `[^file]` / `!file!` are preserved verbatim in ConversionResult.attachments)
| Jira wiki | Markdown | Note |
|---|---|---|
| `h1.` … `h6.` | `#` … `######` | 1:1 identity (no promotion on parse) |
| `*bold*` | `**bold**` | boundary-aware |
| `_italic_` | `*italic*` | |
| `-strike-` | `~~strike~~` | boundary-aware |
| `{{mono}}` | `` `mono` `` | |
| `~sub~` | `<sub>sub</sub>` | Subscript node (v0.4+) |
| `^sup^` | `<sup>sup</sup>` | Superscript node (v0.4+) |
| `+ins+` | `<ins>ins</ins>` | Underline node (v0.4+) |
| `{color:#hex}x{color}` | `<span style="color:#hex">x</span>` | ColorSpan node (v0.4+) |
| `??text??` | `<cite>text</cite>` | Citation node (v0.4+) |
| `[url]` | `[url](url)` (empty label form) | |
| `[label\|url]` | `[label](url)` | label may contain nested `[...]` |
| `[^file.png]` | `![file.png](file.png)` | when extension is image-like |
| `[^file.msg]` | `[file.msg](attachment:file.msg)` | otherwise |
| `[~user]` | (dropped) | warning recorded |
| `{code:python}body{code}` | `` ```python\nbody\n``` `` | |
| `{noformat}body{noformat}` | `` ```\nbody\n``` `` | |
| `{quote}body{quote}`, `bq.` | `> body` | |
| `{info}` / `{note}` / `{warning}` / `{tip}` | `> [!INFO]` callout | GitHub / Obsidian style |
| `{panel:title=X}body{panel}` | `> [!NOTE] X` + warning | D4 mapping |
| `h3. *Title*` | `### **Title**` | literal nested bold preserved |
| Multi-line table cell | GFM cell with `<br>` soft break | |
| `\|\|h1\|\|h2\|\|` header row | GFM header row | |
| `----` | `---` | |

Warnings accumulate on ConversionResult.warnings; fixture tests in
tests/unit/test_jira_wiki_parser.py pin the behaviour of every
entry in this table.
to_jira_wiki has a heading_promotion keyword that controls the
heading level mapping:
- "confluence" (default): Markdown H3 collapses to Jira h2 (and H4 → h3, …). Use when the Jira wiki output will be pushed to a Confluence page whose title already occupies the top slot.
- "jira": identity mapping. Use when the output is pushed to a Jira issue description, because the issue title lives in a separate field and the body can start at h1.
- "none": alias for "jira".

from cfxmark import to_jira_wiki
# Confluence push — historical default
to_jira_wiki(md)
# Jira issue description push
to_jira_wiki(md, heading_promotion="jira")

If your wrapper embeds caller-owned metadata as HTML comments in the local Markdown file, opt in via ConversionOptions.passthrough_html_comment_prefixes. A workflow manifest is a typical example:

# My feature
<!-- workflow:meta
key: TASK-42
type: Story
last_synced_version: 15
-->
body...

Matching comments are preserved verbatim across parse_md /
render_md, and silently dropped by to_cfx / to_jira_wiki so
they never leak to Confluence or Jira:
from cfxmark import ConversionOptions, to_cfx, to_md
from cfxmark.normalize import strip_passthrough_comments
opts = ConversionOptions(
    passthrough_html_comment_prefixes=("workflow:",)
)
# Push: comment is dropped on the way to Confluence
result = to_cfx(local_md, options=opts)
# Canonical compare: strip comments on both sides before diffing
left = strip_passthrough_comments(local_md, ("workflow:",))
right = strip_passthrough_comments(pulled_md, ("workflow:",))
assert left == right

Prefixes beginning with cfxmark: are filtered out so cfxmark's own sentinel
comments (cfxmark:opaque, cfxmark:notice) cannot be hijacked.
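
The stripping rule, including the guard on cfxmark: prefixes, can be approximated with a short regex pass. This is an illustrative sketch and not the library's strip_passthrough_comments; the name `strip_prefixed_comments` and the choice to consume one trailing newline are assumptions.

```python
import re

COMMENT_RE = re.compile(r"<!--\s*(?P<body>.*?)-->\n?", re.DOTALL)


def strip_prefixed_comments(markdown: str, prefixes: tuple[str, ...]) -> str:
    """Remove HTML comments whose body starts with one of the caller's prefixes."""
    # Reserved prefixes are ignored so sentinel comments can never be stripped.
    allowed = tuple(p for p in prefixes if not p.startswith("cfxmark:"))

    def replace(match: re.Match) -> str:
        return "" if match["body"].startswith(allowed) else match[0]

    return COMMENT_RE.sub(replace, markdown)


doc = (
    "# My feature\n"
    "<!-- workflow:meta\nkey: TASK-42\n-->\n"
    "body\n"
    "<!-- cfxmark:notice keep -->\n"
)
stripped = strip_prefixed_comments(doc, ("workflow:", "cfxmark:"))
print(stripped)
```
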
cfxmark hardens its XML parser against XXE and billion-laughs attacks:
- Inputs containing `<!DOCTYPE>` or `<!ENTITY>` declarations are rejected before lxml ever sees them.
- The lxml parser is configured with no_network=True, load_dtd=False, and huge_tree=False.
- Opaque-block sentinels are SHA-256 verified, so accidental sentinel syntax in user-typed Markdown does not become a real opaque block.
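
The first of these defences is simple enough to sketch with the standard library. The function and exception names below are hypothetical, and the regex is a conservative, case-insensitive guess at what "containing a declaration" means; cfxmark's actual check may differ.

```python
import re

_FORBIDDEN_DECL = re.compile(r"<!\s*(?:DOCTYPE|ENTITY)\b", re.IGNORECASE)


class UnsafeXmlError(ValueError):
    """Raised when input carries a DTD or entity declaration (hypothetical type)."""


def reject_dtd(xml_text: str) -> str:
    """Return the text unchanged, or raise before any XML parser sees it."""
    if _FORBIDDEN_DECL.search(xml_text):
        raise UnsafeXmlError("DOCTYPE / ENTITY declarations are not accepted")
    return xml_text


# A parser configured as described above would then be created with, for example:
#   lxml.etree.XMLParser(no_network=True, load_dtd=False, huge_tree=False)

safe = reject_dtd("<p>ok</p>")
try:
    reject_dtd('<!DOCTYPE x [<!ENTITY a "b">]><x>&a;</x>')
    rejected = False
except UnsafeXmlError:
    rejected = True
print(safe, rejected)
```

Rejecting the declarations up front means an entity-expansion payload never reaches the parser, independent of how the parser itself is configured.
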
If you find a security issue, please open a GitHub issue.
The following names are covered by semantic versioning and will not be removed or incompatibly changed without a major version bump:
cfxmark package — to_cfx, to_md, to_jira_wiki,
from_jira_wiki, canonicalize_cfx, normalize_md,
strip_passthrough_comments, resolve_assets, ConversionResult,
ConversionOptions, DEFAULT_OPTIONS, AssetFetcher, ResolveMode,
CfxmarkError, ConversionError, MacroError, ParseError,
AssetSecurityError, MacroRegistry, default_registry.
cfxmark.jira — to_jira_wiki, from_jira_wiki.
Experimental in v0.3: the Jira parser contract is "converges
after at most two wiki → md → wiki iterations", not
byte-identical. The top-level to_jira_wiki, from_jira_wiki, and
ConversionResult.jira_wiki field ARE stable — only the quality
of the round-trip itself is experimental. See the "Jira wiki
(experimental, lossy)" section above for the full contract.
cfxmark.confluence — ConfluenceClient, PushResult,
PullResult, Auth, BearerToken, BearerTokenFile, BasicAuth,
EnvBearerToken, HTTPError, ConfluenceVersionConflict.
Guarantees:
- Breaking changes bump the minor version for 0.x.y releases.
- canonicalize_cfx normalization rules are cumulative: each release is a strict superset of the previous release's canonicalization.
- Deprecations are announced one minor version before removal.
Not covered: underscore-prefixed symbols, parsers.* / renderers.* /
ast.* internals, logging message wording, ConversionResult.document
AST shape, warning message wording.
Note: 0.x.y versioning is looser than 1.x.y — minor version bumps may carry breaking changes as noted above.
git clone https://github.com/eunsanMountain/cfxmark
cd cfxmark
uv sync --all-extras
# Run all tests
uv run pytest
# Type-check
uv run mypy src/
# Lint
uv run ruff check .
# Build
uv build

The corpus tests look for .cfx files in tests/corpus/ (gitignored
to keep your own private samples out of version control). Drop your
own Confluence storage XHTML there and they will be exercised by
pytest tests/test_corpus.py.
MIT. See LICENSE.