xmldiffreport

📖 Documentation: https://xmldiffreport.bilouro.com/ · English
📖 Documentation: https://xmldiffreport.bilouro.com/pt/ · Português

N-way structural & semantic XML diff that produces human-readable Markdown reports — driven by per-dialect recipes.

xmldiffreport compares two or more XML files at once (BMC Control-M exports, Maven POMs, JUnit/xUnit reports, sitemaps, or any dialect you teach it with a small recipe) and tells you what actually changed, element by element and attribute by attribute, instead of producing a noisy line-by-line text diff. It aligns elements by a natural key (not by position), ignores volatile attributes, and renders a clean Markdown report with a summary table plus per-element detail.

It was born from a real problem — spotting differences between BMC Control-M job patches flowing through test → uat → bench → prod — and generalized into a recipe-driven engine that works on any XML dialect (Control-M exports, sitemaps, POMs, manifests, …).

Status: early (0.1.0), but already useful. Feedback and recipes welcome.

Why not a normal diff / `xmldiff`?

A plain diff (or git diff) on XML lies, for three reasons:

Volatile attributes — VERSION, CREATION_TIME, JOBISN… change on every export with no functional meaning.
Reordering — children are often unordered; a reorder is not a change.
Attribute order inside a tag is irrelevant.
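The three sources of noise listed above can be reproduced with the standard library alone. The sketch below is not xmldiffreport's implementation: the helper `normalize`, the `VOLATILE` set and the sample XML are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of attributes treated as noise (names taken from the list above).
VOLATILE = {"VERSION", "CREATION_TIME", "JOBISN"}

def normalize(folder_xml: str, key_attr: str = "JOBNAME") -> dict[str, dict[str, str]]:
    """Map each child's natural key to its non-volatile attributes."""
    root = ET.fromstring(folder_xml)
    return {
        child.attrib[key_attr]: {
            name: value
            for name, value in child.attrib.items()
            if name not in VOLATILE
        }
        for child in root
    }

old = """<FOLDER>
  <JOB JOBNAME="A" MAXRERUN="0" VERSION="1"/>
  <JOB JOBNAME="B" MAXRERUN="3" VERSION="1"/>
</FOLDER>"""

# Same jobs, but reordered, with attributes in another order and a new VERSION.
new = """<FOLDER>
  <JOB VERSION="2" MAXRERUN="3" JOBNAME="B"/>
  <JOB VERSION="2" JOBNAME="A" MAXRERUN="0"/>
</FOLDER>"""

text_differs = old != new                            # a text comparison sees changes
semantic_differs = normalize(old) != normalize(new)  # a keyed comparison does not
print(text_differs, semantic_differs)
```

Because dictionaries compare by content rather than by insertion order, child order and attribute order drop out of the comparison without extra work.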

Text/edit-script diffs (like the excellent xmldiff) solve part of this but are 2-way, algorithm-matched (you can't say "match <JOB> by JOBNAME"), and output an edit script rather than a review-friendly report.

	xmldiffreport	xmldiff	DiffDog / Oxygen	DeltaXML
Match by declared natural key	✅	❌	⚠️ limited	✅
N-way (3+ files at once)	✅	❌	❌	❌
Markdown report out of the box	✅	❌ (edit script)	⚠️ GUI	❌ (delta XML)
Open source	✅	✅	❌	❌

When to use which — choose xmldiffreport for N-way, key-aligned, report-first comparison (e.g. "the same folder in uat, bench and prod"); reach for xmldiff to produce a patch/edit script, DiffDog/Oxygen for interactive 2-way merging, DeltaXML for heuristic matching of keyless documents, and git diff for raw line changes on already-normalized XML. Full breakdown: How it compares.

Install

pip install xmldiffreport

Requires Python 3.11+ (uses the standard-library tomllib). No third-party dependencies.

Quickstart

Compare two XML files — that's the core idea:

xmldiffreport old.xml new.xml -o report.md

report.md lists every element that changed, with one column per file. No options are needed: the generic recipe is used by default. Pass as many files as you like; the report gains one column per file:

xmldiffreport v1.xml v2.xml v3.xml -o report.md

Prefer an HTML page? Add -f html (or name the output *.html):

xmldiffreport old.xml new.xml -f html -o report.html

Exit code is 1 when a difference is found (handy for CI), 0 otherwise.
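A CI step can branch on that exit status. The following is a minimal sketch, assuming only what is stated above (1 means differences, 0 means none); the helper `has_differences` is hypothetical, and treating any other status as a failure is an assumption, since error codes are not described here. Stand-in commands are used so the sketch runs without the tool installed.

```python
import subprocess
import sys

def has_differences(command: list[str]) -> bool:
    """Run a diff command and interpret its exit status.

    Assumes 1 means "differences found" and 0 means "none".
    Any other status is treated as a failure of the command itself.
    """
    status = subprocess.run(command, check=False).returncode
    if status == 0:
        return False
    if status == 1:
        return True
    raise RuntimeError(f"command failed with exit status {status}")

# Stand-ins for the real call; in CI the list would be something like
# ["xmldiffreport", "old.xml", "new.xml", "-o", "report.md"].
same = [sys.executable, "-c", "raise SystemExit(0)"]
changed = [sys.executable, "-c", "raise SystemExit(1)"]

print(has_differences(same), has_differences(changed))
```

Keeping "differences found" separate from "the command broke" avoids a CI job that silently passes or fails for the wrong reason.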

No files handy? git clone the repo and try the bundled, synthetic examples/:

xmldiffreport examples/sitemap/old/sitemap.xml examples/sitemap/new/sitemap.xml --recipe sitemap

Sharper results: recipes

The default compares any XML, but a recipe teaches the tool how to identify elements in a specific dialect — matching "the same" element by a key (not by position) and ignoring volatile attributes. Built-ins: controlm, maven-pom, junit, sitemap, generic; or write your own.

xmldiffreport old.xml new.xml --recipe sitemap -o report.md

→ Writing recipes · generate one from your XML with an LLM.

Comparing many files (or whole directories)

Point it at directories too — they're scanned recursively for *.xml, and every file found becomes a source:

xmldiffreport ./dump-a ./dump-b --recipe controlm -o report.md
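The recursive scan described above can be approximated with `pathlib`. This is a sketch under assumptions, not the tool's code: the helper `collect_sources` is hypothetical, and sorting and de-duplicating the results is a choice made here for determinism.

```python
from pathlib import Path
import tempfile

def collect_sources(inputs: list[str]) -> list[Path]:
    """Expand files and directories into a sorted list of XML files."""
    found: set[Path] = set()
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Directories are searched recursively for *.xml files.
            found.update(p for p in path.rglob("*.xml") if p.is_file())
        elif path.is_file():
            found.add(path)
    return sorted(found)

# Build a throwaway tree to show the behaviour.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dump-a" / "nested").mkdir(parents=True)
    (root / "dump-a" / "one.xml").write_text("<a/>")
    (root / "dump-a" / "nested" / "two.xml").write_text("<a/>")
    (root / "dump-a" / "notes.txt").write_text("ignored")
    names = [p.relative_to(root).as_posix() for p in collect_sources([str(root / "dump-a")])]

print(names)
```

Files and directories can be mixed in the same input list; non-XML files inside a directory are skipped.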

Mental model: every file is a source (labelled by its path); a unit is the recipe's unit element (e.g. a Control-M SMART_FOLDER); the engine compares each unit across all the sources that contain it, provided at least two do. A unit that appears in only one file is ignored. The tool has no notion of "environments": if it matters which file is production, make that visible in the file's path.
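The source/unit model can be sketched as a grouping step. The helper `units_by_source`, the use of a `NAME` attribute as the unit key and the sample documents are all assumptions for illustration; the real engine takes the key from the recipe.

```python
import xml.etree.ElementTree as ET

def units_by_source(sources: dict[str, str], unit_tag: str, key_attr: str):
    """Group unit elements by key, keeping only units present in 2+ sources."""
    grouped: dict[str, dict[str, ET.Element]] = {}
    for label, xml_text in sources.items():
        for unit in ET.fromstring(xml_text).iter(unit_tag):
            grouped.setdefault(unit.attrib[key_attr], {})[label] = unit
    # A unit seen in a single source has nothing to be compared against.
    return {key: found for key, found in grouped.items() if len(found) >= 2}

# Source labels are paths; contents are inlined here to keep the sketch runnable.
sources = {
    "uat/a.xml": '<DEFTABLE><SMART_FOLDER NAME="F1"/><SMART_FOLDER NAME="F2"/></DEFTABLE>',
    "bench/b.xml": '<DEFTABLE><SMART_FOLDER NAME="F1"/></DEFTABLE>',
    "prod/c.xml": '<DEFTABLE><SMART_FOLDER NAME="F1"/><SMART_FOLDER NAME="F3"/></DEFTABLE>',
}

comparable = units_by_source(sources, unit_tag="SMART_FOLDER", key_attr="NAME")
for key, found in comparable.items():
    print(key, sorted(found))
```

Here only `F1` survives, because `F2` and `F3` each appear in a single source.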

→ Full, worked guide with directory trees and a complete example: Inputs & file layout.

What the report looks like

For each unit (e.g. a Control-M SMART_FOLDER) present in 2+ sources with differences (names below are from the synthetic examples/):

GLX_INGEST_DAILY (SMART_FOLDER)

Sources: bench/patch-a.xml, uat/patch-b.xml, prod/hotfix-c.xml

**~ JOB GLX_INGEST_LOAD**

Element · attribute	bench/patch-a.xml	uat/patch-b.xml	prod/hotfix-c.xml
`CMDLINE`	…`--force`	…`--retry`	…%%P_DATE
`MAXRERUN`	0	5	3
INCOND `GLX_INGEST_STAGE-…_OK` · `AND_OR`	A	O	A
OUTCOND `GLX_INGEST_LOAD-…_OK` · `SIGN`	-	+	+
ON `NOTOK|RERUN`	−	present	present

Notice: it's N-way (one column per file), it shows attribute-level changes of the same element (the SIGN flip, the AND_OR change), it collapses identical jobs into a count, and the volatile VERSION/CREATION_TIME noise is gone.
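To make the "one column per file" idea concrete, here is a small sketch that renders differing attributes as a Markdown table. The function `render_rows`, its input shape and the `RUN_AS` attribute are hypothetical; the real report layout is produced by the tool and may differ.

```python
def render_rows(sources: list[str], values: dict[str, dict[str, str]]) -> str:
    """Render attributes whose values differ across sources as a Markdown table.

    `values` maps attribute name -> {source label: value}; a missing
    source means the attribute is absent there.
    """
    header = "| Element · attribute | " + " | ".join(sources) + " |"
    divider = "|" + " --- |" * (len(sources) + 1)
    lines = [header, divider]
    for attr, per_source in values.items():
        cells = [per_source.get(src, "−") for src in sources]
        if len(set(cells)) > 1:  # keep only rows that actually differ
            lines.append(f"| `{attr}` | " + " | ".join(cells) + " |")
    return "\n".join(lines)

sources = ["bench/patch-a.xml", "uat/patch-b.xml", "prod/hotfix-c.xml"]
values = {
    "MAXRERUN": {"bench/patch-a.xml": "0", "uat/patch-b.xml": "5", "prod/hotfix-c.xml": "3"},
    "RUN_AS": {src: "batch" for src in sources},  # identical everywhere: omitted
}
print(render_rows(sources, values))
```

Adding a fourth source only adds a fourth column; the row logic stays the same.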

Recipes

A recipe is a small TOML file that teaches the generic engine about one XML dialect: the natural key per element and which attributes to ignore.

name = "controlm"

[defaults]
unit = "SMART_FOLDER"           # the unit of comparison
ignore_attrs = ["VERSION", "JOBISN", "CREATION_TIME", "LAST_UPLOAD", "..."]

[elements.JOB]
key = ["@JOBNAME"]

[elements.OUTCOND]
key = ["@NAME"]                 # SIGN / ODATE are compared as attributes

[elements.ON]                   # no clear key → synthesize from CODE + DO actions
key = ["@CODE", "*kinds"]
inline = true                   # treat children as pseudo-attributes

Key mini-language

A key is a list of tokens, joined by |:

Token	Meaning
`@ATTR`	value of attribute `ATTR`
`#text`	the element's own text
`*tag`	the element's tag name (use for singletons compared by their text)
`child:TAG@ATTR`	attribute of a child element
`child:TAG#text`	text of a child element (e.g. sitemap `<loc>`)
`*kinds`	summary of child kinds / `DOACTION` actions (for keyless elements like `<ON>`)

If no key is given, the engine falls back to @NAME, then #text, then a composite of all attributes.
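The token table and the fallback order can be read as a small resolver. The sketch below is one possible interpretation, not the engine's code: `resolve_token` and `element_key` are hypothetical names, `*kinds` is deliberately left unsupported because its exact summary format is not specified here, and the `name=value` form of the composite fallback is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def resolve_token(elem: ET.Element, token: str) -> str:
    """Resolve one key token against an element (`*kinds` is left out)."""
    if token.startswith("@"):
        return elem.get(token[1:], "")
    if token == "#text":
        return (elem.text or "").strip()
    if token == "*tag":
        return elem.tag
    if token.startswith("child:"):
        spec = token[len("child:"):]
        if spec.endswith("#text"):
            child = elem.find(spec[: -len("#text")])
            return (child.text or "").strip() if child is not None else ""
        tag, _, attr = spec.partition("@")
        child = elem.find(tag)
        return child.get(attr, "") if child is not None else ""
    raise ValueError(f"unsupported token: {token}")

def element_key(elem: ET.Element, tokens: Optional[list[str]] = None) -> str:
    """Join resolved tokens with `|`, or apply the fallback order."""
    if tokens:
        return "|".join(resolve_token(elem, t) for t in tokens)
    if "NAME" in elem.attrib:                      # 1. @NAME
        return elem.attrib["NAME"]
    text = (elem.text or "").strip()
    if text:                                       # 2. #text
        return text
    # 3. composite of all attributes (sorted so attribute order is irrelevant)
    return "|".join(f"{k}={v}" for k, v in sorted(elem.attrib.items()))

url = ET.fromstring("<url><loc>https://example.com/a</loc><priority>0.5</priority></url>")
job = ET.fromstring('<JOB JOBNAME="LOAD" APPLICATION="GLX"/>')

print(element_key(url, ["child:loc#text"]))
print(element_key(job, ["*tag", "@JOBNAME"]))
print(element_key(job))
```

The last call has no declared key, no `NAME` attribute and no text, so it falls through to the attribute composite.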

Built-in recipes

controlm — BMC Control-M exports (DEFTABLE → SMART_FOLDER → JOB → INCOND/OUTCOND/QUANTITATIVE/CONTROL/ON).
maven-pom — Maven pom.xml: dependency & plugin drift, keyed by coordinates (groupId:artifactId). Reports version/scope changes and added/removed entries across <dependencies>, <dependencyManagement> and <build>.
junit — JUnit/xUnit reports (Surefire, Gradle, pytest, …): keyed by classname+name. Surfaces pass↔fail↔skip transitions and added/removed tests, ignoring time/timestamp/hostname.
sitemap — sitemap.xml (identity by <loc> text; compares <lastmod>/<priority>/<changefreq>).
generic — no dialect knowledge (default).
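As an illustration of what the junit recipe is described as surfacing, the sketch below derives pass/fail/skip per test, keyed by classname and name, and reports transitions between two reports. It is a standalone approximation with hypothetical helpers (`outcomes`, `transitions`); counting `<error>` as a failure and labelling missing tests "absent" are assumptions.

```python
import xml.etree.ElementTree as ET

def outcomes(report_xml: str) -> dict[tuple[str, str], str]:
    """Map (classname, name) to pass/fail/skip for every <testcase>."""
    result = {}
    for case in ET.fromstring(report_xml).iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "fail"
        elif case.find("skipped") is not None:
            status = "skip"
        else:
            status = "pass"
        result[(case.get("classname", ""), case.get("name", ""))] = status
    return result

def transitions(before: str, after: str) -> dict[tuple[str, str], tuple[str, str]]:
    """Report status changes plus added/removed tests between two reports."""
    old, new = outcomes(before), outcomes(after)
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        pair = (old.get(key, "absent"), new.get(key, "absent"))
        if pair[0] != pair[1]:
            changes[key] = pair
    return changes

before = """<testsuite time="1.2">
  <testcase classname="t.A" name="ok" time="0.1"/>
  <testcase classname="t.A" name="flaky" time="0.4"/>
</testsuite>"""
after = """<testsuite time="9.9">
  <testcase classname="t.A" name="ok" time="0.7"/>
  <testcase classname="t.A" name="flaky" time="0.2"><failure message="boom"/></testcase>
  <testcase classname="t.A" name="new" time="0.1"><skipped/></testcase>
</testsuite>"""

for key, change in transitions(before, after).items():
    print(key, change)
```

The `time` attributes differ everywhere but never appear in the result, because only the derived status is compared.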

Drop a .toml anywhere and pass its path to --recipe to add your own dialect.

Generate & validate a recipe

Don't want to write one by hand? Let an LLM draft it from a sample of your XML:

xmldiffreport-recipe scaffold sample.xml > prompt.txt   # paste prompt.txt into any LLM
xmldiffreport-recipe validate my-dialect.toml           # check the result (ships a JSON Schema)
xmldiffreport-recipe show controlm                      # print a built-in recipe to learn from

See Generate a recipe with an LLM.

Project layout — tool vs. your usage

src/xmldiffreport/     the installable TOOL (engine, recipes, CLI) — generic, reusable
examples/              synthetic datasets + generator (no real data)
usage/                 a config-driven HARNESS to run the tool on YOUR files
tests/                 pytest suite

The tool in src/ knows nothing about your folders. The usage/ folder is the thin layer you adapt: a config.toml listing the inputs (files/dirs), a report_dir, and a collect.py that runs the diff and writes the report.

cp usage/config.example.toml usage/config.toml   # then edit the paths
python usage/collect.py                            # writes usage/reports/<timestamp>.md

Your config.toml, reports, and any XML under usage/ are git-ignored — real data and paths never get committed.

Library use

from xmldiffreport import diff

result = diff(["old.xml", "new.xml"], recipe="sitemap")   # a file, files, or dir(s)
print(result.render())                                    # Markdown — or result.render("html")

for unit in result.units:        # what differs
    print(unit.ident, unit.sources)
if result:                       # truthy when anything differs (handy for exit codes)
    ...

Performance

Each file is parsed once into an in-memory tree (xml.etree.ElementTree); the diff cost is roughly linear in the number of nodes. For typical Control-M exports (a few MB) it's instant, and it remains practical for files in the tens of MB. It is not designed for gigabyte-scale files: we deliberately favour simple, maintainable code over incremental/streaming parsing.

Development

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

ruff check . && ruff format --check .
mypy src
pytest

See CONTRIBUTING.md. Examples and tests use synthetic data only — never real exports.

Roadmap

Report top-level units that exist in only one source (added/removed units).
JSON report format (Markdown and HTML already ship; formats are pluggable).
Similarity-based matching fallback for keyless elements.
More built-in recipes (Android manifest, RSS/Atom, .NET web.config, …).
