purepatch

The patch engine for code agents, in pure Python. Apply unified diffs and fuzzy search/replace edits with no git, no patch binary, no C extension — in sandboxes, Pyodide/WASM, Lambda, anywhere pip install works. And because it runs in-process, it applies a patch in ~25 µs where spawning a binary costs milliseconds.

pip install purepatch

import purepatch

new_text = purepatch.apply(diff_text, old_text)          # unified diff -> text
report   = purepatch.apply_files(diff_text, root=".")    # multi-file patch
new_text = purepatch.apply_edit(text, search, replace)   # fuzzy block edit

purepatch --dry-run < change.patch     # the familiar CLI, agent-friendly
purepatch -R < change.patch            # un-apply

Why

LLMs edit code by emitting unified diffs and SEARCH/REPLACE blocks — and both arrive slightly wrong: line numbers drifted, context rotted, indentation moved, trailing whitespace differs. The existing Python options either only parse diffs (unidiff) or are long abandoned (python-patch, last release 2019). So every agent framework re-implements patching, badly, or shells out to git.

purepatch is that missing engine:

GNU patch semantics for unified diffs: cumulative offset tracking, bidirectional position search, fuzz degradation — verified against the real thing (below).
A fuzzy edit ladder for LLM edit blocks: exact match → trailing whitespace tolerance → indentation transplant (the block the model wrote at top level gets re-indented to where it actually lives). Refuses to guess on ambiguity.
Errors an agent can act on: failed matches report the closest near-miss (closest match: line 41, 87% similar) so the model can correct its edit instead of retrying blind.
Git extended headers understood: new/deleted files, renames, quoted paths, \ No newline at end of file, CRLF content.
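The edit ladder works as a sequence of increasingly tolerant comparisons that stops at the first rung with exactly one hit. The sketch below illustrates the idea and is not purepatch's code. find_block and apply_edit are hypothetical stand-ins that match whole lines, use textwrap.dedent for the indentation rung, and raise simple AmbiguousMatch / NoMatch exceptions in place of the library's own. The (start, end, strategy) return shape mirrors the API sketch, but the rung names are assumptions.

```python
import textwrap
from typing import Callable, List, Tuple

# Hypothetical sketch of a fuzzy edit ladder; not purepatch's implementation.

class AmbiguousMatch(Exception):
    pass

class NoMatch(Exception):
    pass

def _first_indent(block: List[str]) -> str:
    # Leading whitespace of the first non-blank line ("" if all blank).
    for line in block:
        if line.strip():
            return line[: len(line) - len(line.lstrip())]
    return ""

# Each rung maps a block of lines to a comparison key; later rungs forgive more.
LADDER: List[Tuple[str, Callable[[List[str]], object]]] = [
    ("exact", lambda b: tuple(b)),
    ("trailing_ws", lambda b: tuple(s.rstrip() for s in b)),
    # Common indentation removed, so relative indentation must still agree.
    ("indent", lambda b: textwrap.dedent("\n".join(s.rstrip() for s in b))),
]

def find_block(lines: List[str], search: List[str]) -> Tuple[int, int, str]:
    n = len(search)
    for name, key in LADDER:
        want = key(search)
        hits = [i for i in range(len(lines) - n + 1)
                if key(lines[i:i + n]) == want]
        if len(hits) == 1:
            return hits[0], hits[0] + n, name
        if len(hits) > 1:
            # Refuse to guess: report every candidate (1-based).
            raise AmbiguousMatch(f"{name}: lines {[h + 1 for h in hits]}")
    raise NoMatch("search block not found")

def apply_edit(content: str, search: str, replace: str) -> str:
    lines = content.splitlines()
    start, end, strategy = find_block(lines, search.splitlines())
    new = replace.splitlines()
    if strategy == "indent":
        # Transplant: swap the search block's base indent for the file's.
        src = _first_indent(search.splitlines())
        dst = _first_indent(lines[start:end])
        new = [dst + s[len(src):] if s.strip() and s.startswith(src) else s
               for s in new]
    out = lines[:start] + new + lines[end:]
    return "\n".join(out) + ("\n" if content.endswith("\n") else "")
```

The looser rungs are tried only after the stricter ones find nothing. That ordering keeps an exact match from being shadowed by a fuzzier one elsewhere in the file.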

Verified against GNU patch and git apply

Following the pure* series methodology, behavior is checked by differential testing against the reference implementations, run in CI on every commit:

500 random clean patches: purepatch ≡ GNU patch ≡ git apply ≡ expected output, byte for byte;
200 drift scenarios (the file gained unrelated lines): offset behavior matches GNU patch exactly;
200 rotted-context scenarios: fuzz behavior matches GNU patch's output wherever GNU patch succeeds;
300 property cases: apply(diff(a,b), a) == b and apply(diff(a,b), b, reverse=True) == a.
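The round-trip property in the last item can be exercised with nothing but the standard library. In this sketch difflib.unified_diff produces the patches. apply_exact is a hypothetical, deliberately strict single-file applier with no offset search and no fuzz, standing in for purepatch.apply. Its reverse mode swaps the roles of '-' and '+' lines and reads the new-file range from the hunk header.

```python
import difflib
import random
import re
from typing import List

HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def apply_exact(diff: str, source: str, reverse: bool = False) -> str:
    """Apply a single-file unified diff with exact context (hypothetical sketch)."""
    src = source.splitlines(keepends=True)
    lines = diff.splitlines(keepends=True)
    drop, add = ("+", "-") if reverse else ("-", "+")
    out: List[str] = []
    pos = i = 0
    while i < len(lines):
        m = HUNK.match(lines[i])
        i += 1
        if not m:
            continue  # ---/+++ file headers
        start = int(m.group(3) if reverse else m.group(1))
        raw = m.group(4) if reverse else m.group(2)
        count = 1 if raw is None else int(raw)
        target = start - 1 if count else start  # an empty range names the line before
        out.extend(src[pos:target])
        pos = target
        while i < len(lines) and not lines[i].startswith("@@"):
            tag, text = lines[i][0], lines[i][1:]
            i += 1
            if tag in (" ", drop):
                if pos >= len(src) or src[pos] != text:
                    raise ValueError(f"context mismatch at line {pos + 1}")
                pos += 1
                if tag == " ":
                    out.append(text)
            elif tag == add:
                out.append(text)
    out.extend(src[pos:])
    return "".join(out)

def random_text(rng: random.Random, n: int) -> str:
    return "".join(rng.choice("abcde") + "\n" for _ in range(n))

def mutate(rng: random.Random, text: str) -> str:
    lines = text.splitlines(keepends=True)
    for _ in range(rng.randint(1, 4)):
        k = rng.randrange(len(lines) + 1)
        op = rng.choice(("insert", "delete", "replace"))
        if op == "insert":
            lines.insert(k, rng.choice("xyz") + "\n")
        elif k < len(lines):
            if op == "delete":
                del lines[k]
            else:
                lines[k] = rng.choice("xyz") + "\n"
    return "".join(lines)

def check_roundtrip(cases: int = 300, seed: int = 0) -> int:
    rng = random.Random(seed)
    for _ in range(cases):
        a = random_text(rng, rng.randint(0, 30))
        b = mutate(rng, a)
        d = "".join(difflib.unified_diff(a.splitlines(keepends=True),
                                         b.splitlines(keepends=True), "a", "b"))
        assert apply_exact(d, a) == b                # apply(diff(a, b), a) == b
        assert apply_exact(d, b, reverse=True) == a  # reverse round trip
    return cases
```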

Performance

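Latency is measured per application, because that is how a code agent actually uses a patcher: one patch at a time. For GNU patch and git apply the figure includes process spawn, which is their real cost in that setting; for purepatch it is the in-process call. Each figure is the median of 7 runs, repeated over three independent rounds (spread <10%), with outputs verified equal before timing. Multipliers in parentheses show how many times slower than purepatch each binary is. Reproduce: python tools/bench.py --verify.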

workload	purepatch (in-process)	GNU patch (spawn)	git apply (spawn)
200-line file, 5 edits	0.021 ms	2.6 ms (~120×)	7.8 ms (~370×)
2k-line file, 30 edits	0.17 ms	2.9 ms (17×)	8.8 ms (52×)
20k-line file, 200 edits	1.4 ms	6.0 ms (4.3×)	16.2 ms (12×)

Fuzzy apply_edit on a 400-line file: ~0.01 ms per call.
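The timing methodology above (verify outputs first, then take the median of 7 single calls, repeated over three rounds) can be sketched as follows. bench, rounds, median_ms and spawn_echo are hypothetical helpers, and tools/bench.py may work differently. spawn_echo only stands in for a real patch binary by timing a Python subprocess, so it shows spawn overhead rather than patching cost.

```python
import statistics
import subprocess
import sys
import time
from typing import Callable, Dict, List, Tuple

def median_ms(fn: Callable[[], str], repeats: int = 7) -> float:
    """Median wall-clock time of single calls to fn, in milliseconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)

def rounds(fn: Callable[[], str], n_rounds: int = 3,
           repeats: int = 7) -> Tuple[List[float], float]:
    """Independent rounds of medians, plus their relative spread."""
    medians = [median_ms(fn, repeats) for _ in range(n_rounds)]
    spread = (max(medians) - min(medians)) / max(min(medians), 1e-9)
    return medians, spread

def bench(candidates: Dict[str, Callable[[], str]], n_rounds: int = 3,
          repeats: int = 7) -> Dict[str, Tuple[List[float], float]]:
    # Verify every candidate produces identical output before timing any of them.
    outputs = {name: fn() for name, fn in candidates.items()}
    if len(set(outputs.values())) != 1:
        raise ValueError(f"outputs differ across {sorted(outputs)}")
    return {name: rounds(fn, n_rounds, repeats) for name, fn in candidates.items()}

def spawn_echo() -> str:
    # Stand-in for spawning `patch` or `git apply`: start a process, read stdout.
    return subprocess.run([sys.executable, "-c", "print('ok')"],
                          capture_output=True, text=True, check=True).stdout
```

Timing each call on its own, rather than dividing a batch, matches the one-patch-at-a-time pattern the table measures.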

The slow paths are engineered too, because an agent waits on them: hunk placement uses tiered C-speed anchor scans (a patch drifted by 500 lines in a 20k-line file still applies in ~1.5 ms), and a failed fuzzy edit produces its closest-match diagnostics on a 10k-line file in ~10 ms (two-pass candidate scoring; 21× faster than naively diffing every window).
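Hunk placement with a cumulative offset, outward position search and fuzz degradation can be sketched as below. This is a deliberately naive illustration in the spirit of GNU patch, not purepatch's tiered anchor scan. locate_hunk compares slices at every candidate position, trying offsets 0, +1, -1, +2, -2 and so on. It drops leading and trailing context lines (fuzz) only when no placement exists anywhere at the current fuzz level. Hunks are assumed to be pre-parsed into hypothetical (start, old, new, lead_ctx, trail_ctx) tuples.

```python
from typing import List, Optional, Tuple

# Hypothetical pre-parsed hunk: (header start line, old lines, new lines,
# leading context count, trailing context count).
Hunk = Tuple[int, List[str], List[str], int, int]

def _fits(lines: List[str], pattern: List[str], pos: int) -> bool:
    return 0 <= pos and pos + len(pattern) <= len(lines) \
        and lines[pos:pos + len(pattern)] == pattern

def locate_hunk(lines: List[str], old: List[str], lead: int, trail: int,
                expected: int, max_fuzz: int = 2) -> Optional[Tuple[int, int, int]]:
    """Return (pos, offset, fuzz) for the trimmed pattern, or None."""
    for fuzz in range(max_fuzz + 1):            # fuzz degradation: outer loop
        lo, hi = min(fuzz, lead), min(fuzz, trail)
        pattern = old[lo:len(old) - hi]         # ignore up to `fuzz` context lines per end
        base = expected + lo
        for delta in range(max(base, len(lines) - base) + 1):
            # Bidirectional search: 0, +1, -1, +2, -2, ...
            for pos in ((base,) if delta == 0 else (base + delta, base - delta)):
                if _fits(lines, pattern, pos):
                    return pos, pos - base, fuzz
    return None

def apply_hunks(lines: List[str], hunks: List[Hunk], max_fuzz: int = 2):
    result = list(lines)
    growth = 0  # net lines inserted by hunks applied so far
    drift = 0   # offset found for earlier hunks, carried forward
    report = []
    for start, old, new, lead, trail in hunks:
        expected = start - 1 + growth + drift
        found = locate_hunk(result, old, lead, trail, expected, max_fuzz)
        if found is None:
            raise ValueError(f"hunk at line {start} failed")
        pos, offset, fuzz = found
        lo, hi = min(fuzz, lead), min(fuzz, trail)
        core_old, core_new = old[lo:len(old) - hi], new[lo:len(new) - hi]
        result[pos:pos + len(core_old)] = core_new
        drift += offset
        growth += len(core_new) - len(core_old)
        report.append((start, drift, fuzz))  # (header line, total offset, fuzz used)
    return result, report
```

Carrying the earlier offset forward is what lets later hunks in a drifted file be found on the first probe instead of by searching again.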

An agent loop applying hundreds of edits per session pays milliseconds total, not seconds — and needs no git in its sandbox.

API sketch

purepatch.parse(text) -> PatchSet               # inspect hunks/files
purepatch.apply(patch, source, reverse=False, max_fuzz=2) -> str
purepatch.apply_files(patch, root=".", strip=None,  # strip auto-detected
                      reverse=False, dry_run=False) -> ApplyReport
purepatch.apply_edit(content, search, replace) -> str
purepatch.find_block(content, search) -> (start, end, strategy)
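The API sketch says strip is auto-detected when left as None but does not describe how. One plausible heuristic, shown here purely as an assumption, is to pick the smallest -p level at which every pre-image path in the patch names an existing file under root. detect_strip and strip_path are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable, Optional

def strip_path(path: str, p: int) -> Optional[str]:
    """Drop the first p slash-separated components, like patch -p."""
    parts = path.split("/")
    return "/".join(parts[p:]) if p < len(parts) else None

def detect_strip(paths: Iterable[str], root: str = ".", max_p: int = 3) -> Optional[int]:
    """Smallest p at which every pre-image path exists under root (hypothetical)."""
    old = [x for x in paths if x != "/dev/null"]
    if not old:
        return None  # only new files: nothing on disk to anchor against
    for p in range(max_p + 1):
        stripped = [strip_path(x, p) for x in old]
        if all(s is not None and (Path(root) / s).is_file() for s in stripped):
            return p
    return None
```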

ApplyReport.ok, the per-file actions (patched/created/deleted/renamed/failed), and the per-hunk offset and fuzz are all inspectable; log them and an agent can explain exactly what happened.
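Per-file actions such as created, deleted and renamed map onto git's extended header lines. Git also C-quotes unusual paths using backslash and octal escapes. The sketch below shows one way to derive both. classify and unquote_git_path are hypothetical helpers rather than purepatch's parser, and only the extended header lines are examined.

```python
from typing import Dict, List, Optional

# Single-character escapes git uses in C-style quoted paths.
_ESC = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13, '"': 34, "\\": 92}

def unquote_git_path(s: str) -> str:
    """Undo git's C-style path quoting; unquoted paths pass through."""
    if not (len(s) >= 2 and s.startswith('"') and s.endswith('"')):
        return s
    body, out, i = s[1:-1], bytearray(), 0
    while i < len(body):
        c = body[i]
        if c != "\\":
            out += c.encode("utf-8")
            i += 1
        elif body[i + 1] in _ESC:
            out.append(_ESC[body[i + 1]])
            i += 2
        else:  # three-digit octal escape for one raw byte
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
    return out.decode("utf-8")

def classify(header_lines: List[str]) -> Dict[str, Optional[str]]:
    """Map one file's git extended header lines to an action and rename paths."""
    info: Dict[str, Optional[str]] = {"action": "patched", "old": None, "new": None}
    for line in header_lines:
        if line.startswith("new file mode"):
            info["action"] = "created"
        elif line.startswith("deleted file mode"):
            info["action"] = "deleted"
        elif line.startswith("rename from "):
            info["action"] = "renamed"
            info["old"] = unquote_git_path(line[len("rename from "):])
        elif line.startswith("rename to "):
            info["new"] = unquote_git_path(line[len("rename to "):])
    return info
```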

Exceptions: ParseError, HunkApplyError, NoMatchError (with closest_line / closest_similarity), AmbiguousMatchError (with all locations).
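A closest-match diagnostic like the one NoMatchError carries (closest_line / closest_similarity) can be computed in two passes with difflib. The first pass takes a cheap upper bound (SequenceMatcher.quick_ratio) for every window. The second computes the exact ratio() only for windows whose bound could still beat the best score so far. closest_match is a hypothetical sketch, and purepatch's own scoring may differ.

```python
import difflib
from typing import List, Optional, Tuple

def _norm(block: List[str]) -> str:
    # Compare stripped lines so indentation and trailing spaces don't dominate.
    return "\n".join(line.strip() for line in block)

def closest_match(lines: List[str], search: List[str]) -> Optional[Tuple[int, float]]:
    """Return (1-based line, similarity) of the window most like `search`."""
    n = len(search)
    if n == 0 or len(lines) < n:
        return None
    sm = difflib.SequenceMatcher(None, autojunk=False)
    sm.set_seq2(_norm(search))  # seq2 is analysed once and reused for every window
    # Pass 1: cheap upper bound for every window.
    bounds = []
    for i in range(len(lines) - n + 1):
        sm.set_seq1(_norm(lines[i:i + n]))
        bounds.append((sm.quick_ratio(), i))
    bounds.sort(key=lambda t: (-t[0], t[1]))
    # Pass 2: exact ratio in bound order; stop once no bound can beat the best.
    best_i, best = bounds[0][1], -1.0
    for bound, i in bounds:
        if bound <= best:
            break
        sm.set_seq1(_norm(lines[i:i + n]))
        ratio = sm.ratio()
        if ratio > best:
            best_i, best = i, ratio
    return best_i + 1, best
```

quick_ratio() is never lower than ratio(). So stopping at the first bound that cannot beat the best score found so far yields the same best score as scoring every window exactly.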

Limitations (honest ones)

Binary patches are rejected, not applied.
File modes are parsed from git headers but not applied to the filesystem (chmod is on the roadmap).
The purepatch CLI covers the agent subset (-p, -d, -R, --fuzz, --dry-run), not every GNU patch flag.
Like GNU patch, fuzzy hunk placement can in principle pick a wrong spot in pathological inputs; --fuzz 0 disables tolerance entirely.

License

MIT
