adam2go/purepatch

purepatch


The patch engine for code agents, in pure Python. Apply unified diffs and fuzzy search/replace edits with no git, no patch binary, no C extension — in sandboxes, Pyodide/WASM, Lambda, anywhere pip install works. And because it runs in-process, it applies a patch in ~25 µs where spawning a binary costs milliseconds.

pip install purepatch

import purepatch

new_text = purepatch.apply(diff_text, old_text)          # unified diff -> text
report   = purepatch.apply_files(diff_text, root=".")    # multi-file patch
new_text = purepatch.apply_edit(text, search, replace)   # fuzzy block edit

purepatch --dry-run < change.patch     # the familiar CLI, agent-friendly
purepatch -R < change.patch            # un-apply

Why

LLMs edit code by emitting unified diffs and SEARCH/REPLACE blocks — and both arrive slightly wrong: line numbers drifted, context rotted, indentation moved, trailing whitespace differs. The existing Python options either only parse diffs (unidiff) or are long abandoned (python-patch, last release 2019). So every agent framework re-implements patching, badly, or shells out to git.

purepatch is that missing engine:

  • GNU patch semantics for unified diffs: cumulative offset tracking, bidirectional position search, fuzz degradation — verified against the real thing (below).
  • A fuzzy edit ladder for LLM edit blocks: exact match → trailing whitespace tolerance → indentation transplant (the block the model wrote at top level gets re-indented to where it actually lives). Refuses to guess on ambiguity.
  • Errors an agent can act on: failed matches report the closest near-miss (closest match: line 41, 87% similar) so the model can correct its edit instead of retrying blind.
  • Git extended headers understood: new/deleted files, renames, quoted paths, \ No newline at end of file, CRLF content.
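The edit ladder in the second bullet can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not purepatch's code: `find_block_sketch`, `apply_edit_sketch`, `normalize` and `common_indent` are hypothetical names, and built-in exceptions stand in for the library's own error types.

```python
import os

RUNGS = ("exact", "trailing-whitespace", "indentation")


def common_indent(lines):
    """Longest leading-whitespace prefix shared by all non-blank lines."""
    indents = [l[: len(l) - len(l.lstrip())] for l in lines if l.strip()]
    return os.path.commonprefix(indents) if indents else ""


def normalize(lines, rung):
    """Reduce lines to the form that a given rung compares."""
    if rung == "exact":
        return list(lines)
    if rung == "trailing-whitespace":
        return [l.rstrip() for l in lines]
    # "indentation": ignore the shared indent as well as trailing whitespace
    pad = len(common_indent(lines))
    return [l[pad:].rstrip() if l.strip() else "" for l in lines]


def find_block_sketch(content, search):
    """Return (start, end, rung) as 0-based line indices, end exclusive."""
    lines, block = content.splitlines(), search.splitlines()
    if not block:
        raise ValueError("empty search block")
    n = len(block)
    for rung in RUNGS:  # strictest rung first
        want = normalize(block, rung)
        hits = [i for i in range(len(lines) - n + 1)
                if normalize(lines[i:i + n], rung) == want]
        if len(hits) > 1:  # refuse to guess between several locations
            raise ValueError(f"ambiguous {rung} match at lines {[h + 1 for h in hits]}")
        if hits:
            return hits[0], hits[0] + n, rung
    raise LookupError("no match on any rung")


def apply_edit_sketch(content, search, replace):
    start, end, rung = find_block_sketch(content, search)
    lines, new = content.splitlines(), replace.splitlines()
    if rung == "indentation":
        # Transplant: swap the indent the model wrote for the indent in the file.
        old_pad = common_indent(search.splitlines())
        new_pad = common_indent(lines[start:end])
        new = [new_pad + (l[len(old_pad):] if l.startswith(old_pad) else l.lstrip())
               if l.strip() else "" for l in new]
    tail = "\n" if content.endswith("\n") else ""
    return "\n".join(lines[:start] + new + lines[end:]) + tail
```

A block written at top level by the model then lands at the nesting depth where it actually lives, while a search block that occurs twice raises instead of picking one location.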

Verified against GNU patch and git apply

Following the pure* series methodology: behavior is checked by differential testing against the reference implementations, run in CI on every commit —

  • 500 random clean patches: purepatch ≡ GNU patch ≡ git apply ≡ expected output, byte for byte;
  • 200 drift scenarios (the file gained unrelated lines): offset behavior matches GNU patch exactly;
  • 200 rotted-context scenarios: fuzz behavior matches GNU patch's output wherever GNU patch succeeds;
  • 300 property cases: apply(diff(a,b), a) == b and apply(diff(a,b), b, reverse=True) == a.
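The round-trip property in the last bullet can be reproduced with the standard library alone. The sketch below assumes a deliberately strict applier (no offset search, no fuzz) over lists of lines; `apply_strict` and `check_round_trip` are hypothetical names used for illustration, and `difflib.unified_diff` plays the role of `diff`.

```python
import difflib
import random
import re

HUNK = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_strict(diff_lines, source, reverse=False):
    """Apply a unified diff to a list of lines; every old line must match."""
    out, pos, seen_hunk = [], 0, False  # pos = next unread index in source
    for line in diff_lines:
        m = HUNK.match(line)
        if m:
            seen_hunk = True
            start = int(m.group(3 if reverse else 1))
            count = m.group(4 if reverse else 2)
            count = 1 if count is None else int(count)
            # An empty range is numbered by the line just before it.
            target = start - 1 if count else start
            out.extend(source[pos:target])  # copy untouched lines
            pos = target
            continue
        if not seen_hunk:
            continue  # skip the ---/+++ file headers
        tag, text = line[:1], line[1:]
        if reverse and tag in ("+", "-"):
            tag = "-" if tag == "+" else "+"
        if tag == "+":
            out.append(text)
            continue
        # Context and removed lines must be present in the source.
        if pos >= len(source) or source[pos] != text:
            raise ValueError(f"mismatch at source line {pos + 1}")
        pos += 1
        if tag == " ":
            out.append(text)
    out.extend(source[pos:])
    return out


def check_round_trip(cases=300, seed=0):
    """Check apply(diff(a,b), a) == b and the reverse on random inputs."""
    rng = random.Random(seed)
    for _ in range(cases):
        a = [rng.choice("abcde") for _ in range(rng.randrange(30))]
        b = list(a)
        for _ in range(rng.randrange(6)):  # a few random inserts/deletes
            if b and rng.random() < 0.5:
                del b[rng.randrange(len(b))]
            else:
                b.insert(rng.randrange(len(b) + 1), rng.choice("abcde"))
        diff = list(difflib.unified_diff(a, b, lineterm=""))
        assert apply_strict(diff, a) == b
        assert apply_strict(diff, b, reverse=True) == a
    return cases
```

Working on line lists with `lineterm=""` sidesteps the missing-final-newline case, which a real patcher has to handle separately.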

Performance

Per-application latency — how a code agent actually uses a patcher: one patch at a time. Spawn cost is the binaries' real cost; in-process is purepatch's real cost. Median of 7, three independent rounds (spread <10%), outputs verified equal before timing. Reproduce: python tools/bench.py --verify.

| Workload | purepatch (in-process) | GNU patch (spawn) | git apply (spawn) |
|---|---|---|---|
| 200-line file, 5 edits | 0.021 ms | 2.6 ms (~120×) | 7.8 ms (~370×) |
| 2k-line file, 30 edits | 0.17 ms | 2.9 ms (17×) | 8.8 ms (52×) |
| 20k-line file, 200 edits | 1.4 ms | 6.0 ms (4.3×) | 16.2 ms (12×) |

Fuzzy apply_edit on a 400-line file: ~0.01 ms per call.

The slow paths are engineered too, because an agent waits on them: hunk placement uses tiered C-speed anchor scans (a patch drifted by 500 lines in a 20k-line file still applies in ~1.5 ms), and a failed fuzzy edit produces its closest-match diagnostics on a 10k-line file in ~10 ms (two-pass candidate scoring; 21× faster than naively diffing every window).
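For readers unfamiliar with hunk placement, here is a naive sketch of the idea being optimized: try the expected position first, then search outward in both directions, and carry the offset found for one hunk into the guess for the next. It is a plain linear scan, not the tiered anchor scan described above; `locate` and `locate_all` are hypothetical names, and trying the forward direction before the backward one is an assumption.

```python
def locate(source, old_lines, expected, offset=0):
    """Find old_lines in source, searching outward from expected + offset.

    Returns the 0-based index of the match, or None if there is none.
    """
    n = len(old_lines)
    guess = expected + offset
    last = len(source) - n  # highest index where the block still fits
    for dist in range(len(source) + abs(guess) + 1):
        candidates = (guess,) if dist == 0 else (guess + dist, guess - dist)
        for cand in candidates:
            if 0 <= cand <= last and source[cand:cand + n] == old_lines:
                return cand
    return None


def locate_all(source, hunks):
    """hunks: list of (expected_index, old_lines) in file order.

    The offset found for each hunk seeds the search for the next one,
    so a file that gained lines near the top is usually hit on the
    first try from the second hunk onward.
    """
    offset, found = 0, []
    for expected, old_lines in hunks:
        at = locate(source, old_lines, expected, offset)
        if at is None:
            raise ValueError(f"hunk expected at line {expected + 1} not found")
        offset = at - expected  # cumulative drift so far
        found.append(at)
    return found
```

With two unrelated lines added at the top of a file, the first hunk is found two lines below its expected position and the second is found immediately at its adjusted guess. The sketch only locates hunks; it does not account for line-count changes made by applying earlier hunks.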

An agent loop applying hundreds of edits per session pays milliseconds total, not seconds — and needs no git in its sandbox.

API sketch

purepatch.parse(text) -> PatchSet               # inspect hunks/files
purepatch.apply(patch, source, reverse=False, max_fuzz=2) -> str
purepatch.apply_files(patch, root=".", strip=None,  # strip auto-detected
                      reverse=False, dry_run=False) -> ApplyReport
purepatch.apply_edit(content, search, replace) -> str
purepatch.find_block(content, search) -> (start, end, strategy)

ApplyReport.ok, per-file actions (patched/created/deleted/renamed/failed), and per-hunk offset/fuzz are all inspectable. Log them and an agent can explain exactly what happened.

Exceptions: ParseError, HunkApplyError, NoMatchError (with closest_line / closest_similarity), AmbiguousMatchError (with all locations).
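To make the near-miss diagnostic concrete, the sketch below scores every window of the file against the search block and reports the best one as a 1-based line number plus a similarity ratio. It uses `difflib.SequenceMatcher`, with `quick_ratio()` (a cheap upper bound on `ratio()`) as a first pass so that most windows skip the expensive comparison. `closest_match` is a hypothetical name, and this scoring scheme is an assumption, not necessarily how purepatch computes closest_line and closest_similarity.

```python
import difflib


def closest_match(content, search):
    """Return (line_number, similarity) of the window most like search.

    line_number is 1-based; (None, 0.0) means nothing resembled the block.
    """
    lines = content.splitlines()
    block = search.splitlines()
    n = max(len(block), 1)
    # seq2 is fixed, so SequenceMatcher can cache its analysis of it.
    matcher = difflib.SequenceMatcher(None, "", "\n".join(block), autojunk=False)
    best_line, best_ratio = None, 0.0
    for i in range(max(len(lines) - n + 1, 0)):
        matcher.set_seq1("\n".join(lines[i:i + n]))
        # Pass 1: the upper bound cannot beat the current best, so skip.
        if matcher.quick_ratio() <= best_ratio:
            continue
        # Pass 2: exact (slower) similarity for the surviving candidates.
        ratio = matcher.ratio()
        if ratio > best_ratio:
            best_line, best_ratio = i + 1, ratio
    return best_line, best_ratio
```

An agent-facing message can then be built from the result, for example `f"closest match: line {line}, {ratio:.0%} similar"`.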

Limitations (honest ones)

  • Binary patches are rejected, not applied.
  • File modes are parsed from git headers but not applied to the filesystem (chmod is on the roadmap).
  • The purepatch CLI covers only the agent subset (-p, -d, -R, --fuzz, --dry-run), not every GNU patch flag.
  • Like GNU patch, fuzzy hunk placement can in principle pick a wrong spot in pathological inputs; --fuzz 0 disables tolerance entirely.
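The trade-off in the last bullet is easier to see with a sketch of how fuzz works in the GNU patch style: at fuzz level N, up to N context lines are dropped from each end of the hunk before matching, so rotted context no longer blocks the edit but the remaining lines pin the location less firmly. The function below is a simplified illustration that tries only the nominal position; `apply_hunk_with_fuzz` and its hunk representation are hypothetical.

```python
def apply_hunk_with_fuzz(source, hunk, start, max_fuzz=2):
    """Apply one hunk at a fixed position, relaxing context step by step.

    source: list of lines; hunk: list of (tag, text) with tag in ' ', '-', '+';
    start: 0-based index where the hunk's first line is expected.
    Returns (new_lines, fuzz_used). max_fuzz=0 means strict matching.
    """
    # Count pure-context lines at the head and tail of the hunk.
    lead = next((i for i, (t, _) in enumerate(hunk) if t != " "), len(hunk))
    trail = next((i for i, (t, _) in enumerate(reversed(hunk)) if t != " "), len(hunk))
    for fuzz in range(max_fuzz + 1):
        cut_head, cut_tail = min(fuzz, lead), min(fuzz, trail)
        body = hunk[cut_head:len(hunk) - cut_tail]
        old = [text for tag, text in body if tag != "+"]  # what must be present
        new = [text for tag, text in body if tag != "-"]  # what replaces it
        at = start + cut_head  # dropped leading context shifts the start
        if source[at:at + len(old)] == old:
            return source[:at] + new + source[at + len(old):], fuzz
    raise ValueError(f"hunk does not apply with fuzz <= {max_fuzz}")
```

With one context line rotted (`b` became `B` in the file), the hunk fails at fuzz 0 and applies at fuzz 1; passing `max_fuzz=0` turns that into an error, which mirrors the strict behavior of --fuzz 0.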

License

MIT
