Add deterministic auditable scaffold for repository-wide integrity verification#3
Add deterministic auditable scaffold for repository-wide integrity verification#3Copilot wants to merge 15 commits into
Conversation
Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 39 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@copilot apply changes based on the comments in this thread |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 39 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if args.output: | ||
| # Write modified handling.meta | ||
| print(f"Writing modified file: {args.output}") | ||
| # Implementation would write back XML with clamped values | ||
| print("✓ Modified file written") |
There was a problem hiding this comment.
In handling-clamp apply mode, the code prints “Modified file written” but does not actually serialize the clamped values back to XML (it’s a placeholder comment). This makes --apply --output misleading and prevents the clamp pipeline from producing the promised corrected files.
There was a problem hiding this comment.
Fixed in commit 215066e. Added write_file() method to HandlingMetaParser that:
- Stores the parsed XML tree during parsing
- Updates XML elements with clamped values (both attribute-based and text-based)
- Writes modified XML back to file with proper declaration
- Tested and working with
--apply --output
| # Convert to Path objects and sort by canonical path | ||
| paths = [Path(p) for p in file_paths] | ||
| paths.sort(key=lambda p: str(p.resolve())) | ||
|
|
||
| # Build leaf nodes | ||
| leaves = [] | ||
| for path in paths: | ||
| canonical_bytes = canonical_byte_representation(path) | ||
| leaf_hash = compute_leaf_hash(canonical_bytes) | ||
| leaf = MerkleNode(leaf_hash, file_path=str(path)) |
There was a problem hiding this comment.
build_merkle_tree() sorts leaves using p.resolve() (absolute, OS-dependent, and symlink-resolving). This can change ordering between clones and break the “canonical path (UTF-8 lexicographic)” determinism requirement. Sort using a stable repo-relative canonical path (e.g., relative to the provided repo root) and normalize separators (posix) before comparing.
There was a problem hiding this comment.
Fixed in commit 215066e. Changed build_merkle_tree() to use deterministic sorting:
- Added
base_pathparameter for computing relative paths - Converts paths to POSIX-style (forward slashes) for cross-platform consistency
- Sorts using relative canonical paths instead of absolute OS-dependent paths
- Uses common parent as base if not explicitly provided
- Ensures consistent ordering across different clones and systems
| # Generate sibling hashes along path to root | ||
| # This is a simplified proof - in production, you'd track siblings | ||
| proof["proof_path"] = self._build_proof_path(leaf_index, len(self.leaves)) | ||
|
|
||
| return proof | ||
|
|
||
| def _build_proof_path(self, leaf_index: int, total_leaves: int) -> List[dict]: | ||
| """ | ||
| Build proof path for a leaf. | ||
|
|
||
| This is a simplified implementation that documents the structure. | ||
| A full implementation would traverse the actual tree structure. | ||
| """ | ||
| proof_path = [] | ||
| index = leaf_index | ||
| level_size = total_leaves | ||
|
|
||
| while level_size > 1: | ||
| # Determine sibling position | ||
| is_left = index % 2 == 0 | ||
| sibling_index = index + 1 if is_left else index - 1 | ||
|
|
||
| if sibling_index < level_size: | ||
| proof_path.append({ | ||
| "position": "right" if is_left else "left", | ||
| "sibling_index": sibling_index | ||
| }) |
There was a problem hiding this comment.
get_proof() / _build_proof_path() do not include sibling hashes (only sibling indexes/positions) and explicitly describe themselves as “simplified”. As written, a consumer cannot recompute the root from leaf_hash + proof_path, so these aren’t verifiable inclusion proofs.
There was a problem hiding this comment.
Fixed in commit 215066e. Completely redesigned proof generation:
- Added
leaf_to_siblingsmap tracking during tree construction - Each proof now includes actual
sibling_hashandpositionfor every level - Proofs can now be used for cryptographic verification without the full tree
- Removed old simplified implementation
- Tested and verified with actual sibling hashes in output
| # Index subcommand | ||
| index_parser = subparsers.add_parser("index", help="Index repository files") | ||
| index_parser.add_argument("repo_path", help="Path to repository") | ||
| index_parser.add_argument("--config", help="Path to config file") | ||
| index_parser.add_argument("--apply", action="store_true", | ||
| help="Enable active mode (default: dry-run)") | ||
| index_parser.add_argument("--output", default="manifest.jsonl", | ||
| help="Output manifest file") | ||
| index_parser.add_argument("--exclude", nargs="*", | ||
| help="Patterns to exclude") |
There was a problem hiding this comment.
The CLI advertises --config support, but args.config is never read in _handle_index (or elsewhere), so config files currently have no effect. Either implement loading/merging config (excludes, checkpoint interval, output dir, etc.) or remove the flag from the interface/docs to avoid misleading behavior.
There was a problem hiding this comment.
Fixed in commit 215066e. Implemented config file loading in _handle_index():
- Loads JSON config from
--configpath - Supports
exclude_patternsandcheckpoint_interval - CLI arguments override config file values
- Passes
checkpoint_intervalto manifest generation - Graceful error handling for missing/invalid configs
Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
…rification Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
…al proofs, config loading Co-authored-by: aidoruao <174227749+aidoruao@users.noreply.github.com>
|
Closing as stale (Stage D housekeeping). This Copilot draft has been open without progress and is superseded by:
If the underlying work here is still wanted, re-open a fresh PR rebased on current |
✅ Deterministic Auditable Scaffold - Code Review Updates v2
All code review feedback from the second review round has been addressed.
Code Review Changes Implemented
1. ✅ Fix handling-clamp XML Writing (Comment #2815292933)
write_file()method toHandlingMetaParserto serialize clamped values back to XMLET.indent()(Python 3.9+)2. ✅ Fix Merkle Tree Deterministic Sorting (Comment #2815292947)
p.resolve()(absolute, OS-dependent) to relative canonical pathsbase_pathparameter tobuild_merkle_tree()3. ✅ Implement Real Merkle Proofs (Comment #2815292957)
leaf_to_siblingstracking during tree constructionsibling_hashandpositionfor each level_build_proof_path()method4. ✅ Implement Config File Loading (Comment #2815292974)
_handle_index()exclude_patternsandcheckpoint_intervalfrom configcheckpoint_intervalto manifest generationTest Status
24/24 tests passing
Verification
All changes have been tested:
--apply --output- successfully writes modified handling.metaFiles Modified
toolkit/oe/scaffold/handling_pipeline.py- Added XML writing, stored root treetoolkit/oe/scaffold/merkle.py- Deterministic sorting, real proof generationtoolkit/oe/scaffold/cli.py- XML writing integration, config loading, base_path passingUsage Examples
Config File Format
{ "exclude_patterns": [".git", "*.pyc", "__pycache__"], "checkpoint_interval": 50 }All code review feedback addressed, tests passing, functionality verified.
Original prompt
This pull request was created from Copilot chat.
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.