86 changes: 86 additions & 0 deletions SIMPLICIO_INTEGRATION.md
@@ -264,6 +264,92 @@ and agent instruction files remain the human-readable source for project
operation. If `.simplicio/` is absent, consumers should fall back to the current
markdown or file-inspection behavior.

## Mechanical Edit Contract (issue #110)

Schema: `simplicio.mechanical-edit/v1` (envelope) and
`simplicio.mechanical-edit-result/v1` (executor result). The canonical
contract lives in
[`simplicio-runtime#69`](https://github.com/wesleysimplicio/simplicio-runtime/issues/69);
this repository implements the **producer half** so an LLM planner can
plan compact JSON edits without rewriting whole files.

### What the mapper produces

Use `simplicio_mapper.mechanical_edit.build_context(root, selections)` to
build a context envelope:

```python
from simplicio_mapper.mechanical_edit import build_context

envelope = build_context(
    root=".",
    selections=[
        ("simplicio_mapper/mapper.py", 99, 102),
        ("simplicio_mapper/mapper.py", 180, 199),
        ("simplicio_mapper/cli.py", 1, 20),
    ],
)
```

The returned dict matches:

```json
{
  "schema": "simplicio.mechanical-edit/v1",
  "context": {
    "mapper_schema": "simplicio.mapper-index/v1",
    "context_hash": "<sha256 of all per-file snapshot+range hashes>",
    "files": [
      {
        "path": "simplicio_mapper/cli.py",
        "language": "python",
        "snapshot_hash": "<sha256 of the whole file>",
        "selected_ranges": [
          {
            "start_line": 1,
            "end_line": 20,
            "before_hash": "<sha256 of the 1..20 slice>",
            "must_contain": [
              "\"\"\"Command-line entry point for simplicio-mapper.",
              "from .mapper import write_mapping_artifacts"
            ]
          }
        ]
      }
    ]
  }
}
```
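
A consumer can re-derive `context_hash` from the envelope itself. The following is a minimal sketch, assuming the digest is folded in the order `files[]` and `selected_ranges[]` appear (which is how `build_context` in this PR computes it). `recompute_context_hash` and `context_is_intact` are hypothetical helper names, not part of the module, and the hash values are stand-ins.

```python
import hashlib


def recompute_context_hash(envelope: dict) -> str:
    """Fold each file's snapshot_hash, then its before_hash values, into one sha256."""
    digest = hashlib.sha256()
    for entry in envelope["context"]["files"]:
        digest.update(entry["snapshot_hash"].encode("utf-8"))
        for selected in entry["selected_ranges"]:
            digest.update(selected["before_hash"].encode("utf-8"))
    return digest.hexdigest()


def context_is_intact(envelope: dict) -> bool:
    """True when the stored context_hash matches the per-file hashes it covers."""
    return recompute_context_hash(envelope) == envelope["context"]["context_hash"]


# Stand-in hashes; a real envelope carries sha256 hex digests of file content.
envelope = {
    "schema": "simplicio.mechanical-edit/v1",
    "context": {
        "mapper_schema": "simplicio.mapper-index/v1",
        "context_hash": "",
        "files": [
            {
                "path": "simplicio_mapper/cli.py",
                "snapshot_hash": "aa" * 32,
                "selected_ranges": [
                    {"start_line": 1, "end_line": 20, "before_hash": "bb" * 32},
                ],
            }
        ],
    },
}
envelope["context"]["context_hash"] = recompute_context_hash(envelope)
print(context_is_intact(envelope))
```

This only checks that the envelope is internally consistent; it does not re-read the files, so drift on disk still has to be caught by comparing `snapshot_hash` or `before_hash` against current content.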

### Guarantees

- **Stable.** Identical `(path, start, end)` selections on an unchanged tree
  always produce the same `snapshot_hash`, `before_hash`, and overall
  `context_hash`. The test suite enforces this across repeated runs.
- **Drift-detecting.** A consumer that captured a `before_hash` will see a
  different value if the selected lines change or shift underneath, and
  `snapshot_hash` changes on any edit to the file; the executor rejects the
  edit when the anchor no longer matches.
- **Refuses unsafe inputs.** Missing files raise `FileNotFoundError`;
  binary files (a NUL byte in the first 8 KiB, or a first 8 KiB that does
  not decode as UTF-8) and invalid or out-of-bounds line ranges raise
  `ValueError`. The mapper never silently emits an ambiguous anchor: the
  caller must either widen the snapshot or hand off a different file.
- **Compact above threshold.** Files longer than
  `COMPACT_LINE_THRESHOLD` (2000 lines by default) emit an empty
  `must_contain` list so the envelope stays small; the `before_hash` is
  still emitted, which is enough for the executor to anchor.
- **Language-aware.** `language` follows the same detection used by
  `project-map.json` (`typescript`, `python`, `json`, `markdown`, etc.).
- **Canonical schema only.** This module must not introduce a repo-local
  variation. The contract is owned by
  [`simplicio-runtime#69`](https://github.com/wesleysimplicio/simplicio-runtime/issues/69).
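
The drift check described above can be sketched on the consumer side. This is a minimal illustration, assuming the range hash is the sha256 of the selected lines joined with `"\n"`, as `range_hash` in this PR does. `slice_hash` and `anchor_still_valid` are hypothetical names, and the real executor lives in `simplicio-runtime`, so its actual checks may differ.

```python
import hashlib


def slice_hash(text: str, start_line: int, end_line: int) -> str:
    """sha256 of lines start_line..end_line (1-indexed, inclusive), newline-joined."""
    if start_line < 1 or end_line < start_line:
        raise ValueError(f"invalid range {start_line}-{end_line}")
    lines = text.splitlines()
    if end_line > len(lines):
        raise ValueError(f"range {start_line}-{end_line} out of bounds")
    chunk = "\n".join(lines[start_line - 1 : end_line])
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def anchor_still_valid(current_text: str, selected_range: dict) -> bool:
    """Compare the recorded before_hash with the hash of the same lines today."""
    try:
        current = slice_hash(
            current_text, selected_range["start_line"], selected_range["end_line"]
        )
    except ValueError:
        # The file shrank below the recorded range, so the anchor is gone.
        return False
    return current == selected_range["before_hash"]


original = 'def greet(audience):\n    return f"hello, {audience}"\n'
anchor = {"start_line": 1, "end_line": 2, "before_hash": slice_hash(original, 1, 2)}
edited = original.replace("hello", "hi")

print(anchor_still_valid(original, anchor))
print(anchor_still_valid(edited, anchor))
```

Because the slice is rebuilt from `splitlines()`, line-ending style and a trailing newline do not affect the range hash; only `snapshot_hash` would notice those differences.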

### Fixtures

`tests/fixtures/mech-edit-host/` ships small deterministic files in four
text languages (`sample.ts`, `sample.py`, `sample.json`, `sample.md`) plus
a binary file (`binary.bin`) used by the refusal test. Producers in
downstream repositories can copy these as ready-made parity inputs.
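
For downstream producers that want parity with the refusal test, the binary heuristic can be sketched on in-memory bytes. This is an illustration only, assuming the same rule as `is_binary` in this PR (a NUL byte, or bytes that fail to decode as UTF-8, within the probed sample). `looks_binary` is a hypothetical name, and the sample inputs are stand-ins rather than the contents of `binary.bin`.

```python
PROBE_BYTES = 8192  # mirrors the 8 KiB probe window described above


def looks_binary(data: bytes) -> bool:
    """Apply the NUL-byte and UTF-8 decode checks to the first PROBE_BYTES bytes."""
    sample = data[:PROBE_BYTES]
    if not sample:
        # An empty file is treated as text.
        return False
    if b"\x00" in sample:
        return True
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00"))
print(looks_binary("# Mech-edit fixture\n".encode("utf-8")))
```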

## Native Runtime Contract (issue #95)

The unified native Simplicio runtime — coordinating
169 changes: 169 additions & 0 deletions simplicio_mapper/mechanical_edit.py
@@ -0,0 +1,169 @@
"""simplicio.mechanical-edit/v1 — deterministic edit-anchor context (issue #110).

This module provides the mapper-side helpers an LLM planner needs to plan
compact JSON edits without rewriting whole files. It produces:

- snapshot hashes per selected file (full-file content sha256),
- range hashes per selected line range (sha256 of the joined chunk),
- explicit `must_contain` anchors with the first and last line of each
  range, truncated to a short prefix for stability,
- an overall `context_hash` digest over all snapshot + range hashes so the
  consumer can verify the whole context in one comparison.

The schema is the canonical cross-Simplicio contract pinned in
simplicio-runtime issue #69; this module must not introduce a repo-local
variation. Builders raise explicitly when a range cannot be made stable
(out-of-bounds, file missing, file is binary).
"""

from __future__ import annotations

import hashlib
import os
from collections.abc import Iterable

from .mapper import _language_for, _read_safe

MECHANICAL_EDIT_SCHEMA = "simplicio.mechanical-edit/v1"
MECHANICAL_EDIT_RESULT_SCHEMA = "simplicio.mechanical-edit-result/v1"
MAPPER_INDEX_SCHEMA = "simplicio.mapper-index/v1"

_NUL_PROBE_BYTES = 8192
# Files longer than this trigger compact mode: ranges still carry their hash
# anchors, but `must_contain` is omitted so the JSON envelope stays small.
COMPACT_LINE_THRESHOLD = 2000
_MUST_CONTAIN_PREFIX_CHARS = 120


def _sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _read_bytes_sample(path: str) -> bytes:
    try:
        with open(path, "rb") as handle:
            return handle.read(_NUL_PROBE_BYTES)
    except OSError:
        return b""


def is_binary(path: str) -> bool:
    """Return True when the file at `path` looks binary.

    A file with a NUL byte in its first 8 KiB, or that does not decode as
    UTF-8, is treated as binary. The mechanical edit contract refuses binary
    files because line-anchored edits do not apply to them.
    """
    sample = _read_bytes_sample(path)
    if not sample:
        return False
    if b"\x00" in sample:
        return True
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def snapshot_hash(text: str) -> str:
    """sha256 of the full text content (UTF-8)."""
    return _sha256_text(text)


def range_hash(text: str, start_line: int, end_line: int) -> str:
    """sha256 of the `start_line..end_line` slice (1-indexed, inclusive)."""
    if start_line < 1 or end_line < start_line:
        raise ValueError(f"invalid range {start_line}-{end_line}")
    lines = text.splitlines()
    if end_line > len(lines):
        raise ValueError(
            f"range {start_line}-{end_line} out of bounds ({len(lines)} lines)"
        )
    chunk = "\n".join(lines[start_line - 1 : end_line])
    return _sha256_text(chunk)


def _must_contain(text: str, start_line: int, end_line: int, compact: bool) -> list[str]:
    if compact:
        return []
    lines = text.splitlines()
    out: list[str] = []
    if 1 <= start_line <= len(lines):
        out.append(lines[start_line - 1].strip()[:_MUST_CONTAIN_PREFIX_CHARS])
    if start_line != end_line and 1 <= end_line <= len(lines):
        out.append(lines[end_line - 1].strip()[:_MUST_CONTAIN_PREFIX_CHARS])
    return [piece for piece in out if piece]


def _absolute(root: str, path: str) -> str:
    return path if os.path.isabs(path) else os.path.join(root, path)


def extract_file_entry(root: str, path: str, ranges: list[tuple[int, int]]) -> dict:
    """Build a single `files[]` entry for `path` and the given line ranges."""
    abs_path = _absolute(root, path)
    if not os.path.exists(abs_path):
        raise FileNotFoundError(f"file not found: {path}")
    if is_binary(abs_path):
        raise ValueError(f"binary file refused: {path}")
    text = _read_safe(abs_path)
    compact = len(text.splitlines()) > COMPACT_LINE_THRESHOLD
    selected = []
    for start, end in ranges:
        selected.append({
            "start_line": start,
            "end_line": end,
            "before_hash": range_hash(text, start, end),
            "must_contain": _must_contain(text, start, end, compact),
        })
    return {
        "path": path.replace(os.sep, "/"),
        "language": _language_for(abs_path),
        "snapshot_hash": snapshot_hash(text),
        "selected_ranges": selected,
    }


def build_context(
    root: str,
    selections: Iterable[tuple[str, int, int]],
) -> dict:
    """Build a `simplicio.mechanical-edit/v1` context envelope.

    `selections` is an iterable of `(path, start_line, end_line)` triples.
    Ranges on the same file are grouped and emitted in input order under
    that file's `selected_ranges`.
    """
    grouped: dict[str, list[tuple[int, int]]] = {}
    for path, start, end in selections:
        grouped.setdefault(path, []).append((start, end))
    files = []
    digest = hashlib.sha256()
    for path in sorted(grouped):
        entry = extract_file_entry(root, path, grouped[path])
        files.append(entry)
        digest.update(entry["snapshot_hash"].encode("utf-8"))
        for selected_range in entry["selected_ranges"]:
            digest.update(selected_range["before_hash"].encode("utf-8"))
    return {
        "schema": MECHANICAL_EDIT_SCHEMA,
        "context": {
            "mapper_schema": MAPPER_INDEX_SCHEMA,
            "context_hash": digest.hexdigest(),
            "files": files,
        },
    }


__all__ = [
    "COMPACT_LINE_THRESHOLD",
    "MAPPER_INDEX_SCHEMA",
    "MECHANICAL_EDIT_RESULT_SCHEMA",
    "MECHANICAL_EDIT_SCHEMA",
    "build_context",
    "extract_file_entry",
    "is_binary",
    "range_hash",
    "snapshot_hash",
]
Binary file added tests/fixtures/mech-edit-host/binary.bin
Binary file not shown.
9 changes: 9 additions & 0 deletions tests/fixtures/mech-edit-host/sample.json
@@ -0,0 +1,9 @@
{
  "name": "mech-edit-host",
  "version": "0.0.1",
  "audiences": [
    "world",
    "team",
    "everyone"
  ]
}
10 changes: 10 additions & 0 deletions tests/fixtures/mech-edit-host/sample.md
@@ -0,0 +1,10 @@
# Mech-edit fixture

This Markdown file lives in `tests/fixtures/mech-edit-host/`. It is used by
the `simplicio.mechanical-edit/v1` test suite to verify that Markdown
ranges hash deterministically and that anchor drift is detected when the
file changes underneath.

## Section

A small paragraph used as a stable anchor target.
9 changes: 9 additions & 0 deletions tests/fixtures/mech-edit-host/sample.py
@@ -0,0 +1,9 @@
"""Tiny Python fixture for mechanical-edit anchor tests."""


def greet(audience: str) -> str:
    return f"hello, {audience}"


def farewell(audience: str) -> str:
    return f"bye, {audience}"
12 changes: 12 additions & 0 deletions tests/fixtures/mech-edit-host/sample.ts
@@ -0,0 +1,12 @@
export interface Greeting {
  audience: string;
  message: string;
}

export function greet(audience: string): Greeting {
  return { audience, message: `hello, ${audience}` };
}

export function farewell(audience: string): Greeting {
  return { audience, message: `bye, ${audience}` };
}