Merged
96 changes: 96 additions & 0 deletions SIMPLICIO_INTEGRATION.md
@@ -350,6 +350,102 @@ text languages (`sample.ts`, `sample.py`, `sample.json`, `sample.md`) plus
a binary file (`binary.bin`) used by the refusal test. Producers in
downstream repositories can copy these as ready-made parity inputs.

## Context Packs and Hash-Based Cache (issue #115)

Schemas: `simplicio.context-pack/v1` and `simplicio.context-cache/v1`.
The canonical contracts live in
[`simplicio-runtime#70`](https://github.com/wesleysimplicio/simplicio-runtime/issues/70);
this repository implements the producer half so the mapper, not the LLM,
decides what compact context goes into a prompt.

### Context pack

```python
from simplicio_mapper.context_pack import build_context_pack

pack = build_context_pack(
root=".",
targets=[
{"path": "simplicio_mapper/cli.py", "ranges": [(900, 950)]},
{"path": "simplicio_mapper/mapper.py", "ranges": [(160, 200)]},
],
)
```

The returned envelope:

```json
{
  "schema": "simplicio.context-pack/v1",
  "repo": {
    "mapper_schema": "simplicio.mapper-index/v1",
    "root_hash": "<sha256 of the absolute root path>"
  },
  "pack_hash": "<sha256 over all snapshot+range hashes>",
  "files": [
    {
      "path": "...",
      "language": "python",
      "snapshot_hash": "<sha256 of the whole file>",
      "line_count": 1234,
      "compact": false,
      "ranges": [
        {
          "start_line": 900,
          "end_line": 950,
          "range_hash": "<sha256 of the slice>",
          "snippet": ["first line", "last line"]
        }
      ],
      "symbols": [{"name": "...", "kind": "...", "line": 0, "hash": "..."}],
      "callers": ["a/file.py"],
      "imports": ["b/file.py"],
      "tests": ["tests/test_file.py"]
    }
  ],
  "dependencies": { "package_manager": "...", "manifest": "..." },
  "recent_changes": [ "..." ],
  "needs_broader_context": false,
  "needs_broader_context_reason": ""
}
```
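
Because the envelope carries a hash at the pack, file, and range level, a consumer can enumerate candidate cache keys by walking it. The following is a minimal sketch that assumes only the field names shown above; `collect_hashes` is a hypothetical helper and is not part of `simplicio_mapper`.

```python
from __future__ import annotations

from typing import Any


def collect_hashes(pack: dict[str, Any]) -> dict[str, list[str]]:
    """Group the hashes found in a context-pack envelope by granularity.

    Hypothetical helper: it relies only on the field names shown in the
    envelope and tolerates missing keys.
    """
    hashes: dict[str, list[str]] = {"pack": [], "snapshot": [], "range": []}
    if pack.get("pack_hash"):
        hashes["pack"].append(pack["pack_hash"])
    for entry in pack.get("files", []):
        if entry.get("snapshot_hash"):
            hashes["snapshot"].append(entry["snapshot_hash"])
        for rng in entry.get("ranges", []):
            if rng.get("range_hash"):
                hashes["range"].append(rng["range_hash"])
    return hashes
```

The finest-grained key that still covers what was summarized is usually the best choice: a `range_hash` survives edits elsewhere in the file, while a `pack_hash` changes whenever any file or range in the pack does.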

`build_context_pack` accepts pre-loaded `project_map`, `symbol_index`, and
`call_graph` dicts; otherwise it reads them from `.simplicio/`. When any
of the three is absent, a target is missing or unreadable, or a range
is out of bounds, the function still returns a pack but sets
`needs_broader_context=True` and records the concrete reasons in
`needs_broader_context_reason`. **The mapper
does not pretend compact context is enough when anchors, hashes, or
symbol coverage are missing.**
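
A consumer is expected to honor that flag rather than prompt with a partial pack. The sketch below shows one way to gate on it, using only the envelope fields shown earlier; `compact_context_or_none` is a hypothetical name, and treating a missing flag as "insufficient" is an assumption made here for safety, not documented mapper behavior.

```python
from __future__ import annotations

from typing import Any


def compact_context_or_none(pack: dict[str, Any]) -> list[str] | None:
    """Return the pack's snippet lines, or None if the pack is flagged.

    Assumption: an absent `needs_broader_context` key is treated the same
    as True, so a malformed pack never passes as sufficient context.
    """
    if pack.get("needs_broader_context", True):
        return None
    lines: list[str] = []
    for entry in pack.get("files", []):
        for rng in entry.get("ranges", []):
            lines.extend(rng.get("snippet", []))
    return lines
```

When the function returns `None`, the caller can fall back to whatever broader strategy it has (for example, reading whole files) and log `needs_broader_context_reason` for diagnosis.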

### Context cache

```python
from simplicio_mapper.context_cache import ContextCache

cache = ContextCache(".simplicio/context-cache.json")
hit = cache.get(file_or_pack_hash)
if hit is None:
    summary = summarize_via_llm(...)  # caller-supplied
    cache.set(file_or_pack_hash, summary)
```

The cache is stored on disk as a single JSON document with shape
`{"schema": "simplicio.context-cache/v1", "entries": {...}}`. Entries are
keyed by any opaque hash string the caller chooses (typically
`snapshot_hash`, `range_hash`, or the overall `pack_hash`), so a change in
the underlying file produces a new key and the stale summary is never
looked up again. Every write is persisted immediately; another process
sees the new value the next time it constructs a `ContextCache`, because
entries are read from disk only at construction.
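
The invalidation property follows from the key alone, which the sketch below illustrates. It uses a plain `dict` as a stand-in for `ContextCache`, and it assumes `snapshot_hash` is a SHA-256 hex digest of the file's raw bytes; the envelope only says "sha256 of the whole file", so the exact encoding is a guess. `snapshot_key` is a hypothetical helper.

```python
import hashlib


def snapshot_key(content: bytes) -> str:
    """Hypothetical key function: SHA-256 hex digest of the file bytes."""
    return hashlib.sha256(content).hexdigest()


# Stand-in for ContextCache: same get-by-key semantics, no persistence.
summaries: dict[str, str] = {}

original = b"def f():\n    return 1\n"
edited = b"def f():\n    return 2\n"

# Cache a summary under the hash of the original content.
summaries[snapshot_key(original)] = "f returns 1"

# Unchanged content maps to the same key, so the summary is reused.
hit_unchanged = summaries.get(snapshot_key(original))

# Edited content maps to a different key, so the lookup misses.
hit_changed = summaries.get(snapshot_key(edited))
```

Stale entries are not evicted by this scheme; they stay in the file until `clear()` is called, which is worth keeping in mind for long-lived caches.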

### Fixtures

`tests/fixtures/ctx-pack-host/` ships four small multi-language fixtures
(`sample.ts`, `sample.py`, `sample.json`, `sample.md`) used by the test
suite to verify language detection and snippet emission. Large-file
behavior is exercised in a temp-dir generated test (`huge.py` with more
than `COMPACT_LINE_THRESHOLD` lines).

## Native Runtime Contract (issue #95)

The unified native Simplicio runtime — coordinating
80 changes: 80 additions & 0 deletions simplicio_mapper/context_cache.py
@@ -0,0 +1,80 @@
"""simplicio.context-cache/v1 — file-backed summary cache keyed by hash.

The mapper context pack is the source of compact context; the context
cache lets downstream LLM planners reuse summaries for unchanged files
without re-summarizing them. Entries are keyed by a content hash
(`snapshot_hash` of a file, the `range_hash` of a slice, or the
`pack_hash` of a whole context pack) so a change underneath naturally
invalidates the cached summary.

The cache is intentionally small and JSON-backed: it is persisted under
`.simplicio/context-cache.json` by default and is safe to ship across
machines.
"""

from __future__ import annotations

import json
import os
from typing import Any

CONTEXT_CACHE_SCHEMA = "simplicio.context-cache/v1"


class ContextCache:
    """Hash-keyed cache for LLM summaries.

    `key` is any opaque string the caller chose (typically a content hash
    derived by the mapper). `summary` is any JSON-serialisable payload.
    Reads return `None` on miss; writes are persisted immediately so
    multiple processes pick up the latest value on their next load.
    """

    def __init__(self, cache_path: str | os.PathLike) -> None:
        self.path = str(cache_path)
        self._entries: dict[str, Any] = self._load()

    def _load(self) -> dict[str, Any]:
        try:
            with open(self.path, encoding="utf-8") as handle:
                payload = json.load(handle)
        except (OSError, ValueError):
            return {}
        if not isinstance(payload, dict):
            return {}
        if payload.get("schema") != CONTEXT_CACHE_SCHEMA:
            return {}
        entries = payload.get("entries", {})
        return dict(entries) if isinstance(entries, dict) else {}

    def get(self, key: str) -> Any | None:
        return self._entries.get(key)

    def set(self, key: str, summary: Any) -> None:
        self._entries[key] = summary
        self._persist()

    def clear(self) -> None:
        self._entries = {}
        self._persist()

    def __contains__(self, key: str) -> bool:
        return key in self._entries

    def __len__(self) -> int:
        return len(self._entries)

    def _persist(self) -> None:
        directory = os.path.dirname(self.path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        payload = {
            "schema": CONTEXT_CACHE_SCHEMA,
            "entries": self._entries,
        }
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, sort_keys=True, indent=2)
            handle.write("\n")


__all__ = ["CONTEXT_CACHE_SCHEMA", "ContextCache"]