Memscribe architecture

Memscribe is a deterministic, zero-LLM pipeline that turns the transcript logs AI coding agents already write into typed nodes the downstream inference-and-governance layer (MemCortex) can consume. No model is ever called: capture is reading and parsing, never summarizing. The output is an exact function of the input, which is what makes the whole module golden-file, property, and fuzz testable.

It is the bottom of a three-layer stack — Memtrace uses MemCortex, and MemCortex uses Memscribe. The dependency direction is strictly one-way: each layer depends only on the one below it, and memscribe-core depends on nothing else in the workspace. Memscribe never calls upward.

The pipeline

A single, linear, deterministic pipeline. Each stage is a trait, so it can be tested in isolation and swapped. Everything between Source and Sink is a pure, synchronous function of the event stream.

 Source                Adapter           Gate        Segmenter      Binder        NodePrep        Sink
 (memscribe-io)        (memscribe-       (core)      (core)         (core)        (core)          (memscribe-sink)
                        adapters)
 tail JSONL        →   parse one     →   admit?  →   arc / turn  →  decision  →   assemble    →   NDJSON / SQLite
 hook stdin            RawRecord →       commitment  spans;         ↔ edit,       PreparedNode    / MemDB
 OTLP receiver         CaptureEvent[]    markers     elevate gated  PROV          stream
                       (version-                     turns; seed    (t_use
                        tolerant)                     decisions;    ≤ t_gen)
                                                      collect edits
   RawRecord               CaptureEvent      markers    Segmentation   BindingEdge   PreparedNode    (consumer)
   (bytes + provenance)    (normalized)                                              stream

- Source → Adapter produces the normalized CaptureEvent stream, which is the system of record. This is the only stage that touches tool-specific formats.
- Gate → Segmenter → Binder → NodePrep transform that stream into PreparedNodes. Pure and synchronous given the events.
- An optional redaction pass runs over the prepared nodes before the sink.
- Sink writes the nodes out. It is the single seam that decouples Memscribe from MemDB.
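The idea that each stage is a trait and the middle of the pipeline is a pure function can be sketched as follows. This is a minimal illustration only: Event, Node, the two traits, and prepare are hypothetical stand-ins, not the real memscribe-core types or signatures, and the segmenter and binder stages are left out for brevity.

```rust
// Hypothetical stand-ins: these are NOT the real memscribe-core types or traits.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    seq: u64,
    text: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Node {
    first_seq: u64,
    body: String,
}

/// A stage that decides whether an event is admitted.
trait Gate {
    fn admit(&self, event: &Event) -> bool;
}

/// A stage that turns admitted events into output nodes.
trait NodePrep {
    fn assemble(&self, admitted: &[&Event]) -> Vec<Node>;
}

/// Toy gate (an assumption): admit events whose text contains a marker phrase.
struct MarkerGate {
    marker: &'static str,
}

impl Gate for MarkerGate {
    fn admit(&self, event: &Event) -> bool {
        event.text.contains(self.marker)
    }
}

/// Toy node prep: one node per admitted event, in input order.
struct OneNodePerEvent;

impl NodePrep for OneNodePerEvent {
    fn assemble(&self, admitted: &[&Event]) -> Vec<Node> {
        admitted
            .iter()
            .map(|e| Node { first_seq: e.seq, body: e.text.clone() })
            .collect()
    }
}

/// Pure, synchronous composition: no I/O, no clock, no randomness.
fn prepare(gate: &dyn Gate, prep: &dyn NodePrep, events: &[Event]) -> Vec<Node> {
    let admitted: Vec<&Event> = events.iter().filter(|e| gate.admit(e)).collect();
    prep.assemble(&admitted)
}
```

Because prepare takes its stages as trait objects, a test can swap in a different gate without touching the rest, and calling it twice on the same events yields equal output.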

The orchestration lives in memscribe-core::pipeline::DefaultPipeline:

let nodes = DefaultPipeline::new()                 // redaction ON by default
    .run_records(adapter.as_ref(), &records);      // parse → prepare → redact
// or stream straight to a sink:
let n = DefaultPipeline::new()
    .run_to_sink(adapter.as_ref(), &records, &mut sink)?;

DefaultPipeline::prepare_events(&events) is the pure core: its output is an exact function of events. without_redaction() turns the redactor off (golden tests assert on verbatim content), and with_gate(..) / with_redactor(..) swap in config-driven stages.
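To make the builder behaviour concrete, here is a small sketch of a pipeline whose redaction pass is on by default and can be switched off. ToyPipeline, its string-based events, and the fixed SECRET masking rule are all assumptions made for illustration; the real DefaultPipeline and redactor may look quite different.

```rust
// Hypothetical stand-in for a pipeline builder; the real DefaultPipeline API may differ.
#[derive(Debug, Clone, PartialEq)]
struct ToyPipeline {
    redact: bool,
}

impl ToyPipeline {
    /// Redaction is on by default, mirroring the behaviour described above.
    fn new() -> Self {
        ToyPipeline { redact: true }
    }

    /// Turn the redaction pass off, e.g. for tests that assert on verbatim content.
    fn without_redaction(mut self) -> Self {
        self.redact = false;
        self
    }

    /// Pure core: the output depends only on `events` and the builder settings.
    fn prepare_events(&self, events: &[&str]) -> Vec<String> {
        events
            .iter()
            .map(|e| {
                if self.redact {
                    // Toy redaction rule (an assumption): mask a fixed marker string.
                    e.replace("SECRET", "[REDACTED]")
                } else {
                    e.to_string()
                }
            })
            .collect()
    }
}
```

A golden test would typically build the pipeline with without_redaction() so that expected files can be compared byte for byte against the source content.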

Crate responsibilities

Crate	Responsibility
`memscribe-core`	The frozen contract: the event model, the prepared-node output types, the `TranscriptAdapter` and `Sink` traits, and the deterministic pipeline (`gate` → `segmenter` → `binder` → `nodeprep`) plus the `redact` pass. Depends on nothing in the workspace.
`memscribe-adapters`	Per-tool parsers behind feature flags. Each implements `TranscriptAdapter`. The `registry` assembles the enabled set (`all_adapters`) and resolves one by `SourceKind` (`adapter_for`).
`memscribe-io`	Generic sources: a notify-based file tailer (offset resume), a hook server, and an OTLP receiver. Turns raw bytes into `RawRecord`s.
`memscribe-sink`	Concrete `Sink`s: `NdjsonSink` (canonical default), `SqliteSink` (feature `sqlite`), and `MemDbSink` (feature `memdb`, off by default).
`memscribe-cli`	The `memscribe` binary: `watch` / `hook` / `parse` / `replay` / `verify` / `redact`.
`memscribe-testkit`	The harness: `parse_events` / `prepare_nodes`, the invariant checks, golden-fixture loaders, and the cross-tool conformance scenario catalog.

The contract types

All of these live in memscribe-core and are re-exported from its crate root. Do not change their behavior or public shape — the test suite and every consumer depend on exact output.

Input: the normalized event model (`model.rs`)

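CaptureEvent is the system of record produced by adapters. Every field is either copied verbatim from the source or derived deterministically from it (seq from file order, event_id from a blake3(content) fallback when the tool provides no id); none is inferred or summarized.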

pub struct CaptureEvent {
    pub schema_version: u16,        // SCHEMA_VERSION; consumers gate on this
    pub source: SourceKind,         // which tool produced it
    pub session_id: String,         // tool-native session/thread id
    pub seq: u64,                   // monotonic per-session, from file order
    pub event_id: String,           // tool-native id, or blake3(content) fallback
    pub parent_id: Option<String>,  // DAG link where the tool provides one
    pub timestamp: OffsetDateTime,  // RFC3339, verbatim
    pub project: ProjectRef,        // cwd / repo_root / git, from session start
    pub kind: EventKind,            // the payload
    pub provenance: SourceLocation, // pointer back into the source bytes
}

EventKind is the payload enum. EventKind::Unknown is load-bearing: an unrecognized record type or a new field is preserved verbatim and flagged, never discarded — that is how the stream stays lossless across tool-version churn.

`EventKind` variant	Meaning
`SessionStart`	cwd, git ref, model, tool version
`UserTurn`	a user message (flattened text + structured `Part`s)
`AssistantTurn`	an assistant message (text, thinking, model, usage, parts)
`ToolCall`	a tool invocation (`call_id`, name, raw args)
`ToolResult`	a tool result (`call_id`, `ok`, raw output)
`FileEdit`	a normalized `Diff` (from Edit/Write/apply_patch/replace)
`Compaction`	model-side history compaction — flagged, never stored as truth
`Rewind`	a user rewind back to an earlier event
`SessionEnd`	the session ended
`Unknown`	an unrecognized record, preserved verbatim and flagged
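The role of Unknown can be illustrated with a toy parser. The sketch below assumes a made-up type<TAB>payload line format and a reduced ToyEventKind enum; neither is the real adapter input or the real EventKind. What it is meant to show is the routing rule: anything the parser does not recognize is kept verbatim instead of being dropped or causing a panic.

```rust
// Hypothetical toy types; the real EventKind carries richer payloads.
#[derive(Debug, Clone, PartialEq)]
enum ToyEventKind {
    UserTurn { text: String },
    AssistantTurn { text: String },
    /// Anything unrecognized is kept exactly as it was read.
    Unknown { raw: String },
}

/// Parse one line of an assumed `type<TAB>payload` toy format.
/// Never panics: malformed or unrecognized input becomes `Unknown`.
fn parse_line(line: &str) -> ToyEventKind {
    match line.split_once('\t') {
        Some(("user", payload)) => ToyEventKind::UserTurn { text: payload.to_string() },
        Some(("assistant", payload)) => {
            ToyEventKind::AssistantTurn { text: payload.to_string() }
        }
        // New record type, or no separator at all: preserve the raw line.
        _ => ToyEventKind::Unknown { raw: line.to_string() },
    }
}
```

Since the raw line survives inside Unknown, a later parser version that learns the new record type can reprocess it without going back to the source file.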

SourceKind enumerates the nine supported tools plus Unknown. SourceKind::parse maps CLI and --as slugs to a variant and tolerates aliases such as claude / claude-code.
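A rough sketch of alias-tolerant slug parsing is shown below. ToySourceKind covers only two tools, and the variant names, the slugs, and the choice to return Unknown for unrecognized input are assumptions; only the claude / claude-code aliases come from the text above.

```rust
// Hypothetical subset: variant names and slugs are assumptions,
// apart from the `claude` / `claude-code` aliases mentioned above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToySourceKind {
    ClaudeCode,
    Codex,
    Unknown,
}

impl ToySourceKind {
    /// Stable snake_case slug for this variant.
    fn as_str(self) -> &'static str {
        match self {
            ToySourceKind::ClaudeCode => "claude_code",
            ToySourceKind::Codex => "codex",
            ToySourceKind::Unknown => "unknown",
        }
    }

    /// Alias-tolerant parse; unrecognized slugs map to `Unknown` here
    /// (the real function may signal this differently).
    fn parse(slug: &str) -> Self {
        match slug.trim().to_ascii_lowercase().as_str() {
            "claude" | "claude-code" | "claude_code" => ToySourceKind::ClaudeCode,
            "codex" => ToySourceKind::Codex,
            _ => ToySourceKind::Unknown,
        }
    }
}
```

Keeping as_str and parse next to each other makes the round trip easy to test: parsing the slug of any variant should give that variant back.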

Output: the prepared-node stream (`node.rs`)

PreparedNode is the typed data a consumer ingests. It is a tagged enum:

`PreparedNode` variant	Payload	Meaning
`Conversation`	`ConversationSpan`	a gated, verbatim dialogue span with the markers that fired
`Decision`	`DecisionRecord`	a deterministically-parsed decision (IBIS / QOC / MADR / Kruchten shape)
`Episode`	`CodeEpisode`	a code edit episode: path, `Diff`, git ref, deterministic `episode_id`
`Binding`	`BindingEdge`	a decision/conversation → episode edge carrying a `ProvRecord`

Epistemic honesty: `FactStatus`

Every node and edge carries a FactStatus. Memscribe only ever emits the first two; the latter two are flags for a downstream inference layer — values Memscribe never computes by guessing. This is the property that keeps the module zero-LLM and its output golden-testable.

`FactStatus`	Who sets it
`Observed`	Memscribe — verbatim from the source
`DeterministicallyDerived`	Memscribe — a pure function of observed data
`StatisticallyRanked`	downstream — a statistical measure
`LlmHypothesis`	downstream — an LLM hypothesis; Memscribe only flags it
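The split in the table can be written down as a small predicate. The enum below mirrors the four variant names from the table, but the type name ToyFactStatus and the helper set_by_memscribe are hypothetical and exist only for this sketch.

```rust
// Variant names follow the table above; the type itself is a stand-in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyFactStatus {
    Observed,
    DeterministicallyDerived,
    StatisticallyRanked,
    LlmHypothesis,
}

/// True for the statuses that a zero-LLM capture layer can assign itself.
/// The remaining statuses are reserved for a downstream inference layer.
fn set_by_memscribe(status: ToyFactStatus) -> bool {
    matches!(
        status,
        ToyFactStatus::Observed | ToyFactStatus::DeterministicallyDerived
    )
}
```

A consumer could use a check like this to reject capture output that claims a status the capture layer is never supposed to produce.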

ProvRecord records used(session, decision) + wasGeneratedBy(diff, session) with the temporal invariant t_use ≤ t_gen (ProvRecord::is_temporally_valid).
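The temporal invariant is simple enough to sketch directly. ToyProvRecord and its integer timestamps are assumptions (the real ProvRecord presumably carries richer fields and proper time types); the comparison itself is the t_use ≤ t_gen rule stated above.

```rust
// Hypothetical stand-in: timestamps are plain integers (e.g. Unix seconds).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyProvRecord {
    /// When the session used the decision.
    t_use: u64,
    /// When the session generated the diff.
    t_gen: u64,
}

impl ToyProvRecord {
    /// A decision can only have informed an edit that was generated at or after its use.
    fn is_temporally_valid(&self) -> bool {
        self.t_use <= self.t_gen
    }
}
```

Under this rule, an edge whose edit predates the decision it claims to follow would be flagged as invalid.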

How to add a new adapter

Adapters are the volatile part — every tool's format churns — so adding one is a well-trodden, five-step path. The contract: a parser is version-tolerant (it pattern-matches on the fields it needs and routes anything unrecognized to EventKind::Unknown) and must never panic.

1. Add a SourceKind variant (memscribe-core/src/model.rs). Wire its stable snake_case slug into SourceKind::as_str and into SourceKind::parse (include any aliases). This is the one allowed touch of memscribe-core for a new tool; coordinate it, since the frozen contract is shared.
2. Add the adapter module (memscribe-adapters/src/<tool>.rs) behind a #[cfg(feature = "<tool>")] and a matching entry in the crate's [features] table. Implement TranscriptAdapter:
   - source_kind(): return your SourceKind.
   - discover(&DiscoverCfg): locate live and historical transcripts. Honor the per-tool override key in DiscoverCfg.overrides (e.g. CLAUDE_CONFIG_DIR, CODEX_HOME) and fall back to cfg.home_dir(). Return handles in a deterministic (sorted) order.
   - parse(&RawRecord, &mut ParseCtx): turn ONE record into zero or more CaptureEvents. Use ParseCtx::alloc_seq for the monotonic seq, ParseCtx::first_seen for dedup, and ParseCtx::project_or_default for the project binding. Never panic; route unknowns to EventKind::Unknown.
   - schema_fingerprint(&RawRecord): return a SchemaVariant so the corpus and runtime can version-gate the parser.
3. Register it (memscribe-adapters/src/registry.rs). Add the cfg-gated push in all_adapters() and the cfg-gated arm in adapter_for().
4. Add fixtures under fixtures/<tool>/<version>/<scenario>.jsonl for the canonical scenarios in memscribe-testkit::scenarios::SCENARIOS, and bless the expected outputs under fixtures-expected/<tool>/<version>/ (see CONTRIBUTING.md for the capture → golden → bless flow).
5. Add tests. Unit-test the parser; run the shared invariant checks from memscribe-testkit::invariants (check_monotonic_seq, check_lossless, check_unique_event_ids, check_determinism); and add a cargo-fuzz target so the never-panic contract is enforced. Verify in isolation: cargo test -p memscribe-adapters --test <your_file_stem>.
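To give a feel for what the shared invariant checks guard, here is a sketch of two of them over a reduced event type. ToyEvent and the toy_ functions are hypothetical; the real checks in memscribe-testkit::invariants may have different signatures, and treating "monotonic" as strictly increasing per session is an assumption.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical reduced event: only the fields these two checks need.
#[derive(Debug, Clone)]
struct ToyEvent {
    session_id: String,
    seq: u64,
    event_id: String,
}

impl ToyEvent {
    fn new(session_id: &str, seq: u64, event_id: &str) -> Self {
        ToyEvent {
            session_id: session_id.to_string(),
            seq,
            event_id: event_id.to_string(),
        }
    }
}

/// Within each session, seq must strictly increase in stream order (assumed semantics).
fn toy_check_monotonic_seq(events: &[ToyEvent]) -> Result<(), String> {
    let mut last: HashMap<&str, u64> = HashMap::new();
    for e in events {
        if let Some(prev) = last.insert(e.session_id.as_str(), e.seq) {
            if e.seq <= prev {
                return Err(format!(
                    "session {}: seq {} follows {}",
                    e.session_id, e.seq, prev
                ));
            }
        }
    }
    Ok(())
}

/// No two events in the stream may share an event_id.
fn toy_check_unique_event_ids(events: &[ToyEvent]) -> Result<(), String> {
    let mut seen: HashSet<&str> = HashSet::new();
    for e in events {
        if !seen.insert(e.event_id.as_str()) {
            return Err(format!("duplicate event_id {}", e.event_id));
        }
    }
    Ok(())
}
```

Returning a Result with a message, rather than panicking, lets a test harness report every violated invariant for a fixture in one run.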

The conformance suite then asserts your tool normalizes the canonical scenarios to the same shape as every other tool — that cross-tool equivalence is the point of the thin-waist event model.
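The discover requirements from the adapter steps (honor an override, fall back to the home directory, return handles in sorted order) could be sketched along these lines. Everything named here is hypothetical: the TOYTOOL_HOME key, the .toytool default directory, and both helper functions are invented for the example and do not describe the real DiscoverCfg.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Pick the transcript root: the per-tool override if present, otherwise a
/// directory under the home dir. `.toytool` is a made-up default.
fn resolve_root(overrides: &BTreeMap<String, String>, key: &str, home: &Path) -> PathBuf {
    match overrides.get(key) {
        Some(dir) => PathBuf::from(dir),
        None => home.join(".toytool"),
    }
}

/// Return handles in a deterministic order, independent of the order in which
/// the file system happened to list them.
fn sorted_handles(mut found: Vec<PathBuf>) -> Vec<PathBuf> {
    found.sort();
    found.dedup();
    found
}
```

Sorting before returning matters because directory listing order is not guaranteed, and an unstable order would leak into seq allocation and break golden comparisons.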
