Memscribe is a deterministic, zero-LLM pipeline that turns the transcript logs AI coding agents already write into typed nodes the downstream inference-and-governance layer (MemCortex) can consume. No model is ever called: capture is reading and parsing, never summarizing. The output is an exact function of the input, which is what makes the whole module golden-file, property, and fuzz testable.
It is the bottom of a three-layer stack — Memtrace uses MemCortex, and
MemCortex uses Memscribe. The dependency direction is strictly one-way:
each layer depends only on the one below it, and memscribe-core depends on
nothing else in the workspace. Memscribe never calls upward.
The architecture is a single, linear, deterministic pipeline. Each stage is a trait, so it can be tested in isolation and swapped. Everything between Source and Sink is a pure, synchronous function of the event stream.
Source → Adapter → Gate → Segmenter → Binder → NodePrep → Sink

| Stage | Crate | What it does | Output |
|---|---|---|---|
| Source | `memscribe-io` | tail JSONL, hook stdin, OTLP receiver | `RawRecord` (bytes + provenance) |
| Adapter | `memscribe-adapters` | parse one `RawRecord` → `CaptureEvent[]` (version-tolerant) | `CaptureEvent` (normalized) |
| Gate | core | admit? commitment markers | markers |
| Segmenter | core | arc / turn spans; elevate gated turns; seed decisions; collect edits | `Segmentation` |
| Binder | core | decision ↔ edit, PROV (t_use ≤ t_gen) | `BindingEdge` |
| NodePrep | core | assemble `PreparedNode` stream | `PreparedNode` stream |
| Sink | `memscribe-sink` | NDJSON / SQLite / MemDB | (consumer) |
- `Source → Adapter` produces the normalized `CaptureEvent` stream, the system of record. This is the only stage that touches tool-specific formats.
- `Gate → Segmenter → Binder → NodePrep` transform that stream into `PreparedNode`s. Pure and synchronous given the events.
- An optional redaction pass runs over the prepared nodes before the sink.
- `Sink` writes the nodes out. It is the single seam that decouples Memscribe from MemDB.
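To make the stage-as-trait idea concrete, here is a minimal sketch with a toy event type, one gate, and one in-memory sink. Every name in it (`Event`, `Gate`, `Sink`, `KeywordGate`, `VecSink`, `run`) is a hypothetical stand-in chosen for illustration; the real traits in `memscribe-core` may have different shapes.

```rust
// Hypothetical sketch: none of these names are the real memscribe-core API.

/// A toy stand-in for a normalized event.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub seq: u64,
    pub text: String,
}

/// A stage that decides whether an event is admitted.
pub trait Gate {
    fn admit(&self, event: &Event) -> bool;
}

/// The output seam: anything that can accept admitted events.
pub trait Sink {
    fn write(&mut self, event: &Event) -> Result<(), String>;
}

/// Admits events whose text contains a fixed marker word.
pub struct KeywordGate {
    pub marker: &'static str,
}

impl Gate for KeywordGate {
    fn admit(&self, event: &Event) -> bool {
        event.text.contains(self.marker)
    }
}

/// Collects written events in memory, which is convenient in tests.
#[derive(Default)]
pub struct VecSink {
    pub written: Vec<Event>,
}

impl Sink for VecSink {
    fn write(&mut self, event: &Event) -> Result<(), String> {
        self.written.push(event.clone());
        Ok(())
    }
}

/// Applies the gate to the events in order and streams admitted ones
/// to the sink. Returns how many events were written.
pub fn run(gate: &dyn Gate, events: &[Event], sink: &mut dyn Sink) -> Result<usize, String> {
    let mut count = 0;
    for event in events.iter().filter(|e| gate.admit(e)) {
        sink.write(event)?;
        count += 1;
    }
    Ok(count)
}
```

Because `run` only sees trait objects, a test can pass a different gate or an in-memory sink without touching the rest of the code, which is the property the pipeline description relies on.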
The orchestration lives in memscribe-core::pipeline::DefaultPipeline:
```rust
let nodes = DefaultPipeline::new()              // redaction ON by default
    .run_records(adapter.as_ref(), &records);   // parse → prepare → redact

// or stream straight to a sink:
let n = DefaultPipeline::new()
    .run_to_sink(adapter.as_ref(), &records, &mut sink)?;
```

`DefaultPipeline::prepare_events(&events)` is the pure core: its output is an
exact function of `events`. `without_redaction()` turns the redactor off (golden
tests assert on verbatim content), and `with_gate(..)` / `with_redactor(..)`
swap in config-driven stages.
| Crate | Responsibility |
|---|---|
| `memscribe-core` | The frozen contract: the event model, the prepared-node output types, the `TranscriptAdapter` and `Sink` traits, and the deterministic pipeline (gate → segmenter → binder → nodeprep) plus the redact pass. Depends on nothing in the workspace. |
| `memscribe-adapters` | Per-tool parsers behind feature flags. Each implements `TranscriptAdapter`. The registry assembles the enabled set (`all_adapters`) and resolves one by `SourceKind` (`adapter_for`). |
| `memscribe-io` | Generic sources: a notify-based file tailer (offset resume), a hook server, and an OTLP receiver. Turns raw bytes into `RawRecord`s. |
| `memscribe-sink` | Concrete `Sink`s: `NdjsonSink` (canonical default), `SqliteSink` (feature `sqlite`), and `MemDbSink` (feature `memdb`, off by default). |
| `memscribe-cli` | The `memscribe` binary: `watch` / `hook` / `parse` / `replay` / `verify` / `redact`. |
| `memscribe-testkit` | The harness: `parse_events` / `prepare_nodes`, the invariant checks, golden-fixture loaders, and the cross-tool conformance scenario catalog. |
The types described below all live in `memscribe-core` and are re-exported from
its crate root. Do not change their behavior or public shape: the test suite and
every consumer depend on exact output.
CaptureEvent is the system of record produced by adapters. Every field is
copied verbatim from the source; none is generated by Memscribe.
```rust
pub struct CaptureEvent {
    pub schema_version: u16,         // SCHEMA_VERSION; consumers gate on this
    pub source: SourceKind,          // which tool produced it
    pub session_id: String,          // tool-native session/thread id
    pub seq: u64,                    // monotonic per-session, from file order
    pub event_id: String,            // tool-native id, or blake3(content) fallback
    pub parent_id: Option<String>,   // DAG link where the tool provides one
    pub timestamp: OffsetDateTime,   // RFC3339, verbatim
    pub project: ProjectRef,         // cwd / repo_root / git, from session start
    pub kind: EventKind,             // the payload
    pub provenance: SourceLocation,  // pointer back into the source bytes
}
```

`EventKind` is the payload enum. `EventKind::Unknown` is load-bearing: an
unrecognized record type or a new field is preserved verbatim and flagged,
never discarded. That is how the stream stays lossless across tool-version
churn.
| `EventKind` variant | Meaning |
|---|---|
| `SessionStart` | cwd, git ref, model, tool version |
| `UserTurn` | a user message (flattened text + structured `Part`s) |
| `AssistantTurn` | an assistant message (text, thinking, model, usage, parts) |
| `ToolCall` | a tool invocation (`call_id`, name, raw args) |
| `ToolResult` | a tool result (`call_id`, ok, raw output) |
| `FileEdit` | a normalized `Diff` (from Edit/Write/apply_patch/replace) |
| `Compaction` | model-side history compaction; flagged, never stored as truth |
| `Rewind` | a user rewind back to an earlier event |
| `SessionEnd` | the session ended |
| `Unknown` | an unrecognized record, preserved verbatim and flagged |
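The "preserve and flag" rule for unrecognized records can be sketched as follows. This is a cut-down illustration only: the three-variant enum, the `classify` and `is_flagged` helpers, and the `"user"` / `"session_end"` type tags are assumptions made for the example, not the actual adapter code or any tool's real record format.

```rust
// Hypothetical sketch: a simplified EventKind and classifier.

/// A cut-down payload enum with only three variants.
#[derive(Debug, Clone, PartialEq)]
pub enum EventKind {
    UserTurn { text: String },
    SessionEnd,
    /// Unrecognized record: the type tag and raw content are kept verbatim.
    Unknown { record_type: String, raw: String },
}

/// Classifies one record by its type tag (the tags are assumed).
/// Anything the parser does not recognize is preserved as `Unknown`
/// instead of being dropped, so no input is lost.
pub fn classify(record_type: &str, raw: &str) -> EventKind {
    match record_type {
        "user" => EventKind::UserTurn { text: raw.to_string() },
        "session_end" => EventKind::SessionEnd,
        other => EventKind::Unknown {
            record_type: other.to_string(),
            raw: raw.to_string(),
        },
    }
}

/// True when the event is an `Unknown` that a consumer should treat as flagged.
pub fn is_flagged(kind: &EventKind) -> bool {
    matches!(kind, EventKind::Unknown { .. })
}
```

The catch-all arm is what keeps the parser total: a new record type introduced by a tool upgrade lands in `Unknown` with its content intact rather than causing an error or a silent drop.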
`SourceKind` enumerates the nine tools plus `Unknown`; `SourceKind::parse` maps
CLI/`--as` slugs to variants and tolerates aliases (such as `claude` / `claude-code`).
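A slug round-trip with alias tolerance might look like the sketch below. The variant set is reduced to two tools, the slug strings are assumed, and returning `Unknown` for unrecognized input (rather than an `Option` or an error) is also an assumption; the real `SourceKind::parse` signature may differ.

```rust
// Hypothetical sketch: the variants, slugs, and return type are assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceKind {
    ClaudeCode,
    Codex,
    Unknown,
}

impl SourceKind {
    /// The stable snake_case slug for this variant.
    pub fn as_str(self) -> &'static str {
        match self {
            SourceKind::ClaudeCode => "claude_code",
            SourceKind::Codex => "codex",
            SourceKind::Unknown => "unknown",
        }
    }

    /// Parses a user-supplied slug, tolerating case, surrounding
    /// whitespace, hyphen/underscore differences, and a short alias.
    pub fn parse(slug: &str) -> SourceKind {
        let normalized = slug.trim().to_ascii_lowercase().replace('-', "_");
        match normalized.as_str() {
            "claude" | "claude_code" => SourceKind::ClaudeCode,
            "codex" => SourceKind::Codex,
            _ => SourceKind::Unknown,
        }
    }
}
```

Normalizing before matching keeps the alias list short, and feeding `as_str()` back into `parse` gives a cheap round-trip property to test.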
PreparedNode is the typed data a consumer ingests. It is a tagged enum:
| `PreparedNode` variant | Payload | Meaning |
|---|---|---|
| `Conversation` | `ConversationSpan` | a gated, verbatim dialogue span with the markers that fired |
| `Decision` | `DecisionRecord` | a deterministically-parsed decision (IBIS / QOC / MADR / Kruchten shape) |
| `Episode` | `CodeEpisode` | a code edit episode: path, `Diff`, git ref, deterministic `episode_id` |
| `Binding` | `BindingEdge` | a decision/conversation → episode edge carrying a `ProvRecord` |
Every node and edge carries a FactStatus. Memscribe only ever emits the
first two; the latter two are flags for a downstream inference layer —
values Memscribe never computes by guessing. This is the property that keeps the
module zero-LLM and its output golden-testable.
| `FactStatus` | Who sets it |
|---|---|
| `Observed` | Memscribe: verbatim from the source |
| `DeterministicallyDerived` | Memscribe: a pure function of observed data |
| `StatisticallyRanked` | downstream: a statistical measure |
| `LlmHypothesis` | downstream: an LLM hypothesis; Memscribe only flags it |
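One way to picture the "Memscribe only emits the first two" rule is a check over the statuses in an output stream. The enum variants mirror the table above, but the `is_capture_level` and `first_violation` helpers are hypothetical names introduced for this sketch.

```rust
// Hypothetical sketch: the helper methods are not part of the documented API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FactStatus {
    Observed,
    DeterministicallyDerived,
    StatisticallyRanked,
    LlmHypothesis,
}

impl FactStatus {
    /// True for the two statuses the capture layer is allowed to emit.
    pub fn is_capture_level(self) -> bool {
        matches!(
            self,
            FactStatus::Observed | FactStatus::DeterministicallyDerived
        )
    }
}

/// Returns the index of the first status that a capture-only stream
/// should not contain, or `None` when the stream is clean.
pub fn first_violation(statuses: &[FactStatus]) -> Option<usize> {
    statuses.iter().position(|s| !s.is_capture_level())
}
```

A check of this kind could run as a test-time invariant over prepared nodes, so that a stage which starts guessing is caught before its output reaches a consumer.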
ProvRecord records used(session, decision) + wasGeneratedBy(diff, session)
with the temporal invariant t_use ≤ t_gen (ProvRecord::is_temporally_valid).
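The temporal invariant is simple enough to sketch directly. The field names and the use of plain `u64` timestamps below are assumptions for illustration; the real `ProvRecord` may carry richer time types and additional fields.

```rust
// Hypothetical sketch: field names and timestamp representation are assumed.

/// A provenance record linking a decision to a generated diff.
#[derive(Debug, Clone, PartialEq)]
pub struct ProvRecord {
    pub session_id: String,
    pub decision_id: String,
    pub diff_id: String,
    /// When the session used the decision (seconds since the Unix epoch).
    pub t_use: u64,
    /// When the session generated the diff (seconds since the Unix epoch).
    pub t_gen: u64,
}

impl ProvRecord {
    /// A decision can only inform a diff generated at or after the
    /// moment the decision was used: t_use <= t_gen.
    pub fn is_temporally_valid(&self) -> bool {
        self.t_use <= self.t_gen
    }
}
```

An edge whose decision was used after its diff was generated fails the check, which gives a binder a cheap way to reject causally impossible bindings.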
Adapters are the volatile part — every tool's format churns — so adding one is a
well-trodden, five-step path. The contract: a parser is version-tolerant
(it pattern-matches on the fields it needs and routes anything unrecognized to
EventKind::Unknown) and must never panic.
1. Add a `SourceKind` variant (`memscribe-core/src/model.rs`). Wire its stable snake_case slug into `SourceKind::as_str` and into `SourceKind::parse` (include any aliases). This is the one allowed touch of `memscribe-core` for a new tool; coordinate it, since the frozen contract is shared.
2. Add the adapter module (`memscribe-adapters/src/<tool>.rs`) behind a `#[cfg(feature = "<tool>")]` and a matching entry in the crate's `[features]` table. Implement `TranscriptAdapter`:
   - `source_kind()`: return your `SourceKind`.
   - `discover(&DiscoverCfg)`: locate live & historical transcripts. Honor the per-tool override key in `DiscoverCfg.overrides` (e.g. `CLAUDE_CONFIG_DIR`, `CODEX_HOME`) and fall back to `cfg.home_dir()`. Return handles in a deterministic (sorted) order.
   - `parse(&RawRecord, &mut ParseCtx)`: turn ONE record into zero or more `CaptureEvent`s. Use `ParseCtx::alloc_seq` for the monotonic `seq`, `ParseCtx::first_seen` for dedup, and `ParseCtx::project_or_default` for the project binding. Never panic; route unknowns to `EventKind::Unknown`.
   - `schema_fingerprint(&RawRecord)`: return a `SchemaVariant` so the corpus and runtime can version-gate the parser.
3. Register it (`memscribe-adapters/src/registry.rs`). Add the cfg-gated `push` in `all_adapters()` and the cfg-gated arm in `adapter_for()`.
4. Add fixtures under `fixtures/<tool>/<version>/<scenario>.jsonl` for the canonical scenarios in `memscribe-testkit::scenarios::SCENARIOS`, and bless the expected outputs under `fixtures-expected/<tool>/<version>/` (see CONTRIBUTING.md for the capture → golden → bless flow).
5. Add tests. Unit-test the parser; run the shared invariant checks from `memscribe-testkit::invariants` (`check_monotonic_seq`, `check_lossless`, `check_unique_event_ids`, `check_determinism`); and add a `cargo-fuzz` target so the never-panic contract is enforced. Verify in isolation: `cargo test -p memscribe-adapters --test <your_file_stem>`.
The conformance suite then asserts your tool normalizes the canonical scenarios to the same shape as every other tool — that cross-tool equivalence is the point of the thin-waist event model.
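For a sense of what the shared invariant checks enforce, here is a rough sketch of three of them over a toy event type. The function names echo those in `memscribe-testkit::invariants`, but the signatures, the `Event` struct, and the choice of strictly increasing `seq` are assumptions; the real checks operate on `CaptureEvent` and may be stricter or shaped differently.

```rust
// Hypothetical sketch: signatures and the Event type are assumptions.
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub session_id: String,
    pub seq: u64,
    pub event_id: String,
}

/// `seq` must strictly increase within each session, in stream order.
pub fn check_monotonic_seq(events: &[Event]) -> Result<(), String> {
    let mut last: HashMap<&str, u64> = HashMap::new();
    for e in events {
        if let Some(&prev) = last.get(e.session_id.as_str()) {
            if e.seq <= prev {
                return Err(format!(
                    "session {}: seq {} follows {}",
                    e.session_id, e.seq, prev
                ));
            }
        }
        last.insert(e.session_id.as_str(), e.seq);
    }
    Ok(())
}

/// No two events in the stream may share an `event_id`.
pub fn check_unique_event_ids(events: &[Event]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for e in events {
        if !seen.insert(e.event_id.as_str()) {
            return Err(format!("duplicate event_id {}", e.event_id));
        }
    }
    Ok(())
}

/// Runs a parser twice over the same input and requires identical output.
pub fn check_determinism<F>(parse: F, input: &str) -> Result<(), String>
where
    F: Fn(&str) -> Vec<Event>,
{
    if parse(input) == parse(input) {
        Ok(())
    } else {
        Err("parser produced different output for the same input".to_string())
    }
}
```

Running a parser twice only catches nondeterminism that shows up within one process (for example, iteration over an unordered map); comparing against blessed golden files covers drift across runs and versions.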