Commit 4d6b05d

Goal 3: OMC-PROTOCOL.md v1 spec
Formalizes the substrate-signed wire format we've proven with the Hermes tandem demos. v1 frozen 2026-05-16.

Sections:
- Design goals (identity-without-keys, alpha-rename invariance, compression-without-state, forward compat, composability)
- Wire format (sender_id, kind, content, content_hash, attractor, resonance, him_score, packed; optional codec fields)
- Message kind registry (REQUEST, RESPONSE, NOTIFY, FETCH, STORE, HEARTBEAT, ONBOARDING, ERROR + 16+ for application-defined)
- Verification algorithm (recompute canonical hash, compare; library-lookup recovery for codec messages)
- Canonicalization schemes (omc_fn, json, prose/blob)
- Codec parameters + wire-byte break-even table (honest)
- Reference flows: A=FETCH, B=STORE broadcast, C=compressed messaging, D=onboarding
- Compatibility commitments (additions non-breaking; new kinds in [16,inf) free; canonical algorithm frozen per scheme name)
- Reference implementations (pointers to interpreter.rs, kernel, MCP server, demos)
- Non-goals (auth, encryption, transport, discovery: explicit)

This is the network-effect move: anyone can now write an OMC-P-speaking agent against this spec. The existing impls are both the reference and the conformance test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 0721e35 commit 4d6b05d

1 file changed

Lines changed: 268 additions & 0 deletions

OMC-PROTOCOL.md
@@ -0,0 +1,268 @@
# OMC Substrate Protocol (OMC-P) v1

> An inter-agent wire protocol for content-addressed code and data,
> built on substrate-canonical hashes and signature verification
> without shared keys.

## Status

Living specification. Reference implementation lives in this
repository:
- Sender / receiver: `omc_msg_sign` / `omc_msg_verify` / the
  `omc_codec_*` family (OMC builtins, see `examples/lib/test.omc`
  patterns)
- Storage layer: `omc-kernel` (`omnimcode-cli/src/bin/omc_kernel.rs`)
- MCP adapter: `tools/mcp_substrate/server.py`
- End-to-end demos: `examples/demos/llm_tandem_*.omc`

## Design goals

| Goal | Mechanism |
|---|---|
| **Identity without keys.** Verify content integrity without PKI. | Substrate signature: `content_hash = fnv1a_64(canonicalize(content))`; receiver recomputes and compares. Tamper-evident by construction. |
| **Alpha-rename invariance.** Code that means the same thing has the same address. | Canonicalization at sender + receiver: AST normalization for OMC code; recursive key-sort for JSON; raw bytes for prose. |
| **Compression without context-key state.** Sender and receiver share no per-message agreement. | Codec produces sampled-token payload addressed by canonical hash; receiver recovers via library lookup. |
| **Forward compatibility.** Old receivers handle new message kinds gracefully. | Numeric `kind` field; unknown kinds short-circuit to "passthrough" handling. |
| **Composability with content-addressed stores.** Messages reference content the receiver may already hold. | `omc_msg_recover_compressed` / `omc_msg_recover_from_registry` walk known libraries by canonical hash. |

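
The substrate signature in the first row can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: it uses the standard 64-bit FNV-1a constants, and it assumes the content is hashed as UTF-8 bytes and that the hash is treated as an unsigned 64-bit value. The `canonicalize` argument is a hypothetical stand-in for the scheme-specific canonicalizer (identity here, as for the `prose` / `blob` scheme).

```python
# Standard 64-bit FNV-1a parameters.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Plain FNV-1a over bytes: XOR the byte in, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def content_hash(content: str, canonicalize=lambda s: s) -> int:
    """Hash the canonical form of `content`.

    Assumption: UTF-8 encoding and an identity canonicalizer by default;
    the real schemes (omc_fn, json) would be passed in as `canonicalize`.
    """
    return fnv1a_64(canonicalize(content).encode("utf-8"))


def is_untampered(content: str, claimed_hash: int, canonicalize=lambda s: s) -> bool:
    """Receiver side: recompute and compare, no keys involved."""
    return content_hash(content, canonicalize) == claimed_hash
```

A receiver would call `is_untampered(msg["content"], int(msg["content_hash"]))`; any edit to the content in transit changes the recomputed value.
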
## Wire format

Every OMC-P message is a JSON object with these fields:

| Field | Type | Purpose |
|---|---|---|
| `sender_id` | int | Agent identity. `0` reserved for kernel-level / anonymous. Convention: `fnv1a_64("agent_name")` truncated to i32. |
| `kind` | int | Message kind (see registry below). |
| `content` | string | The payload (raw, or omitted if `sampled_tokens` is present). |
| `content_hash` | int (string in JSON for precision) | Canonical hash of `content`, computed by `canonicalize` per the kind's addressing scheme. |
| `attractor` | int | Nearest Fibonacci attractor to `content_hash`. |
| `resonance` | float | `phi.res(content_hash)`. |
| `him_score` | int | HBit invariant marker. |
| `packed` | int | `(sender_id ^ kind ^ low32(content_hash))`. Identity dedup key. |
| `sampled_tokens` (optional) | int[] | Codec compressed payload (codec messages only). |
| `every_n` (optional) | int | Codec sampling rate. |
| `original_tok_count` (optional) | int | Codec receiver hint. |
| `source_bytes` (optional) | int | Original byte count. |
| `compression_ratio` (optional) | float | Token-count compression. |

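
As a rough illustration of the `packed` row, the XOR formula and its use as a dedup key might look like the following in Python. The `Deduplicator` class is hypothetical and not part of the reference implementation, and the sketch assumes non-negative `sender_id` and `kind` values.

```python
def low32(value: int) -> int:
    """Keep only the low 32 bits of a (possibly 64-bit) integer."""
    return value & 0xFFFFFFFF


def packed_key(sender_id: int, kind: int, content_hash: int) -> int:
    """(sender_id ^ kind ^ low32(content_hash)), as given in the field table.

    Assumption: sender_id and kind are non-negative.
    """
    return sender_id ^ kind ^ low32(content_hash)


class Deduplicator:
    """Hypothetical helper: remembers `packed` keys that were already seen."""

    def __init__(self) -> None:
        self._seen: set[int] = set()

    def is_new(self, message: dict) -> bool:
        key = int(message["packed"])
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Because the key folds three values into 32 bits, distinct messages can collide; a receiver that needs exact dedup could compare `content_hash` as well.
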
### Example: raw signed message

```json
{
  "sender_id": 18173,
  "kind": 1,
  "content": "fn compute_mean(xs) { ... }",
  "content_hash": "3551785709911115688",
  "attractor": "63245986",
  "resonance": 1.78e-17,
  "him_score": 0,
  "packed": 606047779
}
```

### Example: codec-compressed message

```json
{
  "sender_id": 18173,
  "kind": 1,
  "sampled_tokens": [4, 0, 109, 0, 116, 95, 0, 120, 629, 0, 118, 0, 99, 0, 109, 0, 34, 524],
  "content_hash": "3551785709911115688",
  "attractor": "63245986",
  "every_n": 3,
  "original_tok_count": 54,
  "source_bytes": 127,
  "compression_ratio": 2.117
}
```

Note: `content` is absent. Receiver recovers via library lookup.

## Message kind registry

| `kind` | Name | Purpose |
|---|---|---|
| 0 | RESERVED | Do not use. |
| 1 | REQUEST | Sender is asking the receiver to act on `content`. |
| 2 | RESPONSE | Reply to a REQUEST. Carries an `in_reply_to: <packed>` field when replying to a specific request. |
| 3 | NOTIFY | Best-effort one-way notification. No response expected. |
| 4 | FETCH | Receiver should treat `content_hash` as a request to send back the addressed content (or NOT_FOUND). |
| 5 | STORE | Sender is offering content for the receiver's local store. Receiver MAY accept. |
| 6 | HEARTBEAT | Peer liveness ping. |
| 7 | ONBOARDING | Bundle of language reference / lib manifest for new agents. See `examples/tools/gen_onboarding_token.omc`. |
| 8 | ERROR | Last operation failed. Body SHOULD contain `error: string` + optional `correlates_to: <packed>`. |
| 9–15 | RESERVED | Reserved for future v1 additions. |
| 16+ | application-defined | Reserved for negotiated extensions. |

Receivers MUST handle kinds 1, 2, 3, 4, 5, 8. Other kinds MAY be
silently dropped if unsupported.

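
One way a receiver might encode the registry and the MUST/MAY rule is sketched below in Python. The names `Kind`, `classify_kind`, `missing_required`, and `dispatch` are illustrative only; the reference implementation may structure this differently.

```python
from enum import IntEnum
from typing import Callable, Optional


class Kind(IntEnum):
    REQUEST = 1
    RESPONSE = 2
    NOTIFY = 3
    FETCH = 4
    STORE = 5
    HEARTBEAT = 6
    ONBOARDING = 7
    ERROR = 8


# Kinds every receiver MUST handle, per the registry text.
REQUIRED_KINDS = frozenset({1, 2, 3, 4, 5, 8})
FIRST_APPLICATION_KIND = 16


def classify_kind(kind: int) -> str:
    """Map a numeric kind onto the registry's ranges."""
    if kind in REQUIRED_KINDS:
        return "required"
    if kind in (Kind.HEARTBEAT, Kind.ONBOARDING):
        return "optional"
    if kind >= FIRST_APPLICATION_KIND:
        return "application"
    if kind == 0 or 9 <= kind <= 15:
        return "reserved"
    return "invalid"  # negative values are not in the registry


def missing_required(handlers: dict) -> list:
    """List the MUST-handle kinds a handler table does not cover."""
    return sorted(REQUIRED_KINDS - set(handlers))


def dispatch(message: dict, handlers: dict) -> Optional[object]:
    """Call the handler for message['kind']; silently drop unsupported kinds."""
    handler: Optional[Callable] = handlers.get(message["kind"])
    if handler is None:
        return None
    return handler(message)
```

Checking `missing_required(handlers)` at startup gives an early signal that a receiver would not meet the MUST-handle rule.
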
## Verification algorithm

To verify a received message `M`:

1. If `M.sampled_tokens` is absent (raw message):
   - `canon = canonicalize(M.content)` per the addressing scheme for the
     content's kind
   - `recomputed = fnv1a_64(canon)`
   - If `recomputed != M.content_hash` → REJECT (tampered)
   - Optionally recompute `attractor`, `resonance`, `him_score` from
     `content_hash`; mismatches indicate a sender bug or a different
     substrate version, so accept with a warning.
2. If `M.sampled_tokens` is present (codec message):
   - Look up `M.content_hash` in your library (`omc-kernel`,
     registry, peer store). If found:
     - `recomputed = fnv1a_64(canonicalize(found_content))`
     - If `recomputed == M.content_hash` → RECOVERED, content = `found_content`
   - If not found:
     - SEND back a FETCH message (kind=4) for the missing hash
     - Or: REJECT pending content acquisition

`sender_id` is informational only: there is NO key-based proof that
this sender wrote this content. The integrity guarantee is over
content, not author. To bind author to content, sign the
`packed`+`content_hash` tuple with conventional PKI on top of OMC-P
(out of scope here).

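
The two branches above could be sketched as a single Python function. This is an illustration under stated assumptions, not the `omc_msg_verify` builtin: the library is modelled as a plain dict keyed by integer hash, `canonicalize` defaults to identity, content is hashed as UTF-8, and the REJECT returned when a library entry fails to re-hash is an assumption, since the steps above do not cover that case.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def verify(message: dict, library: dict, canonicalize=lambda s: s):
    """Return (status, content) for a received message.

    status is one of "ACCEPT", "RECOVERED", "FETCH", "REJECT".
    `library` maps content_hash (int) -> content (str); hypothetical shape.
    """
    claimed = int(message["content_hash"])  # may arrive as a decimal string

    if "sampled_tokens" not in message:
        # Raw message: recompute the canonical hash and compare.
        recomputed = fnv1a_64(canonicalize(message["content"]).encode("utf-8"))
        if recomputed != claimed:
            return ("REJECT", None)
        return ("ACCEPT", message["content"])

    # Codec message: recover the full content by library lookup.
    found = library.get(claimed)
    if found is None:
        return ("FETCH", None)  # caller sends a kind=4 message for this hash
    if fnv1a_64(canonicalize(found).encode("utf-8")) == claimed:
        return ("RECOVERED", found)
    return ("REJECT", None)  # assumption: a corrupt library entry is rejected
```

Returning a status instead of raising keeps the "send FETCH or reject" choice with the caller, which matches the two alternatives listed for the not-found case.
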
## Canonicalization schemes (the "addressing" field)

| Scheme | Applied to | Algorithm |
|---|---|---|
| `omc_fn` | OMC source code | `canonical::canonicalize`: AST parse, normalize whitespace and comments, alpha-rename parameters/locals to canonical order, re-serialize. |
| `json` | JSON data | Recursive key-sort, re-serialize. |
| `prose` / `blob` | Arbitrary bytes | Identity (raw bytes). |

The scheme determines what counts as "the same content." Choose
the strictest scheme that preserves your semantic notion of equality.

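
For the `json` scheme, a recursive key-sort followed by re-serialization can be approximated with Python's standard library. The exact output format (compact separators, no ASCII escaping) is an assumption; the table above does not pin down whitespace or number formatting, so this sketch is not guaranteed to be byte-compatible with the reference canonicalizer.

```python
import json


def canonicalize_json(text: str) -> str:
    """Parse, sort keys at every nesting level, and re-serialize compactly.

    Assumptions: compact separators and ensure_ascii=False; list order is
    preserved because it is semantically significant in JSON.
    """
    value = json.loads(text)
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


a = canonicalize_json('{"b": 1, "a": {"d": 2, "c": 3}}')
b = canonicalize_json('{ "a": {"c": 3, "d": 2}, "b": 1 }')
print(a == b)
```

Two documents that differ only in key order or whitespace canonicalize to the same string, so they would receive the same `content_hash`.
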
## Codec parameters

| Param | Purpose | Range / default |
|---|---|--:|
| `every_n` | Keep every Nth canonical token | 1..16, typical 3–8 |

Wire-byte break-even (single message, measured on
TinyShakespeare-shaped OMC payloads):

| Source size | Recommended `every_n` |
|---|---|
| < 500 B | Don't compress; use raw |
| 500 B – 2 KB | 5 |
| > 2 KB | 8 |

The always-on win regardless of size is **library-lookup recovery**:
alpha-rename-invariant content addressing on the receiver, with no
shared key.

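
The break-even table above can be turned into a small helper, shown here as a hedged Python sketch. Two details are assumptions rather than facts from this spec: 2 KB is read as 2048 bytes, and "keep every Nth token" is read as keeping indices 0, N, 2N, and so on. The function names are hypothetical.

```python
from typing import List, Optional, Sequence

RAW_THRESHOLD_BYTES = 500     # below this, send raw
LARGE_THRESHOLD_BYTES = 2048  # assumption: "2 KB" means 2048 bytes


def recommended_every_n(source_bytes: int) -> Optional[int]:
    """Return the suggested every_n, or None when raw is the better choice."""
    if source_bytes < RAW_THRESHOLD_BYTES:
        return None
    if source_bytes <= LARGE_THRESHOLD_BYTES:
        return 5
    return 8


def sample_tokens(tokens: Sequence[int], every_n: int) -> List[int]:
    """Keep every Nth token, starting at index 0 (starting offset is assumed)."""
    if not 1 <= every_n <= 16:
        raise ValueError("every_n must be in 1..16")
    return list(tokens[::every_n])
```

Under this reading, 54 canonical tokens at `every_n=3` leave 18 sampled tokens, which agrees with the token counts in the codec-compressed example message.
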
## Peer discovery (informative, not normative for v1)

The v1 spec is point-to-point: peers know each other's addresses
out-of-band (file path, socket, HTTP URL). Peer discovery is
deferred to a future v2 that may build on:

- Substrate-aware DHT (peers announce by `attractor_bucket(content_hash)`)
- WebRTC datachannels for browser-resident agents
- Existing libp2p / IPFS peer routing

The wire format does not depend on the transport. The reference
impl uses files in a shared directory; production deployments
should use sockets / HTTP / message queues at their discretion.

## Reference flows

### Flow A: agent asks agent for a code fragment by hash

```
A → B: {sender=A, kind=4, content_hash=H}        # FETCH H
B:     hash H is in B's store? yes → send RESPONSE
B → A: {sender=B, kind=2, content="fn ...",      # RESPONSE
        content_hash=H, attractor=..., ...}
A:     verify: recompute fnv1a_64(canonicalize("fn ...")) == H? yes
       → ACCEPT, content trusted
```

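
B's side of Flow A might be handled roughly as follows. This Python sketch makes several assumptions: the store is a dict keyed by integer hash, a miss is reported as an ERROR (kind 8) whose `error` text is `"NOT_FOUND"`, and the substrate-derived fields (`attractor`, `resonance`, `him_score`, `packed`) are left out because computing them needs the substrate functions. `handle_fetch` is a hypothetical name.

```python
RESPONSE, FETCH, ERROR = 2, 4, 8


def handle_fetch(request: dict, store: dict, my_id: int) -> dict:
    """Answer a FETCH (kind=4) from a local content-addressed store.

    `store` maps content_hash (int) -> content (str); hypothetical shape.
    The reply omits attractor / resonance / him_score / packed for brevity.
    """
    if request["kind"] != FETCH:
        raise ValueError("handle_fetch only accepts kind=4 messages")

    wanted = int(request["content_hash"])
    content = store.get(wanted)

    if content is None:
        # Assumption: NOT_FOUND is signalled as an ERROR message.
        return {
            "sender_id": my_id,
            "kind": ERROR,
            "error": "NOT_FOUND",
            "correlates_to": request.get("packed"),
        }

    return {
        "sender_id": my_id,
        "kind": RESPONSE,
        "content": content,
        "content_hash": str(wanted),  # decimal string to avoid precision loss
        "in_reply_to": request.get("packed"),
    }
```

The requester does not need to trust B: it recomputes the canonical hash of the returned `content` and accepts only if it equals `H`.
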
### Flow B: agent broadcasts a code library

```
A → *: {sender=A, kind=5, content="fn add(x,y)...", ...}   # STORE
A → *: {sender=A, kind=5, content="fn mean(xs)...", ...}   # STORE
...
peers: each verifies + stores in local omc-kernel
```

### Flow C: codec-compressed messaging

```
A:     msg = omc_msg_sign_compressed(big_source, A_id, 1, every_n=8)
A → B: msg   (carries sampled_tokens + content_hash, no content)
B:     recovered = omc_msg_recover_from_registry(msg)   # checks local store
B:     if recovered: ACCEPT
       else: send FETCH back to A
```

### Flow D: onboarding new agent

```
A → B: {sender=A, kind=7, content=<json blob>, ...}   # ONBOARDING
B:     verify signature
B:     parse content: {bootstrap_pack, lib_manifest, ...}
B:     ingest manifest into local omc-kernel
B:     now knows every standard fn by canonical hash
```

See `examples/tools/gen_onboarding_token.omc` for a complete
ONBOARDING bundle generator.

## Compatibility commitments

OMC-P v1:
- Field name additions are non-breaking
- Field removals require a version bump
- New `kind` values in [16, ∞) are non-breaking
- New `kind` values in [9, 15] are reserved for future v1 additions
- Numeric IDs must fit in `i64` for `content_hash`, `attractor`,
  `sender_id`, `packed`; JSON should serialize them as decimal strings
  to avoid float-precision loss in receivers
- The `canonicalize` algorithm for each scheme is part of v1
  forever; substrate-version changes must produce a new scheme
  name (e.g. `omc_fn_v2`)

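
The decimal-string rule exists because many JSON parsers read numbers as IEEE 754 doubles, which cannot represent every 64-bit integer. The Python sketch below demonstrates the loss using the `content_hash` from the example messages and shows a round-trip through strings; `encode_id` and `decode_id` are hypothetical helper names.

```python
import json

I64_MIN = -(2 ** 63)
I64_MAX = 2 ** 63 - 1


def encode_id(value: int) -> str:
    """Serialize a numeric ID as a decimal string, enforcing the i64 range."""
    if not I64_MIN <= value <= I64_MAX:
        raise ValueError("ID does not fit in i64")
    return str(value)


def decode_id(field) -> int:
    """Accept a decimal string or a JSON integer; refuse floats outright."""
    if isinstance(field, float):
        raise TypeError("ID arrived as a float; precision may already be lost")
    value = int(field)
    if not I64_MIN <= value <= I64_MAX:
        raise ValueError("ID does not fit in i64")
    return value


h = 3551785709911115688

# What a double-based JSON parser would effectively do to a bare number:
survives_as_double = int(float(h)) == h

# String round-trip through JSON keeps every digit.
wire = json.dumps({"content_hash": encode_id(h)})
roundtrip = decode_id(json.loads(wire)["content_hash"])
print(survives_as_double, roundtrip == h)
```

Python's own `json` module parses large integers exactly, so the loss is simulated here with `float()`; receivers written in languages whose JSON numbers are doubles would hit it directly.
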
## Reference implementations

| Component | Path |
|---|---|
| Sign / verify / serialize | `omnimcode-core/src/interpreter.rs` (`omc_msg_*` builtins) |
| Codec encode / decode-lookup | `omnimcode-core/src/interpreter.rs` (`omc_codec_*` builtins) |
| Persistent store | `omnimcode-cli/src/bin/omc_kernel.rs` |
| MCP adapter | `tools/mcp_substrate/server.py` |
| End-to-end demo (raw) | `examples/demos/llm_tandem_send.omc` + `llm_tandem_receive.omc` |
| End-to-end demo (compressed + library) | `examples/demos/llm_tandem_send_compressed.omc` + `llm_tandem_receive_compressed.omc` + `llm_tandem_registry.omc` |
| Onboarding bundle | `examples/tools/gen_onboarding_token.omc` + `consume_onboarding_token.omc` |

## Non-goals

- **Authentication.** OMC-P proves CONTENT integrity, not AUTHOR
  identity. Layer PKI / OAuth / OIDC on top if needed.
- **Encryption.** Wire is plaintext JSON. Use TLS or wrap in an
  encrypted envelope before transport if confidentiality is needed.
- **Transport.** OMC-P is wire format only. Use HTTP, sockets,
  message queues, files: anything that delivers bytes.
- **Discovery.** Peers know each other out-of-band in v1.

## Naming

OMC-P is the inter-AGENT wire protocol. It is distinct from:

- **OMC** the language (`omnimcode`)
- **omc-kernel** the storage CLI
- **MCP** (Anthropic Model Context Protocol). The OMC-P MCP server
  in `tools/mcp_substrate/` adapts OMC-P operations to the MCP
  RPC layer for LLM client consumption.

## Version

This document describes **OMC-P v1**, frozen 2026-05-16.

Changes require:
- Backwards-compatible additions: PR + this doc updated
- Backwards-incompatible changes: bump to v2 + new file
  (`OMC-PROTOCOL-v2.md`) + reference impls forked or feature-gated