| 1 | +# OMC Substrate Protocol (OMC-P) v1
| 2 | +
| 3 | +> An inter-agent wire protocol for content-addressed code and data,
| 4 | +> built on substrate-canonical hashes and signature verification
| 5 | +> without shared keys.
| 6 | +
| 7 | +## Status |
| 8 | + |
| 9 | +Living specification. Reference implementation lives in this |
| 10 | +repository: |
| 11 | +- Sender / receiver: `omc_msg_sign` / `omc_msg_verify` / the |
| 12 | + `omc_codec_*` family (OMC builtins, see `examples/lib/test.omc` |
| 13 | + patterns) |
| 14 | +- Storage layer: `omc-kernel` (`omnimcode-cli/src/bin/omc_kernel.rs`) |
| 15 | +- MCP adapter: `tools/mcp_substrate/server.py` |
| 16 | +- End-to-end demos: `examples/demos/llm_tandem_*.omc` |
| 17 | + |
| 18 | +## Design goals |
| 19 | + |
| 20 | +| Goal | Mechanism | |
| 21 | +|---|---| |
| 22 | +| **Identity without keys.** Verify content integrity without PKI. | Substrate signature: `content_hash = fnv1a_64(canonicalize(content))`; receiver recomputes and compares. Tamper-evident by construction. | |
| 23 | +| **Alpha-rename invariance.** Code that means the same thing has the same address. | Canonicalization at sender + receiver: AST normalization for OMC code; recursive key-sort for JSON; raw bytes for prose. | |
| 24 | +| **Compression without context-key state.** Sender and receiver share no per-message agreement. | Codec produces sampled-token payload addressed by canonical hash; receiver recovers via library lookup. | |
| 25 | +| **Forward compatibility.** Old receivers handle new message kinds gracefully. | Numeric `kind` field; unknown kinds short-circuit to "passthrough" handling. | |
| 26 | +| **Composability with content-addressed stores.** Messages reference content the receiver may already hold. | `omc_msg_recover_compressed` / `omc_msg_recover_from_registry` walk known libraries by canonical hash. | |
| 27 | + |
| 28 | +## Wire format |
| 29 | + |
| 30 | +Every OMC-P message is a JSON object with these fields: |
| 31 | + |
| 32 | +| Field | Type | Purpose | |
| 33 | +|---|---|---| |
| 34 | +| `sender_id` | int | Agent identity. `0` reserved for kernel-level / anonymous. Convention: `fnv1a_64("agent_name")` truncated to i32. | |
| 35 | +| `kind` | int | Message kind (see registry below). | |
| 36 | +| `content` | string | The payload (raw, or omitted if `sampled_tokens` is present). | |
| 37 | +| `content_hash` | int (string in JSON for precision) | Canonical hash of `content`, computed by `canonicalize` per the kind's addressing scheme. | |
| 38 | +| `attractor` | int | Nearest Fibonacci attractor to `content_hash`. | |
| 39 | +| `resonance` | float | `phi.res(content_hash)`. | |
| 40 | +| `him_score` | int | HBit invariant marker. | |
| 41 | +| `packed` | int | `(sender_id ^ kind ^ low32(content_hash))`. Identity dedup key. | |
| 42 | +| `sampled_tokens` (optional) | int[] | Codec compressed payload (codec messages only). | |
| 43 | +| `every_n` (optional) | int | Codec sampling rate. | |
| 44 | +| `original_tok_count` (optional) | int | Codec receiver hint. | |
| 45 | +| `source_bytes` (optional) | int | Original byte count. | |
| 46 | +| `compression_ratio` (optional) | float | Token-count compression. | |
| 47 | + |
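The `content_hash` and `packed` fields above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: it assumes the standard 64-bit FNV-1a offset basis and prime, treats the hash as an unsigned 64-bit value, and uses a hypothetical helper name `packed_key`. Whether the OMC builtins use the same byte encoding and signedness is not stated in this document.

```python
# Standard FNV-1a 64-bit parameters (assumed to match the substrate's fnv1a_64).
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """Return the unsigned 64-bit FNV-1a hash of `data`."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte                       # FNV-1a: XOR first ...
        h = (h * FNV64_PRIME) & MASK64  # ... then multiply, wrapping at 64 bits
    return h


def packed_key(sender_id: int, kind: int, content_hash: int) -> int:
    """Hypothetical helper: sender_id ^ kind ^ low32(content_hash)."""
    return sender_id ^ kind ^ (content_hash & 0xFFFFFFFF)
```

The input to `fnv1a_64` is the canonicalized content, so the same function serves every addressing scheme once the canonical bytes are available.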
| 48 | +### Example: raw signed message |
| 49 | + |
| 50 | +```json |
| 51 | +{ |
| 52 | + "sender_id": 18173, |
| 53 | + "kind": 1, |
| 54 | + "content": "fn compute_mean(xs) { ... }", |
| 55 | + "content_hash": "3551785709911115688", |
| 56 | + "attractor": "63245986", |
| 57 | + "resonance": 1.78e-17, |
| 58 | + "him_score": 0, |
| 59 | + "packed": 606047779 |
| 60 | +} |
| 61 | +``` |
| 62 | + |
| 63 | +### Example: codec-compressed message |
| 64 | + |
| 65 | +```json |
| 66 | +{ |
| 67 | + "sender_id": 18173, |
| 68 | + "kind": 1, |
| 69 | + "sampled_tokens": [4, 0, 109, 0, 116, 95, 0, 120, 629, 0, 118, 0, 99, 0, 109, 0, 34, 524], |
| 70 | + "content_hash": "3551785709911115688", |
| 71 | + "attractor": "63245986", |
| 72 | + "every_n": 3, |
| 73 | + "original_tok_count": 54, |
| 74 | + "source_bytes": 127, |
| 75 | + "compression_ratio": 2.117 |
| 76 | +} |
| 77 | +``` |
| 78 | + |
| 79 | +Note: `content` is absent. Receiver recovers via library lookup. |
| 80 | + |
| 81 | +## Message kind registry |
| 82 | + |
| 83 | +| `kind` | Name | Purpose | |
| 84 | +|---|---|---| |
| 85 | +| 0 | RESERVED | Do not use. | |
| 86 | +| 1 | REQUEST | Sender is asking the receiver to act on `content`. | |
| 87 | +| 2 | RESPONSE | Reply to a REQUEST. Carry an `in_reply_to: <packed>` field when replying to a specific request. |
| 88 | +| 3 | NOTIFY | Best-effort one-way notification. No response expected. | |
| 89 | +| 4 | FETCH | Receiver should treat `content_hash` as a request to send back the addressed content (or NOT_FOUND). | |
| 90 | +| 5 | STORE | Sender is offering content for the receiver's local store. Receiver MAY accept. | |
| 91 | +| 6 | HEARTBEAT | Peer liveness ping. | |
| 92 | +| 7 | ONBOARDING | Bundle of language reference / lib manifest for new agents. See `examples/tools/gen_onboarding_token.omc`. | |
| 93 | +| 8 | ERROR | Last operation failed. Body SHOULD contain `error: string` + optional `correlates_to: <packed>`. | |
| 94 | +| 16+ | application-defined | Reserved for negotiated extensions. | |
| 95 | + |
| 96 | +Receivers MUST handle kinds 1, 2, 3, 4, 5, 8. Other kinds MAY be |
| 97 | +silently dropped if unsupported. |
| 98 | + |
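The handling rule above can be sketched as a small classifier. The names `Kind`, `REQUIRED_KINDS`, and `classify` are illustrative only, and rejecting kind `0` is an assumption: the registry says only "Do not use".

```python
from enum import IntEnum


class Kind(IntEnum):
    """Message kinds from the v1 registry."""
    REQUEST = 1
    RESPONSE = 2
    NOTIFY = 3
    FETCH = 4
    STORE = 5
    HEARTBEAT = 6
    ONBOARDING = 7
    ERROR = 8


# Kinds every receiver MUST handle, per the sentence above.
REQUIRED_KINDS = frozenset({1, 2, 3, 4, 5, 8})


def classify(kind: int, supported: frozenset = REQUIRED_KINDS) -> str:
    """Decide what a receiver does with a message of the given kind."""
    if kind == 0:
        return "reject"  # assumption: RESERVED is treated as an error
    if kind in supported:
        return "handle"
    return "drop"        # unsupported kinds MAY be silently dropped
```

A receiver that also implements HEARTBEAT or ONBOARDING would pass a larger `supported` set, for example `REQUIRED_KINDS | {Kind.HEARTBEAT}`.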
| 99 | +## Verification algorithm |
| 100 | + |
| 101 | +To verify a received message `M`: |
| 102 | + |
| 103 | +1. If `M.sampled_tokens` is absent (raw message): |
| 104 | + - `canon = canonicalize(M.content)` per addressing scheme for the |
| 105 | + content's kind |
| 106 | + - `recomputed = fnv1a_64(canon)` |
| 107 | + - If `recomputed != M.content_hash` → REJECT (tampered) |
| 108 | + - Optionally recompute `attractor`, `resonance`, `him_score` from |
| 109 | + `content_hash`; mismatches indicate sender bug or different |
| 110 | + substrate version — accept with warning. |
| 111 | +2. If `M.sampled_tokens` is present (codec message): |
| 112 | + - Look up `M.content_hash` in your library (`omc-kernel`, |
| 113 | + registry, peer store). If found: |
| 114 | + - `recomputed = fnv1a_64(canonicalize(found_content))` |
| 115 | + - If `recomputed == M.content_hash` → RECOVERED, content = `found_content` |
| 116 | + - If not found: |
| 117 | + - SEND back a FETCH message (kind=4) for the missing hash |
| 118 | + - Or: REJECT pending content acquisition |
| 119 | + |
| 120 | +`sender_id` is informational only — there is NO key-based proof that |
| 121 | +this sender wrote this content. The integrity guarantee is over |
| 122 | +content, not author. To bind author to content, sign the |
| 123 | +`packed`+`content_hash` tuple with conventional PKI on top of OMC-P |
| 124 | +(out-of-scope here). |
| 125 | + |
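The two verification branches above can be sketched as follows. This is an illustration under stated assumptions: `verify` and its return labels are hypothetical names, the library is modeled as a plain dict keyed by hash, `canonicalize` is passed in by the caller, and a library entry whose recomputed hash does not match is rejected (the steps above do not cover that case). FNV-1a is a non-cryptographic hash, so the comparison detects corruption and casual edits rather than deliberately constructed collisions.

```python
from typing import Callable, Dict, Optional, Tuple

MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a (parameters assumed to match the substrate)."""
    h = 0xCBF29CE484222325
    for byte in data:
        h = ((h ^ byte) * 0x100000001B3) & MASK64
    return h


def verify(
    msg: dict,
    canonicalize: Callable[[str], bytes],
    library: Dict[int, str],
) -> Tuple[str, Optional[str]]:
    """Return (verdict, content) for a received message."""
    claimed = int(msg["content_hash"])  # accepts int or decimal string
    if "sampled_tokens" not in msg:
        # Raw message: recompute the hash over the canonical form.
        content = msg["content"]
        if fnv1a_64(canonicalize(content)) != claimed:
            return ("REJECT", None)
        return ("ACCEPT", content)
    # Codec message: recover by library lookup on the canonical hash.
    found = library.get(claimed)
    if found is None:
        return ("FETCH", None)  # caller sends a kind=4 message for the hash
    if fnv1a_64(canonicalize(found)) == claimed:
        return ("RECOVERED", found)
    return ("REJECT", None)     # assumption: a corrupt library entry is rejected
```

With the `prose` scheme, `canonicalize` is just `lambda s: s.encode("utf-8")`; other schemes substitute their own canonical form.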
| 126 | +## Canonicalization schemes (the "addressing" field) |
| 127 | + |
| 128 | +| Scheme | Applied to | Algorithm | |
| 129 | +|---|---|---| |
| 130 | +| `omc_fn` | OMC source code | `canonical::canonicalize` — AST parse, normalize whitespace and comments, alpha-rename parameters/locals to canonical order, re-serialize. | |
| 131 | +| `json` | JSON data | Recursive key-sort, re-serialize. | |
| 132 | +| `prose` / `blob` | Arbitrary bytes | Identity (raw bytes). | |
| 133 | + |
| 134 | +The scheme determines what counts as "the same content." Choose |
| 135 | +the strictest scheme that preserves your semantic notion of equality. |
| 136 | + |
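As an illustration of the `json` scheme, recursive key-sorting can be sketched with Python's standard library. The function name `canonicalize_json` is hypothetical, and the serialization details (compact separators, UTF-8, no ASCII escaping) are assumptions, since the table specifies only "recursive key-sort, re-serialize".

```python
import json


def canonicalize_json(text: str) -> bytes:
    """Parse JSON, sort object keys at every depth, and re-serialize."""
    value = json.loads(text)
    canonical = json.dumps(
        value,
        sort_keys=True,         # applies to nested objects as well
        separators=(",", ":"),  # assumption: no insignificant whitespace
        ensure_ascii=False,     # assumption: emit UTF-8 rather than \u escapes
    )
    return canonical.encode("utf-8")
```

Two documents that differ only in key order or whitespace then hash to the same address, while array order is preserved because it is semantically significant.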
| 137 | +## Codec parameters |
| 138 | + |
| 139 | +| Param | Purpose | Range / default | |
| 140 | +|---|---|--:| |
| 141 | +| `every_n` | Keep every Nth canonical token | 1..16, typical 3-8 | |
| 142 | + |
| 143 | +Wire-byte break-even (single message, measured on TinyShakespeare- |
| 144 | +shaped OMC payloads): |
| 145 | + |
| 146 | +| Source size | Recommended `every_n` | |
| 147 | +|---|---| |
| 148 | +| < 500 B | Don't compress — use raw | |
| 149 | +| 500 B – 2 KB | 5 | |
| 150 | +| > 2 KB | 8 | |
| 151 | + |
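The break-even table can be expressed as a small selection function. This is a sketch only: `recommended_every_n` is a hypothetical name, and treating "2 KB" as 2048 bytes is an assumption.

```python
from typing import Optional


def recommended_every_n(source_bytes: int) -> Optional[int]:
    """Return the suggested sampling rate, or None to send the message raw."""
    if source_bytes < 500:
        return None  # below break-even: do not compress
    if source_bytes <= 2048:  # assumption: 2 KB means 2048 bytes
        return 5
    return 8
```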
| 152 | +Regardless of size, **library-lookup recovery** always applies:
| 153 | +the receiver resolves content by its alpha-rename-invariant
| 154 | +canonical hash, with no shared key.
| 155 | + |
| 156 | +## Peer discovery (informative, not normative for v1) |
| 157 | + |
| 158 | +The v1 spec is point-to-point: peers learn each other's addresses
| 159 | +out-of-band (file path, socket, HTTP URL). Peer discovery is
| 160 | +deferred to a future v2 that may build on:
| 161 | + |
| 162 | +- Substrate-aware DHT (peers announce by `attractor_bucket(content_hash)`) |
| 163 | +- WebRTC datachannels for browser-resident agents |
| 164 | +- Existing libp2p / IPFS peer routing |
| 165 | + |
| 166 | +The wire format does not depend on the transport. The reference |
| 167 | +impl uses files in a shared directory; production deployments |
| 168 | +should use sockets / HTTP / message queues at their discretion. |
| 169 | + |
| 170 | +## Reference flows |
| 171 | + |
| 172 | +### Flow A: agent asks agent for a code fragment by hash
| 173 | + |
| 174 | +``` |
| 175 | +A → B: {sender=A, kind=4, content_hash=H} # FETCH H |
| 176 | +B: hash H is in B's store? yes → send RESPONSE |
| 177 | +B → A: {sender=B, kind=2, content="fn ...", # RESPONSE |
| 178 | + content_hash=H, attractor=..., ...} |
| 179 | +A: verify: recompute fnv1a_64(canonicalize("fn ...")) == H? yes |
| 180 | + → ACCEPT, content trusted |
| 181 | +``` |
| 182 | + |
| 183 | +### Flow B: agent broadcasts a code library |
| 184 | + |
| 185 | +``` |
| 186 | +A → *: {sender=A, kind=5, content="fn add(x,y)..."} # STORE |
| 187 | +A → *: {sender=A, kind=5, content="fn mean(xs)..."} # STORE |
| 188 | +... |
| 189 | +peers: each verifies + stores in local omc-kernel |
| 190 | +``` |
| 191 | + |
| 192 | +### Flow C: codec-compressed messaging |
| 193 | + |
| 194 | +``` |
| 195 | +A: msg = omc_msg_sign_compressed(big_source, A_id, 1, every_n=8) |
| 196 | +A → B: msg (carries sampled_tokens + content_hash, no content) |
| 197 | +B: recovered = omc_msg_recover_from_registry(msg) # checks local store |
| 198 | +B: if recovered: ACCEPT |
| 199 | + else: send FETCH back to A |
| 200 | +``` |
| 201 | + |
| 202 | +### Flow D: onboarding new agent |
| 203 | + |
| 204 | +``` |
| 205 | +A → B: {sender=A, kind=7, content=<json blob>, ...} # ONBOARDING |
| 206 | +B: verify signature |
| 207 | +B: parse content: {bootstrap_pack, lib_manifest, ...} |
| 208 | +B: ingest manifest into local omc-kernel |
| 209 | +B: now knows every standard fn by canonical hash |
| 210 | +``` |
| 211 | + |
| 212 | +See `examples/tools/gen_onboarding_token.omc` for a complete |
| 213 | +ONBOARDING bundle generator. |
| 214 | + |
| 215 | +## Compatibility commitments |
| 216 | + |
| 217 | +OMC-P v1: |
| 218 | +- Field name additions are non-breaking |
| 219 | +- Field removals require version bump |
| 220 | +- New `kind` values in [16, ∞) are non-breaking |
| 221 | +- New `kind` values in [9, 15] reserved for future v1 additions |
| 222 | +- Numeric IDs must fit in `i64` for `content_hash`, `attractor`,
| 223 | + `sender_id`, `packed`; JSON encoders should serialize them as
| 224 | + decimal strings to avoid float-precision loss in receivers
| 225 | +- The `canonicalize` algorithm for each scheme is part of v1 |
| 226 | + forever; substrate-version changes must produce a new scheme |
| 227 | + name (e.g. `omc_fn_v2`) |
| 228 | + |
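The decimal-string commitment above can be sketched as an encode/decode pair. The helper names and the `ID_FIELDS` tuple are illustrative. The decoder accepts both strings and plain JSON numbers, because the examples earlier in this document emit `sender_id` and `packed` as numbers.

```python
import json

ID_FIELDS = ("content_hash", "attractor", "sender_id", "packed")
I64_MIN, I64_MAX = -(1 << 63), (1 << 63) - 1


def encode_message(msg: dict) -> str:
    """Serialize a message, writing the numeric ID fields as decimal strings."""
    wire = {}
    for key, value in msg.items():
        if key in ID_FIELDS:
            if not I64_MIN <= value <= I64_MAX:
                raise ValueError(f"{key} does not fit in i64: {value}")
            wire[key] = str(value)
        else:
            wire[key] = value
    return json.dumps(wire)


def decode_message(text: str) -> dict:
    """Parse a message, converting ID fields back to exact integers."""
    msg = json.loads(text)
    for key in ID_FIELDS:
        if key in msg:
            msg[key] = int(msg[key])  # works for "123" and for 123
    return msg
```

The string form matters for receivers that parse JSON numbers as IEEE 754 doubles: any integer above 2^53 may be rounded, so a 64-bit hash sent as a bare number can silently change on arrival.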
| 229 | +## Reference implementations |
| 230 | + |
| 231 | +| Component | Path | |
| 232 | +|---|---| |
| 233 | +| Sign / verify / serialize | `omnimcode-core/src/interpreter.rs` (`omc_msg_*` builtins) | |
| 234 | +| Codec encode / decode-lookup | `omnimcode-core/src/interpreter.rs` (`omc_codec_*` builtins) | |
| 235 | +| Persistent store | `omnimcode-cli/src/bin/omc_kernel.rs` | |
| 236 | +| MCP adapter | `tools/mcp_substrate/server.py` | |
| 237 | +| End-to-end demo (raw) | `examples/demos/llm_tandem_send.omc` + `llm_tandem_receive.omc` | |
| 238 | +| End-to-end demo (compressed + library) | `examples/demos/llm_tandem_send_compressed.omc` + `llm_tandem_receive_compressed.omc` + `llm_tandem_registry.omc` | |
| 239 | +| Onboarding bundle | `examples/tools/gen_onboarding_token.omc` + `consume_onboarding_token.omc` | |
| 240 | + |
| 241 | +## Non-goals |
| 242 | + |
| 243 | +- **Authentication.** OMC-P proves CONTENT integrity, not AUTHOR |
| 244 | + identity. Layer PKI / OAuth / OIDC on top if needed. |
| 245 | +- **Encryption.** Wire is plaintext JSON. Use TLS or wrap in an |
| 246 | + encrypted envelope before transport if confidentiality is needed. |
| 247 | +- **Transport.** OMC-P is wire format only. Use HTTP, sockets, |
| 248 | + message queues, files — anything that delivers bytes. |
| 249 | +- **Discovery.** Peers know each other out-of-band in v1. |
| 250 | + |
| 251 | +## Naming |
| 252 | + |
| 253 | +OMC-P is the inter-AGENT wire protocol. It is distinct from: |
| 254 | + |
| 255 | +- **OMC** the language (`omnimcode`)
| 256 | +- **omc-kernel** the storage CLI |
| 257 | +- **MCP** (Anthropic Model Context Protocol) — the OMC-P MCP server |
| 258 | + in `tools/mcp_substrate/` adapts OMC-P operations to the MCP |
| 259 | + RPC layer for LLM client consumption. |
| 260 | + |
| 261 | +## Version |
| 262 | + |
| 263 | +This document describes **OMC-P v1**, frozen 2026-05-16. |
| 264 | + |
| 265 | +Changes require: |
| 266 | +- Backwards-compatible additions: PR + this doc updated |
| 267 | +- Backwards-incompatible changes: bump to v2 + new file |
| 268 | + (`OMC-PROTOCOL-v2.md`) + reference impls forked or feature-gated |