Status: Draft v1. Shipped in Gortex v0.9.0.
GCX1 is a tab-delimited, line-oriented, round-trippable wire format
for Gortex MCP tool responses. It is an opt-in alternative to JSON
selected per call via `format: "gcx"`. On the benchmark bundled at
`bench/wire-format/` it cuts tiktoken token counts by a median of
27.4 % relative to JSON, with 100 % round-trip integrity across 20
representative tool responses.
- Round-trippable. Every GCX payload decodes back to an equivalent Go value. No lossy text.
- Tokenizer-aware. Field delimiters, escape sequences, and header syntax are chosen so tiktoken (cl100k_base) counts them as whitespace or single tokens — matching the LLM budget users care about, not just raw bytes.
- Per-tool tunable. Hot-path tools (`search_symbols`, `find_usages`, `analyze`, ...) ship hand-tuned encoders with fixed field layouts. Everything else falls through to a generic fallback so no tool ever produces invalid GCX.
- Versioned. The header carries a protocol version. Decoders reject unknown versions and agents can fall back to JSON transparently.

Non-goals for v1:

- Binary encoding. GCX1 is text-only; a future `GCX2` may carry binary payloads (CBOR / MessagePack) under the same version prefix, but v1 stays text so agents can read raw payloads during debugging.
- Schema evolution inside a major version. The field layout for a given tool is fixed for the lifetime of `GCX1`. New fields ship as `GCX2`.
- Streaming. GCX1 is full-response. `GCX1-stream` is a reserved future extension.
```
payload      = section { section } ;
section      = header row-line { row-line | comment } ;
header       = TAG SP "tool=" token { SP key-value } SP "fields=" field-list LF ;
key-value    = token "=" value ;
field-list   = token { "," token } ;
row-line     = value { TAB value } LF | LF ;
comment      = "#" [ SP text ] LF ;
value        = { escaped-char | safe-char } ;
escaped-char = "\" ( "\" | "t" | "n" ) ;
safe-char    = any UTF-8 codepoint except TAB, LF, "\" ;
TAG          = "GCX1" ;
TAB          = U+0009 ;
LF           = U+000A ;
SP           = U+0020 ;
```
Each section begins with a single-line header:
```
GCX1 tool=<name> fields=<a>,<b>,... [k=v]...
```
- `tool=` is the MCP tool name (or a dot-suffixed sub-section name like `get_callers.edges`).
- `fields=` is a comma-separated list declaring the column order for subsequent rows. At least one field is required.
- Additional space-separated `k=v` pairs carry metadata (`total`, `truncated`, `etag`, `rows`, `ms`, ...). Keys are emitted in sorted order so fixtures stay deterministic.
Header values that contain spaces, `=`, tabs, newlines, or backslashes
must be escaped exactly as row values are escaped.
Example:
```
GCX1 tool=search_symbols fields=id,kind,name,path,line,sig rows=3 total=7 truncated=false
```
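To make the header rules concrete, here is a minimal Go sketch. `ParseHeader` is a hypothetical helper, not the published gcx-go API, and it assumes the header values need no unescaping:

```go
package main

import (
	"fmt"
	"strings"
)

// ParseHeader splits a GCX1 header line into the tool name, the declared
// field list, and the remaining k=v metadata pairs. Illustrative only.
func ParseHeader(line string) (tool string, fields []string, meta map[string]string, err error) {
	parts := strings.Split(strings.TrimSuffix(line, "\n"), " ")
	if len(parts) < 3 || parts[0] != "GCX1" {
		return "", nil, nil, fmt.Errorf("not a GCX1 header: %q", line)
	}
	meta = map[string]string{}
	for _, kv := range parts[1:] {
		k, v, ok := strings.Cut(kv, "=")
		if !ok {
			return "", nil, nil, fmt.Errorf("bad header pair: %q", kv)
		}
		switch k {
		case "tool":
			tool = v
		case "fields":
			fields = strings.Split(v, ",")
		default:
			meta[k] = v // total, truncated, etag, rows, ms, ...
		}
	}
	if tool == "" || len(fields) == 0 {
		return "", nil, nil, fmt.Errorf("header missing tool= or fields=")
	}
	return tool, fields, meta, nil
}

func main() {
	tool, fields, meta, _ := ParseHeader("GCX1 tool=search_symbols fields=id,kind,name rows=3")
	fmt.Println(tool, fields, meta["rows"])
}
```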
After the header, each non-blank, non-comment line is a row of
tab-separated values in the order declared by `fields=`.

- Fewer values than declared fields: missing trailing columns default to `""`.
- More values than declared fields: the decoder returns an error.
- Blank lines between rows are ignored.
Lines beginning with `#` are comments. Comments carry no data; any
intermediary may drop them. The encoder uses them to annotate the
first row of a section (e.g. `# 3 matches`).
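The row rules above can be sketched as follows. `DecodeRow` is a hypothetical helper (escape handling omitted) showing the trailing-column defaulting and the excess-column error:

```go
package main

import (
	"fmt"
	"strings"
)

// DecodeRow maps one physical row onto the declared field list:
// missing trailing columns default to "", extra columns are an error.
func DecodeRow(line string, fields []string) (map[string]string, error) {
	cells := strings.Split(strings.TrimSuffix(line, "\n"), "\t")
	if len(cells) > len(fields) {
		return nil, fmt.Errorf("row has %d cells, schema declares %d", len(cells), len(fields))
	}
	row := map[string]string{}
	for i, f := range fields {
		if i < len(cells) {
			row[f] = cells[i]
		} else {
			row[f] = "" // fewer values than declared fields
		}
	}
	return row, nil
}

func main() {
	row, _ := DecodeRow("sym1\tfunction", []string{"id", "kind", "name"})
	fmt.Println(row["id"], row["kind"], row["name"] == "")
}
```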
A row value may contain the following characters by escaping them:
| Character | Escape |
|---|---|
| `\` (backslash) | `\\` |
| TAB (U+0009) | `\t` |
| LF (U+000A) | `\n` |
Any other \x sequence decodes to the literal byte x so a
pathological payload cannot wedge the decoder. Callers should treat
decoded values as untrusted input.
CR (U+000D) is stripped on encode so Windows CRLF input round-trips as
\n-only output.
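A minimal sketch of these escape rules, assuming exactly the behaviour stated above (`Escape`/`Unescape` are illustrative names, not the gcx-go exports):

```go
package main

import (
	"fmt"
	"strings"
)

// Escape strips CR, then escapes backslash, TAB and LF.
// strings.NewReplacer performs a single simultaneous pass, so the
// backslash replacement cannot double-escape the \t / \n it emits.
func Escape(s string) string {
	s = strings.ReplaceAll(s, "\r", "") // CRLF input round-trips as \n-only
	return strings.NewReplacer(`\`, `\\`, "\t", `\t`, "\n", `\n`).Replace(s)
}

// Unescape decodes \\, \t, \n; any other \x yields the literal x so a
// pathological payload cannot wedge the decoder.
func Unescape(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) {
			i++
			switch s[i] {
			case 't':
				b.WriteByte('\t')
			case 'n':
				b.WriteByte('\n')
			default:
				b.WriteByte(s[i]) // unknown escape -> literal
			}
		} else {
			b.WriteByte(s[i])
		}
	}
	return b.String()
}

func main() {
	fmt.Println(Escape("a\tb\r\nc"))
	fmt.Println(Unescape(Escape("a\tb\nc")) == "a\tb\nc")
}
```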
A GCX1 payload may contain multiple sections concatenated back-to-back.
Each new section begins with its own GCX1 header line. Decoders
detect section boundaries by scanning for the header tag after the
current section's rows are exhausted.
Multi-section is used by:

- `get_callers`, `get_call_chain`, `get_dependencies`, `get_dependents`, `find_implementations`: emit `<tool>.nodes` then `<tool>.edges`.
- `get_editing_context`: emits `target`, `callers`, `dependencies`, `tests` sections.
- `get_repo_outline`: one section per top-level key (`languages`, `communities`, `hotspots`, `most_imported`, `entry_points`).
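Boundary detection reduces to a line scan for the header tag. `SplitSections` below is a hypothetical sketch, not the shipped decoder:

```go
package main

import (
	"fmt"
	"strings"
)

// SplitSections groups a payload's lines into sections: a new section
// starts at every line beginning with the "GCX1 " header tag. Blank
// lines between rows are ignored per the row rules.
func SplitSections(payload string) [][]string {
	var sections [][]string
	for _, line := range strings.Split(payload, "\n") {
		if strings.HasPrefix(line, "GCX1 ") {
			sections = append(sections, []string{line})
		} else if len(sections) > 0 && line != "" {
			cur := len(sections) - 1
			sections[cur] = append(sections[cur], line)
		}
	}
	return sections
}

func main() {
	p := "GCX1 tool=get_callers.nodes fields=id\nn1\nGCX1 tool=get_callers.edges fields=from,to\nn1\tn2\n"
	fmt.Println(len(SplitSections(p)))
}
```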
| field | type | description |
|---|---|---|
| id | string | node ID |
| kind | string | function, method, type, interface, variable, contract |
| name | string | short name |
| path | string | file path |
| line | int | start line |
| sig | string | extracted signature, optional |
Header meta: total, truncated.
| field | type | description |
|---|---|---|
| id | string | |
| kind | string | |
| name | string | |
| path | string | |
| start_line | int | |
| end_line | int | |
| from_line | int | first line of returned source (may precede start_line by context_lines) |
| sig | string | |
| etag | string | content hash for if_none_match caching |
| source | string | full source text, tab/newline-escaped |
Exactly one row.
| field | type |
|---|---|
| id | string |
| kind | string |
| name | string |
| path | string |
| start_line | int |
| end_line | int |
| sig | string |
| source | string |
| error | string |
| field | type | description |
|---|---|---|
| from | string | caller symbol ID |
| to | string | called symbol ID (the query subject) |
| edge_kind | string | calls, references, implements, ... |
| origin | string | tier: lsp_resolved, lsp_dispatch, ast_resolved, ast_inferred, text_matched |
| confidence | float | 0..1 |
| from_name | string | caller short name |
| from_path | string | caller file path |
| from_line | int | caller start line |
| field | type |
|---|---|
| id | string |
| kind | string |
| name | string |
| line | int |
| sig | string |
Header meta: total_nodes, total_edges, truncated, etag.
Two sections: `<tool>.nodes` then `<tool>.edges`.

- `.nodes` fields: `id`, `kind`, `name`, `path`, `line`.
- `.edges` fields: `from`, `to`, `kind`, `origin`, `confidence`, `label`.
Four sections. Fields:

- `.target`: `id`, `kind`, `name`, `path`, `start_line`, `end_line`, `sig`, `etag`. One row.
- `.callers`: `id`, `kind`, `name`, `path`, `line`.
- `.dependencies`: same as `.callers`.
- `.tests`: `path`.
Two sections: `.task` (one row, field `task`) and `.symbols` with
fields `id`, `kind`, `name`, `path`, `line`, `score`, `reason`.
Kind-polymorphic header tag (`analyze.dead_code`, `analyze.hotspots`,
`analyze.cycles`, `analyze.<other>`):

- `analyze.dead_code`: `id`, `kind`, `name`, `path`, `line`, `reason`.
- `analyze.hotspots`: `id`, `name`, `path`, `line`, `fan_in`, `fan_out`, `cross_cut`, `score`.
- `analyze.cycles`: `size`, `severity`, `nodes` (comma-separated).
- Anything else falls through to the generic fallback encoder.
- `contracts.list`: `id`, `type`, `method`, `path`, `service`, `providers`, `consumers` (comma-separated lists).
- `contracts.orphans` (only when `action=check`): `contract_id`, `side`, `repo`, `symbol`.
GCX1 v1 also defines three protocol-level shapes that travel alongside
tool responses: a tool-definitions registry section, a tool-request
envelope, and an error envelope. Every MCP tool definition carries an
explicit scope, so the legality of an inbound call can be decided by
combining that scope with the request's `repo` parameter. All three
shapes are first-class GCX1 sections and must round-trip
byte-identically across gcx-go and gcx-ts.
Section for the per-tool scope registry. Layout:

```
GCX1 tool=tool_definitions fields=name,scope
<name>\t<scope>\n
...
```
- `name` is the MCP tool name (one row per tool).
- `scope` is one of the three string literals `repo`, `workspace`, `fan-out`. Anything else is a schema error in both codecs.
- Rows are emitted in ascending `name` order so the bytes are reproducible regardless of the encoder's input order.
A definition without scope (empty cell, missing column, or unknown
value) is a schema error and both codecs reject it on encode and on
decode.
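The scope check above amounts to a three-literal whitelist. A sketch, with `ValidScope` as a hypothetical name rather than the codec API:

```go
package main

import "fmt"

// ValidScope enforces the registry's schema rule: scope must be exactly
// one of the three literals; anything else (including empty) is a
// schema error on both encode and decode.
func ValidScope(scope string) error {
	switch scope {
	case "repo", "workspace", "fan-out":
		return nil
	}
	return fmt.Errorf("tool_definitions: unknown scope %q", scope)
}

func main() {
	fmt.Println(ValidScope("repo") == nil, ValidScope("") == nil)
}
```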
Envelope for one inbound MCP call. Layout:

```
GCX1 tool=tool_request fields=tool,scope,repo
<tool>\t<scope>\t<repo-cell>\n
```
Exactly one row. The repo cell is a union shape decided by `scope`:

| scope | repo cell |
|---|---|
| `repo` | a non-empty repo name (plain string, e.g. `gortex`) |
| `workspace` | empty string (the `repo` parameter is absent) |
| `fan-out` | a compact JSON-array literal, e.g. `["*"]` or `["gortex","gortex-cloud"]` |
Rationale for the cell encoding choices:

- `scope=repo` → plain string. A single repo name is the most common case and never needs structure; a plain string keeps the cell tokenizer-friendly.
- `scope=workspace` → empty. The `repo` parameter MUST NOT be present for workspace-level tools. The empty cell (already how GCX1 represents an absent column under the "fewer values than declared fields default to empty" rule) is the correct on-wire signal for that absence.
- `scope=fan-out` → compact JSON array. This re-uses the generic-fallback nested-value rule already used elsewhere in GCX1 ("nested values inside a cell serialise to compact JSON"). Callers decode the cell with `JSON.parse` (TypeScript) or `json.Unmarshal` (Go) without learning a new escape format. Alternative encodings considered:
  - Comma-joined string (e.g. `gortex,gortex-cloud`): rejected because some namespaces legitimately contain commas (gRPC method paths, generic type parameters).
  - Repeated cells across multiple rows: rejected because the request envelope is single-row by contract; multi-row would overload the section's identity.
  - Tab-joined string: rejected because tab is the GCX1 column delimiter; any in-cell use would force an escape and break the "tabs never appear in cells" property the format relies on for fast scanning.

  Compact JSON wins on three axes simultaneously: it is unambiguous (every list value round-trips), it composes with the existing generic-fallback rule, and it stays on a single physical line.
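Putting the union rule together, a repo-cell decoder might look like this sketch (`DecodeRepoCell` is a hypothetical helper; the error strings reuse the protocol error codes this spec defines):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DecodeRepoCell interprets the tool_request repo cell according to the
// tool's declared scope, returning the resolved repo list.
func DecodeRepoCell(scope, cell string) ([]string, error) {
	switch scope {
	case "repo":
		if cell == "" {
			return nil, fmt.Errorf("missing_repo") // needs a non-empty name
		}
		return []string{cell}, nil
	case "workspace":
		if cell != "" {
			return nil, fmt.Errorf("repo_not_allowed") // repo must be absent
		}
		return nil, nil
	case "fan-out":
		if cell == "" {
			return nil, fmt.Errorf("missing_repo_list") // ["*"] or a named subset required
		}
		var repos []string
		if err := json.Unmarshal([]byte(cell), &repos); err != nil {
			return nil, fmt.Errorf("wrong_repo_shape: %w", err)
		}
		return repos, nil
	}
	return nil, fmt.Errorf("unknown scope %q", scope)
}

func main() {
	repos, _ := DecodeRepoCell("fan-out", `["gortex","gortex-cloud"]`)
	fmt.Println(repos)
}
```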
The `["*"]` sentinel is a JSON array containing the single literal
string `*`; it is the only legal way to spell "fan out across every
repo in this workspace". Omitting `repo` for a fan-out tool is a
protocol error, surfaced as an error section with code
`missing_repo_list` (see below).
Envelope for protocol-level rejections returned by the server in lieu
of a tool result. Layout:

```
GCX1 tool=error fields=code,message,detail
<code>\t<message>\t<detail>\n
```
Exactly one row. `code` MUST be non-empty; `message` and `detail` are
free-form strings (escape rules apply per the standard table). The
codes defined in GCX1 v1:
| code | when |
|---|---|
| `unknown_repo` | a fan-out request lists a name not present in the active workspace (resolved Q1) |
| `missing_repo_list` | a `scope: fan-out` request omits `repo` in workspace mode |
| `missing_repo` | a `scope: repo` request omits `repo` in workspace mode |
| `repo_not_allowed` | a `scope: workspace` request includes `repo` (any value) |
| `wrong_repo_shape` | the `repo` parameter has the wrong type for the tool's declared scope |
Both codecs expose these as named constants
(`ErrCodeUnknownRepo` / `ERR_CODE_UNKNOWN_REPO`, etc.) so call sites
do not stringly-type the code value.
The fixtures under `gcx-ts/test/golden/scope_*.gcx` cover one fixture
per scope kind (repo, workspace, fan-out with `["*"]`, fan-out with a
named subset) plus the two named protocol-error shapes. The Go-side
gcx-go parity test (`scope_golden_test.go`) re-encodes the same
logical inputs and asserts byte-for-byte equality against the
committed fixtures. Any drift between gcx-go and gcx-ts MUST fail
that test before any other CI step.
Any tool without a hand-tuned encoder routes through the generic
fallback. The fallback inspects the canonical JSON shape:

| Input shape | Output |
|---|---|
| `{}` object | one section, one row, fields = sorted keys |
| `[]` array of objects | one section, one row per element, fields = union of keys (sorted) |
| `[]` array of scalars | one section, field `value`, one row per element |
| scalar | one section, field `value`, one row |
Nested values (arrays / objects) inside a cell serialise to compact
JSON so the cell stays on a single physical line. Decoders may
re-hydrate such cells with `JSON.parse`.
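The shape dispatch in the table above can be sketched as follows. `FallbackFields` is a hypothetical helper that derives only the declared field list, not the full encoding:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// FallbackFields inspects a canonical JSON value and returns the field
// list the generic fallback would declare for its section.
func FallbackFields(raw []byte) ([]string, error) {
	var v interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	switch t := v.(type) {
	case map[string]interface{}: // {} object -> sorted keys
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		return keys, nil
	case []interface{}: // array of objects -> union of keys; scalars -> "value"
		set := map[string]bool{}
		objects := false
		for _, e := range t {
			if m, ok := e.(map[string]interface{}); ok {
				objects = true
				for k := range m {
					set[k] = true
				}
			}
		}
		if !objects {
			return []string{"value"}, nil
		}
		keys := make([]string, 0, len(set))
		for k := range set {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		return keys, nil
	default: // scalar -> single "value" field
		return []string{"value"}, nil
	}
}

func main() {
	f, _ := FallbackFields([]byte(`[{"a":1},{"b":2}]`))
	fmt.Println(f)
}
```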
- The literal header prefix `GCX1` is stable for the lifetime of version 1.
- A decoder that sees a different prefix (e.g., `GCX2`) must treat the payload as unknown and MAY fall back to JSON by re-issuing the MCP call without `format: "gcx"`.
- Field layouts for declared tools are frozen within `GCX1`. Additions ship as `GCX2`; renaming a tool's field set is a breaking change.
- Tab delimiter (not comma): symbol names routinely contain commas (`(int, string)`) and parentheses. Tab is rare in source and absent from identifiers. Escape pressure stays low.
- Newline-terminated rows: tokenizer-friendly and transport-transparent (no binary framing). SSE / chunked HTTP can forward one row per frame without re-parsing.
- Minimal escape alphabet: two-byte `\t` / `\n` / `\\` keeps the hot path cheap. Code payloads rarely contain raw tabs or unescaped backslashes, so escape overhead is a rounding error in practice.
- Header-based metadata: `total`, `truncated`, `etag` live on the header rather than in a per-row phantom column. That keeps the row schema flat and lets the encoder skip meta work when the tool doesn't care.
- Go encoder / decoder: MIT-licensed standalone module at `github.com/gortexhq/gcx-go` (`go get github.com/gortexhq/gcx-go`); header + row + escape primitives + generic fallback. Per-tool hand-tuned encoders live in `internal/mcp/gcx.go`.
- TypeScript decoder: MIT-licensed standalone package at `github.com/gortexhq/gcx-ts` (npm: `@gortex/wire`).
See `bench/wire-format/`. The harness scores bytes, tiktoken tokens,
gzip bytes, and round-trip integrity across 20 representative tool
responses and emits a markdown scorecard. Rerun it after any change to
the upstream gcx-go module or `internal/mcp/gcx.go` to catch
regressions.