Skip to content

fix: derive wire from spec version + fold tasks/MRTR into harness conventions#8

Merged
panyam merged 5 commits into
feat/tasks-mrtr-extensionfrom
fix/derive-wire-from-spec-version
Jun 5, 2026
Merged

fix: derive wire from spec version + fold tasks/MRTR into harness conventions#8
panyam merged 5 commits into
feat/tasks-mrtr-extensionfrom
fix/derive-wire-from-spec-version

Conversation

@panyam

@panyam panyam commented Jun 2, 2026

Copy link
Copy Markdown
Owner

Addresses the review comments on conformance#262 plus a CLI bug where draft-only scenarios silently used the legacy initialize wire instead of SEP-2575, on a single follow-up branch so it can be reviewed in isolation before merging back into PR 262.

What's in here

Wire derivation (fixes a CLI _meta-on-draft bug, addresses comment 3341183243)

  • Drops the wire? field from RunContext. Scenarios derive wire from specVersion via isStateless(ctx) (returns true when the spec version is post-SEP-2575). Now lives in connection/select.ts next to connectFor, sharing the single STATEFUL_VERSIONS source of truth.
  • Extends the CLI's spec-version inference to default extension-keyed scenarios to draft (today every extension scenario in the repo lives there). Without this, the inference stayed on LATEST_SPEC_VERSION for tasks scenarios so the CLI silently emitted the legacy initialize handshake against draft instead of the SEP-2575 per-request _meta envelope.
  • New regression in src/runner/server.test.ts: spins up an in-process recording HTTP server, drives runServerConformanceTest("tasks-lifecycle"), asserts no initialize was sent and the first body carries the SEP-2575 _meta.io.modelcontextprotocol/* envelope.

Layout: stop forking the harness (addresses 3341935238, 3341984677, 3341986870, 3341988987, 3341965202, 3341967601, 3341970447, 3341972921, 3341942597)

  • Deletes scenarios/server/_shared/ entirely. checks.ts + sep-refs.ts + wire-format.ts consolidate into a single top-level scenarios/server/tasks-mrtr-helpers.ts matching the existing input-required-result-helpers.ts pattern.
  • Deletes tasks/all-scenarios.test.ts, mrtr/all-scenarios.test.ts, _shared/test-runner.ts. Per AGENTS.md line 55 ("Don't reimplement the runner"), the existing CLI runner is the supported entry point — npm start -- server --scenario tasks-lifecycle --url <fixture> or --suite pending --url <fixture> to run all of them.
  • Deletes tasks/README.md and mrtr/README.md. Fixture requirements moved into each scenario's description string (addresses 3341953147).

Connection abstraction (addresses 3341919092)

  • Folds _shared/raw-session.ts (692 lines) into connection/stateless.ts. The TS SDK didn't have the Connection abstraction when this PR opened; PR 318 landed June 1 with exactly that, so this is the rebase the review asked for.
  • Connection grows three additions tasks/MRTR scenarios actually use:
    • request(method, params, extraHeaders?) — third positional arg, for the SEP-2243 header-mismatch tests.
    • discover() — lazy on the stateless wire, eager-resolved from SDK accessors on the stateful wire. Returns the bootstrap-shaped {capabilities, serverInfo, instructions}.
    • ConnectOptions { capabilities?, clientInfo? } threaded through RunContext.connect()connectForconnectStateful/connectStateless.
  • The stateless wire does NOT eagerly emit server/discover when capabilities are declared. SEP-2575 says per-request _meta.clientCapabilities is authoritative; servers that gate extensions on a prior server/discover are non-conformant, and the suite's job is to surface that, not paper over it.

MRTR dedupe (addresses 3341960887)

  • 7 of the 8 checks in mrtr/ephemeral-flow.ts substantially overlapped Conformance Tests for SEP-2322 MRTR modelcontextprotocol/conformance#188's InputRequiredResult* scenarios (basic-elicitation, sampling, list-roots, request-state, multiple-input-requests, multi-round, wrong-input-key). They go away with the whole mrtr/ directory.
  • The one genuinely new check, sep-2663-mrtr-synchronous-before-task-creation (MRTR → Tasks composition, SEP-2663 commit 451f5e1), moves to its own tasks scenario at tasks/composition.ts (extensionId = tasks, registered alongside the other tasks-* scenarios), reusing Conformance Tests for SEP-2322 MRTR modelcontextprotocol/conformance#188's helpers (isInputRequiredResult, mockElicitResponse from input-required-result-helpers.ts).

Reviewer's guide

Three commits, each topically self-contained. Reviewing in order matches the historical evolution; reviewing in reverse matches a "what's the final shape" pass.

Commit 6466f05 — wire derivation (the CLI _meta-on-draft fix)

Smallest, most behavioral. Read this first if you only have 10 minutes.

  • src/connection/index.tswire? field dropped from RunContext. Single-line diff, anchors everything downstream.
  • src/runner/server.ts — spec-version inference: extension-keyed scenarios now default to DRAFT_PROTOCOL_VERSION. The actual fix for the CLI bug; subtle because tasks-* scenarios were falling through to LATEST_SPEC_VERSION (a 2025-x stateful version) and silently emitting legacy initialize.
  • src/runner/server.test.ts — regression test pinning the SEP-2575 fingerprint on the CLI path. In-process recording HTTP server, asserts initialize is never sent + first body has _meta. This is the test the bug would regress.
  • The ~10 scenario edits are mechanical (stateless: ctx.wire === 'stateless'stateless: isStateless(ctx)); skim, don't grep.

Commit 3d026d4 — layout refactor + Connection extension

The bulk of the diff. Two distinct sub-changes that landed together because the layout flatten needed Connection to absorb raw-session's surface first.

Connection extension (read first to ground the rest):

  • src/connection/index.ts — new ConnectOptions type, new discover() method, extraHeaders 3rd-arg on request. The whole API surface fits on one screen; the rest of the commit just propagates.
  • src/connection/stateless.ts — biggest behavior file. Key sections: mcpNameForRequest gains tasks/get|update|cancel (uses params.taskId); withRequestMeta now accepts capabilities/clientInfo overrides; connectStateless does lazy memoized server/discover (NOT eager — deliberately).
  • src/connection/stateful.ts — synthesizes discover() from the SDK Client's post-initialize accessors. extraHeaders throws on stateful (no current 2025-x scenario needs it; flag loudly if one shows up).
  • src/connection/select.tsconnectFor now threads ConnectOptions; isStateless(ctx) helper lives here alongside STATEFUL_VERSIONS (single source of truth for the wire decision).
  • src/connection/testing.ts, src/runner/server.ts — thin shims wiring ConnectOptions through.

Layout flatten (read after Connection lands in your head):

Commit aeb450e — MRTR dedupe (the smallest blast radius)

Pure net-negative diff (+190 / −725). Last in the sequence because it required the layout to have landed.

Pressure-test

Things worth a second pair of eyes:

  1. Connection API shape (request(..., extraHeaders?) as a 3rd positional arg; discover() as a method, not a Promise property; ConnectOptions { capabilities?, clientInfo? }). All three were judgment calls — extraHeaders could have been an options bag; discover could have been an awaitable property; ConnectOptions could have grown more fields. Current shape minimizes the diff at call sites at the cost of being slightly less future-proof.
  2. Spec-version inference in src/runner/server.ts — defaulting extension-keyed scenarios to draft is correct today (every extension in the repo lives on draft), but a future non-draft extension would need an explicit lookup. If we should be doing that lookup now (e.g. per-extension map keyed on extensionId), flag it.
  3. Strict-per-spec stateless wire (connectStateless does NOT eagerly emit server/discover when capabilities are declared). Per SEP-2575, per-request _meta.clientCapabilities is authoritative; servers that gate extensions on a prior server/discover are non-conformant. The conformance suite's job is to surface that, not paper over it — but if you'd rather the harness behaved like real-world clients (eager bootstrap when capabilities are declared) and surfaced the gap differently, now's the time to push back.
  4. Helper consolidation in src/scenarios/server/tasks-mrtr-helpers.tserrMsg / failureCheck / skipCheck / AnyResult are generic enough that they could live at a more general top-level location (e.g. scenarios/server/check-helpers.ts). Bundled here because no non-tasks/MRTR scenario needs them today; promote if/when one does.

Test plan

  • npm test — 241 pass (was 253; -12 is the deleted raw-session.test.ts. ~6 were redundant with connection/connection.test.ts, ~6 pinned behavior the current connection parser doesn't have).
  • npm run lint — clean.
  • npm run build — 427 kB bundle.
  • all-scenarios.test.ts (the existing full conformance suite, spawns the bundled everything-server fixture): 47/47 scenarios pass.
  • Regression in runner/server.test.ts pins the CLI's SEP-2575 wire fingerprint so the bug can't silently regress.
  • End-to-end against a SEP-2663 reference fixture: tasks-capability-negotiation passes 4/4 because it explicitly calls await withExt.discover() and registers the session. Other tasks scenarios surface a fixture-side spec gap (per-request _meta ignored without a prior server/discover) — to be filed upstream against the fixture, not patched here.

What's NOT in here

  • --spawn-cmd CLI flag to auto-launch fixture binaries - useful but the harness change be in a follow up PR.

Today's workflow = spawn the fixture in the background, run npm start -- server --suite pending --url ..., kill the fixture.

panyam added 2 commits June 2, 2026 08:02
`RunContext.wire` was a parallel knob to `specVersion` that the tasks
and MRTR harnesses used to loop scenarios across both the legacy
session wire and the SEP-2575 stateless wire. After SEP-2575 collapsed
the legacy wire on DRAFT-2026-v1 (a6f7c27), the matrix shrank to a
single column on draft — and an explicit `wire` field was both
redundant and unsafe: it let callers go out of sync with the
`specVersion` they declared on the wire.

The CLI's `runServerConformanceTest` never set the field, so every
`npm start -- server --scenario tasks-* --url ...` invocation silently
emitted the legacy `initialize` handshake against a draft server,
producing requests with no `_meta.io.modelcontextprotocol/*` envelope.
Strict SEP-2575 servers correctly rejected the traffic; lenient ones
accepted it and masked the gap.

Changes:
- Drop `wire?` from `RunContext`.
- Add `isStateless(ctx)` in `_shared/wire-mode.ts`; scenarios derive
  `stateless` from `POST_SEP_2575_VERSIONS.has(ctx.specVersion)`.
- Extend the CLI's spec-version inference to default extension
  scenarios to draft (today every extension scenario in the repo lives
  there). Without this, the inference stayed on `LATEST_SPEC_VERSION`
  for tasks scenarios and kept emitting legacy traffic.
- Flatten the now-1-element wire loop in the tasks + mrtr harnesses;
  remove the parseWireModes/effectiveWireModes/MCP_WIRE_MODES infra
  that drove it (no remaining consumers).

Adds a vitest regression in `runner/server.test.ts` that spins up an
in-process recording HTTP server, drives `runServerConformanceTest`
at `tasks-lifecycle`, and asserts the first outgoing call is
`server/discover` carrying `_meta.io.modelcontextprotocol/*` — the
SEP-2575 fingerprint — so the CLI wire selection can't silently
regress.
Folds the parallel mini-framework the tasks/MRTR suites had grown into
the existing harness shape pcarleton called out in conformance PR 262.
No new spec coverage; existing scenarios still pass 241/241 unit tests
and the regression added for the wire-derivation fix still pins the CLI
fingerprint.

Layout

  scenarios/server/_shared/ is gone. The helper bundle moves to
  scenarios/server/tasks-mrtr-helpers.ts (matching the existing
  input-required-result-helpers.ts precedent); raw-session.ts and its
  test fold into connection/stateless.ts.

  Per-suite runners + READMEs (tasks/all-scenarios.test.ts,
  mrtr/all-scenarios.test.ts, _shared/test-runner.ts, both READMEs) are
  removed. The scenarios are already registered in
  scenarios/index.ts pendingClientScenariosList, so the CLI runner
  (`npm start -- server --scenario tasks-lifecycle --url ...` or
  `--suite pending --url ...`) is the supported entry point. Per
  AGENTS.md: don't reimplement the runner.

Connection extension

  Connection grows the three things tasks/MRTR scenarios actually need
  beyond the existing surface:

   - `request(method, params, extraHeaders?)` for the SEP-2243
     header-mismatch tests.
   - `discover(): Promise<Record<string, unknown>>` for
     capability-negotiation scenarios. Resolved eagerly from the SDK
     accessors on the stateful wire; issued lazily as `server/discover`
     on the stateless wire and memoized.
   - `ConnectOptions { capabilities?, clientInfo? }` on
     RunContext.connect() so scenarios can negotiate extensions
     (`extensions: { 'io.modelcontextprotocol/tasks': {} }`) at
     bootstrap on either wire.

  Crucially the stateless wire does NOT eagerly emit `server/discover`
  when capabilities are declared. SEP-2575 says per-request
  `_meta.io.modelcontextprotocol/clientCapabilities` is authoritative;
  a server that gates extensions on a prior `server/discover` is
  non-conformant, and the conformance suite's job is to surface that,
  not paper over it.

Scenario migration

  Every tasks/MRTR scenario switches from `initRawSession(...)` to
  `await ctx.connect({ capabilities })`, and the capability scenario
  switches from `session.initializeResult.capabilities` to
  `await withExt.discover()`. The wire choice now flows entirely from
  `specVersion` — `isStateless(ctx)` (moved to connection/select.ts
  next to `connectFor`) is the single source of truth.

Description sweep

  Each scenario's `description` string now spells out the required
  server fixtures (tool names + behavior) directly. Previously that
  content lived only in the file-header comment, so an implementer
  reading the runner's emitted description was missing it.

Reviewer notes

  - 23 files changed, +455 / -291 (net + because the Connection
    abstraction now carries surface that used to live in raw-session;
    the raw-session deletion was 692 lines on its own).
  - Unit tests: 241 pass (the 12 deleted tests were redundant with
    connection-level coverage; the genuinely-unique SSE multi-line
    `data:` test pinned behavior the connection parser doesn't have).
  - Reference fixture runs: tasks scenarios surface a fixture-side
    spec gap (per-request `_meta` not honored without a prior
    `server/discover`). That's the suite working; the gap will be
    filed upstream against the fixture, not papered over here.
…old composition into tasks/

Of the 8 checks in mrtr/ephemeral-flow.ts, 7 substantially overlapped
the SEP-2322 InputRequiredResult* scenarios that modelcontextprotocol#188 already shipped
(basic-elicitation, sampling, list-roots, request-state, multiple-input-
requests, multi-round, wrong-input-key). Only sep-2663-mrtr-synchronous-
before-task-creation — the SEP-2663 commit 451f5e1 composition path
where the MRTR loop's final round returns a CreateTaskResult — is
genuinely new versus modelcontextprotocol#188's coverage, and it's SEP-2663-specific, not
SEP-2322-specific.

Moves that one check to its own tasks scenario in tasks/composition.ts
(extensionId = io.modelcontextprotocol/tasks, registered alongside the
other tasks-* scenarios in scenarios/index.ts) and deletes the entire
mrtr/ directory.

Reuses the existing modelcontextprotocol#188 helpers (isInputRequiredResult, mockElicitResponse
from input-required-result-helpers.ts) rather than reintroducing the
parallel mrtr/helpers.ts ones.
Comment thread src/runner/server.test.ts
Comment on lines +80 to +82
} catch {
// non-JSON body — record nothing
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't hitting this be a significant issue?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True - when I doing this - the catch body was meant to ignore things if NOT json - thougth this would be fine but you are right. Even though the harness is only supposed to ever send JSON RPC bodies, the rare case where it may not be shoudnt be suppressed.

Comment thread src/connection/stateless.ts Outdated
Comment on lines +417 to +419
const response = await send('server/discover');
drainEvents(response);
return unwrap<Record<string, unknown>>('server/discover', response);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just reduces to request('server/discover'), doesn't it?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep was on a different train of thought when i was doing this. But that didnt stick so shouldnt have been here. Simplified it now.

Comment on lines -8 to 21
it('returns stateful for dated 2025-x versions', () => {
expect(connectFor('2025-03-26')).toBe(connectStateful);
expect(connectFor('2025-06-18')).toBe(connectStateful);
expect(connectFor('2025-11-25')).toBe(connectStateful);
// connectFor now always wraps the underlying impl in a closure (to thread
// ConnectOptions through), so identity with connectStateful/connectStateless
// no longer holds; assert wire-shape behaviour in the dedicated suites
// below and via stateless.test.ts.
it('returns a function for dated 2025-x versions', () => {
expect(typeof connectFor('2025-03-26')).toBe('function');
expect(typeof connectFor('2025-06-18')).toBe('function');
expect(typeof connectFor('2025-11-25')).toBe('function');
});
it('returns stateless for the draft version', () => {
// connectFor wraps connectStateless in a closure (to pass the spec
// version through), so identity with connectStateless no longer holds;
// assert it did not select the stateful implementation. The wire-level
// behaviour of the wrapper is covered in stateless.test.ts.
it('returns a function for the draft version', () => {
expect(typeof connectFor('DRAFT-2026-v1')).toBe('function');
expect(connectFor('DRAFT-2026-v1')).not.toBe(connectStateful);
expect(connectFor('DRAFT-2026-v1')).not.toBe(connectStateless);
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks tautological? It's not clear to me how this could fail, and what it would tell us if it does.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Removed it

panyam added 2 commits June 2, 2026 17:21
- connectStateless.discover() now reuses the request() pipeline (send +
  drainEvents + unwrap) instead of inlining a duplicate copy. The local
  memoization stays via discoverPromise ??=.
- runner/server.test.ts mock server no longer silently swallows
  unparseable bodies. They land in a parseFailures sidecar that the
  test asserts is empty — a non-JSON body to this mock is exactly the
  kind of regression the test exists to catch.
- Drop the tautological connectFor identity tests. They asserted
  `typeof === 'function'` and `not.toBe(connectStateful)`, both of which
  hold trivially once connectFor wraps in a closure. Wire-shape behavior
  is already covered by the connectStateless mock-fetch suite and by
  runner/server.test.ts.
Conflicts in src/connection/{select.ts, connection.test.ts}.

- select.ts: keep the ConnectOptions threading from this branch and the
  isStatefulVersion / ALL_SPEC_VERSIONS / STATELESS_SPEC_VERSIONS
  helpers introduced upstream for the version-aware MockServer. connectFor
  now goes through isStatefulVersion while still forwarding opts; isStateless
  uses the same helper.
- connection.test.ts: keep upstream's STATELESS_SPEC_VERSIONS assertions
  (genuine round-trip against isStatefulVersion); drop the old connectFor
  identity test that this branch already removed as tautological once
  connectFor wraps in a closure.

Full vitest run: 267/267 pass; tsc clean.
@panyam panyam merged commit c05f9e6 into feat/tasks-mrtr-extension Jun 5, 2026
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants