Skip to content

fix(agent): refuse to register a CG that can never satisfy global quorum#374

Open
branarakic wants to merge 2 commits intomainfrom
fix/register-cg-quorum-preflight
Open

fix(agent): refuse to register a CG that can never satisfy global quorum#374
branarakic wants to merge 2 commits intomainfrom
fix/register-cg-quorum-preflight

Conversation

@branarakic
Copy link
Copy Markdown
Contributor

Summary

DKGAgent.registerContextGraph now refuses on-chain registration when the proposed CG configuration cannot satisfy ParametersStorage.minimumRequiredSignatures (the global quorum floor).

This closes the operator footgun observed during the Base Sepolia cgId=10 → cgId=11 incident, where running dkg context-graph register <local-name> against a CG with no recorded participantIdentityIds in _meta silently fell back to [selfIdentityId] + requiredSignatures = 1, then succeeded on-chain because ContextGraphStorage.createContextGraph only enforces requiredSignatures <= hostingNodes.length — it never reads the global minimum. The minimum-signatures check is deferred to KnowledgeAssetsV10.publishDirect, so every publish to that CG reverts forever with MinSignaturesRequirementNotMet. The on-chain CG is paid-for, persistent, and permanently broken; the operator only finds out at publish time.

Repro chain (pre-PR)

  1. dkg context-graph create my-cg --name 'My CG' (no --participants).
  2. dkg context-graph register my-cg succeeds, mints CG #N on-chain with hostingNodes=[selfIdentityId], requiredSignatures=1.
  3. dkg publish ... to that CG reverts with MinSignaturesRequirementNotMet(3, 1) (assuming production-typical minimumRequiredSignatures = 3).
  4. CG #N is now wasted gas + a confused operator.

After this PR

Step 2 throws synchronously with one of:

  • `Context graph "my-cg" cannot be registered on-chain: it has 1 hosting node but the global minimum quorum (ParametersStorage.minimumRequiredSignatures) is 3. Recreate the CG with at least 3 hosting nodes — otherwise every publish would revert with MinSignaturesRequirementNotMet(3, 1).`
  • `Context graph "my-cg" cannot be registered on-chain: requiredSignatures=1 is below the global minimum quorum of 3. Recreate the CG with --required-signatures 3 (or higher) before registering — otherwise every publish would revert with MinSignaturesRequirementNotMet(3, 1).`

The silent [self] fallback is preserved (some single-host dev/test flows depend on it) but is now surfaced as a structured WARN log so operators can grep for it. The new check is gated on chain.getMinimumRequiredSignatures being implemented, so legacy/mock adapters that don't expose it are unaffected.

Why pre-flight (vs let chain revert)

The chain revert is the symptom, not the root cause. With pre-flight:

  • No wasted gas on a CG that can never publish.
  • Error message names the future revert explicitly + suggests the fix.
  • Operator notices at register time (interactive), not at publish time (typically minutes/hours/days later in a different shell, often during incident response).

Test plan

  • pnpm --filter @origintrail-official/dkg-agent build clean.
  • New packages/agent/test/register-quorum-preflight.test.ts (5 tests) all green:
    • refuses silent [self] fallback when globalMin=3
    • refuses explicit participants below globalMin
    • refuses requiredSignatures below globalMin
    • accepts when both satisfy globalMin
    • preserves legacy single-host behaviour when globalMin=1
  • Existing Genesis Knowledge registration tests in agent.test.ts (which use MockChainAdapter with default minimumRequiredSignatures=1) all pass — no regression.
  • Pre-existing per-cg-quorum-extra.test.ts failure verified unrelated (intentional RED spec anchor for production bug A-5; identical failure on unmodified main).

Out of scope

  • Adding a --solo opt-out flag to allow registering a single-host CG when globalMin > 1 (no use case yet).
  • Surfacing the same check at dkg context-graph create time (the create call is purely local — the user isn't paying gas yet, and the existing requiredSignatures > participantCount check there already covers the most common typo).

Made with Cursor

Comment thread packages/agent/src/dkg-agent.ts Outdated
// implement `getMinimumRequiredSignatures` (mock/legacy adapters): the
// existing on-chain create call is still subject to its own constraints,
// and we don't want to regress test fixtures that use a permissive mock.
if (typeof this.chain.getMinimumRequiredSignatures === 'function') {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: This guard only protects registerContextGraph(). The daemon/API path that calls registerContextGraphOnChain() directly still accepts participantIdentityIds/requiredSignatures combinations below minimumRequiredSignatures, so clients can still mint a context graph that will always fail on publish. Move this validation into registerContextGraphOnChain() (or a shared helper used by both entry points) and add coverage for the direct registration path.

Comment thread packages/agent/src/dkg-agent.ts Outdated
let globalMin: number;
try {
globalMin = await this.chain.getMinimumRequiredSignatures();
} catch (err) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Swallowing getMinimumRequiredSignatures() failures reintroduces the exact footgun this PR is trying to prevent. A transient eth_call/RPC failure here can still be followed by a successful createContextGraph transaction, leaving the operator with a permanently unpublishable CG. For adapters that expose this getter, fail closed instead of logging and continuing; if tests need to bypass the check, make that an explicit opt-in rather than the default on read errors.

Branimir Rakic and others added 2 commits May 4, 2026 14:20
Operator footgun observed during the Base Sepolia cgId=10/cgId=11 incident:
running `dkg context-graph register <local-name>` against a CG with no
recorded `participantIdentityIds` in `_meta` silently fell back to
`[selfIdentityId]` plus `requiredSignatures = 1`, then succeeded on-chain
because `ContextGraphStorage.createContextGraph` only enforces
`requiredSignatures <= hostingNodes.length` — it never reads
`ParametersStorage.minimumRequiredSignatures`. The minimum-signatures
check happens later, inside `KnowledgeAssetsV10.publishDirect`, so every
publish to that CG reverts forever with `MinSignaturesRequirementNotMet`
and the on-chain CG is paid-for and permanently broken.

Pre-flight in `DKGAgent.registerContextGraph`:
- When `chain.getMinimumRequiredSignatures` is implemented, refuse when
  `effectiveParticipantIdentityIds.length < globalMin` or
  `effectiveRequiredSignatures < globalMin`, with explicit hints naming
  the future revert (`MinSignaturesRequirementNotMet(globalMin, X)`).
- Surface the silent `[self]` fallback as a structured WARN log so
  operators can scrape for it instead of finding out at publish time.
- Skip the check when the adapter does not expose
  `getMinimumRequiredSignatures` (mock/legacy adapters), keeping existing
  fixtures green.

`MockChainAdapter.minimumRequiredSignatures` defaults to 1, so the check
is a no-op for the existing agent-test fixtures (verified locally:
Genesis Knowledge registration tests still pass).

Adds 5 focused tests covering all four branches plus the legacy
single-host backwards-compatible case.

Co-authored-by: Cursor <cursoragent@cursor.com>
…um gate

Codex review on PR #374 flagged two real bugs in the global-quorum
pre-flight added in the previous commit:

1. The validation only ran inside `registerContextGraph(id, opts)`. The
   daemon's `/api/context-graph` route calls `registerContextGraphOnChain`
   directly (`packages/cli/src/daemon/routes/context-graph.ts:429`) and
   therefore bypasses the check entirely — an HTTP client could still
   mint a permanently-broken CG by submitting bad params at the lower
   entry point. The footgun the PR was supposed to close was only
   half-closed.

2. The `try { getMinimumRequiredSignatures() } catch { warn-and-continue }`
   shape silently skipped the pre-flight on a transient RPC failure,
   after which the chain's own `createContextGraph` would happily mint
   the CG (it doesn't check the global floor) and every subsequent
   publish would revert. That is exactly the bug the pre-flight exists
   to prevent: a single flaky `eth_call` reintroduces the footgun.

Refactor the validation into a private `assertGlobalQuorumOrThrow`
helper, called from both `registerContextGraph` (preserves existing
test messages, gives operators a nicer error with the local CG `id`
and an early failure before the lazy curator-stamping work) AND
`registerContextGraphOnChain` (closes the daemon-HTTP bypass). Belt +
braces; the lower-level chokepoint is the security boundary that all
callers must traverse, the upper-level call is the better UX surface.

Switch the RPC-error branch from `warn-and-continue (globalMin = 0)` to
`throw`. Skip silently only when the adapter does not implement
`getMinimumRequiredSignatures` at all — i.e. legacy / pre-V10 mock
adapters that don't model the floor. This keeps the existing test
fixtures working without re-introducing the swallowed-error footgun
on real chain adapters.

Tests:
- Existing 5 high-level tests still pass.
- 3 new tests assert the bypass is closed (call `registerContextGraphOnChain`
  directly with bad hostingNodes / requiredSignatures combos and assert
  it throws before any chain mutation).
- 2 new tests assert fail-closed semantics (override `getMinimumRequiredSignatures`
  to reject; assert both the high-level and direct-call paths throw with
  a clear "Refusing to proceed (fail-closed)" message instead of silently
  proceeding).

Verified by sabotage: removing the inner-call validation makes exactly
the 3 new direct-call + RPC-error tests fail, leaving the original 7
passing — proving the new tests genuinely catch the bugs, not just the
happy path.

Co-authored-by: Cursor <cursoragent@cursor.com>
@branarakic branarakic force-pushed the fix/register-cg-quorum-preflight branch from 23a1ef1 to 74202c1 Compare May 4, 2026 12:32
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex review completed — no issues found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant