Skip to content

Fix edge publishes with dynamic core ACK quorum#405

Open
branarakic wants to merge 1 commit intomainfrom
codex/fix-edge-cg-dynamic-core-quorum
Open

Fix edge publishes with dynamic core ACK quorum#405
branarakic wants to merge 1 commit intomainfrom
codex/fix-edge-cg-dynamic-core-quorum

Conversation

@branarakic
Copy link
Copy Markdown
Contributor

Summary

  • Allow edge-owned context graph registration without fixed hostingNodes or legacy quorum coupling.
  • Validate V10 publish ACK quorum against active sharding-table core identities instead of stake-only eligibility.
  • Add preferred publish gateway config/CLI/HTTP support, gateway publisher signatures, and PCA/paymaster validation.
  • Advertise publish-gateway metadata from core profiles and auto-subscribe cores to public CGs they discover.

Verification

  • Local isolated devnet smoke:
    • 5 nodes: 4 cores + 1 identity-less edge.
    • Restored minimumRequiredSignatures to 3.
    • Edge registered an empty-host public CG on chain.
    • All 4 cores auto-discovered/subscribed.
    • Edge published 10 KCs through gateway core identity 1.
    • Each publish collected 3 core ACKs and confirmed on chain.
    • Produced value was credited to the gateway identity.
  • pnpm -r --filter @origintrail-official/dkg-core --filter @origintrail-official/dkg-storage --filter @origintrail-official/dkg-query --filter @origintrail-official/dkg-publisher --filter @origintrail-official/dkg-chain --filter @origintrail-official/dkg-epcis --filter @origintrail-official/dkg-random-sampling --filter @origintrail-official/dkg-agent --filter @origintrail-official/dkg-adapter-openclaw --filter @origintrail-official/dkg-adapter-hermes --filter @origintrail-official/dkg-mcp run build
  • pnpm --dir packages/publisher exec vitest run test/publish-gateway-handler.test.ts
  • pnpm --dir packages/evm-module exec hardhat test --network hardhat --config hardhat.node.config.ts test/unit/ContextGraphStorage.test.ts test/unit/ContextGraphs.test.ts
  • pnpm --dir packages/evm-module exec hardhat test --network hardhat --config hardhat.node.config.ts --grep "empty-host CG|ACK from an operational key|raising quorum|duplicate identity ids|lowering quorum" test/unit/KnowledgeAssetsV10-extra.test.ts

Notes

  • Full pnpm --filter @origintrail-official/dkg-evm-module test:unit still includes unrelated intentional red audit/spec-gap cases, including Hub event/auth expectations, RandomSampling auth expectations, and the KAV10 KnowledgeBatchCreated spec-gap test.
  • pnpm --dir packages/cli exec tsc --noEmit is still blocked by unrelated Hermes ChatMemoryManager type errors in packages/cli/src/daemon/hermes.ts and packages/cli/src/daemon/routes/hermes.ts.

@branarakic branarakic force-pushed the codex/fix-edge-cg-dynamic-core-quorum branch from d0fc724 to 9673062 Compare May 5, 2026 13:22
Comment thread packages/chain/src/evm-adapter.ts Outdated
);
}
const registeredAccountId = BigInt(
await this.contracts.publishingConvictionAccount.agentToAccountId(txSigner.address),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: This preflight is checking agentToAccountId on PublishingConvictionAccount, but that getter exists on DKGPublishingConvictionNFT and KnowledgeAssetsV10.publish() also resolves the paying account through the NFT contract. As written, any gateway publish that sets publisherConvictionAccountId will fail before broadcast (or validate the wrong account model). Please switch this lookup to the same NFT-backed registration source that publish() uses.

Comment thread packages/agent/src/dkg-agent.ts Outdated
// Do not force an edge-owned CG to mint or reuse a node identity just to
// satisfy legacy ContextGraphStorage.hostingNodes metadata.
const effectiveParticipantIdentityIds = participantIdentityIds;
const effectiveRequiredSignatures = Number.isInteger(storedRequiredSignatures) && storedRequiredSignatures >= 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Allowing requiredSignatures to persist as 0 here creates on-chain context graphs that the existing verify flow cannot use. DKGAgent.verify() still treats the chain value as a strictly positive quorum and throws when getContextGraphConfig().requiredSignatures < 1 (dkg-agent.ts:5692-5709). Either keep the old floor here or update the verify path in the same PR so zero-quorum CGs remain operable.

}
}

if (this.config.isSignerRegistered) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: The gateway only checks isSignerRegistered() before signing. After this PR, KnowledgeAssetsV10._verifySignature() also requires the publisher identity to be an active sharding-table core node, so a core that has dropped out of the active set will continue handing out signatures that deterministically revert on-chain. Please gate gateway signing on active-core membership as well (or reuse the same identity check the ACK path uses).

@branarakic branarakic force-pushed the codex/fix-edge-cg-dynamic-core-quorum branch from 9673062 to 9ce2e42 Compare May 5, 2026 13:29
Comment thread packages/cli/src/cli.ts Outdated
}

function resolvePublishGatewayForCommand(opts: ActionOpts, stored?: PublisherGatewayConfig): PublisherGatewayConfig | undefined {
if (!opts.publishGatewayPeer && !opts.pcaAccount && !opts.paymaster) return undefined;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: This early return makes the saved publisher.gateway config effectively dead unless the user also passes one of the per-command flags. After dkg publisher gateway set ..., a plain dkg shared-memory publish still publishes without the gateway. Fall back to stored here and treat CLI flags as overrides, rather than requiring a flag to opt into the saved config.

Comment thread packages/agent/src/dkg-agent.ts Outdated
pcaAccountId: this.config.publishGateway?.pcaAccountId,
paymaster: this.config.publishGateway?.paymaster,
isSignerRegistered: async () => {
const isOperationalWalletRegistered = this.chain.isOperationalWalletRegistered;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: The gateway liveness check only verifies that this wallet is still registered for the identity. KnowledgeAssetsV10._verifySignature() now also requires the publisher identity to be an active sharding-table core, so a node that has been unstaked/evicted will keep signing gateway responses and every publish will fail later on-chain. Wire this through verifyACKIdentity (or an equivalent active-core check) instead of isOperationalWalletRegistered alone.

Comment thread packages/chain/src/mock-adapter.ts Outdated
);
}
if (
params.publisherConvictionAccountId !== undefined &&
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: The mock only checks that the requested conviction account exists, but the real EVM adapter also verifies that the transaction signer is registered to that account via agentToAccountId(txSigner.address). That means mock-backed tests can now pass for a gateway publish that will revert on a real chain when the sender is bound to a different account. Mirror the signer/account binding check here so tests keep parity with production behavior.

@branarakic branarakic force-pushed the codex/fix-edge-cg-dynamic-core-quorum branch 4 times, most recently from 60ffdcc to dd909bd Compare May 5, 2026 13:42
this.config = config;
}

handler = async (data: Uint8Array, _peerId: PeerId): Promise<Uint8Array> => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: The requester identity is ignored here, so any connected peer can ask this core to sign publisher digests. When paymaster is configured, an untrusted peer can get free publishes charged to that paymaster on open context graphs, and even without a paymaster they can still attribute arbitrary produced value to this node. Gate publish-gateway requests on an explicit allowlist or another authorization check tied to peerId before signing.

randomSamplingWalPath: config.randomSampling?.walPath,
randomSamplingTickIntervalMs: config.randomSampling?.tickIntervalMs,
randomSamplingUseWorkerThread: config.randomSampling?.useWorkerThread,
publishGateway: normalizePublisherGatewayConfig(config.publisher?.gateway),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: config.publisher.gateway is created by dkg publisher gateway set as the preferred remote gateway, but this line now feeds it into DKGAgentConfig.publishGateway, which is also used for the local handler/profile on core nodes. A core that sets an upstream gateway will start advertising and serving a local gateway with the upstream PCA/paymaster settings. Split outbound gateway preference from local gateway capability instead of reusing the same config object.

if (typeof gateway.peerId !== 'string' || gateway.peerId.trim().length === 0) {
throw new Error('"publishGateway.peerId" is required');
}
if (typeof gateway.nodeIdentityId !== 'string' && typeof gateway.nodeIdentityId !== 'number' && typeof gateway.nodeIdentityId !== 'bigint') {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: Accepting numeric nodeIdentityId/pcaAccountId values is unsafe here. JSON numbers above Number.MAX_SAFE_INTEGER are already rounded before BigInt(String(value)) runs, so large on-chain IDs can be silently corrupted. Require these fields to be strings (or explicitly reject unsafe integers) to avoid misrouting publishes to the wrong identity/account.

@branarakic branarakic force-pushed the codex/fix-edge-cg-dynamic-core-quorum branch from dd909bd to bfa8214 Compare May 5, 2026 15:34
this.config = config;
}

handler = async (data: Uint8Array, _peerId: PeerId): Promise<Uint8Array> => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: _peerId is ignored here, so any connected peer can request a valid publisher signature from this core. When a paymaster is configured, that also gives arbitrary peers a way to route their own publish txs through that paymaster, because the only check below is “is this paymaster valid?”, not “is this requester allowed to use it?”. Gate this protocol on the requesting peer / publisher identity before returning a signature.

Comment thread packages/publisher/src/dkg-publisher.ts Outdated
`Publish gateway did not confirm requested PCA account ${options.publishGateway.pcaAccountId}`,
);
}
if (gatewaySignature.pcaAccountId !== undefined) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: this accepts pcaAccountId and paymaster from the gateway response even when the caller did not request them. A buggy or compromised gateway can silently switch the publish onto a conviction-account or paymaster path the publisher never opted into. Only apply these optional fields when they were explicitly requested in options.publishGateway, otherwise reject unsolicited values.

if (requiredSignatures == 0 || requiredSignatures > hostCount) {
revert KnowledgeAssetsLib.InvalidContextGraphConfig("invalid M/N threshold");
}
_contextGraphs[contextGraphId].requiredSignatures = requiredSignatures;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: removing the quorum validation here makes the contract accept requiredSignatures values that the shipped clients still treat as invalid. The agent verify path throws on < 1 and the daemon/API still rejects zero or inconsistent thresholds, so a CG updated here can become unverifiable/unmanageable through this repo’s own tooling. Either keep the storage invariant, or update the off-chain consumers in the same PR.

@branarakic branarakic force-pushed the codex/fix-edge-cg-dynamic-core-quorum branch from bfa8214 to 51478d4 Compare May 5, 2026 16:30
}
}

if (paymaster) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: A configured paymaster is still usable when allowedPeers is empty, so any connected peer can turn this core into an open sponsor endpoint. The docstring above already calls that out as unsafe. Fail closed here (or in the constructor) whenever a paymaster is configured without a non-empty allowlist.

Comment thread packages/cli/src/cli.ts
pcaAccount?: string;
paymaster?: string;
}): PublisherGatewayConfig {
return {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: This accepts both --pca-account and --paymaster, but the new publish path rejects that combination later in DKGPublisher/EVMChainAdapter. Right now the CLI can persist an impossible gateway config and the user only finds out at publish time after the network round-trip. Reject the combination when building/saving the config.

Comment thread packages/agent/src/dkg-agent.ts Outdated
nodeRole: this.config.nodeRole ?? 'edge',
nodeRole: role,
nodeIdentityId: nodeIdentityId > 0n ? nodeIdentityId : undefined,
publishGateway: role === 'core' && nodeIdentityId > 0n
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: This advertises publish-gateway support for every core with a non-zero identity, even though the handler is only registered after attemptStorageACKRegistration() succeeds and it is explicitly unregistered when that signer drops off-chain. Peers can discover a gateway from the profile and then hit a missing protocol/timeout. Tie the profile flag to the actual handler-registration state instead of role && nodeIdentityId > 0.

@branarakic branarakic force-pushed the codex/fix-edge-cg-dynamic-core-quorum branch from 51478d4 to 7bd3a1b Compare May 5, 2026 16:51
Comment thread packages/agent/src/dkg-agent.ts Outdated
);
},
}, this.eventBus);
const isGatewayActiveCore = async () => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: This PR makes on-chain signature validity depend on active sharding-table membership, but the StorageACKHandler configured just above still only re-checks operational-wallet registration before signing. Once a core is evicted from ShardingTableStorage, it will keep emitting ACKs until restart, and publishers can collect a quorum that now deterministically reverts on-chain with ACK signer is not an active core node. Pass the same active-core probe into StorageACKHandler (and unregister/refuse on false) instead of applying it only to the publish-gateway path.

if (hostingNodes.length == 0) {
revert KnowledgeAssetsLib.InvalidContextGraphConfig("empty hosting nodes");
}
if (hostingNodes.length > MAX_HOSTING_NODES) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: This relaxes the on-chain invariants for edge-owned context graphs, but the supported create entrypoints still enforce the old ones. /api/context-graph/create currently rejects requiredSignatures > participantIdentityIds.length, so a CLI/daemon client still cannot create the empty-host / legacy-quorum shape this PR is enabling here. Either update the request validation in the same PR or keep the old invariant until the off-chain create flow accepts it.

if (requiredSignatures == 0 || requiredSignatures > hostCount) {
revert KnowledgeAssetsLib.InvalidContextGraphConfig("invalid M/N threshold");
}
_contextGraphs[contextGraphId].requiredSignatures = requiredSignatures;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: Allowing requiredSignatures = 0 here creates a context-graph state that current clients cannot consume. The agent-side verify flow still treats getContextGraphConfig().requiredSignatures < 1 as invalid and throws, so any graph updated to zero quorum becomes unverifiable unless callers manually override the value. Keep zero invalid here, or normalize the legacy 0 case across the agent/CLI before merging.

@branarakic branarakic force-pushed the codex/fix-edge-cg-dynamic-core-quorum branch from 7bd3a1b to d369470 Compare May 6, 2026 00:25
Reject unsolicited PCA/paymaster from publish-gateway responses unless the
caller requested them; strip the same fields in the agent gateway provider
as defense in depth. Add `dkg publisher local-gateway set|clear|status`
for publisher.localGateway (handler advertisement).

Co-authored-by: Cursor <cursoragent@cursor.com>
@branarakic branarakic force-pushed the codex/fix-edge-cg-dynamic-core-quorum branch from d369470 to fd686d8 Compare May 6, 2026 00:29
'chain adapter cannot read conviction accounts',
);
}
const account = await this.config.getConvictionAccountInfo(pcaAccountId);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: getConvictionAccountInfo() is now a generic lookup that falls back to legacy PublishingConvictionAccount, but this gateway path later drives KnowledgeAssetsV10.publish() and the adapter only accepts NFT-backed agentToAccountId accounts. That means a core can confirm/advertise a legacy PCA id that will always fail at broadcast time. Please validate against the NFT-specific publish-conviction mapping here (or plumb a dedicated chain check) instead of the generic account lookup.

contextGraphId,
merkleRoot,
);
const signature = ethers.Signature.from(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: this signs with signerWallet after only checking isActiveCore. If the operational key is rotated off nodeIdentityId while the identity stays active, /publish-gateway will keep returning signatures that the edge rejects in createPublishGatewayProvider() as an unregistered wallet. Add the same live signer-registration check/callback that StorageACKHandler uses and stop serving the protocol once it fails.

// been torn down because the ACK signer is no longer confirmed
// on-chain — peers would discover a gateway and then hit a
// missing-protocol / timeout.
publishGateway: role === 'core' && nodeIdentityId > 0n && this.publishGatewayHandlerActive
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: tying profile advertisement to publishGatewayHandlerActive only affects the next publishProfile() call, but the daemon publishes the profile once at startup. If the handler comes up later via the retry loop, or is torn down after signer eviction, peers will keep seeing stale discovery data. Republish the profile when this flag flips so the advertised gateway state matches runtime reality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant