continuum transfunctioner: TSS peer mesh with keygen, signing, resharing#796
Open
marcopeereboom wants to merge 124 commits into main
marcopeereboom added a commit that referenced this pull request on Mar 17, 2026
Remove local filesystem replace directive — CI has no access to /home/marco/Documents/src/x/tss-lib. Resolve to the pushed commit on origin/max/tss_changes (30339d0b0ce1). Bump go directive from 1.25.0 to 1.26.0 to match main (577d577). CI runs GOTOOLCHAIN=local with go 1.25.4 which refuses modules requiring >= 1.26. Remove stale nolint:prealloc directive — golangci-lint v2 dropped the prealloc linter. Add missing trailing newline to preparams.json fixture files. Add CHANGELOG entry for #796.
Bump hemilabs/x/tss-lib/v2 to Max's security fork (112 audit fixes, SSID domain separation, ReceiverID binding, deterministic protobuf, secret zeroing). Wire SetCeremonyID with the 32-byte CeremonyID and SetSSIDNonce with 0 (attempt counter) in Keygen, Sign, and both Reshare party constructors. CeremonyID gives per-instance uniqueness; the nonce field is Max's retry attempt counter. Add threshold validation in tss.go (Keygen, Sign) and in the RPC integration test helpers before calling NewParameters. The fork panics on invalid threshold/partyCount; we validate early and return a clean error. NOTE: go.mod has a temporary replace directive pointing at the local x repo. Remove after pushing the x commit and running go get with the real hash.
Rewrite Keygen() and Sign() to use the pure round functions from tss-lib instead of the channel-based NewLocalParty + goroutine pump pattern. Each ceremony gets a single buffered inCh for inbound messages. HandleMessage delivers parsed messages to inCh; the ceremony driver (Keygen/Sign) reads with select on ctx.Done(). No pump goroutine, no outCh/errCh/endCh. Add msgBuf to handle message reordering: faster peers may send round N+1 messages before the local node finishes round N. Messages that don't match the current round's accept filter are buffered and drained on the next round. Delete pumpMessages (dead code — keygen/sign no longer use it). Reshare still uses the channel-based path (pending conversion). tss_round.go: msgBuf, sendRound helpers (157 lines). tss.go: +459/-125 lines (net +334).
Convert tssImpl.Reshare from channel-based tss-lib LocalParty instances to explicit round-function calls (ReshareRound1-5), completing the pattern established by keygen/sign in 45d1762.
Production code:
- ceremony struct: remove party, outCh, errCh, oldParty, oldKeyToID, newKeyToID; ceremony lifecycle uses ctx/cancel derived from caller context (no termination channels)
- Reshare(): 5-round driver with msgBuf.collect gated on committee membership (old-only nodes skip new->new message collection)
- HandleMessage(ctx, ...): ctx threaded through interface and all callers; channel sends select on ctx.Done() + c.ctx.Done()
- sendReshareRound(): new helper encodes committee flags from MessageRouting and routes to both committee PID sets
- Delete handleReshareMessage() and pumpReshareMessages()
- FillBytes for pubkey encoding (X/Y padded to 32 bytes)
Server fixes:
- handle(): goroutine watches sessionCtx.Done() and closes transport to unblock ReadEnvelope on shutdown
- deleteSession/deleteAllSessions: demote close errors to Debug (double-close during shutdown is expected)
- connectRandom: dial gap-many shuffled candidates per maintain cycle instead of one random pick (fixes 100-node convergence)
Tests:
- Delete tss_transport_test.go (channel-based, redundant with RPC)
- Delete rpc_integration_test.go; port 3 unique error-path tests and 2 fuzz tests to rpc_tss_test.go
- Rewrite rpc_tss_test.go: test nodes use production tssImpl via rpcTransportAdapter over encrypted TCP; all 11 tests preserved
- All context.Background() in test code replaced with t.Context()
- All ceremony struct literals in tests carry ctx/cancel
- TestHundredNodeMesh: set InitialPingTimeout=30s, increase convergence timeouts to 60s (prevents chain link kills under CPU contention)
- .golangci.yaml: replace-local: true for tss-lib fork
Update all imports from tss-lib/v2 to tss-lib/v3. The v3 module deletes the channel-based Party/Round/BaseUpdate API and retains only the pure round function API that continuum already uses. tss_examples: move old v2 channel-based examples to testdata/v2_channel_reference/ as documentation (does not compile against v3). Add v3_reference_test.go demonstrating keygen+sign using the round function API.
Wire format byte 0 (message type) and byte 1 (committee flags) were sharing the wireFlag prefix and colliding at 0x01. Split into two namespaces: msgTypeP2P/msgTypeBroadcast for byte 0, cflagToOld/cflagToNew/cflagFromNew for byte 1. Add maxWireDataLen (16 MiB) bounds check before the allocation in sendReshareRound (CodeQL integer-overflow finding). Name remaining bare literals: dialTimeout, promPollInterval in continuum.go; secp256k1KeySize, handshakeTimeout in protocol.go. Update all production code and test files.
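A rough sketch of the two-namespace header with the bounds check in front of the allocation. The constant names come from the commit message; their numeric values, the `encodeWire` helper, and the error variable are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Byte 0: message type. The values here are illustrative assumptions.
const (
	msgTypeP2P       byte = 0x01
	msgTypeBroadcast byte = 0x02
)

// Byte 1: committee flags, combined as a bit set. Values are assumptions.
const (
	cflagToOld byte = 1 << iota
	cflagToNew
	cflagFromNew
)

const maxWireDataLen = 16 << 20 // 16 MiB

var errWireTooLarge = errors.New("wire data exceeds maxWireDataLen")

// encodeWire prepends the two header bytes to data. The length check runs
// before the allocation so 2+len(data) cannot be used with oversized input.
func encodeWire(msgType, cflags byte, data []byte) ([]byte, error) {
	if len(data) > maxWireDataLen {
		return nil, errWireTooLarge
	}
	buf := make([]byte, 0, 2+len(data))
	buf = append(buf, msgType, cflags)
	return append(buf, data...), nil
}

func main() {
	b, err := encodeWire(msgTypeBroadcast, cflagToOld|cflagToNew, []byte("round1"))
	fmt.Printf("%x %v\n", b[:2], err) // 0203 <nil>
}
```

With separate prefixes, a message type of 0x01 and a committee flag of 0x01 can no longer be confused by a reader of the code, even though the byte values overlap.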
runtime.Caller(0) does not resolve in CI test binaries, causing loadTestPreParams and loadPreParams to silently fall back to live Paillier generation (~30s per node, exceeds test timeout). Embed tss_examples/preparams.json via go:embed into preparams_test.go. Both tss_test.go and rpc_tss_test.go now call testPreParams() which fails hard on missing or corrupt fixture data.
Pick up SA1019 suppression, legacy build tags, coverage tests, and golangci-lint v2.11.3 sync from the x repo.
The race detector adds ~10x overhead to goroutine scheduling. With 100 nodes on a CI runner, maintain cycles fire before handshakes complete, causing duplicate-identity rejections and convergence timeout. The test validates gossip scaling, not concurrency correctness — the smaller mesh tests already cover race safety.
Pick up KAT hash tests, commitment binding tests, lint fixes, SA1019 suppression, and legacy build tags from the x repo.
The tss_examples sub-package existed to hold v2/v3 reference implementations and pre-computed Paillier fixtures. The v2 channel reference is dead code (v3 replaced it entirely) and the v3 reference test is redundant with the x repo's own example tests. Move preparams.json to testdata/ (used by go:embed in preparams_test.go). Delete everything else: v3_reference_test.go, v2_channel_reference/, README. -2,912 lines.
Suppress G118 false positive in registerCeremony — cancel is stored in CeremonyInfo and called on ceremony completion. Eliminate G115 int-to-uint64 conversion in election shuffle by keeping remaining as int. Annotate safe test conversions with nolint:gosec.
Replace bytes.Equal with subtle.ConstantTimeCompare at four sites where attacker-controlled input is compared against security-critical values: signature identity verification, payload hash verification, and both DNS identity checks. Leave bytes.Equal for zero-sentinel checks (ZeroChallenge, zeroKey) where the compared value is a public constant.
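For reference, the substitution looks roughly like this. `sameIdentity` is a hypothetical helper name; the only library call is `subtle.ConstantTimeCompare`, which returns 1 on equality and 0 otherwise.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// sameIdentity compares attacker-controlled input against an expected value
// without an early exit on the first differing byte. ConstantTimeCompare
// still returns immediately when the lengths differ, which is acceptable
// for fixed-size identities and hashes.
func sameIdentity(claimed, expected []byte) bool {
	return subtle.ConstantTimeCompare(claimed, expected) == 1
}

func main() {
	expected := []byte{0xde, 0xad, 0xbe, 0xef}
	fmt.Println(sameIdentity([]byte{0xde, 0xad, 0xbe, 0xef}, expected)) // true
	fmt.Println(sameIdentity([]byte{0xde, 0xad, 0xbe, 0xee}, expected)) // false
}
```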
HashTSSMessage: add "continuum-tss-msg-v1" domain separator and 4-byte length prefix before data. Prevents cross-protocol signature replay and ambiguous field boundaries.
Transport.Close: zero encryptKey, decryptKey, and nonce key on session teardown. Nil the ephemeral private key. Limits key material exposure in swap files and core dumps.
Handshake challenge: add "continuum-challenge-v1" domain separator to Hash256(challenge || ETP) on both signing and verification sides. Prevents cross-protocol challenge-response replay.
maintainConnections: replace math/rand/v2 Shuffle with crypto/rand Fisher-Yates. Remove math/rand/v2 import from production code.
- TestVerifyRejectsWrongIdentity: exercises subtle.ConstantTimeCompare in Verify(), tests correct/wrong/bit-flipped identity paths.
- TestHashTSSMessageDomainSeparation: known-answer test proving the domain separator is present, verifies it differs from raw hash.
- TestHashTSSMessageLengthPrefix: different data lengths produce different hashes, determinism check.
- TestTransportCloseZerosKeys: asserts encryptKey, decryptKey, and nonce.key are zeroed after Close(), ephemeral private key is nil.
- TestChallengeHashDomainSeparation: proves domain-separated challenge hash differs from unseparated.
- TestSealBoxOpenBoxRoundTrip: e2e encryption round trip, positive path and wrong-sender-key rejection.
Fix TestConnKeyExchange: move clientTransport.Close() after key assertions since Close() now zeros keys. Strip internal document references from comments.
Wire-initiated ceremony requests (KeygenRequest, SignRequest, ReshareRequest) are now only processed when built with the continuum_debug tag. Production binaries compile debug_off.go which returns nil from serverDebugInit(); debug builds compile debug_on.go which returns a debugInitiator. Previously newDebugInitiator() was called unconditionally in NewServer(), making the nil-checks in dispatch.go dead code. Any peer could trigger a ceremony over the wire. Add noopInitiator for production ceremonyLoop — blocks on nil channel until blockchain watcher is wired in. Tests wire up debug initiation explicitly in newTestServer().
Update hemilabs/x/tss-lib/v3 to 810b4757 which replaces binance-chain/edwards25519 with standard elliptic.Curve operations in eddsa/signing and adds pre-computed preparams fixtures for faster CI. binance-chain/edwards25519 removed from indirect deps.
Replace cleartext 3-byte size prefix with two-phase secretbox framing. Phase 1 is a fixed 44-byte encrypted header containing the body size. Phase 2 is the encrypted payload.
Wire format (v2):
  [24-byte nonce_h][secretbox(4-byte body_size)]  <- 44 bytes
  [24-byte nonce_p][secretbox(payload)]           <- body_size bytes
An attacker corrupting any byte of the header causes secretbox.Open to fail. The receiver never trusts an unauthenticated length.
TransportVersion bumped from 1 to 2. TransportMaxSize reduced from 16 MB to 1 MB (sufficient for 100-party TSS keygen).
Replace static sender NaCl key with per-message ephemeral X25519
keypair (sealed-box pattern). Sender generates fresh keypair,
encrypts with nacl.box to the recipient's static X25519 key,
ships ephemeral public key in EncryptedPayload, destroys private
key immediately.
Sender authentication via secp256k1 compact signature over
SHA256("continuum-e2e-sig-v1" || EphemeralPub || Nonce ||
Ciphertext). Receiver verifies signature against Sender identity
before opening the box. Prevents forged payloads from anyone who
knows the recipient's gossip-advertised X25519 public key.
Provides sender-side forward secrecy: compromising a sender after
the fact cannot recover past ephemeral keys.
SealBox takes *Secret (signs envelope), OpenBox unchanged.
EncryptedPayload adds EphemeralPub and Signature fields.
decryptPayload verifies signature before decrypting.
Mix both parties' ephemeral public keys into the HKDF salt in canonical order (server first, client second).
Salt: "continuum-hkdf-salt-v2" || serverPub || clientPub
Public keys are fixed-length per curve, no length delimiters needed. Validated against the curve's actual key size. The caller provides them based on Transport.isServer.
Zero the ECDH shared secret after key derivation. Go 1.26 runtime.ZeroMemory will provide a proper guarantee; for now we zero the slice contents but cannot prevent GC copies.
Eliminates the static salt shared across all sessions. Ephemeral ECDH already guarantees unique shared secrets but session-specific salt prevents theoretical cross-session key derivation collision.
Add use-after-close guard: all store operations return ErrStoreClosed after Close(). Previously Close() zeroed the key and subsequent encrypts silently produced unrecoverable ciphertext with no error.
Add keyID binding: encrypt prepends a length-prefixed keyID to plaintext before sealing. decrypt verifies the bound keyID matches the expected keyID. Prevents file-swap attacks where an attacker with filesystem access renames key files.
Add atomic writes: writeAtomic uses temp file + fsync + rename. A crash at any point leaves either the old or new file, never a partial write.
Add ErrEmptyKeyID validation on all Save/Load/Delete paths.
Zero the encryption key copy after use in encrypt/decrypt. Copy encKey under mutex to avoid holding the lock during secretbox operations.
TSS transport falls back to SendEncrypted when no direct session exists, enabling ceremony completion across sparse meshes where committee members lack direct TCP sessions.
Link-state routing via gossip topology: PeerRecord carries session adjacency, generation-gated BFS routing table rebuilt lazily on topology changes. SendTo and forward use route table with flood fallback when stale.
Admin listener on dedicated port bypasses PeersWanted capacity limits. No gossip and no ping lifecycle: ceremony injection only. handle() takes isAdmin flag; admin sessions skip gossip exchange and rate limiting.
Transport DDoS mitigations: per-session rate limiter drops messages exceeding messageRate, read deadlines on all I/O, reconnection cooldown for rejected peers. notifyAllPeers no longer closes transports on write failure; dead sessions are reaped by pingExpired instead.
PrivateKeyHex neutered in release builds; test code uses DebugPrivateKeyHex (build-tagged continuum_debug).
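The BFS routing table can be illustrated with a next-hop computation over an adjacency map. This is a simplified sketch: peers are plain strings, `nextHops` is a hypothetical name, and the generation gating and flood fallback mentioned above are left out.

```go
package main

import "fmt"

// nextHops runs a breadth-first search from self and returns, for every
// reachable peer, the direct neighbor that a message should be handed to.
func nextHops(adj map[string][]string, self string) map[string]string {
	hops := make(map[string]string)
	visited := map[string]bool{self: true}
	var queue []string
	for _, n := range adj[self] {
		if !visited[n] {
			visited[n] = true
			hops[n] = n // direct neighbors route to themselves
			queue = append(queue, n)
		}
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, n := range adj[cur] {
			if visited[n] {
				continue
			}
			visited[n] = true
			hops[n] = hops[cur] // inherit the first hop of the path so far
			queue = append(queue, n)
		}
	}
	return hops
}

func main() {
	// Chain topology A - B - C - D, as seen from A.
	adj := map[string][]string{
		"A": {"B"},
		"B": {"A", "C"},
		"C": {"B", "D"},
		"D": {"C"},
	}
	hops := nextHops(adj, "A")
	fmt.Println(hops["B"], hops["C"], hops["D"]) // B B B
}
```

Because BFS visits peers in order of hop count, the recorded first hop always lies on a shortest path.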
Spins up 10 daemons with PeersWanted=3 (sparse mesh) in chain topology, runs keygen, sign, reshare, post-reshare sign, and second sign. Forces multi-hop encrypted envelope delivery for TSS messages between non-adjacent committee members. Build-tagged continuum_debug; uses admin listener for ceremony injection.
Transport.encrypt() reads encryptKey and nonce.key without holding t.mtx. Concurrent Close() zeroes those fields under the lock, causing a data race detected by -race in TestRPCTSSKeygenCorruptPostSign. Move lock acquisition in write() above the encrypt() call so the entire encrypt+write sequence is synchronized with Close(). Reorder Close() to close the conn before zeroing key material so that in-flight readers blocked in readExact unblock with an I/O error before decrypt/decryptFrameHeader can touch zeroed keys.
Go function signatures must be on a single line. Godoc requires it. Wrapped parameters are not idiomatic Go. Flatten 9 functions across tss.go, tss_round.go, rpc_tss_test.go, and continuum_e2e_test.go.
Unexport SendTo to sendTo — all TSS traffic uses SendEncrypted, sendTo is internal delivery for already-encrypted envelopes. Consolidate scattered const declarations into the main const block. Use reflect.TypeFor instead of reflect.TypeOf((*X)(nil)) in dispatch table and registration. Use early-continue in forward and forwardBroadcast instead of if/else on error. Unwrap three if statements in handle() for readability. Move spew.Sdump calls to Tracef to avoid evaluation when trace logging is disabled. Short-circuit isHostname behind DNS config check. Invert preparams file logic for readability: try open first, fall through to create on ErrNotExist. Use json.NewEncoder instead of MarshalIndent to avoid buffering. Fix SetIndent prefix. Simplify TTL cache initialization — direct field assignment. Simplify initPaillierPrimes call. Unwrap if in hemictl continuumStatus. Remove resolved XXX in continuum_ceremony.go.
Convert all four e2e polling loops to ticker + t.Context() checks. Fix e2e preparams path to use testdata/preparams.json. Use reflect.TypeFor in dispatch test. Merge TestDispatchMapCompleteness and signature test into single test function. Use json.NewEncoder in continuum_test.go preparams helper.
A TSSMessage must never carry a routing header (Destination != nil). Legitimate cleartext TSS is one-hop only (Destination == nil, sent via Write between direct peers). Multi-hop TSS must be wrapped in an EncryptedPayload. A routed cleartext TSSMessage means the sending peer is either buggy or actively leaking TSS round data to the mesh. Both intermediaries and destinations now reject it: the check runs in the handle() loop before forwarding or dispatch, and the offending peer is disconnected immediately (handle returns, triggering session cleanup).
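The rule reduces to one check on the envelope before forwarding or dispatch. The types below are simplified stand-ins for the real protocol structs, and `checkRouting` is a hypothetical name; in handle(), a non-nil error would end the loop and drop the session.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the protocol types; field names are assumptions.
type tssMessage struct{ Data []byte }
type encryptedPayload struct{ Ciphertext []byte }

type envelope struct {
	Destination *[32]byte // nil means one-hop delivery to the direct peer
	Payload     any
}

var errRoutedCleartext = errors.New("cleartext TSSMessage with routing header")

// checkRouting rejects a cleartext TSS message that carries a destination.
// Multi-hop TSS traffic must arrive wrapped in an encrypted payload.
func checkRouting(e envelope) error {
	if _, ok := e.Payload.(*tssMessage); ok && e.Destination != nil {
		return errRoutedCleartext
	}
	return nil
}

func main() {
	dest := [32]byte{1}
	fmt.Println(checkRouting(envelope{Payload: &tssMessage{}}))
	fmt.Println(checkRouting(envelope{Destination: &dest, Payload: &tssMessage{}}))
	fmt.Println(checkRouting(envelope{Destination: &dest, Payload: &encryptedPayload{}}))
}
```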
Implements the continuum TSS service end-to-end: a peer mesh network that runs threshold ECDSA/EdDSA ceremonies (keygen, signing, resharing) over encrypted RPC transport.
Architecture
Peer mesh — TCP transport with X25519 ECDH key agreement and NaCl secretbox encryption. Peers discover each other via DNS seeding with forward verification, maintain connections through gossip and liveness pings, and track idle/stale peers via TTL-based eviction. The mesh targets PeersWanted total connections (inbound + outbound) and fills gaps each maintenance cycle by dialing shuffled candidates.
Ceremony lifecycle — Coordinator election picks the peer with the lowest key hash. The elected coordinator dispatches ceremonies (keygen/sign/reshare) to participants, who execute TSS rounds and exchange messages over the encrypted mesh. Ceremony state is context-scoped with proper cancellation propagation. Results are persisted to a NaCl-encrypted key store (HKDF-derived storage key).
TSS integration — Uses hemilabs/x tss-lib v3 channel-free round functions. Each ceremony is a loop over explicit round calls with message collection gated on committee membership. Resharing supports overlapping old/new committees. Wire format uses package-prefixed type discriminators for 32 message types (21 ECDSA + 11 EdDSA).
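A type-discriminated JSON envelope of the kind described can be sketched as follows. The discriminator string, the `keygenRound1` type, and the registry are invented for the example and do not reflect the 32 real message types in tss_wire.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wireEnvelope carries a type discriminator next to the raw message body.
type wireEnvelope struct {
	Type string          `json:"type"`
	Data json.RawMessage `json:"data"`
}

// keygenRound1 is a hypothetical round message.
type keygenRound1 struct {
	Commitment string `json:"commitment"`
}

// registry maps a package-prefixed discriminator to a constructor.
var registry = map[string]func() any{
	"ecdsa/keygen.round1": func() any { return new(keygenRound1) },
}

func marshalWire(typeName string, v any) ([]byte, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	return json.Marshal(wireEnvelope{Type: typeName, Data: data})
}

func unmarshalWire(b []byte) (any, error) {
	var env wireEnvelope
	if err := json.Unmarshal(b, &env); err != nil {
		return nil, err
	}
	ctor, ok := registry[env.Type]
	if !ok {
		return nil, fmt.Errorf("unknown wire type %q", env.Type)
	}
	v := ctor()
	if err := json.Unmarshal(env.Data, v); err != nil {
		return nil, err
	}
	return v, nil
}

func main() {
	b, err := marshalWire("ecdsa/keygen.round1", &keygenRound1{Commitment: "abc"})
	if err != nil {
		panic(err)
	}
	v, err := unmarshalWire(b)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %s\n", v, v.(*keygenRound1).Commitment)
}
```

Prefixing the discriminator with the package keeps ECDSA and EdDSA messages that share a round name from colliding.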
What's included
Core service (service/continuum/):
continuum.go — server lifecycle, peer tracking, gossip, maintenance, session management
protocol.go — RPC envelope format, handshake, message routing with hash verification
tss.go — TSS ceremony abstraction (tssImpl), Paillier precompute, key store with NaCl encryption
tss_round.go — round-function ceremony drivers for keygen/sign/reshare
tss_rpc.go — ceremony RPC message types and handlers
tss_wire.go — JSON wire format: marshal/unmarshal with type discriminators
ceremony.go — ceremony struct, context/cancel lifecycle
dispatch.go — type-switch dispatch map replacing monolithic handle()
election.go — coordinator election by lowest key hash
doc.go — package godoc with broadcast scaling analysis
Admin tooling:
cmd/hemictl/continuum.go — hemictl continuum subcommand: status, peers, key info
cmd/hemictl/continuum_ceremony.go — keygen, sign, reshare ceremony commands (gated behind continuum_debug build tag)
cmd/transfunctionerd/ — daemon entry point updates
docker/transfunctionerd/Dockerfile
Infrastructure:
Prometheus metrics for ceremony counts, peer gauge, and broadcast latency
Testing:
Integration tests (continuum_test.go, rpc_test.go, rpc_tss_test.go) — 5-node keygen with broadcast verification, full keygen→sign→reshare lifecycle, transport write/DNS/outbound verify paths, ceremony dispatch error paths, election fuzzing
Unit tests — dispatch map, wire format (38 round-trip + exhaustive type tests), TTL error paths, hemictl ceremony commands
Reference tests (tss_examples/v3_reference_test.go) — v3 round function API usage examples with pre-computed Paillier params
All test nodes use production tssImpl via rpcTransportAdapter over encrypted TCP
Zero time.Sleep in tests — all synchronization via context waits
Key fixes along the way
Unlock-before-cancel to prevent deadlock during broadcast I/O
handleCeremonyResult must not race SaveKeyShare
Sentinel errors and status constants for ceremony lifecycle
Transport payload hash verification (replay/tampering protection)
Session busy response instead of silent drop
Handshake semaphore to bound concurrent connection setup
Forward DNS verification as a policy gate (configurable, loopback exempt)
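The handshake semaphore and the busy response fit together naturally, as in the sketch below. Combining them this way is an assumption made for illustration, as are the `handshakeSem` type and the `accept` helper; the PR lists the two fixes separately.

```go
package main

import "fmt"

// handshakeSem is a counting semaphore built on a buffered channel.
type handshakeSem chan struct{}

// tryAcquire takes a slot without blocking and reports whether it succeeded.
func (s handshakeSem) tryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot taken by a successful tryAcquire.
func (s handshakeSem) release() { <-s }

// accept decides what to do with a new connection: start the handshake if
// a slot is free, otherwise answer busy instead of silently dropping it.
func accept(s handshakeSem) string {
	if !s.tryAcquire() {
		return "busy"
	}
	return "handshake"
}

func main() {
	sem := make(handshakeSem, 2) // at most two handshakes in flight
	fmt.Println(accept(sem), accept(sem), accept(sem)) // handshake handshake busy
	sem.release() // one handshake finished
	fmt.Println(accept(sem)) // handshake
}
```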