Skip to content

Conversation

@7layermagik
Copy link

@7layermagik 7layermagik commented Jan 28, 2026

Summary

This PR eliminates the runtime dependency on the snapshot manifest file for replay and replaces the memory-intensive full stake cache with streaming stake account processing from AccountsDB.

Key Architectural Changes

1. Remove Manifest Runtime Dependency

  • State file v2 now captures all manifest-derived fields at parse time
  • Replay no longer opens or reads manifest files during execution
  • All epoch boundary processing uses AccountsDB directly

2. Streaming Rewards (Zero Memory Spike)

  • Replace full stake cache (~1GB for 2.5M accounts) with streaming from AccountsDB
  • Two-pass streaming rewards calculation:
    • Pass 1: Stream stakes → compute total points
    • Pass 2: Stream stakes → recompute points + calculate rewards → write to spool
  • Per-partition spool files for O(1) reads during distribution
  • RAM stays flat across epoch boundaries

3. Stake Pubkey Index

  • Persistent index file (stake_pubkeys.idx) tracks all known stake accounts
  • New stake accounts appended incrementally during replay
  • Enables efficient streaming without full account scan

Changes by Component

State File (pkg/state/)

  • Add ManifestEpochStakes, ManifestEpochAuthorizedVoters, ManifestTransactionCount, ManifestEpochAcctsHash fields
  • Require v2 state file format (no manifest fallbacks)
  • Remove all runtime manifest file access

Global Context (pkg/global/)

  • Add StreamStakeAccounts() for parallel batched streaming from AccountsDB
  • Add stake pubkey index load/save/append helpers
  • Add TrackNewStakePubkey() for incremental index updates
  • Add voteStakeTotals aggregate map (replaces full stake cache at startup)

Epoch Processing (pkg/replay/epoch.go)

  • updateStakeHistorySysvar() → stream from AccountsDB
  • updateEpochStakesAndRefreshVoteCache() → single streaming pass builds both vote totals and effective stakes

Rewards (pkg/rewards/)

  • Add spool.go with per-partition binary spool files:
    • PartitionedSpoolWriters for buffered writes (1MB buffer)
    • PartitionReader for sequential reads per partition
    • 88-byte record format: stake_pubkey(32) + vote_pubkey(32) + stake(8) + credits(8) + reward(8)
  • Add CalculateRewardsStreaming() for two-pass streaming calculation
  • Add DistributeStakingRewardsFromSpool() for partition-based distribution
  • Channel-based single writer pattern with error capture
  • Strict error policy: any failure returns error (consensus correctness)
  • Remove ~300 lines of dead code (old partition tracking, points accumulator, etc.)

Snapshot (pkg/snapshot/)

  • Add manifest_seed.go for extracting epoch stakes from manifest at parse time

Performance Characteristics

Metric Before After
Startup stake cache ~1GB ~50MB (aggregates only)
Epoch boundary RAM spike ~1.5GB ~0 (streaming)
Spool file I/O N/A Sequential, buffered 1MB
Rewards calculation 2 passes over cache 2 streaming passes
Rewards distribution Random AccountsDB reads Sequential spool + batch AccountsDB

File Changes

 pkg/global/global_ctx.go      | +140 lines (streaming, index helpers)
 pkg/replay/block.go           | refactored (remove manifest reads)
 pkg/replay/epoch.go           | refactored (streaming stake processing)
 pkg/replay/rewards.go         | +53 lines (spool-based distribution)
 pkg/rewards/rewards.go        | +583/-541 (streaming calculation)
 pkg/rewards/spool.go          | +218 lines (new spool infrastructure)
 pkg/rewards/partitions.go     | -46 lines (deleted - dead code)
 pkg/rewards/points.go         | -38 lines (deleted - dead code)
 pkg/snapshot/manifest_seed.go | +133 lines (epoch stakes extraction)
 pkg/state/state.go            | +77 lines (manifest fields in state)

🤖 Generated with Claude Code

7layermagik and others added 12 commits January 27, 2026 23:03
Replay now reads all seed data from state file instead of manifest at
runtime. This eliminates the need for the manifest file after AccountsDB
is built.

Changes:
- Add manifest_* fields to MithrilState schema (v2)
- Create PopulateManifestSeed() to copy manifest data at build time
- Update configureInitialBlock/FromResume to use state file
- Update newReplayCtx to prefer state file over manifest
- Update buildInitialEpochStakesCache to use ManifestEpochStakes
- Clear ManifestEpochStakes after first replayed slot past snapshot
- Backwards compat: fall back to manifest for old state files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Require state schema version 2 (no v0/v1 migration, error instead)
- Remove manifest fallback in configureInitialBlock (fatal if missing)
- Remove manifest fallback in buildInitialEpochStakesCache (fatal if missing)
- Remove manifest fallback in newReplayCtx (fatal if missing)
- Remove snapshotManifest parameter from ReplayBlocks signature
- Remove snapshotManifest parameter from configureInitialBlock signature
- Add ManifestTransactionCount and ManifestEpochAuthorizedVoters fields
- Add proper error handling for corrupted state file decoding
- Clean up unused imports (epochstakes, snapshot from replay package)

This ensures replay NEVER reads from manifest after AccountsDB build.
Old state files will error with "delete AccountsDB and rebuild from snapshot".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Make ManifestEvictedBlockhash required in strict v2 mode (affects
  transaction age validation for first block)
- Switch manifest_seed.go from mr-tron/base58 to internal pkg/base58
  for consistency with block.go decode path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…e timestamps

- EpochAuthorizedVoters: change from map[string]string to map[string][]string
  to support multiple authorized voters per vote account (matches original
  manifest behavior where PutEntry appends to a slice)

- VoteTimestamps: populate from ALL vote accounts in cache, not just those
  with non-zero stake (matches original manifest behavior where all vote
  accounts from Bank.Stakes.VoteAccounts had timestamps populated)

Note: This changes the state file schema for manifest_epoch_authorized_voters.
Existing state files will require rebuild from snapshot.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Stop populating global.StakeCache at startup. Instead, build an aggregate
map of vote pubkey → total stake directly from AccountsDB scan.

Changes:
- Add voteStakeTotals map + mutex + helpers in global_ctx.go
- Modify setupInitialVoteAcctsAndStakeAccts to build aggregate only:
  - Remove PutStakeCacheItemBulk calls
  - Each batch worker builds local map, merges into shared under mutex
  - Call SetVoteStakeTotals() to store aggregate for later use
- Full stake cache no longer populated at startup (memory savings)

Note: Epoch boundary functions still use StakeCache and will fail until
Step 2 adds on-demand cache rebuild. This is expected.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace full stake cache with streaming from AccountsDB for epoch
boundary processing and rewards calculation.

Key changes:
- Add StreamStakeAccounts() for parallel stake account streaming
- Add spool file infrastructure (SpoolWriter/SpoolReader) for binary
  serialization of stake rewards during calculation
- Modify updateStakeHistorySysvar() to stream from AccountsDB
- Modify updateEpochStakesAndRefreshVoteCache() to stream in single pass
- Add CalculateRewardsStreaming() for two-pass streaming rewards:
  - Pass 1: Calculate total points
  - Pass 2: Calculate rewards and write to spool file
- Add DistributeStakingRewardsFromSpool() for partition distribution
- Update recordStakeDelegation() to only track new pubkeys for index

Architecture:
- No full stake cache held in memory
- All stake data streamed directly from AccountsDB
- Rewards written to binary spool file during calculation
- Distribution reads from spool file partition-by-partition
- RAM stays flat across epoch boundaries

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Addresses issues identified in ChatGPT review of ce71c54:

1. Per-partition spool files (O(N) → O(1) reads per partition)
   - Write to reward_spool_<slot>_p<partition>.bin during calculation
   - Sequential read per partition during distribution (no indexing)
   - Eliminates O(N × partitions) file scanning

2. Channel-based single writer for error capture
   - All spool writes routed through one goroutine
   - Write errors captured in atomic.Value and propagated
   - Cleanup on failure

3. Silent account error tracking
   - Track failed account reads/unmarshals with atomic counters
   - Log warning with count and first error when failures occur

4. stakePointsCache elimination (already in ce71c54)
   - Two-pass streaming: Pass 1 for total points, Pass 2 recomputes + writes
   - Trades ~2x CPU for 0 extra RAM (~140MB saved)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Clean PartitionedRewardDistributionInfo: remove unused Credits,
  RewardPartitions, StakingRewards, WorkerPool fields
- Remove dead code: rewardDistributionTask, rewardDistributionWorker,
  InitWorkerPool, ReleaseWorkerPool, DistributeStakingRewardsForPartition
- DistributeStakingRewardsFromSpool now streams with reader.Next() loop
  instead of ReadAll() for flat RAM during distribution
- Use dynamic slices with append (no pre-allocated arrays with nils)
- Implement strict error policy: any account read/unmarshal/marshal
  failure returns error immediately for consensus correctness
- Add buffered I/O (1MB) to PartitionReader and partitionWriter
- Remove all legacy single-file spool types from spool.go

Net reduction: ~294 lines removed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace atomic.Value with atomic.Pointer[error] to avoid potential
  issues with uninitialized atomic.Value (CompareAndSwap on zero value)
- Check workerPool.Invoke() return value and fail fast if pool rejects
  task (balance wg.Done since worker won't run)
- Proper pointer dereference (*werr, *ferr) for error formatting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Delete pkg/rewards/partitions.go (old in-memory partition tracking)
- Delete pkg/rewards/points.go (CalculatedStakePointsAccumulator)
- Remove CalculateStakeRewardsAndPartitions, CalculateStakePoints,
  delegationAndPubkey, idxAndReward from rewards.go
- Remove unused rpc import
- Fix workerPool.Invoke error handling in StreamStakeAccounts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
StreamStakeAccounts now waits for all already-queued batches to
complete before returning an error, preventing workers from running
against shared state after the caller has moved on.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The function and its call referenced the removed snapshot.SnapshotManifest
parameter, causing build failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants