
Made autobahn producer use TxMempool #3224

Merged
pompon0 merged 53 commits into main from gprusak-mempool4 on Apr 14, 2026


Conversation

@pompon0
Contributor

@pompon0 pompon0 commented Apr 10, 2026

Also:

  • disabled a bunch of reactors that are unused in autobahn mode.
  • fixed race conditions in the mempool's processPeerUpdates.

Base automatically changed from gprusak-mempool3 to main April 10, 2026 18:29
// overflow, gas limit enforcement no longer works correctly. This preserves the
// historical behavior for backward compatibility.
func (txmp *TxMempool) ReapMaxTxsBytesMaxGas(maxTxs int, maxBytes, maxGasWanted, maxGasEstimated int64) (types.Txs, int64) {
if maxTxs < 0 {
Contributor

Is it okay for maxTxs to be equal to 0?

Contributor Author

I suppose so
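For what it's worth, a zero limit composes naturally with the usual sentinel convention. A minimal sketch (an assumption about the contract, not necessarily TxMempool's actual semantics: negative means "no limit", while 0 legitimately selects nothing):

```go
package main

import "fmt"

// reapMax sketches one plausible contract for the maxTxs parameter:
// a negative value means "no limit", while 0 validly selects nothing.
func reapMax(txs []string, maxTxs int) []string {
	if maxTxs < 0 || maxTxs > len(txs) {
		maxTxs = len(txs)
	}
	return txs[:maxTxs]
}

func main() {
	pool := []string{"tx1", "tx2", "tx3"}
	fmt.Println(len(reapMax(pool, -1))) // no limit → 3
	fmt.Println(len(reapMax(pool, 0)))  // zero is valid → 0
	fmt.Println(len(reapMax(pool, 2)))  // capped → 2
}
```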

int(s.cfg.maxTxsPerBlock()), // nolint:gosec // config values fit into int on supported platforms.
utils.Max[int64](),
int64(s.cfg.MaxGasPerBlock), // nolint:gosec // config values stay within int64 range.
int64(s.cfg.MaxGasPerBlock), // nolint:gosec // config values stay within int64 range.
Contributor

So we won't have different maxGasWanted and maxGasEstimated in the future?

Contributor Author

I don't know, probably we will


txs, totalGas := s.txMempool.ReapMaxTxsBytesMaxGas(
int(s.cfg.maxTxsPerBlock()), // nolint:gosec // config values fit into int on supported platforms.
utils.Max[int64](),
Contributor

We probably should have maxBytes as well?

Contributor Author

eventually yes

int64(s.cfg.MaxGasPerBlock), // nolint:gosec // config values stay within int64 range.
int64(s.cfg.MaxGasPerBlock), // nolint:gosec // config values stay within int64 range.
)
s.txMempool.RemoveTxs(txs)
Contributor

Just an observation: I think it's okay to acquire the mutex twice here, since RemoveTxs still works if a tx disappeared in between.
But maybe some day we should rethink the mutex use in the mempool. It looks correct and efficient to me in the current context, but as we add more and more features it might become harder to reason about.
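The idempotence argument can be seen in a toy model. A minimal sketch (the type and method names are illustrative stand-ins, not the real TxMempool):

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a minimal stand-in for the mempool.
type pool struct {
	mu  sync.Mutex
	txs map[string]bool
}

// Reap copies the current contents under the lock.
func (p *pool) Reap() []string {
	p.mu.Lock()
	defer p.mu.Unlock()
	out := make([]string, 0, len(p.txs))
	for tx := range p.txs {
		out = append(out, tx)
	}
	return out
}

// RemoveTxs re-acquires the lock; deleting an already-absent key is a
// no-op, which is why the reap/remove pair tolerates a tx disappearing
// between the two critical sections.
func (p *pool) RemoveTxs(txs []string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, tx := range txs {
		delete(p.txs, tx)
	}
}

func main() {
	p := &pool{txs: map[string]bool{"a": true, "b": true}}
	reaped := p.Reap()
	p.RemoveTxs([]string{"a"}) // simulate a concurrent removal in between
	p.RemoveTxs(reaped)        // still safe even though "a" is already gone
	fmt.Println(len(p.txs))    // → 0
}
```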

Contributor Author

Agreed, mempool handling is already very fragile imo.

@pompon0 pompon0 requested a review from wen-coding April 13, 2026 12:57
panic("unreachable")
}

func (a *testApp) CheckTx(context.Context, *abci.RequestCheckTxV2) (*abci.ResponseCheckTxV2, error) {
Contributor

Do we have a test somewhere where a block fails to execute and commit?

Contributor Author

Only happy-path tests for now, until we get this working.

return scope.Run(ctx, func(ctx context.Context, s scope.Scope) error {
// Spawn outbound connections dialing.
for _, addr := range r.cfg.ValidatorAddrs {
s.Spawn(func() error {
Contributor

I was trying this PR on #3234. There was one error and the whole Giga Router stopped, with no retries.

AI explains:
Spawn uses SpawnBg which calls s.Cancel(err) if the task returns an error. This cancels the entire scope — all other goroutines get cancelled.

So if runExecute (spawned with SpawnNamed which uses Spawn) returns an error early, the whole scope is cancelled including the dial retry loops.

Does that make sense?
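The cancellation shape described above can be reproduced with the standard library alone. A minimal sketch (assuming, per the explanation, that a failed task cancels the shared scope context; names like `spawn` are made up, not the real scope API):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// firstError spawns two tasks the way the scope package is described to
// behave: the first task's error cancels the shared context, so the
// second task (standing in for a dial/retry loop) stops instead of
// retrying.
func firstError() error {
	ctx, cancel := context.WithCancelCause(context.Background())
	defer cancel(nil)
	var wg sync.WaitGroup

	spawn := func(f func() error) {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := f(); err != nil {
				cancel(err) // like s.Cancel(err) in SpawnBg
			}
		}()
	}

	spawn(func() error {
		return errors.New("execute failed") // runExecute returning early
	})
	spawn(func() error {
		<-ctx.Done() // the dial loop observes cancellation, no retry
		return nil
	})

	wg.Wait()
	return context.Cause(ctx)
}

func main() {
	fmt.Println(firstError()) // → execute failed
}
```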

Contributor Author

Execution failure is a valid termination reason, because it means that we are unable to execute a finalized block.

@wen-coding
Contributor

Testing PR #3224 with autobahn integration tests

Tested this PR using the autobahn docker cluster setup (PR #3220 + integration tests from #3234). Found the following issues:

1. Panic in executeBlock — staking params not initialized

panic: UnmarshalJSON cannot decode empty bytes

goroutine 329 [running]:
github.com/sei-protocol/sei-chain/sei-cosmos/x/params/types.Subspace.Get(...)
    sei-cosmos/x/params/types/subspace.go:109
github.com/sei-protocol/sei-chain/sei-cosmos/x/staking/keeper.Keeper.MaxValidators(...)
    sei-cosmos/x/staking/keeper/params.go:18
github.com/sei-protocol/sei-chain/sei-cosmos/x/staking/keeper.Keeper.GetBondedValidatorsByPower(...)
    sei-cosmos/x/staking/keeper/validator.go:221
github.com/sei-protocol/sei-chain/app.(*App).GetValidators(...)
    app/app.go:2192
github.com/sei-protocol/sei-chain/sei-tendermint/internal/p2p.(*GigaRouter).executeBlock(...)
    sei-tendermint/internal/p2p/giga_router.go:98

executeBlock calls app.GetValidators() which reads MaxValidators from the staking params store, but the params haven't been populated yet at the time the first block is being finalized after InitChain. This causes a panic on the very first block.

2. RPC /status nil pointer panic

panic: runtime error: invalid memory address or nil pointer dereference

types.(*ValidatorSet).Copy(...)
    types/validator_set.go:261
consensus.(*State).GetValidators(...)
    internal/consensus/state.go:310
rpc/core.(*Environment).validatorAtHeight(...)
    internal/rpc/core/status.go:120

When autobahn is enabled, the CometBFT consensus reactor is skipped, so ValidatorSet is nil. The RPC /status endpoint calls GetValidators() which tries to Copy() a nil set.
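One possible mitigation is a nil guard in the RPC path. A sketch only, with stand-in types (the real fix may instead populate the set from autobahn state):

```go
package main

import (
	"errors"
	"fmt"
)

// validatorSet stands in for types.ValidatorSet; calling Copy through a
// nil pointer dereferences the receiver, mirroring the panic above.
type validatorSet struct{ n int }

func (vs *validatorSet) Copy() *validatorSet { return &validatorSet{n: vs.n} }

// validatorAtHeight sketches a guarded version of the /status helper:
// when the consensus reactor was skipped and the set was never
// populated, return an error instead of dereferencing nil.
func validatorAtHeight(vs *validatorSet) (*validatorSet, error) {
	if vs == nil {
		return nil, errors.New("validator set unavailable")
	}
	return vs.Copy(), nil
}

func main() {
	if _, err := validatorAtHeight(nil); err != nil {
		fmt.Println(err) // → validator set unavailable
	}
}
```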

Reproduction

# On the wen/test-3224-with-autobahn branch (PR #3224 + #3220 cherry-picked)
AUTOBAHN=true make docker-cluster-start
# Check logs:
grep "panic" build/generated/logs/seid-0.log
# Try RPC:
curl http://localhost:26657/status

Note: The autobahn config also needs allow_empty_blocks: true for the cluster to produce blocks without submitted transactions. This is being fixed on our side.

@wen-coding
Contributor

Follow-up on Bug #1: Missing Commit after InitChain

Dug deeper into the root cause. The panic in executeBlock calling app.GetValidators() is because runExecute calls InitChain but never calls Commit after it:

// giga_router.go runExecute()
if last == 0 {
    if _, err := app.InitChain(ctx, r.cfg.GenDoc.ToRequestInitChain()); err != nil {
        return fmt.Errorf("App.InitChain(): %w", err)
    }
    // No Commit() here!
}

for n := next; ; n += 1 {
    b, err := r.data.GlobalBlock(ctx, n)
    // ...
    if vals := r.cfg.App.GetValidators(); len(vals) > 0 {  // PANIC

InitChain populates the staking params in the deliverState, but GetValidators() (in app.go:2192) creates a new context via NewUncachedContext which reads from the committed store — which is still empty because Commit was never called.

In CometBFT's normal flow, the handshaker (replay.go) calls Commit after InitChain before any blocks are finalized. The fix is likely to add app.Commit() after InitChain in runExecute, similar to what CometBFT does.
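The deliverState-vs-committed-store behavior can be modeled in a few lines. A toy sketch (the two maps are stand-ins, not the real store types):

```go
package main

import "fmt"

// app models the two-store behavior described above: InitChain writes
// into a working ("deliver") state, and only Commit copies it into the
// committed store that uncached-context reads observe.
type app struct {
	deliver   map[string]string
	committed map[string]string
}

func (a *app) InitChain() { a.deliver["staking/MaxValidators"] = "100" }

func (a *app) Commit() {
	for k, v := range a.deliver {
		a.committed[k] = v
	}
}

// GetParamCommitted reads only the committed store, like GetValidators
// going through NewUncachedContext.
func (a *app) GetParamCommitted(k string) (string, bool) {
	v, ok := a.committed[k]
	return v, ok
}

func main() {
	a := &app{deliver: map[string]string{}, committed: map[string]string{}}
	a.InitChain()
	_, ok := a.GetParamCommitted("staking/MaxValidators")
	fmt.Println(ok) // → false: the read sees empty bytes, hence the panic
	a.Commit()
	_, ok = a.GetParamCommitted("staking/MaxValidators")
	fmt.Println(ok) // → true once Commit has run
}
```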

@wen-coding
Contributor

Follow-up: Height mismatch after InitChain + Commit fix

Tested the latest 5ed9b0686 (with the Commit after InitChain fix). New panic:

panic: invalid height: 1; expected: 2

at app/abci.go:33 in BeginBlock.

The issue is in runExecute:

if last == 0 {
    if _, err := app.InitChain(ctx, ...); err != nil { ... }
    if _, err := app.Commit(ctx); err != nil { ... }
    next, ok = utils.SafeCast[atypes.GlobalBlockNumber](r.cfg.GenDoc.InitialHeight)  // next = 1
}

for n := next; ; n += 1 {
    // executeBlock at height 1, but app already committed height 1 via InitChain+Commit
    // app expects height 2

InitChain + Commit advances the app to height 1. Then next is set to GenDoc.InitialHeight (1), so executeBlock tries to finalize height 1 again. The app rejects it because it expects height 2.

Fix: after InitChain + Commit, set next = InitialHeight + 1 (or query app.Info() again to get the actual height).
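The off-by-one reduces to a one-line invariant. A sketch (the helper name is made up):

```go
package main

import "fmt"

// replayStart returns the first height runExecute should finalize,
// given the height the app has already committed. After InitChain +
// Commit the app is at InitialHeight, so replay must begin one later.
func replayStart(committedHeight int64) int64 {
	return committedHeight + 1
}

func main() {
	const initialHeight = int64(1)
	// Buggy: next = initialHeight re-finalizes height 1 and the app
	// panics with "invalid height: 1; expected: 2".
	fmt.Println(replayStart(initialHeight)) // → 2
}
```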

return fmt.Errorf("app.InitChain(): %w", err)
}
if _, err := app.Commit(ctx); err != nil {
return fmt.Errorf("app.Commit(): %w", err)
Contributor

Tried again and another panic:
panic: invalid height: 1; expected: 2

BeginBlock at app/abci.go:33 expects height 2 but got height 1. This is because InitChain + Commit already advanced the app state to height 1, so when executeBlock tries to finalize block 1, the app expects block 2.

@pompon0 pompon0 enabled auto-merge April 14, 2026 16:37
@pompon0 pompon0 added this pull request to the merge queue Apr 14, 2026
Merged via the queue into main with commit 9795d18 Apr 14, 2026
71 of 74 checks passed
@pompon0 pompon0 deleted the gprusak-mempool4 branch April 14, 2026 17:12
