Skip to content

Fix transactional topic reader committing stale offsets after reconnect#834

Merged
vgvoleg merged 1 commit into
mainfrom
fix-tx-reader-stale-offset-commit
Jun 8, 2026
Merged

Fix transactional topic reader committing stale offsets after reconnect#834
vgvoleg merged 1 commit into
mainfrom
fix-tx-reader-stale-offset-commit

Conversation

@vgvoleg

@vgvoleg vgvoleg commented Jun 8, 2026

Copy link
Copy Markdown
Member

Problem

The non-transactional ReaderStream.commit() guards against committing a batch whose
partition session belongs to an already-reconnected stream (raises
PublicTopicReaderPartitionExpiredError). The transactional path
(receive_batch_with_tx_commit_batches_with_txUpdateOffsetsInTransaction)
had no such guard.

So if the reader reconnects between receive_batch_with_tx() and tx.commit(), the
transaction commits offsets of a dead partition session. On the server this surfaces as
a Gap (issue_code 2011), and the SDK does not raise it as a tx failure — the offset
silently never advances and the consumer looks healthy while making no progress.

Fix

Mirror the non-tx guard in _commit_batches_with_tx: if any batch's partition session
is no longer the current stream's, set a retriable external error on the tx and skip the
server call (leaving the current stream untouched). The session pool then retries and
re-reads from the committed offset.

Test

test_tx_commit_after_reconnect_does_not_commit_stale_offsets: reads a batch in a tx,
forces a reconnect, and asserts the commit fails loudly, no UpdateOffsetsInTransaction
is sent, and the consumer offset does not advance.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a transactional topic reader edge case where offsets could be committed for a partition session that became stale after a reader reconnect, leading to server-side “Gap” errors and a consumer that appears healthy while making no progress.

Changes:

  • Added a transactional commit guard to detect and reject committing batches whose partition session no longer belongs to the current reader stream.
  • Added an asyncio test that forces a reconnect between receive_batch_with_tx() and tx.commit() and asserts the commit fails and no offset-update RPC is sent.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
ydb/_topic_reader/topic_reader_asyncio.py Adds stale-partition-session detection for transactional offset commits and fails the tx early when detected.
tests/topics/test_topic_transactions.py Adds a regression test covering tx commit after reconnect to ensure stale offsets aren’t committed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ydb/_topic_reader/topic_reader_asyncio.py Outdated
If the reader reconnects between receive_batch_with_tx() and commit, the batch's partition session is dead; mirror the non-tx commit guard and fail the tx (retriable) instead of sending a gapped UpdateOffsetsInTransaction.
@vgvoleg vgvoleg force-pushed the fix-tx-reader-stale-offset-commit branch from b02923b to fe00ecb Compare June 8, 2026 09:07
@vgvoleg vgvoleg requested a review from Copilot June 8, 2026 09:07

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@vgvoleg vgvoleg merged commit a2f391f into main Jun 8, 2026
35 checks passed
@vgvoleg vgvoleg deleted the fix-tx-reader-stale-offset-commit branch June 8, 2026 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants