Merged
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.3.14] - 2026-04-17
+
+### Fixed
+- Kafka consumer rebalance storm: on `ILLEGAL_GENERATION` / `UNKNOWN_MEMBER_ID` during commit, the source no longer closes and recreates the Consumer. Closing sends `LeaveGroup`, which triggers a group-wide rebalance that invalidates every other consumer's generation, causing them to recreate too — a self-sustaining cascade observed in production with 16 replicas evicting in perfect millisecond-synchrony every ~35s. librdkafka's group state machine already handles the rejoin automatically on the next `consume()` call, preserving `member.id` and keeping the rest of the group undisturbed. Commit errors are now log-and-continue for this error class.
+
 ## [0.3.13] - 2026-04-17
 
 ### Changed
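To see why closing on a stale-generation commit error sustains itself while a plain rejoin settles, here is a toy, round-based model. It is a hypothetical illustration, not the Kafka group protocol: `simulate`, the "any LeaveGroup in a round causes exactly one new rebalance" rule, and the single initial disturbance are all simplifying assumptions.

```python
from typing import Tuple


def simulate(n_members: int, rounds: int, policy: str) -> Tuple[int, int]:
    """Toy model of a consumer group after one initial rebalance.

    policy="recreate": a member whose commit fails closes its consumer
    (sending LeaveGroup), which forces another group-wide rebalance.
    policy="rejoin": the member rejoins the current generation on its next
    poll without leaving, so no new rebalance is triggered.
    Returns (total_rebalances, failed_commits).
    """
    if policy not in ("recreate", "rejoin"):
        raise ValueError(f"unknown policy: {policy}")
    group_generation = 1
    member_generation = [1] * n_members
    # One initial disturbance (e.g. a single member timing out) bumps the generation
    group_generation += 1
    rebalances = 1
    failed_commits = 0
    for _ in range(rounds):
        leaves = 0
        for i in range(n_members):
            if member_generation[i] != group_generation:
                # Commit rejected: this member still holds a stale generation
                failed_commits += 1
                if policy == "recreate":
                    leaves += 1  # close() sends LeaveGroup
                # Either way the member ends up in the current generation
                member_generation[i] = group_generation
        if leaves:
            # Any LeaveGroup triggers a new rebalance, staling every member again
            group_generation += 1
            rebalances += 1
    return rebalances, failed_commits


print(simulate(16, 10, "recreate"))  # (11, 160)
print(simulate(16, 10, "rejoin"))    # (1, 16)
```

With 16 members, the "recreate" policy produces one fresh rebalance in every round for as long as the simulation runs, while "rejoin" absorbs the initial rebalance after a single round of failed commits.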
23 changes: 11 additions & 12 deletions bizon/connectors/sources/kafka/src/source.py
@@ -554,26 +554,25 @@ def get(self, pagination: dict = None) -> SourceIteration:
         return self.read_topics_manually(pagination)
 
     def commit(self):
-        """Commit the offsets of the consumer"""
+        """Commit the offsets of the consumer.
+
+        On ILLEGAL_GENERATION / UNKNOWN_MEMBER_ID we log and return without closing
+        or recreating the consumer. librdkafka's consumer group state machine handles
+        the rejoin internally on the next consume() call, preserving member.id and
+        avoiding the LeaveGroup -> cluster-wide rebalance cascade that closing would
+        trigger. Uncommitted records may be reprocessed by the new partition owner
+        after the rejoin -- this is Bizon's standard at-least-once contract.
+        """
         try:
             self.consumer.commit(asynchronous=False)
         except CimplKafkaException as e:
             error_code = e.args[0].code() if e.args else None
             if error_code in (KafkaError.ILLEGAL_GENERATION, KafkaError.UNKNOWN_MEMBER_ID):
-                # Consumer was evicted from the group. Close it, recreate in place, and
-                # let the pipeline continue — next subscribe()/assign() rejoins the group.
-                # The uncommitted batch may be reprocessed by the new partition owner
-                # (Kafka at-least-once); downstream must tolerate duplicates.
                 logger.warning(
-                    f"Kafka commit rejected - consumer evicted from group (code={error_code}): {e}. "
-                    f"Recreating consumer in place; previous iteration's records may be "
+                    f"Kafka commit skipped - stale generation (code={error_code}): {e}. "
+                    f"librdkafka will rejoin on next consume(); uncommitted records may be "
                     f"reprocessed by the new partition owner (at-least-once)."
                 )
-                try:
-                    self.consumer.close()
-                except Exception as close_err:
-                    logger.warning(f"Error closing evicted consumer: {close_err}")
-                self.consumer = Consumer(self.config.consumer_config)
                 return
             logger.error(f"Kafka exception occurred during commit: {e}")
             logger.info("Gracefully exiting without committing offsets due to Kafka exception")
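A minimal, dependency-free sketch of the new commit path follows. It assumes the convention the diff relies on: the caught exception carries an error object as its first argument, and that object's `.code()` identifies the failure. In real code you would pass the exception type the source catches (`CimplKafkaException` in the diff) and `(KafkaError.ILLEGAL_GENERATION, KafkaError.UNKNOWN_MEMBER_ID)`. The function name `commit_or_skip` and the `Fake*` classes are hypothetical stand-ins that let the sketch run without a broker. Re-raising other errors is a choice of this sketch only; the source's own handling of them continues past the hunk shown.

```python
import logging
from typing import Callable, Collection, Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("commit_sketch")


def commit_or_skip(
    commit: Callable[[], object],
    kafka_exception: type,
    stale_member_codes: Collection[int],
) -> bool:
    """Run a synchronous offset commit and report whether it succeeded.

    A commit rejected because this member's generation is stale is logged and
    skipped. The consumer is neither closed nor recreated, so no LeaveGroup is
    sent. Other errors are re-raised (a choice of this sketch only).
    """
    try:
        commit()
        return True
    except kafka_exception as e:
        # Same access pattern as the diff: the first argument carries .code()
        err = e.args[0] if e.args else None
        code: Optional[int] = err.code() if err is not None else None
        if code in stale_member_codes:
            logger.warning("Commit skipped, stale generation (code=%s); rejoin on next poll", code)
            return False
        raise


# --- Hypothetical stand-ins so the sketch runs without a broker ---
class FakeKafkaError:
    def __init__(self, code: int) -> None:
        self._code = code

    def code(self) -> int:
        return self._code


class FakeKafkaException(Exception):
    pass


class FakeConsumer:
    def __init__(self, fail_code: Optional[int] = None) -> None:
        self.fail_code = fail_code
        self.commits = 0
        self.closed = False

    def commit(self, asynchronous: bool = True) -> None:
        if self.fail_code is not None:
            raise FakeKafkaException(FakeKafkaError(self.fail_code))
        self.commits += 1

    def close(self) -> None:
        self.closed = True


# Kafka protocol error codes: 22 = ILLEGAL_GENERATION, 25 = UNKNOWN_MEMBER_ID
STALE_CODES = (22, 25)

evicted = FakeConsumer(fail_code=22)
skipped = not commit_or_skip(lambda: evicted.commit(asynchronous=False), FakeKafkaException, STALE_CODES)
print(skipped, evicted.closed)  # True False
```

The key property is what the handler does not do: after a stale-generation rejection, the same consumer object stays open and in place, so the next poll can rejoin under the existing `member.id` instead of announcing a departure to the whole group.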
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "bizon"
-version = "0.3.13"
+version = "0.3.14"
 description = "Extract and load your data reliably from API Clients with native fault-tolerant and checkpointing mechanism."
 authors = [
     { name = "Antoine Balliet", email = "antoine.balliet@gmail.com" },
2 changes: 1 addition & 1 deletion uv.lock
