Finding
Confluent issue confluentinc/parallel-consumer#803 ("Transactional Producer timeout getting commit lock") has the same root cause as the bug addressed by our confluentinc#857 fix.
Root cause: synchronized(commitCommand) in AbstractParallelEoSStreamProcessor.java creates a deadlock between the control thread (mid-commitSync()) and the poll thread (in onPartitionsRevoked). The poll thread blocks on the lock, while the control thread needs the poll thread to stay responsive for Kafka's rebalance protocol. The deadlock surfaces as a timeout in ConsumerOffsetCommitter.CommitRequest.
Fix: PR #29 (bugs/857-paused-consumption-multi-consumers-bug), commit 2d434c0a0 — replaces synchronized(commitCommand) with ReentrantLock.tryLock() in onPartitionsRevoked. Non-blocking: if the control thread holds the lock, skip the commit. Kafka re-delivers uncommitted records to the new assignee.
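The non-blocking pattern can be sketched roughly as follows. This is an illustrative reconstruction, not the code from PR #29: the class CommitLockSketch, its method names, and the latch-based simulation of the two threads are all hypothetical, and the commit itself is reduced to a Runnable.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the tryLock() pattern; not the actual PR #29 code.
final class CommitLockSketch {
    private final ReentrantLock commitLock = new ReentrantLock();

    // Control-thread path: waits for the lock, then commits.
    void commitFromControlThread(Runnable commit) {
        commitLock.lock();
        try {
            commit.run();
        } finally {
            commitLock.unlock();
        }
    }

    // Poll-thread path (rebalance callback): never waits for the lock.
    // Returns false when the commit was skipped because another thread holds it.
    boolean tryCommitOnRevoke(Runnable commit) {
        if (!commitLock.tryLock()) {
            return false; // skip; uncommitted records are re-delivered after the rebalance
        }
        try {
            commit.run();
            return true;
        } finally {
            commitLock.unlock();
        }
    }

    // Simulates a revocation arriving while the control thread is mid-commit.
    // Returns whether the revoke-time commit ran.
    static boolean revokeWhileControlThreadCommits() {
        CommitLockSketch sketch = new CommitLockSketch();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread control = new Thread(() -> sketch.commitFromControlThread(() -> {
            lockHeld.countDown();  // signal that the lock is now held
            awaitQuietly(release); // stay "mid-commit" until released
        }));
        control.start();
        awaitQuietly(lockHeld);
        boolean committed = sketch.tryCommitOnRevoke(() -> { });
        release.countDown();
        try {
            control.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return committed;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("uncontended commit ran: "
                + new CommitLockSketch().tryCommitOnRevoke(() -> { }));
        System.out.println("contended commit ran: "
                + revokeWhileControlThreadCommits());
    }
}
```

With a synchronized block in place of tryLock(), the simulated poll thread would instead wait for the control thread to release the lock, which is the blocking behavior described under Root cause.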
Related Confluent issues also likely resolved: confluentinc#809 (sporadic commit timeouts), confluentinc#833 (InternalRuntimeException timeout).
Status
No new code needed. Resolved when PR #29 lands on master.