fix(mysql): handle binlog incident event (resync required)#4435
Conversation
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
There was a problem hiding this comment.
Pull request overview
This PR adds explicit handling for MySQL binlog INCIDENT_EVENT (e.g. LOST_EVENTS) during CDC streaming so the connector fails fast with a resync-required error when the binlog stream has an unrecoverable gap, and ensures alerting classifies this condition as a binlog invalidation.
Changes:
- Detect
INCIDENT_EVENTin the MySQL CDC pull loop and return a newMySQLBinlogIncidentError(resync required). - Add
parseIncidentEventto extract incident number and message from the incident event payload. - Classify
MySQLBinlogIncidentErrorasErrorNotifyBinlogInvalidwith codeBINLOG_INCIDENTin the alerting classifier.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| flow/shared/exceptions/mysql.go | Adds a dedicated error type for binlog incident events indicating resync is required. |
| flow/connectors/mysql/cdc.go | Detects incident events during CDC streaming and parses incident details from the event payload. |
| flow/alerting/classifier.go | Maps the new incident error to a binlog invalidation notification class/code. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // parseIncidentEvent extracts the incident number and human-readable message from an | ||
| // Incident_log_event body: a little-endian uint16 incident number, a 1-byte message | ||
| // length, then the message. Best-effort: returns what it can if the body is truncated. | ||
| func parseIncidentEvent(data []byte) (uint16, string) { | ||
| if len(data) < 2 { | ||
| return 0, "" | ||
| } | ||
| incident := binary.LittleEndian.Uint16(data[:2]) | ||
| if len(data) < 3 { | ||
| return incident, "" | ||
| } | ||
| end := min(3+int(data[2]), len(data)) | ||
| return incident, string(data[3:end]) | ||
| } |
| } | ||
| case *replication.GenericEvent: | ||
| // INCIDENT_EVENT (LOST_EVENTS) - fail and require resync | ||
| if event.Header.EventType == replication.INCIDENT_EVENT { |
There was a problem hiding this comment.
Was this tested manually? MySQL can trigger this event in a debug build. Sadly we can't do this in CI as binlog is global but with testcontainers, maybe one day
There was a problem hiding this comment.
oook, tried testing it locally, couldn't reproduce it under various conditions
most critical binlog issues are already being catched by:
case 1236: // ER_MASTER_FATAL_ERROR_READING_BINLOG
There was a problem hiding this comment.
Feel free to merge, then I'll merge from main
Summary
Handle MySQL binlog
INCIDENT_EVENT(e.g.LOST_EVENTS) in the CDC pull loop. When the source emits an incident event, binlog events were lost from the stream and our CDC position can no longer be trusted — so we fail with a resync-required error rather than silently skipping the gap.INCIDENT_EVENT(decoded by go-mysql asGenericEvent, gated on the header event type) inPullRecordsand return a newMySQLBinlogIncidentError.parseIncidentEventto extract the incident number and message from the event body.MySQLBinlogIncidentErrorasErrorNotifyBinlogInvalid/BINLOG_INCIDENTin the alerting classifier.🤖 Generated with Claude Code