Skip to content

fix(mysql): handle transaction payload event#4433

Open
dtunikov wants to merge 6 commits into
mainfrom
fix/dbi-809/mysql-handle-transaction-payload-event
Open

fix(mysql): handle transaction payload event#4433
dtunikov wants to merge 6 commits into
mainfrom
fix/dbi-809/mysql-handle-transaction-payload-event

Conversation

@dtunikov

@dtunikov dtunikov commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator
  • DBI-809 - handle transaction payload event (MySQL can send it if binlog_transaction_compression=ON)
  • handle missing GTID events (MariadbGTIDEvent, GtidTaggedLogEvent)

@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
2314 1 2313 209
View the top 1 failed test(s) by shortest run time
github.com/PeerDB-io/peerdb/flow/connectors/clickhouse::TestIAMRoleCanIssueSelectFromS3
Stack Traces | 0.32s run time
=== RUN   TestIAMRoleCanIssueSelectFromS3
2026/06/16 08:54:37 INFO Received AWS credentials from SDK config for connector: clickhouse x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
2026/06/16 08:54:37 INFO [clickhouse] executing DDL statement x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} query="CREATE TABLE IF NOT EXISTS `_peerdb_raw_test_iam_role_can_issue_select_from_s3_1781600077` (\n\t\t_peerdb_uid UUID,\n\t\t_peerdb_timestamp Int64,\n\t\t_peerdb_destination_table_name String,\n\t\t_peerdb_data String,\n\t\t_peerdb_record_type Int,\n\t\t_peerdb_match_data String,\n\t\t_peerdb_batch_id Int64,\n\t\t_peerdb_unchanged_toast_columns String\n\t) ENGINE = MergeTree() ORDER BY (_peerdb_batch_id, _peerdb_destination_table_name) TTL toDateTime(fromUnixTimestamp64Nano(_peerdb_timestamp)) + INTERVAL 36500 DAY"
    s3_iam_role_test.go:72: 
        	Error Trace:	.../connectors/clickhouse/s3_iam_role_test.go:72
        	Error:      	Not equal: 
        	            	expected: 0x3
        	            	actual  : 0x0
        	Test:       	TestIAMRoleCanIssueSelectFromS3
--- FAIL: TestIAMRoleCanIssueSelectFromS3 (0.32s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@github-actions

Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: The failures are flaky: the ClickHouse S3 test TestIAMRoleCanIssueSelectFromS3 returned COUNT 0 instead of 3 in one job but passed in the sibling job on the same commit, and the second job's mass e2e failures are uniform ~61s timeouts ("stuck in snapshot somehow") — all in ClickHouse paths unrelated to this MySQL-focused PR.
Confidence: 0.88

✅ Automatically retrying the workflow

View workflow run

@dtunikov dtunikov marked this pull request as ready for review June 16, 2026 09:45
@dtunikov dtunikov requested a review from a team as a code owner June 16, 2026 09:45
@github-actions

Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: The flow/e2e package hit a clean 1200s package-level timeout (no assertion failures) on only the mysql-gtid shard, which is exactly the code path modified by this MySQL transaction-payload-event branch — a CDC-stall-shaped hang that points to a likely real regression rather than generic flakiness.
Confidence: 0.6

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

@claude

claude Bot commented Jun 16, 2026

Copy link
Copy Markdown

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Comment thread flow/connectors/mysql/cdc.go Outdated
@dtunikov dtunikov force-pushed the fix/dbi-809/mysql-handle-transaction-payload-event branch from cf3826e to bf92b75 Compare June 16, 2026 10:00
@dtunikov dtunikov requested a review from Copilot June 16, 2026 10:18
@github-actions

Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: The entire flow/e2e package hit the hard 1200s wall-clock timeout on only the mysql-gtid cell (sibling mysql-pos and maria cells passed) with no specific assertion failure, panic, or race — the classic signature of a high-concurrency e2e timeout flake, though a GTID-path hang regression can't be fully ruled out given the MySQL-GTID PR.
Confidence: 0.7

✅ Automatically retrying the workflow

View workflow run

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves MySQL/MariaDB CDC robustness by handling MySQL’s compressed transaction binlog format (TRANSACTION_PAYLOAD_EVENT), detecting binlog incident events (e.g. LOST_EVENTS) as an invalid-binlog/resync condition, and covering additional GTID event variants (MariaDB GTID + MySQL tagged GTIDs).

Changes:

  • Add CDC support for TRANSACTION_PAYLOAD_EVENT by processing inner events and correctly advancing checkpoints (GTID and file/pos).
  • Detect INCIDENT_EVENT and surface a dedicated “binlog invalid / resync required” error, including alert classification.
  • Add MySQL version gating + an e2e test that enables binlog_transaction_compression and asserts a payload event was actually emitted.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
flow/shared/exceptions/mysql.go Adds MySQLBinlogIncidentError for resync-required incident events.
flow/pkg/mysql/validation.go Adds minimum MySQL version constant for binlog transaction compression support.
flow/e2e/mysql.go Adds helper to scan binlog for TRANSACTION_PAYLOAD_EVENT to validate test coverage.
flow/e2e/clickhouse_mysql_test.go Adds e2e test exercising compressed transaction payload CDC path (MySQL-only, version-gated).
flow/connectors/mysql/cdc.go Adds decoding/processing for TransactionPayloadEvent, accumulates GTID set, and detects incident events.
flow/alerting/classifier.go Classifies MySQLBinlogIncidentError as “binlog invalid” for alerting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread flow/e2e/mysql.go
@github-actions

Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: The flow/e2e package hit its 20-minute timeout because TestApiPg/TestDropMissing (11m24s) and TestApiMy/TestDropMissing (5m14s) hung waiting on a Temporal drop-flow workflow — an infra/flaky hang unrelated to this MySQL branch, and the same suite passed on the other two identical matrix shards.
Confidence: 0.8

✅ Automatically retrying the workflow

View workflow run

@github-actions

Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: The e2e suite hung to a hard 20-minute package timeout (with MySQL-connector goroutines stuck on channel receives) exclusively in the mysql-gtid matrix variant while mysql-pos and maria passed, which correlates directly with this PR's change to MySQL GTID-mode transaction-payload event handling and indicates a real CDC deadlock/hang regression rather than flakiness.
Confidence: 0.78

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

// StartSyncGTID keeps the passed set by reference (as the syncer's prevGset) and mutates it
// from the syncer goroutine as GTIDs arrive. Give it an independent clone so PullRecords can
// accumulate GTIDs into `gset` (advanceGset) and read it without racing the syncer on the same map.
stream, err := syncer.StartSyncGTID(gset.Clone())

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise we get:

fatal error: concurrent map read and map write
goroutine 11211 [running]:
internal/runtime/maps.fatal({0x5a6f5d5?, 0x5da7890cb860?})
	/usr/local/go/src/runtime/panic.go:1181 +0x20
github.com/go-mysql-org/go-mysql/mysql.(*MysqlGTIDSet).AddGTIDWithTag(0x5da78691b040, {0xdc, 0x3b, 0xb4, 0x9c, 0x5e, 0x84, 0x11, 0xf1, 0xa0, ...}, ...)

because both go-mysql binlogsyncer and pullRecords read/write the same gset object (before that we never hit this case because we always read a clone and never modified it gset = ev.GSet)

@dtunikov dtunikov force-pushed the fix/dbi-809/mysql-handle-transaction-payload-event branch from 14976f7 to 5c8b6ae Compare June 16, 2026 10:50
@dtunikov dtunikov changed the title Fix/dbi 809/mysql handle transaction payload event fix(mysql): handle transaction payload event Jun 16, 2026
@github-actions

Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: The flow/e2e package hung to its 1200s timeout only in the mysql-gtid combo (while mysql-pos and maria passed) and reproduced on a re-run, indicating a real transaction-event-handling hang in exactly the code path this MySQL PR modifies — not a flake.
Confidence: 0.78

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants