sink/mysql: align DDL time defaults with origin_default#12490
Conversation
Summary of ChangesHello @haiboumich, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where DDL statements containing CURRENT_TIMESTAMP-style default values could lead to inconsistent data downstream due to varying evaluation times or timezones. The solution involves dynamically setting the MySQL session TIMESTAMP variable to the upstream-evaluated timestamp (derived from StartTs or CommitTs and the column's origin_default) just before executing such DDLs. This ensures that newly added columns with these defaults are populated with values consistent with the source system, thereby maintaining data integrity across replication. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to align DDL timestamp defaults with the upstream by setting the session TIMESTAMP variable. This is achieved by parsing the DDL to identify columns with CURRENT_TIMESTAMP defaults and using the origin_default value from TableInfo. The implementation correctly handles setting and resetting the session timestamp, including in error scenarios. The added tests cover the main logic well. My review focuses on improving logging and error handling to make the new logic more robust and easier to debug. Specifically, I've pointed out a few places where the wrong error is logged on rollback failures and a case where a DDL parsing error is silently ignored.
| zap.Error(err)) | ||
| if rbErr := tx.Rollback(); rbErr != nil { | ||
| log.Error("Failed to rollback", zap.String("namespace", m.id.Namespace), | ||
| zap.String("changefeed", m.id.ID), zap.Error(err)) |
There was a problem hiding this comment.
The log message "Failed to rollback" suggests that the error from the rollback operation (rbErr) should be logged. However, the original error err (from setSessionTimestamp) is being logged instead. This can be misleading during debugging. This pattern of logging the wrong error on rollback failure appears in a few other places in this file.
| zap.String("changefeed", m.id.ID), zap.Error(err)) | |
| zap.String("changefeed", m.id.ID), zap.Error(rbErr)) |
| log.Error("Failed to reset session timestamp after DDL execution", zap.Error(err)) | ||
| if rbErr := tx.Rollback(); rbErr != nil { | ||
| log.Error("Failed to rollback", zap.String("namespace", m.id.Namespace), | ||
| zap.String("changefeed", m.id.ID), zap.Error(err)) |
There was a problem hiding this comment.
| targetColumns, err := extractCurrentTimestampDefaultColumns(ddl.Query) | ||
| if err != nil || len(targetColumns) == 0 { | ||
| return 0, false | ||
| } |
There was a problem hiding this comment.
The error from extractCurrentTimestampDefaultColumns is silently ignored. If DDL parsing fails, it could indicate an issue with an unsupported DDL statement, and the system would fall back to the old behavior without aligning timestamps. This could lead to data inconsistency without any warning. It's better to log this error.
targetColumns, err := extractCurrentTimestampDefaultColumns(ddl.Query)
if err != nil {
log.Warn("Failed to extract columns with CURRENT_TIMESTAMP default from DDL, "+
"it may cause data inconsistency.",
zap.String("query", ddl.Query),
zap.Error(err))
return 0, false
}
if len(targetColumns) == 0 {
return 0, false
}|
@Debra-He: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@zier-one: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest-required |
1 similar comment
|
/retest-required |
|
/retest-required |
1 similar comment
|
/retest-required |
|
Please add an integration test to cover it. |
…ailures only when the downstream returns specific “bad value / unknown var” errors
…... DEFAULT CURRENT_TIMESTAMP existing‑rows divergence by forcing a fixed session timestamp
… reset failures only when the downstream returns specific “bad value / unknown var” errors" This reverts commit 5ba109a.
| run_sql "INSERT INTO ${DB_NAME}.t VALUES (1, 10), (2, 20), (3, 30);" ${UP_TIDB_HOST} ${UP_TIDB_PORT} | ||
|
|
||
| run_sql "SET @@timestamp = 1000000000.123456; ALTER TABLE ${DB_NAME}.t ADD COLUMN c DATETIME(6) DEFAULT CURRENT_TIMESTAMP(6);" ${UP_TIDB_HOST} ${UP_TIDB_PORT} | ||
|
|
There was a problem hiding this comment.
Could you add more dml before and after adding the column?
There was a problem hiding this comment.
thx, i just added some more dml, PTAL
|
/retest-required |
|
@flowbehappy: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Debra-He, flowbehappy, lidezhu, wk989898, zier-one The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test pull-error-log-review |
|
/retest-required |
|
/test pull-dm-integration-test |
|
/retest |
|
/test pull-cdc-integration-kafka-test |
|
/test tide |
|
@haiboumich: The specified target(s) for Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test pull-cdc-integration-pulsar-test |
|
/retest-required |
|
/test pull-dm-integration-test |
|
/test pull-cdc-integration-kafka-test |
5 similar comments
|
/test pull-cdc-integration-kafka-test |
|
/test pull-cdc-integration-kafka-test |
|
/test pull-cdc-integration-kafka-test |
|
/test pull-cdc-integration-kafka-test |
|
/test pull-cdc-integration-kafka-test |
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
|
In response to a cherrypick label: new pull request created to branch |
What problem does this PR solve?
DDL statements that reference CURRENT_TIMESTAMP/NOW/LOCALTIME/LOCALTIMESTAMP can evaluate to different values downstream when executed later or in a different time zone. This can lead to inconsistent default values for newly added columns.
Issue Number: close #11368
What is changed and how it works?
TIMESTAMPfor each DDL execution using the DDL StartTs (fallback to CommitTs).origin_defaultfrom TableInfo to set the session timestamp so defaults match upstream evaluation.TIMESTAMPafter DDL execution and on failures to avoid leaking session state.Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No. The change only affects DDL execution paths and sets a session variable around the DDL, then resets it.
Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note