sync-diff-inspector: improve checksum SQL with configurable algorithm and float handling#12533
sync-diff-inspector: improve checksum SQL with configurable algorithm and float handling#12533joechenrh wants to merge 5 commits intopingcap:masterfrom
Conversation
… algorithm (pingcap#12525) Add a new `checksum-algorithm` configuration flag to sync-diff-inspector to support FIPS-compliant environments where MD5 is disabled. Supported options: "md5" (default) and "sha256". Cherry-picked from pingcap/tidb-tools#886 Close pingcap/tidb-tools#885 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable checksum algorithm, allowing the use of SHA256 in addition to the default MD5. This is a valuable change for FIPS-compliant environments. The implementation looks solid, with changes correctly propagated through configuration, data sources, and utility functions. I've added a couple of suggestions to improve maintainability and performance.
…checksum SQL - Fix struct field alignment to pass gci formatter check - Rename GetCountAndMD5 to GetCountAndChecksum in Source interface and all implementations since the method now supports multiple algorithms - Optimize SQL query to compute hash once per row using a subquery instead of duplicating the hash expression Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
/retest |
|
/retest |
|
/test wip-pull-build |
|
/test ? |
|
@wuhuizuo: The following commands are available to trigger required jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test wip-pull-check wip-pull-unit-test-cdc wip-pull-unit-test-dm wip-pull-unit-test-engine |
|
/test wip-pull-unit-test-engine wip-pull-unit-test-dm |
|
@joechenrh: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. @@ Coverage Diff @@
## master #12533 +/- ##
================================================
- Coverage 53.4349% 53.3609% -0.0740%
================================================
Files 1033 1010 -23
Lines 144458 139870 -4588
================================================
- Hits 77191 74636 -2555
+ Misses 61495 59624 -1871
+ Partials 5772 5610 -162 🚀 New features to boost your workflow:
|
Cherry-pick of pingcap/tidb-tools#886
What problem does this PR solve?
Issue Number: close #12535
What is changed and how it works?
This PR makes two improvements to the checksum SQL used by sync-diff-inspector:
Configurable checksum algorithm (
md5/sha256): sync-diff-inspector previously hardcoded MD5 for chunk checksumming. In FIPS-compliant environments (TiDB built withENABLE_FIPS=1), MD5 is disabled in the OpenSSL library used by TiKV, causing sync-diff-inspector to fail when TiKV's coprocessor tries to evaluate MD5 expressions. This adds achecksum-algorithmconfig flag withmd5(default) andsha256options.Replace
round()withCAST(... AS DECIMAL(65, 30))for float/double columns: The previousround(col, N-floor(log10(abs(col))))normalization for float/double columns returned NULL when the column value was 0 (sincelog10(0)is undefined), causing silent data mismatches. Replacing withCAST(col AS DECIMAL(65, 30))fixes this bug and also enables TiKV coprocessor pushdown sinceCAST ... AS DECIMALis a supported coprocessor expression.The checksum SQL is also refactored to use a subquery pattern (
SELECT COUNT(*), BIT_XOR(...) FROM (SELECT hash_func(...) as hash FROM ...) as t) for cleaner generated SQL.Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No. Default checksum algorithm remains MD5 for backwards compatibility. The
CAST ... AS DECIMALchange may improve performance by enabling TiKV coprocessor pushdown for float/double columns.Do you need to update user documentation, design documentation or monitoring documentation?
No. The
checksum-algorithmflag is hidden for now.Release note