Skip to content

[DNM] dm: intergrate mariadb2dm#12520

Open
joechenrh wants to merge 8 commits intopingcap:masterfrom
joechenrh:mariadb2tidb-dm
Open

[DNM] dm: intergrate mariadb2dm#12520
joechenrh wants to merge 8 commits intopingcap:masterfrom
joechenrh:mariadb2tidb-dm

Conversation

@joechenrh
Copy link
Contributor

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Feb 6, 2026
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 6, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

@ti-chi-bot ti-chi-bot bot added the area/dm Issues or PRs related to DM. label Feb 6, 2026
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign benjamin2037, benmeadowcroft for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 6, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello @joechenrh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust mechanism for converting MariaDB DDLs to be compatible with TiDB. By integrating a dedicated conversion library and applying its rules across the data migration pipeline, it significantly reduces manual intervention and potential errors when migrating schemas from MariaDB to TiDB. The changes include new configuration parameters for fine-grained control over the conversion process and ensure that schema checks, data loading, and DDL replication all benefit from this compatibility layer.

Highlights

  • MariaDB to TiDB DDL Conversion Integration: The pull request integrates a new mariadb2tidb library to automatically convert MariaDB-specific DDL statements into TiDB-compatible SQL. This ensures smoother migration and replication from MariaDB sources.
  • Configurable Conversion Rules: New configuration options have been added to subtask.toml and task.go to control the mariadb2tidb conversion behavior, including enabling/disabling specific rules, setting strict mode, and defining charset/collation mappings.
  • Application Across DM Components: The DDL conversion logic is applied in key DM components: during schema checking (dm/checker), data loading (dm/loader/lightning), and DDL synchronization (dm/syncer). This ensures that incompatible DDLs are transformed at various stages of the data migration process.
  • Extensive Rule Set for Compatibility: A comprehensive set of transformation rules has been introduced to handle various MariaDB-specific features, such as unsupported column attributes, index types, function defaults, system versioning, and collation differences, making the converted DDL compatible with TiDB.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • dm/checker/checker.go
    • Imported mariadb2tidb and its configuration.
    • Added buildMariaDB2TiDBConverters function to create converters for MySQL instances.
    • Modified NewTablesChecker and NewPrimaryKeyChecker to accept and utilize these converters for DDL transformation.
  • dm/config/subtask.go
    • Added MariaDB2TiDBConfig struct to SubTaskConfig to hold conversion settings.
    • Initialized MariaDB2TiDB with default configuration in NewSubTaskConfig.
    • Integrated MariaDB2TiDB.Adjust() call into SubTaskConfig.Adjust() for validation and normalization.
  • dm/config/subtask.toml
    • Added a new commented-out section [mariadb2tidb] with detailed explanations for mode, strict-mode, enabled-rules, and disabled-rules.
  • dm/config/task.go
    • Defined MariaDB2TiDBMode constants and the MariaDB2TiDBConfig struct.
    • Implemented DefaultMariaDB2TiDBConfig, Adjust, EnabledForFlavor, and Strict methods for MariaDB2TiDBConfig.
    • Included MariaDB2TiDBConfig in TaskConfig and initialized it with defaults.
    • Added MariaDB2TiDB.Adjust() call to TaskConfig.adjust().
  • dm/config/task_converters.go
    • Ensured MariaDB2TiDB configuration is correctly propagated between TaskConfig and SubTaskConfig during conversion.
  • dm/config/task_test.go
    • Updated TestGenAndFromSubTaskConfigs to handle the new MariaDB2TiDB field in subtask configurations.
    • Adjusted TestTaskConfigForDowngrade to reflect the addition of the mariadb2tidb field.
  • dm/loader/lightning.go
    • Introduced a call to l.maybeTransformSchemaFiles(ctx) within the restore method to transform schema files before processing.
  • dm/loader/mariadb2tidb.go
    • Added new file.
    • Implemented maybeTransformSchemaFiles to read, transform, and write back MariaDB schema files using the mariadb2tidb.Converter.
    • Provided helper functions dumpStorage and isSchemaFile for file handling.
  • dm/master/task_advanced.yaml
    • Added a commented-out mariadb2tidb configuration block with comprehensive descriptions for user guidance.
  • dm/pkg/checker/primary_key.go
    • Imported mariadb2tidb.
    • Added a converters field to PrimaryKeyChecker.
    • Modified NewPrimaryKeyChecker to initialize the converters.
    • Applied maybeTransformCreateTable to DDL statements before parsing to ensure compatibility during primary key checks.
  • dm/pkg/checker/primary_key_test.go
    • Updated test cases to pass nil for the newly added converters argument in NewPrimaryKeyChecker calls.
  • dm/pkg/checker/table_structure.go
    • Imported mariadb2tidb.
    • Added a converters field to TablesChecker.
    • Modified NewTablesChecker to initialize the converters.
    • Applied maybeTransformCreateTable to DDL statements before parsing to ensure compatibility during table structure checks.
  • dm/pkg/checker/table_structure_test.go
    • Updated test cases to pass nil for the newly added converters argument in NewTablesChecker calls.
  • dm/pkg/checker/utils.go
    • Imported mariadb2tidb.
    • Added maybeTransformCreateTable function to conditionally apply MariaDB to TiDB conversion to CREATE TABLE statements.
  • dm/pkg/mariadb2tidb/config/config.go
    • Added new file.
    • Defined configuration structures (Config, FormattingConfig, CharsetMapping) for the mariadb2tidb library.
    • Provided DefaultConfig, IsRuleEnabled, and LoadConfig functions for managing conversion settings.
  • dm/pkg/mariadb2tidb/converter.go
    • Added new file.
    • Implemented the Converter struct to encapsulate the conversion logic.
    • Provided NewConverter for instantiation and TransformSQL to apply rules to SQL strings, including a non-strict fallback mechanism.
  • dm/pkg/mariadb2tidb/converter_test.go
    • Added new file.
    • Contains extensive unit tests for TransformSQL covering various scenarios like default rules, non-strict behavior, trailing commas, version macros, and specific rule applications (e.g., constraints, system versioning, column attributes, index options, fulltext/spatial indexes, engine/collation handling).
  • dm/pkg/mariadb2tidb/parser/loader.go
    • Added new file.
    • Defined Loader for parsing SQL, including preprocessSQL for text-based transformations (e.g., UUID normalization, encryption removal, charset/collation adjustments, index option stripping).
    • Integrated rules.NewRegistry for rule management.
  • dm/pkg/mariadb2tidb/parser/test_helper.go
    • Added new file.
    • Provides initTestLogger for consistent logging setup in tests.
  • dm/pkg/mariadb2tidb/parser/visitor.go
    • Added new file.
    • Defined Visitor interface and BaseVisitor for AST traversal.
    • Implemented Walker to facilitate AST traversal and rule application.
  • dm/pkg/mariadb2tidb/parser/writer.go
    • Added new file.
    • Defined Writer for converting AST back to formatted SQL strings.
    • Implemented WriteStatements and WriteStatement with specific formatting and cleanup logic (e.g., removing charset prefixes on empty string defaults).
  • dm/pkg/mariadb2tidb/rules/auto_increment_values.go
    • Added new file.
    • Implements AutoIncrementValuesRule to normalize non-positive AUTO_INCREMENT values to 1.
  • dm/pkg/mariadb2tidb/rules/collation.go
    • Added new file.
    • Implements CollationRule to transform collation and charset specifications based on configured mappings (e.g., latin1_swedish_ci to utf8mb4_0900_ai_ci).
  • dm/pkg/mariadb2tidb/rules/collation_fallback.go
    • Added new file.
    • Implements CollationFallbackRule to remove unsupported collations, allowing TiDB to use its default collations.
  • dm/pkg/mariadb2tidb/rules/column_attributes.go
    • Added new file.
    • Implements ColumnAttributesRule to strip unsupported column attributes like INVISIBLE, COMPRESSED, and PERSISTENT.
  • dm/pkg/mariadb2tidb/rules/constraints.go
    • Added new file.
    • Implements ConstraintsRule to strip unsupported foreign key clauses such as MATCH and SET DEFAULT.
  • dm/pkg/mariadb2tidb/rules/create_or_replace.go
    • Added new file.
    • Implements CreateOrReplaceRule to rewrite unsupported CREATE OR REPLACE statements into DROP IF EXISTS followed by CREATE.
  • dm/pkg/mariadb2tidb/rules/engine_options.go
    • Added new file.
    • Implements EngineOptionsRule to remove unsupported storage engine options (e.g., ENGINE=MyISAM).
  • dm/pkg/mariadb2tidb/rules/fulltext_index_normalize.go
    • Added new file.
    • Implements FulltextIndexNormalizeRule to normalize FULLTEXT indexes to a single-column form and remove unsupported options.
  • dm/pkg/mariadb2tidb/rules/function_default.go
    • Added new file.
    • Implements FunctionDefaultRule to remove unsupported function-based default values (e.g., uuid()) from column definitions.
  • dm/pkg/mariadb2tidb/rules/ignored_clause_cleanup.go
    • Added new file.
    • Implements IgnoredClauseCleanupRule to remove clauses that TiDB parses but ignores, such as NO_WRITE_TO_BINLOG and COLUMN_FORMAT.
  • dm/pkg/mariadb2tidb/rules/index_options.go
    • Added new file.
    • Implements IndexOptionsRule to remove unsupported index options, specifically converting VECTOR indexes to regular indexes.
  • dm/pkg/mariadb2tidb/rules/index_prefix.go
    • Added new file.
    • Implements IndexPrefixRule to add default prefix lengths (e.g., 255 for TEXT/BLOB, column length for VARCHAR) to indexed columns lacking them.
  • dm/pkg/mariadb2tidb/rules/index_type.go
    • Added new file.
    • Implements IndexTypeRule to drop unsupported index type qualifiers (e.g., USING HASH).
  • dm/pkg/mariadb2tidb/rules/integer_width.go
    • Added new file.
    • Implements IntegerWidthRule to remove integer display width specifications (e.g., INT(11) becomes INT).
  • dm/pkg/mariadb2tidb/rules/interface.go
    • Added new file.
    • Defines the Rule interface and RuleCategory types for the transformation engine.
  • dm/pkg/mariadb2tidb/rules/json_check.go
    • Added new file.
    • Implements JSONCheckRule to remove CHECK constraints that use the JSON_VALID() function.
  • dm/pkg/mariadb2tidb/rules/json_generated.go
    • Added new file.
    • Implements JSONGeneratedRule to convert generated columns using JSON functions into regular columns, as TiDB does not support such expressions in generated columns.
  • dm/pkg/mariadb2tidb/rules/keylength.go
    • Added new file.
    • Implements KeyLengthRule to truncate oversized VARCHAR lengths to 768 characters for TiDB key limit compatibility.
  • dm/pkg/mariadb2tidb/rules/mariadb_specific.go
    • Added new file.
    • Implements MariaDBSpecificRule to remove MariaDB-only table options (e.g., ROW_FORMAT, KEY_BLOCK_SIZE).
  • dm/pkg/mariadb2tidb/rules/on_update_current_timestamp.go
    • Added new file.
    • Implements OnUpdateCurrentTimestampRule to add DEFAULT CURRENT_TIMESTAMP when ON UPDATE CURRENT_TIMESTAMP is specified without an explicit default.
  • dm/pkg/mariadb2tidb/rules/preparse_helpers.go
    • Added new file.
    • Contains helper functions and regexes for text-based SQL preprocessing, including handling version macros, system versioning, column attributes, sequence types, CREATE OR REPLACE statements, and trailing commas.
  • dm/pkg/mariadb2tidb/rules/qualified_names.go
    • Added new file.
    • Implements QualifiedNamesRule to remove schema/table qualifiers from identifiers.
  • dm/pkg/mariadb2tidb/rules/registry.go
    • Added new file.
    • Manages the registration and retrieval of all transformation rules, ensuring they are applied in priority order.
    • Includes registerDefaultRules to set up the full suite of conversion rules.
  • dm/pkg/mariadb2tidb/rules/sequence_type.go
    • Added new file.
    • Implements SequenceTypeRule to remove MariaDB-only sequence type clauses (e.g., AS BIGINT).
  • dm/pkg/mariadb2tidb/rules/spatial_index_drop.go
    • Added new file.
    • Implements SpatialIndexDropRule to downgrade SPATIAL indexes to normal indexes.
  • dm/pkg/mariadb2tidb/rules/stubs.go
    • Added new file.
    • Provides a StubRule base implementation for rules that are not yet fully implemented.
  • dm/pkg/mariadb2tidb/rules/system_versioning.go
    • Added new file.
    • Implements SystemVersioningRule to remove unsupported system-versioning syntax.
  • dm/pkg/mariadb2tidb/rules/text_blob_defaults.go
    • Added new file.
    • Implements TextBlobDefaultRule to remove default values from TEXT/BLOB/JSON columns, which are not supported in TiDB.
  • dm/pkg/mariadb2tidb/rules/trailing_comma.go
    • Added new file.
    • Implements TrailingCommaRule to remove trailing commas in column and index lists before parsing.
  • dm/pkg/mariadb2tidb/rules/uuid_type.go
    • Added new file.
    • Implements UUIDTypeRule to normalize UUID-related column definitions (e.g., adding default empty string for NOT NULL UUID columns) and rename UUID index names.
  • dm/pkg/mariadb2tidb/rules/version_macros.go
    • Added new file.
    • Implements VersionMacrosRule to unwrap MariaDB/MySQL versioned comments (e.g., /*!40101 SET NAMES utf8 */).
  • dm/pkg/mariadb2tidb/rules/zero_timestamp.go
    • Added new file.
    • Implements ZeroTimestampRule to remove zero timestamp defaults (e.g., DEFAULT '0000-00-00 00:00:00') that are rejected by TiDB in strict mode.
  • dm/pkg/mariadb2tidb/transformer/engine.go
    • Added new file.
    • Defined the Engine struct to orchestrate the application of transformation rules.
    • Implemented NewEngine for initialization and Transform to apply all registered rules sequentially via AST walking.
  • dm/pkg/mariadb2tidb/utils/logger.go
    • Added new file.
    • Provides global logger initialization (InitLogger) and retrieval (GetLogger) functions for the mariadb2tidb package.
  • dm/syncer/ddl.go
    • Imported mariadb2tidb.
    • Added ddlConverter field to DDLWorker and initialized it in NewDDLWorker.
    • Modified HandleQueryEvent to apply ddlConverter.TransformSQL to incoming DDLs before parsing, storing the transformed SQL in qec.ddlSQL.
    • Updated error logging to include both original and transformed SQL for better debugging.
  • dm/syncer/mariadb2tidb.go
    • Added new file.
    • Implemented newDDLConverter function to create and configure a mariadb2tidb.Converter based on the subtask's configuration and flavor.
  • dm/syncer/syncer.go
    • Imported mariadb2tidb.
    • Added ddlConverter field to the Syncer struct.
    • Initialized s.ddlConverter in Syncer.Init() using the newDDLConverter helper.
    • Added ddlSQL field to queryEventContext to hold the transformed DDL statement.
Activity
  • The pull request is currently marked as 'Do Not Merge' ([DNM]), indicating it is a work in progress or a draft.
  • The description body is largely templated, suggesting that detailed information about the problem solved and specific changes is yet to be filled out by the author.
  • No specific human activity such as comments or reviews has been recorded in the provided context.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates a MariaDB to TiDB SQL conversion library into DM, affecting the checker, loader, and syncer components, and extends configuration for this new feature. Critical security vulnerabilities and data corruption risks have been identified in the pre-parse rules, specifically a SQL injection vulnerability in the CREATE OR REPLACE rewriting logic and potential data corruption from context-unaware regex replacements. Additionally, there are critical issues in handling multi-statement transformations, where both the checker and syncer might incorrectly process only the first statement, leading to incorrect behavior or data inconsistency. A minor issue also exists in the checker's transformation logic. It is strongly recommended to move all transformations to the AST level or use a context-aware lexer for pre-parsing to address these critical concerns.

Comment on lines +193 to +198
func rewriteCreateOrReplace(sql string) string {
sql = createOrReplaceIndexRegex.ReplaceAllString(sql, "DROP INDEX IF EXISTS $1 ON $2; CREATE INDEX $1 ON $2")
sql = createOrReplaceTableRegex.ReplaceAllString(sql, "DROP TABLE IF EXISTS $1; CREATE TABLE $1")
sql = createOrReplaceSequenceRegex.ReplaceAllString(sql, "DROP SEQUENCE IF EXISTS $1; CREATE SEQUENCE $1")
return sql
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

The rewriteCreateOrReplace function is vulnerable to SQL Injection. It uses regular expressions to rewrite CREATE OR REPLACE statements into DROP ...; CREATE ... statements. The regexes used to capture identifiers (table names, index names, sequence names) are too broad ([^\s(]+ and [^\s]+) and do not account for semicolons or other SQL-terminating characters.

An attacker with the ability to create tables, indexes, or sequences on the upstream MariaDB can craft an identifier containing a semicolon followed by malicious SQL commands. When DM transforms this SQL for the downstream TiDB, the injected commands will be executed as separate statements.

Example exploit:
Upstream: CREATE OR REPLACE TABLE users; DROP TABLE orders; -- (id INT)
Transformed: DROP TABLE IF EXISTS users; DROP TABLE orders; --; CREATE TABLE users; DROP TABLE orders; -- (id INT)

Impact: Critical. An attacker can execute arbitrary SQL commands on the downstream database.

Remediation: Avoid using regular expressions for SQL rewriting. Use a proper SQL parser to identify the components of the statement and then reconstruct the desired SQL. If regexes must be used, ensure they explicitly exclude SQL delimiters like semicolons.

Comment on lines +974 to 987
stmts, err := parserpkg.Parse(qec.p, parseSQL, "", "")
if err != nil {
// log error rather than fatal, so other defer can be executed
qec.tctx.L().Error("parse ddl", zap.String("event", "query"), zap.Stringer("query event context", qec))
return nil, terror.ErrSyncerParseDDL.Delegate(err, qec.originSQL)
if parseSQL != qec.originSQL {
qec.tctx.L().Error("parse ddl after mariadb2tidb transform",
zap.String("event", "query"),
zap.String("origin_sql", qec.originSQL),
zap.String("ddl_sql", parseSQL),
zap.Error(err))
} else {
qec.tctx.L().Error("parse ddl", zap.String("event", "query"), zap.Stringer("query event context", qec))
}
return nil, terror.ErrSyncerParseDDL.Delegate(err, parseSQL)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The function parseOneStmt uses parserpkg.Parse which can return multiple statements, but it only returns the first statement (stmts[0]). This is incorrect when the MariaDB to TiDB transformation produces multiple statements (e.g., CREATE OR REPLACE becomes DROP and CREATE). The subsequent statements are lost, which will break the replication logic. The caller HandleQueryEvent should be updated to iterate over all statements returned from parsing and process them individually.

Comment on lines +159 to +166
func stripSystemVersioning(sql string) string {
sql = withSystemVersioningRegex.ReplaceAllString(sql, " ")
sql = withoutSystemVersioningRegex.ReplaceAllString(sql, " ")
sql = periodSystemTimeRegex.ReplaceAllString(sql, " ")
sql = rowStartRegex.ReplaceAllString(sql, " ")
sql = rowEndRegex.ReplaceAllString(sql, " ")
sql = alterSystemVersioningRegex.ReplaceAllString(sql, " ")
return sql

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

Several pre-parse rules (e.g., SystemVersioningRule, ColumnAttributesRule, SequenceTypeRule, VersionMacrosRule) use global regular expression replacement on the raw SQL string without considering the context. These regexes do not distinguish between SQL keywords and string literals or comments.

If a string literal in an INSERT or UPDATE statement contains text that matches one of these regexes (e.g., "WITH SYSTEM VERSIONING"), it will be incorrectly replaced or stripped, leading to data corruption in the downstream database.

Impact: High. Data corruption in the downstream database.

Remediation: Pre-parse rules should be aware of SQL syntax and only apply replacements to keywords and identifiers, avoiding string literals and comments. Move these transformations to the AST level where possible.

Comment on lines +140 to +202
sql = encryptedRegex.ReplaceAllString(sql, " ")

// Apply charset and collation transformations if configured
sql = l.applyCharsetMappings(sql)

// Replace standalone UUID data types with char(36) but keep functions and identifiers
matches := uuidRegex.FindAllStringIndex(sql, -1)
if matches == nil {
return sql
}

var result strings.Builder
last := 0
for _, m := range matches {
start, end := m[0], m[1]
result.WriteString(sql[last:start])

// Find preceding non-space/non-backtick character
j := start - 1
for j >= 0 && (isSpace(sql[j]) || sql[j] == '`') {
j--
}

// Find following non-space/non-backtick character
i := end
for i < len(sql) && (isSpace(sql[i]) || sql[i] == '`') {
i++
}

// Extract preceding word for context checks
wordEnd := j
for wordEnd >= 0 && (isAlphaNum(sql[wordEnd]) || sql[wordEnd] == '_') {
wordEnd--
}
precedingWord := strings.ToLower(sql[wordEnd+1 : j+1])

switch {
case i < len(sql) && sql[i] == '(':
// uuid used as function - leave unchanged
result.WriteString(sql[start:end])
case i < len(sql) && sql[i] == '`':
// uuid inside backticks - identifier
result.WriteString(sql[start:end])
case precedingWord == "key" || precedingWord == "unique" || precedingWord == "primary" || precedingWord == "constraint" || precedingWord == "index":
// index or constraint name - leave unchanged
result.WriteString(sql[start:end])
case j >= 0 && (isAlphaNum(sql[j]) || sql[j] == '_'):
// uuid used as a data type - replace
result.WriteString("char(36)")
default:
// uuid as column name or other - leave unchanged
result.WriteString(sql[start:end])
}

last = end
}

result.WriteString(sql[last:])

processed := result.String()

// Rename unique key named "uuid" to "uuid_key"
processed = uuidKeyRegex.ReplaceAllString(processed, "UNIQUE KEY uuid_key")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The preprocessSQL function applies global regex replacements for encryption options and UUID keys without context awareness. This can lead to data corruption if string literals in DML statements contain text that matches these patterns.

Impact: High. Data corruption in the downstream database.

Remediation: Ensure regex replacements are context-aware or performed on tokens/AST.

return ret, nil
}

upstreamStmt, err := getCreateTableStmt(w.upstreamParser, createSQL)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

getCreateTableStmt uses parser.ParseOneStmt, which cannot handle multi-statement SQL. The maybeTransformCreateTable function can return multiple SQL statements (e.g., for CREATE OR REPLACE TABLE). This will cause getCreateTableStmt to fail, leading to an incorrect check result. You should modify the logic to handle multiple statements, for example by using parser.Parse and then finding the *ast.CreateTableStmt from the returned statements. This issue also exists in dm/pkg/checker/table_structure.go.

Comment on lines +191 to +193
if strings.TrimSpace(transformed) == "" {
return statement, nil
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If the SQL transformation results in an empty string (e.g., the statement is completely removed), this function currently returns the original, untransformed statement. This is incorrect as it leads to processing a statement that should have been removed. The function should return an empty string in this case, and the caller should handle it appropriately (e.g., by skipping the check for this table).

Suggested change
if strings.TrimSpace(transformed) == "" {
return statement, nil
}
if strings.TrimSpace(transformed) == "" {
return "", nil
}

@wuhuizuo
Copy link
Contributor

/test ?

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 10, 2026

@wuhuizuo: The following commands are available to trigger required jobs:

/test pull-cdc-integration-kafka-test
/test pull-cdc-integration-mysql-test
/test pull-cdc-integration-pulsar-test
/test pull-cdc-integration-storage-test
/test pull-dm-compatibility-test
/test pull-dm-integration-test
/test pull-error-log-review
/test pull-syncdiff-integration-test
/test pull-verify
/test wip-pull-build
/test wip-pull-check
/test wip-pull-unit-test-cdc
/test wip-pull-unit-test-dm
/test wip-pull-unit-test-engine

Use /test all to run the following jobs that were automatically triggered:

pingcap/tiflow/ghpr_verify
pingcap/tiflow/pull_cdc_integration_kafka_test
pingcap/tiflow/pull_cdc_integration_pulsar_test
pingcap/tiflow/pull_cdc_integration_storage_test
pingcap/tiflow/pull_cdc_integration_test
pingcap/tiflow/pull_dm_compatibility_test
pingcap/tiflow/pull_dm_integration_test
pingcap/tiflow/pull_syncdiff_integration_test
pull-error-log-review
Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@wuhuizuo
Copy link
Contributor

/test wip-pull-build

@wuhuizuo
Copy link
Contributor

/test wip-pull-check

@wuhuizuo
Copy link
Contributor

/test wip-pull-unit-test-cdc

@wuhuizuo
Copy link
Contributor

/test wip-pull-unit-test-dm

@wuhuizuo
Copy link
Contributor

/test wip-pull-unit-test-engine

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 10, 2026

@joechenrh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-error-log-review 436fb2d link true /test pull-error-log-review
pull-verify 436fb2d link true /test pull-verify
pull-dm-integration-test 436fb2d link true /test pull-dm-integration-test
wip-pull-build 436fb2d link true /test wip-pull-build
wip-pull-check 436fb2d link true /test wip-pull-check
wip-pull-unit-test-engine 436fb2d link true /test wip-pull-unit-test-engine
wip-pull-unit-test-dm 436fb2d link true /test wip-pull-unit-test-dm

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/dm Issues or PRs related to DM. do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants