master(dm): avoid slow OpenAPI delete on unavailable downstream#12563
master(dm): avoid slow OpenAPI delete on unavailable downstream#12563GMHDBJD wants to merge 1 commit intopingcap:masterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where OpenAPI task deletion could become excessively slow or hang indefinitely if the downstream database was unavailable or unresponsive. The changes introduce a specific timeout for downstream connections during deletion and modify the error handling to allow task deletion to proceed even if downstream metadata cleanup encounters issues. This significantly improves the reliability and responsiveness of the task deletion API, ensuring that tasks can be removed promptly without being blocked by external system states. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request refactors the deleteTask functionality in the DM master to improve metadata cleanup, specifically by separating internal and downstream metadata removal and introducing configurable timeouts for database operations. This allows for more resilient task deletion, where errors during downstream metadata cleanup are now logged as warnings instead of blocking the entire deletion process. New helper functions like AdjustTargetDBSessionCfgWithTimeout, removeMetaDataWithTimeout, removeInternalMetaData, removeDownstreamMetaData, GetDownstreamDBWithTimeout, and ApplyWithPingTimeout were introduced or modified to support these changes, along with updated and new test cases. Review comments suggest improvements to error handling, specifically by ensuring that removeInternalMetaData does not mask errors by returning nil and by addressing the incomplete error handling that could skip downstream metadata removal if internal metadata removal fails. Additionally, a code simplification was suggested for a redundant else if block in the deleteTask function.
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Returning nil here after encountering an error in s.optimist.RemoveMetaDataWithTask(taskName) or s.scheduler.RemoveLoadTaskAndLightningStatus(taskName) might mask issues. It would be better to return the error to ensure it's handled appropriately by the caller.
return err
}
return nil //nolint:nilerr
}
return err // Return the error instead of nil| } else if err = s.removeDownstreamMetaData(ctx, taskName, metaSchema, toDBCfg, openAPIDeleteTaskDownstreamTimeout); err != nil { | ||
| log.L().Warn("failed to remove downstream metadata when deleting task", zap.String("task", taskName), zap.Error(err)) | ||
| } |
There was a problem hiding this comment.
The else if block here is redundant. If the adjustErr check fails, the function continues, so the else is unnecessary. The code can be simplified by removing the else and letting the subsequent code execute unconditionally.
log.L().Warn("skip downstream metadata cleanup when deleting task", zap.String("task", taskName), zap.Error(adjustErr))
if err = s.removeDownstreamMetaData(ctx, taskName, metaSchema, toDBCfg, openAPIDeleteTaskDownstreamTimeout); err != nil {
log.L().Warn("failed to remove downstream metadata when deleting task", zap.String("task", taskName), zap.Error(err))| err := s.removeInternalMetaData(taskName) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
This error handling seems incomplete. If s.removeInternalMetaData fails, the function returns, skipping the downstream metadata removal. It might be better to log this error and continue with downstream metadata removal, or to attempt downstream removal before internal metadata removal.
return err
}
// Log the error but continue with downstream metadata removal
log.L().Error("failed to remove internal metadata", zap.String("task", taskName), zap.Error(err))
}
What problem does this PR solve?
Issue Number: close #12562
What is changed and how it works?
Check List
Tests
make dm_unit_test_pkg PKG=github.com/pingcap/tiflow/dm/mastertests/run.sh openapi(currently blocked by an unrelated existing failure intest_multi_tasks:ERROR 1236 (HY000): Client requested master to start replication from position > file size)Questions
Will it cause performance regression or break compatibility?
No. It only changes the OpenAPI delete path when downstream is unavailable.
Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note