fix(dm): resolve flaky TestWorker panic on unexpected Resume() call#12546
fix(dm): resolve flaky TestWorker panic on unexpected Resume() call#12546Copilot wants to merge 3 commits intorelease-8.5from
Conversation
|
This cherry pick PR is for a release branch and has not yet been approved by triage owners. To merge this cherry pick:
DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Hi @Copilot. Thanks for your PR. I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
…section Co-authored-by: wuhuizuo <2574558+wuhuizuo@users.noreply.github.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/test ? |
|
@wuhuizuo: The following commands are available to trigger required jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test pull-verify |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. @@ Coverage Diff @@
## release-8.5 #12546 +/- ##
================================================
Coverage ? 57.5928%
================================================
Files ? 520
Lines ? 67406
Branches ? 0
================================================
Hits ? 38821
Misses ? 25818
Partials ? 2767 🚀 New features to boost your workflow:
|
|
/ok-to-test |
|
/test ? |
|
@wuhuizuo: The following commands are available to trigger required jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
/test ? |
|
@wuhuizuo: The following commands are available to trigger required jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@Copilot: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
TestWorkerintermittently panicked whencheckAndAutoResumedispatched an unexpectedResume()call on the mock before the test registered that expectation.Root cause
newDMWorker()setsautoResume.LatestResumeTime = time.Now()at creation time. On slow CI, the SQLiteInit()call can take >1s, meaning by the time the firstStageErrortick runs,time.Since(LatestResumeTime) ≥ backoff.Current()(initial backoff =DefaultBackoffMin = 1s). This causesCheckResumeSubtaskto returnResumeDispatchon the first error tick — beforetime.Sleep(time.Second)— triggeringResume()on a mock with no registered expectation → panic.Fix
Reset
dmWorker.autoResume.LatestResumeTime = time.Now()immediately before the auto-resume test section. This makes the timing deterministic: the first error tick reliably getsResumeSkip, and after thetime.Sleep(time.Second)the second tick getsResumeDispatchas intended.Close #12545
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No — test-only change.
Do you need to update user documentation, design documentation or monitoring documentation?
No.
Release note
Original prompt
This section details on the original issue you should resolve
<issue_title>Flaky: engine/executor/dm TestWorker panics on unexpected mock Resume()</issue_title>
<issue_description>### Go package
github.com/pingcap/tiflow/engine/executor/dm(dir:engine/executor/dm)Test case
TestWorker(engine/executor/dm/worker_test.go)Flaky symptom
CI intermittently panics due to an unexpected testify/mock call (
Resume()), likely triggered by the auto-resume path in(*dmWorker).checkAndAutoResume.Failure log / stacktrace