fix(cubemaster): clean job rows after snapshot delete #559
xiaojunxiang2023 wants to merge 1 commit into
Snapshot delete removed template_definition but left template_image_job rows behind. ListTemplates surfaces orphan jobs, so deleted snapshots still appeared in GET /templates and tpl ls until a second delete routed through template cleanup.
Signed-off-by: xiaojunxiang <xiaojunxiang@kingsoft.com>
patches.ApplyFunc(runReplicaCleanup, func(ctx context.Context, templateID string, locators []templateCleanupLocator) error {
	return nil
})
patches.ApplyFunc(deleteSnapshotMetadataOnly, func(ctx context.Context, snapshotID string) error {
This test is risky for CI under the default Go compiler settings. deleteSnapshotMetadataOnly is just a tiny wrapper, so the compiler may inline it; when that happens, gomonkey.ApplyFunc(deleteSnapshotMetadataOnly, ...) does not intercept the call and the test still reaches the real cleanupTemplateMetadata. Since the test sets store.db to an empty &gorm.DB{}, that real DB path can panic.
I can reproduce this locally: go test ./pkg/templatecenter panics in TestRunSnapshotDeleteJobCleansTemplateJobs, while go test -gcflags=all=-l ./pkg/templatecenter passes after disabling inlining. Consider avoiding a patch on this tiny wrapper, for example by introducing a replaceable function variable or by using a real test DB to verify the job cleanup behavior.
Summary
Fixes a bug where deleted snapshots still appeared in template list APIs until a second delete.
After the first DELETE /templates/{snapshotID}:
- GET /templates and cubemastercli tpl ls still listed the deleted snapshot
Root cause
Snapshot delete (runSnapshotDeleteJob) removed template_definition metadata but did not remove related template_image_job rows. ListTemplates appends orphan job records when no definition exists.
So the deleted snapshot was resurrected in list responses via job fallback entries.
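The fallback described above can be illustrated with a rough sketch. The row types and the `listTemplates` function here are hypothetical simplifications for illustration only; the real ListTemplates code and schema may differ.

```go
package main

import "fmt"

// Hypothetical, simplified row types; the real schema may differ.
type definition struct {
	TemplateID string
	Kind       string
}

type imageJob struct {
	TemplateID string
}

type listEntry struct {
	TemplateID string
	Kind       string // empty for job fallback entries
}

// listTemplates emits one entry per definition, then one fallback entry for
// every job whose template has no definition and has not been listed yet.
func listTemplates(defs []definition, jobs []imageJob) []listEntry {
	seen := make(map[string]bool, len(defs))
	out := make([]listEntry, 0, len(defs)+len(jobs))
	for _, d := range defs {
		seen[d.TemplateID] = true
		out = append(out, listEntry{TemplateID: d.TemplateID, Kind: d.Kind})
	}
	for _, j := range jobs {
		if seen[j.TemplateID] {
			continue
		}
		seen[j.TemplateID] = true
		out = append(out, listEntry{TemplateID: j.TemplateID})
	}
	return out
}

func main() {
	jobs := []imageJob{{TemplateID: "snap-1"}}

	// Definition deleted, job row left behind: the snapshot is still listed.
	fmt.Println(len(listTemplates(nil, jobs)))

	// Job rows deleted as well: nothing is left to resurrect.
	fmt.Println(len(listTemplates(nil, nil)))
}
```

Under these assumptions the fallback entry carries an empty Kind, which is consistent with a snapshot check failing on the second delete.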
On the second delete, has_snapshot returned false (definition gone, job fallback has no Kind), routing the request through template delete (cleanupTemplateJobs) instead of snapshot delete.
Fix
- cleanupTemplateJobs after successful snapshot metadata cleanup, matching template delete behavior
- executeSnapshotDeleteJob without re-querying job rows that were just deleted
Deploy And Verify