Skip to content

fix(cubemaster): clean job rows after snapshot delete#559

Open
xiaojunxiang2023 wants to merge 1 commit into
TencentCloud:masterfrom
xiaojunxiang2023:fix/snapshot-delete-orphan-job-list
Open

fix(cubemaster): clean job rows after snapshot delete#559
xiaojunxiang2023 wants to merge 1 commit into
TencentCloud:masterfrom
xiaojunxiang2023:fix/snapshot-delete-orphan-job-list

Conversation

@xiaojunxiang2023

Copy link
Copy Markdown
Contributor

Summary

Fixes a bug where deleted snapshots still appeared in template list APIs until a second delete.

After the first DELETE /templates/{snapshotID}:

  • Single-snapshot lookup correctly returned not found
  • GET /templates and cubemastercli tpl ls still listed the deleted snapshot
  • A second delete was required to remove it from the list
image

Root cause

Snapshot delete (runSnapshotDeleteJob) removed template_definition metadata but did not remove related template_image_job rows.

ListTemplates appends orphan job records when no definition exists:

for _, job := range jobs {
    if _, ok := seen[job.TemplateID]; ok {
        continue
    }
    out = append(out, templateInfoFromJob(&job))
}

So the deleted snapshot was resurrected in list responses via job fallback entries.

On the second delete, has_snapshot returned false (definition gone, job fallback has no Kind), routing the request through template delete (cleanupTemplateJobs) instead of snapshot delete.

Fix

  • Call cleanupTemplateJobs after successful snapshot metadata cleanup, matching template delete behavior
  • Return terminal READY job info from executeSnapshotDeleteJob without re-querying job rows that were just deleted

Deploy And Verify

image

Comment thread CubeMaster/pkg/templatecenter/snapshot_ops.go Outdated
@xiaojunxiang2023 xiaojunxiang2023 force-pushed the fix/snapshot-delete-orphan-job-list branch from 2c77322 to 879453f Compare June 15, 2026 06:22
Comment thread CubeMaster/pkg/templatecenter/job_dto.go
Comment thread CubeMaster/pkg/templatecenter/snapshot_ops_test.go
Comment thread CubeMaster/pkg/templatecenter/snapshot_ops_test.go
@cubesandboxbot

Copy link
Copy Markdown

Review: PR 559

Snapshot delete removed template_definition but left template_image_job
rows behind. ListTemplates surfaces orphan jobs, so deleted snapshots
still appeared in GET /templates and tpl ls until a second delete routed
through template cleanup.

Signed-off-by: xiaojunxiang <xiaojunxiang@kingsoft.com>
@xiaojunxiang2023 xiaojunxiang2023 force-pushed the fix/snapshot-delete-orphan-job-list branch from 879453f to 8530777 Compare June 15, 2026 06:31
patches.ApplyFunc(runReplicaCleanup, func(ctx context.Context, templateID string, locators []templateCleanupLocator) error {
return nil
})
patches.ApplyFunc(deleteSnapshotMetadataOnly, func(ctx context.Context, snapshotID string) error {

@fslongjin fslongjin Jun 16, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is risky for CI under the default Go compiler settings. deleteSnapshotMetadataOnly is just a tiny wrapper, so the compiler may inline it; when that happens, gomonkey.ApplyFunc(deleteSnapshotMetadataOnly, ...) does not intercept the call and the test still reaches the real cleanupTemplateMetadata. Since the test sets store.db to an empty &gorm.DB{}, that real DB path can panic.

I can reproduce this locally: go test ./pkg/templatecenter panics in TestRunSnapshotDeleteJobCleansTemplateJobs, while go test -gcflags=all=-l ./pkg/templatecenter passes after disabling inlining. Consider avoiding a patch on this tiny wrapper, for example by introducing a replaceable function variable or by using a real test DB to verify the job cleanup behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants