Skip to content

Non-transactional DB resource writes allow corruption under concurrency #967

@efiacor

Description

@efiacor

Status: Identified (test workaround in place)

pkgRevResourcesWriteToDB performs a non-transactional DELETE-all + INSERT-each loop. When two writers (server PRR handler and controller render) operate on the same package concurrently, row-level interleaving can corrupt resource data (e.g. cm.yaml ends up empty).

The server pod (PRR Update handler) and controller pod (render write-back) share the same PostgreSQL database but use separate connections. Without wrapping DELETE + INSERTs in a single transaction, concurrent writes produce non-deterministic final state.

Reproduction: only observed on slow CI runners (GH Actions) where the init render's DB write and the user's push overlap within ~100ms. Locally the init render completes before the push arrives.

Current mitigations:

  • Stale render check (checkRenderStale) prevents source-triggered renders from overwriting a concurrent push
  • Resilience test waits for init render DB visibility before pushing
  • Eventually wrappers on post-render content assertions handle transient read inconsistency

Proper fix: wrap the DELETE + INSERT sequence in pkgRevResourcesWriteToDB in a PostgreSQL transaction. Alternatively, use a single INSERT ... ON CONFLICT DO UPDATE without the preceding DELETE (requires schema change to handle resource deletion differently).

File: pkg/cache/dbcache/dbpackagerevisionresourcessql.go:107

Acceptance Criteria:

  • Concurrent resource writes from server and controller produce correct final state
  • No resource corruption under any interleaving
  • E2E resilience test passes without the waitForRendered guard

Additional manifestation: lifecycle transition wipes resources

The Draft→Proposed lifecycle transition in contentcache.UpdateLifecycle uses the draft/close cycle:

  1. repo.UpdatePackageRevision(pkgRev)pkgRevReadFromDB(readResources=true)
  2. draft.UpdateLifecycle() — sets lifecycle in memory
  3. repo.ClosePackageRevisionDraft()savePackageRevisionDraft()savePackageRevision(ctx, d, true)

Step 3 calls pkgRevResourcesWriteToDB which does DELETE-all + INSERT-each. If the in-memory resources field is empty (because step 1's DB read got empty resources due to visibility lag), it deletes all resources and writes nothing — permanently corrupting the package.

This is a permanent failure (not transient): the package can never publish because the git push requires a Kptfile, but the resources table is empty. The controller retries with exponential backoff but every attempt fails with package must contain Kptfile at root: file does not exist.

Root cause: savePackageRevisionDraft hardcodes saveResources=true even for lifecycle-only transitions that don't modify resources.

Fix options:

  1. Pass saveResources=false in savePackageRevisionDraft when only lifecycle changed (add a dirty flag to dbPackageRevision)
  2. Skip the draft/close cycle for Draft↔Proposed when pushDraftsToGit=false (use direct UpdateLifecycle like Proposed→Published does)
  3. Don't DELETE in pkgRevResourcesWriteToDB when len(pr.resources) == 0 — but this breaks intentional empty packages

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions