CRD E2E Ginkgo Suite #964

Merged: liamfallon merged 21 commits into kptdev:main from Nordix:e2e-v1alpha2-ginkgo-new on May 6, 2026

@efiacor efiacor commented Apr 29, 2026

Title

CRD E2E Ginkgo Suite


Description

  • What changed:
    Adds a comprehensive Ginkgo-based end-to-end test suite (test/e2e/crd/) for the v1alpha2 PackageRevision CRD controller.

Test Coverage

  • Lifecycle — Draft/Proposed/Published transitions, deletion, DeletionProposed, workspace recreation, revision numbering, latest-revision label management, git tag/branch cleanup, zombie prevention
  • Rendering — Render failure/recovery, push-on-render-failure annotation, stale detection (rapid pushes)
  • Clone & Upgrade — Upstream ref clone, git URL clone, deployment repos, resource-merge/force-delete-replace/fast-forward/copy-merge upgrade strategies, bearer token auth
  • Copy — Workspace copy, latest-revision update, negative cases
  • Push/PRR — Content persistence, empty updates, rendered output verification, large packages, published package rejection
  • Validation — CEL source validation, lifecycle transition rules, published immutability, field selectors, repository webhook conflicts
  • Repository — Registration, mutable/immutable fields, re-sync preservation, cascade deletion, namespace isolation, directory filtering, unreachable repos, package discovery
  • Metadata — Labels/annotations, Kptfile sync, readinessGates, custom finalizers, garbage collection
  • Migration — v1alpha1→v1alpha2 migration, rollback, side-by-side API isolation, cross-version rejection, orphan cleanup
  • Resilience — Controller restart recovery (FunctionConfig store pre-population, render after restart)
  • Concurrency — Optimistic concurrency on lifecycle patches and PRR pushes
  • Metrics — Prometheus endpoint verification
  • FunctionConfig — Controller finalizer, observed generation status, dynamic tag rendering, tag removal propagation

Supporting Changes

  • test-blueprints.bundle — Added nstest/v1 package (builtin set-namespace pipeline) for fast render tests
  • scripts/create-deployment-blueprint.sh — Fixed RBAC copy ordering (loop ran before packagerevisions was added to ENABLED_RECONCILERS)
  • .github/workflows/porch-e2e-ci-jobs.yaml — Added CRD e2e suite to CI matrix
  • test/e2e/cli/testdata/rpkg-get/config.yaml — Updated expected output for new nstest package
  • controllers/functionconfigs/reconciler/functionconfigreconciler.go + pkg/engine/builtinruntime.go — Fixed data race in exec cache: added missing write lock to UpdateExecCache and replaced unsynchronized map access in builtinRuntime with GetProcessorFromCache (read-locked lookup)
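
The exec cache fix in the last bullet follows a common locking pattern, sketched below. This is a minimal illustration, not the code from this PR: the `execCache` type, the `Processor` type, and the method signatures are assumptions; only the names `UpdateExecCache` and `GetProcessorFromCache` come from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// Processor is a hypothetical stand-in for whatever the real exec cache stores.
type Processor func(input string) string

// execCache is a hypothetical cache whose map is guarded by a sync.RWMutex.
type execCache struct {
	mu    sync.RWMutex
	procs map[string]Processor
}

func newExecCache() *execCache {
	return &execCache{procs: make(map[string]Processor)}
}

// UpdateExecCache copies the input and swaps it in under the write lock,
// so no reader ever observes a map that is being mutated.
func (c *execCache) UpdateExecCache(procs map[string]Processor) {
	next := make(map[string]Processor, len(procs))
	for k, v := range procs {
		next[k] = v
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.procs = next
}

// GetProcessorFromCache performs a read-locked lookup.
func (c *execCache) GetProcessorFromCache(image string) (Processor, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.procs[image]
	return p, ok
}

func main() {
	cache := newExecCache()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func() { // writer
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				cache.UpdateExecCache(map[string]Processor{
					"set-namespace": func(in string) string { return "ns:" + in },
				})
			}
		}()
		go func() { // reader
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				cache.GetProcessorFromCache("set-namespace")
			}
		}()
	}
	wg.Wait()
	p, ok := cache.GetProcessorFromCache("set-namespace")
	fmt.Println(ok, p("demo"))
}
```

Running the sketch with `go run -race` should report no races; removing the `Lock`/`Unlock` pair in `UpdateExecCache` is the kind of omission the race detector would flag.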

Bug Fixes (discovered via CI)

Controller fixes

  • FunctionConfigStore pre-population (controllers/main.go) — Pre-populate store on startup via mgr.GetAPIReader().List() to prevent empty exec cache after pod restart
  • Render stale check for source-triggered renders (controllers/.../render.go) — Always run checkRenderStale, not just for annotation-triggered renders; prevents init render from overwriting a concurrent user push
  • Re-read PR after render (controllers/.../packagerevision_controller.go) — Re-read PR from API after reconcileRender before calling reconcileLifecycle to prevent stale Spec.Lifecycle from reverting concurrent transitions
  • Requeue on lifecycle transition failure (controllers/.../packagerevision_controller.go) — Return Requeue: true on UpdateLifecycle error so transient git push conflicts are retried automatically (partially addresses #925, "Sub-reconcile errors swallowed without requeue")
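
The last two fixes (re-read after render, requeue on lifecycle failure) can be sketched as a simplified reconcile flow. Everything here is hypothetical and stdlib-only: `packageRevision`, `store`, `result`, and the function parameters are illustrative stand-ins, not the controller's actual types or the controller-runtime API.

```go
package main

import (
	"errors"
	"fmt"
)

// packageRevision is a hypothetical, trimmed-down stand-in for the CRD type.
type packageRevision struct {
	Lifecycle string
}

// result loosely mirrors a reconcile result; defined locally for the sketch.
type result struct {
	Requeue bool
}

// store is a hypothetical API reader holding the latest object.
type store struct{ current packageRevision }

func (s *store) Get() packageRevision { return s.current }

var errPushConflict = errors.New("git push conflict")

// reconcile renders first, then re-reads the object so the lifecycle step
// sees any transition that happened while rendering. A lifecycle error is
// turned into a requeue instead of being dropped.
func reconcile(s *store, render func(), updateLifecycle func(packageRevision) error) (result, error) {
	render()      // may be slow; the object can change in the meantime
	pr := s.Get() // re-read instead of reusing the pre-render copy
	if err := updateLifecycle(pr); err != nil {
		return result{Requeue: true}, nil
	}
	return result{}, nil
}

func main() {
	s := &store{current: packageRevision{Lifecycle: "Draft"}}
	// Simulate a user moving the package to Proposed while the render runs.
	render := func() { s.current.Lifecycle = "Proposed" }

	attempts := 0
	seen := ""
	update := func(pr packageRevision) error {
		attempts++
		seen = pr.Lifecycle
		if attempts == 1 {
			return errPushConflict // transient failure on the first try
		}
		return nil
	}

	res, _ := reconcile(s, render, update)
	fmt.Println(res.Requeue, seen)
	res, _ = reconcile(s, render, update)
	fmt.Println(res.Requeue, seen)
}
```

In the sketch, the lifecycle step always sees `Proposed` rather than the pre-render `Draft`, and the first (failing) pass asks for a requeue while the second does not.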

DB layer fixes

Test Stabilization (CI timing sensitivity)

  • updatePRRResources retries on conflict; waitForReady asserts no render in-flight
  • waitForRendered checks annotation match to avoid passing on stale Rendered=True
  • All post-render PRR content assertions wrapped in Eventually for cache propagation delay
  • waitForPRRVisible helper guards against DB commit visibility lag across connections
  • publishPackage includes PRR visibility check before lifecycle transitions
  • Fix nil map panic in updatePRRResources when PRR resources not yet populated after restart

Related Issue(s)


  • Why it's needed: e2e test coverage for PR Controller PoC #514
  • How it works: A Ginkgo and Gomega suite hooked into CI

Type of Change

  • Bug fix
  • New feature
  • Enhancement
  • Refactor
  • Documentation
  • Tests
  • Other: ________

Checklist

  • Code follows project style guidelines
  • Self-reviewed changes
  • Tests added/updated
  • Documentation added/updated
  • All tests and gating checks pass

Testing Instructions

  1. make run-in-kind-v1alpha2 (deploys porch with PR controller + FunctionConfig reconciler)
  2. make test-e2e-crd-all (runs full suite including migration tests)
  3. To run FunctionConfig tests only: E2E=1 go test -v ./test/e2e/crd -ginkgo.v -ginkgo.focus="FunctionConfig"

Additional Notes

  • Known issues:
    • Render failure tests (~41s each) are bounded by fn-runner pod startup + image pull timeout for non-existent images — this is expected system behaviour
    • test-blueprints.bundle must be reloaded on existing clusters: scripts/modify-gitea-test-blueprints.sh reload
    • Rapid-push stale detection test (PIt): Skipped pending render cancellation (#931, "Render cancellation on new push"). When two pushes arrive within ~100ms, the first render can complete and write its result before the second push's annotation lands in etcd, so stale detection misses the second push.
  • Further improvements:
  • Review notes:
    • The deployment script fix (RBAC copy ordering) is a bug introduced by the PR Controller PoC — without it, packagerevisions RBAC is never deployed when v1alpha2 is enabled
    • FunctionConfig tests depend on the FunctionConfig reconciler wiring in controllers/main.go (merged in PR Controller PoC #514)
    • The nstest/v1 test blueprint was added specifically to avoid fn-runner pod cold-start latency in CI (~47s → <1s)
    • Controller and DB fixes are minimal and targeted — each addresses a real race condition exposed by CI timing
    • DB transactional fix is safe for v1alpha1 (same code path, same improvement)

AI Disclosure

[X] I have used AI in the creation of this PR.

Amazon Q Developer was used.

netlify Bot commented Apr 29, 2026

Deploy Preview for porch ready!

Name Link
🔨 Latest commit 86b2c7a
🔍 Latest deploy log https://app.netlify.com/projects/porch/deploys/69fa6acf44acb200080288db
😎 Deploy Preview https://deploy-preview-964--porch.netlify.app

@efiacor efiacor force-pushed the e2e-v1alpha2-ginkgo-new branch 5 times, most recently from 8856d37 to b1026a1 Compare May 1, 2026 07:09
liamfallon previously approved these changes May 5, 2026
efiacor added 2 commits May 5, 2026 22:22
Signed-off-by: Fiachra Corcoran <fiachra.corcoran@est.tech>
Signed-off-by: Fiachra Corcoran <fiachra.corcoran@est.tech>
sonarqubecloud Bot commented May 5, 2026

@liamfallon liamfallon merged commit 77040f2 into kptdev:main May 6, 2026
37 of 39 checks passed
@efiacor efiacor deleted the e2e-v1alpha2-ginkgo-new branch May 6, 2026 09:49

Labels

github_actions Pull requests that update GitHub Actions code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Non-transactional DB resource writes allow corruption under concurrency
  • Integration testing: complete lifecycle and edge cases

2 participants