fix: rollback restores configuration snapshot alongside application image (#4260) #4
Open
neuralmint wants to merge 7 commits into
Bounty #3947: Bound retry metadata growth on repeated failures.

Changes:
- Added MAX_RETRY_METADATA hard cap (100) to prevent unbounded retry counter growth.
- Added dead_letter store for permanently failed tasks.
- fail() now enforces the repeated-failures invariant before re-enqueueing: tasks that exceed max_retries or the hard cap go to dead letter instead.
- enqueue() rejects tasks past the hard cap and returns None for the caller to handle.
- Added preserve_retries parameter to enqueue() so retry metadata is preserved during re-enqueue (idempotent retry path).
- Scheduled task promotion (dequeue) also respects the hard cap.
- Added detailed logging for all retry/rejection decisions.
- Backward compatible: default max_retries remains 3; existing callers unaffected.
- Regression tests cover: repeated-failures trigger, metadata bound, idempotent fail, dead-letter isolation, exhausted enqueue rejection.
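As a rough illustration of the retry bound described in this commit, here is a minimal sketch. Only `MAX_RETRY_METADATA`, `dead_letter`, `fail()`, `enqueue()`, `preserve_retries` and `max_retries` come from the commit message; the `TaskQueue` class, the task dict layout and the return values are assumptions made for the example, not the PR's actual code.

```python
from collections import deque
from typing import Optional

MAX_RETRY_METADATA = 100  # hard cap named in the commit message


class TaskQueue:
    """Hypothetical queue used only to illustrate the retry bound."""

    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self.pending: deque = deque()
        self.dead_letter: list = []

    def enqueue(self, task: dict, preserve_retries: bool = False) -> Optional[str]:
        if not preserve_retries:
            task["retries"] = 0  # fresh submission: reset retry metadata
        if task.get("retries", 0) > MAX_RETRY_METADATA:
            return None  # past the hard cap: the caller handles the rejection
        self.pending.append(task)
        return task["id"]

    def fail(self, task: dict) -> bool:
        """Record one failure; return True if the task was re-enqueued."""
        retries = task.get("retries", 0) + 1
        task["retries"] = retries
        # Check the invariant before re-enqueueing, as the commit describes.
        if retries > self.max_retries or retries > MAX_RETRY_METADATA:
            self.dead_letter.append(task)
            return False
        self.enqueue(task, preserve_retries=True)
        return True
```

With `max_retries=2` in this sketch, the first two failures re-enqueue the task and the third moves it to `dead_letter`, so the retry counter can never grow past the smaller of the two limits plus one.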
Add an atomic state precondition in the scheduler dequeue path to reject tasks whose associated run has been deleted. This prevents stale, duplicate, or policy-violating transitions when a workflow is removed concurrently with run materialization.

Changes:
- Add tracking set with method
- Add precondition in: rejects both queued and scheduled tasks for deleted runs
- Bounded audit metadata via structured logging (warn-level with run_id and task_id context)
- Fix pre-existing bug: dict now stores task dicts alongside timestamps so data is not lost during promotion
- Wire up WorkflowManager in OrchestrationEngine for future mark_run_deleted integration
- Add 5 deterministic regression tests covering:
  * Dequeue rejection for deleted runs
  * Scheduled task skip for deleted runs
  * Idempotent mark_run_deleted
  * Normal unaffected workflows
  * Isolated deletion between concurrent runs

Closes #3977
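The dequeue precondition can be pictured with a small sketch along these lines. Only `mark_run_deleted`, `run_id` and `task_id` appear in the commit message; the `Scheduler` class, its attribute names and the single-lock design are assumptions for illustration, and the real change may be structured differently.

```python
import logging
import threading
from collections import deque
from typing import Optional

logger = logging.getLogger("scheduler")


class Scheduler:
    """Hypothetical scheduler showing a deleted-run check at dequeue time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queue: deque = deque()
        self._deleted_runs: set = set()

    def mark_run_deleted(self, run_id: str) -> None:
        with self._lock:
            self._deleted_runs.add(run_id)  # adding twice has no extra effect

    def enqueue(self, task: dict) -> None:
        with self._lock:
            self._queue.append(task)

    def dequeue(self) -> Optional[dict]:
        # The check and the pop happen under one lock, so a concurrent
        # mark_run_deleted cannot slip in between them.
        with self._lock:
            while self._queue:
                task = self._queue.popleft()
                if task["run_id"] in self._deleted_runs:
                    logger.warning(
                        "rejecting task for deleted run",
                        extra={"run_id": task["run_id"], "task_id": task["task_id"]},
                    )
                    continue
                return task
            return None
```

Holding one lock across both the membership test and the pop is what makes the precondition atomic in this sketch; a run deleted after `dequeue` returns is outside what this check can catch.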
Adds a data lake governance module that enforces purpose limitation on ingestion writes. Every data lake write now requires purpose metadata (purpose, data class, owner, destination) and is blocked when the destination is not approved for that data class.

New components:
- DataClassificationRegistry: registers data classes with approved destinations; supports wildcard (all destinations) via empty set
- PurposeMetadata: declares purpose, data class, owner, destination
- IngestionManifest: full manifest for data lake writes
- DataLakeGovernor: validates manifests, enforces policy, records audit log with grouping by purpose and owner
- Custom errors: MissingPurposeMetadataError, DataClassNotRegisteredError, DestinationNotApprovedError

All 19 new tests pass. Existing test suite unaffected.

Closes #3998
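To make the policy concrete, here is a minimal sketch of the destination check. The class and error names are taken from the commit message, but every method name, field layout and the `validate_write` helper are assumptions for the example; `IngestionManifest`, `DataLakeGovernor` and the audit log are left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


class MissingPurposeMetadataError(Exception):
    pass


class DataClassNotRegisteredError(Exception):
    pass


class DestinationNotApprovedError(Exception):
    pass


@dataclass(frozen=True)
class PurposeMetadata:
    purpose: str
    data_class: str
    owner: str
    destination: str


class DataClassificationRegistry:
    def __init__(self) -> None:
        self._approved: Dict[str, Set[str]] = {}

    def register(self, data_class: str, destinations: Iterable[str] = ()) -> None:
        # An empty set acts as the wildcard: every destination is approved.
        self._approved[data_class] = set(destinations)

    def check(self, data_class: str, destination: str) -> None:
        if data_class not in self._approved:
            raise DataClassNotRegisteredError(data_class)
        allowed = self._approved[data_class]
        if allowed and destination not in allowed:
            raise DestinationNotApprovedError(f"{data_class} -> {destination}")


def validate_write(registry: DataClassificationRegistry,
                   metadata: Optional[PurposeMetadata]) -> None:
    """Hypothetical helper: raise unless the write satisfies the policy."""
    if metadata is None or not all(
        (metadata.purpose, metadata.data_class, metadata.owner, metadata.destination)
    ):
        raise MissingPurposeMetadataError("purpose metadata is required")
    registry.check(metadata.data_class, metadata.destination)
```

In this sketch a class registered with no destinations is writable anywhere, which mirrors the "wildcard via empty set" rule above; whether an unregistered class should fail closed, as it does here, is a design choice the commit message implies through `DataClassNotRegisteredError`.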
- Add release workflow (release.yml) that:
  - Triggers on version tags (v*)
  - Builds packages with uv build
  - Generates build provenance attestation via actions/attest-build-provenance
  - Creates GitHub Releases with attested artifacts
  - Publishes to PyPI with attestation support
- Add artifact verification section to README with gh CLI instructions

The attestation includes source repository, commit SHA, workflow run, and artifact digest, enabling consumers to verify artifact provenance.

Closes #4050
…tadata

Closes #4088

Multi-stage Dockerfile isolates all build-time-only ARG declarations (BUILD_ENV, PIP_INDEX_URL, UV_VERSION) inside the builder stage. The final runtime stage inherits zero build-time ARGs, preventing leakage into image history, labels, or environment variables.

Changes:
- Dockerfile: two-stage build (builder → final), ARGs only in builder
- .dockerignore: exclude dev/CI artifacts from build context
- infra/docker-compose.yml: pass args only to builder stage
- infra/scripts/audit_image_metadata.sh: CI audit for leaked metadata
- .github/workflows/ci.yml: add docker-build-and-audit job
- Makefile: docker-audit / docker-build-slim targets
…es not support ARG expansion)
Docker's COPY --from= instruction does not support variable expansion for
image references. The previous approach used:
COPY --from=ghcr.io/astral-sh/uv:${UV_VERSION} /uv /usr/local/bin/uv
which fails at build time with:
'variable expansion is not supported for --from'
Fix: create a dedicated uv-image stage using FROM with the ARG, then
COPY --from=uv-image using a static stage name. This is the documented
Docker workaround for this limitation.
Also moved UV_VERSION ARG to global scope (before first FROM) so it's
available to the uv-image FROM line, and removed it from the builder
stage since it's no longer consumed there.
…mage

Bounty #4260: Deployment rollback now restores BOTH image and configuration, preventing incompatible settings at startup.

Changes:
- Add src/deploy/ (ReleaseManager, Release dataclass): records image digest and a deep-copied config snapshot per release.
- Rollback restores the paired config snapshot, not just the image.
- Post-rollback verification checks internal consistency.
- CLI gains `release list`, `release show`, `release rollback` subcmds.
- `deploy` command now records release metadata at deploy time.
- 35 tests covering core logic, rollback, serialization, edge cases, and CLI integration.

Fixes #4260
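The pairing of image digest and config snapshot might look roughly like the sketch below. `ReleaseManager` and `Release` are named in the commit message, but the fields, method names and return shape here are assumptions for illustration; the post-rollback verification, serialization and CLI wiring are omitted.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Release:
    version: str
    image_digest: str
    config: Dict[str, Any]  # snapshot taken at deploy time


class ReleaseManager:
    """Hypothetical in-memory manager; the PR's real API may differ."""

    def __init__(self) -> None:
        self._releases: Dict[str, Release] = {}
        self.current: Optional[Release] = None

    def record(self, version: str, image_digest: str,
               config: Dict[str, Any]) -> Release:
        # Deep copy so later edits to the live config cannot alter the snapshot.
        release = Release(version, image_digest, copy.deepcopy(config))
        self._releases[version] = release
        self.current = release
        return release

    def rollback(self, version: str) -> Tuple[str, Dict[str, Any]]:
        target = self._releases[version]  # raises KeyError for unknown versions
        self.current = target
        # Return the image and its paired config together, never one alone.
        return target.image_digest, copy.deepcopy(target.config)
```

The deep copy at record time is the part that matters: with a shallow copy, a nested feature flag changed after deploy would also change inside the stored snapshot, and a rollback would restore the newer, incompatible value.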
Summary
Fixes bounty #4260 — deployment rollback now restores both the application image and the configuration snapshot that was recorded alongside that release.
Problem
When a release is rolled back, only the application image is restored. Configuration maps or feature flags that changed after the release keep their newer values, so the previous application version starts with incompatible settings.
Fix
- `ao deploy --image-digest X manifest.json`: records a release with the manifest as config snapshot.
- `ao release list`: lists all releases with image digest and config key count.
- `ao release show <version>`: shows full release details.
- `ao release rollback --confirm <version>`: restores image AND config from target release.
- `ao --release-db <path>`: optional persistent JSON-backed release store.

Acceptance Criteria
Test Output (35 new tests)
Fixes #4260