Image catalog: discover CCV image tags from OCI registries + add CodeCollection visibility #114
stewartshea wants to merge 13 commits into
- Added optional visibility field to CodeCollection model, allowing collections to be marked as 'public' or 'hidden'.
- Updated codecollections.yaml to include image_source and image_registry fields for better image tracking.
- Introduced a new scheduled task for syncing image tags from OCI registries, enhancing image catalog management.
- Refactored various database queries to respect visibility settings, ensuring hidden collections are excluded from public-facing endpoints while still being accessible for internal processes.
- Enhanced logging to reflect visibility status during collection creation and updates.
Container Images Built
Tag:
```python
    and v.version_type == "tag"
    and (stable_tag is None or v.version_name > stable_tag)
):
    stable_tag = v.image_tag
```
Stable pointer compares version_name against image_tag string
High Severity
In _entry_pointers, the stable-tag comparison v.version_name > stable_tag compares a version_name (e.g., "v2.0.0") against stable_tag, which holds an image_tag (e.g., "v1.0.0-aabbccd-e4f5a6b"). These are fundamentally different string formats, so the comparison produces incorrect ordering. For example, "v10.0.0" > "v9.0.0-abc-def" evaluates to False because lexicographic comparison finds '1' < '9' at the second character, so version 10 would never replace version 9 as stable. The fix is to track the winning version_name separately from the winning image_tag and to compare version names with a semver-aware key, since plain string comparison would still rank "v9.0.0" above "v10.0.0".
Reviewed by Cursor Bugbot for commit bc2d684.
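The suggested fix can be sketched as follows. This is a minimal illustration, not the PR's `_entry_pointers`: `VersionRow` is a hypothetical stand-in for `CodeCollectionVersion`, and the `v?MAJOR.MINOR.PATCH` pattern used to recognize semver-looking names is an assumption. The winning `version_name` is tracked as a numeric key alongside the winning `image_tag`, so like is only ever compared with like.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class VersionRow:  # hypothetical stand-in for a CodeCollectionVersion row
    version_type: str
    version_name: str
    image_tag: str

# Assumed shape of a "semver-looking" name: optional "v", then MAJOR.MINOR.PATCH.
_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")

def semver_key(name: str) -> Optional[Tuple[int, ...]]:
    """Return a numeric sort key, or None if the name is not semver-like."""
    m = _SEMVER.match(name)
    return tuple(int(part) for part in m.groups()) if m else None

def pick_stable_tag(rows: list[VersionRow]) -> Optional[str]:
    best_key: Optional[Tuple[int, ...]] = None  # key of the winning version_name
    best_tag: Optional[str] = None              # image_tag of the same row
    for v in rows:
        if v.version_type != "tag":
            continue
        key = semver_key(v.version_name)
        if key is None:
            continue  # skip tags that do not look like versions
        # Compare version_name with version_name, numerically; never with an image_tag.
        if best_key is None or key > best_key:
            best_key, best_tag = key, v.image_tag
    return best_tag
```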
```python
deactivated = 0
now = datetime.utcnow()

refs_by_name = {r.ref: r for r in refs}
```
Dict comprehension loses builds, breaks latest marking
High Severity
refs_by_name = {r.ref: r for r in refs} collapses multiple builds sharing the same ref name (e.g., multiple main builds) into a single entry: whichever build happens to come last in iteration order. Meanwhile, resolve_latest examines the full list and may select a different build. When it does, the upserted row's image_tag won't match latest_tag, so is_latest is never set to True. This cascades into _entry_pointers and the /resolve?pointer=latest endpoint, which both rely on is_latest to find the current image, so in that case the latest pointer returns 404.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit bc2d684.
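One way to make the collapse deterministic is sketched below, assuming `resolve_latest` picks the newest build by `built_at` (an assumption; the real selection rule lives in the OCI source). `Ref` is a hypothetical stand-in for `DiscoveredImageRef`. Keeping the newest build per ref name means the upserted row and the `latest_tag` chosen by `resolve_latest` agree, so `is_latest` can actually be set.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Ref:  # hypothetical stand-in for DiscoveredImageRef
    ref: str
    image_tag: str
    built_at: datetime

def newest_per_ref(refs: list[Ref]) -> dict[str, Ref]:
    """Keep the newest build for each ref name instead of the last one iterated."""
    chosen: dict[str, Ref] = {}
    for r in refs:
        current = chosen.get(r.ref)
        if current is None or r.built_at > current.built_at:
            chosen[r.ref] = r
    return chosen

def latest_flags(refs: list[Ref], latest_tag: str) -> dict[str, bool]:
    """Map each ref name to whether its upserted row should get is_latest=True."""
    return {name: r.image_tag == latest_tag for name, r in newest_per_ref(refs).items()}
```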
Wires the OCI image source into codecollections.yaml for every repo that now has a per-repo GitHub Actions build producing catalog-shaped tags (`<sanitized_ref>-<cc_sha7>-<rt_sha7>`):
- rw-public-codecollection (already registered)
- rw-cli-codecollection (already registered)
- rw-generic-codecollection (new)
- rw-workspace-utils (new)
- aws-c7n-codecollection (new)
- azure-c7n-codecollection (new)
ternary-codecollection is deliberately left without an image_source because it has no Dockerfile or build-push workflow yet. Documented inline.
Once PR #114 is merged, sync_image_tags_task will discover each of these on its 5-minute beat schedule and upsert one CodeCollectionVersion row per discovered ref.
Co-authored-by: Cursor <cursoragent@cursor.com>
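For illustration, a parser for the `<sanitized_ref>-<cc_sha7>-<rt_sha7>` tag shape might look like the sketch below. The regex is an assumption, not the one in `OCISource`: it anchors on the two trailing 7-character lowercase hex SHAs so that refs containing their own hyphens (such as `pr-42`) survive intact.

```python
import re
from typing import NamedTuple, Optional

class ParsedTag(NamedTuple):
    ref: str
    cc_sha7: str
    rt_sha7: str

# Assumed regex: any ref, then two trailing 7-char lowercase hex SHAs.
# Anchoring on the suffix lets refs such as "pr-42" keep their hyphens.
TAG_RE = re.compile(r"^(?P<ref>.+)-(?P<cc>[0-9a-f]{7})-(?P<rt>[0-9a-f]{7})$")

def parse_tag(tag: str) -> Optional[ParsedTag]:
    m = TAG_RE.match(tag)
    if m is None:
        return None  # not catalog-shaped, e.g. a bare "latest" alias
    return ParsedTag(m["ref"], m["cc"], m["rt"])
```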
These two byte-identical orphans were left behind by 4e1e0fc (Registry v2 Initial Release). Verified no code path references them:
- All Celery tasks (registry_tasks.py, image_sync_tasks.py, sync_tasks.py) load from "/app/codecollections.yaml" (mounted from the repo-root file via docker-compose: ../../codecollections.yaml:/app/codecollections.yaml:ro).
- mcp-server/indexer.py resolves via self.base_dir.parent.parent / "codecollections.yaml", which also lands on the repo-root file.
- mcp-server/docker-compose.yml mounts the repo-root file at /app.
- Frontend/Admin/openapi.yaml all reference "/app/codecollections.yaml".
Removing now to prevent future accidental edits to a non-live config.
Co-authored-by: Cursor <cursoragent@cursor.com>
`scripts/dry_run_oci_sources.py` exercises the real `app.sources` plugins
against every CC in `codecollections.yaml` without touching the database,
Celery, or FastAPI. For each CC with `image_source` configured it:
1. Loads the source via `get_source(name)`
2. Calls `discover_refs(cc)` against the live registry
3. Calls `resolve_latest(cc, refs)` / `resolve_stable(cc, refs)`
…and prints a per-CC summary plus (with `-v`) every parsed `DiscoveredImageRef`.
Use cases:
- Pre-flight check before flipping a new CC's `image_source` to "oci"
- Catching tag-schema regressions (build workflows changing the suffix
format would show up as zero parsed refs)
- Surfacing transient registry flakiness vs. real misconfiguration
Exit codes are intentionally distinct so this can drop into CI later:
0 = healthy
1 = source raised (network / auth / parser error)
2 = source returned zero refs (likely a tag-schema mismatch)
First run today flagged three useful real-world findings:
- rw-cli, aws-c7n, azure-c7n have only feature-branch / pr-* tags in
GHCR — no `main-<sha>-<rt>` build yet, so `resolve_latest` correctly
returns `(none)`. These will resolve once the open ccv/* PRs merge.
- rw-public, rw-generic, rw-workspace-utils all parse and resolve to
a clean `main-<cc_sha>-<rt_sha>` ref pair, exactly as designed.
No new runtime deps (uses requests + pyyaml that the backend already pins).
Co-authored-by: Cursor <cursoragent@cursor.com>
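A minimal sketch of how per-CC outcomes could be folded into the 0/1/2 exit-code contract above. `CCResult` is hypothetical, and the precedence (a raised source outranks a zero-ref result when several CCs disagree) is an assumption rather than something the script is documented to do.

```python
from dataclasses import dataclass
from typing import Optional

EXIT_OK, EXIT_SOURCE_ERROR, EXIT_ZERO_REFS = 0, 1, 2

@dataclass
class CCResult:  # hypothetical per-CC outcome of discover_refs()
    name: str
    ref_count: int = 0
    error: Optional[str] = None  # set when the source raised

def exit_code(results: list[CCResult]) -> int:
    """Fold per-CC results into one process exit code."""
    # Assumed precedence: a raised source (1) outranks an empty result (2).
    if any(r.error is not None for r in results):
        return EXIT_SOURCE_ERROR
    if any(r.ref_count == 0 for r in results):
        return EXIT_ZERO_REFS
    return EXIT_OK
```

The script would then end with `sys.exit(exit_code(results))`.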
```python
def resolve_image(
    slug: str,
    pointer: Optional[str] = Query(
        None, regex="^(latest|stable)$",
```
Deprecated regex parameter used instead of pattern
Medium Severity
The Query call uses regex="^(latest|stable)$" but the project uses Pydantic v2 (2.5.0), where this parameter was renamed to pattern. While FastAPI 0.104.1 may still accept regex as a deprecated alias, relying on deprecated behavior risks the validation silently not being applied in a future upgrade, allowing arbitrary strings through to the pointer parameter.
Reviewed by Cursor Bugbot for commit e132d97.
- Added steps to retrieve a GitHub App token in both `build-images.yaml` and `release.yaml` workflows.
- Updated the token usage in the image update steps to utilize the newly retrieved token instead of the previous Personal Access Token, improving security and access management.
Adds cc-registry-v2/docs/CCV.md as the canonical reference for the
PAPI-facing CodeCollection image catalog. Covers:
- End-to-end flow (per-repo CI -> OCI registry -> 5-min poll ->
codecollection_versions -> /api/v1/catalog -> PAPI)
- codecollections.yaml schema (image_source, image_registry,
default_ref, static_path, visibility) with public/hidden semantics
pinned to "UX toggle, not security boundary"
- The <ref>-<cc_sha7>-<rt_sha7> tag-schema contract that build
pipelines must honor, including the parser regex and worked examples
- latest/stable pointer resolution (newest build on default_ref;
highest semver-looking ref with fallback)
- Full API reference with copy-pasteable curl examples for
/codecollections, /{slug}, /{slug}/refs[/{ref}], /{slug}/resolve
- Pluggable sources: oci (default), static (JSON file for self-hosted
catalogs + tests), and CC_REGISTRY_EXTRA_SOURCES for custom plugins
- Sync cadence (5-min Celery beat) + how to trigger a manual run
- Operational tooling: dry-run script with its exit-code contract
- Troubleshooting matrix (zero refs, null latest, transient timeouts,
unknown source) with the actual curl commands to diagnose each
- How PAPI consumes the catalog vs. the previous corestate-operator
flow (CRD-less, one HTTP read per workspace reconcile)
Also threads the new doc through both indexes (repo-root README and
cc-registry-v2/docs/README.md).
Co-authored-by: Cursor <cursoragent@cursor.com>
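The `latest` rule described above (newest build on `default_ref`) reduces to a filter plus `max`. A sketch, assuming builds carry a `built_at` timestamp; `Build` is a hypothetical stand-in for `DiscoveredImageRef`, `pick_latest` is a hypothetical name rather than the PR's `resolve_latest`, and `"main"` as the default `default_ref` is an assumption. A CC with only `pr-*` or feature-branch builds resolves to `None`, matching the `(none)` results the dry run reported.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class Build:  # hypothetical stand-in for DiscoveredImageRef
    ref: str
    image_tag: str
    built_at: datetime

def pick_latest(builds: list[Build], default_ref: str = "main") -> Optional[Build]:
    """Newest build whose ref equals the CC's default_ref, or None if there is none."""
    candidates = [b for b in builds if b.ref == default_ref]
    return max(candidates, key=lambda b: b.built_at, default=None)
```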
Two operability fixes prompted by hitting registry-test:
1) /docs / /redoc / /openapi.json were unreachable through the production
ingress. The frontend SPA owns /, and only /api/* routes to the
backend, so FastAPI's default docs URLs (mounted at the container root)
were being served as the SPA's index.html. Move them under /api/:
- docs_url -> /api/docs
- redoc_url -> /api/redoc
- openapi_url -> /api/openapi.json
- hand-written -> /api/openapi.yaml
The local-dev URLs change correspondingly (localhost:8001/api/docs);
updated cc-registry-v2/README.md, start.sh, and docs/CONFIGURATION.md
to match. MCP server (port 8000) is unaffected — separate FastAPI app.
2) Added POST /api/v1/tasks/sync-image-tags so operators can kick the CCV
image catalog sync on demand instead of waiting up to 5 minutes for the
Celery beat. Mirrors the existing /sync-collections pattern (admin
bearer auth, returns the Celery task_id for status polling). This is
the same task the scheduler runs every 5 min — useful immediately
after a deploy or when validating a new image_source config.
CCV.md updated with both the live Swagger UI URL and the new curl recipe.
Co-authored-by: Cursor <cursoragent@cursor.com>
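A sketch of calling the new endpoint from Python with only the standard library. It builds the request without sending it; `build_sync_request` is a hypothetical helper, and the bearer token is whatever admin token the deployment uses. A later commit in this PR moves task triggering into the schedules API, so the exact path may differ on the final branch.

```python
import urllib.request

def build_sync_request(base_url: str, admin_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that queues the image-tag sync."""
    url = base_url.rstrip("/") + "/api/v1/tasks/sync-image-tags"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; assumed that the endpoint takes no payload
        method="POST",
        headers={"Authorization": f"Bearer {admin_token}"},
    )
```

Sending it with `urllib.request.urlopen(req)` should return a JSON body carrying the Celery task id for status polling; the exact field name is an assumption here.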
- Updated the logic for constructing the REDIS_URL to prioritize Redis Sentinel when REDIS_SENTINEL_HOSTS is set, ensuring proper handling of both REDIS_URL and Sentinel configurations.
- Enhanced the _configure_broker_url function to clarify precedence rules and added guardrails to prevent misconfigurations that could lead to connection errors.
- Improved logging to provide clearer information about the chosen Redis configuration, aiding in troubleshooting and deployment clarity.
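The precedence rule can be sketched as a pure function over the environment. `REDIS_SENTINEL_HOSTS` and `REDIS_URL` come from the commit; the comma-separated `host:port` format, the `REDIS_SENTINEL_MASTER` variable, the `mymaster` and local-dev defaults, and the single guardrail shown are all hypothetical. The `;`-joined `sentinel://` URLs plus a `master_name` transport option follow Celery's documented Redis Sentinel broker configuration.

```python
from typing import Dict, Mapping, Tuple

def choose_broker(env: Mapping[str, str]) -> Tuple[str, Dict[str, str]]:
    """Pick (broker_url, transport_options): Sentinel first, then REDIS_URL, then a default."""
    hosts = [h.strip() for h in env.get("REDIS_SENTINEL_HOSTS", "").split(",") if h.strip()]
    if hosts:
        # Celery's Sentinel support expects ';'-separated sentinel:// URLs
        # and the monitored master's name in the transport options.
        url = ";".join(f"sentinel://{h}" for h in hosts)
        master = env.get("REDIS_SENTINEL_MASTER", "mymaster")  # hypothetical var and default
        return url, {"master_name": master}
    redis_url = env.get("REDIS_URL", "").strip()
    if redis_url:
        # Hypothetical guardrail: a sentinel:// REDIS_URL has no master_name to go with it.
        if redis_url.startswith("sentinel://"):
            raise ValueError("REDIS_URL uses sentinel:// but REDIS_SENTINEL_HOSTS is not set")
        return redis_url, {}
    return "redis://localhost:6379/0", {}  # hypothetical local-dev fallback
```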
```python
def is_public(cc: CodeCollection) -> bool:
    """Predicate version for code paths that already have a loaded row."""
    return (cc.visibility or PUBLIC_VISIBILITY) == PUBLIC_VISIBILITY
```
Unused exports in new visibility module
Low Severity
HIDDEN_VISIBILITY and is_public are defined in visibility.py but never imported or referenced anywhere else in the codebase. They are dead code that adds confusion about how visibility filtering is meant to be used.
Reviewed by Cursor Bugbot for commit 2fd8dfe.
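If the exports are kept rather than deleted, one plausible use is sketched below: `is_public` for filtering rows that are already loaded, and `HIDDEN_VISIBILITY` for validating values read from `codecollections.yaml`. `CC` is a hypothetical stand-in for the SQLAlchemy model, and `normalize_visibility` is a hypothetical helper, not part of the PR.

```python
from dataclasses import dataclass
from typing import Optional

PUBLIC_VISIBILITY = "public"
HIDDEN_VISIBILITY = "hidden"
VALID_VISIBILITIES = {PUBLIC_VISIBILITY, HIDDEN_VISIBILITY}

@dataclass
class CC:  # hypothetical stand-in for the CodeCollection model
    slug: str
    visibility: Optional[str] = None

def is_public(cc: CC) -> bool:
    """Same predicate as the PR: a NULL visibility counts as public."""
    return (cc.visibility or PUBLIC_VISIBILITY) == PUBLIC_VISIBILITY

def normalize_visibility(raw: Optional[str]) -> str:
    """Validate a codecollections.yaml value, defaulting to public."""
    value = (raw or PUBLIC_VISIBILITY).strip().lower()
    if value not in VALID_VISIBILITIES:
        raise ValueError(f"visibility must be one of {sorted(VALID_VISIBILITIES)}, got {raw!r}")
    return value
```

Filtering loaded rows is then `[cc for cc in rows if is_public(cc)]`, while database queries keep using `public_only()`.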
- Improved error handling in `load_schedules_from_yaml` to manage empty or comment-only YAML files, preventing crashes and ensuring a default empty schedule is used.
- Updated the Flower deployment configuration to utilize the backend image, streamlining the broker URL setup and eliminating redundant shell commands for Redis Sentinel configuration.
- Simplified the command for starting Flower, enhancing clarity and maintainability of the deployment script.
```python
if name not in refs_by_name and row.is_active:
    row.is_active = False
    row.updated_at = now
    deactivated += 1
```
Image sync deactivates all non-image CCV rows
Medium Severity
_upsert_versions queries all CodeCollectionVersion rows for a CC and deactivates any whose version_name isn't in the discovered image refs. The CodeCollectionVersion model has relationships to VersionCodebundle and RawRepositoryData, indicating other processes may create CCV rows for non-image purposes (git branch/tag tracking). Additionally, if the OCI registry returns a valid but empty tag list (e.g., during maintenance or before a CC's first build), every existing CCV row would be deactivated, wiping out the catalog state until the next successful sync.
Reviewed by Cursor Bugbot for commit fd4a8a5.
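A sketch of a more conservative deactivation pass addressing both concerns. It assumes (hypothetically) that rows owned by the image sync can be recognized by a non-null `image_tag`, and it treats an empty discovery result as "no information" rather than "everything was deleted". `Row` is a stand-in for `CodeCollectionVersion`; the timezone-aware timestamp replaces `datetime.utcnow()`, which is deprecated as of Python 3.12.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Row:  # hypothetical stand-in for a CodeCollectionVersion row
    version_name: str
    is_active: bool = True
    image_tag: Optional[str] = None  # assumed: set only on rows the image sync owns
    updated_at: Optional[datetime] = None

def deactivate_missing(rows: list[Row], discovered: set[str]) -> int:
    """Deactivate image-backed rows whose ref vanished; return how many changed."""
    if not discovered:
        # Empty tag list (registry maintenance, no first build yet): change nothing.
        return 0
    now = datetime.now(timezone.utc)
    deactivated = 0
    for row in rows:
        if row.image_tag is None:
            continue  # not owned by the image sync (e.g. git-tracked versions)
        if row.version_name not in discovered and row.is_active:
            row.is_active = False
            row.updated_at = now
            deactivated += 1
    return deactivated
```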
- Refactored exception handling in multiple Celery tasks to utilize `logger.exception`, capturing full tracebacks and ensuring tasks are marked as FAILURE in case of errors.
- Removed error message returns in favor of raising exceptions, aligning with Celery's error handling practices.
- Updated YAML data loading tasks to raise exceptions for missing configurations, preventing silent failures and improving task reliability.
- Added a new AdminCCVersions component to the frontend for better management of CodeCollection versions.
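The log-then-raise pattern from the first bullet, sketched without Celery so it runs standalone. In Celery, a task that returns normally is recorded as SUCCESS even if its return value is an error message, whereas a raised exception is recorded as FAILURE. The decorator and `load_yaml_data` below are hypothetical; in the real tasks the same `try/except` would sit inside the task body or beneath the `@app.task` decorator.

```python
import functools
import logging

logger = logging.getLogger(__name__)

def log_and_reraise(func):
    """Log the full traceback, then re-raise so the caller sees a failure."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("task %s failed", func.__name__)
            raise  # bare raise re-raises the original exception unchanged
    return wrapper

@log_and_reraise
def load_yaml_data(config: dict) -> dict:
    # Hypothetical task body: a missing section raises instead of returning
    # an error string that would be recorded as a successful result.
    if "codecollections" not in config:
        raise KeyError("codecollections")
    return {"count": len(config["codecollections"])}
```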
- Removed the legacy task management endpoints and integrated task triggering through the new schedules API, enhancing clarity and maintainability.
- Updated the application to utilize environment-driven configuration for YAML file paths, allowing for directory-mounted ConfigMaps in Kubernetes, which auto-propagate updates without requiring pod restarts.
- Cleaned up the frontend by removing the Task Manager page and adjusting navigation, as task triggering is now handled within the Schedules tab.
- Enhanced documentation to reflect the new task management flow and configuration settings.
- Deleted the `update-statistics-hourly` task from the schedules and YAML configurations, as it was deemed unnecessary.
- Removed references to the task in the backend code, including the Celery task imports and API responses.
- Updated documentation to reflect the removal of the hourly statistics update from the scheduling and architecture sections.
```python
    include_inactive: bool = Query(False),
    db: Session = Depends(get_db),
) -> list[ImageRef]:
    cc = db.query(CodeCollection).filter(CodeCollection.slug == slug).first()
```
Deactivated collections returned by refs endpoints inconsistently
Medium Severity
list_refs and get_ref query CodeCollection without filtering on is_active, while every other catalog endpoint (get_catalog_entry, resolve_image, list_catalog) enforces is_active.is_(True). This means a soft-deleted CC returns a 404 from GET /codecollections/{slug} and GET /resolve, but happily returns refs from GET /refs and GET /refs/{ref}. PAPI could observe refs for a CC it cannot resolve, and the inconsistency makes deactivation of a CC non-atomic from the API consumer's perspective.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 57fa08d.
- Expanded the documentation in `analytics_tasks.py` to clarify the purpose and methodology of the task growth analytics computation.
- Implemented a new function to extract task names from Robot Framework files, improving the accuracy of task attribution based on git history.
- Updated the `compute_task_growth_analytics` function to refine the algorithm for calculating monthly cumulative task counts, ensuring recent additions are accurately reflected in the growth metrics.
- Enhanced logging and error handling to improve the reliability of the analytics computation process.
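A rough sketch of pulling task names out of a `.robot` file, for readers unfamiliar with the format. It handles only the space-separated syntax: names start in column 0 inside the `*** Tasks ***` section, and indented lines beneath them are steps or settings. The function name and the loose header matching are assumptions; the PR's actual extractor is not shown here.

```python
import re

# Loose match for section headers such as "*** Tasks ***" or "*** Settings ***".
_SECTION_RE = re.compile(r"^\*+\s*([A-Za-z ]+?)\s*\**\s*$")
# Cells in the space-separated format are split by 2+ spaces or a tab.
_CELL_SEP = re.compile(r"\s{2,}|\t")

def extract_task_names(robot_source: str) -> list[str]:
    """Return task names declared in the *** Tasks *** section(s) of a .robot file."""
    names: list[str] = []
    in_tasks = False
    for line in robot_source.splitlines():
        header = _SECTION_RE.match(line)
        if header:
            # Accept singular or plural spelling defensively.
            in_tasks = header.group(1).strip().lower() in {"tasks", "task"}
            continue
        if not in_tasks or not line.strip() or line.lstrip().startswith("#"):
            continue  # other sections, blank lines, comments
        if not line[0].isspace():
            # Column-0 lines start a new task; the name is the first cell.
            names.append(_CELL_SEP.split(line.rstrip())[0])
    return names
```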
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 7 total unresolved issues (including 6 from previous reviews).
Reviewed by Cursor Bugbot for commit cd5d9d8.
```yaml
app-id: ${{ secrets.CI_GITHUB_APP_ID }}
private-key: ${{ secrets.CI_GITHUB_APP_PRIVATE_KEY }}
owner: runwhen
repositories: infra-flux-nonprod-shared
```
Prod deploy scoped to nonprod infrastructure repository
Medium Severity
The deploy-to-prod job creates a GitHub App token scoped to infra-flux-nonprod-shared and dispatches update-registry-prod-images.yaml against that same nonprod repo. If production infrastructure lives in a separate infra-flux-prod repo, this token won't have access. The old PAT likely had broader permissions, so switching to the narrowly-scoped App token may have inadvertently locked prod deploys to the nonprod repo.
Reviewed by Cursor Bugbot for commit cd5d9d8.


Summary
Adds the image catalog layer to `cc-registry-v2` so PAPI can resolve a CodeCollection ref (branch / tag / sha) → concrete OCI image, plus a `visibility` field on CodeCollections so we can hide entries from the public registry UI without losing their CCV listings.
Supersedes #113, which had drifted ~3 months from `main` and was carrying 4 commits' worth of intake-wizard work that was independently re-done and merged via #60. Cherry-picked only the catalog commit onto current `main`: clean apply, no conflicts (the intake router and the new `cc_catalog` router add separate `include_router` lines in `main.py`).
What's added
Pluggable image sources (`cc-registry-v2/backend/app/sources/`)
- `base.ImageSource` ABC + `DiscoveredImageRef` dataclass
- `oci.OCISource`: talks OCI Distribution v2 against any registry (GHCR by default); parses the codecollection tag schema `<ref>-<cc_sha7>-<rt_sha7>`; resolves `latest`/`stable` pointers
- `static.StaticSource`: JSON-backed source for tests and pinned catalogs
- `registry.SOURCE_REGISTRY` + `CC_REGISTRY_EXTRA_SOURCES` env var for plugin loading
Background polling
- `app/tasks/image_sync_tasks.py:sync_image_tags_task`: Celery task that loads `codecollections.yaml`, walks each CC's configured source, and upserts `CodeCollectionVersion` rows with the image metadata (registry, tag, digest, commit_hash, rt_revision, built_at).
- `schedules.yaml`: new entry runs the sync every 5 minutes.
Read-only catalog API (`cc-registry-v2/backend/app/routers/cc_catalog.py`)
- `GET /api/v1/catalog/codecollections`
- `GET /api/v1/catalog/codecollections/{slug}`
- `GET /api/v1/catalog/codecollections/{slug}/refs`
- `GET /api/v1/catalog/codecollections/{slug}/refs/{ref}`
- `GET /api/v1/catalog/codecollections/{slug}/resolve`
Intentionally bypasses the `visibility` filter: PAPI is an internal consumer and needs to see hidden collections too.
CodeCollection visibility
- `visibility` column on `codecollections` (`public` default, `hidden` to exclude from public-facing endpoints).
- `image_*` columns on `codecollection_versions` populated by the sync task.
- Migration `004_add_image_metadata_and_visibility.py`.
- `app/core/visibility.py::public_only()` predicate applied uniformly across the public-audience endpoints (`/api/v1/registry/collections`, `/api/v1/registry/tasks`, `/api/v1/codebundles`, recent-codebundles, recent-tasks, stats) and across all `versions.router` collection queries.
- `app/tasks/registry_tasks.py:sync_all_collections_task` now reads `visibility` from `codecollections.yaml` and persists it.
Config schema: `codecollections.yaml` documents the new `visibility`, `image_source`, `image_registry`, `default_ref` fields with examples for the codecollections that now have CCV-build workflows (rw-cli, aws-c7n, azure-c7n, rw-generic, rw-workspace-utils, rw-public).
Test plan
- `alembic upgrade head` applies migration 004 cleanly on a fresh DB
- `alembic downgrade -1` reverts it cleanly
- `sync_image_tags_task` on the 5-minute beat schedule
- `codecollection_versions` rows for the 6 repos with CCV builds (see related PRs: aws-c7n#33, azure-c7n farm#TBD, rw-cli #pending, rw-generic, rw-workspace-utils, rw-public; all built today)
- `GET /api/v1/catalog/codecollections/rw-cli-codecollection/refs` returns the `main` and `pr-*` aliases plus the canonical immutable tags
- `GET /api/v1/catalog/codecollections/rw-cli-codecollection/resolve?ref=main` returns the digest of the latest main build
- `visibility: hidden` on a CC in `codecollections.yaml` hides it from `/api/v1/registry/collections` but it still shows in `/api/v1/catalog/codecollections`
Made with Cursor
Note
Medium Risk
Adds new DB columns/migration plus a scheduled Celery sync and new PAPI-facing API endpoints; mistakes could break catalog resolution or inadvertently hide/show collections in public endpoints.
Overview
Adds a PAPI-facing image catalog that mirrors OCI image tags into `codecollection_versions` and exposes read-only `/api/v1/catalog/...` endpoints (including `resolve`) to translate CodeCollection refs/pointers to concrete image tags.
Introduces CodeCollection `visibility` (`public`/`hidden`) with a shared `public_only()` filter and applies it across public registry/versions/stats/task listing endpoints so hidden collections are treated as 404/omitted while remaining available to the catalog.
Implements a pluggable image source system (`oci` + `static`) and a new scheduled Celery task `sync_image_tags_task` (every 5 minutes) to discover refs and upsert/deactivate version rows, plus updates config handling (env-driven YAML paths), moves Swagger/OpenAPI under `/api/*`, improves Redis Sentinel precedence/guardrails, refines task-growth analytics attribution, and removes the legacy tasks router/UI in favor of admin tabs.
CI deploy workflows now use a GitHub App token instead of a PAT to trigger downstream image update workflows, and docs are updated to reflect the catalog and new docs URL paths.
Reviewed by Cursor Bugbot for commit cd5d9d8. Bugbot is set up for automated code reviews on this repo.