Skip to content

Cloud Deploy#661

Open
epompeii wants to merge 7 commits intocloudfrom
devel
Open

Cloud Deploy#661
epompeii wants to merge 7 commits intocloudfrom
devel

Conversation

@epompeii
Copy link
Member

@epompeii epompeii commented Feb 6, 2026

Deploy to Bencher Cloud

This changeset adds an OCI compliant registry to the Bencher API server.
Bencher Cloud will direct folks to use `registry.bencher.dev` to push images up.
Images will be pulled by the forthcoming Bencher Bare Metal Runner.
- Rename `oci` to `registry`.
- Fix S3 upload chunks race condition.
- Move `busy_timeout` from `plus.disaster_recovery` to `database`
section of the config.
- Auto-scale dev deployment for OCI conformance tests in CI
- OCI tag pagination
- Refactor OCI authentication logic
@epompeii epompeii self-assigned this Feb 6, 2026
@epompeii epompeii added the claude label Feb 6, 2026
@github-actions
Copy link

github-actions bot commented Feb 6, 2026

🤖 Claude Code Review

PR: #661
Base: cloud
Head: devel
Commit: 353f743a19bd782224f58278b07f714921baa729


I now have a comprehensive view of this PR. Let me compile the review.

Pull Request Review: Add OCI Distribution Spec Compliant Container Registry

Summary

This is a large feature PR (~11,000 lines added) that implements an OCI Distribution Spec compliant container registry as a Bencher Plus feature. The registry allows storing and distributing container images mapped to Bencher projects.


Overall Assessment

The implementation is well-structured with clear separation of concerns. The code follows project conventions and includes comprehensive tests. However, there are some areas that warrant attention.


Code Quality and Best Practices

Strengths

  1. Well-documented modules - Each module has clear documentation explaining its purpose (plus/api_oci/src/*.rs, plus/bencher_oci_storage/src/*.rs)

  2. Proper error handling - Uses thiserror for structured errors with OCI spec-compliant error codes (plus/bencher_oci_storage/src/error.rs:17-53)

  3. Comprehensive test coverage - Integration tests for all major endpoints (base, blobs, manifests, referrers, tags, uploads)

  4. Follows CLAUDE.md guidelines:

    • Uses #[expect(...)] instead of #[allow(...)] with reasons
    • Dependencies in workspace Cargo.toml
    • Feature flags properly structured
  5. Good separation of storage backends - Clean abstraction between S3 and local filesystem storage (plus/bencher_oci_storage/src/storage.rs)

Concerns

  1. Rate limiting change (lib/bencher_schema/src/context/rate_limiting/mod.rs:174-176):

    // Changed from Self::default() to Self::max()
    Self::max()

    This silently changes rate limiting behavior for self-hosted deployments to unlimited. The old todo comment indicated this was intentional for testing, but the change may have unintended consequences.

  2. SQLite PRAGMAs now set unconditionally (lib/bencher_config/src/config_tx.rs:187-197):

    • Previously, WAL mode and busy_timeout were only set when Litestream was configured
    • Now applied to all deployments, which is generally good but changes existing behavior
    • The busy_timeout configuration option was moved from litestream to database section

Security Concerns

Strengths

  1. Proper authentication flow - Uses separate OCI audience JWT tokens with scoped permissions (lib/bencher_token/src/audience.rs:15)

  2. RBAC integration - Pull requires admin privileges, push requires Create permission on claimed projects (lib/api_auth/src/oci/mod.rs:100-160)

  3. Rate limiting applied to OCI operations (plus/api_oci/src/auth.rs:405-435)

Concerns

  1. Upload session authentication (plus/api_oci/src/uploads.rs:44-49):

    "These endpoints do NOT require Bearer token authentication. The session ID itself serves as authentication"

    While UUID session IDs are unguessable, this pattern relies on UUID entropy. This is documented as intentional for OCI spec conformance, but worth noting.

  2. CORS wildcard - All OCI endpoints use Access-Control-Allow-Origin: * which is standard for registries but worth mentioning.

  3. Token validation error suppression (lib/api_auth/src/oci/mod.rs:65-66):

    #[expect(clippy::map_err_ignore, reason = "Intentionally discarding API key validation error for security")]

    Good - avoids leaking information about why authentication failed.


Performance Implications

Strengths

  1. Streaming blob downloads (plus/api_oci/src/blobs.rs:96-101):

    "Streams the blob content rather than loading it entirely into memory"

  2. Chunked upload support for large blobs (plus/api_oci/src/uploads.rs:30-40)

  3. Background stale upload cleanup (plus/bencher_oci_storage/src/local.rs:708-716)

  4. S3 multipart uploads with buffered chunks (plus/bencher_oci_storage/src/storage.rs:342-377)

Concerns

  1. Monolithic upload memory limitation (plus/api_oci/src/blobs.rs:154-168):

    "Dropshot's UntypedBody buffers the entire request into memory"

    This is well-documented and clients are directed to use chunked uploads for large blobs.

  2. Local storage tag listing (plus/bencher_oci_storage/src/local.rs:981-984):

    "For local storage, we must read all directory entries first, then apply sorting and pagination."

    Less efficient than S3 for large repositories, but acceptable for development use.


Potential Bugs

  1. OpenAPI spec regeneration - The services/api/openapi.json shows 1179+ lines changed. Per CLAUDE.md, this should be committed after running cargo gen-types. Verify this was done.

  2. Feature flag dependencies - Multiple crates have complex feature flag configurations (e.g., lib/api_auth/Cargo.toml:12-21). Ensure cargo check --no-default-features passes as required by CLAUDE.md.

  3. Database busy_timeout migration - The busy_timeout config option moved from plus.litestream.busy_timeout to database.busy_timeout. Existing configs using the old location will silently use the default (5 seconds). Consider logging a deprecation warning.


Compliance with Project Standards

  1. Dockerfiles updated ✅ - services/api/Dockerfile includes new crates

  2. Type generation - Verify cargo gen-types was run after API changes

  3. CI workflows updated ✅ - Added Go setup for OCI conformance tests, updated test workflow

  4. Documentation ✅ - Added registry configuration docs in multiple languages


Recommendations

  1. Consider adding deprecation warning for old busy_timeout location in litestream config

  2. Add integration test for the rate limiting change to verify self-hosted behavior

  3. Document the security model for OCI registry in user-facing docs (which projects can push to which registries, claimed vs unclaimed)

  4. Verify that cargo check --no-default-features and cargo gen-types pass


Files of Note

File Significance
plus/api_oci/src/auth.rs Core authentication logic with claimed/unclaimed project handling
plus/bencher_oci_storage/src/storage.rs S3 storage implementation (~2000 lines)
lib/bencher_token/src/claims.rs New OCI-specific JWT claims
lib/bencher_config/src/config_tx.rs Database configuration changes

Model: claude-opus-4-5-20251101

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant