fix: route dataset manifest and file preview through service for private buckets #795

Closed
KeitaW wants to merge 3 commits into NVIDIA:main from KeitaW:fix/private-bucket-manifest-793

Conversation

KeitaW (Contributor) commented Apr 3, 2026

Summary

Fixes #793 — dataset file browser and preview broken on private S3 buckets.

  • Added GET /{bucket}/dataset/{name}/manifest service endpoint that reads manifest JSON from storage using bucket_config.default_credential (supports S3, GCS, Azure, Swift, TOS)
  • Added GET /{bucket}/dataset/{name}/file-content service endpoint that streams individual file content with storage path validation
  • Updated UI to fetch manifests via server action calling the backend directly (bypasses API gateway auth, works during SSR)
  • Updated file preview proxy to route through the service endpoint when storagePath is available

Test plan

  • Verify file browser loads on a dataset with a private S3 bucket
  • Verify file preview (image, video, text) works on private bucket files
  • Verify file browser still works on public bucket datasets (backward compat)
  • UI checks: pnpm type-check && pnpm lint && pnpm test --run — all passing (751/751)
  • Mock mode: pnpm dev:mock → navigate to dataset detail → file browser loads

Summary by CodeRabbit

Release Notes

  • New Features

    • Added API endpoints for fetching dataset manifests and streaming file content.
    • Enhanced file preview panel with improved dataset access handling and validation.
  • Improvements

    • Streamlined dataset identification using bucket, name, and version for more consistent access patterns across the platform.

KeitaW added 3 commits April 3, 2026 00:58
…t private buckets (NVIDIA#793)

The UI's fetchManifest and file preview proxy performed unsigned fetch()
against S3 HTTPS URLs, which fails with 403 on private buckets.

Added two service-side proxy endpoints:
- GET /{bucket}/dataset/{name}/manifest — reads manifest JSON from storage
  using bucket credentials (supports S3, GCS, Azure, Swift, TOS)
- GET /{bucket}/dataset/{name}/file-content — streams individual file
  content with storage_path validation against dataset container

Updated UI to call these endpoints through the existing /api catch-all
proxy instead of direct unsigned fetch. Removed unused fetchManifest
server action files.

The file preview panel sends a HEAD request before GET to check
content-type and access. FastAPI's @router.get does not handle HEAD,
returning 405 Method Not Allowed. Changed to @router.api_route with
both GET and HEAD methods.

fetchDatasetFiles used a relative fetch('/api/bucket/...') which fails
during SSR because the request goes through the proxy without browser
auth cookies, resulting in 403 from the API gateway.

Re-introduced the server action pattern: fetchManifest now calls the
backend service directly using getServerApiBaseUrl() (internal URL),
bypassing the auth gateway. Works for both SSR and client hydration.
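
The HEAD-preflight failure described in the second commit can be reproduced with a stdlib-only analogue (a minimal sketch; the actual fix uses FastAPI's @router.api_route(methods=['GET', 'HEAD']), and the handler below is hypothetical):

```python
# Stdlib analogue of the preview panel's HEAD preflight: a handler that only
# implements do_GET answers HEAD with 501 (FastAPI's GET-only route answers
# 405); registering a HEAD handler alongside GET fixes the preflight.
import http.client
import http.server
import threading

class FileContentHandler(http.server.BaseHTTPRequestHandler):
    def _respond(self, include_body: bool) -> None:
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self):
        self._respond(include_body=True)

    def do_HEAD(self):  # without this, the HEAD preflight fails
        self._respond(include_body=False)

    def log_message(self, *args):  # silence request logging for the demo
        pass

server = http.server.HTTPServer(('127.0.0.1', 0), FileContentHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', server.server_port)
conn.request('HEAD', '/file-content')
status = conn.getresponse().status
server.shutdown()
```
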
@KeitaW KeitaW requested a review from a team as a code owner April 3, 2026 00:59
coderabbitai Bot commented Apr 3, 2026

📝 Walkthrough

The PR adds backend API endpoints for fetching dataset manifests and file content, shifting manifest and file access from direct unsigned frontend fetches against storage to authenticated service-proxied requests that resolve private bucket access failures.

Changes

  • Backend API Endpoints (src/service/core/data/data_service.py): Added GET /{bucket}/dataset/{name}/manifest endpoint returning manifest JSON from a validated READY version, and GET/HEAD /{bucket}/dataset/{name}/file-content endpoint streaming file bytes with storage path container validation.
  • Frontend Proxy Route (src/ui/src/app/proxy/dataset/file/route.impl.production.ts): Changed request parameter parsing from url-only to bucket/name/storagePath; replaced the direct storage fetch with a fetchUpstream service call; added response header forwarding via forwardResponseHeaders.
  • Frontend Components & Hooks (src/ui/src/features/datasets/detail/components/dataset-detail-content.tsx, src/ui/src/features/datasets/detail/components/file-preview-panel.tsx, src/ui/src/lib/api/adapter/datasets-hooks.ts, src/ui/src/lib/api/adapter/datasets.ts): Resolved and passed dataset identity (bucket, name, version) through the component hierarchy and hook calls; refactored manifest fetching to accept these parameters instead of deriving them from location alone; updated proxy URL construction to use bucket/name/storagePath.
  • Production Manifest Fetcher (src/ui/src/lib/api/server/dataset-actions.production.ts): Updated the fetchManifest signature from (url) to (bucket, name, version); now constructs and calls the backend API endpoint using getServerApiBaseUrl instead of performing an unsigned fetch.
  • Configuration & Cleanup (src/ui/next.config.ts, src/ui/src/lib/api/server/dataset-actions.ts): Removed the production alias redirect for dataset-actions from the Turbopack config; deleted the legacy dataset-actions wrapper module (formerly a conditional mock/production selector).
  • Mock Handlers (src/ui/src/mocks/handlers.ts): Added an MSW handler for /api/bucket/:bucket/dataset/:name/manifest returning a generated flat manifest; updated the /proxy/dataset/file handler to parse both the new (storagePath, name) and legacy (url) parameter schemes.

Sequence Diagram

```mermaid
sequenceDiagram
    actor Browser as Browser/UI
    participant Proxy as Frontend<br/>Proxy Route
    participant Service as Backend<br/>Service
    participant Storage as Object<br/>Storage (S3)

    Note over Browser,Storage: New Flow: Service-Proxied Authenticated Access

    Browser->>Proxy: GET /proxy/dataset/file?bucket=X&name=Y&storagePath=...
    Proxy->>Service: GET /api/bucket/X/dataset/Y/file-content?storage_path=...
    Note over Service: Validate storage_path container<br/>matches dataset hash_location
    Service->>Storage: fetch(signed_url_or_creds)
    Storage-->>Service: File bytes
    Service-->>Proxy: StreamingResponse<br/>(inferred mime-type)
    Proxy-->>Browser: Forwarded response<br/>with headers

    rect rgba(100, 200, 100, 0.5)
        Note over Browser,Storage: Manifest Flow
        Browser->>Proxy: Manifest request
        Proxy->>Service: GET /api/bucket/X/dataset/Y/manifest?version=V
        Service->>Storage: Load manifest from version location
        Storage-->>Service: JSON manifest
        Service-->>Browser: Parsed manifest
    end
```

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 From unsigned fetches fraught with 403s,
We hop through service gates with verified keys.
Manifests bloom, files stream with grace,
Private buckets now find their safe place. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 45.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The pull request title accurately describes the main change: routing dataset manifest and file preview through the service to support private buckets.
  • Linked Issues check: ✅ Passed. The PR fully addresses issue #793 by implementing service endpoints for manifest and file-content retrieval that bypass unsigned fetches, and by updating the UI to route through these authenticated endpoints.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to fixing issue #793: adding backend service endpoints, updating UI routing, and removing obsolete client-side code. No unrelated modifications detected.



coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/service/core/data/data_service.py`:
- Around line 1032-1036: The current check only compares
requested_backend.container to dataset_backend.container (constructed via
storage.construct_storage_backend), which allows any path in the same container;
instead validate that the supplied storage_path is within the dataset's storage
prefix by ensuring storage_path (or the backend's full path) starts with
dataset_info.hash_location (or its normalized prefix) before proceeding; if it
does not, raise osmo_errors.OSMOUserError with a clear message. Normalize both
paths (resolve trailing slashes, case/URL encoding as appropriate) prior to the
startswith check and keep the container equality check as a fast precondition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e0ab25ae-7964-4301-9a03-e53a0d59233a

📥 Commits

Reviewing files that changed from the base of the PR and between 762ea59 and 2b0c696.

📒 Files selected for processing (10)
  • src/service/core/data/data_service.py
  • src/ui/next.config.ts
  • src/ui/src/app/proxy/dataset/file/route.impl.production.ts
  • src/ui/src/features/datasets/detail/components/dataset-detail-content.tsx
  • src/ui/src/features/datasets/detail/components/file-preview-panel.tsx
  • src/ui/src/lib/api/adapter/datasets-hooks.ts
  • src/ui/src/lib/api/adapter/datasets.ts
  • src/ui/src/lib/api/server/dataset-actions.production.ts
  • src/ui/src/lib/api/server/dataset-actions.ts
  • src/ui/src/mocks/handlers.ts
💤 Files with no reviewable changes (2)
  • src/ui/next.config.ts
  • src/ui/src/lib/api/server/dataset-actions.ts

Comment on lines +1032 to +1036:

```python
requested_backend = storage.construct_storage_backend(storage_path)
dataset_backend = storage.construct_storage_backend(dataset_info.hash_location)
if requested_backend.container != dataset_backend.container:
    raise osmo_errors.OSMOUserError(
        'Storage path does not belong to this dataset.')
```
⚠️ Potential issue | 🟡 Minor

Container-only validation may be overly permissive.

The validation only checks that requested_backend.container matches dataset_backend.container. This allows access to any file within the same container, not just files belonging to this specific dataset's storage prefix. Consider validating that storage_path starts with dataset_info.hash_location to restrict access to only the dataset's files.

🛡️ Proposed stricter path validation

```diff
     # Validate that the storage path belongs to this dataset's storage prefix
     requested_backend = storage.construct_storage_backend(storage_path)
     dataset_backend = storage.construct_storage_backend(dataset_info.hash_location)
-    if requested_backend.container != dataset_backend.container:
+    if requested_backend.container != dataset_backend.container or \
+       not storage_path.startswith(dataset_info.hash_location.rstrip('/')):
         raise osmo_errors.OSMOUserError(
             'Storage path does not belong to this dataset.')
```
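
A stdlib-only sketch of the stricter check this comment asks for (helper names are hypothetical, and ValueError stands in for osmo_errors.OSMOUserError): normalize both paths, keep the container equality check as a fast precondition, then require the requested path to sit under the dataset's hash_location prefix.

```python
# Hypothetical sketch of prefix-based storage path validation.
from urllib.parse import unquote, urlparse

def _split(uri):
    """Return (container, normalized key path) for an object-store URI."""
    parsed = urlparse(uri)
    return parsed.netloc, unquote(parsed.path).rstrip('/')

def validate_storage_path(storage_path, hash_location):
    req_container, req_path = _split(storage_path)
    ds_container, ds_prefix = _split(hash_location)
    if req_container != ds_container:  # fast precondition
        raise ValueError('Storage path does not belong to this dataset.')
    # Require a path-segment boundary so 'abc-evil' cannot match prefix 'abc'.
    if not (req_path == ds_prefix or req_path.startswith(ds_prefix + '/')):
        raise ValueError('Storage path does not belong to this dataset.')
```

With this shape, a sibling path in the same bucket ('s3://bkt/datasets/abc-evil/x' against prefix 's3://bkt/datasets/abc') is rejected, while paths under the dataset prefix pass.
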

cypres added a commit that referenced this pull request May 5, 2026
…oints

#795 was authored against an older Client.get_object_stream() shape that
accepted no positional args. On main, the public overloads now require a
remote_path argument. Single-object fetches should go through
storage.SingleObjectClient (already used by cli/dataset.py and
lib/data/dataset/downloading.py) — its get_object_stream() takes no path
because the path is bound at create() time from storage_uri.

mypy on data_service.py now passes; behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cypres added a commit that referenced this pull request May 5, 2026
The conflict resolution when cherry-picking #795 left a multi-line
generateFlatManifest call that prettier wants on one line under the
project's print-width settings. Pure formatting; no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
cypres added a commit that referenced this pull request May 5, 2026
Four review-driven fixes that all touched the dataset preview path:

1. parse_uri_to_link preserves override_url base path (CAIOS via reverse
   proxy). Previously dropped parsed.path; an override of
   'https://gateway.example.com/s3' produced URLs against
   gateway.example.com with no /s3 segment. Now host + base_path are
   preserved through both addressing styles, including the
   no-scheme-prefix case where urlparse drops the whole input into 'path'.

2. get_file_content validates against dataset hash_location prefix, not
   just the bucket container. This was the unresolved CodeRabbit feedback
   on #795 — container-only matching let an authenticated caller request
   any object in the same bucket via the dataset endpoint.

3. Forward Range request and accept-ranges/content-range response headers
   through the proxy so 206 Partial Content semantics survive (needed for
   video/large-file preview).

4. Remove the legacy 'url=' fallback from the proxy. It accepted any
   http(s) URL — a clear SSRF vector (cloud metadata endpoints, RFC1918,
   loopback). Manifests since #795 always carry storage_path; the legacy
   path was effectively dead. UI's file-preview-panel updated to drop the
   matching client-side fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
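
Item 3's header forwarding amounts to an allow-list filter; a minimal sketch (names hypothetical; the actual proxy is TypeScript, sketched here in Python):

```python
# Forward only range-related and content headers through the proxy so that
# 206 Partial Content responses survive; everything else is dropped.
FORWARD_REQUEST_HEADERS = ('range',)
FORWARD_RESPONSE_HEADERS = ('content-type', 'content-length',
                            'accept-ranges', 'content-range')

def forward_headers(headers, allowed):
    """Keep only headers whose (case-insensitive) name is in the allow-list."""
    return {k: v for k, v in headers.items() if k.lower() in allowed}
```
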
cypres (Member) commented May 6, 2026

Thank you for the contribution, it's cherry-picked into #957

@cypres cypres closed this May 6, 2026
KeitaW (Contributor, Author) commented May 6, 2026

Thanks for working on it @cypres !

cypres added a commit that referenced this pull request May 6, 2026
* fix(storage): build dataset URLs against override_url for S3-compatible backends

S3Backend.parse_uri_to_link hardcoded the AWS pattern
'https://<bucket>.s3.<region>.amazonaws.com/...' regardless of credential
override_url. The UI consumes these URLs to fetch dataset content; for a
CAIOS-backed bucket this produced
'osmo-on-cw-dev-harnholm-datasets.s3.us-east-14a.amazonaws.com', which
ENOTFOUNDs in DNS.

Thread the credential's override_url and addressing_style through
parse_uri_to_link. When override_url is set, S3Backend builds a URL
against that host (virtual-host by default; addressing_style='path'
yields path-style for localstack/MinIO without wildcard DNS). AWS S3
behavior (no override_url) is unchanged. Other backends accept the
new kwargs but ignore them.

Updated callsites:
  - data_service.py: dataset listing payload now respects bucket
    config's default_credential.override_url. This is the immediate
    UI fix.
  - dataset/uploading.py (×2) + migrating.py: new uploads/migrations
    record the correct URL in the manifest. Existing manifest entries
    persisted with AWS-pattern URLs are not migrated (separate concern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
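
The override_url handling described above can be sketched roughly as follows (a simplified stand-in for parse_uri_to_link; the function name and details are assumptions, not the project's actual implementation):

```python
# Build an object URL against an override host, preserving any base path
# (e.g. 'https://gateway.example.com/s3') through both addressing styles.
from urllib.parse import urlparse

def build_object_url(override_url, bucket, key, addressing_style='virtual'):
    # urlparse puts a scheme-less input entirely into .path; add a scheme first.
    if '//' not in override_url:
        override_url = 'https://' + override_url
    parsed = urlparse(override_url)
    base_path = parsed.path.rstrip('/')  # the segment the old code dropped
    if addressing_style == 'path':  # for localstack/MinIO without wildcard DNS
        return f'{parsed.scheme}://{parsed.netloc}{base_path}/{bucket}/{key}'
    return f'{parsed.scheme}://{bucket}.{parsed.netloc}{base_path}/{key}'
```

For example, an override of 'https://gateway.example.com/s3' yields 'https://bkt.gateway.example.com/s3/a/b' in virtual-host style rather than a URL missing the /s3 segment.
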

* fix: route dataset file browser and preview through service to support private buckets (#793)

The UI's fetchManifest and file preview proxy performed unsigned fetch()
against S3 HTTPS URLs, which fails with 403 on private buckets.

Added two service-side proxy endpoints:
- GET /{bucket}/dataset/{name}/manifest — reads manifest JSON from storage
  using bucket credentials (supports S3, GCS, Azure, Swift, TOS)
- GET /{bucket}/dataset/{name}/file-content — streams individual file
  content with storage_path validation against dataset container

Updated UI to call these endpoints through the existing /api catch-all
proxy instead of direct unsigned fetch. Removed unused fetchManifest
server action files.

* fix: support HEAD method on file-content endpoint for preview preflight

The file preview panel sends a HEAD request before GET to check
content-type and access. FastAPI's @router.get does not handle HEAD,
returning 405 Method Not Allowed. Changed to @router.api_route with
both GET and HEAD methods.

* fix: use server action for manifest fetch to avoid SSR auth failure

fetchDatasetFiles used a relative fetch('/api/bucket/...') which fails
during SSR because the request goes through the proxy without browser
auth cookies, resulting in 403 from the API gateway.

Re-introduced the server action pattern: fetchManifest now calls the
backend service directly using getServerApiBaseUrl() (internal URL),
bypassing the auth gateway. Works for both SSR and client hydration.

* fix(service): use SingleObjectClient for manifest + file-content endpoints

#795 was authored against an older Client.get_object_stream() shape that
accepted no positional args. On main, the public overloads now require a
remote_path argument. Single-object fetches should go through
storage.SingleObjectClient (already used by cli/dataset.py and
lib/data/dataset/downloading.py) — its get_object_stream() takes no path
because the path is bound at create() time from storage_uri.

mypy on data_service.py now passes; behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>

* fix(ui): forward auth headers in SSR manifest fetch

The server-action call to /api/bucket/{bucket}/dataset/{name}/manifest
went out without the incoming request's session cookie, so oauth2-proxy
returned 401 with "No valid authentication in request" and the dataset
detail page crashed with "Failed to fetch manifest: 401".

getServerFetchHeaders() already exists in config.production.ts for
exactly this purpose — it pulls authorization + cookie from the SSR
request via next/headers and forwards them upstream. Use it.

The original docstring claimed the server action would "bypass the API
gateway auth layer" by hitting an internal service URL, but
getServerApiBaseUrl() returns the public hostname (Envoy + oauth2-proxy
front), so requests still pass through the gateway. Forwarding the
caller's session is the simpler fix and doesn't require a new internal
URL knob in deployment config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>

* fix(ui): forward filename to file-content endpoint for Content-Type guessing

storage_path is hash-keyed (e.g. .../hashes/<etag>) and carries no
extension. mimetypes.guess_type(storage_path) returned None, the service
fell back to application/octet-stream, and the preview panel rendered
every text file as a binary download.

Forward the original relative_path through the proxy as a 'filename'
query param. The service uses it for media-type guessing only; access
control still hinges on storage_path's container check. Falls through
cleanly when filename is absent (preserves current behavior for any
legacy caller).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
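
The filename-based guessing described above amounts to a small fallback chain; a sketch with a hypothetical helper name:

```python
# Guess a media type from the original filename when present, falling back to
# the (extension-less, hash-keyed) storage path, then to octet-stream.
import mimetypes

def guess_media_type(storage_path, filename=None):
    for candidate in (filename, storage_path):
        if candidate:
            guessed, _ = mimetypes.guess_type(candidate)
            if guessed:
                return guessed
    return 'application/octet-stream'
```
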

* chore(ui): prettier-format manifest mock handler

The conflict resolution when cherry-picking #795 left a multi-line
generateFlatManifest call that prettier wants on one line under the
project's print-width settings. Pure formatting; no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>

* fix: dataset proxy hardening and override URL prefix support

Four review-driven fixes that all touched the dataset preview path:

1. parse_uri_to_link preserves override_url base path (CAIOS via reverse
   proxy). Previously dropped parsed.path; an override of
   'https://gateway.example.com/s3' produced URLs against
   gateway.example.com with no /s3 segment. Now host + base_path are
   preserved through both addressing styles, including the
   no-scheme-prefix case where urlparse drops the whole input into 'path'.

2. get_file_content validates against dataset hash_location prefix, not
   just the bucket container. This was the unresolved CodeRabbit feedback
   on #795 — container-only matching let an authenticated caller request
   any object in the same bucket via the dataset endpoint.

3. Forward Range request and accept-ranges/content-range response headers
   through the proxy so 206 Partial Content semantics survive (needed for
   video/large-file preview).

4. Remove the legacy 'url=' fallback from the proxy. It accepted any
   http(s) URL — a clear SSRF vector (cloud metadata endpoints, RFC1918,
   loopback). Manifests since #795 always carry storage_path; the legacy
   path was effectively dead. UI's file-preview-panel updated to drop the
   matching client-side fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>

* fix(ui): cache hardening on auth-gated dataset proxy + HEAD query

Two review fixes for the auth-sensitive dataset preview path:

1. Stop forwarding upstream Cache-Control on /proxy/dataset/file. Storage
   providers commonly set 'public, max-age=…' on object responses; passing
   that through on a per-user authenticated route lets intermediate caches
   serve one user's bytes to another. Drop 'cache-control' from
   FORWARDED_RESPONSE_HEADERS and explicitly set 'Cache-Control:
   private, no-store' on every response. ('vary' was never forwarded — left
   as is.)

2. Drop staleTime: Infinity from the file-preview HEAD useQuery. Caching a
   401/403 forever would prevent revalidation after the user's session
   refreshes. staleTime: 0 restores standard React Query revalidation
   semantics (refetch on focus/mount when stale).

The text-preview useQuery one block up still has staleTime: Infinity; left
alone here since the reviewer scoped the finding to the HEAD query, but
worth a follow-up if we see stuck 401s on the text preview specifically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>

* fix(service): thread override_url/addressing_style through query_dataset link

query_dataset's metadata-enabled branch was the third parse_uri_to_link
callsite I missed when threading override_url + addressing_style for
private S3-compatible buckets — get_collection_info and get_dataset_info
already do this. Without it, /api/bucket/{bucket}/query results
on a CAIOS-backed bucket return AWS-pattern hostnames that ENOTFOUND on
the UI side, same failure mode the original parse_uri_to_link bug had.

Mirrors the default_credential fallback pattern (override_url and
addressing_style live on bucket_config.default_credential, not directly
on bucket_config).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Hans Arnholm <harnholm@nvidia.com>

---------

Signed-off-by: Hans Arnholm <harnholm@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Keita Watanabe <keitaw09@gmail.com>

Development

Successfully merging this pull request may close these issues.

Dataset file browser and preview broken on private storage buckets — unsigned fetch

2 participants