
bug/feat: osmo dataset upload fails on S3-compatible backends that reject path-style GetBucketLocation (CAIOS, R2-strict, Wasabi-strict) #940

@narenandu


Describe the bug.

The OSMO CLI (6.2.10) issues a GetBucketLocation request as a precheck on every data-plane operation (osmo dataset upload, osmo data upload, osmo data check, etc.). The bundled boto3 client uses addressing_style: path, so the precheck is sent path-style:

GET https://<endpoint>/<bucket>/?location

S3-compatible object stores that only honor GetBucketLocation in virtual-hosted-style reject this with:

ClientError: An error occurred (PathStyleRequestNotAllowed) when calling
the GetBucketLocation operation: The path style requests are not allowed
for this method, please switch to hostname-based requests.
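
For readers less familiar with the two addressing styles, the difference can be sketched with the standard library alone. The helper names below are hypothetical and are not part of OSMO or boto3; they only show where the bucket name ends up in each request form.

```python
from urllib.parse import urlsplit, urlunsplit


def path_style_url(endpoint: str, bucket: str, query: str = "location") -> str:
    """Path-style: the bucket is the first path segment."""
    parts = urlsplit(endpoint)
    return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/", query, ""))


def virtual_hosted_url(endpoint: str, bucket: str, query: str = "location") -> str:
    """Virtual-hosted-style: the bucket is a subdomain of the endpoint host."""
    parts = urlsplit(endpoint)
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", "/", query, ""))


if __name__ == "__main__":
    # The form the precheck currently sends, which these stores reject:
    print(path_style_url("https://cwobject.com", "my-caios-bucket"))
    # The form they accept for GetBucketLocation:
    print(virtual_hosted_url("https://cwobject.com", "my-caios-bucket"))
```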

Because OSMO gates the rest of the upload pipeline on this precheck, the CLI cannot upload, download, or even check a bucket on these stores. As a result, the OSMO UI Datasets view stays permanently empty for these deployments — there is no path to register a dataset.

This affects any S3-compatible backend with this constraint. The case observed in the wild was CoreWeave Object Storage (CAIOS) / cwobject.com on a CoreWeave CKS deployment.

Affected versions: 6.2.10 (CLI + service charts 1.2.1). Earlier versions are likely affected too.


Reproduction

Setup: an OSMO 6.2.10 deployment with a CAIOS-backed bucket configured. The bucket itself is reachable from outside OSMO using path-style for ListObjectsV2 / PutObject / GetObject — only GetBucketLocation is virtual-host-only.

# Cluster-side: register the dataset bucket so the CLI sees it
curl -sf -X PUT -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  $OSMO_API/api/configs/dataset -d '{
    "configs": {
      "default_bucket": "default",
      "buckets": {
        "default": {
          "dataset_path": "s3://my-caios-bucket/datasets",
          "region": "us-east-1",
          "mode": "read-write",
          "default_credential": {
            "endpoint": "s3://my-caios-bucket/datasets",
            "region": "us-east-1",
            "override_url": "https://cwobject.com",
            "access_key_id": "...",
            "access_key": "..."
          }
        }
      }
    }
  }'
# HTTP 200, bucket registered

# Client-side: bucket is visible
$ osmo bucket list
Bucket              Description   Location                       Mode         Default Cred
default (default)   ...           s3://my-caios-bucket/datasets  read-write   Yes

$ osmo profile set bucket default
Profile Set

# Now any data-plane op fails identically:

$ osmo dataset upload smoke:v1 ./small-dir --desc "smoke"
Unknown error: An error occurred (PathStyleRequestNotAllowed) when calling
  the GetBucketLocation operation: The path style requests are not allowed
  for this method, please switch to hostname-based requests.
Error code: 1

$ osmo data upload s3://my-caios-bucket/datasets/foo ./small-dir
Unknown error: An error occurred (PathStyleRequestNotAllowed) ...

$ osmo data check s3://my-caios-bucket/datasets
Unknown error: An error occurred (PathStyleRequestNotAllowed) ...

The standard env-var workaround for boto3 has no effect, apparently because the client is built with an explicit boto3.client('s3', config=Config(s3={'addressing_style': 'path'})), which takes precedence over the environment:

$ AWS_S3_ADDRESSING_STYLE=virtual osmo dataset upload smoke:v1 ./small-dir
Unknown error: An error occurred (PathStyleRequestNotAllowed) ...

Setting the dataset's default_credential.override_url to a virtual-hosted-style URL makes the SDK double-prepend the bucket and the URL no longer resolves:

# override_url = https://my-caios-bucket.cwobject.com
$ aws --endpoint-url https://my-caios-bucket.cwobject.com s3 ls s3://my-caios-bucket/
Could not connect to the endpoint URL:
"https://my-caios-bucket.my-caios-bucket.cwobject.com/?list-type=2&..."

Direct aws s3 cp (without OSMO) works fine against https://cwobject.com in path-style for PutObject / ListObjects / GetObject. So the bucket and credentials are correct; the CLI's GetBucketLocation precheck is the only step that fails.
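
The doubled hostname in the override_url experiment above follows from how virtual-hosted addressing composes the host: the bucket is prepended to whatever endpoint host is configured. A minimal sketch, with hypothetical helper names, of that composition and of a check that would catch an override_url that already embeds the bucket:

```python
from urllib.parse import urlsplit


def virtual_host(override_url: str, bucket: str) -> str:
    """Host a virtual-hosted-style request would target for this endpoint."""
    host = urlsplit(override_url).hostname or ""
    return f"{bucket}.{host}"


def endpoint_embeds_bucket(override_url: str, bucket: str) -> bool:
    """True when the endpoint host already starts with '<bucket>.'."""
    host = urlsplit(override_url).hostname or ""
    return host.startswith(f"{bucket}.")


if __name__ == "__main__":
    url = "https://my-caios-bucket.cwobject.com"
    print(virtual_host(url, "my-caios-bucket"))
    print(endpoint_embeds_bucket(url, "my-caios-bucket"))
```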


Why this matters

  1. The OSMO UI Datasets view is effectively always empty when running on CAIOS, because there is no CLI/UI path to register a dataset. Workflows can still submit, schedule, and complete (the in-cluster credential paths are unaffected), but datasets / app pushes / direct data ops are blocked.
  2. Affects every S3-compatible store with this constraint. CAIOS surfaces this on a stock CoreWeave CKS install of OSMO. Other tenants on Cloudflare R2 / Backblaze B2 / Wasabi with similar path-style policies would hit the same issue.
  3. No graceful surface. The error bubbles up as a generic Unknown error: An error occurred (PathStyleRequestNotAllowed). There is no hint that the underlying problem is "your S3 backend doesn't support path-style GetBucketLocation, configure addressing-style virtual" — operators end up debugging boto3 internals.
  4. Bucket region is already known. The BucketConfig.region field is mandatory (default: us-east-1), so the GetBucketLocation precheck is computing something OSMO already has.
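
To illustrate point 3, here is a minimal sketch of how the CLI could turn this specific failure into an actionable message. The function name and the message wording are hypothetical. The only thing relied on is that botocore's ClientError carries a response dict with an "Error" entry containing a "Code"; the sketch reads it by duck typing so it does not need botocore to run.

```python
def explain_s3_error(exc: Exception) -> str:
    """Map a known S3 error code to a hint; fall back to the generic text."""
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code", "")
    if code == "PathStyleRequestNotAllowed":
        return (
            "The S3 backend rejected a path-style request. "
            "This store likely requires virtual-hosted-style addressing "
            "for this operation; configure addressing style 'virtual'."
        )
    # Unrecognized errors keep the current generic surface.
    return f"Unknown error: {exc}"
```

The mapping could be extended with further codes later; unrecognized errors keep today's output, so the change would be additive.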

Suggested fix

Three concrete options:

Option A — drop the GetBucketLocation precheck. BucketConfig.region is already mandatory in OSMO's data model. Whatever the precheck is using the location for can be sourced from there directly. This is the minimal-diff fix and unblocks every backend that doesn't speak path-style GetBucketLocation.
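
A minimal sketch of Option A, assuming the bucket config is available as a mapping with a "region" key as in the reproduction payload. The function name and the lookup parameter are hypothetical; lookup stands in for whatever currently performs the GetBucketLocation call and is only consulted when no region is configured.

```python
from typing import Callable, Mapping, Optional


def resolve_region(
    bucket_cfg: Mapping[str, str],
    lookup: Optional[Callable[[], Optional[str]]] = None,
) -> str:
    """Prefer the configured region; only fall back to a remote lookup."""
    region = bucket_cfg.get("region")
    if region:
        return region  # no network call, no GetBucketLocation
    if lookup is None:
        raise ValueError("bucket config has no region and no lookup was given")
    # GetBucketLocation reports a null LocationConstraint for us-east-1,
    # so an empty result is treated as that region.
    return lookup() or "us-east-1"
```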

Option B: add addressing_style to BucketConfig (and propagate it to the boto3 Config). This is a per-bucket toggle. Default to the current behavior so existing deployments are unaffected; operators set "virtual" for CAIOS / R2 / etc. Backwards compatible.

Option C — honor AWS_S3_ADDRESSING_STYLE in the boto3 client init. Don't pass an explicit addressing_style in the Config so the SDK falls back to env / ~/.aws/config. Lowest-friction for operators who already use the standard AWS env.
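
Options B and C can be combined in one client factory. The sketch below is an assumption about shape, not OSMO's actual code: the addressing_style key on the bucket config is the hypothetical new field from Option B, and the function names are made up. When the field is unset, no addressing style is passed at all, which leaves the choice to the SDK's own configuration resolution (for example the shared ~/.aws/config file).

```python
from typing import Any, Dict, Mapping, Optional

VALID_STYLES = ("auto", "path", "virtual")


def s3_config_kwargs(addressing_style: Optional[str]) -> Dict[str, Any]:
    """Build botocore Config kwargs; omit the style entirely when unset."""
    if addressing_style is None:
        return {}  # Option C: do not hardcode anything
    if addressing_style not in VALID_STYLES:
        raise ValueError(f"addressing_style must be one of {VALID_STYLES}")
    return {"s3": {"addressing_style": addressing_style}}  # Option B


def make_s3_client(bucket_cfg: Mapping[str, Any]):
    """Create a boto3 S3 client from a bucket config mapping (sketch)."""
    # Imported lazily so the helper above stays usable without boto3.
    import boto3
    from botocore.config import Config

    cred = bucket_cfg["default_credential"]
    return boto3.client(
        "s3",
        endpoint_url=cred["override_url"],
        region_name=cred["region"],
        aws_access_key_id=cred["access_key_id"],
        aws_secret_access_key=cred["access_key"],
        config=Config(**s3_config_kwargs(bucket_cfg.get("addressing_style"))),
    )
```

Keeping the kwargs builder separate from the client construction makes the policy testable without network access or credentials.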

The best fix is probably (A) and (C) together: drop the unnecessary precheck and stop hardcoding the addressing style. (B) then becomes optional polish.

The relevant code is the boto3 client factory used by osmo dataset upload / osmo data upload / osmo data check. From the CLI bundle structure (/usr/local/osmo/osmo-cli/_internal/), it appears OSMO modules are compiled into the PyInstaller PYZ, so I haven't been able to point at the exact line — but the call site is whichever helper builds the boto3 s3 client for these subcommands.


Documentation gap (separate, smaller fix)

The OSMO deployment guide should call out CAIOS / non-AWS S3 backends as a known incompatibility for the dataset CLI in 6.2.10, with the upstream issue linked. Today the failure is opaque from the user's side — the error doesn't suggest "this is a known incompatibility, here's the workaround".


Workaround

Until upstream lands a fix, the only way to populate a CAIOS-backed bucket on OSMO is direct aws s3 cp (path-style works for PutObject / ListObjects / GetObject):

AWS_ACCESS_KEY_ID=$AKID AWS_SECRET_ACCESS_KEY=$SAK \
  aws --endpoint-url https://cwobject.com --region us-east-1 \
  s3 cp ./local-file s3://my-caios-bucket/path/

This bypasses OSMO entirely — files land in the bucket but are NOT registered as OSMO datasets, so the UI Datasets view stays empty.


Environment

  • OSMO 6.2.10 (charts 1.2.1 / app 6.2.10.fa46f7f09)
  • OSMO CLI installed via the published Mac binary
  • Kubernetes v1.35.4 on CoreWeave CKS
  • Backend: CoreWeave Object Storage (https://cwobject.com) — S3-compatible
  • Reproduced on CKS cluster after fresh deploy of OSMO + a CAIOS bucket configured per the deployment guide.
