When self-hosting Cosmos 3 generation with the vllm/vllm-omni:cosmos3 image, the open nvidia/Cosmos3-Nano weights download under OpenMDW, but the server then exits at startup while loading a gated dependency. This is the cookbook quickstart command:
vllm serve nvidia/Cosmos3-Nano --omni \
--model-class-name Cosmos3OmniDiffusersPipeline \
--allowed-local-media-path / --port 8000 --init-timeout 1800
and this is the error:
Cannot access gated repo for url https://huggingface.co/nvidia/Cosmos-1.0-Guardrail/...
Access to model nvidia/Cosmos-1.0-Guardrail is restricted and you are not in the authorized list.
The open generation model loads nvidia/Cosmos-1.0-Guardrail, which is a gated Hugging Face repo, so an HF token with access to the open model is not enough to start the server. The failure happens at startup rather than at request time, which makes it look like the whole deployment is broken instead of one optional component failing to load.
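To tell this failure apart from a genuinely broken deployment, the startup log can be scanned for the gated-repo message. The following is a minimal sketch, not part of the cookbook: the function name `find_gated_repos` is hypothetical, and the patterns are built only from the two error lines quoted above, so other huggingface_hub versions may word the message differently.

```python
import re

# Patterns derived from the two error lines quoted in this issue (assumption:
# the wording is stable enough to match; adjust if your log differs).
GATED_URL = re.compile(
    r"Cannot access gated repo for url https://huggingface\.co/([\w.-]+/[\w.-]+)"
)
RESTRICTED = re.compile(r"Access to model ([\w.-]+/[\w.-]+) is restricted")


def find_gated_repos(log_text: str) -> list[str]:
    """Return the deduplicated repo ids named in gated-repo errors."""
    repos: list[str] = []
    for pattern in (GATED_URL, RESTRICTED):
        for repo in pattern.findall(log_text):
            if repo not in repos:
                repos.append(repo)
    return repos


SAMPLE_LOG = (
    "Cannot access gated repo for url "
    "https://huggingface.co/nvidia/Cosmos-1.0-Guardrail/...\n"
    "Access to model nvidia/Cosmos-1.0-Guardrail is restricted and you are "
    "not in the authorized list.\n"
)

if __name__ == "__main__":
    for repo in find_gated_repos(SAMPLE_LOG):
        print(f"Startup blocked by gated repo: {repo}")
```

If the only repo reported is the guardrail model, the generation weights themselves loaded fine, and the server can be started either by requesting access to that repo or by disabling guardrails.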
The model card references guardrails as an optional toggle ("guardrails": true) and lists cosmos_guardrail as a dependency, but it does not say that the guardrail model is gated or that the default generation path fails without access to it. The --deploy-config workaround that disables guardrails is in the main README, but not in the generator cookbook quickstart where a first-time user is working.
Proposed change
Add the no-guardrails setup to the generator quickstart (cookbooks/cosmos3/generator/audiovisual/README.md) so a user who only wants to run generation can start the server without requesting access to the gated guardrail repo:
# no_guardrails.yaml
async_chunk: false
stages:
  - stage_id: 0
    max_num_seqs: 1
    enforce_eager: true
    trust_remote_code: true
    model_class_name: Cosmos3OmniDiffusersPipeline
    model_config:
      guardrails: false
      offload_guardrail_models: false
vllm serve nvidia/Cosmos3-Nano --omni \
--model-class-name Cosmos3OmniDiffusersPipeline \
--deploy-config no_guardrails.yaml --port 8000 --init-timeout 1800
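For anyone scripting this setup, the configuration and command above can be generated from a small helper. This is a sketch under stated assumptions: the helper names `write_config` and `serve_command` are hypothetical, the YAML nesting is inferred from the keys listed in this issue, and only flags that appear in the commands above are used.

```python
import shlex
from pathlib import Path

# YAML body taken from this issue. The indentation (which keys sit under the
# stage and under model_config) is an assumption about the intended nesting.
NO_GUARDRAILS_YAML = """\
async_chunk: false
stages:
  - stage_id: 0
    max_num_seqs: 1
    enforce_eager: true
    trust_remote_code: true
    model_class_name: Cosmos3OmniDiffusersPipeline
    model_config:
      guardrails: false
      offload_guardrail_models: false
"""


def write_config(path: Path) -> Path:
    """Write the no-guardrails deploy config to `path` and return it."""
    path.write_text(NO_GUARDRAILS_YAML, encoding="utf-8")
    return path


def serve_command(config: Path, port: int = 8000, init_timeout: int = 1800) -> list[str]:
    """Build the argv for the no-guardrails `vllm serve` invocation."""
    return [
        "vllm", "serve", "nvidia/Cosmos3-Nano", "--omni",
        "--model-class-name", "Cosmos3OmniDiffusersPipeline",
        "--deploy-config", str(config),
        "--port", str(port),
        "--init-timeout", str(init_timeout),
    ]


if __name__ == "__main__":
    cfg = write_config(Path("no_guardrails.yaml"))
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(serve_command(cfg)))
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without the vllm-omni image; pass the list to `subprocess.run` inside the container to actually start the server.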
Environment
Single NVIDIA H200 (141 GB) rented through NVIDIA Brev (DigitalOcean instance), driver 590.48, Docker 29.1.3 with the NVIDIA runtime, prebuilt vllm/vllm-omni:cosmos3 image.
Happy to send a small docs PR for this if helpful.