OpenShell GPU sandbox: CUDA cuInit fails under Landlock on Spark/GB10 despite nvidia-smi working

### Description

On a DGX Spark / NVIDIA GB10 host, a NemoClaw/OpenShell sandbox created with GPU passthrough can see the NVIDIA GPU with `nvidia-smi`, but CUDA initialization fails when the sandbox is created with the default Landlock policy.

Expected: a GPU-enabled OpenShell sandbox should let CUDA workloads initialize successfully. Failing that, onboarding should stop with a clear validation error instead of reporting success on the strength of a passing `nvidia-smi` check.

Actual: `nvidia-smi` succeeds, but `cuInit(0)` through the normal `openshell sandbox exec` path returns `304` (`CUDA_ERROR_OPERATING_SYSTEM`). Recreating the sandbox from the same image, with the same GPU devices and the same policy minus the `landlock:` block, fixes CUDA initialization.

This makes `nvidia-smi` an insufficient GPU proof for CUDA workloads on this setup.
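
As a rough illustration of what a stricter GPU proof could look like, the sketch below runs both the `nvidia-smi` check and the `cuInit(0)` probe through `openshell sandbox exec` and passes only if both succeed. This is not how NemoClaw onboarding is implemented; the function names (`build_checks`, `run_checks`, `gpu_proof_ok`) are hypothetical, and only the two commands themselves come from the reproduction steps in this issue.

```python
import subprocess

# The inline probe from the reproduction steps: exits 0 only if cuInit(0) == 0.
CUINIT_PROBE = (
    'import ctypes,sys; lib=ctypes.CDLL("libcuda.so.1"); '
    'rc=lib.cuInit(0); print("cuInit(0)=%s" % rc); sys.exit(0 if rc == 0 else 1)'
)


def build_checks(sandbox):
    """Return (label, argv) pairs that together form the GPU proof."""
    prefix = ["openshell", "sandbox", "exec", "-n", sandbox, "--"]
    return [
        ("nvidia-smi", prefix + ["nvidia-smi"]),
        ("cuInit", prefix + ["python3", "-c", CUINIT_PROBE]),
    ]


def run_checks(sandbox, runner=subprocess.run):
    """Run every check and return {label: passed}.

    `runner` is injectable so the logic can be exercised without a sandbox.
    """
    results = {}
    for label, argv in build_checks(sandbox):
        try:
            proc = runner(argv, capture_output=True, text=True)
            results[label] = proc.returncode == 0
        except OSError:
            # For example, the openshell CLI is not on PATH.
            results[label] = False
    return results


def gpu_proof_ok(results):
    """The proof passes only if every check passed, not just nvidia-smi."""
    return all(results.values())
```

On the setup described here, `run_checks("drclaw")` would be expected to report `nvidia-smi` as passing and `cuInit` as failing under the default policy, so `gpu_proof_ok` would return `False`.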

### Reproduction Steps

1. Configure NVIDIA CDI on the Spark host:

   ```shell
   sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
   sudo systemctl restart docker
   nvidia-ctk cdi list
   ```

2. Onboard a NemoClaw sandbox with direct sandbox GPU enabled:

   ```shell
   nemoclaw onboard --name drclaw --gpu --sandbox-gpu --sandbox-gpu-device nvidia.com/gpu=all --non-interactive --yes --yes-i-accept-third-party-software
   ```

3. Verify `nvidia-smi` works via OpenShell:

   ```shell
   openshell sandbox exec -n drclaw -- nvidia-smi
   ```

4. Test CUDA initialization via OpenShell:

   ```shell
   openshell sandbox exec -n drclaw -- python3 -c 'import ctypes,sys; lib=ctypes.CDLL("libcuda.so.1"); rc=lib.cuInit(0); print("cuInit(0)=%s" % rc); sys.exit(0 if rc == 0 else 1)'
   ```

   Result with default Landlock policy:

   ```shell
   cuInit(0)=304
   ```

5. Test direct Docker exec into the same GPU-enabled sandbox container:

   ```shell
   docker exec --user sandbox <openshell-drclaw-container> python3 -c 'import ctypes,sys; lib=ctypes.CDLL("libcuda.so.1"); rc=lib.cuInit(0); print("cuInit(0)=%s" % rc); sys.exit(0 if rc == 0 else 1)'
   ```

   Result:

   ```shell
   cuInit(0)=0
   ```

6. Create a temporary GPU sandbox from the same image and the same policy, but omit the `landlock:` section, then rerun the probe from step 4 through `openshell sandbox exec`. CUDA initialization succeeds:

   ```shell
   cuInit(0)=0
   ```

### Environment

- Hardware: DGX Spark / NVIDIA GB10
- OS: Linux aarch64
- NVIDIA driver: `580.126.09`
- CUDA reported by `nvidia-smi`: `13.0`
- GPU: `NVIDIA GB10`
- NemoClaw: `v0.0.41`
- OpenShell: `0.0.39`
- NemoClaw source revision installed locally: `5818cfa8962084717f281bfff5c08ae0435a30a7`
- Docker GPU mode selected by onboarding: `--gpus all`
- CDI devices present:
  - `nvidia.com/gpu=0`
  - `nvidia.com/gpu=GPU-96e354d9-34ac-8927-0cbb-d761e87ba109`
  - `nvidia.com/gpu=all`

### Debug Output

The relevant split is:

```shell
# OpenShell exec with default Landlock policy
openshell sandbox exec -n drclaw -- python3 /sandbox/.openclaw/drclaw-cuda-probe.py
cuInit(0)=304

# Direct Docker exec into same container
docker exec --user sandbox <openshell-drclaw-container> python3 /sandbox/.openclaw/drclaw-cuda-probe.py
cuInit(0)=0

# OpenShell exec after recreating sandbox without the landlock block
openshell sandbox exec -n drclaw -- python3 /sandbox/.openclaw/drclaw-cuda-probe.py
cuInit(0)=0
```

`nvidia-smi` succeeds through OpenShell both with and without the `landlock:` block.
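
The contents of `/sandbox/.openclaw/drclaw-cuda-probe.py` are not reproduced in this issue. A minimal sketch of an equivalent probe, assuming it does the same thing as the inline one-liner from the reproduction steps, is shown below. It additionally asks the driver for the symbolic name of the result through `cuGetErrorName`, with a small fallback table of `CUresult` values from `cuda.h`. The helper names `describe` and `main` are illustrative.

```python
import ctypes

# A few CUresult values from cuda.h, used as a fallback when the driver
# library is unavailable or cannot name the code itself.
KNOWN_RESULTS = {
    0: "CUDA_SUCCESS",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    100: "CUDA_ERROR_NO_DEVICE",
    304: "CUDA_ERROR_OPERATING_SYSTEM",
    999: "CUDA_ERROR_UNKNOWN",
}


def describe(rc, lib=None):
    """Map a CUresult to its symbolic name, asking libcuda first if loaded."""
    if lib is not None:
        name = ctypes.c_char_p()
        # CUresult cuGetErrorName(CUresult error, const char **pStr)
        if lib.cuGetErrorName(rc, ctypes.byref(name)) == 0 and name.value:
            return name.value.decode()
    return KNOWN_RESULTS.get(rc, "unknown CUresult")


def main(lib_name="libcuda.so.1"):
    """Print the cuInit(0) result; return 0 on success, 1 otherwise."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError as exc:
        print("cannot load %s: %s" % (lib_name, exc))
        return 1
    rc = lib.cuInit(0)
    print("cuInit(0)=%s (%s)" % (rc, describe(rc, lib)))
    return 0 if rc == 0 else 1
```

To use it as a script, end the file with `raise SystemExit(main())` so the exit status mirrors the one-liner.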

### Logs

With the default policy, `nvidia-smi` succeeds but `cuInit(0)` fails with CUDA result `304`.

I also observed that `/proc/<pid>/task/<tid>/comm` writes fail under the default OpenShell execution path:

```shell
sh: 1: cannot create /proc/<pid>/task/<pid>/comm: Permission denied
```

Allowing `/proc` read-write in the policy did not fix CUDA while Landlock remained enabled. Removing the `landlock:` section at sandbox creation time did fix CUDA. OpenShell rejects changing Landlock on a live sandbox, so this had to be tested by recreating a temporary sandbox.
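
One way to narrow this down further might be to check which paths can be opened from each execution path. The sketch below tries to open a list of candidate paths and prints the symbolic errno for each failure. The path list is an assumption (the usual NVIDIA device nodes plus a `/proc` entry), not something confirmed for this image, and nothing in this issue establishes that a denied open is the actual cause of the `304`. A Landlock filesystem denial normally surfaces as `EACCES`, but so does an ordinary permission problem, so the useful signal is a difference between the `openshell sandbox exec` and `docker exec` runs.

```python
import errno
import os

# Assumed candidates: the usual NVIDIA device nodes plus a /proc entry.
# Adjust to whatever `ls /dev/nvidia*` shows inside the sandbox.
CANDIDATE_PATHS = [
    "/dev/nvidiactl",
    "/dev/nvidia0",
    "/dev/nvidia-uvm",
    "/dev/nvidia-uvm-tools",
    "/proc/self/comm",
]


def try_open(path, flags=os.O_RDWR):
    """Open `path` and return "ok" or the symbolic errno (e.g. "EACCES")."""
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)  # nothing is written; the file is only opened and closed
    return "ok"


def report(paths=CANDIDATE_PATHS):
    """Return {path: status} for every candidate path."""
    return {path: try_open(path) for path in paths}


for path, status in report().items():
    print("%-24s %s" % (path, status))
```

An `ok` for every path would not rule out Landlock, since this only exercises `open` and not any later operation on the file descriptor.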

### Checklist

- [x] I confirmed this bug is reproducible
- [x] I searched existing issues and this is not a duplicate

