Hi, I'm trying to run a simple PyTorch tensor add on the GPU under nsjail on a GCP nvidia-tesla-t4 node, and I'm getting the error below.
nsjail_pytorch.cfg
mount {
src: "/home/current_user_ldap/pytorch_env"
dst: "/home/current_user_ldap/pytorch_env"
is_bind: true
}
mount {
src: "/dev/nvidia0"
dst: "/dev/nvidia0"
is_bind: true
rw: true
}
mount {
src: "/dev/nvidiactl"
dst: "/dev/nvidiactl"
is_bind: true
rw: true
}
mount {
src: "/dev/nvidia-uvm"
dst: "/dev/nvidia-uvm"
is_bind: true
rw: true
}
mount {
src: "/usr"
dst: "/usr"
is_bind: true
rw: true
}
# for libs
mount {
src: "/lib64"
dst: "/lib64"
is_bind: true
}
mount {
src: "/lib"
dst: "/lib"
is_bind: true
rw: true
}
cwd: "/home/current_user_ldap/pytorch_env/"
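As a sanity check on the device bind-mounts in the config above, here is a small Python sketch that lists the NVIDIA character-device nodes actually present on the host. This is my own diagnostic helper, not part of nsjail; note that on some driver setups additional nodes such as /dev/nvidia-uvm-tools or /dev/nvidia-modeset may also exist and would not be covered by the three mounts in the config.

```python
import glob
import os
import stat

def list_nvidia_devices(pattern="/dev/nvidia*"):
    """Return (path, major, minor) for each NVIDIA character-device node found."""
    nodes = []
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        if stat.S_ISCHR(st.st_mode):
            nodes.append((path, os.major(st.st_rdev), os.minor(st.st_rdev)))
    return nodes

if __name__ == "__main__":
    found = list_nvidia_devices()
    if not found:
        print("no /dev/nvidia* character devices found")
    for path, major, minor in found:
        print(f"{path} (char {major}:{minor})")
```

Running this on the host lets you compare the full set of device nodes against the ones bind-mounted into the jail.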
Running simple PyTorch Tensor Add on CPU works.
nsjail -Mo --chroot / --rlimit_nproc 6553 --rlimit_fsize inf --rlimit_as inf -- /usr/bin/python3 -c "import torch; a = torch.tensor([1.0, 2.0], device='cpu') + torch.tensor([3.0, 4.0], device='cpu'); print(a)"
This prints the expected tensor output, tensor([4., 6.]).
Running simple PyTorch Tensor Add on GPU fails
nsjail -Mo --config nsjail_pytorch.cfg --chroot / --rlimit_nproc 6553 --rlimit_fsize inf --rlimit_as inf -- /usr/bin/python3 -c "import torch; print(torch.cuda.is_available());"
[I][2024-08-10T02:03:04+0000] Mode: STANDALONE_ONCE
[I][2024-08-10T02:03:04+0000] Jail parameters: hostname:'NSJAIL', chroot:'/', process:'/usr/bin/python3', bind:[::]:0, max_conns:0, max_conns_per_ip:0, time_limit:600, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clone_newuts:true, clone_newcgroup:true, clone_newtime:false, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[I][2024-08-10T02:03:04+0000] Mount: '/' -> '/' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Mount: '/home/current_user_ldap/pytorch_env' -> '/home/current_user_ldap/pytorch_env' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Mount: '/dev/nvidia0' -> '/dev/nvidia0' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:false
[I][2024-08-10T02:03:04+0000] Mount: '/dev/nvidiactl' -> '/dev/nvidiactl' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:false
[I][2024-08-10T02:03:04+0000] Mount: '/dev/nvidia-uvm' -> '/dev/nvidia-uvm' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:false
[I][2024-08-10T02:03:04+0000] Mount: '/usr' -> '/usr' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Mount: '/lib64' -> '/lib64' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Mount: '/lib' -> '/lib' flags:MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2024-08-10T02:03:04+0000] Uid map: inside_uid:1002 outside_uid:1002 count:1 newuidmap:false
[I][2024-08-10T02:03:04+0000] Gid map: inside_gid:1003 outside_gid:1003 count:1 newgidmap:false
[I][2024-08-10T02:03:06+0000] Executing '/usr/bin/python3' for '[STANDALONE MODE]'
/home/current_user_ldap/.local/lib/python3.9/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
[I][2024-08-10T02:03:08+0000] pid=28434 ([STANDALONE MODE]) exited with status: 0, (PIDs left: 0)
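To narrow down whether the failure is in PyTorch or in the driver userspace itself, one can probe cuInit directly via ctypes inside the jail. This is a hedged diagnostic sketch of mine, assuming libcuda.so.1 is on the loader path; it is written so it also runs harmlessly on a machine without the driver.

```python
import ctypes

def try_cuinit():
    """Attempt cuInit(0) via the CUDA driver API and return its status code.

    Returns None when libcuda.so.1 cannot be loaded at all (e.g. no driver
    installed), so the probe is safe to run anywhere.
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None
    return libcuda.cuInit(0)  # 0 means CUDA_SUCCESS

if __name__ == "__main__":
    status = try_cuinit()
    if status is None:
        print("libcuda.so.1 not loadable")
    else:
        print(f"cuInit returned {status}")
```

If this returns a nonzero code under nsjail but 0 outside it, the problem is below PyTorch, in the driver's interaction with the jail's namespaces or device nodes.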
NVIDIA-SMI runs fine under nsjail
nsjail -Mo --config nsjail_pytorch.cfg --chroot / --rlimit_nproc 6553 --rlimit_as inf -- /bin/nvidia-smi
The above successfully prints the actual nvidia-smi output.
Notes
- PyTorch on CPU works fine under nsjail (no issues)
- nvidia-smi works under nsjail
- Running PyTorch without nsjail on GPU succeeds.
This doesn't look like a PyTorch or host issue, given that PyTorch works on the GPU without nsjail. Any help appreciated.