Machine Learning (ML) requires access to serious computing hardware, most notably Graphics Processing Units (GPUs) capable of crunching vast amounts of data in order to train complicated deep learning models. NVIDIA is currently the dominant GPU vendor on the market, and a large fraction of deep learning work relies on NVIDIA GPUs. Before we set up a Docker environment that has access to the GPUs available on the host machine, let's ensure that the following prerequisites are met. To keep things simple, let's assume that you are using a Linux system.
- NVIDIA GPU hardware is correctly installed on your machine.
- You have downloaded and installed the latest drivers for your GPU/OS combination.
- You have installed the correct version of the NVIDIA CUDA toolkit.
- In addition, you have installed Docker and docker-compose.
Which GPUs are available
Use the following command to see the GPU hardware available on your machine.
$ nvidia-smi
Which version of CUDA is installed
Use the following command to check the CUDA compiler version installed on your machine.
$ nvcc --version
Which version of Linux is this system
Check the Linux version:
$ cat /etc/*release
Alternately, you can use the lsb_release -d command.
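If you prefer to read the version programmatically, here is a minimal Python sketch (nothing Docker-specific, it just parses /etc/os-release, which most modern Linux systems provide):

```python
# Minimal sketch: parse /etc/os-release into a dict and print the
# distribution's pretty name (e.g. "Ubuntu 18.04.5 LTS").
import pathlib

info = {}
for line in pathlib.Path("/etc/os-release").read_text().splitlines():
    if "=" in line:
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"')

print(info.get("PRETTY_NAME", "unknown"))
```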
You can install the lsb package if it is not already installed:
$ apt-get -y install lsb-core
NVIDIA provides the NVIDIA Container Toolkit, a package that makes the host's GPUs available inside Docker containers. Install it as follows:
$ sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
You're almost there.
You can use docker pull to get a pre-built image. Check here and here for more information.
$ docker pull nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04
This will require docker login.
This is my preferred way of doing things. Create a Dockerfile as follows. The first line is important: it specifies the nvidia/cuda base image to build from. The rest is standard Docker stuff, as seen here.
Dockerfile
# Base image: CUDA 10.1 + cuDNN 7 runtime on Ubuntu 18.04
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
# Avoid interactive prompts during package installation
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get -y upgrade
RUN apt-get install -y build-essential python3 python3-pip python-dev sudo
# Install Python dependencies from requirements.txt
RUN mkdir -p /tmp
COPY requirements.txt /tmp/
RUN pip3 -q install pip --upgrade
RUN pip3 install -r /tmp/requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
# Create a non-root user to run as
RUN groupadd -g 1010 dockeruser
RUN useradd -r -m -g 1010 dockeruser
RUN chown -R dockeruser /home/dockeruser
RUN chmod -R g+rwx /home/dockeruser
USER dockeruser
Since I am interested in PyTorch, my requirements.txt file contains the following.
requirements.txt
torch==1.8.0+cu111
torchvision==0.9.0+cu111
matplotlib
jupyter
Find out the appropriate torch package with the correct CUDA support here. This needs to match the CUDA support available on the host, as well as the nvidia/cuda base image used in the Dockerfile.
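To make the matching explicit: the +cuXYZ suffix in a pinned torch version encodes the CUDA build it was compiled against (cu111 means CUDA 11.1). Here is a small sketch (the helper name cuda_tag is my own) that extracts this tag from a requirement line so you can eyeball it against the base image:

```python
# Sketch: extract the CUDA build tag (e.g. "cu111" -> CUDA 11.1) from a
# pinned requirement line such as "torch==1.8.0+cu111".
def cuda_tag(requirement: str) -> str:
    name, _, version = requirement.partition("==")
    _, _, tag = version.partition("+")
    return tag or "cpu"  # no "+" suffix means a CPU-only build

print(cuda_tag("torch==1.8.0+cu111"))  # -> cu111, i.e. CUDA 11.1
```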
Build the Docker image as usual.
$ docker build -t myml .
You can run the docker container as follows.
$ docker run -it --gpus=all --ipc=host myml bash
From within the container, you can check whether PyTorch is able to access the host GPUs as follows.
[Container] $ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'GeForce GTX TITAN X'
>>> torch.cuda.get_device_name(1)
'GeForce GTX 980'
>>>
Happy hacking!
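Finally, since docker-compose was listed as a prerequisite: the docker run flags above can also be captured in a compose file. A minimal sketch, assuming a compose version recent enough to support GPU device reservations (the service name is mine; the image name matches the build above):

```yaml
services:
  myml:
    image: myml
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

With this in place, docker compose run myml bash should drop you into the same GPU-enabled shell.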