...
- Provision a new CentOS or Ubuntu instance.
- Select a layout ending with -gpu and one of the plans listed above. Beyond that, configure your instance as preferred and continue the deployment process.
- Once the VM is deployed, you can verify the GPUs, for example by running the nvidia-smi program from the command line (see below for confirming library installations and drivers).
Usage
Useful commands
You can see GPU information using nvidia-smi:
```
[tervo@gpu-test-centos ~]$ nvidia-smi
Mon Feb  5 13:01:43 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTXA6000-6C  On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |    512MiB /  5976MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
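For scripting, nvidia-smi can also print machine-readable output. A small example (the query fields below are standard nvidia-smi options, not specific to this setup):
```
# print selected GPU properties as CSV, handy for monitoring scripts
nvidia-smi --query-gpu=name,driver_version,memory.total,utilization.gpu --format=csv
```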
NVIDIA tools are available in /usr/local/cuda-11.4/bin/. You can add them to your PATH as follows:
```
export PATH=$PATH:/usr/local/cuda-11.4/bin/
```
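To make this persist across logins, you can append the same line to your shell profile (a minimal sketch; adjust the path if your CUDA version differs):
```
# make the CUDA tools available in every new shell
echo 'export PATH=$PATH:/usr/local/cuda-11.4/bin/' >> ~/.bashrc
```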
Libraries
The CUDA version is currently 11.4; it needs to match the installed drivers and thus can't be changed. TensorFlow/CUDA compatibility is listed at https://www.tensorflow.org/install/source#gpu. We have tested that TensorFlow > 2.6.1 works.
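If you are unsure which CUDA and cuDNN versions an installed TensorFlow build expects, you can ask TensorFlow itself (a quick check, assuming TensorFlow is already installed in your Python environment):
```
# print the CUDA/cuDNN versions this TensorFlow build was compiled against
python -c "import tensorflow as tf; info = tf.sysconfig.get_build_info(); print(info['cuda_version'], info['cudnn_version'])"
```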
...
Using Conda
Conda installation
```
# change shell to bash for the installation
$ bash
# install Miniforge (or any other conda distribution)
$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
$ chmod +x Miniforge3-Linux-x86_64.sh
$ ./Miniforge3-Linux-x86_64.sh
# when the installer asks:
#   Do you wish the installer to initialize Miniforge3
#   by running conda init? [yes|no]
#   [no] >>>
# answer yes
yes
# restart the shell so that conda is picked up
$ exit
$ bash
```
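After the shell restarts, you can confirm that conda is on the PATH (the version printed depends on the Miniforge release you downloaded):
```
# verify the conda installation
conda --version
conda info --envs
```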
Library installations
```
# create a conda environment
$ conda create -n ML python=3.8
# activate the environment
$ conda activate ML
# install packages; note that installing tensorflow-gpu and keras also installs
# the CUDA toolkit, cuDNN (CUDA Deep Neural Network library), NumPy, SciPy and Pillow
$ conda install tensorflow-gpu keras
# (OPTIONAL) cudatoolkit is installed automatically with keras and tensorflow-gpu,
# but if you need a specific (or the latest) version, run the command below
$ conda install -c anaconda cudatoolkit
# (OPTIONAL) install PyTorch with GPU support; PyTorch might need CUDA 11.8
$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```
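Optionally, you can snapshot the resolved environment so it can be recreated on another VM (standard conda commands, not specific to this setup):
```
# export the environment definition for reproducibility
conda env export -n ML > environment.yml
# recreate it elsewhere with: conda env create -f environment.yml
```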
Confirmation of installations
```
$ nvidia-smi
Mon Feb 5 13:14:45 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02 Driver Version: 470.223.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTXA6000-6C On | 00000000:00:05.0 Off | 0 |
| N/A N/A P8 N/A / N/A | 512MiB / 5976MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$ python3 --version
Python 3.8.18
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
$ whereis cuda
cuda: /usr/local/cuda
$ cat /home/<USERNAME>/miniforge3/envs/ML/include/cudnn.h
.
.
.
/* cudnn : Neural Networks Library
*/
#if !defined(CUDNN_H_)
#define CUDNN_H_
#include <cuda_runtime.h>
#include <stdint.h>
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"
#if defined(__cplusplus)
extern "C" {
#endif
#if defined(__cplusplus)
}
#endif
#endif /* CUDNN_H_ */
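# (a quicker alternative: the cuDNN version macros live in cudnn_version.h, so
# you can grep them directly; the path assumes the Miniforge env created above)
$ grep -A2 "define CUDNN_MAJOR" /home/<USERNAME>/miniforge3/envs/ML/include/cudnn_version.h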
$ conda list | grep tensorflow
tensorflow 2.13.1 cuda118py38h409af0c_1 conda-forge
tensorflow-base 2.13.1 cuda118py38h52ca5c6_1 conda-forge
tensorflow-estimator 2.13.1 cuda118py38ha2f8a09_1 conda-forge
tensorflow-gpu 2.13.1 cuda118py38h0240f8b_1 conda-forge
$ conda list | grep keras
keras 2.13.1 pyhd8ed1ab_0 conda-forge
$ python
>>> import tensorflow as tf
>>> tf.test.is_built_with_cuda()
True
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>> print(tf.__version__)
2.13.1
# (OPTIONAL) check PyTorch
$ python
>>> import torch
>>> print(torch.__version__)          # print PyTorch version
2.2.0
>>> print(torch.cuda.is_available())  # check if CUDA is available
True
>>> print(torch.version.cuda)         # print the CUDA version PyTorch is using
11.8
>>> if torch.cuda.is_available():
...     # create a tensor and move it to the GPU
...     x = torch.tensor([1.0, 2.0]).cuda()
...     print(x)  # print the tensor to verify it is on the GPU
... else:
...     print("CUDA is not available. Check your PyTorch installation.")
...
tensor([1., 2.], device='cuda:0')
```
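Beyond listing devices, you can run a small end-to-end compute test. A minimal sketch (assuming the ML environment above is active):
```
# run a matrix multiplication and let TensorFlow log which device executes it
python - <<'EOF'
import tensorflow as tf
tf.debugging.set_log_device_placement(True)  # print device placement per op
a = tf.random.uniform((1024, 1024))
b = tf.random.uniform((1024, 1024))
print(tf.matmul(a, b).shape)  # the matmul should be placed on /device:GPU:0
EOF
```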
Using Docker
If you want to use GPUs in Docker, you need to take a few extra steps after creating the VM.
Install Docker
In Ubuntu:
```
sudo apt install -y docker.io
sudo usermod -aG docker $USER
```
In CentOS:
```
sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io
sudo systemctl --now enable docker
sudo usermod -aG docker $USER
```
- Log out and log in again so that the group change takes effect.
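You can then confirm that Docker works without sudo (a standard smoke test):
```
# should pull and run the hello-world image without permission errors
docker run --rm hello-world
```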
Install the NVIDIA Container Toolkit
Ubuntu:
```
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
CentOS:
```
distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum clean expire-cache && sudo yum install -y nvidia-docker2
sudo systemctl restart docker
```
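On either OS, you can verify that containers see the GPU before starting a notebook (the image tag is just an example; pick one matching the CUDA 11.4 driver stack):
```
# the container should print the same nvidia-smi table as the host
sudo docker run --rm --gpus all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi
```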
Run a GPU-compatible notebook. For example:
```
sudo docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 -it --rm \
    -v $(realpath ~/notebooks):/tf/notebooks \
    -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
```
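To confirm that TensorFlow inside the container sees the GPU, you can run a one-off check with the same image family (a sketch; the non-Jupyter tensorflow/tensorflow:latest-gpu tag is assumed here):
```
# list the GPUs visible to TensorFlow inside the container
sudo docker run --rm --gpus all --env NVIDIA_DISABLE_REQUIRE=1 \
    tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```
The Jupyter image prints a tokenized URL on startup; open it in a browser against the VM's address on port 8888.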