...

Provision a new CentOS or Ubuntu instance.
Select a layout ending with eumetsat-gpu and one of the plans listed above. Otherwise, configure your instance as preferred and continue the deployment process.
Once the VM is deployed, you can verify the GPUs, for example with the nvidia-smi program on the command line (see below for confirming library installations and drivers).

Usage

Useful commands

You can see GPU information using nvidia-smi

Code Block

$ nvidia-smi
Mon Feb  5 13:01:43 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTXA6000-6C  On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |    512MiB /  5976MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

NVIDIA tools are available in /usr/local/cuda-11.8/bin/. You can add them to PATH as follows:

Code Block
$ export PATH=$PATH:/usr/local/cuda-11.8/bin/
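
If the tools are launched from a Python script rather than an interactive shell, the same PATH change can be made for that process only. This is a minimal sketch. The CUDA_BIN value is an assumed install location and with_dir_appended is a hypothetical helper, so adjust both to your image.

```python
import os
import shutil

# Assumed install location of the CUDA tools; adjust if your image differs.
CUDA_BIN = "/usr/local/cuda-11.8/bin"


def with_dir_appended(path_value, directory):
    """Return path_value with directory appended, unless it is already listed."""
    entries = [p for p in path_value.split(os.pathsep) if p]
    normalized = [p.rstrip("/") for p in entries]
    if directory.rstrip("/") not in normalized:
        entries.append(directory)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    os.environ["PATH"] = with_dir_appended(os.environ.get("PATH", ""), CUDA_BIN)
    # shutil.which reads os.environ["PATH"], so it reflects the updated value.
    print("nvcc found at:", shutil.which("nvcc"))
```

The change affects only the running Python process and the child processes it starts. The export command is still needed for the interactive shell.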

Libraries

The CUDA version is currently 11.4; it must match the installed drivers and therefore cannot be changed. TensorFlow compatibility information is available at: https://www.tensorflow.org/install/source#gpu. We have tested that TensorFlow versions newer than 2.6.1 work.
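
When checking an installed version against the tested minimum in a script, compare versions numerically rather than as strings, because the string "2.13.1" sorts before "2.6.1". The following is a minimal sketch with hypothetical helper names (version_tuple, is_newer). It handles plain dotted versions and ignores suffixes such as rc1.

```python
def version_tuple(version):
    """Convert a dotted version such as '2.13.1' into a tuple of integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        if digits == "":
            break  # a component with no leading digits ends the numeric part
        parts.append(int(digits))
    return tuple(parts)


def is_newer(version, minimum="2.6.1"):
    """Return True if version is strictly newer than minimum."""
    return version_tuple(version) > version_tuple(minimum)


if __name__ == "__main__":
    print(is_newer("2.13.1"))  # numeric comparison: 13 > 6
    print("2.13.1" > "2.6.1")  # string comparison gives the opposite answer
```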

Using Conda

Update and conda installation

Code Block

# change shell to bash for installations
$ bash

# update default packages
$ sudo apt-get update

# Key or dirmngr errors may occur while updating; the commands below are a workaround. After running the workaround, run update & upgrade again.
$ sudo apt install dirmngr
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys <YOUR-KEY-LIKE-AA16FCBCA621E701>

# install miniforge (or any anaconda manager)
$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
$ chmod +x Miniforge3-Linux-x86_64.sh
$ ./Miniforge3-Linux-x86_64.sh

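# When the installer asks whether to run conda init, answer yes at its prompt (this is typed into the installer, not run as a shell command):
# Do you wish the installer to initialize Miniforge3
# by running conda init? [yes|no]
# [no] >>> yes

# restart the shell so that the conda initialization takes effect
$ exit
$ bash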

Library installations

Code Block

# create conda environment
$ conda create -n ML python=3.8

# activate the environment
$ conda activate ML

# install packages, note that installing tensorflow-gpu and keras also installs: CUDA toolkit, cuDNN (CUDA Deep Neural Network library), Numpy, Scipy, Pillow
$ conda install tensorflow-gpu keras

# (OPTIONAL) cudatoolkit is installed automatically while installing keras and tensorflow-gpu, but if you need a specific (or latest) version run below command.
$ conda install -c anaconda cudatoolkit

# (OPTIONAL) Installing pytorch GPU, pytorch might need cuda 11.8
$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Confirmation of installations

Code Block

$ nvidia-smi
Mon Feb  5 13:14:45 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTXA6000-6C  On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |    512MiB /  5976MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ python3 --version
Python 3.8.18

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

$ whereis cuda
cuda: /usr/local/cuda

$ cat /home/<USERNAME>/miniforge3/envs/ML/include/cudnn.h
.
.
.
/*   cudnn : Neural Networks Library

*/

#if !defined(CUDNN_H_)
#define CUDNN_H_

#include <cuda_runtime.h>
#include <stdint.h>

#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"

#include "cudnn_backend.h"

#if defined(__cplusplus)
extern "C" {
#endif

#if defined(__cplusplus)
}
#endif

#endif /* CUDNN_H_ */

$ conda list | grep tensorflow
tensorflow                2.13.1          cuda118py38h409af0c_1    conda-forge
tensorflow-base           2.13.1          cuda118py38h52ca5c6_1    conda-forge
tensorflow-estimator      2.13.1          cuda118py38ha2f8a09_1    conda-forge
tensorflow-gpu            2.13.1          cuda118py38h0240f8b_1    conda-forge

$ conda list | grep keras
keras                     2.13.1             pyhd8ed1ab_0    conda-forge

$ python
import tensorflow as tf
tf.test.is_built_with_cuda()
True
tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
print(tf.__version__)
2.13.1

# (OPTIONAL) Check pytorch, all statements run in one interpreter session
$ python
import torch
print(torch.__version__)  # Print PyTorch version
2.2.0
print(torch.cuda.is_available())  # Check if CUDA is available
True
print(torch.version.cuda)  # Print the CUDA version PyTorch is using
11.8
if torch.cuda.is_available():
    # Create a tensor and move it to GPU
    x = torch.tensor([1.0, 2.0]).cuda()
    print(x)  # Print the tensor to verify it's on the GPU
else:
    print("CUDA is not available. Check your PyTorch installation.")

tensor([1., 2.], device='cuda:0')

Using Docker

To use GPUs in Docker, you need to take a few extra steps after creating the VM.

Install Docker
In Ubuntu:

Code Block
sudo apt install -y docker.io
sudo usermod -aG docker $USER

In CentOS:

Code Block

sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io
sudo systemctl --now enable docker
sudo usermod -aG docker $USER

Log out and log in again so that the docker group membership takes effect.

Install nvidia-container toolkit
Ubuntu:

Code Block

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

CentOS:

Code Block

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum clean expire-cache && sudo yum install -y nvidia-docker2
sudo systemctl restart docker

Run a GPU-compatible notebook, for example:

Code Block
# might need sudo
sudo docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 -it --rm -v $(realpath ~/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
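
If the notebook container is started from a script, the same command can be assembled as an argument list, which avoids shell quoting problems with the volume path. This is a minimal sketch. The function notebook_command and its parameters are hypothetical, the image name and options are the ones used above, and the container port is kept at 8888 because that is the port mapped in the command above.

```python
import shlex
from pathlib import Path


def notebook_command(notebooks_dir, host_port=8888,
                     image="tensorflow/tensorflow:latest-gpu-jupyter"):
    """Build the docker run argument list for a GPU-enabled Jupyter container."""
    host_dir = Path(notebooks_dir).expanduser().resolve()
    return [
        "docker", "run", "--gpus", "all",
        "--env", "NVIDIA_DISABLE_REQUIRE=1",
        "-it", "--rm",
        "-v", f"{host_dir}:/tf/notebooks",
        "-p", f"{host_port}:8888",
        image,
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(notebook_command("~/notebooks")))
```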
