Warning

Access to GPUs is not enabled by default for all tenants. To request access, please raise an issue in the support portal or contact support@europeanweather.cloud.

The current pilot infrastructure at ECMWF features 32 NVIDIA A100 80 GB cards targeting Machine Learning workloads. They are exposed to the cloud instances as virtual GPUs, which allows multiple VMs to transparently share the same physical GPU card.

How to provision a GPU-enabled instance

Once your tenant has been granted access to the GPUs, creating a new VM with access to a virtual GPU is straightforward. Follow the process described in Provision a new instance - web, paying special attention to the Configuration step:

Rocky

  1. On the Library screen, choose CentOS. Ubuntu is currently not supported.
  2. On Layout, select the item with the "-gpu" suffix (e.g. "rocky-9.2-gpu").
  3. On Plan, pick one of the plans with the "gpu" suffix, depending on how many resources you need, including the amount of GPU memory:
    1. 8cpu-64gbmem-30gbdisk-a100.1g.10gbgpu
    2. 8cpu-64gbmem-30gbdisk-a100.2g.20gbgpu
    3. 16cpu-128gbmem-30gbdisk-40gbgpu
    4. 48cpu-384gbmem-30gbdisk-80gbgpu ( * )

( * ) The last plan, "48cpu-384gbmem-30gbdisk-80gbgpu", is available only upon request, for a limited amount of time, and for justified use cases.
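The plan names appear to encode the resources they provide (CPU count, RAM, disk and GPU memory). As an illustration only, the sketch below parses that naming pattern to pick the smallest plan with enough GPU memory. The pattern is an assumption inferred from the plan names on this page, and `parse_plan` and `smallest_plan_with_gpu_memory` are hypothetical helpers, not part of any EWC tooling.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed naming pattern, inferred only from the plan names on this page:
#   <N>cpu-<N>gbmem-<N>gbdisk-[<card>.<slice>g.]<N>gbgpu
PLAN_RE = re.compile(
    r"^(?P<cpu>\d+)cpu-(?P<mem>\d+)gbmem-(?P<disk>\d+)gbdisk-"
    r"(?:[a-z0-9]+\.\d+g\.)?(?P<gpu>\d+)gbgpu$"
)


class Plan(NamedTuple):
    name: str
    cpus: int
    mem_gb: int
    disk_gb: int
    gpu_mem_gb: int


def parse_plan(name: str) -> Optional[Plan]:
    """Split a plan name into its resource figures; None if it does not match."""
    name = name.strip()
    m = PLAN_RE.match(name)
    if m is None:
        return None
    return Plan(name, int(m["cpu"]), int(m["mem"]), int(m["disk"]), int(m["gpu"]))


def smallest_plan_with_gpu_memory(names: Iterable[str], min_gpu_gb: int) -> Optional[Plan]:
    """Return the plan with the least GPU memory that still meets min_gpu_gb."""
    plans = [p for p in map(parse_plan, names) if p is not None]
    fitting = [p for p in plans if p.gpu_mem_gb >= min_gpu_gb]
    return min(fitting, key=lambda p: p.gpu_mem_gb, default=None)


# Plan names copied from the list on this page.
PLANS = [
    "8cpu-64gbmem-30gbdisk-a100.1g.10gbgpu",
    "16cpu-128gbmem-30gbdisk-40gbgpu",
    "48cpu-384gbmem-30gbdisk-80gbgpu",
]

if __name__ == "__main__":
    print(smallest_plan_with_gpu_memory(PLANS, min_gpu_gb=20))
```

The plans actually offered to your tenant are those shown on the Plan step of the provisioning form; treat the list in this sketch as sample input.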



Info

Once your instance is running, you can check whether it can see the GPU with:

$> nvidia-smi
Mon Oct 01 09:17:13 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-40C      On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      5MiB / 40960MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
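If you want to run the same check from a script, for example before launching a training job, the sketch below may help. It calls `nvidia-smi` with its `--query-gpu` and `--format=csv` options and parses the result. The helper names `parse_gpu_csv` and `visible_gpus` are hypothetical, and the code assumes that the memory fields are reported as plain numbers in MiB.

```python
import shutil
import subprocess

# nvidia-smi query options; "nounits" drops the "MiB" suffix from the values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse lines such as 'GRID A100D-40C, 40960, 5' into dictionaries."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the name cannot break parsing.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
    return gpus


def visible_gpus():
    """Return the GPUs visible to this instance, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed, or not on PATH
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []  # nvidia-smi is present but could not talk to a GPU
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = visible_gpus()
    if not gpus:
        print("No GPU visible from this instance")
    for gpu in gpus:
        print(f"{gpu['name']}: {gpu['used_mib']} / {gpu['total_mib']} MiB used")
```

An empty list means either that the instance was not provisioned with a GPU layout and plan, or that the driver is not working, so the interactive `nvidia-smi` output remains the first thing to inspect.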

