...
The current infrastructure at ECMWF features 32 A100 80 GB GPU cards targeting Machine Learning workloads. They are exposed to cloud instances as virtual GPUs (vGPUs), which allows multiple VMs to share the same physical GPU card transparently.
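As a rough illustration of how this sharing works, each physical card is split into vGPU slices of a fixed size (a "profile"). The sketch below assumes every vGPU on a card uses the same profile and takes the 40 GB slice reported by `nvidia-smi` in the sample output on this page (`GRID A100D-40C`, 40960 MiB) as an example. The profiles actually configured on the ECMWF cloud may differ, and `vgpus_per_card` is a hypothetical helper, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of equal-size vGPU slices that fit on one card.
# Assumption: every vGPU on a given card uses the same profile size.
vgpus_per_card() {
  local card_gib=$1   # physical card memory in GiB (80 for an A100 80 GB)
  local slice_gib=$2  # vGPU profile framebuffer in GiB (40 for A100D-40C)
  echo $(( card_gib / slice_gib ))
}

per_card=$(vgpus_per_card 80 40)
echo "vGPUs per card: ${per_card}"
# Only valid if all 32 cards were configured with the same 40 GB profile
echo "vGPUs across 32 cards: $(( 32 * per_card ))"
```

With a 40 GB profile, two VMs can share each card; a smaller profile would let more VMs share a card, each with less GPU memory.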
How to provision a GPU-enabled instance
Once your tenant has been granted access to the GPUs, creating a new VM with access to a virtual GPU is straightforward. Follow the process described in Provision a new instance - web, paying special attention to the Configuration step:
...
Once your instance is running, you can check whether it can see the GPU with:
```
$> nvidia-smi
Mon Oct 01 09:17:13 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-40C      On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      5MiB / 40960MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
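When the check needs to run from a script (for example before launching a training job), `nvidia-smi` can also print selected fields as CSV using `--query-gpu` together with `--format=csv,noheader`. A minimal sketch, assuming the NVIDIA guest driver is installed in the VM; `summarize_gpus` and `check_gpu` are hypothetical names:

```shell
#!/usr/bin/env bash
# Minimal sketch: verify that this VM can see at least one (virtual) GPU.
# Assumption: the NVIDIA guest driver provides nvidia-smi on PATH.

# Reads "name, memory.total" CSV lines on stdin and prints one summary
# line per GPU; returns non-zero if no GPU is listed.
summarize_gpus() {
  local count=0 name mem
  # "|| [ -n "$name" ]" also handles a final line without a trailing newline
  while IFS=',' read -r name mem || [ -n "$name" ]; do
    [ -z "$name" ] && continue
    count=$((count + 1))
    echo "GPU ${count}: ${name} (${mem# })"   # strip the space after the comma
  done
  [ "$count" -gt 0 ]
}

check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: is the NVIDIA driver installed?" >&2
    return 1
  fi
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | summarize_gpus
}

check_gpu || echo "No GPU visible to this instance" >&2
```

On an instance like the one above, this would be expected to print a single line similar to `GPU 1: GRID A100D-40C (40960 MiB)`.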
...