...
Tip |
---|
Every node has a total of 256 virtual cores (128 physical). Every core has an ID, with IDs 0 and 128 being the two hardware threads of the same physical core. See Atos HPCF: System overview for all the details. |
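As a quick sanity check of that numbering, the sibling hardware thread of any virtual core can be computed with plain shell arithmetic (a sketch assuming the 128-physical-core layout described above; the CPU ID chosen is arbitrary):

```shell
#!/bin/bash
# On a node with 128 physical cores and 2 hardware threads per core,
# virtual cores c and (c + 128) mod 256 share the same physical core.
cpu=130                           # example virtual core ID (arbitrary choice)
phys=$(( cpu % 128 ))             # physical core it belongs to
sibling=$(( (cpu + 128) % 256 ))  # the other hardware thread on that core
echo "CPU ${cpu}: physical core ${phys}, sibling CPU ${sibling}"
# prints: CPU 130: physical core 2, sibling CPU 2
```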
For these tests we will use David McKain's version of the Cray xthi code to visualise how process and thread placement takes place.
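If xthi is not at hand, a rough stand-in can be improvised in plain shell by reading the affinity mask from /proc (a sketch for Linux only; `SLURM_PROCID` is set by srun, and the real xthi reports considerably more detail, such as the OpenMP thread and NUMA node):

```shell
#!/bin/bash
# Print this process's host, Slurm rank (if any) and allowed-CPU list,
# roughly mimicking one line of xthi output.
affinity=$(awk '/Cpus_allowed_list/ {print $2}' /proc/self/status)
echo "Host=$(hostname) MPI Rank=${SLURM_PROCID:-0} CPU Affinity=${affinity}"
```

Launched under `srun`, each task prints its own line, which is enough to spot an unexpected binding.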
MPI single threaded execution
...
Code Block |
---|
#!/bin/bash
#SBATCH -q np
#SBATCH -n 128
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
ml xthi
srun -c ${SLURM_CPUS_PER_TASK:-1} xthi |
Code Block |
---|
Host=ac2-2083 MPI Rank=  0 CPU=128 NUMA Node=0 CPU Affinity=  0,128
Host=ac2-2083 MPI Rank=  1 CPU=  1 NUMA Node=0 CPU Affinity=  1,129
Host=ac2-2083 MPI Rank=  2 CPU=130 NUMA Node=0 CPU Affinity=  2,130
Host=ac2-2083 MPI Rank=  3 CPU=  3 NUMA Node=0 CPU Affinity=  3,131
...
Host=ac2-2083 MPI Rank=124 CPU=252 NUMA Node=7 CPU Affinity=124,252
Host=ac2-2083 MPI Rank=125 CPU=125 NUMA Node=7 CPU Affinity=125,253
Host=ac2-2083 MPI Rank=126 CPU=254 NUMA Node=7 CPU Affinity=126,254
Host=ac2-2083 MPI Rank=127 CPU=127 NUMA Node=7 CPU Affinity=127,255 |
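The pattern in that output can be stated compactly: with the default settings, rank r is bound to virtual cores r and r+128, i.e. both hardware threads of physical core r. A small shell sketch printing the expected masks (the rank list is arbitrary):

```shell
#!/bin/bash
# Expected default affinity for selected ranks: both hardware threads
# of physical core r, i.e. CPUs r and r+128.
for r in 0 1 126 127; do
  printf 'Rank %3d: CPU Affinity=%d,%d\n' "${r}" "${r}" $(( r + 128 ))
done
# last line printed: Rank 127: CPU Affinity=127,255
```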
Disabling multithread use
...
Code Block |
---|
#!/bin/bash
#SBATCH -q np
#SBATCH -n 128
#SBATCH --hint=nomultithread
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
ml xthi
srun -c ${SLURM_CPUS_PER_TASK:-1} xthi |
Code Block |
---|
Host=ac1-2083 MPI Rank=  0 CPU=  0 NUMA Node=0 CPU Affinity=  0
Host=ac1-2083 MPI Rank=  1 CPU=  1 NUMA Node=0 CPU Affinity=  1
Host=ac1-2083 MPI Rank=  2 CPU=  2 NUMA Node=0 CPU Affinity=  2
Host=ac1-2083 MPI Rank=  3 CPU=  3 NUMA Node=0 CPU Affinity=  3
...
Host=ac1-2083 MPI Rank=124 CPU=124 NUMA Node=7 CPU Affinity=124
Host=ac1-2083 MPI Rank=125 CPU=125 NUMA Node=7 CPU Affinity=125
Host=ac1-2083 MPI Rank=126 CPU=126 NUMA Node=7 CPU Affinity=126
Host=ac1-2083 MPI Rank=127 CPU=127 NUMA Node=7 CPU Affinity=127 |
Note |
---|
By defining the nomultithread hint, only one hardware thread per physical core is used, so the maximum number of CPUs available on the node is halved from 256 to 128. |
...
Code Block |
---|
#!/bin/bash
#SBATCH -q np
#SBATCH -n 32
#SBATCH -c 4
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
ml xthi
srun -c ${SLURM_CPUS_PER_TASK:-1} xthi |
Code Block |
---|
Host=ac1-1035 MPI Rank= 0 OMP Thread=0 CPU=  0 NUMA Node=0 CPU Affinity=0,1,128,129
Host=ac1-1035 MPI Rank= 0 OMP Thread=1 CPU=128 NUMA Node=0 CPU Affinity=0,1,128,129
Host=ac1-1035 MPI Rank= 0 OMP Thread=2 CPU=  1 NUMA Node=0 CPU Affinity=0,1,128,129
Host=ac1-1035 MPI Rank= 0 OMP Thread=3 CPU=129 NUMA Node=0 CPU Affinity=0,1,128,129
Host=ac1-1035 MPI Rank= 1 OMP Thread=0 CPU=  2 NUMA Node=0 CPU Affinity=2,3,130,131
Host=ac1-1035 MPI Rank= 1 OMP Thread=1 CPU=  3 NUMA Node=0 CPU Affinity=2,3,130,131
Host=ac1-1035 MPI Rank= 1 OMP Thread=2 CPU=130 NUMA Node=0 CPU Affinity=2,3,130,131
Host=ac1-1035 MPI Rank= 1 OMP Thread=3 CPU=131 NUMA Node=0 CPU Affinity=2,3,130,131
...
Host=ac1-1081 MPI Rank=30 OMP Thread=0 CPU=124 NUMA Node=7 CPU Affinity=124,125,252,253
Host=ac1-1081 MPI Rank=30 OMP Thread=1 CPU=253 NUMA Node=7 CPU Affinity=124,125,252,253
Host=ac1-1081 MPI Rank=30 OMP Thread=2 CPU=252 NUMA Node=7 CPU Affinity=124,125,252,253
Host=ac1-1081 MPI Rank=30 OMP Thread=3 CPU=125 NUMA Node=7 CPU Affinity=124,125,252,253
Host=ac1-1081 MPI Rank=31 OMP Thread=0 CPU=255 NUMA Node=7 CPU Affinity=126,127,254,255
Host=ac1-1081 MPI Rank=31 OMP Thread=1 CPU=126 NUMA Node=7 CPU Affinity=126,127,254,255
Host=ac1-1081 MPI Rank=31 OMP Thread=2 CPU=127 NUMA Node=7 CPU Affinity=126,127,254,255
Host=ac1-1081 MPI Rank=31 OMP Thread=3 CPU=254 NUMA Node=7 CPU Affinity=126,127,254,255 |
Warning |
---|
In some cases with a low number of tasks/threads, it may be necessary to force this binding by defining the hint explicitly: |
...
Code Block |
---|
#!/bin/bash
#SBATCH -q np
#SBATCH -n 32
#SBATCH -c 4
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export OMP_PLACES=threads
ml xthi
srun -c ${SLURM_CPUS_PER_TASK:-1} xthi |
Code Block |
---|
Host=ac2-1078 MPI Rank= 0 OMP Thread=0 CPU=  0 NUMA Node=0 CPU Affinity=  0
Host=ac2-1078 MPI Rank= 0 OMP Thread=1 CPU=128 NUMA Node=0 CPU Affinity=128
Host=ac2-1078 MPI Rank= 0 OMP Thread=2 CPU=  1 NUMA Node=0 CPU Affinity=  1
Host=ac2-1078 MPI Rank= 0 OMP Thread=3 CPU=129 NUMA Node=0 CPU Affinity=129
Host=ac2-1078 MPI Rank= 1 OMP Thread=0 CPU=  2 NUMA Node=0 CPU Affinity=  2
Host=ac2-1078 MPI Rank= 1 OMP Thread=1 CPU=130 NUMA Node=0 CPU Affinity=130
Host=ac2-1078 MPI Rank= 1 OMP Thread=2 CPU=  3 NUMA Node=0 CPU Affinity=  3
Host=ac2-1078 MPI Rank= 1 OMP Thread=3 CPU=131 NUMA Node=0 CPU Affinity=131
...
Host=ac2-1078 MPI Rank=30 OMP Thread=0 CPU=116 NUMA Node=7 CPU Affinity=116
Host=ac2-1078 MPI Rank=30 OMP Thread=1 CPU=244 NUMA Node=7 CPU Affinity=244
Host=ac2-1078 MPI Rank=30 OMP Thread=2 CPU=117 NUMA Node=7 CPU Affinity=117
Host=ac2-1078 MPI Rank=30 OMP Thread=3 CPU=245 NUMA Node=7 CPU Affinity=245
Host=ac2-1078 MPI Rank=31 OMP Thread=0 CPU=118 NUMA Node=7 CPU Affinity=118
Host=ac2-1078 MPI Rank=31 OMP Thread=1 CPU=246 NUMA Node=7 CPU Affinity=246
Host=ac2-1078 MPI Rank=31 OMP Thread=2 CPU=119 NUMA Node=7 CPU Affinity=119
Host=ac2-1078 MPI Rank=31 OMP Thread=3 CPU=247 NUMA Node=7 CPU Affinity=247 |
Warning |
---|
As you can see, this is not using all the physical cores in the node: the four threads of each task are packed onto the two hardware threads of just two physical cores, leaving half of the node's physical cores idle. |
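The arithmetic behind that warning, as a quick sketch (the figures follow from the job geometry above):

```shell
#!/bin/bash
# 32 tasks x 4 CPUs = 128 virtual cores, but each physical core hosts
# 2 hardware threads, so only half of the 128 physical cores are occupied.
tasks=32
cpus_per_task=4
hwthreads_per_core=2
used=$(( tasks * cpus_per_task / hwthreads_per_core ))
echo "Physical cores in use: ${used} of 128"
# prints: Physical cores in use: 64 of 128
```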
...
Code Block |
---|
#!/bin/bash
#SBATCH -q np
#SBATCH -n 32
#SBATCH -c 4
#SBATCH --hint=nomultithread
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export OMP_PLACES=threads
ml xthi
srun -c ${SLURM_CPUS_PER_TASK:-1} xthi |
Code Block |
---|
Host=ac2-1078 MPI Rank= 0 OMP Thread=0 CPU=  0 NUMA Node=0 CPU Affinity=  0
Host=ac2-1078 MPI Rank= 0 OMP Thread=1 CPU=  1 NUMA Node=0 CPU Affinity=  1
Host=ac2-1078 MPI Rank= 0 OMP Thread=2 CPU=  2 NUMA Node=0 CPU Affinity=  2
Host=ac2-1078 MPI Rank= 0 OMP Thread=3 CPU=  3 NUMA Node=0 CPU Affinity=  3
Host=ac2-1078 MPI Rank= 1 OMP Thread=0 CPU=  4 NUMA Node=0 CPU Affinity=  4
Host=ac2-1078 MPI Rank= 1 OMP Thread=1 CPU=  5 NUMA Node=0 CPU Affinity=  5
Host=ac2-1078 MPI Rank= 1 OMP Thread=2 CPU=  6 NUMA Node=0 CPU Affinity=  6
Host=ac2-1078 MPI Rank= 1 OMP Thread=3 CPU=  7 NUMA Node=0 CPU Affinity=  7
...
Host=ac2-1078 MPI Rank=30 OMP Thread=0 CPU=120 NUMA Node=7 CPU Affinity=120
Host=ac2-1078 MPI Rank=30 OMP Thread=1 CPU=121 NUMA Node=7 CPU Affinity=121
Host=ac2-1078 MPI Rank=30 OMP Thread=2 CPU=122 NUMA Node=7 CPU Affinity=122
Host=ac2-1078 MPI Rank=30 OMP Thread=3 CPU=123 NUMA Node=7 CPU Affinity=123
Host=ac2-1078 MPI Rank=31 OMP Thread=0 CPU=124 NUMA Node=7 CPU Affinity=124
Host=ac2-1078 MPI Rank=31 OMP Thread=1 CPU=125 NUMA Node=7 CPU Affinity=125
Host=ac2-1078 MPI Rank=31 OMP Thread=2 CPU=126 NUMA Node=7 CPU Affinity=126
Host=ac2-1078 MPI Rank=31 OMP Thread=3 CPU=127 NUMA Node=7 CPU Affinity=127 |
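With nomultithread the mapping becomes regular again: rank r is bound to the four consecutive physical cores 4r to 4r+3. A quick shell check of that formula (the rank chosen is arbitrary):

```shell
#!/bin/bash
# With -n 32 -c 4 --hint=nomultithread, rank r gets
# physical cores 4r .. 4r+3.
r=31
echo "Rank ${r}: CPU Affinity=$(( 4 * r ))-$(( 4 * r + 3 ))"
# prints: Rank 31: CPU Affinity=124-127
```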
Further customisation
Note |
---|
Only recommended for expert users who want to have full control of how the binding and distribution are done. |
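As a hedged illustration of the kind of manual control available (the flag values below are examples, not recommendations, and the output of xthi should always be checked after changing them), Slurm's `--cpu-bind` and `--distribution` options on srun override the defaults shown in the earlier examples:

```shell
#!/bin/bash
#SBATCH -q np
#SBATCH -n 32
#SBATCH -c 4
ml xthi
# --cpu-bind=verbose,cores: bind each task to cores and print the
#   effective binding mask for every task in the job output.
# --distribution=block:block: lay ranks out in blocks across nodes,
#   and threads in blocks within each task's allocation.
srun --cpu-bind=verbose,cores --distribution=block:block -c ${SLURM_CPUS_PER_TASK:-1} xthi
```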
...