Parallel jobs run on the compute partition and use the np QoS for regular users.
This queue is not the default, so make sure you explicitly define it in your job directives before submission.
Parallel jobs are allocated exclusive nodes, so they will not share resources with other jobs.
Make sure the job is configured to fully utilise all the computing resources. For small parallel executions, consider using fractional jobs instead.
See HPC2020: Affinity for more information on how to set up CPU binding properly for your parallel runs.
To spawn an MPI application, you must use srun:
```shell
#!/bin/bash
#SBATCH --job-name=test-mpi
#SBATCH --qos=np
#SBATCH --ntasks=512
#SBATCH --time=10:00
#SBATCH --output=test-mpi.%j.out
#SBATCH --error=test-mpi.%j.out

srun my_mpi_app
```
The example above would run a 512-task MPI application.
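Since parallel jobs get exclusive nodes, it can be useful to estimate how many nodes a given task count will occupy. The sketch below assumes 128 cores per node, which is an assumption to verify against your cluster's actual node size:

```shell
# Sketch only: estimate nodes needed for the 512-task job above,
# assuming 128 cores per node (check your cluster's real node size)
ntasks=512
cores_per_node=128

# Ceiling division gives the number of exclusive nodes occupied
nodes=$(( (ntasks + cores_per_node - 1) / cores_per_node ))
echo "nodes needed: $nodes"   # 4 nodes at 128 cores/node
```

If the task count does not divide evenly into the node size, the last node will be partially idle, which is worth avoiding on an exclusive-node partition.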
To spawn a hybrid MPI + OpenMP application, you must also use srun. This example runs a hybrid application spawning 128 MPI tasks, with each of them opening 4 OpenMP threads.
```shell
#!/bin/bash
#SBATCH --job-name=test-hybrid
#SBATCH --qos=np
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=4
#SBATCH --time=10:00
#SBATCH --output=test-hybrid.%j.out
#SBATCH --error=test-hybrid.%j.out

# Ensure correct OpenMP pinning
export OMP_PLACES=threads

srun -c $SLURM_CPUS_PER_TASK my_mpi_openmp_app
```
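A quick way to sanity-check a hybrid geometry is to multiply tasks by threads per task; the product is the total number of cores the job will consume. This is only a sketch using the numbers from the example above:

```shell
# Sketch: total cores requested by the hybrid example above
ntasks=128
cpus_per_task=4

total_cpus=$(( ntasks * cpus_per_task ))
echo "total cores: $total_cpus"   # 128 tasks x 4 threads = 512 cores
```

Keeping this product aligned with whole nodes avoids wasting cores on an exclusive-node partition.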
To spawn a heterogeneous MPMD application, you must again use srun, passing one set of options per program:
This example runs a hybrid Multiple Program Multiple Data (MPMD) application, requiring different geometries for different parts of the MPI execution. The job allocates 3 nodes, then uses the first node to run executable1 with 64 tasks and 2 threads per rank, while the remaining two nodes run executable2 with 64 tasks and 4 threads per rank.
```shell
#!/bin/bash
#SBATCH --job-name=test-het
#SBATCH --qos=np
#SBATCH --nodes=3
#SBATCH --hint=nomultithread
#SBATCH --time=10:00
#SBATCH --output=test-het.%j.out
#SBATCH --error=test-het.%j.out

# Needed to avoid occasional job hang at exit
export SLURM_MPI_TYPE=none

# Ensure correct OpenMP pinning
export OMP_PLACES=threads

srun -N1 -n 64 -c 2 executable1 : -N2 -n 64 -c 4 executable2
```
The minimum allocation for each part of the heterogeneous execution is one node.
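To see how the example's geometry maps onto its three nodes, it helps to tally the CPUs each component of the MPMD srun line consumes. The numbers come from the example above; the 128 cores-per-node figure is an assumption to check against your cluster:

```shell
# Sketch: CPUs consumed by each component of the MPMD srun above
comp1=$(( 64 * 2 ))   # executable1: 64 tasks x 2 threads per rank
comp2=$(( 64 * 4 ))   # executable2: 64 tasks x 4 threads per rank
echo "component 1: $comp1 CPUs on 1 node"
echo "component 2: $comp2 CPUs across 2 nodes"
# With an assumed 128 cores per node, component 1 exactly fills its
# single node and component 2 exactly fills its two nodes.
```

Tallying like this before submission makes it easy to spot a component that silently overflows or underfills its node allocation.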