Reference Documentation
Let's explore how to use the Slurm batch system on the ATOS HPCF or ECS.
Basic job submission
- Access the default login node of the ATOS HPCF or ECS.
- Create a directory for this tutorial so all the exercises and outputs are contained inside:

    mkdir ~/batch_tutorial
    cd ~/batch_tutorial
- Create and submit a job called simplest.sh with just the default settings that runs the command hostname. Can you find the output and inspect it? Where did your job run?
- Configure your simplest.sh job to direct the output to simplest-<jobid>.out and the error to simplest-<jobid>.err, both in the same directory, and set the job name to just "simplest". Note that you will need to use a special placeholder for the -<jobid> part. A sketch of a possible solution is shown after this list.
- From a terminal session outside the ATOS HPCF or ECS, such as your VDI or your own computer, submit the simplest.sh job remotely. What hostname should you use?
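If you get stuck on the second exercise, here is a minimal sketch of what simplest.sh could look like once the output, error and job name are configured; the %j placeholder, which Slurm expands to the job ID, is standard sbatch syntax, but the exact directives are for you to work out:

    simplest.sh:
    #!/bin/bash
    #SBATCH --job-name=simplest        # job name shown by the batch system
    #SBATCH --output=simplest-%j.out   # %j expands to the job ID
    #SBATCH --error=simplest-%j.err    # send standard error to its own file
    hostname

Submit it from the tutorial directory with sbatch simplest.sh; sbatch prints the job ID, which tells you which output files to look for.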
Basic job management
- Create a new job script sleepy.sh with the contents below:

    sleepy.sh:
    #!/bin/bash
    sleep 120
- Submit sleepy.sh to the batch system and check its status. Once it is running, cancel it and inspect the output.
- Can you get information about the jobs you have run so far today, including those that have finished already?
- Can you get information about all the jobs run today by you that were cancelled?
- The default information shown on screen when querying past jobs is limited. Can you extract the submit, start and end times of your cancelled jobs today? What about their output and error paths? Hint: the corresponding man page describes all the options. Some useful commands are sketched after this list.
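If you want to check your approach, the standard Slurm commands below are one way to go about it; the job ID is a placeholder, and the particular sacct fields chosen here are just an example (man sacct lists all of them):

    sbatch sleepy.sh                  # submit the job
    squeue -u $USER                   # status of your pending and running jobs
    scancel <jobid>                   # cancel the job, using the ID reported by sbatch
    sacct                             # accounting information for your jobs since midnight today
    sacct -S 00:00 --state=CANCELLED \
          --format=JobID,JobName,State,Submit,Start,End   # today's cancelled jobs, selected fields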
Common pitfalls
Reference Documentation
We will now attempt to troubleshoot some common issues with job scripts.
- Create a new job script broken1.sh with the contents below and try to submit the job. What happened? Can you fix the job and keep trying until it runs successfully?
- Create a new job script broken2.sh with the contents below and try to submit the job. What happened? Can you fix the job and keep trying until it runs successfully?
- Create a new job script broken3.sh with the contents below and try to submit the job. What happened? Can you fix the job and keep trying until it runs successfully?
- Create a new job script broken4.sh with the contents below and try to submit the job. You should not see the message in the output. What happened? Can you fix the job and keep trying until it runs successfully?
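Whatever the specific pitfall, the same generic inspection commands tend to help when a job misbehaves; these are standard Slurm tools and the job ID is a placeholder:

    squeue -u $USER                      # is the job pending, and for what reason?
    scontrol show job <jobid>            # full details of a job still in the system, including the StdOut/StdErr paths
    sacct -j <jobid> --format=JobID,JobName,State,ExitCode   # state and exit code once it has finished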
Understanding your limits
Although most limits are described in HPC2020: Batch system, you can also check them (or reach them) for yourself in the system.
- Create a new job script naughty.sh with the following contents:

    naughty.sh:
    #!/bin/bash
    #SBATCH --mem=100
    #SBATCH --output=naughty.out
    MEM=300
    perl -e "\$a='A'x($MEM*1024*1024/2);sleep 60"
- Submit naughty.sh to the batch system and check its status. What happened to the job?
- Edit naughty.sh to comment out the memory request, and then play with the MEM value:

    naughty.sh:
    #!/bin/bash
    #SBATCH --output=naughty.out
    ##SBATCH --mem=100
    MEM=300
    perl -e "\$a='A'x($MEM*1024*1024/2);sleep 60"
- How high can you go with the default memory limit on the default QoS before the system kills the job?
- How could you have checked this beforehand, instead of taking the trial-and-error approach?
- Can you check, without trial and error this time, what maximum wall clock time, maximum number of CPUs and maximum memory you can request from Slurm for each QoS?
- How many jobs could you potentially have running concurrently? How many jobs could you have in the system (pending or running) before a further submission fails? One way of querying these limits is sketched after this list.
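If you are unsure where to look, the QoS limits live in the Slurm accounting database, so the standard sacctmgr command can display them; the selection of fields below is just one reasonable choice:

    # MaxWall     : maximum wall clock time per job
    # MaxTRES     : per-job limits on trackable resources (e.g. cpu, mem)
    # MaxTRESPU   : per-user limits across all of a user's running jobs
    # MaxJobsPU   : maximum number of running jobs per user
    # MaxSubmitPU : maximum number of jobs in the system (pending or running) per user
    sacctmgr show qos format=Name%10,MaxWall,MaxTRES%30,MaxTRESPU%30,MaxJobsPU,MaxSubmitPU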
Running small parallel jobs - fractional
Reference Documentation
So far we have only run serial jobs. You may also want to run small parallel jobs, using multiple processes, multiple threads, or both; MPI and OpenMP programs are typical examples. We call this kind of small parallel job "fractional", because it runs on a fraction of a node, sharing the node with other users.
If you have followed the tutorial so far, you will have realised that ECS users may run very small parallel jobs on the default ef QoS, whereas HPCF users may run slightly bigger jobs (up to half a GPIL node) on the default nf QoS.
For these tests we will use David McKain's version of the Cray xthi code to visualise how process and thread placement takes place.
- Load the xthi module with:

    module load xthi

- Run the program interactively to familiarise yourself with the output:

    $ xthi
    Host=ac6-200  MPI Rank=0  CPU=128  NUMA Node=0  CPU Affinity=0,128
As you can see, only one process and one thread are run, and they may run on either of the two virtual cores assigned to the session (which correspond to the same physical CPU). If you try to run with 4 OpenMP threads, you will see that they effectively fight each other for those same two cores, impacting the performance of your application but not that of anyone else on the login node:
    $ OMP_NUM_THREADS=4 xthi
    Host=ac6-200  MPI Rank=0  OMP Thread=0  CPU=128  NUMA Node=0  CPU Affinity=0,128
    Host=ac6-200  MPI Rank=0  OMP Thread=1  CPU=  0  NUMA Node=0  CPU Affinity=0,128
    Host=ac6-200  MPI Rank=0  OMP Thread=2  CPU=128  NUMA Node=0  CPU Affinity=0,128
    Host=ac6-200  MPI Rank=0  OMP Thread=3  CPU=  0  NUMA Node=0  CPU Affinity=0,128
- Create a new job script fractional.sh to run xthi with 2 MPI tasks and 2 OpenMP threads, submit it, and check the output to ensure the right number of tasks and threads were spawned. A job template to start with is sketched after this list.
- Can you ensure that each of the OpenMP threads runs on a single physical core, without exploiting hyperthreading, for optimal performance?
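In case you need a starting point, here is a minimal sketch of what such a fractional job could look like, assuming the xthi module shown earlier; the geometry directives are the part to experiment with:

    fractional.sh:
    #!/bin/bash
    #SBATCH --job-name=fractional
    #SBATCH --output=fractional-%j.out
    #SBATCH --ntasks=2                 # number of MPI tasks
    #SBATCH --cpus-per-task=2          # number of OpenMP threads per task
    module load xthi
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    srun -c $SLURM_CPUS_PER_TASK xthi

For the last question, Slurm's --hint=nomultithread (or an explicit --threads-per-core=1) is one standard way of restricting each thread to its own physical core; check how the reported CPU affinity changes when you add it.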
Running parallel jobs - HPCF only
Reference Documentation
For bigger parallel executions, you will need to use the HPCF's parallel QoS, np, which gives access to the biggest partition of nodes in every complex.
When running in such a configuration, your job will get exclusive use of the nodes where it runs, so external interference is minimised. It is therefore important that the allocated resources are used efficiently.
Here is a very simplified diagram of the Atos HPCF node that you should keep in mind when deciding your job geometries:
- If not already on the HPCF, open a session on hpc-login.
- Create a new job script parallel.sh to run xthi with 32 MPI tasks and 4 OpenMP threads, leaving hyperthreading enabled. Submit it and check the output to ensure the right number of tasks and threads were spawned. Take note of which CPUs are used and how many SBUs you used. A job template to start with is sketched after this list.
- Modify the parallel.sh job geometry (number of tasks, threads and use of hyperthreading) so that you fully utilise all the physical cores, and only those, i.e. 0-127.
- Modify the parallel.sh job geometry so it still runs on the np QoS, but only with 2 tasks and 2 threads. Check the SBU cost. Since the execution is 32 times smaller, did it cost 32 times less than the previous one? Why?
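As before, here is a minimal sketch of a possible starting point for the parallel job, assuming the np QoS described above; the geometry directives are the knobs to adjust in the exercises:

    parallel.sh:
    #!/bin/bash
    #SBATCH --job-name=parallel
    #SBATCH --output=parallel-%j.out
    #SBATCH --qos=np                   # parallel QoS: exclusive use of the allocated nodes
    #SBATCH --ntasks=32                # number of MPI tasks
    #SBATCH --cpus-per-task=4          # number of OpenMP threads per task
    module load xthi
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    srun -c $SLURM_CPUS_PER_TASK xthi

When comparing the SBU cost of the small 2x2 run, keep in mind that np jobs hold their nodes exclusively, so the charge typically reflects the whole nodes allocated for the elapsed time rather than just the tasks you run on them.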