...
Create a directory for this tutorial so all the exercises and outputs are contained inside:
No Format mkdir ~/batch_tutorial cd ~/batch_tutorial
Create and submit a job called
simplest.sh
with just default settings that runs the command hostname
. Can you find the output and inspect it? Where did your job run? Expand title Solution Using your favourite editor, create a file called
simplest.sh
with the following content: Code Block language bash title simplest.sh #!/bin/bash hostname
You can submit it with sbatch:
No Format sbatch simplest.sh
The job should be run shortly. When finished, a new file called
slurm-<jobid>.out
should appear in the same directory. You can check the output with: No Format $ cat $(ls -1 slurm-*.out | tail -n1) ab6-202.bullx [ECMWF-INFO -ecepilog] ---------------------------------------------------------------------------------------------------- [ECMWF-INFO -ecepilog] This is the ECMWF job Epilogue [ECMWF-INFO -ecepilog] +++ Please report issues using the Support portal +++ [ECMWF-INFO -ecepilog] +++ https://support.ecmwf.int +++ [ECMWF-INFO -ecepilog] ---------------------------------------------------------------------------------------------------- [ECMWF-INFO -ecepilog] Run at 2023-10-25T11:31:53 on ecs [ECMWF-INFO -ecepilog] JobName : simplest.sh [ECMWF-INFO -ecepilog] JobID : 64273363 [ECMWF-INFO -ecepilog] Submit : 2023-10-25T11:31:36 [ECMWF-INFO -ecepilog] Start : 2023-10-25T11:31:51 [ECMWF-INFO -ecepilog] End : 2023-10-25T11:31:53 [ECMWF-INFO -ecepilog] QueuedTime : 15.0 [ECMWF-INFO -ecepilog] ElapsedRaw : 2 [ECMWF-INFO -ecepilog] ExitCode : 0:0 [ECMWF-INFO -ecepilog] DerivedExitCode : 0:0 [ECMWF-INFO -ecepilog] State : COMPLETED [ECMWF-INFO -ecepilog] Account : myaccount [ECMWF-INFO -ecepilog] QOS : ef [ECMWF-INFO -ecepilog] User : user [ECMWF-INFO -ecepilog] StdOut : /etc/ecmwf/nfs/dh1_home_a/user/slurm-64273363.out [ECMWF-INFO -ecepilog] StdErr : /etc/ecmwf/nfs/dh1_home_a/user/slurm-64273363.out [ECMWF-INFO -ecepilog] NNodes : 1 [ECMWF-INFO -ecepilog] NCPUS : 2 [ECMWF-INFO -ecepilog] SBU : 0.011 [ECMWF-INFO -ecepilog] ----------------------------------------------------------------------------------------------------
You can then see that the script has run on a different node than the one you are on.
If you repeat the operation, you may get your job to run on a different node every time, depending on which one happens to be free at the time.
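If you want to compare the nodes across several runs without opening every output file, you can submit simplest.sh a couple more times and then query Slurm's accounting database. A minimal sketch (the job name defaults to the script name; the columns may look slightly different on your system):
No Format sacct -X --name=simplest.sh --format=JobID,JobName,NodeList,State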
Configure your
simplest.sh
job to direct the output to simplest-<jobid>.out
, the error to simplest-<jobid>.err
both in the same directory, and the job name to just "simplest". Note that you will need to use a special placeholder for the -<jobid>
. Expand title Solution Using your favourite editor, open the
simplest.sh
job script and add the relevant #SBATCH directives: Code Block language bash title simplest.sh #!/bin/bash #SBATCH --job-name=simplest #SBATCH --output=simplest-%j.out #SBATCH --error=simplest-%j.err hostname
You can submit it again with:
No Format sbatch simplest.sh
After a few moments, you should see the new files appear in your directory (the job id will be different from the one shown here):
No Format $ ls simplest-*.* simplest-64274497.err simplest-64274497.out
You can check that the job name was also changed in the end-of-job report:
No Format $ grep -i jobname $(ls -1 simplest-*.err | tail -n1) [ECMWF-INFO -ecepilog] JobName : simplest
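While a job is still pending or running, you can also check its name and state with squeue rather than waiting for the end-of-job report; a quick sketch:
No Format squeue -u $USER --name=simplest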
From a terminal session outside the Atos HPCF or ECS (e.g. your VDI or your own computer), submit the
simplest.sh
job remotely. What hostname should you use? Expand title Solution You must use hpc-batch for remote submissions to the HPCF, or ecs-batch for remote submissions to ECS:
No Format ssh hpc-batch "cd ~/batch_tutorial; sbatch simplest.sh"
No Format ssh ecs-batch "cd ~/batch_tutorial; sbatch simplest.sh"
Note the change of directory, so that the job script is found and both the working directory of the job and its outputs end up in the right place.
An alternative way of doing this without changing directory would be to tell sbatch to do it for you:
No Format ssh hpc-batch sbatch -D ~/batch_tutorial ~/batch_tutorial/simplest.sh
or for ECS:
No Format ssh ecs-batch sbatch -D ~/batch_tutorial ~/batch_tutorial/simplest.sh
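If you plan to script around remote submissions, it may help to get just the job id back. A sketch using the --parsable option of sbatch, which prints only the job id:
No Format ssh hpc-batch sbatch --parsable -D ~/batch_tutorial ~/batch_tutorial/simplest.sh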
...
Create a new job script
broken1.sh
with the contents below and try to submit the job. What happened? Can you fix the job and keep trying until it runs successfully? Code Block language bash title broken1.sh collapse true #SBATCH --job-name = broken 1 #SBATCH --output = broken1-%J.out #SBATCH --error = broken1-%J.out #SBATCH --qos = express #SBATCH --time = 00:05:00 echo "I was broken!"
Expand title Solution The job above has the following problems:
- There is no shebang at the beginning of the script.
- There should be no spaces around the "=" sign in the #SBATCH directives.
- The job name should not contain spaces.
- QoS "express" does not exist
Here is an amended version:
Code Block language bash title broken1_fixed.sh #!/bin/bash #SBATCH --job-name=broken1 #SBATCH --output=broken1-%J.out #SBATCH --error=broken1-%J.out #SBATCH --time=00:05:00 echo "I was broken!"
Note that the QoS line was removed, but you may also use the following if running on ECS:
No Format #SBATCH --qos=ef
or alternatively, if on the Atos HPCF:
No Format #SBATCH --qos=nf
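If you are unsure what QoS names exist on the system, you can list them from the Slurm accounting configuration. A sketch; note that not every QoS listed is necessarily available to your account:
No Format sacctmgr show qos format=Name%20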
Check that the job actually ran and generated the expected output:
No Format $ grep -v ECMWF-INFO $(ls -1 broken1-*.out | tail -n1) I was broken!
Create a new job script
broken2.sh
with the contents below and try to submit the job. What happened? Can you fix the job and keep trying until it runs successfully? Code Block language bash title broken2.sh collapse true #!/bin/bash #SBATCH --job-name=broken2 #SBATCH --output=broken2-%J.out #SBATCH --error=broken2-%J.out #SBATCH --qos=ns #SBATCH --time=10-00 echo "I was broken!"
Expand title Solution The job above has the following problems:
- QoS "ns" does not exist. Either remove to use the default or use the corresponding QoS on ECS (ef) or HPCF (nf)
- The time requested is 10 days, which is longer than the maximum allowed. it was probably meant to be 10 minutes
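For reference, here is a sketch of some of the time formats sbatch accepts, which shows why 10-00 and 10:00 mean very different things:
No Format
#SBATCH --time=10          # 10 minutes
#SBATCH --time=10:00       # 10 minutes (minutes:seconds)
#SBATCH --time=00:10:00    # 10 minutes (hours:minutes:seconds)
#SBATCH --time=10-00       # 10 days (days-hours)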
Here is an amended version:
Code Block language bash title broken2_fixed.sh #!/bin/bash #SBATCH --job-name=broken2 #SBATCH --output=broken2-%J.out #SBATCH --error=broken2-%J.out #SBATCH --time=10:00 echo "I was broken!"
Again, note that the QoS line was removed, but you may also use the following if running on ECS:
No Format #SBATCH --qos=ef
or alternatively, if on the Atos HPCF:
No Format #SBATCH --qos=nf
Check that the job actually ran and generated the expected output:
No Format $ grep -v ECMWF-INFO $(ls -1 broken2-*.out | tail -n1) I was broken!
Create a new job script
broken3.sh
with the contents below and try to submit the job. What happened? Can you fix the job and keep trying until it runs successfully? Code Block language bash title broken3.sh collapse true #!/bin/bash #SBATCH --job-name=broken3 #SBATCH --chdir=$SCRATCH #SBATCH --output=broken3output/broken3-%J.out #SBATCH --error=broken3output/broken3-%J.out echo "I was broken!"
Expand title Solution The job above has the following problems:
- Variables are not expanded in job directives. You must specify your paths explicitly.
- The directory where the output and error files will go must exist beforehand. Otherwise the job will fail, and you will not get any hint as to what may have happened to the job. The only clue would come from checking sacct:
No Format $ sacct -X --name=broken3 JobID JobName QOS State ExitCode Elapsed NNodes NodeList ------------ ---------------- --------- ---------- -------- ---------- -------- -------------------- 64281800 broken3 ef FAILED 0:53 00:00:02 1 ad6-201
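You can also ask sacct for a few extra fields on that specific job to confirm when and where it failed (a sketch; replace the job id with your own):
No Format sacct -j 64281800 --format=JobID,JobName,State,ExitCode,Start,End,NodeList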
You will need to create the output directory with:
No Format mkdir -p $SCRATCH/broken3output/
Here is an amended version of the job:
Code Block language bash title broken3.sh #!/bin/bash #SBATCH --job-name=broken3 #SBATCH --chdir=/scratch/<your_user_id> #SBATCH --output=broken3output/broken3-%J.out #SBATCH --error=broken3output/broken3-%J.out echo "I was broken!"
Check that the job actually ran and generated the expected output:
No Format $ grep -v ECMWF-INFO $(ls -1 $SCRATCH/broken3output/broken3-*.out | tail -n1) I was broken!
You may clean up the output directory with
No Format rm -rf $SCRATCH/broken3output
...
- If not already on HPCF, open a session on
hpc-login
. Create a new job script
parallel.sh
to runxthi
with 32 MPI tasks and 4 OpenMP threads, leaving hyperthreading enabled. Submit it and check the output to ensure the right number of tasks and threads were spawned. Take note of which cpus are used, and how many SBUs you used. Here is a job template to start with:
Code Block language bash title parallel.sh collapse true #!/bin/bash #SBATCH --output=parallel-%j.out #SBATCH --qos=np # Add here the missing SBATCH directives for the relevant resources export OMP_PLACES=threads srun -c $SLURM_CPUS_PER_TASK ./xthi
Expand title Solution Using your favourite editor, create a file called
parallel.sh
with the following content: Code Block language bash title parallel.sh #!/bin/bash #SBATCH --output=parallel-%j.out #SBATCH --qos=np # Add here the missing SBATCH directives for the relevant resources #SBATCH --ntasks=32 #SBATCH --cpus-per-task=4 export OMP_PLACES=threads srun -c $SLURM_CPUS_PER_TASK ./xthi
You need to request 32 tasks, and 4 cpus per task in the job. Then we will use srun to spawn our parallel run, which should inherit the job geometry requested, except the
cpus-per-task
, which must be explicitly passed to srun. You can submit it with sbatch:
No Format sbatch parallel.sh
The job should be run shortly. When finished, a new file called
parallel-<jobid>.out
should appear in the same directory. You can check the relevant output with: No Format grep -v ECMWF-INFO $(ls -1 parallel-*.out | tail -n1)
You should see an output similar to:
No Format Host=ac2-4046 MPI Rank= 0 OMP Thread=0 CPU= 0 NUMA Node=0 CPU Affinity= 0 Host=ac2-4046 MPI Rank= 0 OMP Thread=1 CPU=128 NUMA Node=0 CPU Affinity=128 Host=ac2-4046 MPI Rank= 0 OMP Thread=2 CPU= 1 NUMA Node=0 CPU Affinity= 1 Host=ac2-4046 MPI Rank= 0 OMP Thread=3 CPU=129 NUMA Node=0 CPU Affinity=129 Host=ac2-4046 MPI Rank= 1 OMP Thread=0 CPU= 2 NUMA Node=0 CPU Affinity= 2 Host=ac2-4046 MPI Rank= 1 OMP Thread=1 CPU=130 NUMA Node=0 CPU Affinity=130 Host=ac2-4046 MPI Rank= 1 OMP Thread=2 CPU= 3 NUMA Node=0 CPU Affinity= 3 Host=ac2-4046 MPI Rank= 1 OMP Thread=3 CPU=131 NUMA Node=0 CPU Affinity=131 ... Host=ac2-4046 MPI Rank=30 OMP Thread=0 CPU=116 NUMA Node=7 CPU Affinity=116 Host=ac2-4046 MPI Rank=30 OMP Thread=1 CPU=244 NUMA Node=7 CPU Affinity=244 Host=ac2-4046 MPI Rank=30 OMP Thread=2 CPU=117 NUMA Node=7 CPU Affinity=117 Host=ac2-4046 MPI Rank=30 OMP Thread=3 CPU=245 NUMA Node=7 CPU Affinity=245 Host=ac2-4046 MPI Rank=31 OMP Thread=0 CPU=118 NUMA Node=7 CPU Affinity=118 Host=ac2-4046 MPI Rank=31 OMP Thread=1 CPU=246 NUMA Node=7 CPU Affinity=246 Host=ac2-4046 MPI Rank=31 OMP Thread=2 CPU=119 NUMA Node=7 CPU Affinity=119 Host=ac2-4046 MPI Rank=31 OMP Thread=3 CPU=247 NUMA Node=7 CPU Affinity=247
Note the following facts:
- Both the main cores (0-127) and the hyperthreads (128-255) were used.
- You get consecutive threads on the same physical CPU (0 with 128, 1 with 129...).
- There are physical cpus entirely unused, since their cpu number does not show in the output.
In terms of SBUs, this job cost:
No Format $ grep SBU $(ls -1 parallel-*.out | tail -n1) [ECMWF-INFO -ecepilog] SBU : 26.689051
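As a side note, some OpenMP applications expect OMP_NUM_THREADS to be set explicitly. A common pattern, sketched below, is to derive it from the Slurm geometry just before the srun line so it always matches the cpus requested per task (the solution above happened to work without it):
No Format export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}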
Modify the
parallel.sh
job geometry (number of tasks, threads and threadsuse of hyperthreading) so that you fully utilise all the physical cores of the node but none of the hyperthreads, and only those, i.e. 0-127.Expand title Solution Without using hyperthreading, an Atos HPCF node has 128 phyisical cores available. Any combination of tasks and threads that adds up to that figure will fill the node. Examples include 32 tasks x 4 threads, 64 tasks x 2 threads or 128 single-threaded tasks. For this example, we picked the first one:
Using your favourite editor, modify
parallel.sh
so it has the following content: Code Block language bash title parallel.sh #!/bin/bash #SBATCH --output=parallel-%j.out #SBATCH --qos=np # Add here the missing SBATCH directives for the relevant resources #SBATCH --ntasks=32 #SBATCH --cpus-per-task=4 #SBATCH --hint=nomultithread export OMP_PLACES=threads srun -c $SLURM_CPUS_PER_TASK ./xthi
You need to request 32 tasks and 4 cpus per task, and disable the use of hyperthreads with the nomultithread hint. Then we use srun to spawn our parallel run, which should inherit the job geometry requested, except the
cpus-per-task
, which must be explicitly passed to srun. You can submit it with sbatch: No Format sbatch parallel.sh
The job should be run shortly. When finished, a new file called
parallel-<jobid>.out
should appear in the same directory. You can check the relevant output with: No Format grep -v ECMWF-INFO $(ls -1 parallel-*.out | tail -n1)
You should see an output similar to:
No Format Host=ac3-2015 MPI Rank= 0 OMP Thread=0 CPU= 0 NUMA Node=0 CPU Affinity= 0 Host=ac3-2015 MPI Rank= 0 OMP Thread=1 CPU= 1 NUMA Node=0 CPU Affinity= 1 Host=ac3-2015 MPI Rank= 0 OMP Thread=2 CPU= 2 NUMA Node=0 CPU Affinity= 2 Host=ac3-2015 MPI Rank= 0 OMP Thread=3 CPU= 3 NUMA Node=0 CPU Affinity= 3 Host=ac3-2015 MPI Rank= 1 OMP Thread=0 CPU= 4 NUMA Node=0 CPU Affinity= 4 Host=ac3-2015 MPI Rank= 1 OMP Thread=1 CPU= 5 NUMA Node=0 CPU Affinity= 5 Host=ac3-2015 MPI Rank= 1 OMP Thread=2 CPU= 6 NUMA Node=0 CPU Affinity= 6 Host=ac3-2015 MPI Rank= 1 OMP Thread=3 CPU= 7 NUMA Node=0 CPU Affinity= 7 ... Host=ac3-2015 MPI Rank=30 OMP Thread=0 CPU=120 NUMA Node=7 CPU Affinity=120 Host=ac3-2015 MPI Rank=30 OMP Thread=1 CPU=121 NUMA Node=7 CPU Affinity=121 Host=ac3-2015 MPI Rank=30 OMP Thread=2 CPU=122 NUMA Node=7 CPU Affinity=122 Host=ac3-2015 MPI Rank=30 OMP Thread=3 CPU=123 NUMA Node=7 CPU Affinity=123 Host=ac3-2015 MPI Rank=31 OMP Thread=0 CPU=124 NUMA Node=7 CPU Affinity=124 Host=ac3-2015 MPI Rank=31 OMP Thread=1 CPU=125 NUMA Node=7 CPU Affinity=125 Host=ac3-2015 MPI Rank=31 OMP Thread=2 CPU=126 NUMA Node=7 CPU Affinity=126 Host=ac3-2015 MPI Rank=31 OMP Thread=3 CPU=127 NUMA Node=7 CPU Affinity=127
Note the following facts:
- Only the main cores (0-127) were used.
- Each thread gets one and only one cpu pinned to it.
- All the physical cores are in use.
In terms of SBUs, this job cost:
No Format $ grep SBU $(ls -1 parallel-*.out | tail -n1) [ECMWF-INFO -ecepilog] SBU : 5.379
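For completeness, here is a sketch of the 128 single-threaded tasks variant mentioned above, which fills the same 128 physical cores; the file name parallel128.sh is just a suggestion:
Code Block language bash title parallel128.sh
#!/bin/bash
#SBATCH --output=parallel-%j.out
#SBATCH --qos=np
# 128 single-threaded tasks, one per physical core
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread

export OMP_PLACES=threads
srun -c $SLURM_CPUS_PER_TASK ./xthi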
Modify the
parallel.sh
job geometry so it still runs on the np qos, but only with 2 tasks and 2 threads. Check the SBU cost. Since the execution is 32 times smaller, did it cost 32 times less than the previous one? Why? Expand title Solution Let's use the following job:
Code Block language bash title parallel.sh #!/bin/bash #SBATCH --output=parallel-%j.out #SBATCH --qos=np # Add here the missing SBATCH directives for the relevant resources #SBATCH --ntasks=2 #SBATCH --cpus-per-task=2 #SBATCH --hint=nomultithread export OMP_PLACES=threads srun -c $SLURM_CPUS_PER_TASK ./xthi
You can submit it with sbatch:
No Format sbatch parallel.sh
The job should be run shortly. When finished, a new file called
parallel-<jobid>.out
should appear in the same directory. You can check the relevant output with:No Format grep -v ECMWF-INFO $(ls -1 parallel-*.out | tail -n1)
You should see an output similar to:
No Format Host=ac2-3073 MPI Rank= 0 OMP Thread=0 CPU= 0 NUMA Node=0 CPU Affinity= 0 Host=ac2-3073 MPI Rank= 0 OMP Thread=1 CPU= 1 NUMA Node=0 CPU Affinity= 1 Host=ac2-3073 MPI Rank= 1 OMP Thread=0 CPU=16 NUMA Node=1 CPU Affinity=16 Host=ac2-3073 MPI Rank= 1 OMP Thread=1 CPU=17 NUMA Node=1 CPU Affinity=17
Note the following facts:
- Only the physical cores were used, since hyperthreading was disabled with the nomultithread hint.
- Each thread gets one and only one cpu pinned to it.
- Most of the cpus in the node are left entirely unused, even though the whole node is allocated to the job.
In terms of SBUs, this job cost:
No Format $ grep SBU $(ls -1 parallel-*.out | tail -n1) [ECMWF-INFO -ecepilog] SBU : 4.034
This is on a similar scale to the previous one, which was 32 times bigger. The reason is that on the np QoS the allocation is done in full nodes. The SBU cost takes into account the nodes allocated for a given period of time, no matter how they are used.
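If you want to see this effect for yourself, you can pull the SBU and ElapsedRaw lines out of the two most recent end-of-job reports and compare them side by side; a sketch, assuming both outputs are still in the current directory:
No Format grep -E "SBU|ElapsedRaw" $(ls -1 parallel-*.out | tail -n2)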