Slurm is the batch system available. Any script can be submitted as a job with no changes, but you might want to see Writing SLURM jobs to customise it.
To submit a script as a serial job with default options enter the command:
sbatch yourscript.sh
You may query the queues to see the jobs currently running or pending with:
squeue
And cancel a job with:
scancel <jobid>
Currently, the scancel command must be run on the login node of the same cluster where the job is running.
See the Slurm documentation for more details on the different commands available to submit, query or cancel jobs.
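The three commands above can be chained in a script. The following is a minimal sketch, not an official tool: the helper name submit_job is hypothetical, and it relies on the standard sbatch option --parsable, which prints only the job ID, optionally followed by a semicolon and the cluster name.

```shell
#!/bin/bash
# Hypothetical helper: submit a script and print only its job ID.
submit_job() {
  local out
  out=$(sbatch --parsable "$@") || return 1
  # --parsable prints "jobid" or "jobid;cluster"; keep the job ID only.
  echo "${out%%;*}"
}

# Typical use on a login node:
#   jobid=$(submit_job yourscript.sh)
#   squeue -j "$jobid"     # query this job only
#   scancel "$jobid"       # cancel it if needed
```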
QoS available
These are the different QoS (or queues) available for standard users on the four complexes:
QoS name | Type | Suitable for... | Shared nodes | Maximum jobs per user | Default / Max Wall Clock Limit | Default / Max CPUs | Default / Max Memory |
---|---|---|---|---|---|---|---|
nf | fractional | serial and small parallel jobs. It is the default | Yes | - | average runtime + standard deviation / 2 days | 1 / 64 | 8 GB / 128 GB |
ni | interactive | serial and small parallel interactive jobs | Yes | 1 | 12 hours / 7 days | 1 / 32 | 8 GB / 32 GB |
np | parallel | parallel jobs requiring more than half a node | No | - | average runtime + standard deviation / 2 days | - | 240 GB / 240 GB per node (all usable memory in a node) |
ECS
For those using ECS, these are the different QoS (or queues) available for standard users of this service:
QoS name | Type | Suitable for... | Shared nodes | Maximum jobs per user | Default / Max Wall Clock Limit | Default / Max CPUs | Default / Max Memory |
---|---|---|---|---|---|---|---|
ef | fractional | serial and small parallel jobs - ECGATE service | Yes | - | average job runtime + standard deviation / 2 days | 1 / 8 | 8 GB / 16 GB |
ei | interactive | serial and small parallel interactive jobs - ECGATE service | Yes | 1 | 12 hours / 7 days | 1 / 4 | 8 GB / 8 GB |
el | long | serial and small parallel interactive jobs - ECGATE service | Yes | - | average job runtime + standard deviation / 7 days | 1 / 8 | 8 GB / 16 GB |
et | Time-critical Option 1 | serial and small parallel Time-Critical jobs. Only usable through ECACCESS Time Critical Option-1 | Yes | - | average job runtime + standard deviation / 12 hours | 1 / 8 | 8 GB / 16 GB |
Time limit management
See HPC2020: Job Runtime Management for more information on how the default Wall Clock Time limit is calculated.
Limits are not set in stone
Different limits on the different QoSs may be introduced or changed as the system evolves.
Checking QoS setup
If you want to get all the details of a particular QoS on the system, you may run, for example:
sacctmgr list qos names=nf
Submitting jobs remotely
If you are submitting jobs from a different platform via ssh, please use the *-batch dedicated nodes instead of the *-login equivalents:
- For generic remote job submission on HPCF: hpc-batch or hpc2020-batch
- For remote job submission on a specific HPCF complex: <complex_name>-batch
- For remote job submission to the ECS virtual complex: ecs-batch
For example, to submit a job from a remote platform onto the Atos HPCF:
ssh hpc-batch "sbatch myjob.sh"
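If the job script is stored on the remote platform rather than on the HPCF file systems, one option is to stream it over ssh, because sbatch reads the script from standard input when no file name is given. This is a minimal sketch that assumes a hypothetical helper name, submit_remote, and working ssh access to the batch node:

```shell
#!/bin/bash
# Hypothetical helper: submit a local script through a *-batch node.
# $1: batch node (for example hpc-batch), $2: path to the local job script.
submit_remote() {
  local host=$1 script=$2
  # With no file argument, sbatch reads the job script from stdin.
  ssh "$host" sbatch --parsable < "$script"
}

# Example: submit_remote hpc-batch myjob.sh
```

Note that relative paths inside a script submitted this way are resolved on the HPCF side, not on the platform you submit from.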
HPC2020: Writing SLURM jobs
Any shell script can be submitted as a Slurm job with no modifications. In such a case, sensible default values will be applied to the job. However, you can configure the script to fit your needs through job directives. In Slurm, these are just special comments in your script, usually at the top just after the shebang line, with the form #SBATCH followed by an sbatch option.
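As an illustration, a small job script with directives could look like the sketch below. The job name, time limit and output file name are arbitrary values chosen for this example, and --qos=nf simply makes the default QoS explicit. Because the directives are ordinary comments, the script also runs unchanged outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --qos=nf
#SBATCH --time=00:05:00
#SBATCH --output=hello.%j.out

# %j in the output file name is replaced by the job ID.
# SLURM_JOB_ID is set by Slurm inside a job; fall back to a note otherwise.
msg="Hello from ${SLURM_JOB_ID:-no Slurm job} on $(hostname)"
echo "$msg"
```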
HPC2020: Submitting a serial or small parallel job
Serial and small parallel jobs, called fractional, run on the gpil partition and use the same QoS, typically nf for regular users in the Atos HPCF service. For ECS users, they will run on the ecs partition on queue ef.
These are the default queue and partition. They will be used if no directives are specified.
HPC2020: Submitting a parallel job
Parallel jobs run on the compute partition and use the np QoS for regular users.
This queue is not the default, so make sure you explicitly define it in your job directives before submission.
Parallel jobs are allocated exclusive nodes, so they will not share resources with other jobs.
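A minimal sketch of a parallel job follows. The node and task counts, the time limit, the file name parallel_job.sh and the executable ./my_mpi_app are all hypothetical placeholders; only the np QoS comes from the text above. The snippet writes the job script to a file so it can be inspected before submission.

```shell
#!/bin/bash
# Write a small parallel job script to parallel_job.sh (hypothetical name).
cat > parallel_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=par_example
#SBATCH --qos=np
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --time=00:30:00

# srun starts one copy of the program per task on the allocated nodes.
srun ./my_mpi_app
EOF

# Submit with: sbatch parallel_job.sh
```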
HPC2020: Slurm - PBS cheatsheet
Top tips when working with SLURM
- Put all your SLURM directives at the top of the script file, above any commands. Any directive after an executable line in the script is ignored.
- Note that you can pass SLURM directives as options to the sbatch command.
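Since a directive placed after the first executable line is silently ignored, it can help to check a script before submitting it. The helper below is a hypothetical sketch, not part of Slurm: it reports any #SBATCH line that follows a command and returns a non-zero status if it finds one.

```shell
#!/bin/bash
# Hypothetical helper: list #SBATCH lines that Slurm would ignore
# because they appear after the first executable line of the script.
check_directives() {
  awk '
    /^[[:space:]]*$/ { next }             # skip blank lines
    /^#SBATCH/ {
      if (started) {
        printf "line %d would be ignored: %s\n", NR, $0
        bad = 1
      }
      next
    }
    /^[[:space:]]*#/ { next }             # skip comments and the shebang
    { started = 1 }                       # first executable line reached
    END { exit bad + 0 }
  ' "$1"
}

# Example: check_directives yourscript.sh && sbatch yourscript.sh
```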
HPC2020: Running an interactive job
If you wish to run interactively but are constrained by the limits on the CPUs, CPU Time or memory, you may run a small interactive job requesting the resources you want.
By doing that, you will get a dedicated allocation of CPUs and memory to run your application interactively. There are several ways to do this, depending on your use case:
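One generic way to do this with plain Slurm is to ask srun for a pseudo-terminal in the ni QoS. This is only a sketch under the assumption that direct srun sessions are permitted on the system; the helper name start_interactive and its defaults are hypothetical, and the requested resources must stay within the ni limits listed in the QoS table.

```shell
#!/bin/bash
# Hypothetical helper: open an interactive shell with dedicated resources.
# $1: CPUs, $2: memory, $3: wall clock limit (all optional).
start_interactive() {
  local cpus=${1:-1} mem=${2:-8G} walltime=${3:-01:00:00}
  # --pty attaches the terminal to the shell started on the allocated node.
  srun --qos=ni --cpus-per-task="$cpus" --mem="$mem" \
       --time="$walltime" --pty bash -i
}

# Example: start_interactive 4 16G 02:00:00
```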
HPC2020: Multi-complex SLURM management
With 4 identical Atos complexes (also known as clusters) installed in our Data Centre in Bologna (see Atos HPCF: System overview), we are now able to provide a more reliable computing service at ECMWF, including for batch work. For example, during a system session on one complex, we will submit batch jobs to a different complex. This enhanced batch service, however, may require the use of some ECMWF customised SLURM commands.
HPC2020: Job Runtime Management
ECMWF kills jobs when they reach their wall time, if the #SBATCH --time directive or the --time command line option was provided with the job.
If no time limit is specified in the job, an automatic time limit will be configured based on the average runtime of previous similar jobs, with some grace allowed before the job is killed.
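When setting an explicit limit, --time accepts several formats, including days-hours:minutes:seconds. The small helper below is a hypothetical convenience, not a Slurm or ECMWF command, which converts a number of seconds into that format:

```shell
#!/bin/bash
# Hypothetical helper: convert seconds to the D-HH:MM:SS format
# accepted by the --time option.
to_slurm_time() {
  local s=$1
  printf '%d-%02d:%02d:%02d\n' \
    $((s / 86400)) $((s % 86400 / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Example: request 90 minutes for a job
#   sbatch --time="$(to_slurm_time 5400)" yourscript.sh
```

For 5400 seconds the helper prints 0-01:30:00.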
HPC2020: example Slurm serial batch job scripts for ECS
Job scripts
Here you will find some simple serial batch job examples which are designed to be submitted to and run in the ef queue of the ECS virtual complex, but can be easily adapted to run on any of the other complexes by just changing the QoS to nf. Use them as templates to learn from, or as starting points to construct your own jobs.
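As a rough illustration of that adaptation, the sketch below writes a trivial serial job for the ef QoS and derives a copy for nf by rewriting the QoS directive. The file names, job name and time limit are hypothetical values chosen for this example.

```shell
#!/bin/bash
# Write a minimal serial job for the ECS ef QoS (hypothetical file name).
cat > serial_ecs.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=serial_example
#SBATCH --qos=ef
#SBATCH --time=00:10:00

echo "Serial job running on $(hostname)"
EOF

# Derive a copy for the other complexes by switching the QoS to nf.
sed 's/^#SBATCH --qos=ef$/#SBATCH --qos=nf/' serial_ecs.sh > serial_hpcf.sh
```

Alternatively, the original file can be left untouched and the QoS overridden at submission time with sbatch --qos=nf serial_ecs.sh.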
HPC2020: example Slurm parallel batch job scripts
Job scripts
Here you find some simple parallel batch job examples which are designed to be submitted to and run in the nf queue on the Atos HPCF. Use them as templates to learn from, or as starting points for constructing your own jobs.
Do not forget to modify the scripts with your own workdir, UID and GID as necessary!
HPC2020: Batch jobs not starting - reasons
There may be a number of reasons why a submitted job does not start running. When that happens, it is a good idea to use squeue and pay attention to the STATE and NODELIST(REASON) columns:
$> squeue -j 64243399
JOBID     NAME    USER  QOS  STATE    TIME  TIME_LIMIT  NODES  FEATURES  NODELIST(REASON)
64243399  my_job  user  nf   PENDING  0:00  03:00:00    1      (null)    (Priority)
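The same information can be extracted in a script with a custom squeue output format. The helper below is a hypothetical sketch: it uses the standard squeue options -h (no header) and -o with the fields %i (job ID), %T (state) and %r (reason), separated by a pipe character so that reasons containing spaces stay intact.

```shell
#!/bin/bash
# Hypothetical helper: print the state of a job and, if it is pending,
# the reason reported by Slurm. $1: job ID.
job_reason() {
  squeue -h -j "$1" -o "%i|%T|%r" | awk -F'|' '
    $2 == "PENDING" { printf "job %s is pending: %s\n", $1, $3; next }
                    { printf "job %s is %s\n", $1, $2 }'
}

# Example: job_reason 64243399
```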
HPC2020: Affinity
When running parallel jobs, SLURM will automatically set up some default process affinity. This means that every task spawned by srun (each MPI rank in an MPI execution) will be pinned to a specific core or set of cores within every computing node.
However, the default affinity may not be what you would expect, and depending on the application it could have a significant impact on performance.
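A simple way to see which cores each task has been given is to run a small script under srun in place of the application. The sketch below assumes a Linux node, where the allowed cores are listed in /proc/self/status; the script name show_affinity.sh is hypothetical. SLURM_PROCID is the task rank set by srun, and the script falls back to 0 when run outside a job.

```shell
#!/bin/bash
# show_affinity.sh (hypothetical name): print the cores this task may use.
# Cpus_allowed_list in /proc/self/status holds the allowed core IDs on Linux.
cpus=$(awk '/^Cpus_allowed_list/ { print $2 }' /proc/self/status 2>/dev/null)
report="rank ${SLURM_PROCID:-0} on $(hostname): cpus ${cpus:-unknown}"
echo "$report"
```

Inside a parallel job, srun ./show_affinity.sh prints one line per task. Adding the standard srun option --cpu-bind=verbose,cores requests binding to cores and makes srun report the binding it applied.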