Slurm is the batch system available. Any script can be submitted as a job with no changes, but you might want to see Writing SLURM jobs to customise it.
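For instance, submitting an unmodified script is a single command; a minimal sketch, assuming a hypothetical script called myscript.sh:

```
# "myscript.sh" is a placeholder for any existing executable script.
# With no #SBATCH directives, the system defaults (QoS, wall time, memory) apply.
sbatch myscript.sh
```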
...
And cancel a job with:

```
scancel <jobid>
```
Note: Currently the `scancel` command must be executed on a login node of the same cluster where the job is running.
See the Slurm documentation for more details on the different commands available to submit, query or cancel jobs.
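For example, two standard Slurm query commands (the job id is a placeholder):

```
# List your own jobs currently queued or running.
squeue -u $USER

# Show the full details of a specific job.
scontrol show job <jobid>
```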
...
QoS name | Type | Suitable for... | Shared nodes | Maximum jobs per user | Default / Max Wall Clock Limit | Default / Max CPUs | Default / Max Memory
---|---|---|---|---|---|---|---|
nf | fractional | serial and small parallel jobs. It is the default | Yes | - | average runtime + standard deviation / 2 days | 1 / 64 | 8 GB / 128 GB |
ni | interactive | serial and small parallel interactive jobs | Yes | 1 | 12 hours / 7 days | 1 / 32 | 8 GB / 32 GB |
np | parallel | parallel jobs requiring more than half a node | No | - | average runtime + standard deviation / 2 days | - | 240 GB / 240 GB per node (all usable memory in a node) |
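As an illustration, a job header selecting one of these QoSs could look like the sketch below; the directives are standard Slurm, but the specific values (QoS, tasks, time) are arbitrary examples, not recommendations.

```
#!/bin/bash
# Illustrative header only; adjust the values to your own job.
#SBATCH --qos=np            # QoS from the table above
#SBATCH --job-name=example
#SBATCH --ntasks=256        # example size for a parallel job
#SBATCH --time=06:00:00     # explicit wall clock limit
#SBATCH --output=%x.%j.out  # output file named after job name and id

srun ./my_program           # "my_program" is a placeholder executable
```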
GPU special partition: on the AC complex there is also the ng queue, which gives access to the special partition with GPU-enabled nodes. See HPC2020: GPU usage for AI and Machine Learning for all the details on how to make use of those special resources.
ECS
For those using ECS, these are the different QoS (or queues) available for standard users of this service:
QoS name | Type | Suitable for... | Shared nodes | Maximum jobs per user | Default / Max Wall Clock Limit | Default / Max CPUs | Default / Max Memory |
---|---|---|---|---|---|---|---|
ef | fractional | serial and small parallel jobs - ECGATE service | Yes | - | average job runtime + standard deviation / 2 days | 1 / 8 | 8 GB / 16 GB |
ei | interactive | serial and small parallel interactive jobs - ECGATE service | Yes | 1 | 12 hours / 7 days | 1 / 4 | 8 GB / 8 GB |
el | long | serial and small parallel interactive jobs - ECGATE service | Yes | - | average job runtime + standard deviation / 7 days | 1 / 8 | 8 GB / 16 GB |
et | Time-critical Option 1 | serial and small parallel Time-Critical jobs. Only usable through ECACCESS Time Critical Option-1 | Yes | - | average job runtime + standard deviation / 12 hours | 1 / 8 | 8 GB / 16 GB |
Info: See HPC2020: Job Runtime Management for more information on how the default Wall Clock Time limit is calculated.
Note: Different limits on the different QoSs may be introduced or changed as the system evolves to its final configuration.
Tip: If you want to get all the details of a particular QoS on the system, you may run, for example:
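A sketch using the standard Slurm accounting command; the QoS name nf is just an example, and any QoS from the tables above can be queried the same way:

```
# Print the definition and limits of the "nf" QoS.
sacctmgr show qos where name=nf
```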
Submitting jobs remotely
If you are submitting jobs from a different platform via ssh, please use the *-batch dedicated nodes instead of the *-login equivalents:
...
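As a sketch, assuming a hypothetical node name following the *-batch pattern and a job script called job.sh:

```
# "hpc-batch" and "job.sh" are placeholders; use the *-batch node of the
# relevant complex and your own script name.
ssh <user>@hpc-batch "sbatch job.sh"
```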