With four identical Atos complexes (also known as clusters) installed in our Data Centre in Bologna (see Atos HPCF: System overview), we are now able to provide a more reliable computing service at ECMWF, including for batch work. For example, during a system session on one complex, batch jobs will be submitted to a different complex. This enhanced batch service may, however, require the use of some ECMWF customised SLURM commands.
...
Note

If you use the plain SLURM sbatch command in /usr/bin, you will not benefit from cross-complex job submission. For example, under cron PATH contains only /usr/bin by default, so your jobs will only be submitted to the complex your cron entry runs on.
All SLURM sbatch options are available with the ECMWF customised sbatch command.
Job IDs are unique across all complexes, so there is no risk of duplicates.
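Which sbatch runs depends on the order of directories in PATH. One way to keep cross-complex submission in scripts started from cron is therefore to put the directory holding the ECMWF customised commands ahead of /usr/bin. The minimal Bash sketch below assumes that directory is /usr/local/bin (the location this page gives for ecsqueue); the function prepend_ecmwf_bin and the ECMWF_BIN variable are hypothetical. Check with `type -a sbatch` which sbatch comes first on your system.

```shell
#!/usr/bin/env bash
# Sketch: prefer the ECMWF customised SLURM commands over those in /usr/bin.
# ASSUMPTION: the customised sbatch lives in /usr/local/bin (as ecsqueue does).
# ECMWF_BIN and prepend_ecmwf_bin are hypothetical names for this example.
ECMWF_BIN="${ECMWF_BIN:-/usr/local/bin}"

# Put $ECMWF_BIN at the front of PATH unless it is already first.
prepend_ecmwf_bin() {
    case "${PATH}" in
        "${ECMWF_BIN}" | "${ECMWF_BIN}":*) ;;  # already first: leave PATH alone
        *) PATH="${ECMWF_BIN}:${PATH}" ;;
    esac
    export PATH
}

# Example use in a script started from cron, where PATH is minimal:
#   prepend_ecmwf_bin
#   sbatch my_job.sh    # resolves to the customised sbatch, if installed there
```

Calling the function twice leaves PATH unchanged the second time. Many cron implementations (for example cronie) also accept a `PATH=...` assignment at the top of the crontab, which achieves the same for every entry.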
Monitoring a job: ecsqueue
...
```
$ ecscancel --help
usage: ecscancel [-h] [-u USER] [-t STATE] [-f] [-b] [-i] [-q QOS]
                 [-n JOBNAME] [-s SIGNAL] [-M CLUSTERS]
                 [jobid [jobid ...]]

positional arguments:
  jobid                 list of jobids

optional arguments:
  -h, --help            show this help message and exit
  -u USER, --user USER  scancel for particular user
  -t STATE, --state STATE
                        scancel for particular state
  -f, --full            scancel full
  -b, --batch           scancel batch step
  -i, --interactive     scancel interactive
  -q QOS, --qos QOS     scancel qos
  -n JOBNAME, --jobname JOBNAME
                        scancel jobname
  -s SIGNAL, --signal SIGNAL
                        scancel with a signal
  -M CLUSTERS, --clusters CLUSTERS
                        scancel for particular cluster, or comma separated
                        list of clusters

$ ecscancel <jobid>    # will cancel job <jobid>, whichever of the four complexes it is on
```
Note

ecsqueue is located in /usr/local/bin. You may need to adapt your PATH.
Only limited SLURM scancel options are available with ecscancel.
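As an illustration of the options listed above, the Bash sketch below cancels all of the current user's jobs with a given name in a given state, using ecscancel with -u, -t and -n. The function cancel_named_jobs and the DRY_RUN switch are hypothetical; the sketch also assumes that -t accepts standard SLURM state names such as PENDING or RUNNING, as scancel does. If ecscancel is not found on PATH, the function stops with an error instead of falling back to the plain scancel command.

```shell
#!/usr/bin/env bash
# Sketch: cancel the current user's jobs with a given name and state via ecscancel.
# cancel_named_jobs and DRY_RUN are hypothetical names for this example.
# ASSUMPTION: -t takes SLURM state names (PENDING, RUNNING, ...) as scancel does.
cancel_named_jobs() {
    local name="$1"
    local state="${2:-PENDING}"   # default: only jobs still waiting in the queue
    local -a cmd=(ecscancel -u "${USER:-$(id -un)}" -t "$state" -n "$name")

    if [[ -n "${DRY_RUN:-}" ]]; then
        # Show what would be run without cancelling anything.
        printf '%s\n' "${cmd[*]}"
        return 0
    fi
    if ! command -v ecscancel >/dev/null 2>&1; then
        echo "ecscancel not found; check your PATH" >&2
        return 127
    fi
    "${cmd[@]}"
}
```

For example, `DRY_RUN=1 cancel_named_jobs my_experiment` prints the ecscancel command line without running it, and `cancel_named_jobs my_experiment RUNNING` cancels the matching running jobs.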