ECMWF enforces wall-time limits: a job is killed once it reaches its wall time if a limit was provided with the job, either with #SBATCH --time in the job script or with the --time option on the sbatch command line.
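The rule above can be sketched in Python as a check of whether a job carries its own limit. This is a minimal illustration, not ECMWF's implementation: the name has_explicit_time_limit is hypothetical, only the --time and -t spellings are recognised, and cli_args is assumed to hold just the sbatch options given before the script name. It follows the Slurm behaviour that #SBATCH directives are only read before the first executable line of the script.

```python
import re

# "#SBATCH --time=...", "#SBATCH --time ...", "#SBATCH -t ..." or "#SBATCH -t10".
# "--time-min" is a different Slurm option and is deliberately not matched.
SCRIPT_TIME_RE = re.compile(r"^#SBATCH\s+(?:--time(?:=|\s)|-t(?:\s|\d))")
# "--time", "--time=...", "-t" or "-t10" as individual command-line arguments.
CLI_TIME_RE = re.compile(r"(?:--time|-t)(?:=.*)?|-t\d.*")


def has_explicit_time_limit(script_text: str, cli_args: list[str]) -> bool:
    """Return True if the job sets its own wall-time limit (hypothetical helper)."""
    for arg in cli_args:
        if CLI_TIME_RE.fullmatch(arg):
            return True
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith("#"):
            break  # sbatch stops reading #SBATCH directives at the first command
        if SCRIPT_TIME_RE.match(stripped):
            return True
    return False
```

A script whose only directives are, for example, --job-name and --nodes makes this return False, which is the case where the automatic limit described next applies.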
If no time limit is specified in the job, an automatic time limit is set based on the average runtime of previous similar jobs, with some grace time allowed before the job is killed.
Similar jobs are identified by a job tag generated from the user ID, job name, job geometry and job output path.
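A job tag of this kind could be built as below. The layout is inferred from the jobtag lines in the example output on this page (for instance sycw-test-2x512-/home/sycw/slurm/time_test.out for a 2-node job listed with 512 cores), so the geometry encoding "<nodes>x<cores>" is an assumption, and build_job_tag is a hypothetical helper rather than ECMWF's code.

```python
def build_job_tag(user: str, job_name: str, nodes: int, cores: int, output_path: str) -> str:
    """Hypothetical job tag: "<user>-<job name>-<nodes>x<cores>-<output path>".

    The layout is inferred from the jobtag lines printed by sbatch at ECMWF,
    not taken from ECMWF's implementation.
    """
    return f"{user}-{job_name}-{nodes}x{cores}-{output_path}"


# Jobs with the same tag count as "similar", so their runtimes can be grouped
# under that tag (values assumed to be walltimes in seconds).
history: dict[str, list[int]] = {}
tag = build_job_tag("sycw", "test", 2, 512, "/home/sycw/slurm/time_test.out")
history.setdefault(tag, []).extend([6, 7])
```

Because the output path is part of the tag, the same job name written to a different output file would start a separate runtime history.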
After the first successful run, the estimated runtime is the average runtime of previous successful runs plus one standard deviation.
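The estimate can be illustrated as follows. This sketch assumes the population standard deviation (so a single previous run gives a deviation of 0) and rounds up to whole seconds when formatting the value for --time; neither detail is confirmed by this page, and the function names are hypothetical. How the wrapper arrives at the rounded "Average Walltime" and "Standard Deviation" figures it prints is not described here, so the sketch does not try to reproduce them from the listed runtime history.

```python
import math
from statistics import mean, pstdev


def estimate_runtime(walltimes: list[float]) -> float:
    """Average of previous successful runtimes plus one standard deviation.

    Assumption: population standard deviation (pstdev), so one run gives 0.
    """
    if not walltimes:
        raise ValueError("no previous successful runs")
    return mean(walltimes) + pstdev(walltimes)


def to_slurm_time(seconds: float) -> str:
    """Format seconds for --time as HH:MM:SS, or D-HH:MM:SS from one day on.

    Assumption: partial seconds are rounded up.
    """
    total = math.ceil(seconds)
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    hms = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{days}-{hms}" if days else hms
```

For example, to_slurm_time(6 + 3) returns "00:00:09", the --time value passed to sbatch in the first example output, where the reported average is 6 and the standard deviation 3.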
For jobs where no recent previous runtime could be found, the assumed runtime is 24h, plus another 24h of grace time, so a new job can run for up to 48h.
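The resulting upper bound for a job submitted without --time can be written out as a small calculation. The two constants mirror the 24h figures stated on this page; max_runtime_before_kill and its estimate argument are hypothetical names, and the estimate is assumed to come from the averaging described above (None when there is no recent history).

```python
from datetime import timedelta
from typing import Optional

DEFAULT_RUNTIME = timedelta(hours=24)  # assumed runtime when no recent history exists
GRACE_TIME = timedelta(hours=24)       # grace added on top of any automatic limit


def max_runtime_before_kill(estimate: Optional[timedelta]) -> timedelta:
    """Upper bound on the runtime of a job submitted without --time."""
    automatic_limit = estimate if estimate is not None else DEFAULT_RUNTIME
    return automatic_limit + GRACE_TIME
```

With no history, max_runtime_before_kill(None) gives 48 hours, the figure quoted for a new job; with an estimate of 9 seconds it gives 24 hours and 9 seconds.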
Info: If an automatic time limit has been set for the job, the job is given 24h of grace time before it is killed, to allow for system sessions and other issues that may hold up its progress.
Example output from sbatch when no time limit is given:
[ECMWF-INFO -sbatch] - -----------------------
[ECMWF-INFO -sbatch] - jobscript: /home/sycw/slurm/time.job
[ECMWF-INFO -sbatch] - --- SCRIPT OPTIONS ---
[ECMWF-INFO -sbatch] - #SBATCH --job-name=test
[ECMWF-INFO -sbatch] - #SBATCH --nodes=2
[ECMWF-INFO -sbatch] - #SBATCH --mem-per-cpu=100
[ECMWF-INFO -sbatch] - #SBATCH --qos=np
[ECMWF-INFO -sbatch] - #SBATCH --output=/home/sycw/slurm/time_test.out
[ECMWF-INFO -sbatch] - -----------------------
[ECMWF-INFO -sbatch] - --- POST-PROCESSED OPTIONS ---
[ECMWF-INFO -sbatch] - ARG --positional=['time.job']
[ECMWF-INFO -sbatch] - ARG --job_name=test
[ECMWF-INFO -sbatch] - ARG --nodes=2
[ECMWF-INFO -sbatch] - ARG --output=/home/sycw/slurm/time_test.out
[ECMWF-INFO -sbatch] - ARG --qos=np
[ECMWF-INFO -sbatch] - ARG --mem_per_cpu=100
[ECMWF-INFO -sbatch] - ------------------------------
[ECMWF-INFO -sbatch] - jobtag: sycw-test-2x512-/home/sycw/slurm/time_test.out
[ECMWF-INFO -sbatch] - ------------------------------
[ECMWF-INFO -sbatch] - Average Walltime 6 with a Standard Deviation 3
[ECMWF-INFO -sbatch] - Runtime history
[ECMWF-INFO -sbatch] - Date | Cores Cluster Walltime Mem
[ECMWF-INFO -sbatch] - 10.01.2023 - 12:18 | 512 ad 6 200M
[ECMWF-INFO -sbatch] - 10.01.2023 - 12:17 | 512 ad 7 200M
[ECMWF-INFO -sbatch] - ['/usr/bin/sbatch', '--job-name=test', '--nodes=2', '--output=/home/sycw/slurm/time_test.out', '--qos=np', '--mem-per-cpu=100', '--licenses=h2resw01', '--time=00:00:09', '/home/sycw/slurm/time.job']
OR
[ECMWF-INFO -sbatch] - ------------------------------
[ECMWF-INFO -sbatch] - jobtag: sycw-testit6-1x2-/home/sycw/slurm/slurm-_JOBID_.out
[ECMWF-INFO -sbatch] - ------------------------------
[ECMWF-INFO -sbatch] - Average Walltime 6 with a Standard Deviation 3
[ECMWF-INFO -sbatch] - Runtime history
[ECMWF-INFO -sbatch] - Date | Cores Cluster Walltime Mem
[ECMWF-INFO -sbatch] - 03.01.2023 - 15:24 | 2 ad 9 120000M
[ECMWF-INFO -sbatch] - 12.12.2022 - 12:50 | 2 aa 6 1000M
[ECMWF-INFO -sbatch] - 06.12.2022 - 15:29 | 2 ad 7 1000M
[ECMWF-INFO -sbatch] - 05.12.2022 - 08:31 | 2 ac 7 1000M
[ECMWF-INFO -sbatch] - 02.12.2022 - 10:15 | 2 aa 6 1000M
[ECMWF-INFO -sbatch] - 02.12.2022 - 10:15 | 2 aa 6 1000M
[ECMWF-INFO -sbatch] - 01.12.2022 - 11:00 | 2 aa 7 1000M
[ECMWF-INFO -sbatch] - 30.11.2022 - 14:33 | 2 ad 7 1000M
[ECMWF-INFO -sbatch] - 23.11.2022 - 13:34 | 2 ac 7 1000M
[ECMWF-INFO -sbatch] - 23.11.2022 - 13:30 | 2 ac 7 1000M
[ECMWF-INFO -sbatch] - 23.11.2022 - 13:23 | 2 ac 7 1000M
[ECMWF-INFO -sbatch] - 23.11.2022 - 13:17 | 2 ac 6 1000M
[ECMWF-INFO -sbatch] - 23.11.2022 - 13:14 | 2 ac 7 1000M
[ECMWF-INFO -sbatch] - 23.11.2022 - 13:10 | 2 ac 7 1000M
[ECMWF-INFO -sbatch] - ['/usr/bin/sbatch', '--cpus-per-task=1', '--job-name=testit6', '--ntasks=1', '--no-requeue', '--nodes=1', '--qos=nf', '--mem=120000m', '--licenses=h2resw01', '--time=00:00:09', '/home/sycw/slurm/gpil.job']
...
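To see which limit was applied to a submission, the wrapper's output can be checked mechanically. This hypothetical parser assumes the message format shown in the example output ("Average Walltime N with a Standard Deviation M", and a final command printed as a Python-style list), that the walltime figures are in seconds (6 + 3 matches --time=00:00:09 in the first example), and that --time is given as [D-]HH:MM:SS. Slurm also accepts other time formats that this sketch does not handle.

```python
import ast
import re

PREFIX = "[ECMWF-INFO -sbatch] - "
AVERAGE_RE = re.compile(r"Average Walltime (\d+) with a Standard Deviation (\d+)")


def slurm_time_to_seconds(value: str) -> int:
    """Convert a [D-]HH:MM:SS Slurm time string to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def parse_wrapper_output(lines: list[str]) -> tuple[int, int, int]:
    """Return (average, standard deviation, applied --time in seconds)."""
    average = deviation = None
    argv = None
    for line in lines:
        message = line[len(PREFIX):] if line.startswith(PREFIX) else line
        match = AVERAGE_RE.search(message)
        if match:
            average, deviation = int(match.group(1)), int(match.group(2))
        elif message.startswith("['"):
            # The final sbatch command is printed as a Python-style list of strings.
            argv = ast.literal_eval(message)
    if average is None or argv is None:
        raise ValueError("expected an 'Average Walltime' line and a final sbatch command")
    time_arg = next((arg for arg in argv if arg.startswith("--time=")), None)
    if time_arg is None:
        raise ValueError("no --time option in the final sbatch command")
    return average, deviation, slurm_time_to_seconds(time_arg.split("=", 1)[1])
```

Applied to the "Average Walltime" line and the final command of the first example, it returns (6, 3, 9): the 9-second limit equals the average plus one standard deviation.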