There may be a number of reasons why a submitted job does not start running. When that happens, it is a good idea to run squeue and pay attention to the STATE and NODELIST(REASON) columns:

    $> squeue -j 64243399
    JOBID     NAME    USER  QOS  STATE    TIME  TIME_LIMIT  NODES  FEATURES  NODELIST(REASON)
    64243399  my_job  user  nf   PENDING  0:00  03:00:00    1      (null)    (Priority)

If the job is in the PENDING state, it has not yet been dispatched to a node to run. The NODELIST(REASON) column shows why.
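
When only the state and the reason matter, the squeue output can be narrowed down with its format option. The following is a minimal sketch rather than an official tool: the function name job_reason is hypothetical, while -h (no header), -j (job ID) and -o with the %T (state) and %r (reason) fields are standard squeue options.

```shell
# job_reason JOBID
# Hypothetical helper: print only the state and the pending reason of one job.
#   -h            suppress the header line
#   -j JOBID      restrict the listing to this job
#   -o "%T %r"    show the job state and the reason, separated by a space
job_reason() {
    squeue -h -j "$1" -o "%T %r"
}
```

For the job shown above, calling job_reason 64243399 would be expected to print something like PENDING Priority.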
All the possible reasons are listed in the squeue man page:

    man squeue

Here is a list of the most common ones:
Reason | Description |
---|---|
Priority | Your job is ready to be dispatched, but there are other jobs with higher priority which will be dispatched before yours. |
Resources | Your job is ready to be dispatched and it is at the top of the queue, but there are no free resources to satisfy your job requirements. |
AssocMaxJobsLimit | You have reached the limit on the number of jobs you can submit to the system under a given project account. Your job will not be considered until your other jobs in the same project complete. |
QOSMaxJobsPerUserLimit | You have reached the limit on the number of jobs you can submit to a given QoS. Your job will not be considered until your other jobs in the same QoS complete. |
ReqNodeNotAvail | There are no nodes available to dispatch your job. A System Session or outage may be going on. Check our service status on https://www.ecmwf.int/en/service-status |
Licenses | Your job requires some resources that are temporarily not available. A System Session or outage may be going on. Check our service status on https://www.ecmwf.int/en/service-status |
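
The table above can also be turned into a small lookup for use in scripts. This is only an illustrative sketch: the function name explain_reason and the wording of the hints are assumptions, not part of Slurm. It accepts the reason either bare or wrapped in parentheses, as it appears in the NODELIST(REASON) column, and it matches ReqNodeNotAvail as a prefix on the assumption that Slurm may append further details to that reason.

```shell
# explain_reason REASON
# Hypothetical helper: print a one-line hint for the common pending reasons.
# The body runs in a subshell, so the variable "reason" does not leak out.
explain_reason() (
    reason=${1#\(}          # strip a leading "(" if present
    reason=${reason%\)}     # strip a trailing ")" if present
    case "$reason" in
        Priority)
            echo "Waiting behind jobs with higher priority." ;;
        Resources)
            echo "At the top of the queue, waiting for free resources." ;;
        AssocMaxJobsLimit)
            echo "Job limit reached for this project account." ;;
        QOSMaxJobsPerUserLimit)
            echo "Job limit reached for this QoS." ;;
        ReqNodeNotAvail*|Licenses)
            echo "Resources temporarily unavailable; check the service status page." ;;
        *)
            echo "Not a common reason; see 'man squeue'." ;;
    esac
)
```

For the example job above, explain_reason "(Priority)" prints the hint for the Priority reason.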