...
Create a new job script broken1.sh with the contents below and try to submit the job. What happened? Can you fix the job and keep trying until it runs successfully?

broken1.sh:

    #SBATCH --job-name = broken 1
    #SBATCH --output = broken1-%J.out
    #SBATCH --error = broken1-%J.out
    #SBATCH --qos = express
    #SBATCH --time = 00:05:00

    echo "I was broken!"

Solution: the job above has the following problems:
- There is no shebang at the beginning of the script.
- There should be no spaces around the "=" in the directives.
- There should be no space in the job name ("broken 1").
- QoS "express" does not exist.
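The two formatting problems in this list can be spotted before submitting. The following is only a minimal sketch: `check_job_script` is a hypothetical helper, not part of Slurm, and it checks only for a missing shebang and for spaces around "=" in `#SBATCH` lines. It cannot tell whether a QoS exists.

```bash
#!/bin/bash
# Hypothetical helper (not a Slurm tool): flags two formatting mistakes
# in the job script given as the first argument.
check_job_script() {
    local script="$1" status=0

    # The first line must be a shebang such as #!/bin/bash.
    if ! head -n 1 "$script" | grep -q '^#!'; then
        echo "missing shebang on line 1"
        status=1
    fi

    # Directives should look like --option=value, with no spaces around "=".
    # grep -n prints each offending line prefixed with its line number.
    if grep -n -E '^#SBATCH.*([[:space:]]=|=[[:space:]])' "$script"; then
        echo "spaces around '=' in the directives listed above"
        status=1
    fi

    return "$status"
}

# Example usage:
# check_job_script broken1.sh
```

The function returns a non-zero status when it finds a problem, so it could be chained in front of the submission command if desired.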
Here is an amended version:

broken1_fixed.sh:

    #!/bin/bash
    #SBATCH --job-name=broken1
    #SBATCH --output=broken1-%J.out
    #SBATCH --error=broken1-%J.out
    #SBATCH --time=00:05:00

    echo "I was broken!"

Note that the QoS line was removed, but you may also use the following if running on ECS:

    #SBATCH --qos=ef

or alternatively, if on the Atos HPCF:

    #SBATCH --qos=nf

Check that the job actually ran and generated the expected output:

    $ grep -v ECMWF-INFO $(ls -1 broken1-*.out | head -n1)
    I was broken!

Create a new job script broken2.sh with the contents below and try to submit the job. What happened? Can you fix the job and keep trying until it runs successfully?

broken2.sh:

    #!/bin/bash
    #SBATCH --job-name=broken2
    #SBATCH --output=broken2-%J.out
    #SBATCH --error=broken2-%J.out
    #SBATCH --qos=ns
    #SBATCH --time=10-00

    echo "I was broken!"

Solution: the job above has the following problems:
- QoS "ns" does not exist. Either remove the line to use the default QoS, or use the corresponding QoS on ECS (ef) or HPCF (nf).
- The time requested is 10 days, which is longer than the maximum allowed. It was probably meant to be 10 minutes.
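The difference between `10-00` and `10:00` comes from the time formats that Slurm accepts: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds". As an illustration only, the sketch below converts such a value to seconds. `slurm_time_to_seconds` is a hypothetical helper written for this exercise, not a Slurm command, and it does no input validation.

```bash
#!/bin/bash
# Hypothetical helper: convert a Slurm --time value to seconds.
slurm_time_to_seconds() {
    local spec="$1" days=0 hours=0 minutes=0 seconds=0

    if [[ "$spec" == *-* ]]; then
        # A dash separates days from hours[:minutes[:seconds]].
        days="${spec%%-*}"
        IFS=: read -r hours minutes seconds <<< "${spec#*-}"
    else
        case "$spec" in
            *:*:*) IFS=: read -r hours minutes seconds <<< "$spec" ;;
            *:*)   IFS=: read -r minutes seconds <<< "$spec" ;;
            *)     minutes="$spec" ;;   # a bare number means minutes
        esac
    fi

    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#${days:-0} * 86400 + 10#${hours:-0} * 3600 \
           + 10#${minutes:-0} * 60 + 10#${seconds:-0} ))
}

slurm_time_to_seconds 10-00   # prints 864000 (10 days)
slurm_time_to_seconds 10:00   # prints 600 (10 minutes)
```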
Here is an amended version:

broken2.sh:

    #!/bin/bash
    #SBATCH --job-name=broken2
    #SBATCH --output=broken2-%J.out
    #SBATCH --error=broken2-%J.out
    #SBATCH --time=10:00

    echo "I was broken!"

Again, note that the QoS line was removed, but you may also use the following if running on ECS:

    #SBATCH --qos=ef

or alternatively, if on the Atos HPCF:

    #SBATCH --qos=nf

Check that the job actually ran and generated the expected output:

    $ grep -v ECMWF-INFO $(ls -1 broken2-*.out | head -n1)
    I was broken!

Create a new job script broken3.sh with the contents below and try to submit the job. What happened? Can you fix the job and keep trying until it runs successfully?

broken3.sh:

    #!/bin/bash
    #SBATCH --job-name=broken3
    #SBATCH --chdir=$SCRATCH
    #SBATCH --output=broken3output/broken3-%J.out
    #SBATCH --error=broken3output/broken3-%J.out

    echo "I was broken!"

Solution: the job above has the following problems:
- Variables are not expanded in job directives. You must specify your paths explicitly.
- The directory where the output and error files will go must exist beforehand. Otherwise the job will fail without any indication of what went wrong. The only hint comes from checking sacct:

    $ sacct -X --name=broken3
    JobID                 JobName       QOS      State ExitCode    Elapsed   NNodes             NodeList
    ------------ ---------------- --------- ---------- -------- ---------- -------- --------------------
    64281800              broken3        ef     FAILED     0:53   00:00:02        1              ad6-201

You will need to create the output directory with:

    mkdir -p $SCRATCH/broken3output/

Here is an amended version of the job:

broken3.sh:

    #!/bin/bash
    #SBATCH --job-name=broken3
    #SBATCH --chdir=/scratch/<your_user_id>
    #SBATCH --output=broken3output/broken3-%J.out
    #SBATCH --error=broken3output/broken3-%J.out

    echo "I was broken!"

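If you would rather not type the path by hand, one possible workaround is to generate the job script from your shell, where variables are expanded before the file is written. This is only a sketch: `write_broken3_job` is a hypothetical helper, and the example usage assumes `$SCRATCH` is set in your session.

```bash
#!/bin/bash
# Hypothetical helper: write the broken3 job script to the file given as $2,
# with the directory given as $1 expanded into the --chdir directive.
write_broken3_job() {
    local workdir="$1" script="$2"

    # The output directory must exist before the job starts.
    mkdir -p "$workdir/broken3output"

    # The heredoc delimiter is unquoted, so the shell expands $workdir now
    # and the directive in the generated file contains a literal path.
    cat > "$script" <<EOF
#!/bin/bash
#SBATCH --job-name=broken3
#SBATCH --chdir=$workdir
#SBATCH --output=broken3output/broken3-%J.out
#SBATCH --error=broken3output/broken3-%J.out

echo "I was broken!"
EOF
}

# Example usage (assumes $SCRATCH is set):
# write_broken3_job "$SCRATCH" broken3.sh
# sbatch broken3.sh
```

The generated file is an ordinary job script, so it can be inspected before submission to confirm that the path was expanded as expected.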
Check that the job actually ran and generated the expected output:

    $ grep -v ECMWF-INFO $(ls -1 $SCRATCH/broken3output/broken3-*.out | head -n1)
    I was broken!

You may clean up the output directory with:

    rm -rf $SCRATCH/broken3output