If you wish to run interactively but are constrained by the limits on CPUs, CPU time, or memory, you may run a small interactive job requesting the resources you need.
That way you will get a dedicated allocation of CPUs and memory to run your application interactively.
Using srun directly
If you have a single script or command you wish to run interactively, one way to do this through the batch system is with a direct call to srun from within a session on the login node. It feels as if you were running locally, but the command actually runs in a job with dedicated resources:
$ cat myscript.sh
#!/bin/bash
echo "This is my super script"
echo "Doing some heavy work on $HOSTNAME..."

$ ./myscript.sh
This is my super script
Doing some heavy work on at1-11...

$ srun ./myscript.sh
This is my super script
Doing some heavy work on at1-105...
In that example the submitted job runs with the default settings (default QoS, just 1 CPU and default memory). You can of course pass additional options to srun to customise the resources allocated to this interactive job. For example, to run with 4 CPUs and 12 GB of memory, with a limit of 6 hours:
$ srun -c 4 --mem=12G -t 06:00:00 ./myscript.sh
Check man srun for a complete list of options.
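If you would rather get an interactive shell than run a single script, a common SLURM idiom is to ask srun for a pseudo-terminal. This is a generic sketch using standard srun options, not a site-specific recipe:

$ srun -c 4 --mem=12G -t 06:00:00 --pty /bin/bash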
Persistent interactive job with ecinteractive
However, you may want an interactive session that persists beyond a single command and that you can reattach to later. To facilitate that task, we provide the ecinteractive tool:
$ ecinteractive -h
Usage :  /usr/local/bin/ecinteractive [options] [--]

        -d|desktop      Submits a vnc job (default is interactive ssh job)

    More Options:
        -h|help         Display this message
        -v|version      Display script version
        -A|account      Project account
        -c|cpus         Number of CPUs (default 2)
        -m|memory       Requested Memory (default 4G)
        -t|time         Wall clock limit (default 06:00:00)
        -r|reservation  Submit the job into a SLURM reservation
        -g|cgroups      Launch cgroups watcher
        -k|kill         scancel the running job (if any). To cancel vnc jobs, use together with -d
        -x              set -x
Main features
- You may specify the project account under which it will be accounted, as well as the resources needed (CPUs, memory and time). Defaults apply if those options are not specified.
- Only one interactive job is allowed at a time; if run again, ecinteractive will reattach to your existing job (see the squeue sketch after this list).
- You may reattach manually by ssh-ing to the allocated node reported by ecinteractive.
- You can use ecinteractive to kill an existing interactive job with the -k option.
- You may open a basic graphical desktop for X11 applications.
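You can check whether such a job is already running with standard SLURM tools; a minimal sketch (the ni QoS and the user-ecint job name are taken from the example outputs below):

$ squeue -u $USER   # an existing interactive job shows QoS ni and a name like user-ecint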
Getting a shell with 4 CPUs and 16 GB of memory for 12 hours
[user@at1-11 ~]$ ecinteractive -k
cancelling job...
JOBID  NAME        USER  QOS  STATE    TIME  TIME_LIMIT  NODES  FEATURES  NODELIST(REASON)
63769  user-ecint  user  ni   RUNNING  0:56  12:00:00    1      (null)    at1-103

[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00
Interactive batch job is launched with following resources:
  Maximum run time (hours:min:sec): 12:00:00
  Maximum memory (MB): 16G
  Number of cores/threads: 4
Submitted batch job 63770
Found 1 interactive job running on at1-103 ... attaching to it
To manually re-attach: ssh at1-103
To cancel the job on tems: /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -k

[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on at1-103 at 20210319_174542.052, PID: 428874, JOBID: 63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/lus/pfs1/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/lus/pfs1/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/lus/pfs1/scratchdir/user/0/63770
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[user@at1-103 ~]$
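Once attached, the banner above lists the job-specific environment set up for the session. A quick, minimal check that you are on the allocated node and picking up the per-job directories (variable names taken from the banner; paths will differ for your job):

$ hostname           # should report the allocated node, e.g. at1-103
$ echo $TMPDIR       # per-job temporary directory
$ echo $SCRATCHDIR   # per-job scratch directory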
Reattaching to an existing interactive job
[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00
Found 1 interactive job running on at1-103 ... attaching to it
To manually re-attach: ssh at1-103
To cancel the job on tems: /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -k

[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on at1-103 at 20210319_174956.074, PID: 429252, JOBID: 63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/lus/pfs1/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/lus/pfs1/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/lus/pfs1/scratchdir/user/0/63770
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[user@at1-103 ~]$
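If the connection drops, you can also locate the node yourself and reattach with a plain ssh. A sketch using standard SLURM commands; the ${USER}-ecint job name is an assumption inferred from the squeue outputs on this page:

$ node=$(squeue -u $USER -h -n ${USER}-ecint -o %N)   # node running your interactive job
$ ssh $node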
Killing a running interactive job
[user@at1-11 ~]$ ecinteractive -k
cancelling job...
JOBID  NAME        USER  QOS  STATE    TIME  TIME_LIMIT  NODES  FEATURES  NODELIST(REASON)
63770  user-ecint  user  ni   RUNNING  5:31  12:00:00    1      (null)    at1-103
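Since -k calls scancel under the hood (as the usage message above states), you can achieve the same by cancelling the job directly by its ID, taken here from the squeue listing above:

$ scancel 63770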
Opening a graphical desktop within your interactive job
[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00 -d
Interactive batch job is launched with following resources:
  Maximum run time (hours:min:sec): 12:00:00
  Maximum memory (MB): 16G
  Number of cores/threads: 4
Submitted batch job 63771
A vnc session job is running on tems node at1-103 - this tool will re-attach to it.
To manually re-attach: vncviewer -passwd ~/.vnc/passwd at1-103:9598
To cancel the job on tems: /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -d -k

TigerVNC Viewer 64-bit v1.10.1
Built on: 2020-10-06 13:51
Copyright (C) 1999-2019 TigerVNC Team and many others (see README.rst)
See https://www.tigervnc.org for information on TigerVNC.

Fri Mar 19 17:52:35 2021
DecodeManager: Detected 256 CPU core(s)
DecodeManager: Creating 4 decoder thread(s)
CConn: Connected to host at1-103 port 9598
CConnection: Server supports RFB protocol version 3.8
CConnection: Using RFB protocol version 3.8
CConnection: Choosing security type VeNCrypt(19)
CVeNCrypt: Choosing security type VncAuth (2)
CConn: Using pixel format depth 24 (32bpp) little-endian rgb888
CConnection: Enabling continuous updates
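Note that, as the usage message above points out, a plain -k does not cancel desktop jobs; to kill a vnc session you must combine it with -d:

$ ecinteractive -d -k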