If you wish to work interactively but are constrained by the limits on CPUs, CPU time or memory, you may run a small interactive job requesting the resources you need.
By doing that, you will get a dedicated allocation of CPUs and memory to run your application interactively.
If you have a single script or command you wish to run interactively, one way to do this through the batch system is with a direct call to srun from within a session on the login node. It will feel as if you were running locally, but the command actually runs in a job with dedicated resources:
$ cat myscript.sh
#!/bin/bash
echo "This is my super script"
echo "Doing some heavy work on $HOSTNAME..."

$ ./myscript.sh
This is my super script
Doing some heavy work on at1-11...

$ srun ./myscript.sh
This is my super script
Doing some heavy work on at1-105...
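Note that srun accepts any executable, not just scripts. As a quick sanity check, you can run a single command through it and see that it executes on a compute node rather than on the login node (the node names below are illustrative):

$ hostname
at1-11
$ srun hostname
at1-105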
In that example, the submitted job runs with the default settings (default QoS, just 1 CPU and default memory). You can of course pass additional options to srun
to customise the resources allocated to this interactive job. For example, to run with 4 CPUs and 12 GB of memory, with a wall-clock limit of 6 hours:
$ srun -c 4 --mem=12G -t 06:00:00 ./myscript.sh
Check man srun for a complete list of options.
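If you would rather get a full interactive shell with those dedicated resources instead of running a single script, a common Slurm pattern is to ask srun for a pseudo-terminal. This is a generic sketch, subject to your site's interactive-use policy:

$ srun -c 4 --mem=12G -t 06:00:00 --pty /bin/bash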
To facilitate that task, we provide the ecinteractive tool:
$ ecinteractive -h
Usage :  /usr/local/bin/ecinteractive [options] [--]

    -d|desktop      Submits a vnc job (default is interactive ssh job)

    More Options:
    -h|help         Display this message
    -v|version      Display script version
    -A|account      Project account
    -c|cpus         Number of CPUs (default 2)
    -m|memory       Requested Memory (default 4G)
    -t|time         Wall clock limit (default 06:00:00)
    -r|reservation  Submit the job into a SLURM reservation
    -g|cgroups      Launch cgroups watcher
    -k|kill         scancel the running job (if any). To cancel vnc jobs, use together with -d
    -x              set -x
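Running ecinteractive with no options submits an interactive job with the defaults listed above (2 CPUs, 4 GB of memory, 6 hours). For example, to cancel any leftover session and start a new one with 4 CPUs, 16 GB of memory and a 12 hour limit: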
[user@at1-11 ~]$ ecinteractive -k
cancelling job...
JOBID       NAME     USER      QOS        STATE       TIME TIME_LIMIT NODES FEATURES NODELIST(REASON)
63769  user-ecint    user        ni     RUNNING       0:56   12:00:00     1   (null) at1-103

[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00
Interactive batch job is launched with following resources:
  Maximum run time (hours:min:sec): 12:00:00
  Maximum memory (MB): 16G
  Number of cores/threads: 4
Submitted batch job 63770
Found 1 interactive job running on at1-103 ... attaching to it
To manually re-attach: ssh at1-103
To cancel the job on tems: /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -k
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on at1-103 at 20210319_174542.052, PID: 428874, JOBID: 63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/lus/pfs1/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/lus/pfs1/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/lus/pfs1/scratchdir/user/0/63770
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[user@at1-103 ~]$
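If you call ecinteractive again while your interactive job is still running, it will not submit a second job; it simply re-attaches to the existing one: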
[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00
Found 1 interactive job running on at1-103 ... attaching to it
To manually re-attach: ssh at1-103
To cancel the job on tems: /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -k
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on at1-103 at 20210319_174956.074, PID: 429252, JOBID: 63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/lus/pfs1/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/lus/pfs1/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.63770
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/lus/pfs1/scratchdir/user/0/63770
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[user@at1-103 ~]$
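Once you have finished, you can cancel the running interactive job with the -k option: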
[user@at1-11 ~]$ ecinteractive -k
cancelling job...
JOBID       NAME     USER      QOS        STATE       TIME TIME_LIMIT NODES FEATURES NODELIST(REASON)
63770  user-ecint    user        ni     RUNNING       5:31   12:00:00     1   (null) at1-103
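If you need a graphical session, pass -d to submit a VNC job instead of the default interactive SSH job. ecinteractive will then start a VNC viewer and attach to it: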
[user@at1-11 ~]$ ecinteractive -c 4 -m 16G -t 12:00:00 -d
Interactive batch job is launched with following resources:
  Maximum run time (hours:min:sec): 12:00:00
  Maximum memory (MB): 16G
  Number of cores/threads: 4
Submitted batch job 63771
A vnc session job is running on tems node at1-103 - this tool will re-attach to it.
To manually re-attach: vncviewer -passwd ~/.vnc/passwd at1-103:9598
To cancel the job on tems: /usr/local/bin/ecinteractive -c 4 -m 16G -t 12:00:00 -d -k

TigerVNC Viewer 64-bit v1.10.1
Built on: 2020-10-06 13:51
Copyright (C) 1999-2019 TigerVNC Team and many others (see README.rst)
See https://www.tigervnc.org for information on TigerVNC.

Fri Mar 19 17:52:35 2021
 DecodeManager: Detected 256 CPU core(s)
 DecodeManager: Creating 4 decoder thread(s)
 CConn:       Connected to host at1-103 port 9598
 CConnection: Server supports RFB protocol version 3.8
 CConnection: Using RFB protocol version 3.8
 CConnection: Choosing security type VeNCrypt(19)
 CVeNCrypt:   Choosing security type VncAuth (2)
 CConn:       Using pixel format depth 24 (32bpp) little-endian rgb888
 CConnection: Enabling continuous updates
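As noted in the help above, cancelling a VNC job requires passing -k together with -d:

$ ecinteractive -d -k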