...
Special filesystems
Warning | ||
---|---|---|
| ||
ws1 is not available yet. In the meantime, please use ws2 only. |
Time Critical Option 2 users, or zids, have a special set of filesystems, different from those of regular users. These filesystems are served from different storage servers in different computing halls and are not kept in sync automatically. It is the user's responsibility to ensure that the required files and directory structures are present on both sides, and to synchronise them if and when needed. This means, for example, that zids have two HOMEs, one on each storage host. All the following storage locations can be referenced through the corresponding environment variables, which are defined automatically for each session or job.
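Because the two storage sets are not synchronised automatically, a small helper can make the manual step less error-prone. The following is a minimal sketch, not an official tool: it only prints an rsync dry-run command. It assumes that both storage sets follow the /ec/<sthost>/tc/<user>/home layout shown for ws2 in the login banner, and that both paths are visible from the node where the command runs. The function names tc_home and sync_cmd and the directory name suites are hypothetical.

```shell
# Assumption: each storage set exposes the user's HOME as
# /ec/<sthost>/tc/<user>/home (only the ws2 path is confirmed on this page).
tc_home() {
    # $1 = storage set (ws1 or ws2), $2 = user name
    printf '/ec/%s/tc/%s/home\n' "$1" "$2"
}

# Print, rather than run, an rsync dry-run from one storage set to the other.
sync_cmd() {
    # $1 = source STHOST, $2 = destination STHOST, $3 = user, $4 = directory
    printf 'rsync -a --dry-run %s/%s/ %s/%s/\n' \
        "$(tc_home "$1" "$3")" "$4" "$(tc_home "$2" "$3")" "$4"
}

sync_cmd ws2 ws1 zlu suites
# prints: rsync -a --dry-run /ec/ws2/tc/zlu/home/suites/ /ec/ws1/tc/zlu/home/suites/
```

Once the printed command looks right, dropping --dry-run would perform the actual copy.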
...
The Storage server to use is controlled by the environment variable STHOST, which may take the values "ws1" or "ws2". This variable needs to be defined when logging in, and also for all the jobs that need to run in batch. If logging in interactively without passing the environment variable, you will be prompted to choose the desired STHOST:
No Format |
---|
WARNING: ws1 is not currently available.
1) ws1
2) ws2
Please select the desired timecrit storage set for $STHOST: 2
##### # # # ###### #### ##### # #####
# # ## ## # # # # # # #
# # # ## # ##### # # # # #
# # # # # # ##### # #
# # # # # # # # # # #
# # # # ###### #### # # # #
# # #### ###### ##### ###### # # #
# # # # # # # # # #
# # #### ##### # # # # # #
# # # # ##### # # # #
# # # # # # # # # # #
#### #### ###### # # ###### ###### ####
[ECMWF-INFO -ecprofile] /usr/bin/ksh93 INTERACTIVE on aa6-100 at 20220207_152402.512, PID: 53964, JOBID: N/A
[ECMWF-INFO -ecprofile] $HOME=/ec/ws2/tc/zlu/home=/lus/h2tcws01/tc/zlu/home
[ECMWF-INFO -ecprofile] $TCWORK=/ec/ws2/tc/zlu/tcwork=/lus/h2tcws01/tc/zlu/tcwork
[ECMWF-INFO -ecprofile] $SCRATCHDIR=/ec/ws2/tc/zlu/scratchdir/4/aa6-100.53964.20220207_152402.512
[ECMWF-INFO -ecprofile] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/zlu.53964.20220207_152402.512 |
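To avoid the interactive prompt, the variable can be set on the client side and forwarded over ssh. The sketch below only builds and prints such a command. It reuses the tc-login host name and the SendEnv option that appear in the ecFlow examples on this page; the helper name login_cmd is hypothetical, and forwarding only takes effect if the server is configured to accept the variable.

```shell
# Hypothetical helper: print the ssh command that forwards STHOST at login.
login_cmd() {
    # $1 = desired storage set; only the two documented values are accepted
    case "$1" in
        ws1|ws2) ;;
        *) echo "invalid STHOST: $1" >&2; return 1 ;;
    esac
    echo "STHOST=$1 ssh -o SendEnv=STHOST tc-login"
}

login_cmd ws2
# prints: STHOST=ws2 ssh -o SendEnv=STHOST tc-login
```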
...
Because the "#SBATCH --export" option does not work with a plain sbatch submission on the Atos HPC, the ecsbatch command ("/usr/local/bin/ecsbatch") must be used instead. Troika is configured to use ecsbatch by default.
Tip | ||
---|---|---|
| ||
Like any other SBATCH directive, you may alternatively pass the export in the
|
The Troika module on the Atos HPC is configured to use ecsbatch by default.
Note | ||
---|---|---|
| ||
Make sure you include this line right after the SBATCH directives header:
|
Remote submission from ecFlow
When submitting jobs from ecFlow, you should ensure the STHOST variable is properly passed in the ssh connection.
If using troika, you should ensure that your job management variables export the variable before calling troika:
Code Block | ||||
---|---|---|---|---|
| ||||
edit ECF_JOB_CMD STHOST=%STHOST% troika submit -o %ECF_JOBOUT% %SCHOST% %ECF_JOB%
edit ECF_KILL_CMD STHOST=%STHOST% troika kill %SCHOST% %ECF_JOB%
edit ECF_STATUS_CMD STHOST=%STHOST% troika monitor %SCHOST% %ECF_JOB% |
If not using troika, make sure you pass the STHOST environment variable to the submitting shell:
Code Block | ||||
---|---|---|---|---|
| ||||
edit ECF_JOB_CMD STHOST=%STHOST% ssh -o SendEnv=STHOST tc-login ...
edit ECF_KILL_CMD STHOST=%STHOST% ssh -o SendEnv=STHOST tc-login ...
edit ECF_STATUS_CMD STHOST=%STHOST% ssh -o SendEnv=STHOST tc-login ... |
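Whichever submission route is used, a job can check early that the variable actually arrived. The following is a minimal sketch of such a guard; the function name check_sthost and its messages are hypothetical and not part of troika or ecFlow.

```shell
# Hypothetical guard: report the storage set in use, or fail if STHOST
# is missing or holds a value other than the two documented ones.
check_sthost() {
    case "${STHOST:-}" in
        ws1|ws2) echo "using storage set $STHOST" ;;
        "")      echo "STHOST is not set" >&2; return 1 ;;
        *)       echo "unexpected STHOST: $STHOST" >&2; return 1 ;;
    esac
}
```

A job script could call `check_sthost || exit 1` near the top, so that a submission that lost the variable fails immediately instead of writing to the wrong storage set.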
High-priority batch access
...
- The name of the machine will be ecflow-tc2-zid-number. If you don't have a server yet, please raise an issue through the ECMWF support portal requesting one.
- The HOME on the VM running the server is not the same as either of the two $HOMEs on the HPCF, which depend on the STHOST selected.
- For your convenience, the ecFlow server's home can be accessed directly from any Atos HPCF node on /home/zid.
- You should keep the suite files (.ecf files and headers) on the ecFlow server's HOME, while using the native Lustre filesystems in the corresponding STHOST as working directories for your jobs.
- You may also want to use the server's HOME for your job standard output and error. That should make monitoring with ecFlowUI easier. Otherwise, you may need to run a log server on the HPCF (using the hpc-log node) if your job output goes to a Lustre-based filesystem.
Similarly to the general purpose ecFlow servers, these TC2 ecFlow servers come with "troika", the tool used in production at ECMWF to manage the submission, kill and status query for operational jobs.