...
This is a series of exercises that will walk you through the basic tasks as you get on board the Atos HPCF or ECS computing platforms.
Prerequisites
In order to follow this tutorial, these are the prerequisites you must fulfil before starting:
- You must have a valid ECMWF account with privileges to access HPCF or ECS. If you only have access to ECS, you may need to skip certain exercises involving HPCF.
- You must have two-factor authentication (2FA) enabled with TOTP.
- You must be able to connect with at least one of the following methods:
- Using the Virtual Desktop Infrastructure (VDI). See the corresponding documentation to get started.
- Using Teleport SSH from your end user device.
Logging into Atos HPCF and ECS
Info |
---|
Reference: HPC2020: How to connect |
First of all, let's try to connect to the computing services via SSH:
Accessing a login node
Access the default login node of the Atos HPCF or ECS and take note of which node you are on.
...
title | Solution - HPCF |
---|
No Format |
---|
ssh hpc-login
hostname |
...
title | Solution - ECS |
---|
No Format |
---|
ssh ecs-login
hostname |
Open a new tab in your terminal and connect again. Did you get the same hostname? Why is that?
Expand | ||
---|---|---|
| ||
Both aliases will always point to a working login node, and the actual node and complex behind it may change depending on the load, system sessions or outages. |
Now, from your open SSH session on Atos HPCF or ECS, connect to the main login alias again. Did it ask for a password? Can you set your account up so jumps between hosts are done without a password?
Expand | ||
---|---|---|
| ||
Password-less SSH between ECMWF hosts such as Atos HPCF or ECS nodes, or VDI hosts is not set up by default. If you were asked for a password, you can run the following command from your Atos HPCF, ECS or VDI session to set up key-based authentication:
After this you should be able to jump between hosts without having to enter your password. Besides being convenient, this setup is also necessary for other tools such as ECACCESS or ecinteractive to work properly. |
Interactive session
Info |
---|
Reference: HPC2020: Persistent interactive job with ecinteractive |
Standard sessions on login nodes do not guarantee access to dedicated resources such as CPUs or memory, and strict limits are imposed on both.
Can you get a dedicated interactive session with 10 GB of memory and 4 CPUs for 8 hours?
...
title | Solution |
---|
You can use ecinteractive. It is installed and available on all the Atos HPCF and ECS nodes, as well as the VDI, so you can run it from any of them:
No Format |
---|
ecinteractive -c 4 -m 10 -t 8:00:00 |
This will create an interactive job with the requested configuration and land you on a shell in a given node.
...
- own computer
...
Log out of that interactive session. Can you reattach to it?
...
title | Solution |
---|
Your job kept running in the background, and there can only be one interactive job per user. You can attach as many concurrent shells as you like to the same interactive session, for example in different terminal tabs, with:
No Format |
---|
ecinteractive |
Cancel your interactive session
...
title | Solution |
---|
No Format |
---|
ecinteractive -k |
Storage spaces
Info |
---|
Reference: HPC2020: Filesystems |
We will now explore the different options when it comes to storing your data.
Main filesystems
Connect to the Atos HPCF or ECS main login node. What is your default filesystem? Can you try 4 different ways of accessing that space?
Expand | ||
---|---|---|
| ||
The default directory is your HOME directory, which is /home/$USER. It is a dedicated personal space for you, and you can always come back to it with any of the following commands:
Your HOME directory is accessible across all Atos HPCF, ECS, VDI and EcFlow services. |
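A minimal sketch of four equivalent ways of getting back to HOME from any directory, using standard shell features only. The fourth relies on the /home/$USER layout described above and is guarded in case the layout differs on the host where it is run.

```shell
#!/usr/bin/env bash
# 1. cd with no argument goes to $HOME
cd && pwd
# 2. The tilde expands to $HOME
cd ~ && pwd
# 3. The environment variable, quoted in case of unusual characters
cd "$HOME" && pwd
# 4. The explicit path; guarded because it assumes the /home/<user> layout
user="${USER:-$(id -un)}"
if [ -d "/home/$user" ]; then
    cd "/home/$user" && pwd
fi
```

All the pwd calls should print the same directory.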
There are 3 more main storage spaces. Create an empty file called del.me on each one of them. Check that they have been created with ls, and then remove them with rm.
...
title | Answer |
---|
Besides HOME, you also have access to PERM, HPCPERM and SCRATCH. Like HOME, they are all dedicated personal spaces with their corresponding environment variable. Using those environment variables over hardcoded paths is strongly recommended.
You can use touch to create the test files:
No Format |
---|
touch $PERM/del.me
touch $HPCPERM/del.me
touch $SCRATCH/del.me |
Check they exist with:
No Format |
---|
ls -l $PERM/del.me
ls -l $HPCPERM/del.me
ls -l $SCRATCH/del.me |
Remove them with:
No Format |
---|
rm $PERM/del.me
rm $HPCPERM/del.me
rm $SCRATCH/del.me |
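The three steps can also be combined in a loop over the environment variables. This is only a sketch: check_space is a hypothetical helper name, and the loop skips any variable that is not defined in the current session (for example HPCPERM for users without HPCF access).

```shell
#!/usr/bin/env bash
# Create, list and remove a test file in one directory.
# Returns non-zero if any of the three steps fails.
check_space() {
    local dir="$1"
    touch "$dir/del.me" && ls -l "$dir/del.me" && rm "$dir/del.me"
}

# Loop over the variable names; ${!name} is bash indirect expansion.
for name in PERM HPCPERM SCRATCH; do
    dir="${!name:-}"
    if [ -z "$dir" ]; then
        echo "$name is not set in this session, skipping"
        continue
    fi
    check_space "$dir" || echo "could not write to $name ($dir)"
done
```

Going through the variable names rather than literal paths keeps the loop in line with the recommendation above to avoid hardcoded paths.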
How much space have you used in each of your 4 main filesystems? How much can you store?
...
title | Answer |
---|
All the filesystems have quotas enforced. You can check them with the quota command:
No Format |
---|
quota |
For HOME and PERM, the snippet should look similar to:
No Format |
---|
Quota for $HOME:
home_b user 1234 <space used> <space limit> <number of files stored> - *
Quota for $PERM
POSIX User 1234 <space used> <space limit> <number of files stored> none |
For SCRATCH and HPCPERM the format is slightly different:
No Format |
---|
Project quota for $SCRATCH and $SCRATCHDIR:
Disk quotas for prj 1000001798 (pid 1000001798):
Filesystem used quota limit grace files quota limit grace
/ec/res4 XXX YYY YYY - ZZZ WWW WWW -
Project quota for $HPCPERM:
Disk quotas for prj 2000001798 (pid 2000001798):
Filesystem used quota limit grace files quota limit grace
/ec/res4 XXX YYY YYY - ZZZ WWW WWW - |
If you are on the VDI, open a new terminal there. Can you access your HOME, PERM, SCRATCH and HPCPERM?
Expand | ||
---|---|---|
| ||
However, |
EXTRA: For long-term archival purposes, users with access to HPCF may also use ECFS. Files will be stored on tape in ECMWF's Data Handling System. Create a small text file and copy it to your ECFS space, then ensure it is there, retrieve it and remove it.
...
title | Solution |
---|
No Format |
---|
echo "hello world" > test_file.txt
ecp test_file.txt ec:
els -l ec:test_file.txt
ecp ec:test_file.txt retrieved_test_file.txt
diff test_file.txt retrieved_test_file.txt
erm ec:test_file.txt |
Temporary spaces
There are a number of temporary spaces you can use in your session or job.
Create a file called testfile in $TMPDIR, $SCRATCHDIR and /tmp/.
...
title | Solution |
---|
No Format |
---|
touch $TMPDIR/testfile
touch $SCRATCHDIR/testfile
touch /tmp/testfile |
Open another session in the same login node with ssh $HOSTNAME. Can you find the files you have created earlier?
...
title | Solution |
---|
No Format |
---|
ls -l $TMPDIR/testfile
ls -l $SCRATCHDIR/testfile
ls -l /tmp/testfile |
You will not see the files you created in any of those locations, since every session or job gets a different location. This includes /tmp, which is also a dedicated ramdisk for each session.
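One way to see this for yourself is to print where the per-session variables point in each session and compare the results. The helper name show_tmp_spaces is hypothetical; the sketch only relies on the TMPDIR and SCRATCHDIR variables mentioned above and prints a placeholder when one of them is not set.

```shell
#!/usr/bin/env bash
# Print where the per-session spaces point to in this session.
show_tmp_spaces() {
    local name
    for name in TMPDIR SCRATCHDIR; do
        # ${!name} is bash indirect expansion of the variable called $name
        printf '%s=%s\n' "$name" "${!name:-<unset>}"
    done
    # /tmp has the same path in every session, so show some of its contents instead.
    ls -A /tmp | head -n 5
}

show_tmp_spaces
```

Running it in two different sessions should show different TMPDIR and SCRATCHDIR values, and a /tmp listing that does not include files created in the other session.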
Filesystem Usage
Can you decide what would be the best filesystem to use in the following cases? Why would you make that choice?
Store the source code, scripts and configuration of your programs and workflows
Expand | ||
---|---|---|
| ||
|
Store Climate Files to be used by your model runs on Atos HPCF.
Expand | ||
---|---|---|
| ||
|
Working directory for your jobs.
Expand | ||
---|---|---|
| ||
|
Store data that you use frequently, which is considerable in size.
Expand | ||
---|---|---|
| ||
|
Store data of considerable size for the longer term, such as experiment results, which you are not going to use often.
Expand | ||
---|---|---|
| ||
ECFS would be the right place for longer term archival or storing backups. This is by far the place where you can store the most data. However, data on tape needs to be retrieved to another disk space before it can be used, so it is costly in terms of time. To use ECFS efficiently, remember to store fewer but bigger files: it is a good idea to use tools like tar or zip to bundle big directories with lots of files into a single archive. |
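As a minimal sketch of the bundling step, the following creates a hypothetical directory with a few small files, packs it into one tar archive and lists the archive to confirm its contents. The directory and file names are made up for illustration. The copy to ECFS is left as a comment, and uses the same ecp syntax as the ECFS exercise earlier in this tutorial.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical example data: a directory with several small files.
workdir="$(mktemp -d)"
mkdir -p "$workdir/experiment"
for i in 1 2 3 4 5; do
    echo "result $i" > "$workdir/experiment/output_$i.txt"
done

# Bundle the whole directory into a single archive file.
tar -C "$workdir" -cf "$workdir/experiment.tar" experiment

# List the archive contents to confirm every file is included.
tar -tf "$workdir/experiment.tar"

# The single bundle could then be archived in one transfer, for example:
# ecp "$workdir/experiment.tar" ec:
```

Storing one archive instead of five separate files means a single tape retrieval later on, at the cost of having to unpack the bundle to reach an individual file.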
Temporary files that you don't need beyond the end of the session or job
Expand | ||
---|---|---|
| ||
$TMPDIR if performance is important and the files are small, since TMPDIR is either in memory (for parallel jobs on HPCF) or on SSD. $SCRATCHDIR if the files are too big to fit in TMPDIR. |
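A short sketch of how a script might use such a space: it creates a private working directory under $TMPDIR and removes it on exit. The fallback to /tmp when TMPDIR is not set and the work.XXXXXX name template are assumptions made so the snippet also runs outside the platform.

```shell
#!/usr/bin/env bash
# Use $TMPDIR when it is set, otherwise fall back to /tmp.
base="${TMPDIR:-/tmp}"

# Create a private working directory with a unique name.
work="$(mktemp -d "$base/work.XXXXXX")"

# Remove it automatically when the script or shell exits.
trap 'rm -rf "$work"' EXIT

# Write and inspect an intermediate file.
echo "intermediate data" > "$work/step1.tmp"
ls -l "$work"
```

For larger intermediate files, the same pattern should work with base set to $SCRATCHDIR instead.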
Recovering Deleted files
Info |
---|
Reference: HPC2020: Filesystems |
Imagine you have accidentally deleted ~/.profile in your HOME directory. Can you get back the latest version?
Expand | ||
---|---|---|
| ||
You can use the snapshots. You can list all the versions available with:
To recover, you would just need to copy the file back into place. For longer time spans, use the utility |
Imagine you have accidentally deleted a file in your PERM directory. Can you get back the latest version?
Expand | ||
---|---|---|
| ||
You can use the snapshots. You can list all the versions available with:
Note that the snapshots are less frequent in |
Imagine you have accidentally deleted a file in your SCRATCH or HPCPERM directories. Can you get back the latest version?
Expand | ||
---|---|---|
| ||
Unfortunately there are no snapshots or backups for those filesystems, so the data has been lost permanently. |
Managing your software stack environment
Info |
---|
Reference: HPC2020: Software stack |
Atos HPCF and ECS computing platforms offer a wide range of software, libraries and tools.
Basic software environment management
You want to use CDO, a popular tool to manipulate climate and NWP model data. What do you need to do to get the following result?
No Format |
---|
$ cdo --version
Climate Data Operators version X.Y.Z (https://mpimet.mpg.de/cdo)
System: x86_64-pc-linux-gnu
... |
...
title | Solution |
---|
If you run the command without any prior action, you may get:
No Format |
---|
$ cdo --version
-bash: cdo: command not found |
Many software packages and tools are not part of your default environment, and need to be explicitly loaded via modules.
So the following commands would be sufficient to get to the desired result:
No Format |
---|
module load cdo
cdo --version |
...
title | ml shortcut |
---|
You can also use the ml shortcut to load the module:
No Format |
---|
ml cdo |
Note that we did not ask for any specific version. In those cases, you will get the one defined as default.
How many versions of CDO can be used at ECMWF? Can you pick the newest?
...
title | Solution |
---|
There are hundreds of different packages with their corresponding different versions installed at ECMWF. You can use:
No Format |
---|
module avail |
This shows which modules can be loaded right now.
However, not every module is visible there: some only become available once a certain combination of modules is loaded.
You can also use the following command for an overview of all the packages that are installed, including those that may not be visible in module avail:
No Format |
---|
module spider |
In this case we are only interested in CDO so we can do either:
No Format |
---|
module avail cdo |
or
No Format |
---|
module spider cdo |
To load the newest, you can explicitly pick the latest version. Assuming that it is "X.Y.Z":
No Format |
---|
module load cdo/X.Y.Z |
But you can also use the module tag "new":
No Format |
---|
module load cdo/new |
or also ask for the latest with:
No Format |
---|
module --latest load cdo |
Tip | ||
---|---|---|
| ||
If you had another version of the module loaded, the system will automatically swap it for the new one requested. |
Load the netcdf4 module. Can you see what modules you have loaded in your environment now?
Expand | ||||||
---|---|---|---|---|---|---|
| ||||||
To load the netcdf4 module, run module load netcdf4.
Then, you can see what your software environment looks like with module list, or with just the shortcut ml.
You should see both CDO and netcdf4, besides the default modules loaded in your environment. |
Remove the netcdf4 module from your environment and check it is gone.
...
title | Solution |
---|
To unload the netcdf4 module just do:
No Format |
---|
module unload netcdf4 |
or with just the shortcut:
No Format |
---|
ml -netcdf4 |
Then, you can see what your software environment looks like with:
No Format |
---|
module list |
Can you restore the default environment you had when you logged in? Check that
...
title | Solution |
---|
If you log out of your session, next time you log in you will start with a fresh default environment. Modules are only loaded for that specific session.
However, if you don't want to log out, you can also reset your module environment with:
No Format |
---|
module reset |
You can then check the effects with
No Format |
---|
module list |
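The whole sequence from this section can be replayed in one go. This is a sketch meant for an interactive session: try_module is a hypothetical wrapper that runs the module command where it is defined and only prints the command elsewhere, so the listing can also be read through on a machine without the module system.

```shell
#!/usr/bin/env bash
# Run a module command when the module system is present;
# otherwise only print it.
try_module() {
    if type module >/dev/null 2>&1; then
        module "$@"
    else
        echo "[dry run] module $*"
    fi
}

try_module load cdo        # default version of CDO
try_module load netcdf4
try_module list            # cdo and netcdf4 should be listed
try_module unload netcdf4
try_module list            # netcdf4 should be gone
try_module reset           # back to the default environment
try_module list
```

Comparing the three module list outputs shows the effect of each step on the environment.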
...
title | reset vs purge |
---|
...