...
You want to use CDO, a popular tool to manipulate climate and NWP model data. What do you need to do to get the following result?
```
$ cdo --version
Climate Data Operators version X.Y.Z (https://mpimet.mpg.de/cdo)
System: x86_64-pc-linux-gnu
...
```
Solution: If you run the command without any prior action, you may get:
```
$ cdo --version
-bash: cdo: command not found
```
Many software packages and tools are not part of your default environment, and need to be explicitly loaded via modules.
So the following commands would be sufficient to get to the desired result:
```
module load cdo
cdo --version
```
Tip (ml shortcut): You can also use the ml shortcut to load the module:
```
ml cdo
```
Note that we did not ask for any specific version. In that case, you get the version defined as the default.
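In a script, you may prefer to load a module only when the command you need is not already on your PATH. The following is a minimal bash sketch of that pattern, not an ECMWF-provided tool: the helper name ensure_cmd is hypothetical, and it assumes the module command (a shell function on the Atos HPCF and ECS) is defined in the shell that runs it.

```shell
#!/usr/bin/env bash
# Load a module only if a command is not already available (sketch).
# ensure_cmd is a hypothetical helper, not part of the ECMWF module system.
# Usage: ensure_cmd <command> [<module>]   (module defaults to the command name)
ensure_cmd() {
  local cmd="$1" mod="${2:-$1}"
  # Nothing to do if the command is already on PATH
  if command -v "$cmd" >/dev/null 2>&1; then
    return 0
  fi
  # The module command is a shell function; it may be missing in some shells
  if ! command -v module >/dev/null 2>&1; then
    echo "ensure_cmd: '$cmd' not found and no module command available" >&2
    return 1
  fi
  # No version given, so the default version of the module is loaded
  module load "$mod" || return 1
  # Confirm the command is now available
  command -v "$cmd" >/dev/null 2>&1
}
```

For example, ensure_cmd cdo && cdo --version loads the default cdo module only if cdo is not found yet. Pass a second argument when the module name differs from the command name.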
How many versions of CDO can be used at ECMWF? Can you pick the newest?
Solution: Hundreds of packages are installed at ECMWF, many of them in several versions. You can use:
```
module avail
```
to see which modules can be loaded in your current environment.
However, not all modules are available at all times: some only become available once a certain combination of other modules has been loaded.
For an overview of all the packages that are installed, including those that may not be visible in module avail, you can also use:
```
module spider
```
In this case we are only interested in CDO, so we can run either:
```
module avail cdo
```
or
```
module spider cdo
```
To load the newest version, you can pick it explicitly. Assuming it is "X.Y.Z":
```
module load cdo/X.Y.Z
```
But you can also use the module tag "new":
```
module load cdo/new
```
or ask for the latest version with:
```
module --latest load cdo
```
Tip (no swap needed): If you had another version of the module loaded, the system automatically swaps it for the newly requested one.
Load the netcdf4 module. Can you see which modules you have loaded in your environment now?
Solution: To load the netcdf4 module, just run:
```
module load netcdf4
```
Then, you can see what your software environment looks like with:
```
module list
```
or with just the shortcut:
```
ml
```
You should see both CDO and netcdf4, in addition to the default modules loaded in your environment.
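If you need the same check inside a script, rather than reading the output of module list by eye, one option is to inspect the LOADEDMODULES environment variable. This is a sketch under the assumption that the module system maintains LOADEDMODULES as a colon-separated list of loaded modules in name/version form (as Lmod does); the helper name is_loaded is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if a module (any version) is listed in LOADEDMODULES (sketch).
# is_loaded is a hypothetical helper; LOADEDMODULES is assumed to hold a
# colon-separated list of loaded modules such as "cdo/X.Y.Z:netcdf4/X.Y.Z".
is_loaded() {
  local name="$1" entry
  local IFS=':'   # split LOADEDMODULES on colons in the loop below
  for entry in ${LOADEDMODULES:-}; do
    # Match either a bare "name" or any "name/version" entry
    if [[ "$entry" == "$name" || "$entry" == "$name"/* ]]; then
      return 0
    fi
  done
  return 1
}
```

After module load netcdf4, is_loaded netcdf4 && echo loaded should print "loaded", and after module unload netcdf4 it should print nothing. The match is on the full module name, so netcdf does not match netcdf4.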
Remove the netcdf4 module from your environment and check that it is gone.
Solution: To unload the netcdf4 module, just run:
```
module unload netcdf4
```
or with just the shortcut:
```
ml -netcdf4
```
Then, you can see what your software environment looks like with:
```
module list
```
Can you find the installation directory of the default netCDF4 library?
Solution: All modules at ECMWF define a <PACKAGE_NAME>_DIR environment variable, which can be useful to pass to configuration files or scripts. Packages providing libraries, such as netCDF4, will also typically define <PACKAGE_NAME>_LIB and <PACKAGE_NAME>_INCLUDE. You can check the values of all the variables a module would define, without loading it, by running:
```
module show netcdf4
```
or with just the shortcut:
```
ml show netcdf4
```
In the output, you can then spot the value of NETCDF4_DIR, pointing to /usr/local/apps/netcdf4/X.Y.Z/COMPILER_FAMILY/COMPILER_VERSION.
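Once a module is loaded, the same variables are available in your shell and can be passed on to build scripts. The bash sketch below prints the <PACKAGE_NAME>_DIR, <PACKAGE_NAME>_LIB and <PACKAGE_NAME>_INCLUDE values for a package. The helper name pkg_vars is hypothetical, and deriving the variable prefix by upper-casing the module name and turning hyphens into underscores is an assumption that matches NETCDF4_DIR for netcdf4 but may not hold for every package.

```shell
#!/usr/bin/env bash
# Print <PACKAGE_NAME>_DIR, _LIB and _INCLUDE for a package (sketch, bash 4+).
# pkg_vars is a hypothetical helper. Deriving the prefix by upper-casing the
# module name and turning hyphens into underscores is an assumption.
pkg_vars() {
  local prefix="${1^^}"        # netcdf4 -> NETCDF4
  prefix="${prefix//-/_}"      # ecmwf-toolbox -> ECMWF_TOOLBOX (assumed)
  local suffix var
  for suffix in DIR LIB INCLUDE; do
    var="${prefix}_${suffix}"
    # ${!var} is bash indirect expansion: the value of the variable named in $var
    printf '%s=%s\n' "$var" "${!var:-<unset>}"
  done
}
```

After module load netcdf4, running pkg_vars netcdf4 should show NETCDF4_DIR with the same path that module show reported, plus the library and include variables if the module defines them.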
Can you restore the default environment you had when you logged in? Check that the environment is back to the desired state.
Solution: If you log out of your session, next time you log in you will start with a fresh default environment. Modules are only loaded for that specific session.
However, if you don't want to log out, you can also reset your module environment with:
```
module reset
```
You can then check the effects with:
```
module list
```
Tip (reset vs purge): There is a subtle difference between module reset and module purge. While the former goes back to the default environment, which typically contains some default modules, the latter completely unloads all modules and leaves you with a blank environment.
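To see the difference for yourself, you can count the loaded modules after each command. This is a sketch that assumes, as above, that the module system keeps a colon-separated list of loaded modules in LOADEDMODULES; the helper name count_loaded is hypothetical, and the module commands only run where the module function is defined.

```shell
#!/usr/bin/env bash
# Count loaded modules by splitting LOADEDMODULES on colons (sketch).
# count_loaded is a hypothetical helper.
count_loaded() {
  local IFS=':'
  # Intentionally unquoted: each colon-separated entry becomes one argument
  set -- ${LOADEDMODULES:-}
  echo "$#"
}

# Only try the module commands where the module function is defined
if command -v module >/dev/null 2>&1; then
  module reset
  echo "after reset: $(count_loaded) modules"   # the default set
  module purge
  echo "after purge: $(count_loaded) modules"   # typically none left
  module reset                                   # return to the default environment
fi
```

The final module reset matters: after a purge you are left without the default modules until you reset or log in again.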
You want the git module to be loaded by default in every session and job on the Atos HPCF or ECS. How would you do that? Check that it works by opening a new session.
Solution: You can use the ~/.profile (or ~/.bash_profile, if it exists) shell initialisation file to add the modules you wish to have loaded by default. Edit the file with your favourite editor, and add the following snippet:
```
if [[ "$ECPLATFORM" == "hpc2020" ]]; then
    module load git
fi
```
Info (Atos-specific setting): Note that we use this if statement to make sure the module load is not attempted on other platforms that share the same HOME but have no modules, such as your VDI or ecFlow VMs. Otherwise, you may get errors when working on those platforms.
You can now open a new tab in your terminal and start a new session on the Atos HPCF or ECS. You should see the git module loaded when running:
```
module list
```
You may now remove the snippet you just added to the shell initialisation file.
ECMWF tools
...
To ensure a default environment for the following exercises, reset your modules with:
```
module reset
```
Try to run the command below. Why does it fail? Can you make it work without installing pandas yourself?
```
$ python3 -c "import pandas as pd; print(pd.__version__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
```
Solution: The system Python 3 installation is very limited and does not include many popular extra packages such as pandas. Instead, you may use the Python 3 stack available through modules, which comes with almost 400 of those extra packages:
```
module load python3
```
After that, if you repeat the command, it should complete successfully and print the pandas version:
```
python3 -c "import pandas as pd; print(pd.__version__)"
```
Run the command below. It checks whether you have a working setup for using Metview from Python:
```
python3 -m metview selfcheck
```
Did it work? What do you need to do to get the following output?
```
$ python3 -m metview selfcheck
Trying to connect to a Metview installation...
Hello world - printed from Metview!
Metview version X.Y.Z found
Your system is ready.
```
Solution: Certain extra Python packages that are bindings to non-Python libraries and tools, such as Metview, make use of the existing installations on the system. You need to ensure the appropriate modules are loaded before running your Python code. In this case, Metview is part of the ecmwf-toolbox module, so load it first:
```
module load ecmwf-toolbox
python3 -m metview selfcheck
```
What do you need to do to make Python use the latest version of Metview available on the system?
Solution: Just ensure you have the latest ecmwf-toolbox loaded:
```
module load --latest ecmwf-toolbox
python3 -m metview selfcheck
```
You need to use the latest version of pandas to run a given application. What can you do (without using conda)?
Solution: In that case you could use pip to install it yourself. However, installing it directly into your user environment is highly discouraged, since it may interfere with other applications you run, or break after default software updates on the system side. Instead, for small additions to the default environment it is much more robust to use a Python virtual environment.
In this case, you may create a virtual environment based on the installations provided, and just add the new version of pandas:
```
module load python3
mkdir -p $PERM/venvs
cd $PERM/venvs
python3 -m venv --system-site-packages myvenv
```
Then you can activate it only when you need it with:
```
source $PERM/venvs/myvenv/bin/activate
```
Note that we used $PERM/venvs as the location of these virtual environments, but you may decide to put them in another location.
With the environment activated, you can now install the new version of pandas:
```
pip install -U pandas
```
Then you can rerun the version command to check that you got the latest version:
```
python3 -c "import pandas as pd; print(pd.__version__)"
```
When you have finished with your environment, you can deactivate it with:
```
deactivate
```
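If you use virtual environments often, you may want to wrap the steps above in a small function that creates the environment the first time and simply activates it afterwards. This is a minimal bash sketch, assuming $PERM is defined as on ECMWF systems; the function name use_venv is hypothetical.

```shell
#!/usr/bin/env bash
# Create (first time only) and activate a venv under $PERM/venvs that
# builds on the packages of the python3 module (sketch).
# use_venv is a hypothetical helper; $PERM is assumed to be set.
use_venv() {
  local name="$1"
  if [[ -z "${PERM:-}" ]]; then
    echo "use_venv: PERM is not set" >&2
    return 1
  fi
  local dir="$PERM/venvs/$name"
  # Use the Python stack from modules when the module command is available
  if command -v module >/dev/null 2>&1; then
    module load python3 || return 1
  fi
  # Create the environment only if it does not exist yet
  if [[ ! -d "$dir" ]]; then
    mkdir -p "$PERM/venvs"
    python3 -m venv --system-site-packages "$dir" || return 1
  fi
  # shellcheck disable=SC1091
  source "$dir/bin/activate"
}
```

Define the function in your interactive shell (for example by sourcing a file that contains it), since activating a virtual environment inside a separately executed script does not affect the shell you launched it from. Then use_venv myvenv followed by pip install -U pandas reproduces the steps above, and deactivate leaves the environment as before.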
You may also use conda to create your own software stack with Python packages and beyond. To use conda, load the corresponding module:
```
module load conda
```
What happened?
Answer: While conda may be seen as a way to set up custom Python environments, it also manages software beyond that, installing other packages and libraries not necessarily related to Python itself.
Because those may conflict with the software made available through modules, loading the conda module effectively disables all the other modules that may be loaded in your environment.
As you may have seen, loading conda disabled a number of modules. You can also check this by running:
```
module list
```
You would then need to install everything you need to run your application or workflow in your conda environment.
If you want to go back to the previous environment without conda but with all the other modules, the recommended way is to reset the environment and then explicitly load all the necessary modules again:
```
module reset
module load python3
```
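Rather than retyping the module list by hand, you could record it before loading conda and reload it after the reset. The sketch below is one way to do that, assuming the module system keeps the loaded modules, in load order, in the colon-separated LOADEDMODULES variable. The helper names save_modules and restore_modules are hypothetical, and reloading every saved module explicitly is an assumption that suits simple cases but may need adjusting for particular module combinations.

```shell
#!/usr/bin/env bash
# Remember the loaded modules, then restore them after "module reset" (sketch).
# save_modules and restore_modules are hypothetical helpers.
save_modules() {
  SAVED_MODULES="${LOADEDMODULES:-}"
}

restore_modules() {
  local -a mods=()
  local mod
  # Split the saved colon-separated list into an array
  # (IFS is changed only for this read, not for the module calls below)
  IFS=':' read -r -a mods <<< "${SAVED_MODULES:-}"
  module reset || return 1
  for mod in "${mods[@]}"; do
    module load "$mod" || echo "restore_modules: could not load $mod" >&2
  done
}
```

A typical sequence would be save_modules, then module load conda and your conda work, then restore_modules to get back to the modules you had before.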
Create a new conda environment with the latest pandas in it, and check its version. Hint: you can also use mamba to speed up the environment creation process.
Solution: With the conda module loaded, you can create a new environment from the conda-forge channel containing Python and pandas, activate it, and check the pandas version:
```
mamba create -n mypandas -c conda-forge python pandas
conda activate mypandas
python3 -c "import pandas as pd; print(pd.__version__)"
```
...
To ensure a default environment for the following exercise, reset your modules with:
```
module reset
```
The default psql command, part of the PostgreSQL package, is not up to date. You need to run the latest version, but you do not want to build it from source. A possible solution is to use a containerised version of this application. Can you run this on the Atos HPCF or ECS?
Solution: You can use Apptainer to run Docker or other OCI-compatible container images. We can use the official postgres container image from Docker Hub:
```
module load apptainer
apptainer exec docker://postgres:latest psql --version
```
You can also download the image once and run it directly later with:
```
apptainer pull docker://postgres:latest
./postgres_latest.sif psql --version
```
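If you run the containerised tool repeatedly, you may prefer to pull the image only once and reuse the local .sif file. This is a minimal bash sketch of that pattern, using apptainer pull with an explicit output file name followed by apptainer exec; the helper name run_in_image is hypothetical.

```shell
#!/usr/bin/env bash
# Pull a container image once into a local .sif file, then run a command in it.
# run_in_image is a hypothetical helper (sketch). Usage:
#   run_in_image <image-uri> <sif-file> <command> [args...]
run_in_image() {
  local uri="$1" sif="$2"
  shift 2
  # Make apptainer available via modules if it is not on PATH yet
  if ! command -v apptainer >/dev/null 2>&1 && command -v module >/dev/null 2>&1; then
    module load apptainer || return 1
  fi
  # Download and convert the image only the first time
  if [[ ! -f "$sif" ]]; then
    apptainer pull "$sif" "$uri" || return 1
  fi
  # Run the remaining arguments as a command inside the container
  apptainer exec "$sif" "$@"
}
```

For example, run_in_image docker://postgres:latest postgres_latest.sif psql --version pulls the image on the first call and reuses the file afterwards. Because the file is reused, a newer "latest" image is only picked up after you delete the .sif file.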