Created by Xavier Abellan, last modified on Jan 10, 2023
This is the user guide for the Atos Sequana XH2000 HPCF, installed in ECMWF's data centre in Bologna. This platform provides both the HPCF (complexes AA, AB, AC and AD) and the ECGATE service (ECS), which in the past ran on separate platforms.
Below you will find some basic information on the different parts of the system. Please click on the headers or links to get all the details for the given topic.
News Feed
2023-11-22 Change of default versions of ECMWF software packages
When?
The changes will take place on Wednesday 22 November 2023 09:00 UTC
Do I need to do anything?
2023-05-31 Change of default versions of ECMWF and third-party software packages
When?
The changes will take place on Wednesday 31 May 2023 09:00 UTC
Do I need to do anything?
2023-03-27 Scratch automatic purge enabled
The automatic purge of unused files in SCRATCH is now enforced. Any files that have not been accessed in the previous 30 days will be automatically deleted. The purge will run regularly to keep the usage of this filesystem within optimal parameters.
SCRATCH is designed to hold large temporary files and to act as the main storage and working filesystem for the input and output files of your jobs and experiments, but not to keep data for the long term.
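To check which of your files would currently qualify for the purge, a minimal Python sketch follows. It assumes that "accessed" means the file's access time (`st_atime`) as reported by `stat`, and that your SCRATCH path is available in the `SCRATCH` environment variable. Access times may be updated lazily depending on filesystem mount options, and ECMWF's purge tool may apply its own criteria, so treat the output as indicative only. `purge_candidates` is a hypothetical helper, not an ECMWF tool.

```python
import os
import time
from typing import Iterator, Optional

PURGE_AGE_DAYS = 30  # threshold stated in the announcement


def purge_candidates(root: str, days: int = PURGE_AGE_DAYS,
                     now: Optional[float] = None) -> Iterator[str]:
    """Yield files under `root` whose last access time is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: look at the entry itself, do not follow symlinks
                st = os.stat(path, follow_symlinks=False)
            except FileNotFoundError:
                continue  # file disappeared while walking the tree
            if st.st_atime < cutoff:
                yield path


if __name__ == "__main__":
    scratch = os.environ.get("SCRATCH")  # assumed to point at your SCRATCH
    if scratch:
        for path in purge_candidates(scratch):
            print(path)
```

Walking a large SCRATCH tree with `os.walk` can take a while; restricting `root` to one experiment directory keeps the scan short.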
2023-01-18 Improving the time and memory limit management for batch jobs
Explicit time limit honoured
ECMWF will kill jobs that reach their wall time if a time limit was provided with the job, either with #SBATCH --time or with the command-line option --time.
Alternatively, ECMWF accepts jobs submitted without #SBATCH --time or --time. In that case, ECMWF will instead use the average runtime of previous "similar" jobs, identified by a job tag generated from the user, job name, job geometry and job output path.
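The two cases above can be modelled in a short Python sketch: an explicit `--time` value is converted to seconds and honoured, otherwise the limit falls back to the average runtime of earlier jobs sharing the same tag. The `--time` formats handled are those documented for Slurm's `sbatch`. Everything else is an assumption for illustration: `job_tag`, `effective_limit`, the choice of nodes and tasks as "job geometry", the SHA-256 hash and the `history` dictionary are hypothetical and do not describe ECMWF's actual implementation.

```python
import hashlib
from statistics import mean
from typing import Dict, List, Optional


def parse_slurm_time(value: str) -> int:
    """Convert a Slurm --time value to seconds.

    Handles "minutes", "minutes:seconds", "hours:minutes:seconds",
    "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds"
    (not "UNLIMITED").
    """
    if "-" in value:
        day_part, rest = value.split("-", 1)
        days = int(day_part)
        # After "days-" the fields are hours[:minutes[:seconds]]
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"invalid time: {value!r}")
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        days = 0
        fields = [int(f) for f in value.split(":")]
        if len(fields) == 1:      # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:    # minutes:seconds
            hours = 0
            minutes, seconds = fields
        elif len(fields) == 3:    # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"invalid time: {value!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def job_tag(user: str, job_name: str, nodes: int, ntasks: int,
            output_path: str) -> str:
    """Hypothetical tag: jobs with the same tag count as "similar"."""
    key = "|".join([user, job_name, str(nodes), str(ntasks), output_path])
    return hashlib.sha256(key.encode()).hexdigest()[:16]


def effective_limit(time_option: Optional[str], tag: str,
                    history: Dict[str, List[int]]) -> Optional[int]:
    """Time limit in seconds that this sketch would use for the job."""
    if time_option is not None:
        # Explicit --time: the requested limit is honoured
        return parse_slurm_time(time_option)
    runtimes = history.get(tag)
    if runtimes:
        # No --time: fall back to the average runtime of similar jobs
        return round(mean(runtimes))
    return None  # no history: behaviour not described on this page
```

Giving a job a stable name and output path keeps its (hypothetical) tag stable, so the runtime history of earlier runs stays attached to it.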
2022-12-07 Important change in the new Slurm on the Atos HPC
Slurm on the Atos AD complex was updated to version 22.05. AD is now the default cluster, with hpc-login and hpc-batch being aliases for nodes on the AD complex.
The same Slurm version 22.05 has also been installed on the AA and AB complexes and will also be installed on the AC complex.
2022-11-30 Unavailability of AA Atos cluster due to system update
AA, the default Atos cluster, will become unavailable at 08 UTC for essential Slurm and security updates. In preparation for this session:
- The default Atos login/batch cluster was changed to AD at 9 UTC.
- Batch work on AA will be drained, and jobs scheduled to finish after 06 UTC will be automatically redirected to other complexes.
2022-08-10 SSH host keys fixed on all nodes and Cron service
On 2022-08-10 we set up the same SSH host key across the four complexes, using the key of the ac-login node as the master key. This change addresses the ssh errors about changed host keys that could appear when connecting to a given host after an update, such as:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: POSSIBLE DNS SPOOFING DETECTED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The ECDSA host key for hpc-login has changed,
and the key for the corresponding IP address 10.100.192.100
is unknown. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:QdNPyN2jAR5m7ngLbtIUjc2JgzknvFP2flMOGbd1i5k.
Please contact your system administrator.
Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/user/.ssh/known_hosts:4
ECDSA host key for hpc-login has changed and you have requested strict checking.
Host key verification failed.
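If your known_hosts file still holds a stale entry from before the change, the standard fix is OpenSSH's `ssh-keygen -R hpc-login` (and likewise `ssh-keygen -R 10.100.192.100`), which removes all keys for that host. The Python sketch below only illustrates what that cleanup amounts to for plain-text entries: `remove_host_entries` and `clean_known_hosts` are hypothetical helpers, and hashed entries (lines starting with `|1|`) are deliberately left alone, since only `ssh-keygen -R` can match those.

```python
import shutil
from typing import Iterable, List


def remove_host_entries(lines: Iterable[str], hosts: Iterable[str]) -> List[str]:
    """Drop plain-text known_hosts entries whose host field names any of `hosts`.

    `hosts` must be a list of names/addresses, not a single string.
    Comments, blank lines, marker lines (@cert-authority, @revoked) and
    hashed entries (|1|...) are kept unchanged.
    """
    targets = set(hosts)
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "@", "|")):
            kept.append(line)
            continue
        # The first field is a comma-separated list of host names/addresses
        names = set(stripped.split()[0].split(","))
        if names.isdisjoint(targets):
            kept.append(line)
    return kept


def clean_known_hosts(path: str, hosts: Iterable[str]) -> None:
    """Rewrite `path` without the matching entries, keeping a .bak copy."""
    shutil.copy2(path, path + ".bak")
    with open(path) as f:
        lines = f.readlines()
    with open(path, "w") as f:
        f.writelines(remove_host_entries(lines, hosts))
```

For example, `clean_known_hosts(os.path.expanduser("~/.ssh/known_hosts"), ["hpc-login", "10.100.192.100"])` would remove the offending line reported in the message above, provided it is stored unhashed. On the next connection ssh asks you to accept the new key.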
2022-07-20 Atos ecgate cluster in Bologna available to all ECMWF Member State ecgate users
We are pleased to announce the availability of the general-purpose Atos computing service, named 'ecs', in Bologna, which will replace the ecgate service in Reading.
We invite you to start testing, on 'ecs' in Bologna, the activities you currently run on ecgate in Reading. To help you with this work, we have made available the Atos HPCF and ecgate documentation. We strongly encourage you to read those pages carefully before you start your tests on 'ecs'. Interactive login access to the systems in Bologna will no longer be through ECaccess, but through Teleport.
2022-06-22 AA unavailability and new generic Atos HPCF host names
On 22 June 2022 at 10 UTC the AA complex became unavailable for essential hardware and software upgrades. In preparation, the complex was drained, and jobs due to finish after 22 June 2022 at 10 UTC did not run. Any jobs still running on the system at that time would have been killed. You may continue to use the AB complex for your interactive and batch workloads.
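The drain rule can be illustrated with a simplified model: a job only starts if its start time plus its time limit falls no later than the start of the session. The Python sketch below uses the 22 June 2022 10 UTC session as an example; `can_start_before_session` is a hypothetical helper, not part of Slurm or ECMWF tooling, and the real scheduler decision involves more factors than this single comparison.

```python
from datetime import datetime, timedelta, timezone


def can_start_before_session(now: datetime, time_limit: timedelta,
                             session_start: datetime) -> bool:
    """Return True if a job started at `now` with `time_limit` ends by `session_start`.

    Simplified model: the expected end (start + time limit) must not
    overlap the maintenance session.
    """
    return now + time_limit <= session_start


if __name__ == "__main__":
    session = datetime(2022, 6, 22, 10, 0, tzinfo=timezone.utc)
    now = datetime(2022, 6, 22, 7, 0, tzinfo=timezone.utc)
    print(can_start_before_session(now, timedelta(hours=2), session))  # True
    print(can_start_before_session(now, timedelta(hours=4), session))  # False
```

In this model, a job with an accurate, short time limit can still be scheduled shortly before a session, while the same job with a generous limit would have to wait until afterwards.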
2022-03-01 Switch PERM to its final NFS location
In the early days of the Atos HPCF, your PERM lived temporarily in Lustre while we waited for the storage infrastructure to be completed. On Tuesday 1 March 2022 at 09 UTC, during a 2-hour system session, we switched PERM to its new permanent location based on TrueNAS NFS, which has greater capacity.
This relocation should have been transparent to you, so you don't need to do anything. During the days before the session, all your data was copied in the background from your Lustre PERM to the new PERM, with a final sync during the system session to ensure a clean switchover. Your new PERM is now automounted when you access a node, either interactively or via Slurm.