This system lets users submit jobs automatically when certain points in the daily ECMWF operational forecast suite have been reached. Its main purpose is to make sure that the required data is available before, for example, a MARS request is submitted. The facility runs within the ECaccess environment. It can be used either through the ECaccess web interface or with the ECaccess Web Toolkit, which is available at ECMWF on ecgate and the HPCs and can also be installed locally.

Method

Jobs are submitted through ECaccess. Events, also known as notifications, have been added to ECaccess. Each event corresponds to a stage at which the ECMWF operational activity has produced certain data or products. When you submit a job through ECaccess, you specify the event or events to which the job should be attached. In the web interface, the list of events available to you is shown when you submit a new job. With the ECaccess Web Toolkit, the command ecaccess-event-list shows the list of events, and the command ecaccess-job-submit submits your job.
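
As an illustration, the submission step could be wrapped in a small shell function like the one below. This is only a sketch. The function name submit_on_event is hypothetical, and passing the event identifier with the -eventIds option of ecaccess-job-submit is an assumption that you should check against the toolkit's own help for your version. The event identifier itself comes from the output of ecaccess-event-list.

```shell
# List the events (notifications) your jobs can be attached to:
#   ecaccess-event-list

# Hypothetical helper: submit JOB_FILE and attach it to EVENT_ID.
# ASSUMPTION: ecaccess-job-submit takes the event identifier via -eventIds.
submit_on_event() {
  event_id=$1
  job_file=$2
  # Refuse to call ECaccess without an event or with a missing job file.
  if [ -z "$event_id" ] || [ ! -f "$job_file" ]; then
    echo "usage: submit_on_event EVENT_ID JOB_FILE" >&2
    return 2
  fi
  ecaccess-job-submit -eventIds "$event_id" "$job_file"
}

# Example (EVENT_ID is a placeholder for an identifier from ecaccess-event-list):
#   submit_on_event EVENT_ID myjob.ksh
```

Because the arguments are checked before ECaccess is called, a mistyped job file name is reported locally and does not produce a failed submission.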

Job submission

Before the job is submitted, ECaccess sets several environment variables whose names start with MSJ_, and your job can use them. These variables make it easier to extract exactly the operational data that has just been produced.
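
For example, a job could turn these variables into a MARS request for the data that triggered it. The sketch below assumes that the variables are called MSJ_YEAR, MSJ_MONTH, MSJ_DAY, MSJ_BASETIME and MSJ_STEP, and that month and day are already zero-padded. These names and formats are assumptions, so print the variables that are actually set (for instance with env | grep '^MSJ_') in a test job before you rely on them. The function name and the MARS keywords (operational surface 2 m temperature) are illustrative only.

```shell
# Hypothetical helper: print a MARS request built from the MSJ_ variables.
# ASSUMPTION: variable names MSJ_YEAR, MSJ_MONTH, MSJ_DAY, MSJ_BASETIME, MSJ_STEP,
# with MSJ_MONTH and MSJ_DAY zero-padded (e.g. 03, 07).
msj_mars_request() {
  # ${VAR:?msg} stops the (non-interactive) job if a variable is unset or empty.
  : "${MSJ_YEAR:?not set}" "${MSJ_MONTH:?not set}" "${MSJ_DAY:?not set}"
  : "${MSJ_BASETIME:?not set}" "${MSJ_STEP:?not set}"
  cat <<EOF
retrieve,
  class   = od,
  stream  = oper,
  type    = fc,
  levtype = sfc,
  param   = 2t,
  date    = ${MSJ_YEAR}${MSJ_MONTH}${MSJ_DAY},
  time    = ${MSJ_BASETIME},
  step    = ${MSJ_STEP},
  target  = "fc_${MSJ_YEAR}${MSJ_MONTH}${MSJ_DAY}_${MSJ_BASETIME}_${MSJ_STEP}.grib"
EOF
}
```

Inside the job, the request could then be run with something like msj_mars_request > request.mars followed by mars request.mars.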

Note that jobs submitted through ECaccess are kept in the ECaccess spool. A job attached to an event of the ECMWF operational suite remains in standby mode in ECaccess until it is submitted to the batch system, e.g. Slurm on the Atos HPC. The job and its output files can be retrieved with the ECaccess command ecaccess-job-get.

Jobs can also be rerun: if a job fails, you can ask ECaccess to run it again with the ecaccess-job-restart command.
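
Retrieving and rerunning jobs could be wrapped as in this sketch. The helper names require_job_id, get_job_output and rerun_job are hypothetical. Job identifiers can be listed with ecaccess-job-list. The sketch also assumes that ecaccess-job-get, when called with only a job identifier, fetches the job's output file; check the command's help to see how to retrieve the other files.

```shell
# Hypothetical helpers around ECaccess job commands.
# Job identifiers can be listed with ecaccess-job-list.

# Fail with status 2 when no job identifier is given.
require_job_id() {
  if [ -z "$1" ]; then
    echo "a job identifier is required (see ecaccess-job-list)" >&2
    return 2
  fi
}

# Fetch a job's output from the ECaccess spool.
# ASSUMPTION: with only a job identifier, ecaccess-job-get returns the output.
get_job_output() {
  require_job_id "$1" || return
  ecaccess-job-get "$1"
}

# Ask ECaccess to run a failed job again.
rerun_job() {
  require_job_id "$1" || return
  ecaccess-job-restart "$1"
}
```

Both helpers refuse to run without a job identifier, so a forgotten argument fails locally and not as an ECaccess error.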

Job monitoring

The ECMWF operators can monitor your job through a dedicated interface. For this interface to report the correct status of your job, it is important that your job fails when an error occurs. The simplest way to achieve this is to use the "set -e" command of the Korn shell. Note that this command is radical: it may stop your job even when an unimportant error occurs. "set -e" is also essential for ECaccess to restart your job automatically when it fails.
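
Below is a minimal sketch of how "set -e" fits into a job. These lines would go at the top of your Korn shell job, after its #! line. The scratch directory and file names are hypothetical. Unimportant commands are followed by "|| true", which keeps their failure from ending the job. Keep in mind that "set -e" does not act on commands whose status is tested, for example in an if condition or on the left-hand side of || and &&.

```shell
# Exit at the first failing command with a non-zero status, so that ECaccess
# and the operators' monitoring interface see the job as failed.
set -e

# Hypothetical scratch directory for this illustration.
workdir=$(mktemp -d)
cd "$workdir"

# Unimportant step: "|| true" stops its failure from ending the job.
rm old_result.grib 2>/dev/null || true

# Important step (a placeholder for real work such as a MARS retrieval):
# if this command failed, set -e would end the job here.
echo "processed" > result.txt

echo "job completed in $workdir"
```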

To allow the operators to see your job output files, we recommend that you do not specify the job's standard output and error files. These files are then managed by ECaccess and are visible to the operators.
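
On the Atos HPC, the directives to remove are typically the Slurm ones, #SBATCH --output and #SBATCH --error. The hypothetical helper below prints a copy of a job script without them, including the short forms -o and -e, and leaves all other directives untouched. It is a sketch that only recognises the usual "#SBATCH option" line layout.

```shell
# Hypothetical helper: print JOB_FILE without the Slurm directives that
# redirect standard output and error (--output, --error and the short forms
# -o, -e), so that ECaccess manages those files. Other lines are kept as-is.
strip_output_directives() {
  sed -E '/^#SBATCH[[:space:]]+(--output|--error|-o|-e)([[:space:]=]|$)/d' "$1"
}
```

Usage: strip_output_directives myjob.ksh > myjob_ecaccess.ksh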

If you want to change the content of one of your operational jobs, delete the job in standby mode and resubmit the modified version to ECaccess. If you want to remove an operational job, delete the job in standby mode with the ecaccess-job-delete command.
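
Replacing a job therefore takes two toolkit commands. In this sketch the helper name replace_event_job is hypothetical. The old job identifier comes from ecaccess-job-list, and the sketch assumes that ecaccess-job-submit takes the event identifier through its -eventIds option.

```shell
# Hypothetical helper: replace a standby job with a corrected version.
# ASSUMPTION: ecaccess-job-submit takes the event identifier via -eventIds.
replace_event_job() {
  old_job_id=$1
  event_id=$2
  job_file=$3
  if [ -z "$old_job_id" ] || [ -z "$event_id" ] || [ ! -f "$job_file" ]; then
    echo "usage: replace_event_job OLD_JOB_ID EVENT_ID JOB_FILE" >&2
    return 2
  fi
  # Delete the old standby job first; stop here if that fails.
  ecaccess-job-delete "$old_job_id" || return
  # Submit the corrected job, attached to EVENT_ID.
  ecaccess-job-submit -eventIds "$event_id" "$job_file"
}

# To remove a job without replacing it:
#   ecaccess-job-delete OLD_JOB_ID
```

The old job is deleted before the new one is submitted, and the helper stops if the deletion fails. This prevents two versions of the job from ending up attached to the event.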

Summary

  1. Take your existing batch job.
  2. Optionally, remove the batch directives redirecting the job output and error files, to allow the operators to see these files.
  3. For your convenience, make use of the dynamic environment variables starting with MSJ_.
  4. Optionally, include the "set -e" command, so that ECaccess and the monitoring interface receive the correct job status.
  5. Check the available events with the command ecaccess-event-list.
  6. Submit your job to ECaccess and attach it to the appropriate event, using the ECaccess web interface or the ecaccess-job-submit command.
  7. If you have to correct your job, delete the job in standby mode (ecaccess-job-delete) and resubmit the new version.

For more information, please refer to the User Guide (1.4 MB).
