- aborted
When the ECF_JOB_CMD fails or the job file sends an ecflow_client --abort child command, the task is placed into the aborted state.
- active
If job creation was successful and the job file has started, the ecflow_client --init child command is received by the ecflow_server and the task is placed into the active state.
- autocancel
autocancel is a way to automatically delete a node which has completed. For BNF see autocancel
- check point
The check point file is like the suite definition, but includes all the state information.
It is periodically saved by the ecflow_server.
It can be used to recover the state of the node tree should the server die or the machine crash.
By default, when a server is started it looks for a check point file to load.
- child command
Child commands (or task requests) are called from within the ecf script files. They include:
ecflow_client --init # Sets the task to the active status
ecflow_client --event # Sets an event
ecflow_client --meter # Changes a meter
ecflow_client --label # Changes a label
ecflow_client --msg # Sends a message to the ecFlow log file
ecflow_client --wait # Waits for an expression to evaluate
ecflow_client --abort # Sets the task to the aborted status
ecflow_client --complete # Sets the task to the complete status
- clock
A clock is an attribute of a suite.
A clock always runs in phase with the system clock (UTC in UNIX) but can have any offset from the system clock.
The clock must be either hybrid or real:
Under a hybrid clock, the date never changes unless specifically altered or unless the suite restarts, either automatically or from a begin command.
Under a real clock, the date advances by one day at midnight.
Time and date dependencies work a little differently under the two clocks. The default clock type is hybrid. For BNF see clock
- complete
The node can be set to complete:
By the complete trigger
At job end, when the task receives the ecflow_client --complete child command
- complete trigger
Forces a node to be complete if the expression evaluates to true, without running any of the nodes.
This allows you to have tasks in the suite which run only if others fail. In practice the node would also need to have a trigger.
- cron
Like time, cron defines a time dependency for a node, but it can allow the node to be repeated indefinitely. For BNF see cron
- date
This defines a date dependency for a node.
There can be multiple date dependencies. The European format is used for dates, which is dd.mm.yyyy, as in 31.12.2007. Any of the three number fields can be replaced with the wildcard * to mean any valid value. Thus, 01.*.* means the first day of every month of every year. For BNF see date
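The wildcard rule can be illustrated with a short Python sketch. This is not how the ecflow_server evaluates dates; the function name `date_matches` is hypothetical, and the sketch assumes a four-digit year field as in the example 31.12.2007.

```python
from datetime import date


def date_matches(pattern: str, day: date) -> bool:
    """Return True if `day` satisfies a dd.mm.yyyy pattern.

    Any of the three fields may be '*', meaning any valid value.
    (Hypothetical helper for illustration only.)
    """
    fields = pattern.split(".")
    if len(fields) != 3:
        raise ValueError(f"expected dd.mm.yyyy, got {pattern!r}")
    actual = (day.day, day.month, day.year)
    # A field matches if it is the wildcard or equals the actual value.
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


# First day of every month of every year.
print(date_matches("01.*.*", date(2007, 3, 1)))   # True
print(date_matches("01.*.*", date(2007, 3, 2)))   # False
```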
- day
This defines a day dependency for a node.
There can be multiple day dependencies. For BNF see day
- defstatus
Defines the default status for a task/family to be assigned to the node when the begin command is issued.
By default, a node is queued when you use begin on a suite. defstatus is useful for preventing suites from running automatically once begun, or for setting tasks complete so that they can be run selectively. For BNF see defstatus
- dependencies
Dependencies are attributes of a node. They include trigger, date, day, time, today, cron, complete trigger, inlimit and limit.
A node that is dependent cannot be started as long as some dependency is holding it.
- directives
directives are expanded during pre-processing. Examples include:
%include <filename>
%comment : starts a comment, which is ended by the %end directive. The section enclosed by %comment - %end is removed during pre-processing.
%manual : starts a manual, which is ended by the %end directive. The section enclosed by %manual - %end is removed during pre-processing. However, the manual directive is used to create the manual page.
%nopp : stops pre-processing until a line starting with %end is found.
%end : ends the section started by %comment, %manual or %nopp.
%VAR% : directs the server to perform variable substitution. This involves searching for a suite definition variable or generated variable named VAR and substituting the value of the variable.
- ecf script
The ecFlow script refers to an ‘.ecf’ file.
This is similar to a UNIX shell script. The differences, however, include the addition of C-like pre-processing directives and ecFlow variables.
- ecFlow
ecFlow is the Supervisor Monitoring Scheduler software in place at ECMWF that helps with the design, submission and monitoring of computer jobs in both the research and the operations departments.
- ecflow_client
This executable is a command line program; it is used for all communication with the server.
To see the full range of commands that can be sent to the ecflow_server, type the following in a UNIX shell:
ecflow_client --help
This functionality is also provided by the ecFlow Python API; see class ecflow.Client
- ecflow_server
This executable is the server.
It is responsible for scheduling the jobs and responding to ecflow_client requests.
Multiple servers can be run on the same machine/host, provided each is assigned a unique port number.
The server records all requests in the log file.
The server will periodically write out a check point file.
A check point file is the suite definition with additional state information.
- ecflowview
The ecflowview executable is the GUI-based client that is used to visualise and monitor:
The hierarchical structure of the suite definition
State changes in the nodes and the ecflow_server, using colour coding
Attributes of the nodes and any dependencies
The ecf script file and the corresponding job file
- event
The purpose of an event is to signal partial completion of a task and to be able to trigger another job which is waiting for this partial completion.
Only tasks can have events and they can be considered as an attribute of a task.
There can be many events and they are displayed as nodes.
An event has a number and possibly a name. If it is only defined as a number, its name is the text representation of the number without leading zeroes. For BNF see event
- extern
This allows an external node to be used in a trigger expression.
All nodes in triggers must be known to the ecflow_server by the end of the load command. No cross-suite dependencies are allowed unless the names of tasks outside the suite are declared as external. An external trigger reference is considered unknown if it is not defined when the trigger is evaluated. You are strongly advised to avoid cross-suite dependencies.
Families and suites that depend on one another should be placed in a single suite. If you think you need cross-suite dependencies, you should consider merging the suites together and have each as a top-level family in the merged suite. For BNF see extern
- family
A family is an organisational entity that is used to provide hierarchy and grouping. It consists of a collection of tasks and families.
Typically you place tasks that are related to each other inside the same family, analogous to the way you create directories to contain related files. For BNF see family
It serves as an intermediate node in a suite definition.
- halted
Is an ecflow_server state.
The following table reflects the server capabilities in the different states:
State     User Request  Task Request  Job Scheduling  Auto-Check-pointing
running   yes           yes           yes             yes
shutdown  yes           yes           no              yes
halted    yes           no            no              no
- inlimit
inlimit works in conjunction with limit to provide simple load management.
inlimit is added to the node that needs to be limited.
- job creation
The process of job creation includes:
o Locating ecf script files corresponding to the task in the suite definition
The steps above transform an ecf script into a job file that can be submitted.
The running jobs communicate back to the ecflow_server by calling child commands.
This causes status changes on the nodes in the ecflow_server, and flags can be set to indicate various events.
- job file
The job file is created by the ecflow_server during job creation.
It is derived from the ecf script after expanding the pre-processing directives.
It has the extension ".job{try number}", e.g. t1.job1
- label
A label has a name and a value and is a way of displaying information in ecflowview. For BNF see label
- late
Defines a tag for a node to be late.
Suites cannot be late, but you can define a late tag for submitted in a suite, to be inherited by the families and tasks. When a node is classified as being late, the only action the ecflow_server takes is to set a flag. ecflowview displays this flag as an icon alongside the node name (and optionally pops up a window). For BNF see late
- limit
limit provides simple load management, for example by limiting the number of tasks submitted to a specific server. Typically you either define limits at suite level or define a separate suite to hold limits so that they can be used by multiple suites. For BNF see limit and inlimit
- manual page
Manual pages are part of the ecf script.
This is to ensure that the manual page is updated when the script is updated. The manual page is a very important operational tool, allowing you to view a description of a task and possibly solutions to common problems. Pre-processing can be used to extract the manual page from the script file, and the page is visible in ecflowview. The manual page is the text contained between the %manual and %end directives. It can be seen using the manual button in ecflowview.
- meter
The purpose of a meter is to signal proportional completion of a task and to be able to trigger another job which is waiting on this proportional completion. For BNF see meter
- node
suite, family and task form a hierarchy, where the suite serves as the root, the families provide the intermediate nodes, and the tasks provide the leaves.
Collectively suite, family and task can be referred to as nodes.
- pre-processing
Pre-processing takes place during job creation and acts on directives specified in the ecf script file.
This involves:
o expanding any include file directives, i.e. similar to C language pre-processing
o removing comments and manual directives
o performing variable substitution
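As a rough illustration of the second step, the Python sketch below drops %comment and %manual sections from a list of script lines. It is a simplified model, not the ecflow_server implementation: the function name `strip_comment_and_manual` is hypothetical, '%' is assumed to be the ECF_MICRO character, and %include, %nopp and variable substitution are not handled.

```python
from typing import Iterable, List


def strip_comment_and_manual(lines: Iterable[str]) -> List[str]:
    """Remove %comment ... %end and %manual ... %end sections.

    Hypothetical helper for illustration; directives are assumed to
    appear alone on their line.
    """
    kept: List[str] = []
    skipping = False
    for line in lines:
        directive = line.strip()
        if skipping:
            # Inside a removed section: drop lines up to and including %end.
            if directive == "%end":
                skipping = False
            continue
        if directive in ("%comment", "%manual"):
            skipping = True
            continue
        kept.append(line)
    return kept


script = [
    "%manual",
    "Call the operator if this task aborts.",
    "%end",
    "echo hello",
    "%comment",
    "temporary note",
    "%end",
    "echo done",
]
print(strip_comment_and_manual(script))  # ['echo hello', 'echo done']
```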
- queued
After the begin command, tasks without a defstatus are placed into the queued state.
- repeat
Repeats provide looping functionality. There can only be a single repeat on a node.
repeat day step [ENDDATE] # only for suites
repeat integer VARIABLE start end [step]
repeat enumerated VARIABLE first [second [third ...]]
repeat string VARIABLE str1 [str2 ...]
repeat file VARIABLE filename
repeat date VARIABLE yyyymmdd yyyymmdd [delta]
The repeat VARIABLE can be used in trigger and complete trigger expressions. For BNF see repeat
- running
Is an ecflow_server state.
The following table reflects the server capabilities in the different states:
State     User Request  Task Request  Job Scheduling  Auto-Check-pointing
running   yes           yes           yes             yes
shutdown  yes           yes           no              yes
halted    yes           no            no              no
- scheduling
The ecflow_server is responsible for task scheduling.
It will check dependencies in the suite definition every minute. If these dependencies are free, the ecflow_server will submit the task. See job creation.
- shutdown
Is an ecflow_server state.
The following table reflects the server capabilities in the different states:
State     User Request  Task Request  Job Scheduling  Auto-Check-pointing
running   yes           yes           yes             yes
shutdown  yes           yes           no              yes
halted    yes           no            no              no
- status
Each node in the suite definition has a status.
Status reflects the state of the node. In ecflowview the background colour of the text reflects the status.
Task statuses are: unknown, queued, submitted, active, complete, aborted and suspended.
ecflow_server statuses are: shutdown, halted and running. The server status is shown on the root node in ecflowview.
- submitted
When the task dependencies are resolved/free, the ecflow_server places the task into the submitted state. However, if the ECF_JOB_CMD fails, the task is placed into the aborted state.
- suite
A suite is an organisational entity. It serves as the root node in a suite definition. It should be used to hold a set of jobs that achieve a common function. It can be used to hold user variables that are common to all of its children.
Only a suite node can have a clock.
It is a collection of families, variables, a repeat and a single clock definition. For a complete list of attributes look at BNF for suite
- suite definition
The suite definition is the hierarchical node tree.
It describes how your tasks run and interact.
It can be built up using:
An ASCII text file, by following the rules defined in the ecFlow Definition file Grammar.
Hence any language can be used to generate this format.
Once the definition is built, it can be loaded into the ecflow_server and started. It can be monitored by ecflowview.
- suspended
Is a node state. A node can be placed into the suspended state via a defstatus or via ecflowview.
A suspended node, including any of its children, cannot take part in scheduling until the node is resumed.
- task
A task represents a job that needs to be carried out. It serves as a leaf node in a suite definition.
Only tasks can be submitted.
A job inside a task ecf script should generally be re-entrant so that no harm is done by rerunning it, since a task may be automatically submitted more than once if it aborts.
For BNF see task
- time
This defines a time dependency for a node.
Time is expressed in the format [h]h:mm. Only numeric values are allowed. There can be multiple time dependencies for a node, but overlapping times may cause unexpected results. To define a series of times, specify the start time, end time and a time increment. If the start time begins with ‘+’, times are relative to the beginning of the suite or, in repeated families, relative to the beginning of the repeated family. For BNF see time
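To make the "series of times" form concrete, here is a small Python sketch that expands a start time, end time and increment into individual times. The helper `time_series` is hypothetical; it assumes the end time is included when the increment lands on it exactly, and it does not handle relative times that begin with '+'.

```python
from typing import List


def _to_minutes(hhmm: str) -> int:
    """Convert '[h]h:mm' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def time_series(start: str, end: str, increment: str) -> List[str]:
    """Expand start/end/increment into a list of hh:mm strings.

    Hypothetical helper: the end time is treated as inclusive (assumption).
    """
    first, last, step = (_to_minutes(t) for t in (start, end, increment))
    if step <= 0:
        raise ValueError("increment must be greater than 00:00")
    return [f"{t // 60:02d}:{t % 60:02d}" for t in range(first, last + 1, step)]


print(time_series("10:00", "12:00", "00:30"))
# ['10:00', '10:30', '11:00', '11:30', '12:00']
```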
- today
Like time, but “today” does not wrap to tomorrow.
If the suite's begin time is past the time given for the "today" dependency, the node is free to run (as far as the time dependency is concerned). For BNF see today
- trigger
A trigger defines a dependency for a task or family.
There can be only one trigger dependency per node, but that can be a complex boolean expression of the status of several nodes. Triggers should be avoided on suites. A node with a trigger can only be activated when its trigger has expired. A trigger holds the node as long as the trigger expression evaluates to false.
- unknown
This is the default node status when a suite definition is loaded into the ecflow_server.
- variable
ecFlow makes heavy use of variables. There are several kinds:
Environment variables: these are set in the UNIX shell before ecFlow starts. They control the ecflow_server and ecflow_client.
Suite definition variables: also referred to as user variables. They control the ecflow_server and ecflow_client, and are available for use in the job file.
Generated variables: These are generated within the suite definition node tree during job creation and are available for use in the job file.
For BNF see variable
- variable inheritance
When a variable is needed at job creation time, it is first sought in the task itself.
If it is not found in the task, it is sought from the task’s parent and so on, up through the node levels until found.
For any node, there are two places to look for variables.
Suite definition variables are looked for first, and then any generated variables.
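The lookup order can be sketched in Python as follows. This is a toy model of the rule described above, not the ecFlow data model: the `Node` class, its field names and `find_variable` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Toy stand-in for a suite, family or task (hypothetical)."""
    name: str
    parent: Optional["Node"] = None
    user_variables: Dict[str, str] = field(default_factory=dict)
    generated_variables: Dict[str, str] = field(default_factory=dict)


def find_variable(node: Node, name: str) -> Optional[str]:
    """Search the node, then each ancestor in turn.

    On every node the suite definition (user) variables are checked
    before the generated variables.
    """
    current: Optional[Node] = node
    while current is not None:
        if name in current.user_variables:
            return current.user_variables[name]
        if name in current.generated_variables:
            return current.generated_variables[name]
        current = current.parent
    return None


suite = Node("s1", user_variables={"HOST": "hostA"})
family = Node("f1", parent=suite)
task = Node("t1", parent=family,
            user_variables={"SLEEP": "2"},
            generated_variables={"TASK": "t1"})

print(find_variable(task, "SLEEP"))  # 2      (found on the task itself)
print(find_variable(task, "HOST"))   # hostA  (inherited from the suite)
```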
- variable substitution
Takes place during pre-processing.
It involves searching each line of the ecf script file for the ECF_MICRO character, typically '%'.
The text between two % characters defines a variable, e.g. %VAR%
This variable is searched for in the suite definition.
First the suite definition variables (sometimes referred to as user variables) are searched, and then the generated variables.
The variable reference, including the % characters, is replaced by the value of the variable.
If the variable is not found in the suite definition during pre-processing, then job creation fails, an error message is written to the log file, and the task is placed into the aborted state.
To avoid this, variables in the ecf script can be defined as:
%VAR:replacement% : This is similar to %VAR%, but if VAR is not found in the suite definition then 'replacement' is used.
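The behaviour of %VAR% and %VAR:replacement% can be sketched with a regular expression in Python. This is only an approximation for illustration: the function `substitute` is hypothetical, ECF_MICRO is assumed to be '%', the variables are passed in as a flat dictionary rather than looked up in a node tree, and a missing variable raises KeyError to stand in for job creation failing.

```python
import re
from typing import Mapping

# %NAME% or %NAME:replacement%, assuming '%' is the ECF_MICRO character.
_VARIABLE = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)(?::([^%]*))?%")


def substitute(line: str, variables: Mapping[str, str]) -> str:
    """Replace each variable reference in `line` with its value.

    Hypothetical helper: falls back to the replacement text when one is
    given, otherwise raises KeyError for an unknown variable.
    """
    def replace(match: "re.Match[str]") -> str:
        name, replacement = match.group(1), match.group(2)
        if name in variables:
            return variables[name]
        if replacement is not None:
            return replacement
        raise KeyError(f"variable {name} not found")

    return _VARIABLE.sub(replace, line)


print(substitute("ssh %HOST% sleep %SLEEP:5%", {"HOST": "hostA"}))
# ssh hostA sleep 5
```

With `{"HOST": "hostA", "SLEEP": "2"}` the same line yields `ssh hostA sleep 2`, since a defined variable takes precedence over the replacement text.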