...
aborted | When the ECF_JOB_CMD fails or the job file sends an ecflow_client --abort child command, the task is placed into the aborted state. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
active | If job creation was successful and the job file has started, the ecflow_client --init child command is received by the ecflow_server and the task is placed into the active state. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
autocancel | autocancel is a way to automatically delete a node that has completed. The deletion may be delayed by an amount of time in hours and minutes, or expressed in days. Any node may have a single autocancel attribute. If the auto-cancelled node is referenced in the trigger expression of other nodes, it may leave those nodes waiting. This can be solved by making sure the trigger expression also checks for the unknown state, i.e.:
This guards against the ‘node_to_cancel’ being undefined or deleted. For python see ecflow.Autocancel and ecflow.Node.add_autocancel. For text BNF see autocancel | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
check point | The check point file is like the suite definition, but includes all the state information. It is periodically saved by the ecflow_server (this period can be changed, see ecflow_client --help check_pt). It can be used to recover the state of the node tree should the server die or the machine crash. By default, when an ecflow_server is started it will look for the check point file and load it. The default check point file name is <host>.<port>.ecf.check. This can be overridden by the ECF_CHECK environment variable. The check point file format is the same as the defs file format (from release 4.7.0 onwards). However, the indentation has been removed to save space. To view with indentation use:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
child command | Child commands (or task requests) are called from within the ecf script files. The table also includes the default action (from version 4.0.4) if the child command is part of a zombie. They include:
The following environment variables must be set for the child commands: ECF_HOST, ECF_NAME, ECF_PASS and ECF_RID. See ecflow_client. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
clock | A clock is an attribute of a suite. A gain can be specified to offset from the given date. The hybrid and real clocks always run in phase with the system clock (UTC in UNIX) but can have any offset from the system clock. The clock can be:
time, day, date and cron dependencies work a little differently under the clocks. If the ecflow_server is shutdown or halted, job scheduling is suspended. If this suspension is left for a period of time, it can affect task submission under hybrid and real clocks. In particular it will affect tasks with time, today or cron dependencies.
For python see ecflow.Clock and ecflow.Suite.add_clock . For text BNF see clock | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
complete | The node can be set to complete:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
complete expression | Forces a node to be complete if the expression evaluates to true, without running any of the nodes. This allows you to have tasks in the suite which are run only if others fail. In practice the node would also need to have a trigger. For python see ecflow.Expression and ecflow.Node.add_complete | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
cron | Like time, cron defines a time dependency for a node, but it is repeated indefinitely.
When the node becomes complete it is queued again immediately. This means that the suite will never complete, and the output is not directly accessible through ecflow_ui. If a task aborts, the ecflow_server will not schedule it again. If the job takes longer to complete than the interval, a time "slot" is missed, e.g. with cron 10:00 20:00 01:00, if the 10:00 run takes more than an hour, the 11:00 run will be skipped. If the cron defines months, days of the month, or week days, or a single time slot, then it relies on a day change; hence if a hybrid clock is defined, the node will be set to complete at the beginning of the suite, without running the corresponding job. Otherwise under a hybrid clock the suite would never complete. For python see ecflow.Cron and ecflow.Node.add_cron. For text BNF see cron | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
date | This defines a date dependency for a node. There can be multiple date dependencies, in which case the node is free to run when any of the dates occurs. The European format is used for dates, which is dd.mm.yyyy, as in 31.12.2007. Any of the three number fields can be expressed with a wildcard * to mean any valid value. Thus, 01.*.* means the first day of every month of every year. If a hybrid clock is defined, any node held by a date dependency will be set to complete at the beginning of the suite, without running the corresponding job. Otherwise under a hybrid clock the suite would never complete. For python see ecflow.Date and ecflow.Node.add_date. For text BNF see date | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
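The wildcard matching described for date can be sketched in plain Python. This is only an illustration of the rule, not ecFlow's implementation: the helper names `date_matches` and `free_to_run` are hypothetical, and a four-digit year is assumed, following the 31.12.2007 example.

```python
import datetime


def date_matches(pattern: str, day: datetime.date) -> bool:
    """Return True if `day` satisfies a dd.mm.yyyy pattern; '*' matches any value."""
    fields = pattern.split(".")
    if len(fields) != 3:
        raise ValueError(f"expected dd.mm.yyyy, got {pattern!r}")
    actual = (day.day, day.month, day.year)
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


def free_to_run(patterns, day: datetime.date) -> bool:
    """Multiple date dependencies: the node is free when any one of them matches."""
    return any(date_matches(p, day) for p in patterns)
```

For instance, `date_matches("01.*.*", datetime.date(2007, 3, 1))` is true, because only the day field is constrained.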
day | This defines a day dependency for a node. There can be multiple day dependencies. These behave as a logical 'or': the node is free to run when any of the days occurs. If a hybrid clock is defined, any node held by a day dependency will be set to complete at the beginning of the suite, without running the corresponding job. Otherwise under a hybrid clock the suite would never complete. For python see ecflow.Day and ecflow.Node.add_day. For text BNF see day | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
defstatus | Defines the default status to be assigned to a node (task or family) when the begin command is issued. By default a node is queued when you use begin on a suite. defstatus is useful for preventing suites from running automatically once begun, or for setting tasks complete so they can be run selectively. For python see ecflow.DState and ecflow.Node.add_defstatus. For text BNF see defstatus | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
dependencies | Dependencies are attributes of a node that can suppress/hold a task from taking part in job creation. They include trigger, date, day, time, today, cron, complete expression, inlimit and limit. A task that is dependent cannot be started as long as some dependency is holding it or any of its parent nodes. The ecflow_server will check the dependencies every minute, during normal scheduling, and when any child command causes a state change in the suite definition. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
directives | Directives start with a % character. This is referred to as the ECF_MICRO character. The directives are used in two main contexts.
Directives are expanded during pre-processing . Examples include:
From ecflow release 4.4.0, %VAR% (variable substitution) is also allowed as part of the filename, i.e.:
Care should be taken to avoid spaces in the variable values. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
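The %VAR% substitution mentioned for filenames can be illustrated with a small stand-alone sketch. It is a simplified model and not the ecFlow pre-processor: the function name `expand_variables` and the variable name `HDR_DIR` are hypothetical, and rejecting values that contain whitespace is an assumption chosen here to reflect the caution about spaces.

```python
import re


def expand_variables(line: str, variables: dict, micro: str = "%") -> str:
    """Replace each <micro>NAME<micro> in `line` with variables[NAME]."""
    m = re.escape(micro)
    pattern = re.compile(f"{m}([A-Za-z_][A-Za-z0-9_]*){m}")

    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"variable {name!r} is not defined")
        value = variables[name]
        # Assumption for this sketch: a value with spaces would split the file name.
        if any(ch.isspace() for ch in value):
            raise ValueError(f"variable {name!r} contains whitespace: {value!r}")
        return value

    return pattern.sub(lookup, line)
```

The `micro` parameter stands in for ECF_MICRO, so the same sketch covers a suite that overrides the default % with another single character.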
ecf file location algorithm | ecflow_server and job creation checking use the following algorithm to locate the ‘.ecf’ file corresponding to a task:
The search can be reversed by adding a variable ECF_FILES_LOOKUP with a value of "prune_leaf" (from ecflow 4.12.0). ecFlow will then use the following search pattern.
However, please be aware that this will also affect the search in ECF_HOME.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
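As an illustration of the two search orders, the sketch below lists candidate script paths under a directory such as ECF_FILES. It reflects one reading of the algorithm (the default search drops leading path components one at a time, while "prune_leaf" drops the component nearest the task) and should be checked against the ecFlow release in use. The function name `candidate_scripts` and the example paths are hypothetical.

```python
def candidate_scripts(base_dir: str, task_path: str, prune_leaf: bool = False):
    """Return the '.ecf' paths that would be tried, in order, under base_dir."""
    *parents, task = task_path.strip("/").split("/")
    base = base_dir.rstrip("/")
    candidates = []
    while True:
        candidates.append("/".join([base, *parents, task + ".ecf"]))
        if not parents:
            break
        # Default: prune from the root end; prune_leaf: prune next to the task.
        parents = parents[:-1] if prune_leaf else parents[1:]
    return candidates
```

For a task `/o/12/fc/model`, the default order ends at `<base_dir>/model.ecf` after trying `o/12/fc`, `12/fc` and `fc`; with `prune_leaf=True` the intermediate attempts are `o/12` and `o` instead.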
ecf script | The ecFlow script refers to an '.ecf' file. The script file is transformed into the job file by the job creation process. The base name of the script file must match its corresponding task, i.e. t1.ecf corresponds to the task named ‘t1’. The script, if placed in the ECF_FILES directory, may be re-used by multiple tasks belonging to different families, provided the task name matches. The ecFlow script is similar to a UNIX shell script. The differences, however, include the addition of C-like pre-processing directives and ecFlow variables. Also, the script must include calls to the init and complete child commands so that the ecflow_server is aware when the job starts (i.e. changes state to active) and finishes (i.e. changes state to complete). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_DUMMY_TASK | This is a user variable that can be added to a task to indicate that there is no associated ecf script file. If this variable is added to a suite or family then all child tasks are treated as dummy. This stops the server from reporting an error during job creation. edit ECF_DUMMY_TASK '' | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_HOME | This is a user-defined variable; it has four functions:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_INCLUDE | This is a user-defined variable. It is used to specify directory locations that are searched for include files. edit ECF_INCLUDE /home/fred/course/include # a single directory edit ECF_INCLUDE /home/fred/course/include:/home/fred/course/include2:/home/fred/course/include_me # a set of directories to search | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_JOB | This is a generated variable. It defines the path name location of the job file. The variable is composed as: ECF_HOME/ECF_NAME.job<ECF_TRYNO> | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_JOB_CMD | This variable should point to a script that can submit the job (i.e. to the queuing system, via SLURM, PBS). The ecFlow server will detect abnormal termination of this command. When this happens a flag is set. This should be visible in the GUI. If the ECF_JOB_CMD command fails and the task is in a submitted state, then the task is set to the aborted state. However, if the task was active or complete, the task is NOT aborted. Instead the zombie flag is set (since ecflow 4.17.1). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_JOBOUT | This is a generated variable. This variable defines the path name for the job output file. The variable is composed as follows. If ECF_OUT is specified: ECF_OUT/ECF_NAME.ECF_TRYNO otherwise: ECF_HOME/ECF_NAME.ECF_TRYNO | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
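The way ECF_JOB and ECF_JOBOUT are composed can be written out as a short sketch. The function names and the example directories are hypothetical, and the code only mirrors the composition rules stated in these entries; it is not how the server itself generates the variables.

```python
from typing import Optional


def _join(base: str, ecf_name: str) -> str:
    """Join a base directory and a node path such as /suite/family/task."""
    return base.rstrip("/") + "/" + ecf_name.lstrip("/")


def ecf_job(ecf_home: str, ecf_name: str, tryno: int) -> str:
    """ECF_HOME/ECF_NAME.job<ECF_TRYNO>"""
    return f"{_join(ecf_home, ecf_name)}.job{tryno}"


def ecf_jobout(ecf_home: str, ecf_name: str, tryno: int,
               ecf_out: Optional[str] = None) -> str:
    """ECF_OUT/ECF_NAME.ECF_TRYNO if ECF_OUT is given, else ECF_HOME/ECF_NAME.ECF_TRYNO"""
    base = ecf_out if ecf_out else ecf_home
    return f"{_join(base, ecf_name)}.{tryno}"
```

Because the try number is part of both names, a second run of the same task writes to new files rather than overwriting the first run's job file and output.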
ECF_LISTS | This is the server variable. The variable specifies the path to the White list file. This file controls who has read/write access to the server via the user commands. It has a very simple format. The file path specified by ECF_LISTS environment, is read by the server on start up. The contents of the white list can be modified, and reloaded by the server. ( However the path to the white-list file can NOT be modified after the server has started) If ECF_LISTS is not set, the server will look for a file named <host>.<port>.ecf.lists (i.e.. my_host.3141.ecf.lists) in same directory where the server was started. If the file specified by ECF_LISTS or <host>.<port>.ecf.lists, does not exist or exists but is empty, then all users will have read/write access to suites on the server. Special care must be taken, so that user reloading the white list file does not remove write access for the administrator.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_PASSWD | This is an environment variable that points to a password file for both client and server. This enables password-based authentication for ecFlow user commands. The password file is required for the client and server.
The server administrator needs to set the unix file permissions so that this file is only readable by the ecFlow server and the administrator. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_MICRO | This is a suite and generated variable . The default value is %. This variable is used in variable substitution during command invocation and default directive character during pre-processing . It can be overridden, but must be replaced by a single character. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_NAME | This is a generated variable . It defines the path name of the task. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_NO_SCRIPT | This is a user variable, that can be added to a Node.(introduced with ecFlow release 4.3.0). It is used to inform the ecflow_server that there is no SCRIPT associated with a task. However unlike ECF_DUMMY_TASK, the task can still be submitted provided the ECF_JOB_CMD is set up. This is suitable for very lightweight tasks that want to minimize latency. The output can still be seen, if it is redirected to ECF_JOBOUT. Care must be taken to ensure the path to ecflow_client is accessible.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_PASS | This is a generated variable. During the job generation process in the server, a unique password is generated and stored in the task. It then replaces %ECF_PASS% in the scripts (.ecf) with the actual value. When the job runs, ecflow_client reads this as an environment variable and passes it to the server. The server then compares this password with the one held on the task. This is used as part of the authentication for child commands, and is used to detect zombies. The authentication process can be bypassed, allowing the job to proceed (i.e. when the user is sure that there is only a single process trying to communicate with the server), by adding it as a user variable, i.e. ecflow_client --alter add variable ECF_PASS FREE <path to task> This functionality is also available in the GUI. Select a task. RMB->Special->Free password. However, it is important not to leave this in place, as it will always bypass the authentication. Just delete the variable. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_SCRIPT | This is a generated variable. It defines the path name for the ecf script | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_TRIES | This is a generated variable added at the server level with a default value of 2. It can be overridden by the user and controls the number of times a job is re-run should it abort. Provided:
Please note this allows your scripts to be aware of the number of times they are being run, i.e.:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_TRYNO | This is a generated variable that is used in file name generation. It represents the current try number for the task . It can also be referenced inside .ecf script, to allow the job to take a different course dependent on the ECF_TRYNO. After begin it is set to 1. The number is advanced if the job is re-run. It is re-set back to 1 after a re-queue. It is used in output and job file numbering. (i.e.. It avoids overwriting the job file output during multiple re-runs) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ECF_OUT | This is a user/suite variable that specifies a directory PATH. It controls the location of job output (stdout and stderr of the process) on a remote file system. It provides an alternate location for the job and cmd output files. If it exists, it is used as a base for ECF_JOBOUT, but it is also used by ecFlow to search for the output when asked by ecflow_ui/CLI. If the output is in ECF_OUT/ECF_NAME.ECF_TRYNO it is returned, otherwise ECF_HOME/ECF_NAME.ECF_TRYNO is used. The user must ensure that all the directories exist, including suite/family. If this is not done, you may well find that the task remains stuck in a submitted state. At ECMWF our submission scripts ensure that the directories exist. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ecFlow | The ECMWF work flow manager. A general purpose application designed to schedule a large number of computer processes in a heterogeneous environment. It helps with the design, submission and monitoring of computer jobs in both the research and operations departments. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ecflow_client | This executable is a command line program; it is used for all communication with the ecflow_server . To see the full range of commands that can be sent to the ecflow_server type the following in a UNIX shell:
This functionality is also provided by the Client Server API . The following variables affect the execution of ecflow_client. Since the ecf script can call ecflow_client( i.e.. child command ) then typically some are set in an include header. i.e.. head.h . Environment Variable common for user and child commands
Environment Variables for child commands
Variables specific to User commands
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ecflow_server | This executable is the server. It is responsible for scheduling the jobs and responding to ecflow_client requests. Multiple servers can be run on the same machine/host provided they are each assigned a unique port number. The server records all requests in the log file. The server will periodically (see ECF_CHECKINTERVAL) write out a check point file. The following environment variables control the execution of the server and may be set before the start of the server. ecflow_server will start happily without any of these variables being set, since all of them have default values.
The server can be in several states. The default when first started is halted. See server states | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ecflow_ui | The ecflow_ui executable is the new GUI based client. It is used to visualise and monitor the hierarchical structure of the suite definition. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ecflowview | The ecflowview executable is the GUI based client that is used to visualise and monitor the hierarchical structure of the suite definition (this will be deprecated in the future).
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
event | The purpose of an event is to signal partial completion of a task and to be able to trigger another job which is waiting for this partial completion. Only tasks can have events and they can be considered as an attribute of a task. There can be many events and they are displayed as nodes. The event is updated by placing the --event child command in an ecf script. An event has a number and possibly a name. If it is only defined as a number, its name is the text representation of the number without leading zeroes. For python see ecflow.Event and ecflow.Node.add_event. For text BNF see event. If the event child command results in a zombie, then the default action is for the server to fob; this allows the ecflow_client command to exit normally (i.e. without any errors). This default can be overridden by using a zombie attribute. Events can be referenced in trigger and complete expressions. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
extern | This allows an external node to be used in a trigger expression. All nodes in triggers must be known to the ecflow_server by the end of the load command. No cross-suite dependencies are allowed unless the names of tasks outside the suite are declared as external. An external trigger reference is considered unknown if it is not defined when the trigger is evaluated. You are strongly advised to avoid cross-suite dependencies. Families and suites that depend on one another should be placed in a single suite. If you think you need cross-suite dependencies, you should consider merging the suites together and having each as a top-level family in the merged suite. For BNF see extern | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
family | A family is an organisational entity that is used to provide hierarchy and grouping. It consists of a collection of tasks and families, and serves as an intermediate node in a suite definition. Typically you place tasks that are related to each other inside the same family, analogous to the way you create directories to contain related files. For python see ecflow.Family. For BNF see family | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
halted | Is an ecflow_server state. See server states | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
hybrid clock | A hybrid clock is a complex notion: the date and time are not connected. The date has a fixed value during the complete execution of the suite . This will be mainly used in cases where the suite does not complete in less than 24 hours. This guarantees that all tasks of this suite are using the same date . On the other hand, the time follows the time of the machine. Hence the date never changes unless specifically altered or unless the suite restarts, either automatically or from a begin command. Under a hybrid clock any node held by a date , day or cron dependency will be set to complete at the beginning of the suite. (i.e.. without its job ever running). Otherwise the suite would never complete . | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
inlimit | The inlimit works in conjunction with limit / ecflow.Limit to provide simple load management. An inlimit is added to the node that needs to be limited.
For python see ecflow.InLimit and ecflow.Node.add_inlimit . For text BNF see inlimit | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
job creation | Job creation or task invocation can be initiated manually via ecflow_ui, but also by the ecflow_server during scheduling when a task (and all of its parent nodes) is free of its dependencies. The process of job creation includes:
The steps above transform an ecf script into a job file. The job file is then submitted by performing variable substitution on the ECF_JOB_CMD variable and invoking the command. The running jobs communicate back to the ecflow_server by calling child commands. This causes status changes on the nodes in the ecflow_server, and flags can be set to indicate various events. If a task is to be treated as a dummy task (i.e. is used as a scheduling task) and is not meant to be run, then a variable named ECF_DUMMY_TASK can be added.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
job file | The job file is created by the ecflow_server during job creation using the ECF_TRYNO variable. It is derived from the ecf script after expanding the pre-processing directives. It has the form <task name>.job<ECF_TRYNO>, i.e. t1.job1. Note that job creation checking will create a job file with an extension of zero, i.e. ‘.job0’. See ecflow.Defs.check_job_creation. When the job is run, the output file has the ECF_TRYNO as the extension, i.e. t1.1, where ‘t1’ represents the task name and ‘1’ the ECF_TRYNO | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
label | A label has a name and a value and is a way of displaying information in ecflow_ui. By placing label child commands in the ecf script, the user can be informed about progress in ecflow_ui. If the label child command results in a zombie, then the default action is for the server to fob; this allows the ecflow_client command to exit normally (i.e. without any errors). This default can be overridden by using a zombie attribute. For python see ecflow.Label and ecflow.Node.add_label. For text BNF see label | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
late | Define a tag for a node to be late. A node can only have one late attribute. The late attribute only applies to a task. You can define it on a Suite/Family in which case it will be inherited. Any late defined lower down the hierarchy will override the aspect(submitted,active, complete) defined higher up.
Suites and families cannot be late, but you can define a late tag for submitted in a suite, to be inherited by the families and tasks. When a node is classified as being late, the only action ecflow_server takes is to set a flag. ecflow_ui will display these alongside the node name as an icon (and optionally pop up a window).
The late attribute can be added/deleted to any suite/family/task.
For python see ecflow.Late and ecflow.Node.add_late . For text BNF see late | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
limit | Limits provide simple load management by limiting the number of tasks submitted by a specific ecflow_server . Typically you either define limits on suite level or define a separate suite to hold limits so that they can be used by multiple suites. Setting limits on a separate suite, has the benefit that by setting the limit value to zero, you can control task submission over a number of suites.
The limits are used in conjunction with inlimit. The limit max value can be changed on the command line:
It can also be changed in python:
For python see ecflow.Limit and ecflow.Node.add_limit . For BNF see limit and inlimit | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
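To make the load-management idea concrete, here is a toy model of a limit as a pool of tokens. It is not the ecflow.Limit API: the class below and its method names are hypothetical, and it only demonstrates how a maximum value gates submission and how setting that maximum to zero stops new submissions.

```python
class ToyLimit:
    """A named pool of tokens; a task must obtain tokens before it is submitted."""

    def __init__(self, name: str, max_tokens: int):
        self.name = name
        self.max_tokens = max_tokens
        self.consumers = {}  # task path -> number of tokens held

    def in_use(self) -> int:
        return sum(self.consumers.values())

    def try_submit(self, task_path: str, tokens: int = 1) -> bool:
        """Take tokens for task_path if that keeps usage within max_tokens."""
        if task_path in self.consumers:
            return False  # already holding tokens
        if self.in_use() + tokens > self.max_tokens:
            return False  # the limit is holding the task
        self.consumers[task_path] = tokens
        return True

    def release(self, task_path: str) -> None:
        """Give tokens back, e.g. when the task completes or aborts."""
        self.consumers.pop(task_path, None)
```

With `max_tokens` set to 2, a third task is held until one of the first two releases its token; lowering `max_tokens` to zero holds every task that has not yet been submitted.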
manual page | Manual pages are part of the ecf script. This is to ensure that the manual page is updated when the ecf script is updated. The manual page is a very important operational tool, allowing you to view a description of a task and possibly describing solutions to common problems. Pre-processing can be used to extract the manual page from the script file so that it is visible in ecflow_ui. The manual page is the text contained within the %manual and %end directives. It can be seen using the manual button in ecflow_ui. The text in the manual page is not included in the job file. There can be multiple manual sections in the same ecf script file. When viewed they are simply concatenated. It is good practice to modify the manual pages when the script changes. The manual page may contain %include directives. Suites and families may also have a manual page. These will also be available in the GUI. ecFlow will look for a file <node_name>.man (where node_name is the name of the suite or family) using a backwards search algorithm, first in the ECF_FILES directory, then in the ECF_HOME directory. Note that errors in variable pre-processing are ignored inside a manual section. It should also be noted that for family and suite manuals, the %manual and %end directives are not strictly necessary, as the whole file is treated as a manual. If we have family /suite/big/f1, ecflow will search for "f1.man" in:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
meter | The purpose of a meter is to signal proportional completion of a task and to be able to trigger another job which is waiting on this proportional completion. The meter is updated by placing the --meter child command in an ecf script. For python see ecflow.Meter and ecflow.Node.add_meter. For text BNF see meter. If the meter child command results in a zombie, then the default action is for the server to fob; this allows the ecflow_client command to exit normally (i.e. without any errors). This default can be overridden by using a zombie attribute. Meters can be referenced in trigger and complete expressions. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
node | suite, family and task form a hierarchy, where a suite serves as the root of the hierarchy, the family provides the intermediate nodes, and the tasks provide the leaves. Collectively suite, family and task can be referred to as nodes. For python see ecflow.Node. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
pre-processing | Pre-processing takes place during job creation and acts on directives specified in ecf script file. This involves:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
queued | After the begin command, tasks without a defstatus are placed into the queued state | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
real clock | A suite using a real clock will have its clock matching the clock of the machine. Hence the date advances by one day at midnight. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
repeat | Repeats provide looping functionality. There can only be a single repeat on a node .
The repeat variable name is available as a generated variable. The repeat date defines additional generated variables(from ecflow 4.7.0) , which are scoped with prefix of the variable name i.e.
For example:
If a repeat is added to a family/suite, then the repeat will ONLY loop (and automatically re-queue its children) if all the children are complete. Hence additional care needs to be taken, i.e. if the parent node has a repeat and the child has a cron attribute, then the cron will always force a re-queue on the node once it has run, and hence will stop the parent from looping. If we use a relative time attribute under a repeat, i.e. time +02:00, then the time is relative to the repeat re-queue. The repeat VARIABLE can be used in trigger and complete expressions. Depending on the kind of repeat the value can vary:
If a "repeat date" or "repeat datelist" VARIABLE is used in a trigger expression, then date arithmetic is used when the expression uses addition and subtraction, i.e.:
Now when task 'a' and Task 'b' complete, the repeat is incremented, and any relative time attributes are reset. In this case effectively delaying the starting of task 'a' for 1 minute. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
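The date arithmetic mentioned above for "repeat date" variables differs from plain integer arithmetic at month and year boundaries. The sketch below shows the difference, assuming the repeat value is an integer in yyyymmdd form; the helper name `ymd_add` is hypothetical and this is not the server's expression evaluator.

```python
import datetime


def ymd_add(ymd: int, days: int) -> int:
    """Add (or subtract) whole days to a yyyymmdd integer using calendar rules."""
    day = datetime.datetime.strptime(str(ymd), "%Y%m%d").date()
    shifted = day + datetime.timedelta(days=days)
    return int(shifted.strftime("%Y%m%d"))
```

With plain integers, 20161231 + 1 gives 20161232, which is not a valid date, whereas `ymd_add(20161231, 1)` rolls over to the first day of the next year.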
running | Is an ecflow_server state. See server states | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
scheduling | The ecflow_server is responsible for task scheduling. It will check dependencies in the suite definition every minute. If these dependencies are free, the ecflow_server will submit the task. See job creation . | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
server states | The following table reflects the ecflow_server capabilities in the different states
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
shutdown | Is an ecflow_server state. See server states | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
status | Each node in the suite definition has a status. The status reflects the state of the node. In ecflow_ui the background colour of the text reflects the status. Task statuses are: unknown, queued, submitted, active, complete, aborted and suspended. ecflow_server statuses are: shutdown, halted, running. The server status is shown on the root node in ecflow_ui | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
submitted | When the task dependencies are resolved/free the ecflow_server places the task into a submitted state. However if the ECF_JOB_CMD fails, the task is placed into the aborted state | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
suite | A suite is an organisational entity. It serves as the root node in a suite definition. It should be used to hold a set of jobs that achieve a common function. It can be used to hold user variables that are common to all of its children. Only a suite node can have a clock. Suite generated variables:
It is a collection of families, variables, repeat and a single clock definition. For a complete list of attributes look at the BNF for suite. For python see ecflow.Suite. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
suite definition | The suite definition is the hierarchical node tree. It describes how your tasks run and interact. It can be built up using:
Once the definition is built, it can be loaded into the ecflow_server , and started. It can be monitored by ecflow_ui | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
suspended | A node state. A node can be placed into the suspended state via a defstatus or via ecflow_ui. A suspended node, including any of its children, cannot take part in scheduling until the node is resumed. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
task | A task represents a job that needs to be carried out. It serves as a leaf node in a suite definition. Only tasks can be submitted. A job inside a task's ecf script should generally be re-entrant, so that no harm is done by rerunning it, since a task may be automatically submitted more than once if it aborts. For python see ecflow.Task. For text BNF see task. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
time dependencies | These include time, today, day, date and cron. When there are multiple time dependencies on the same task, dependencies of the same type are or'ed together, and the results are then and'ed across the different types.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
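The combination rule for multiple time dependencies can be sketched in plain Python. This is only an illustration of the or/and logic described above, not ecFlow code: the function name and the (kind, is_free) representation are assumptions made for the example.

```python
from collections import defaultdict


def time_dependencies_free(dependencies):
    """Return True if the combined time dependencies let the node run.

    `dependencies` is a list of (kind, is_free) pairs, for example
    ("time", True). Both the function and this representation are
    hypothetical; they are not part of the ecflow API.
    """
    by_kind = defaultdict(list)
    for kind, is_free in dependencies:
        by_kind[kind].append(is_free)
    # OR within each kind, then AND across the kinds.
    return all(any(flags) for flags in by_kind.values())


# Two time entries and one day entry: one matching time is enough,
# but the day has to match as well.
print(time_dependencies_free([("time", False), ("time", True), ("day", True)]))   # True
print(time_dependencies_free([("time", True), ("time", True), ("day", False)]))   # False
```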
time | This defines a time dependency for a node. Time is expressed in the format [h]h:mm. Only numeric values are allowed. There can be multiple time dependencies for a node, but overlapping times may cause unexpected results.
To define a series of times, specify the start time, end time and a time increment. If the start time begins with ‘+’, times are relative to the beginning of the suite or, in repeated families, relative to the beginning/re-queue of the repeated family. If the job takes longer to complete than the interval, a time 'slot' is missed. For example, with time 10:00 20:00 01:00, if the 10:00 run takes more than an hour, the 11:00 run will never occur. For python see ecflow.Time and ecflow.Node.add_time. For BNF see time. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
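The effect of a time series and of a missed slot can be illustrated with a small model. This is a sketch under stated assumptions, not the scheduler's implementation: the helper names are hypothetical, the end time is assumed to be inclusive, every job is assumed to take the same number of minutes, and a slot is treated as missed when the previous job is still running at the slot time.

```python
def to_minutes(text):
    """Convert 'hh:mm' to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def to_text(total):
    """Convert minutes since midnight back to 'hh:mm'."""
    return f"{total // 60:02d}:{total % 60:02d}"


def time_slots(start, end, increment):
    """Expand a start/end/increment series into its individual slots."""
    step = to_minutes(increment)
    if step <= 0:
        raise ValueError("increment must be positive")
    # The end time is assumed to be part of the series.
    return [to_text(m) for m in range(to_minutes(start), to_minutes(end) + 1, step)]


def slots_that_run(slots, job_minutes):
    """Return the slots that actually run if each job lasts job_minutes."""
    ran = []
    busy_until = None
    for slot in slots:
        slot_start = to_minutes(slot)
        if busy_until is not None and slot_start < busy_until:
            continue  # previous job still running: this slot is missed
        ran.append(slot)
        busy_until = slot_start + job_minutes
    return ran


slots = time_slots("10:00", "20:00", "01:00")
print(slots_that_run(slots, 90))  # every second slot is missed
```

With a 90 minute job, the 10:00 run finishes at 11:30, so the 11:00 slot is skipped and the next run is at 12:00.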
today | Like time, but if the suite's begin time is past the time given for the “today”, then the node is free to run (as far as the time dependency is concerned). For example: task x today 10:00. If we begin or re-queue the suite at 9.00 am, then the task is held until 10.00 am. However, if we begin or re-queue the suite at 11.00 am, the task is run immediately. Now let's look at time: task x time 10:00. If we begin or re-queue the suite at 9.00 am, then the task is held until 10.00 am. If we begin or re-queue the suite at 11.00 am, the task is still held. If the job takes longer to complete than the interval, a 'slot' is missed. For example, with today 10:00 20:00 01:00, if the 10:00 run takes more than an hour, the 11:00 run will never occur. For python see ecflow.Today. For text BNF see today. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
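The difference between today and time at begin/re-queue can be written down as a small decision function. This is a hypothetical model of the behaviour described above, not ecFlow code. The case where the suite begins exactly at the dependency time is not covered by the description, so treating it as free is an assumption.

```python
def to_minutes(text):
    """Convert 'hh:mm' to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def held_after_begin(kind, dependency, begin):
    """Return True if the task is held right after begin/re-queue.

    kind is "today" or "time"; dependency and begin are 'hh:mm' strings.
    This helper is illustrative and not part of the ecflow API.
    """
    dependency_minutes = to_minutes(dependency)
    begin_minutes = to_minutes(begin)
    if kind == "today":
        # Held only while the dependency time is still in the future.
        return begin_minutes < dependency_minutes
    if kind == "time":
        # Held until the clock next reaches the dependency time, even if
        # that time has already passed (equality is an assumption).
        return begin_minutes != dependency_minutes
    raise ValueError(f"unsupported kind: {kind}")


for begin in ("09:00", "11:00"):
    print(begin, "today:", held_after_begin("today", "10:00", begin),
          "time:", held_after_begin("time", "10:00", begin))
```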
trigger | A trigger defines a dependency for a task or family. There can be only one trigger dependency per node, but that can be a complex boolean expression of the status of several nodes. Triggers cannot be added to the suite node. A node with a trigger can only be activated when its trigger has expired. A trigger holds the node as long as the trigger’s expression evaluates to false. Trigger evaluation occurs whenever a child command communicates with the server, i.e. whenever there is a state change in the suite definition, and at least once every 60 seconds. The keywords in trigger expressions are: unknown, suspended, complete, queued, submitted, active and aborted, plus clear and set for event status. Triggers can also reference node attributes like event, meter, variable, repeat, generated variables and limits. Triggers can also reference the late flag on a node. Trigger evaluation for node attributes uses integer arithmetic:
Here are some examples:
What happens when multiple node attributes of the same name are referenced in trigger expressions?
In this case ecFlow will use the following precedence: Hence, in the example above, the expression ‘foo:blah >= 0’ will reference the event. For python see ecflow.Expression and ecflow.Node.add_trigger. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
unknown | This is the default node status when a suite definition is loaded into the ecflow_server | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
user commands | User commands are any client to server requests that are not child commands. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
variable | ecFlow makes heavy use of variables. There are several kinds:
Variables can be referenced in trigger and complete expressions. The value part of the variable should be convertible to an integer, otherwise a default value of 0 is used. For python see ecflow.Node.add_variable. For BNF see variable. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
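The integer fallback for variables used in expressions can be pictured as follows. This is a hedged sketch: the function name is hypothetical, and Python's own `int()` parsing rules stand in for whatever conversion the server actually applies.

```python
def variable_as_int(value):
    """Return the integer value of a variable, or 0 if it is not an integer.

    Illustrative only; the real conversion rules may differ from int().
    """
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


print(variable_as_int("12"))     # 12
print(variable_as_int("hello"))  # 0
```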
variable inheritance | When a variable is needed at job creation time, it is first sought in the task itself. If it is not found in the task , it is sought from the task’s parent and so on, up through the node levels until found. For any node , there are two places to look for variables. Suite definition variables are looked for first, and then any generated variables. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
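The lookup order can be sketched with a minimal node class. `SketchNode` and its attributes are hypothetical stand-ins for illustration, not the ecflow Python API, and the variable values are made up.

```python
class SketchNode:
    """A minimal stand-in for a node with a parent and two variable sets."""

    def __init__(self, name, parent=None, user_variables=None,
                 generated_variables=None):
        self.name = name
        self.parent = parent
        self.user_variables = dict(user_variables or {})
        self.generated_variables = dict(generated_variables or {})

    def find_variable(self, variable_name):
        """Search this node, then each parent in turn, for the variable."""
        node = self
        while node is not None:
            # Suite definition (user) variables are looked for first ...
            if variable_name in node.user_variables:
                return node.user_variables[variable_name]
            # ... and then the generated variables of the same node.
            if variable_name in node.generated_variables:
                return node.generated_variables[variable_name]
            node = node.parent
        return None  # not found anywhere up the tree


suite = SketchNode("s1", user_variables={"ECF_HOME": "/tmp/course"},
                   generated_variables={"SUITE": "s1"})
family = SketchNode("f1", parent=suite, user_variables={"SLEEP": "20"})
task = SketchNode("t1", parent=family, generated_variables={"TASK": "t1"})

print(task.find_variable("TASK"))      # found on the task itself
print(task.find_variable("SLEEP"))     # inherited from the family
print(task.find_variable("ECF_HOME"))  # inherited from the suite
```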
variable substitution | Takes place during pre-processing or command invocation (i.e. ECF_JOB_CMD, ECF_KILL_CMD, etc.). It involves searching each line of the ecf script file or command for the ECF_MICRO character, typically ‘%’. The text between two % characters defines a variable, i.e. %VAR%. This variable is searched for in the suite definition. First the suite definition variables (sometimes referred to as user variables) are searched, then the repeat variable name, and finally the generated variables. If no variable is found, the same search pattern is repeated up the node tree. The variable reference, including the % characters, is replaced by the value of the variable. If the micro characters are not paired, an error message is written to the log file and the task is placed into the aborted state. If the variable is not found in the suite definition during pre-processing, job creation fails, an error message is written to the log file, and the task is placed into the aborted state. To avoid this, variables in the ecf script can be defined as %VAR:replacement%. This is similar to %VAR%, but if VAR is not found in the suite definition then ‘replacement’ is used. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
virtual clock | Like a real clock until the ecflow_server is suspended (i.e. shutdown or halted), at which point the suite's clock is also suspended. Hence it will honour relative times in cron, today and time dependencies. It is possible to have a combination of hybrid/real and virtual. It is most useful when we want complete adherence to time-related dependencies, at the expense of being out of sync with system time. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
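The substitution rules can be sketched for a single line of text. This is a simplified model, not the ecFlow pre-processor: the `substitute` function is hypothetical, the variables are assumed to have been resolved into one dictionary already, and any other use of the micro character is ignored.

```python
def substitute(line, variables, micro="%"):
    """Replace %VAR% and %VAR:replacement% references in one line.

    Raises ValueError for unpaired micro characters and KeyError for a
    variable that is missing and has no replacement text.
    """
    parts = line.split(micro)
    # A line with paired micro characters splits into an odd number of parts.
    if len(parts) % 2 == 0:
        raise ValueError(f"unpaired {micro!r} in line: {line!r}")
    result = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            result.append(part)  # literal text outside the micro characters
            continue
        name, separator, replacement = part.partition(":")
        if name in variables:
            result.append(str(variables[name]))
        elif separator:
            result.append(replacement)  # %VAR:replacement% fallback
        else:
            raise KeyError(name)
    return "".join(result)


variables = {"TASK": "t1"}
print(substitute("echo running %TASK%", variables))
print(substitute("sleep %SLEEP:5%", variables))
```

In this model the two error cases correspond to the situations in which the task would be placed into the aborted state.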
zombie | Zombies are running jobs that fail authentication when communicating with the ecflow_server. Child commands such as init, event, meter, label, abort and complete are placed in the ecf script file and are used to communicate with the ecflow_server. The ecflow_server authenticates each connection attempt made by a child command. Authentication can fail for a number of reasons:
When authentication fails the job is considered to be a zombie. The ecflow_server will keep a note of the zombie for a period of time before it is automatically removed. However, the removed zombie may well reappear, because each child command will continue attempting to contact the ecflow_server for 24 hours. This is configurable; see ECF_TIMEOUT on ecflow_client. For python see ecflow.ZombieAttr and ecflow.ZombieUserActionType. There are several types of zombies; see zombie type and ecflow.ZombieType. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
zombie attribute | The zombie attribute defines how a zombie should be handled in an automated fashion. Very careful consideration should be taken before this attribute is added, as it may hide a genuine problem. It can be added to any node, but is best defined at the suite or family level. If there is no zombie attribute, the default behaviour for the init, complete, wait and abort child commands is to block, whereas for label, event and meter the default behaviour is to fob (from version 4.0.4; previously all child commands blocked). To add a zombie attribute in python, please see ecflow.ZombieAttr. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
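The default zombie behaviour (from version 4.0.4) can be summarised as a lookup table. The dictionary and function below are illustrative names only, not part of the ecflow Python API.

```python
# Default action per child command when no zombie attribute is defined,
# as described for version 4.0.4 onwards.
DEFAULT_ZOMBIE_ACTION = {
    "init": "block",
    "complete": "block",
    "wait": "block",
    "abort": "block",
    "label": "fob",
    "event": "fob",
    "meter": "fob",
}


def default_zombie_action(child_command):
    """Return the default action for a zombie child command."""
    return DEFAULT_ZOMBIE_ACTION[child_command]


print(default_zombie_action("init"))   # block
print(default_zombie_action("meter"))  # fob
```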
zombie type | See zombie and class ecflow.ZombieAttr for further information. How do zombies arise?
There are several types of zombies:
The type of the zombie is not fixed and may change. |
...