...
In the real world, suites can have several thousand tasks. These tasks are not required all the time.
Having a server with an extremely large number of tasks can cause performance issues.
- The server writes to the checkpoint file periodically. With an excessively large number of tasks, this disk I/O can interfere with job scheduling.
- Clients like the GUI (ecflow_ui) are also adversely affected by the memory requirements and give a slow interactive experience.
- Network traffic is heavily affected
...
autoarchive will write a portion of the definition to disk and autorestore can restore from disk on re-queue/begin.
- Archives suite or family nodes *IF* they have child nodes (otherwise it does nothing).
- Saves the suite/family nodes to disk, and then removes the in-memory child nodes from the definition.
- It reduces the time taken to checkpoint and reduces network bandwidth.
- If the archived node is re-queued or begun, the child nodes are automatically restored.
- The nodes are saved to ECF_HOME/<host>.<port>.ECF_NAME.check, where '/' has been replaced with ':' in ECF_NAME.
- Care must be taken if you have trigger references to the archived nodes.
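The file naming rule in the list above can be illustrated with a few lines of Python. This is only a sketch of the rule as stated, applied literally to every '/' in the node path. The helper name `archive_file_path` and the sample host, port and paths are hypothetical and not part of ecFlow.

```python
def archive_file_path(ecf_home: str, host: str, port: int, node_path: str) -> str:
    """Build the archive file name for a node, following the rule stated above.

    Hypothetical helper for illustration only; ecFlow does this internally.
    """
    # ECF_NAME is the absolute node path; each '/' in it is replaced with ':'.
    flattened = node_path.replace("/", ":")
    # Layout: ECF_HOME/<host>.<port>.ECF_NAME.check
    return "{}/{}.{}.{}.check".format(ecf_home, host, port, flattened)


# Sample values are assumptions chosen for the example.
print(archive_file_path("/home/user/course", "localhost", 3141, "/test/lf1"))
```

Looking for a file of this shape under ECF_HOME is one way to confirm that a family has been archived.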
...
- ecflow_client --archive=/s1 # archive suite s1
- ecflow_client --archive=/s1/f1 /s2 # archive family /s1/f1 and suite /s2
- ecflow_client --archive=force /s1 /s2 # archive suites /s1,/s2 even if they have active tasks
Restoring can also be done automatically with autorestore, but this is only applied when a node completes.
To restore archived nodes manually use:
- ecflow_client --restore=/s1/f1 # restore family /s1/f1
- ecflow_client --restore=/s1 /s2 # restore suites /s1 and /s2
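If you want to script the manual commands above, one option is to build the argument lists in Python and pass them to `subprocess.run`. The sketch below only mirrors the command forms shown in this section. The helper names `archive_cmd` and `restore_cmd` are hypothetical, and it assumes `ecflow_client` is on your PATH and already configured for your server.

```python
from typing import List, Sequence


def archive_cmd(paths: Sequence[str], force: bool = False) -> List[str]:
    """Return the argv for archiving one or more suite/family paths."""
    if not paths:
        raise ValueError("at least one node path is required")
    if force:
        # Form shown above: ecflow_client --archive=force /s1 /s2
        return ["ecflow_client", "--archive=force", *paths]
    # Form shown above: ecflow_client --archive=/s1/f1 /s2
    return ["ecflow_client", "--archive={}".format(paths[0]), *paths[1:]]


def restore_cmd(paths: Sequence[str]) -> List[str]:
    """Return the argv for restoring one or more archived paths."""
    if not paths:
        raise ValueError("at least one node path is required")
    # Form shown above: ecflow_client --restore=/s1 /s2
    return ["ecflow_client", "--restore={}".format(paths[0]), *paths[1:]]


# Print the commands; to execute one, pass the list to
# subprocess.run(cmd, check=True) against a running server.
print(" ".join(archive_cmd(["/s1/f1", "/s2"])))
print(" ".join(restore_cmd(["/s1", "/s2"])))
```

Building the list separately from running it keeps the commands easy to inspect before they touch a live server.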
Text
Let us modify the suite definition file again. To avoid waiting, this exercise will archive immediately.
    # Definition of the suite test.
    suite test
      edit ECF_INCLUDE "$HOME/course"
      edit ECF_HOME    "$HOME/course"
      edit SLEEP 20
      family lf1
        autoarchive 0
        task t1 ; task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
      endfamily
      family lf2
        autoarchive 0
        task t1 ; task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
      endfamily
      family lf3
        autoarchive 0
        task t1 ; task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
      endfamily
      family restore
        trigger lf1<flag>archived and lf2<flag>archived and lf3<flag>archived
        task t1
          edit SLEEP 60                      # wait for autoarchive
          autorestore ../lf1 ../lf2 ../lf3   # restore when t1 completes
      endfamily
    endsuite
Python
    import os
    from ecflow import Defs,Suite,Family,Task,Edit,Trigger,Complete,Event,Meter,Time,Day,Date,Label, \
                       RepeatString,RepeatInteger,RepeatDate,InLimit,Limit,Autoarchive,Autorestore

    def create_family(name) :
        # Family with nine tasks t1..t9, archived as soon as it completes
        return Family(name,
                      Autoarchive(0),
                      [ Task('t{}'.format(i)) for i in range(1,10) ] )

    def create_family_restore() :
        # Runs once all three families are archived; restores them when t1 completes
        return Family("restore",
                      Trigger("lf1<flag>archived and lf2<flag>archived and lf3<flag>archived"),
                      Task('t1',
                           Edit(SLEEP=60),
                           Autorestore(["../lf1","../lf2","../lf3"])))

    print("Creating suite definition")
    home = os.path.join(os.getenv("HOME"),"course")
    defs = Defs(
            Suite("test",
                Edit(ECF_INCLUDE=home,ECF_HOME=home,SLEEP=20),
                create_family("lf1"),create_family("lf2"),create_family("lf3"),
                create_family_restore() ) )
    print(defs)

    print("Checking job creation: .ecf -> .job0")
    print(defs.check_job_creation())

    print("Checking trigger expressions and inlimits")
    assert len(defs.check()) == 0,defs.check()

    print("Saving definition to file 'test.def'")
    defs.save_as_defs("test.def")
What to do
- Type in the changes, then copy the f5 directory for each new family: cp -r f5 lf1; cp -r f5 lf2; cp -r f5 lf3
- Replace the suite definition
- Run the suite. You should see nodes getting archived, then restored, in ecflow_ui.
- Experiment with archive and restore in ecflow_ui.
- Experiment with archive and restore from the command line.
Note: Autoarchive(0) can take up to one minute to take effect, because the server has a 1-minute resolution.
...