ecFlow's documentation is now on readthedocs!

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 7 Next »

This page contains macros or features from a plugin which requires a valid license.

You will need to contact your administrator.

Previous Up Next


In the real world suites can have several thousands tasks. These tasks are not required all the time.

Having a server with hundreds of thousands of tasks can cause performance issues.

  • The server writes to the checkpoint file periodically. This disk i/o can interfere with job scheduling, when dealing with excessively large number tasks.
  • Clients like GUI(ecflow_ui), are also adversely affected by the memory requirements.
  • Network traffic is heavily affected

This is where autoarchive becomes useful.

autoarchive example
autoarchive +01:00 # archive one hour after complete
autoarchive 01:00  # archive at 1 am in morning after complete
autoarchive 10     # archive 10 days after complete
autoarchive 0      # archive immediately after complete, can take up to a minute

autoarchive will write a portion of the definition to disk and autorestore can restore from disk on re-queue/begin

  • Archives suite or family nodes *IF* they have child nodes(otherwise does nothing).
  • Saves the suite/family nodes to disk, and then removes the in memory child nodes from the definition.
  •  It improves time taken to checkpoint and reduces network bandwidth
  •  If the node is re-queued or begun, the child nodes are automatically restored
  • The nodes are saved to ECF_HOME/<host>.<port>.ECF_NAME.check, where '/' has been replaced with ':' in ECF_NAME
  • Care must be taken if you have trigger reference to the archived nodes

Autorestore can also be done automatically, but is is only applied when a node completes.

Use  ecflow_client --archive to archive manually

  • ecflow_client --archive=/s1                       # archive suite s1
  • ecflow_client --archive=/s1/f1 /s2            # archive family /s1/f1 and suite /s2
  • ecflow_client --archive=force /s1 /s2      # archive suites /s1,/s2 even if they have active tasks

To restore archived nodes manually use : 

  • ecflow_client --restore=/s1/f1     # restore family /s1/f1
  • ecflow_client  --restore=/s1 /s2  # restore suites /s1 and /s2

Text

Let us modify the suite definition file again

# Definition of the suite test.
suite test
 edit ECF_INCLUDE "$HOME/course"
 edit ECF_HOME    "$HOME/course"
 edit SLEEP 20
 family lf1
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family lf2
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family lf3
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family restore
    trigger ./lf1 == complete and ./lf2 == complete and ./lf3 == complete
    test t1
       edit SLEEP 60                       # wait for autoarchive                
       autorestore ../lf1 ../lf2 ../lf3.   # restore when t1 completes
 endfamily
endsuite

Python

$HOME/course/test.py
import os
from ecflow import Defs,Suite,Family,Task,Edit,Trigger,Complete,Event,Meter,Time,Day,Date,Label, \
                   RepeatString,RepeatInteger,RepeatDate,InLimit,Limit,Autoarchive,Autorestore
         
def create_family(name) :
    return Family(name, 
                  Autoarchive(0),
                  [ Task('t{}'.format(i)) for i in range(1,10) ] )

def create_family_restore() :
    return Family("restore",
                 Trigger("./lf1 == complete and ./lf2 == complete and ./lf3 == complete"),
                 Task('t1', 
                    Edit(SLEEP=60),
                    Autorestore(["../lf1","../lf2","../lf3"])))
     
print("Creating suite definition") 
home = os.path.join(os.getenv("HOME"),"course")
defs = Defs(
        Suite("test",
            Edit(ECF_INCLUDE=home,ECF_HOME=home,SLEEP=20),
            create_family("lf1"),create_family("lf2"),create_family("lf3"),
            create_family_restore()
        )
      )
print(defs)
 
print("Checking job creation: .ecf -> .job0") 
print(defs.check_job_creation())
 
print("Checking trigger expressions and inlimits")
assert len(defs.check()) == 0,defs.check()
 
print("Saving definition to file 'test.def'")
defs.save_as_defs("test.def")

What to do

  1. Type in the changes
  2. Replace the suite definition
  3. Run the suite, you should see the task late flag set in ecflow_ui
  4. When the job completes, if you re-queue family node f6 or task t1, it will clear the late flag. The late flag can also be cleared manually, select task t1, then with Right Mouse Button , → Special →  Clear late flag


  • No labels