Previous: https://confluence.ecmwf.int/display/ECFLOW/Limit-families
Up: https://softwareconfluence.ecmwf.int/wiki/display/ECFLOW/Advanced+Topics
Next: https://confluence.ecmwf.int/display/ECFLOW/Late+Attribute

In the real world, suites can have several thousand tasks, and these tasks are not required all the time.

Having a server with an extremely large number of tasks can cause performance issues:

  • The server writes to the checkpoint file periodically. When dealing with an excessively large number of tasks, this disk I/O can interfere with job scheduling.
  • Clients such as the GUI (ecflow_ui) are also adversely affected: their memory requirements grow and the interactive experience slows down.
  • Network traffic increases heavily.


Autoarchive writes a portion of the definition to disk. Archived nodes are restored from disk when they are re-queued or begun, or by the autorestore attribute.

  • It archives suite or family nodes *IF* they have child nodes (otherwise it does nothing).
  • It saves the suite/family nodes to disk, then removes their child nodes from the in-memory definition.
  • It reduces the time taken to checkpoint and reduces network bandwidth.
  • If an archived node is re-queued or begun, its child nodes are restored automatically.
  • The nodes are saved to ECF_HOME/<host>.<port>.ECF_NAME.check, where '/' has been replaced with ':' in ECF_NAME.
  • Care must be taken if triggers reference the archived nodes, because their child nodes are no longer in memory.
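The file naming rule in the bullets above can be sketched as a small Python helper. This is a minimal sketch that applies the rule literally, so the leading '/' of ECF_NAME also becomes ':'. The function name archive_file_path and the host, port and ECF_HOME values are hypothetical, and the server's own implementation may differ in details.

```python
import posixpath


def archive_file_path(ecf_home: str, host: str, port: int, ecf_name: str) -> str:
    """Path of the archive file for node ecf_name, following the rule
    ECF_HOME/<host>.<port>.ECF_NAME.check with '/' replaced by ':'."""
    encoded = ecf_name.replace("/", ":")
    # posixpath: the server's ECF_HOME is assumed to be a POSIX path.
    return posixpath.join(ecf_home, f"{host}.{port}.{encoded}.check")


# Hypothetical host, port and ECF_HOME, for illustration only.
print(archive_file_path("/home/user/course", "myhost", 3141, "/test/lf1"))
# -> /home/user/course/myhost.3141.:test:lf1.check
```

Knowing the file name is handy when checking ECF_HOME for archived families before cleaning up old files.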


Use ecflow_client --archive to archive manually:

  • ecflow_client --archive=/s1                # archive suite /s1
  • ecflow_client --archive=/s1/f1 /s2         # archive family /s1/f1 and suite /s2
  • ecflow_client --archive=force /s1 /s2      # archive suites /s1 and /s2, even if they have active tasks

Restoring can also be done automatically with the autorestore attribute, but it is only applied when the node carrying the attribute completes.

To restore archived nodes manually, use:

  • ecflow_client --restore=/s1/f1     # restore family /s1/f1
  • ecflow_client --restore=/s1 /s2    # restore suites /s1 and /s2
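The archive and restore command forms above can be assembled from Python, for example when scripting housekeeping. This is a sketch, not part of ecFlow: archive_cmd, restore_cmd and run_cmd are hypothetical helper names that only reproduce the argument layouts shown above. Actually running a command assumes ecflow_client is on PATH and configured for your server (for example through ECF_HOST and ECF_PORT).

```python
import shlex
import subprocess
from typing import List, Sequence


def archive_cmd(paths: Sequence[str], force: bool = False) -> List[str]:
    """Argument list for archiving the given suite/family paths."""
    if not paths:
        raise ValueError("at least one node path is required")
    if force:
        # Same layout as: ecflow_client --archive=force /s1 /s2
        return ["ecflow_client", "--archive=force", *paths]
    # Same layout as: ecflow_client --archive=/s1/f1 /s2
    return ["ecflow_client", f"--archive={paths[0]}", *paths[1:]]


def restore_cmd(paths: Sequence[str]) -> List[str]:
    """Argument list for restoring the given archived paths."""
    if not paths:
        raise ValueError("at least one node path is required")
    # Same layout as: ecflow_client --restore=/s1 /s2
    return ["ecflow_client", f"--restore={paths[0]}", *paths[1:]]


def run_cmd(cmd: List[str]) -> None:
    """Print and execute a command; raises CalledProcessError on failure."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Build (but do not run) the commands shown above.
print(shlex.join(archive_cmd(["/s1/f1", "/s2"])))             # -> ecflow_client --archive=/s1/f1 /s2
print(shlex.join(archive_cmd(["/s1", "/s2"], force=True)))    # -> ecflow_client --archive=force /s1 /s2
print(shlex.join(restore_cmd(["/s1", "/s2"])))                # -> ecflow_client --restore=/s1 /s2
```

Passing an argument list to subprocess.run (rather than a shell string) avoids quoting problems with node paths.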

Text

Let us modify the suite definition file again. To avoid waiting, this exercise archives immediately: each family uses autoarchive 0.

```
# Definition of the suite test.
suite test
 edit ECF_INCLUDE "$HOME/course"
 edit ECF_HOME    "$HOME/course"
 edit SLEEP 20
 family lf1
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family lf2
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family lf3
     autoarchive 0
     task t1 ;  task t2 ; task t3 ; task t4; task t5 ; task t6; task t7; task t8 ; task t9
 endfamily
 family restore
    trigger ./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived
    task t1
       edit SLEEP 60                       # wait for autoarchive
       autorestore ../lf1 ../lf2 ../lf3    # restore when t1 completes
 endfamily
endsuite
```
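The restore family generalizes to any number of archived sibling families: its trigger joins one <flag>archived term per family with "and", and its autorestore lists the same families. The sketch below builds those lines as plain text; restore_family_lines is a hypothetical helper that mirrors the relative paths used in the definition above and does not call the ecflow module.

```python
from typing import List, Sequence


def restore_family_lines(families: Sequence[str], sleep: int = 60) -> List[str]:
    """Text lines for a 'restore' family that waits until every listed
    sibling family is archived, then restores them when task t1 completes."""
    if not families:
        raise ValueError("at least one family name is required")
    # One '<flag>archived' term per family, joined with 'and'.
    trigger = " and ".join(f"./{name}<flag>archived" for name in families)
    # Same relative form as the autorestore line in the definition above.
    targets = " ".join(f"../{name}" for name in families)
    return [
        " family restore",
        f"    trigger {trigger}",
        "    task t1",
        f"       edit SLEEP {sleep}",
        f"       autorestore {targets}",
        " endfamily",
    ]


print("\n".join(restore_family_lines(["lf1", "lf2", "lf3"])))
```

For example, restore_family_lines([f"lf{i}" for i in range(1, 6)]) would produce the block for five families. Generating the expression avoids a missing "and" or a mistyped family name as the list grows.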


Python

$HOME/course/test.py

```python
import os
from ecflow import Defs,Suite,Family,Task,Edit,Trigger,Complete,Event,Meter,Time,Day,Date,Label, \
                   RepeatString,RepeatInteger,RepeatDate,InLimit,Limit,Autoarchive,Autorestore

def create_family(name) :
    # Archive the family as soon as it completes (autoarchive 0)
    return Family(name,
                  Autoarchive(0),
                  [ Task('t{}'.format(i)) for i in range(1,10) ] )

def create_family_restore() :
    # Runs once lf1, lf2 and lf3 have all been archived;
    # t1 restores them when it completes
    return Family("restore",
                 Trigger("./lf1<flag>archived and ./lf2<flag>archived and ./lf3<flag>archived"),
                 Task('t1',
                    Edit(SLEEP=60),
                    Autorestore(["../lf1","../lf2","../lf3"])))

print("Creating suite definition")
home = os.path.join(os.getenv("HOME"),"course")
defs = Defs(
        Suite("test",
            Edit(ECF_INCLUDE=home,ECF_HOME=home,SLEEP=20),
            create_family("lf1"),create_family("lf2"),create_family("lf3"),
            create_family_restore()
        )
      )
print(defs)

print("Checking job creation: .ecf -> .job0")
print(defs.check_job_creation())

print("Checking trigger expressions and inlimits")
assert len(defs.check()) == 0,defs.check()

print("Saving definition to file 'test.def'")
defs.save_as_defs("test.def")
```

What to do

  1. Type in the changes, and copy the f5 scripts: cp -r f5 lf1; cp -r f5 lf2; cp -r f5 lf3
  2. Replace the suite definition.
  3. Run the suite. You should see the nodes getting archived, then restored, in ecflow_ui.
  4. Experiment with archive and restore in ecflow_ui.
  5. Experiment with archive and restore from the command line.
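The copy in step 1 can also be scripted. This is a minimal sketch using shutil.copytree; copy_family_scripts is a hypothetical helper name, and it assumes the f5 scripts from the limit-families exercise live in $HOME/course/test/f5 (ECF_HOME is $HOME/course and the suite is test).

```python
import shutil
from pathlib import Path
from typing import Iterable


def copy_family_scripts(suite_dir: Path, source: str = "f5",
                        targets: Iterable[str] = ("lf1", "lf2", "lf3")) -> None:
    """Copy the script directory `source` to each target family directory,
    like: cp -r f5 lf1; cp -r f5 lf2; cp -r f5 lf3"""
    src = suite_dir / source
    if not src.is_dir():
        raise FileNotFoundError(f"missing script directory: {src}")
    for name in targets:
        # Unlike 'cp -r f5 lf1' when lf1 already exists (which creates lf1/f5),
        # dirs_exist_ok=True merges f5's contents into lf1 (Python 3.8+).
        shutil.copytree(src, suite_dir / name, dirs_exist_ok=True)


# Example (assumed layout: ECF_HOME=$HOME/course, suite 'test'):
# copy_family_scripts(Path.home() / "course" / "test")
```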


Note

Autoarchive(0) can take up to one minute to take effect, because the server has a 1-minute resolution.

