<div class="section" id="zombie"> <span id="index-0"></span><span id="id1"></span> <p>A <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-zombie"><em class="xref std std-term">zombie</em></a> is a running job that fails authentication when communicating with the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecflow-server"><em class="xref std std-term">ecflow_server</em></a>.</p> <div class="section" id="how-are-zombies-created"> <h2>How are zombies created?<a class="headerlink" href="#how-are-zombies-created" title="Permalink to this headline">¶</a></h2> <div class="line-block"> <div class="line">There is a wide variety of reasons why a <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-zombie"><em class="xref std std-term">zombie</em></a> is created.</div> <div class="line">The most common causes are due to user action:</div> </div> <ul class="simple"> <li>The <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-node"><em class="xref std std-term">node</em></a> tree is deleted, replaced or reloaded whilst jobs are running</li> <li>A <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-task"><em class="xref std std-term">task</em></a> is rerun whilst in a <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-submitted"><em class="xref std std-term">submitted</em></a> or <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-active"><em class="xref std std-term">active</em></a> state</li> <li>A job is forced to a new state, e.g. <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-complete"><em class="xref std std-term">complete</em></a></li> </ul> <p>Rarer causes include:</p> <ul class="simple"> <li><a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecf-script"><em class="xref std std-term">ecf script</em></a> errors, where we have multiple calls to init and complete <a class="reference 
internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a>s</li> <li>The <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a>s in the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecf-script"><em class="xref std std-term">ecf script</em></a> are placed in the background. In this case the order in which the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a>s contact the server may be indeterminate.</li> <li>LoadLeveler submitting a job twice</li> <li>Server crash where the recovered <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-check-point"><em class="xref std std-term">check point</em></a> file is out of date</li> <li>Machine crash</li> </ul> </div> <div class="section" id="how-can-zombie-s-be-handled"> <h2>How can zombies be handled?<a class="headerlink" href="#how-can-zombie-s-be-handled" title="Permalink to this headline">¶</a></h2> <p>The default behaviour is to <strong>block</strong> the job.</p> <div class="line-block"> <div class="line">The <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a> continues attempting to contact the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecflow-server"><em class="xref std std-term">ecflow_server</em></a>.</div> <div class="line">This is done for a period of 24 hours. 
(This period is configurable; see ECF_TIMEOUT on <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecflow-client"><em class="xref std std-term">ecflow_client</em></a>.)</div> </div> <div class="line-block"> <div class="line">Jobs can also be configured so that, if the server denies the communication,</div> <div class="line">the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a> fails immediately. (See ECF_DENIED on <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecflow-client"><em class="xref std std-term">ecflow_client</em></a>.)</div> </div> <p><a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecflowview"><em class="xref std std-term">ecflowview</em></a> provides a dialog which lists all the zombies and the actions that can be taken. These include:</p> <ul> <li><p class="first">Terminate:</p> <div class="line-block"> <div class="line">The <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a> is asked to <strong>fail</strong>.</div> <div class="line">Depending on your scripts, this may cause the abort <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a> to be called,</div> <div class="line">which will in turn be flagged as a <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-zombie"><em class="xref std std-term">zombie</em></a>.</div> </div> </li> <li><p class="first">Fob:</p> <p>Allow the job to continue. 
The <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a> completes and hence no longer blocks the job.</p> <div class="line-block"> <div class="line">Great care should be taken when this action is chosen.</div> <div class="line">If we have two jobs running, they may cause data corruption.</div> <div class="line">Even when we have a single job, issues can arise.</div> <div class="line">For example, if the associated command was an event <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a>, then the</div> <div class="line"><a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-event"><em class="xref std std-term">event</em></a> would not be set. If this <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-event"><em class="xref std std-term">event</em></a> was used in a <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-trigger"><em class="xref std std-term">trigger</em></a> expression,</div> <div class="line">the expression would never evaluate to true.</div> </div> </li> <li><p class="first">Delete:</p> <div class="line-block"> <div class="line">Remove the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-zombie"><em class="xref std std-term">zombie</em></a> from the server. 
The job will continue blocking, so</div> <div class="line">when the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a> next contacts the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecflow-server"><em class="xref std std-term">ecflow_server</em></a>, the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-zombie"><em class="xref std std-term">zombie</em></a> will reappear.</div> <div class="line">Use this option if the job has been killed manually.</div> </div> </li> <li><p class="first">Rescue:</p> <div class="line-block"> <div class="line"><strong>Adopt</strong> the zombie and update the node tree.</div> <div class="line">The ECF_PASS on the zombie is copied over to the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-task"><em class="xref std std-term">task</em></a>, so that the next</div> <div class="line"><a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a> will continue as normal.</div> </div> </li> <li><p class="first">Kill:</p> <div class="line-block"> <div class="line">Applies the kill command (ECF_KILL_CMD) using the process id stored on the <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-zombie"><em class="xref std std-term">zombie</em></a>.</div> <div class="line">If the script has correct signal trapping, this should end up calling abort.</div> <div class="line">Note: path zombies will need to be killed manually.</div> </div> </li> </ul> <div class="admonition warning"> <p class="first admonition-title">Warning</p> <p class="last">Of the actions above, only Rescue will allow a <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-child-command"><em class="xref std std-term">child command</em></a> to change the state of the node tree.</p> </div> <p><strong>What 
to do:</strong></p> <ol class="arabic simple"> <li>Create a <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-zombie"><em class="xref std std-term">zombie</em></a> by starting a <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-task"><em class="xref std std-term">task</em></a> and setting it to <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-complete"><em class="xref std std-term">complete</em></a> immediately via <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecflowview"><em class="xref std std-term">ecflowview</em></a></li> <li>Inspect the log file; it will show you how the zombie arose.</li> <li>Inspect the zombie dialog in <a class="reference internal" href="/wiki/display/ECFLOW/Glossary#term-ecflowview"><em class="xref std std-term">ecflowview</em></a> (right mouse button selection on the host node)</li> <li>Experiment with the different actions on the zombie</li> <li>Select the host node and invoke the <strong>option...</strong> menu selection, then select the Zombies button. This enables zombie notification via a pop-up window.</li> </ol> </div> </div>
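<p>The blocking and Rescue behaviour described on this page can be pictured with a small toy model. This is only a sketch and does not use the ecFlow API: the names <code>Task</code>, <code>ToyServer</code>, <code>child_command</code> and <code>rescue</code> are hypothetical, and it assumes that authentication amounts to comparing the ECF_PASS presented by the job with the one held on the task. The real server may check more than this.</p>

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Task:
    ecf_pass: str            # password the toy server expects from the job
    state: str = "active"


@dataclass
class ToyServer:
    """Hypothetical stand-in for a server; NOT the ecFlow API."""
    tasks: Dict[str, Task] = field(default_factory=dict)
    # path -> password presented by the job that failed authentication
    zombies: Dict[str, str] = field(default_factory=dict)

    def child_command(self, path: str, ecf_pass: str, new_state: str) -> str:
        """Accept the child command only if the password matches the task."""
        task = self.tasks.get(path)
        if task is not None and task.ecf_pass == ecf_pass:
            task.state = new_state
            return "accepted"
        # Authentication failed: record a zombie and block (the default action).
        self.zombies[path] = ecf_pass
        return "blocked"

    def rescue(self, path: str) -> None:
        """Adopt the zombie: copy its password over to the task."""
        self.tasks[path].ecf_pass = self.zombies.pop(path)


server = ToyServer(tasks={"/suite/t1": Task(ecf_pass="old")})

# The task is rerun while its job is still running (assumed here to mean
# that the server now expects a different password).
server.tasks["/suite/t1"].ecf_pass = "new"

# The original job reports completion with its old password: it is a zombie.
first = server.child_command("/suite/t1", "old", "complete")

# After Rescue, the next child command from the same job is accepted.
server.rescue("/suite/t1")
second = server.child_command("/suite/t1", "old", "complete")

print(first, second)
```

<p>In this model, only <code>rescue</code> lets the old job change the task state, which mirrors the warning above about Rescue being the one action that allows a child command to update the node tree.</p>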