What's a model restart?
A restart allows the model to continue a forecast as a succession of individual runs, each one starting where the previous one left off, e.g. running an extended forecast of 100 days as 5 separate runs of 20 days each. A restart is always exact: the results of the restarted forecast should match those of a single run of the model for the same total length (assuming all settings remain the same).
This page describes how to configure and use the model restart facility.
The restarted run must always have the same parallel decomposition (TASKS x THREADS) as the run that wrote the restart files, otherwise the restart will fail.
How to configure restarts
The namelist NAMRES controls model restarts (see ifs/namelist/namres.h). The most useful variables in this namelist are NFRRES and NRESTS.
NFRRES can be used to set a regular restart frequency; NRESTS can be used to set specific restart times.
NFRRES : frequency of restart file writes (restart file interval). Use this to set a regular restart frequency, e.g. restart files created every 24hrs.
This can be either positive or negative.
If positive it represents a frequency in model timesteps. e.g. NFRRES=144 means a restart file will be created every 144 model timesteps. With a timestep of 10mins this would mean restart files are written every day.
If negative it represents a frequency in hours. e.g. NFRRES=-48, would mean the model creates restart files every 2 days regardless of the model timestep.
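The sign convention for NFRRES can be sketched in Python as follows. This is an illustration only, not model code: the function name `nfrres_interval_hours` and its `timestep_minutes` argument are hypothetical, and treating NFRRES=0 as an error is an assumption of the sketch.

```python
def nfrres_interval_hours(nfrres: int, timestep_minutes: float) -> float:
    """Return the restart interval in hours implied by an NFRRES value.

    Hypothetical helper: positive NFRRES counts model timesteps,
    negative NFRRES counts hours (independent of the timestep).
    """
    if nfrres == 0:
        # Assumption of this sketch: zero is not a usable frequency.
        raise ValueError("NFRRES=0 does not define a restart frequency")
    if nfrres > 0:
        # Frequency in timesteps: convert steps to hours.
        return nfrres * timestep_minutes / 60.0
    # Frequency in hours: the timestep plays no role.
    return float(-nfrres)


print(nfrres_interval_hours(144, 10))  # 144 steps x 10 min -> 24.0
print(nfrres_interval_hours(-48, 10))  # -> 48.0
```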
NRESTS : list of restart times. Use this to set an irregular restart schedule, e.g. restart files created at 24hrs, 72hrs and 144hrs.
Values for NRESTS can be either positive, in which case they are interpreted as the model timesteps at which restart files are written, or negative, in which case they specify hours.
The first value for NRESTS must give the number of restart times required and have the same sign as the remaining entries. See examples below.
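A minimal sketch of how an NRESTS list could be decoded into restart times in hours, following the rules above. The helper `decode_nrests` is hypothetical, and ignoring any entries beyond the announced count is an assumption of this sketch, not documented model behaviour.

```python
def decode_nrests(nrests: list[int], timestep_minutes: float) -> list[float]:
    """Decode an NRESTS list into restart times in hours (hypothetical helper).

    The first entry is the number of restart times; every entry shares
    its sign (positive = timesteps, negative = hours).
    """
    if not nrests or nrests[0] == 0:
        return []  # no restart times requested
    count = abs(nrests[0])
    times = nrests[1:]
    if len(times) < count:
        raise ValueError(f"NRESTS announces {count} times but lists {len(times)}")
    times = times[:count]  # assumption: extra trailing entries are ignored
    negative = nrests[0] < 0
    if any((t < 0) != negative for t in times):
        raise ValueError("all NRESTS entries must have the same sign as the first")
    if negative:
        return [float(-t) for t in times]  # already in hours
    return [t * timestep_minutes / 60.0 for t in times]  # timesteps -> hours


print(decode_nrests([-3, -48, -120, -192], 10))  # [48.0, 120.0, 192.0]
print(decode_nrests([2, 144, 288], 10))          # [24.0, 48.0]
```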
Examples
&NAMRES NFRRES=-24, /
In this simple example, the model writes restart files every 24hrs (NFRRES is negative, so the value is in hours). NRESTS does not need to be specified (it defaults to zero).
&NAMRES NFRRES=1, NRESTS=-3,-48,-120,-192, /
In this example, the first entry (-3) indicates that 3 restart write times are requested. As all values are negative (the first value must be negative too), the units are hours. This produces restart files at 48hrs, 120hrs and 192hrs. Note that the restart files do not need to be created at equally spaced intervals when using a list of restart times in NRESTS.
NFRRES is normally set to 1 when using NRESTS. If NFRRES is set > 1, the restart times are multiplied by NFRRES. In the above example, if NFRRES were changed to 2, the model would still write 3 restart files, but at 96hrs, 240hrs and 384hrs.
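The NFRRES scaling rule can be checked with a tiny hypothetical helper, `scaled_restart_hours`, which assumes the NRESTS times have already been converted to hours. It reproduces the numbers in the example above.

```python
def scaled_restart_hours(nrests_hours: list[float], nfrres: int) -> list[float]:
    """Apply the NFRRES multiplier to restart times taken from NRESTS.

    Hypothetical helper: NFRRES=1 leaves the times unchanged,
    NFRRES>1 multiplies each restart time by NFRRES.
    """
    if nfrres < 1:
        raise ValueError("this sketch only covers NFRRES >= 1 together with NRESTS")
    return [t * nfrres for t in nrests_hours]


print(scaled_restart_hours([48, 120, 192], 1))  # [48, 120, 192]
print(scaled_restart_hours([48, 120, 192], 2))  # [96, 240, 384]
```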
Files created
Restart files
The restart file names all begin with 'srf'. One file is created per MPI task. The files are written as unformatted binary (not GRIB) in order to preserve precision.
The file name encodes the model time as srfddddhhmm, where dddd is the day number of the run, hh is the hour and mm is the minutes, followed by the MPI task number. e.g. srf00000120.0002 would be for day 0, 1 hour and 20 minutes into the run, written by MPI task 2.
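Restart file names can be decoded with a short Python sketch. It assumes the default 'srf' prefix and the ddddhhmm.NNNN layout illustrated by srf00000120.0002; `parse_srf_name` and `RestartFile` are hypothetical names, not part of the model.

```python
import re
from typing import NamedTuple


class RestartFile(NamedTuple):
    """Fields encoded in a restart file name (hypothetical type)."""
    day: int
    hour: int
    minute: int
    task: int

    def elapsed_minutes(self) -> int:
        """Model time since the start of the run, in minutes."""
        return (self.day * 24 + self.hour) * 60 + self.minute


# srf + dddd (day) + hh (hour) + mm (minutes) + '.' + MPI task number
_SRF_NAME = re.compile(r"^srf(\d{4})(\d{2})(\d{2})\.(\d+)$")


def parse_srf_name(name: str) -> RestartFile:
    m = _SRF_NAME.match(name)
    if m is None:
        raise ValueError(f"not a restart file name: {name!r}")
    day, hour, minute, task = (int(g) for g in m.groups())
    return RestartFile(day, hour, minute, task)


info = parse_srf_name("srf00000120.0002")
print(info)                    # RestartFile(day=0, hour=1, minute=20, task=2)
print(info.elapsed_minutes())  # 80
```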
Deletion of old restart files
By default, old restart files are not deleted. This can cause problems with limited file quotas, e.g. one restart file at T1279 is approx. 500MB when using 128 MPI tasks, giving a total restart file requirement of approx. 64GB per output instance.
To change this behaviour, edit the namelist file fort.4 and change the default value of LDELRES to TRUE, e.g.
&NAMRES LDELRES=.true., /
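The arithmetic behind the quota warning is simple enough to script when planning a long run with LDELRES left at its default. The helper below is hypothetical; the per-task file size must come from your own runs, and it uses 1GB = 1000MB to match the round figures quoted above.

```python
def restart_storage_gb(file_mb: float, n_tasks: int, n_instances: int = 1) -> float:
    """Approximate disk space in GB taken by restart files (hypothetical helper).

    file_mb:     size of one per-task restart file, in MB
    n_tasks:     number of MPI tasks (one file per task)
    n_instances: number of restart write times kept on disk
    """
    return file_mb * n_tasks * n_instances / 1000.0


print(restart_storage_gb(500, 128))      # 64.0  (one output instance)
print(restart_storage_gb(500, 128, 10))  # 640.0 (ten instances kept)
```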
Restart namelist
A file named 'rcf' is created by the model at each timestep at which the restart files are written. This file contains the namelist NAMRCF, which tells the model what it needs to know to restart.
If the file rcf is present in the same directory as the restart files (srf*), the model will always assume it is doing a restart.
Do not delete this file; otherwise the model will be unable to restart, regardless of whether the actual restart files (those beginning with srf) are present.
Conversely, if you want to repeat the run rather than restart it, rename the rcf file (e.g. to rcf.old) or delete it (and the srf files). If you don't, the model will attempt to continue the run according to the namelist in the rcf file, as this takes precedence over the namelists read from fort.4.
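A small pre-run check can make the rcf rule explicit. The sketch below uses hypothetical helpers, `will_restart` and `prepare_fresh_run`, and assumes the default file name 'rcf' in the run directory; it reports whether the next run will be treated as a restart and renames rcf to rcf.old to force a fresh start.

```python
import tempfile
from pathlib import Path
from typing import Optional


def will_restart(run_dir: Path, rcf_name: str = "rcf") -> bool:
    """True if the model would treat a run in run_dir as a restart (rcf present)."""
    return (run_dir / rcf_name).is_file()


def prepare_fresh_run(run_dir: Path, rcf_name: str = "rcf") -> Optional[Path]:
    """Rename rcf to rcf.old so that the next run starts from scratch.

    Returns the new path, or None if no rcf file was found. The srf*
    files are left in place; remove them separately if no longer needed.
    On POSIX systems an existing rcf.old is silently replaced.
    """
    rcf = run_dir / rcf_name
    if not rcf.is_file():
        return None
    return rcf.rename(rcf.with_name(rcf_name + ".old"))


# Demonstration in a throwaway directory with placeholder rcf content.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "rcf").write_text('&NAMRCF CSTEP=" 12", CTIME="00000200 ",\n')
    before = will_restart(run_dir)      # True: rcf is present
    moved = prepare_fresh_run(run_dir)  # path ending in rcf.old
    after = will_restart(run_dir)       # False: rcf has been renamed
    print(before, moved.name, after)    # True rcf.old False
```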
Changing location or name of restart files
By default, the restart files are written to the same directory as the model GRIB output files, and their names all begin with the prefix 'srf'.
To change this prefix or the directory the files are written to, use the CIOSPRF character variable in the namelist NAMIOS:
&NAMIOS CIOSPRF='./myrestarts/srf', CFRCF='./myrestarts/rcf', /
Note that this example also moves the 'rcf' file, which contains the restart namelist, to the same directory using CFRCF. This is recommended so that the rcf and srf files stay together.
Continuing the forecast
Namelist changes
Only one change to the model namelist, fort.4, is required to continue the forecast.
Increase the value of NSTOP in NAMCT0 so that the model runs past the timestep of the last restart. If this is not done, the model will start, find that NSTOP matches the time of the restart it is using, and finish immediately.
Note that the model still expects to find the initial files in the experiment directory; it reads them to get information about the grid.
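When splitting a long forecast into segments, such as 100 days as 5 runs of 20 days, the NSTOP value for each run can be worked out in advance. This sketch assumes NSTOP is expressed in model timesteps and that each segment is a whole number of timesteps; `segment_nstops` and `check_nstop` are hypothetical helpers.

```python
def segment_nstops(total_days: int, n_segments: int, timestep_minutes: int) -> list[int]:
    """NSTOP for each run when a forecast is split into equal segments.

    Assumes NSTOP counts model timesteps (hypothetical helper).
    """
    if total_days % n_segments:
        raise ValueError("segments must divide the forecast evenly in this sketch")
    seg_minutes = total_days // n_segments * 24 * 60
    if seg_minutes % timestep_minutes:
        raise ValueError("segment length is not a whole number of timesteps")
    steps_per_segment = seg_minutes // timestep_minutes
    return [k * steps_per_segment for k in range(1, n_segments + 1)]


def check_nstop(nstop: int, restart_step: int) -> None:
    """Raise if NSTOP would make a restarted run finish immediately."""
    if nstop <= restart_step:
        raise ValueError(f"NSTOP={nstop} must exceed the restart step {restart_step}")


plan = segment_nstops(100, 5, 10)
print(plan)                    # [2880, 5760, 8640, 11520, 14400]
check_nstop(plan[1], plan[0])  # run 2 restarts at step 2880 and stops at 5760
```

For this plan to work, restart files must also be written at each segment boundary, e.g. NFRRES=-480 (20 days = 480 hours) in NAMRES.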
How to use a specific restart
This can be done by carefully editing the namelist NAMRCF contained in the 'rcf' file, and is best explained by an example. To restart successfully you must have the restart files for all model tasks; there is one restart file per task.
Suppose the model has been run for two hours with a restart created every hour (say at T21 with a 10min timestep). The model will write out restart files named:
srf00000100.0001 and srf00000200.0001 (the format is srf<day:dddd><hour:hh><min:mm>, followed by the MPI task number).
The file 'rcf' will always refer to the latest restart. The top of the file looks like:
&NAMRCF CSTEP=" 12", CTIME="00000200 ",
Here CSTEP ('12') is the timestep at which the restart was written, and CTIME is the string following 'srf' in the restart file names. In this case it corresponds to 2hrs (12 steps x 10min timestep).
Edit this file to look like:
&NAMRCF CSTEP=" 6", CTIME="00000100 ",
and rerun the model. It will now start from timestep 6 and look for the files 'srf00000100.*' (one per MPI task) to restart from.
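The manual rcf edit can be scripted. This is a hedged sketch: `ctime_for_step` and `rewind_rcf` are hypothetical, the timestep is assumed to be a whole number of minutes, and only CSTEP and CTIME are rewritten, exactly as in the manual edit; every other NAMRCF entry is left unchanged.

```python
import re


def ctime_for_step(step: int, timestep_minutes: int) -> str:
    """Build the ddddhhmm time string used in CTIME and in srf file names."""
    day, rem = divmod(step * timestep_minutes, 24 * 60)
    hour, minute = divmod(rem, 60)
    return f"{day:04d}{hour:02d}{minute:02d}"


def rewind_rcf(text: str, step: int, timestep_minutes: int) -> str:
    """Point NAMRCF at an earlier restart by rewriting CSTEP and CTIME only.

    Hypothetical helper: leading blanks inside the CSTEP string and the
    blank after the CTIME digits are preserved; nothing else is touched.
    """
    ctime = ctime_for_step(step, timestep_minutes)
    text, n_step = re.subn(r'(CSTEP=")(\s*)\d+(")', rf"\g<1>\g<2>{step}\g<3>", text)
    text, n_time = re.subn(r'(CTIME=")\d{8}', rf"\g<1>{ctime}", text)
    if n_step != 1 or n_time != 1:
        raise ValueError("expected exactly one CSTEP and one CTIME entry")
    return text


original = '&NAMRCF CSTEP=" 12", CTIME="00000200 ",'
print(rewind_rcf(original, 6, 10))  # &NAMRCF CSTEP=" 6", CTIME="00000100 ",
```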
Note:
- if restarting from an earlier restart, the model will overwrite any existing output and restart files for the subsequent timesteps.
- always keep the 'rcf' and 'srf' files together. The rcf namelist contains important information about the grid decomposition and mass fixes needed for an exact restart.
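Because a restart needs one srf file per MPI task, it can be worth checking that none are missing before resubmitting. The hypothetical helper below assumes the task number is zero-padded to 4 digits and that tasks are numbered from 1, as in srf00000100.0001.

```python
import tempfile
from pathlib import Path


def missing_restart_tasks(run_dir: Path, ctime: str, n_tasks: int,
                          prefix: str = "srf") -> list[int]:
    """List the MPI task numbers whose restart file for ctime is missing.

    Assumes names of the form <prefix><ctime>.<task>, with the task
    number zero-padded to 4 digits and tasks numbered from 1.
    """
    return [
        task
        for task in range(1, n_tasks + 1)
        if not (run_dir / f"{prefix}{ctime}.{task:04d}").is_file()
    ]


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    for task in (1, 2, 4):  # simulate a 4-task run where task 3's file is lost
        (run_dir / f"srf00000100.{task:04d}").touch()
    missing = missing_restart_tasks(run_dir, "00000100", n_tasks=4)
    print(missing)  # [3]
```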
Model code
The key subroutines for restarts are:
monio.F90 - sets up the internal arrays that determine the restart write times.
wrresf.F90 - calls the I/O subsystem to write out the restart files. If you add arrays to the model and want them to appear in the restart files, change this routine.
reresf.F90 - calls the I/O subsystem to read the restart files. It is the counterpart to wrresf.F90: any change to wrresf.F90 must be mirrored in reresf.F90.