Each data provider archives its own UERRA data in MARS, with full support from the ECMWF team.
An ECMWF account is needed to be able to archive data in MARS and to access the database online or via the web API interface.
As for any good data archive, it is crucial to have working data quality checking in place. Every data provider should double-check that their parameters fully comply with the required and agreed definitions linked on the main parameter page. Before any archiving activity, special attention must be paid, for example, to correct units and flux sign conventions, as specified on the GRIB2 encoding page. The additional checking tools provided by ECMWF and described below cannot, in principle, discover all types of possible fundamental errors (such as incorrect units, or a reversed sign of fluxes whose value range spans both positive and negative).
Here are three steps to follow to archive data in MARS:
It is important to archive UERRA data efficiently, following the MARS design specific to this type of dataset. Suboptimal archiving (e.g. archiving parameters one by one in separate requests instead of all together) would cause various avoidable MARS performance issues.
Using the archiving scripts which will be provided (see below), each provider will always archive all data from one day at once for a given origin, stream and levtype (i.e. all parameters, times, steps and levels together). If any MARS performance issue occurs during test and production archiving with that approach, it might need to be changed.
As a consequence of the MARS design for UERRA datasets, up to one full month of model data could theoretically be archived at once (again for a given origin, stream and levtype, as usual).
The following tools will be available to providers to allow smooth data processing and archiving in MARS:
For models which still produce the input data for MARS in GRIB1, conversion scripts to the required GRIB2 format will be provided. They are based on the GRIB-API grib_filter tool, which requires a GRIB-API version fully supporting all UERRA parameters.
Examples of grib_filter rules for GRIB1 HARMONIE parameters:
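The agreed rule files themselves will be distributed to the partners; the following is only a minimal illustrative sketch of such a rules file. The GRIB1 parameter and level numbers, and the output file name, are assumptions for illustration, not the agreed HARMONIE tables:

  # Convert a GRIB1 2 m temperature field to a UERRA-compliant GRIB2 message.
  # The GRIB1 numbers (parameter 11, level type 105, level 2) are assumptions.
  if ( editionNumber == 1 && indicatorOfParameter == 11 &&
       indicatorOfTypeOfLevel == 105 && level == 2 ) {
      set edition = 2;                          # GRIB1 -> GRIB2 conversion
      set paramId = 167;                        # 2 m temperature (2t)
      set productionStatusOfProcessedData = 8;  # 8 = UERRA in GRIB2 code table 1.3
      write "an.[date].sfc.grib2";              # appended to the daily surface file
  }

The rules are applied with grib_filter rules_file input.grib1, and the compliant GRIB2 output can then be fed to the checking and archiving steps below.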
An example of how to archive the full data sample produced by a given model will be provided to each partner. Basically, only the date should change in the script; otherwise the content of the archive is expected not to change for the whole archiving period (i.e. homogeneous data without gaps or any other variation is expected).
Example of MARS archiving request for full surface, soil and model level data from HARMONIE deterministic reanalysis (origin=eswi, stream=oper, type=an):
# $date must be parsed with appropriate date before running MARS request below
archive,
    class=ur, database=marsscratch, stream=oper, type=an, levtype=sfc,
    expver=TEST, date=$date, origin=eswi, time=0/6/12/18,
    param=lcc/lsm/msl/skt/sp/orog/mcc/rsn/sd/hcc/10wdir/2t/al/10si/sr/tcw/2r/tcc,
    level=0/2/10, step=0, number=off, expect=72,
    source=an.$date.sfc.grib2
archive,
    class=ur, database=marsscratch, stream=oper, type=an, levtype=sol,
    expver=TEST, date=$date, origin=eswi, time=0/6/12/18,
    param=vsw/sot, level=1/to/3, step=0, number=off, expect=24,
    source=an.$date.sol.grib2
archive,
    class=ur, database=marsscratch, stream=oper, type=an, levtype=ml,
    expver=TEST, date=$date, origin=eswi, time=0/6/12/18,
    param=u/t/v/q, level=1/to/65, step=0, number=off, expect=1040,
    source=an.$date.ml.grib2
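As noted above, only $date changes between runs. A minimal sketch of a daily driver script, assuming a template file archive.req.tpl containing the requests above and GNU date for the date arithmetic (both names are illustrative):

  #!/bin/bash
  # Archive one day at a time for a given origin, stream and levtype.
  set -e
  d=19931201                                # first day to archive (example)
  while [ "$d" -le 19931231 ]; do
      # replace the literal $date placeholder with the current day
      sed "s/\$date/$d/g" archive.req.tpl > archive.req
      mars archive.req                      # all parameters/steps/levels at once
      d=$(date -d "$d + 1 day" +%Y%m%d)     # next day (GNU date)
  done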
There are two types of checking tools which must be run before and after archiving to minimize possible errors in the archive:
UERRA-GRIB2 checking tool (tigge_check)
This tool should be run on all input files, already in GRIB2 UERRA compliant format, before archiving them. It checks all encoding details so that only fully compliant UERRA files following exactly the required definitions will pass. Used with the option -v, it can also check the allowed value range of each parameter.
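As an illustration, a pre-archiving check of one day's files might look as follows (the file names follow the archiving example above; only the -v option mentioned here is assumed):

  # Abort before archiving if any file is not UERRA compliant;
  # -v additionally checks the allowed value range of each parameter.
  for f in an.$date.sfc.grib2 an.$date.sol.grib2 an.$date.ml.grib2; do
      tigge_check -v "$f" || { echo "UERRA compliance check failed: $f" >&2; exit 1; }
  done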
The tigge_check tool can only verify that the encoded GRIB2 keys comply with the expected UERRA definitions, as specified generally for all UERRA datasets and for each particular parameter. Some types of fundamental errors can nevertheless not be revealed by the tool in principle (e.g. incorrect units, which are never encoded in GRIB2 files, or a wrong (reversed) sign of fluxes whose value range spans both positive and negative). The data min/max value checking provided by tigge_check (-v) must be considered only as a helping option with some chance of revealing real data issues. In some cases, e.g. for radiation fluxes, the allowed limits must be very flexible: direct solar radiation in 1-hourly outputs, for example, ranges from 0 to 1e+9 depending on the forecast step. Tuning the allowed limits for numerous parameters from different models on varying domains, as in the UERRA case, is tricky and is an ongoing process. On top of that, some models occasionally produce clearly wrong values during specific weather situations (e.g. grid point storms with wind speeds exceeding 900 m/s). After agreement with the data provider, the tool can accept such unrealistic values as "normal" for the given model. It should be understood that such data will have an impact on users and might still be considered poor output data quality checking.
MARS archive content checking script
Such a script must be run after each archiving to check that only the expected fields were archived successfully (always the same parameters, without any change). The checking script below is based on the MARS list functionality.
Example of MARS list requests checking the content of MARS for COSMO data from 1993-12-31:
list,
    class = ur, stream = oper, type = all, date = 19931231, time = all,
    levtype = all, origin = eswi, expver = test,
    hide = file/length/offset/id/missing/cost/branch/date/hdate/month/year,
    target = tree.out, database = marsscratch, output = tree
list,
    class = ur,
    hide = file/length/offset/id/missing/cost/branch/param/levtype/levelist/expver/type/class/stream/origin/date/time/step/number/hdate/month/year,
    target = cost.out, output = table
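Both list requests can be kept in one file and run in a single MARS call after each archived day (the file name check.req is an assumption):

  > mars check.req    # writes tree.out and cost.out as requested above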
The tree.out content should be the same for all archived dates, which can easily be checked, e.g. with the Unix diff tool, against the reference MARS list output created from the very first properly archived day for the given model.
Example tree.out content:

class=ur,expver=test,levtype=hl,origin=eswi,stream=oper,type=an,time=00:00:00/06:00:00/12:00:00/18:00:00,param=10/130/157/3031/54,levelist=100/15/150/200/250/30/300/400/50/500/75
class=ur,expver=test,levtype=ml,origin=eswi,stream=oper,type=an,param=130/131/132/133,time=00:00:00/06:00:00/12:00:00/18:00:00,levelist=1/10/11/12/13/14/15/16/17/18/19/2/20/21/22/23/24/25/26/27/28/29/3/30/31/32/33/34/35/36/37/38/39/4/40/41/42/43/44/45/46/47/48/49/5/50/51/52/53/54/55/56/57/58/59/6/60/61/62/63/64/65/7/8/9
class=ur,expver=test,levtype=pl,origin=eswi,stream=oper,type=an,time=00:00:00/06:00:00/12:00:00/18:00:00,param=130/131/132/156/157,levelist=10/100/1000/150/20/200/250/30/300/400/50/500/600/70/700/750/800/825/850/875/900/925/950/975
class=ur,expver=test,levtype=sfc,origin=eswi,stream=oper,type=an,time=00:00:00/06:00:00/12:00:00/18:00:00,param=134/136/151/167/172/173/207/228002/228141/228164/235/260242/260260/260509/3073/3074/3075/33
class=ur,expver=test,levtype=sol,origin=eswi,stream=oper,type=an,param=260199/260360,levelist=1/2/3,time=00:00:00/06:00:00/12:00:00/18:00:00
class=ur,expver=test,levtype=hl,origin=eswi,stream=oper,type=fc,param=10/130/157/246/247/3031/54,levelist=100/15/150/200/250/30/300/400/50/500/75
  time=00:00:00/12:00:00,step=1/12/15/18/2/21/24/27/3/30/4/5/6/9
  time=06:00:00/18:00:00,step=1/2/3/4/5/6
class=ur,expver=test,levtype=pl,origin=eswi,stream=oper,type=fc,param=130/131/132/156/157/246/247/260257,levelist=10/100/1000/150/20/200/250/30/300/400/50/500/600/70/700/750/800/825/850/875/900/925/950/975
  time=00:00:00/12:00:00,step=1/12/15/18/2/21/24/27/3/30/4/5/6/9
  time=06:00:00/18:00:00,step=1/2/3/4/5/6
class=ur,expver=test,levtype=sfc,origin=eswi,stream=oper,type=fc
  time=00:00:00/06:00:00/12:00:00/18:00:00,step=1/2/3/4/5/6,param=134/136/146/147/151/167/169/173/174008/175/176/177/201/202/207/228141/228144/228164/228228/235/260242/260259/260260/260264/260430/260509/3073/3074/3075/33/49
  time=00:00:00/12:00:00,step=12/15/18/21/24/27/30/9,param=134/136/151/167/169/175/176/177/201/202/207/228144/228164/228228/235/260242/260259/260260/260264/3073/3074/3075/49
class=ur,expver=test,levtype=sol,origin=eswi,stream=oper,type=fc,param=260199/260360,levelist=1/2/3,time=00:00:00/06:00:00/12:00:00/18:00:00,step=1/2/3/4/5/6
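The daily check can then be a simple diff against the stored reference listing (tree.ref is an assumed name for the reference created from the first properly archived day):

  # Any difference means the archived content changed for this date.
  if ! diff tree.ref tree.out; then
      echo "WARNING: MARS content differs from the reference listing" >&2
  fi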
The cost.out file (output = table) ends with a grand total of all archived fields:

Grand Total:
============
Entries : 13,852
Total   : 8,931,971,037 (8.31855 Gbytes)
The number of archived fields must always be the same. That number can easily be parsed from the above output, for example using Unix grep and sed:
> archived=$(cat cost.out | grep ^Entries | sed s/,//g | sed 's/.*: //')
> echo $archived
13852
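A checking script might then simply compare the parsed count against the constant expected for the given model, e.g. (the value 13852 follows the example above):

  expected=13852
  if [ "$archived" -ne "$expected" ]; then
      echo "WARNING: $archived fields archived, expected $expected" >&2
  fi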