The following sections give some information relevant to retrieving data efficiently.

Request scheduling

The main archive implements a queuing system necessary to satisfy all type of requests: time-critical operational tasks, interactive users retrieving a few fields for visualisation, batch users retrieving Gigabytes of data, etc.

Resources available on the main archive impose some limitations on the number of requests processed simultaneously: these resources are basically the number of Unix processes MARS can start and the number of drives available to read data from tapes. The cost of a request is evaluated on arrival in terms of number of fields requested and their location, either on disk or on tape. The more tapes a request has to access, the more likely that it will be queued. It is possible that requests will be queued if the MARS server reaches certain limit of concurrent requests.

The queuing system handles priorities: the request with the highest priority will be chosen next for execution. This priority is computed from the age of a request since it entered the queue. An artificial mechanism of ageing requests allows to prioritise operational work or interactive users. You can view the MARS queue online.

Data collocation

A request is scheduled more efficiently if it minimises the number of tapes it has to access in order to be satisfied. To create efficient MARS requests you must know how the data is organised.

The MARS system is organised in a tree fashion based on MARS keywords. At the bottom of the tree there are what we called the hypercubes or archive objects, that are aimed to be in a single file on tape. There is a compromise between the amount of data these cubes hold and the relationship between the data they contain. By studying data access patterns, we have come up with the following rules:

Different projects have different needs, and therefore these rules may vary. You are encouraged to visit the MARS Catalogue to inspect how much related data a hypercube contains. The description above is the rule, but resources available at certain times might cause to break it, e.g. 1 month of Analysis may be in 2 files because at that particular month the MARS system was short of disk space and data had to be written to tape earlier than desired.

Post-processing

Any requested data manipulation or post-processing is carried out by the MARS client, except in the case of the Web API client where data is first processed at ECMWF prior to its transmission over the network. The post-processing is carried out by MIR, a library of routines for Meteorological Interpolation and Regridding (see ECMWF newsletter no.152 for a description).  For details on the various operations please see also the relevant post-processing keyword descriptions. 

Sub-area extraction

Most of the data at ECMWF is global. A sub-area can be created using the area keyword by defining its latitude and longitude boundaries: North/West/South/East.

Global grids have a grid mesh implicitly based on (0 W, 0 N); the grids do not wrap around at (0 E, 360 W). For example, a 3x3 degree grid has latitudes at 90 N, 87 N,..., 87 S, 90 S and longitudes at 0 E, 3 E,..., 354 W, 357 W. Sub-areas are created from a global grid using the same implicit origin and spacing; if necessary, the sub-area boundaries are adjusted to fit on the grid mesh by enlarging it.

Polar latitudes are made up of repeated grid-point values; a wind V-component has an adjustment for longitude. Gaussian grids do not have a latitude at either Pole or at the Equator.

Sub-area extraction is possible for regular Gaussian and latitude longitude fields (including wave). They cannot be applied if the resulting field is in spherical harmonics or reduced (quasi-regular) representation.

Conversion of horizontal representations

The following conversions (interpolations) of horizontal representations are available:

Derived fields

Some fields are not archived but are derived from others. This is the case for wind components (U and V), which are derived from vorticity and divergence.

Accuracy

GRIB coded fields have a specified number of bits per packed value which can be changed with keyword accuracy. This might be useful when trying to retrieve fields from MARS identical to those you get from dissemination.

There is no gain in precision by using a value higher than the number of bits originally used for archiving.

Truncation before interpolation

By default spectral fields are automatically truncated before interpolation to grid fields to reduce data volumes and spurious aliased values. The truncation can be controlled using the keywords truncation and intgrid. Users wanting to post-process at the full archived resolution can specify truncation = none in the request.

Note that high resolutions might need more resources to carry out post-processing.

Retrieval efficiency

To retrieve data efficiently please follow the hints below:

Please also follow the Guidelines to write efficient MARS requests.

Troubleshooting

Syntactic errors

If a comma is missing or an unknown MARS keyword is specified, MARS will stop processing your request and report a syntax error.

Semantic errors

MARS does not perform any semantic check on the request. A MARS request can be syntactically correct but may not describe any archived data. Common problems are:

System limits

Some system resource limits can be controlled by the user (consult the man pages of your shell if running in interactive mode or documentation about the software used for batch mode).

MARS keyword = all

By using the value all on certain keywords, MARS is asked to retrieve ALL data available which matches the rest of the request. In some cases, all data available is not all the data you expect. The use of all is best avoided.

It should be noted that retrievals from the Fields Data Base do not accept all as a valid value.

Failed HPSS call

Messages starting with Failed HPSS call: n = ::hpss_Read(fd_,buffer,... are HPSS errors. The actual message text can vary but they usually mean that the data is unavailable from tapes for some time. These errors are passed to the client and make your request fail. As a rule, after checking that there is no ongoing system session, re-run your request before reporting the error. If the failure is consistent, please inform ECMWF's Service Desk.

Assertion failed

Error messages starting with Assertion failed: followed by offset[i] > offset[i-1] or handle[n] != 0 hint at an attempt to retrieve the same field more than once, e.g. by specifying param=z/t/z or step=24/48/24. The actual message varies depending on how exactly the data is retrieved. The general advice in such cases is to check the request for multiple occurrences of keyword values.

Debugging

Some failures are not evident to explain from the MARS report. In such cases, you can turn on debug messages by setting environment variable MARS_DEBUG to any value different from 0.

Please, note that running in debug mode can generate large amount of output, therefore you are advised to re-direct MARS output to a file.

Default values

MARS will guess unspecified keywords in a request and will assign default values to them. Some default values are not valid in the context of all possible retrieve requests. Avoid their use as far as possible.

Set of working requests

It is advisable to have a set of working requests and to re-use or modify them as needed.

Report

At the end of each request, MARS will print a report on all the aspects of its execution:

This report gives users an idea about the resources needed for a request, and can be used for future reference when retrieving similar datasets.

MARS messages

MARS has the following levels for messages it prints on execution:

INFOrequest being processed and a report on the execution at the end
DEBUGadditional information if debugging is switched on
WARNINGany unusual aspect of the execution
ERRORsystem or data errors which do not stop MARS execution
FATALterminates the execution of MARS

Field order

Do not expect retrieved fields to be returned in any specific order. Depending on the MARS configuration, fields can be retrieved differently. Therefore, user programs processing the target file must take this into account.

It is possible to obtain always the same sequence, i.e. order of fields in the request, by using fieldsets.

Avoid sub-archives

Avoid creating sub-archives in file based storage systems, e.g. ECMWF's ECFS. This system does not provide data collocation, and a future access to the same data will usually be slower than retrieving it again from MARS. Unless data has been post-processed, do not store Gigabytes of MARS data in ECFS.