This page is under construction!
What is the objective of this page?
The objective:
A good understanding of MARS efficiency issues is essential, especially for users who are interested in downloading large amounts of data.
The aim of this page is to help users improve the performance of their MARS requests, focusing on CMA reforecast requests.
How is the S2S data organised in general?
First you need to understand how the S2S data is organised in MARS.
In general it is organised as a huge tree; the indentation below shows the different levels down that tree:
- centre (ECMWF, NCEP, JMA, ...)
  - realtime or reforecast
    - type of data (control forecast or perturbed forecast)
      - type of level (single level or pressure level or potential temperature)
        - dates (2015-01-01 or 2015-01-05 or 2015-01-08, ...)
          - time-steps
            - members (for perturbed forecast)
              - levels (for pl or pt)
                - parameters
The idea is to have, in the same tape file, all time-steps, all members and all parameters for a given type of level, type of data and date.
What would be the natural way to group requests?
Following the previous paragraph, the natural way to group requests would be:
all parameters, all levels, all members, all time-steps for 1 date.
Note the following:
- 'all' means everything that the user wants. It doesn't have to be all available parameters.
- If a user is interested only in z500, they may request more dates in one go, since the overall request will not be so big.
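The grouping rule above can be sketched in Python as a small helper that builds one request per date and joins multi-valued keywords with `/`, as MARS request syntax does. The function name `build_request`, the keyword names and the example values (including the parameter identifier) are illustrative assumptions, not a definitive list of what is available for CMA.

```python
def build_request(date, data_type, levtype, params, steps, levels=None, members=None):
    """Build one MARS-style request (as a dict) covering a single date.

    Hypothetical helper: the keyword names follow common MARS usage, and the
    values passed in are whatever subset the user actually wants.
    """
    def join(values):
        # MARS lists are written as value1/value2/...
        return "/".join(str(v) for v in values)

    request = {
        "type": data_type,   # "cf" (control) or "pf" (perturbed)
        "levtype": levtype,  # e.g. "sfc" or "pl"
        "date": date,
        "step": join(steps),
        "param": join(params),
    }
    if levels:               # only meaningful for pl or pt
        request["levelist"] = join(levels)
    if members:              # only meaningful for perturbed forecasts
        request["number"] = join(members)
    return request


request = build_request(
    date="2015-01-01",
    data_type="cf",
    levtype="pl",
    params=["z"],            # placeholder parameter identifier
    steps=range(0, 49, 24),
    levels=[500],
)
print(request)
```

Everything wanted for the date goes into the one request; a user who needs only a single field could instead pass several dates to a variant of this helper.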
What is the best approach to loop over several dates for a CMA request?
The main idea in brief:
for date in date-list
    your-request (includes the levels, parameters, steps etc)
An example to request the control forecast on pressure levels from 2010-03-01 to 2010-03-31
The main idea in brief:
for each date from 2010-03-01 to 2010-03-31
    your-request
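As a minimal sketch of this date loop in Python, the following builds one request dictionary per date. It assumes every calendar day in the range is wanted; in practice the loop should run only over dates for which data exist. The helper names and the dictionary keys are illustrative, and the requests are only built here, not submitted.

```python
from datetime import date, timedelta


def daily_dates(start, end):
    """Yield every calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def march_2010_requests():
    """Build one control-forecast, pressure-level request per date."""
    built = []
    for day in daily_dates(date(2010, 3, 1), date(2010, 3, 31)):
        built.append({
            "type": "cf",
            "levtype": "pl",
            "date": day.strftime("%Y-%m-%d"),
            # levels, parameters and steps would be added here
        })
    return built


march_requests = march_2010_requests()
print(len(march_requests), march_requests[0]["date"], march_requests[-1]["date"])
```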
What is the best approach to get all days for several years and months?
The main idea in brief:
for year in years (firstly iterate over years)
    for month in months (secondly iterate over months of the year)
        for day in days (thirdly iterate over the days of the month)
            your-request (includes the levels, parameters, steps etc)
An example to request the control forecast, sfc, for years 2010-2014 for 2 months (e.g. April and June)
The main idea in brief:
for each year from 2010 to 2014
    for months April, June
        your-request
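The nested iteration (years, then months, then days) might look like the following in Python. This is a sketch under the assumption that all days of each month are wanted; the function name `dates_for` and the request keys are illustrative.

```python
import calendar


def dates_for(years, months):
    """Return YYYY-MM-DD strings, iterating years, then months, then days."""
    result = []
    for year in years:
        for month in months:
            # monthrange returns (weekday of first day, number of days in month)
            _, days_in_month = calendar.monthrange(year, month)
            for day in range(1, days_in_month + 1):
                result.append(f"{year:04d}-{month:02d}-{day:02d}")
    return result


dates = dates_for(range(2010, 2015), [4, 6])  # April and June, 2010-2014
sfc_requests = [{"type": "cf", "levtype": "sfc", "date": d} for d in dates]
print(len(sfc_requests), dates[0], dates[-1])
```

Keeping the year loop outermost means consecutive requests stay within the same year and month before moving on.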
What is the best approach to get all days for several years and months and for several hindcasts?
The main idea in brief:
for year in years (firstly iterate over years)
    for month in months (secondly iterate over months of the year)
        for day in days (thirdly iterate over the days of the month)
            for hindcast in hindcasts
                your-request (includes the levels, parameters, steps etc)
An example to request the control forecast, sfc, for years 2010-2014 for 2 months (e.g. April and June) for all hindcasts
The main idea in brief:
for each year from 2010 to 2014
    for months April, June
        your-request
An example to request the control forecast, pressure levels, for years 2010-2014 for 2 months (e.g. April and June)
The main idea in brief:
for each year from 2010 to 2014
    for months April, June
        for each hindcast date
            your-request
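A possible Python sketch of the innermost hindcast loop follows. How the loop variables map onto MARS keywords (here `date` and `hdate`) is an assumption of this sketch and should be checked against the S2S documentation for CMA. `hindcasts_for` is a hypothetical callable, and `example_hindcasts` is a made-up rule used only to make the example runnable.

```python
def hindcast_requests(dates, hindcasts_for):
    """Yield one request per (date, hindcast date) pair, hindcasts innermost.

    `hindcasts_for` is a hypothetical callable returning the hindcast dates
    wanted for a given date.
    """
    for d in dates:
        for hdate in hindcasts_for(d):
            yield {"type": "cf", "levtype": "pl", "date": d, "hdate": hdate}


def example_hindcasts(d):
    # Made-up rule for illustration: the same calendar day in two earlier years.
    year, rest = int(d[:4]), d[4:]
    return [f"{year - 2}{rest}", f"{year - 1}{rest}"]


pl_requests = list(hindcast_requests(["2010-04-01", "2010-04-02"], example_hindcasts))
for r in pl_requests:
    print(r["date"], r["hdate"])
```

The outer `dates` list would typically come from the year, month and day loops described earlier.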
old below:
---------------
The objective:
cf and pf are stored separately.
For pf it is not efficient to do:
HDATE = 20040101/20040106/20040111/20040116/20040121/20040126/20040201/20040206/20040211/20040216/20040221/20040226/20040301/20040306/20040311/20040316/20040321/20040326/20040401/20040406/20040411/20040416/20040421/20040426/20040501/20040506/20040511/20040516/20040521/20040526/20040601/20040606/20040611/20040616/20040621/20040626/20040701/20040706/20040711/20040716/20040721/20040726/20040801/20040806/20040811/20040816/20040821/20040826/20040901/20040906/20040911/20040916/20040921/20040926/20041001/20041006/20041011/20041016/20041021/20041026/20041101/20041106/20041111/20041116/20041121/20041126/20041201/20041206/20041211/20041216/20041221/20041226,
NUMBER = 1,
but it is more efficient to group all members in one go for fewer dates:
HDATE = 20040101/20040106/20040111/20040116/20040121/20040126,
NUMBER = 1/to/50
This implies less positioning and more contiguous reads.
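The regrouping can be sketched in Python as follows: the long list of hindcast dates is split into short chunks, and each chunk is requested with all members at once. The chunk size of 6 mirrors the example above; the function names are illustrative, and only the `HDATE`/`NUMBER` part of a request is produced.

```python
def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def member_grouped_requests(hdates, members="1/to/50", dates_per_request=6):
    """Return request fragments with all members and only a few dates each."""
    fragments = []
    for chunk in chunked(hdates, dates_per_request):
        joined = "/".join(chunk)
        fragments.append(f"HDATE = {joined},\nNUMBER = {members}")
    return fragments


hdates = ["20040101", "20040106", "20040111", "20040116", "20040121", "20040126",
          "20040201", "20040206"]
fragments = member_grouped_requests(hdates)
for fragment in fragments:
    print(fragment)
```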
We could look at increasing the 10GB limit. We have more hardware and are in a better position to handle bigger chunks.
For the 3 allowed streams, I would extract different type/levtype in different streams. For instance:
a) 1 stream: pf/sfc
b) 1 stream: pf/pl
c) 1 stream: cf/sfc and, when it finishes, cf/pl
I would also queue 2 requests for each stream above. Once a request finishes and its download starts, the next request will kick in, and in many cases it will find that the tape volume is still in the tape drive, which saves on average 2 minutes for a tape mount.
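One way to plan this split in Python is sketched below. It only builds the per-stream queues and decides which requests to start; actual submission is left out. The stream labels, the tuple layout and the helper names are assumptions made for illustration.

```python
def build_stream_queues(date_chunks):
    """Assign requests to three streams, one type/levtype combination each.

    Each queue entry is a (type, levtype, dates) tuple; the third stream
    handles cf/sfc first and cf/pl once those are done.
    """
    cf_sfc = [("cf", "sfc", chunk) for chunk in date_chunks]
    cf_pl = [("cf", "pl", chunk) for chunk in date_chunks]
    return {
        "a": [("pf", "sfc", chunk) for chunk in date_chunks],
        "b": [("pf", "pl", chunk) for chunk in date_chunks],
        "c": cf_sfc + cf_pl,
    }


def next_batch(queue, active, max_active=2):
    """Pop requests from `queue` until `max_active` are in flight."""
    started = []
    while queue and len(active) + len(started) < max_active:
        started.append(queue.pop(0))
    return started


chunks = [["20040101", "20040106"], ["20040111", "20040116"]]
queues = build_stream_queues(chunks)
first = next_batch(queues["c"], active=[])
print(first)
print(queues["c"])
```

Calling `next_batch` again whenever a request completes keeps two requests queued per stream, so the follow-up request has a chance of reusing the tape that is still mounted.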