This page is under construction!
The objective:
In this area we focus only on MARS efficiency issues, i.e. investigating and presenting the most efficient way to loop over several requests when retrieving CMA reforecast data.
How the S2S data is organised in general:
The data is organised as a huge tree, with the indentation showing different levels down that tree:
- centre (ECMWF, NCEP, JMA, ...)
  - realtime or reforecast
    - type of data (control forecast or perturbed forecast)
      - type of level (single level or pressure level or potential temperature)
        - dates (2015-01-01 or 2015-01-05 or 2015-01-08, ...)
          - time-steps
            - members (for perturbed forecast)
              - levels (for pl or pt)
                - parameters
The aim is to keep in the same tape file all time-steps, all members and all parameters for a given type of level, type of data and date.
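To make the tape-file rule concrete, here is a minimal sketch that counts how many distinct tape files a set of fields would touch. It assumes, as the sentence above describes, that one tape file holds everything below the date level for a given centre, realtime/reforecast, type of data and type of level. The `Field` tuple and the `tape_file_key` helper are hypothetical; MARS does not expose its tape layout this way.

```python
from typing import NamedTuple


class Field(NamedTuple):
    """One archived field, described by the levels of the tree (hypothetical model)."""
    centre: str    # e.g. "babj" for CMA
    stream: str    # "realtime" or "reforecast"
    kind: str      # MARS "type": "cf" (control) or "pf" (perturbed)
    levtype: str   # "sfc", "pl" or "pt"
    date: str      # e.g. "20150101"
    step: int
    number: int    # ensemble member (0 for cf)
    level: int     # 0 for single-level fields
    param: str


def tape_file_key(f: Field) -> tuple:
    # Assumption: all steps, members, levels and parameters below the date
    # level share one tape file.
    return (f.centre, f.stream, f.kind, f.levtype, f.date)


def tape_files_touched(fields) -> int:
    """Number of distinct tape files a request for `fields` would need."""
    return len({tape_file_key(f) for f in fields})


# z500 for 3 members and 2 steps on two dates touches only 2 tape files.
fields = [
    Field("babj", "reforecast", "pf", "pl", date, step, number, 500, "z")
    for date in ("20150101", "20150105")
    for step in (24, 48)
    for number in (1, 2, 3)
]
print(tape_files_touched(fields))  # 2
```

Adding more steps, members or parameters for the same dates does not increase the count; adding dates or another type of level does.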
What would be the natural way to group requests:
The natural way to group requests would be:
all parameters, all levels, all members, all time-steps for 1 date.
Note the following:
- 'all' means 'all' that the user wants. It doesn't have to be all parameters.
- If a user is interested only in z500, they may request more dates in one go, since the overall request will not be as big.
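The two notes can be combined into a simple packing rule: each request holds whole dates (everything the user wants for that date), and when a per-date request is small, several dates are packed together. A minimal sketch, assuming a hypothetical `max_fields` budget per request; this is not a documented MARS limit, and the field counts in the example are illustrative.

```python
def pack_dates(dates, fields_per_date, max_fields):
    """Split `dates` into consecutive groups, one group per request.

    A date is never split across requests. Dates are added to the current
    group while the group stays within `max_fields`; a single date larger
    than the budget still gets its own request.
    """
    groups, current = [], []
    for d in dates:
        if current and (len(current) + 1) * fields_per_date > max_fields:
            groups.append(current)
            current = []
        current.append(d)
    if current:
        groups.append(current)
    return groups


dates = ["20150101", "20150105", "20150108"]

# z500 only: 1 parameter x 1 level x 3 members x 10 steps = 30 fields per date,
# so two dates fit into a hypothetical budget of 60 fields.
print(pack_dates(dates, 30, 60))
```

With a large per-date request (many parameters and levels) the same rule degrades to one date per request, which is the natural grouping described above.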
What is the most efficient way to loop over several CMA requests?
The main idea in brief:
Taking into consideration what has been presented above, if you need to loop over MARS requests, follow the hierarchy below (outermost loop first):
- date (year and month loop)
  - hindcast date
    - number (EPS only)
      - level
        - parameter (inner loop)
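The hierarchy maps directly onto nested loops. A minimal sketch using `itertools.product`, which advances its last argument fastest, so listing the keywords from outermost to innermost reproduces the order above. The function name and the example values are hypothetical; for control forecasts, pass a single placeholder member such as `(None,)`.

```python
from itertools import product


def loop_order(dates, hdates, numbers, levels, params):
    """Yield (date, hdate, number, level, param) with param varying fastest.

    For control forecasts (no ensemble members) pass numbers=(None,).
    """
    yield from product(dates, hdates, numbers, levels, params)


for item in list(loop_order(["d1"], ["h1", "h2"], [1, 2], [500, 850], ["z", "t"]))[:3]:
    print(item)
```

The first items printed are `('d1', 'h1', 1, 500, 'z')`, `('d1', 'h1', 1, 500, 't')` and `('d1', 'h1', 1, 850, 'z')`: the parameter changes first, then the level, exactly as in the nested list.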
A more practical example: how to request control forecast, pressure levels, for the years 2010-2014 and the months April and June.
The main idea in brief:
- for each year from 2010 to 2014
  - for months April, June
    - for each hindcast date
      - API request
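A minimal Python sketch of this loop, building one request dictionary per hindcast date in the style of the ECMWF Web API (`ecmwfapi`). The keyword values (`origin` `babj` for CMA, `stream` `enfh` for reforecasts, the parameter codes, the steps and the model version `date`) are assumptions for illustration and should be checked against the S2S catalogue. The hindcast days passed in are hypothetical, since the CMA hindcast schedule is not given here.

```python
def cf_pl_requests(years, months, hindcast_days, levels, params, steps, model_date):
    """Build one control-forecast, pressure-level request per hindcast date.

    Loop order follows the outline: year, then month, then hindcast date.
    Each request carries all wanted levels, parameters and steps for its date.
    """
    requests = []
    for year in years:
        for month in months:
            for day in hindcast_days:
                hdate = f"{year}{month:02d}{day:02d}"
                requests.append({
                    "class": "s2",          # assumed S2S keyword values;
                    "dataset": "s2s",       # check against the S2S catalogue
                    "expver": "prod",
                    "origin": "babj",       # CMA
                    "stream": "enfh",       # reforecast (hindcast) stream
                    "type": "cf",
                    "levtype": "pl",
                    "date": model_date,     # model version date (placeholder)
                    "hdate": hdate,
                    "time": "00:00:00",
                    "levelist": "/".join(str(lev) for lev in levels),
                    "param": "/".join(str(p) for p in params),
                    "step": "/".join(str(s) for s in steps),
                    "target": f"cma_cf_pl_{hdate}.grib",
                })
    return requests


# Hypothetical inputs: hindcast days 1 and 15, two levels, two parameter codes.
reqs = cf_pl_requests(range(2010, 2015), (4, 6), (1, 15),
                      levels=(500, 850), params=("130", "156"),
                      steps=(0, 24), model_date="MODEL_VERSION_DATE")

# To submit them (requires ecmwfapi and a configured API key):
# from ecmwfapi import ECMWFDataServer
# server = ECMWFDataServer()
# for r in reqs:
#     server.retrieve(r)
```

Each request stays within one type of data, one type of level and one hindcast date, so it maps onto a single group of tape files.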
The main idea in brief:
- for each year from 2010 to 2014
  - for months April, June
    - for each hindcast date
      - for each level
        - for each parameter
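The second outline adds level and parameter loops inside each hindcast date. A hedged sketch of that finer split, yielding one request per (hindcast date, level, parameter) with the parameter innermost. The shared keywords in `base` and all example inputs are assumptions for illustration, not verified CMA values.

```python
def split_requests(years, months, hindcast_days, levels, params, base):
    """Yield one request per (hindcast date, level, parameter), parameter innermost.

    `base` holds the keywords shared by every request (class, origin, type, ...).
    All requests for one hindcast date are emitted consecutively, so they keep
    hitting the same tape file before the loop moves on to the next date.
    """
    for year in years:
        for month in months:
            for day in hindcast_days:
                hdate = f"{year}{month:02d}{day:02d}"
                for level in levels:
                    for param in params:
                        yield {
                            **base,
                            "hdate": hdate,
                            "levelist": str(level),
                            "param": str(param),
                            "target": f"cma_cf_pl_{hdate}_{level}_{param}.grib",
                        }


# Assumed shared keywords; check them against the S2S catalogue.
base = {"class": "s2", "dataset": "s2s", "origin": "babj", "stream": "enfh",
        "type": "cf", "levtype": "pl", "date": "MODEL_VERSION_DATE"}
reqs = list(split_requests(range(2010, 2015), (4, 6), (1, 15),
                           (500, 850), ("130", "156"), base))
```

This produces many more, smaller requests (80 here instead of 20 for one request per date), so it presumably only pays off when a whole date does not fit comfortably into one request.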
Older notes below:
---------------
The main idea in brief:
- 4 categories of requests:
  - control plevels
  - control sfc
  - ensemble plevels
  - ensemble sfc
- For each category above:
  - For each year from 1994 to 2014
    - For each month from January to December
      - retrieve hindcast dates 1-15 using requests according to data availability*
        - API request 1
        - API request 2
        - API request 3
      - retrieve hindcast dates 15-end of month using requests according to data availability
        - API request 1
        - API request 2
        - API request 3
* for instance:
  - for plevels, different parameters are available on different levels, so Ben has created 3 pl requests
  - for sfc, different parameters are available for different steps, so he has created 3 sfc requests
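Enumerating the old plan shows how many API requests it implies. This sketch only counts requests. The categories, the year and month ranges, and the two half-month ranges come from the notes above; the assumption is that every category needs exactly three requests per half month, as in the pl and sfc examples.

```python
from itertools import product

CATEGORIES = ("control plevels", "control sfc", "ensemble plevels", "ensemble sfc")
YEARS = range(1994, 2015)        # 1994..2014 inclusive
MONTHS = range(1, 13)            # January..December
HALVES = ("1-15", "15-end")      # half-month ranges as written in the notes
SUB_REQUESTS = (1, 2, 3)         # assumed: 3 requests per category, per availability

# product() keeps the nesting of the plan: category outermost, sub-request innermost.
plan = list(product(CATEGORIES, YEARS, MONTHS, HALVES, SUB_REQUESTS))
print(len(plan))  # 6048
```

That is 4 x 21 x 12 x 2 x 3 requests, which is why the ordering of the loops, and not just their content, matters for tape efficiency.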
The objective:
cf and pf are stored separately.
For pf it is not efficient to do:
HDATE = 20040101/20040106/20040111/20040116/20040121/20040126/20040201/20040206/20040211/20040216/20040221/20040226/20040301/20040306/20040311/20040316/20040321/20040326/20040401/20040406/20040411/20040416/20040421/20040426/20040501/20040506/20040511/20040516/20040521/20040526/20040601/20040606/20040611/20040616/20040621/20040626/20040701/20040706/20040711/20040716/20040721/20040726/20040801/20040806/20040811/20040816/20040821/20040826/20040901/20040906/20040911/20040916/20040921/20040926/20041001/20041006/20041011/20041016/20041021/20041026/20041101/20041106/20041111/20041116/20041121/20041126/20041201/20041206/20041211/20041216/20041221/20041226,
NUMBER = 1,
but it is more efficient to group all members in one go for fewer dates:
HDATE = 20040101/20040106/20040111/20040116/20040121/20040126,
NUMBER = 1/to/50
This implies less tape positioning and more contiguous reads.
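The two pf layouts can be compared with a short sketch. It rebuilds the 2004 hindcast dates from the example (days 1, 6, 11, 16, 21 and 26 of each month) and counts "tape file visits", a hypothetical metric that assumes each hindcast date lives in its own tape file, as in the tree at the top of the page. The helper names and the metric are illustrative only.

```python
def hdates_2004():
    """Hindcast dates from the example: days 1, 6, 11, 16, 21, 26 of each month of 2004."""
    return [f"2004{m:02d}{d:02d}" for m in range(1, 13) for d in (1, 6, 11, 16, 21, 26)]


def by_member(hdates, members):
    """Inefficient layout: all hindcast dates in every request, one member per request."""
    return [{"type": "pf", "hdate": "/".join(hdates), "number": str(n)} for n in members]


def by_date_chunk(hdates, chunk, first, last):
    """Preferred layout: all members in one go, a few hindcast dates per request."""
    return [{"type": "pf", "hdate": "/".join(hdates[i:i + chunk]),
             "number": f"{first}/to/{last}"}
            for i in range(0, len(hdates), chunk)]


def tape_file_visits(requests):
    # Assumption: one tape file per hindcast date, visited once per request.
    return sum(len(r["hdate"].split("/")) for r in requests)


h = hdates_2004()
slow = by_member(h, range(1, 51))    # 50 requests, each spanning 72 dates
fast = by_date_chunk(h, 6, 1, 50)    # 12 requests, each spanning 6 dates
print(tape_file_visits(slow), tape_file_visits(fast))  # 3600 72
```

Under this assumption, the member-by-member layout revisits every date's tape file 50 times, while the grouped layout reads each one once.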
We could look at increasing the 10GB limit. We have more hardware and are in a better position to handle bigger chunks.
For the 3 allowed streams, I would extract a different type/levtype in each stream. For instance:
a) 1 stream: pf/sfc
b) 1 stream: pf/pl
c) 1 stream: cf/sfc and, when it finishes, cf/pl
and I would put 2 requests in each stream above. Once a request finishes and starts downloading, the next request kicks in, and in many cases it will find the tape volume still in the tape drive, which saves on average 2 minutes per tape mount.
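A hedged sketch of this scheduling with `concurrent.futures`: requests are split into the three lanes a) pf/sfc, b) pf/pl and c) cf (sfc first, then pl), the lanes run concurrently, and each lane keeps 2 requests in flight so that the next one is already queued while the previous one downloads. `retrieve` is any callable that submits one request, for example a wrapper around `ecmwfapi`'s `ECMWFDataServer().retrieve`; whether one server object can safely be shared between threads is not assumed here, so creating one per call is the cautious choice.

```python
from concurrent.futures import ThreadPoolExecutor


def lane_of(req):
    """Assign a request to one of the 3 lanes: pf/sfc, pf/pl or cf."""
    if req["type"] == "pf":
        return "pf/" + req["levtype"]
    return "cf"


def run_lanes(requests, retrieve, per_lane=2):
    """Run the lanes concurrently, keeping `per_lane` requests in flight per lane."""
    lanes = {}
    for req in requests:
        lanes.setdefault(lane_of(req), []).append(req)
    # cf lane: sfc first, then pl (sort is stable, so order within a levtype is kept).
    if "cf" in lanes:
        lanes["cf"].sort(key=lambda r: r["levtype"] != "sfc")

    def drain(lane_reqs):
        # Tasks are started in submission order; with 2 workers the next
        # request is already waiting while the previous one downloads.
        with ThreadPoolExecutor(max_workers=per_lane) as pool:
            list(pool.map(retrieve, lane_reqs))

    with ThreadPoolExecutor(max_workers=max(len(lanes), 1)) as outer:
        list(outer.map(drain, lanes.values()))
    return lanes


# Usage with a stand-in retrieve that only records what it was given.
done = []
reqs = [
    {"type": "pf", "levtype": "sfc", "target": "a"},
    {"type": "pf", "levtype": "pl", "target": "b"},
    {"type": "cf", "levtype": "pl", "target": "c"},
    {"type": "cf", "levtype": "sfc", "target": "d"},
]
lanes = run_lanes(reqs, done.append)
```

The sketch does not enforce that cf/pl waits for the last cf/sfc request to finish; it only orders the cf lane so that sfc requests are submitted first.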