Introduction
(include an explanation about previous conventions.... SPECS, CF, ACDD)
Encoding Guide
Global attributes
The following properties are intended to provide information about where the data came from and what has been done to it. This information is mainly for the benefit of human readers and data discovery mechanisms. The attribute values are all character strings. When an attribute appears both globally and as a variable attribute, the variable’s version has precedence.
Attribute Name | Value | Examples | Comment |
---|---|---|---|
Conventions | CF convention string [Other convention] :... | "CF-1.6" "CF-1.6 C3S-0.1" | Multiple conventions may be included (separated by blank spaces) |
title | A controlled vocabulary will be provided CF: Free text ACDD (highly recommended) | "IPSL-CM5A-LR model output prepared for CMIP5 RCP4.5" | A short phrase or sentence describing the dataset. In many discovery systems, the title will be displayed in the results list from a search, and therefore should be human readable and reasonable to display in a list of such names |
references | URIs (such as a URL or DOI) for papers or other references. A valid doi is recommended CF: Free text | "doi:10.5194/gmd-8-1509-2015" | Published or web-based references that describe the data or methods used to produce it. |
source | A methodology to build this attribute will be provided | | The method of production of the original data. If it was model-generated, source should name the model and its version, as specifically as could be useful. |
institution | A controlled vocabulary will be provided CF: Free text | "Met Office" | Specifies where the original data was produced. The name of the institution principally responsible for originating this data. |
contact | Copernicus User Support URI should be used CF: Free text | "http://copernicus-support.ecmwf.int" | |
project | "C3S Seasonal Forecast" should be used CF: Free text | "C3S Seasonal Forecast" | |
creation_date | SPECS: YYYY-MM-DDThh:mm:ss<zone> ISO 8601:2004 extended format | "2011-06-24T02:53:46Z" | The date on which this version of the data was created. Modification of values implies a new version, hence this would be assigned the date of the most recent values modification. Metadata changes are not considered when assigning the creation_date NOTE: The ACDD 1.3 names this attribute as |
comment | Free text | | Miscellaneous information about the data, not captured elsewhere. |
forecast_type | "forecast" or "hindcast" | "forecast" | To identify the type of data |
history | Each line should begin with a timestamp indicating the date and time of day when the program was executed CF: Free Text | | To record relevant information, such as the command history which led to this file being produced. Provides an audit trail for modifications to the original data. |
commit, iso_lineage or lineage | Free text (ISO Lineage model 19115-2) | "Produced using CDS Toolbox v1.0" | Trace of the tools/scripts used. Paco: include information about the versioning of the software used to create the data. Antonio S. Cofino Gonzalez: We need more implementation examples for this. This could be achieved in the EQC WP, where metadata is part of their activities (i.e. WP4@QA4SEAS). ISO 19115-2 defines a lineage model where this is considered. TBD. |
summary | The content will be provided ACDD (highly recommended): Text, defined phrase | | A short paragraph describing the dataset |
keywords | The content will be provided ACDD (highly recommended): text, controlled vocabulary | | A comma separated list of key words and phrases. |
forecast_reference_time | SPECS: YYYY-MM-DDThh:mm:ssZ NOTE: This is ISO 8601:2004 extended format, but time zone is required to be UTC | "2011-06-01T00:00:00Z" | time of the analysis from which the forecast was made |
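A few of the constraints in the table above are mechanical enough to check automatically. The following is a minimal sketch, assuming the global attributes have already been read into a plain dictionary; the function names `parse_utc_timestamp` and `check_global_attributes` are hypothetical and not part of any convention or library.

```python
from datetime import datetime, timezone

# Layout from the table: YYYY-MM-DDThh:mm:ssZ (time zone required to be UTC).
UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc_timestamp(value):
    """Parse a 'YYYY-MM-DDThh:mm:ssZ' string into an aware UTC datetime.

    Raises ValueError if the string does not match the layout.
    """
    return datetime.strptime(value, UTC_FORMAT).replace(tzinfo=timezone.utc)


def check_global_attributes(attrs):
    """Return a list of problems found in `attrs` (a dict of global attributes)."""
    problems = []
    # Conventions is a blank-separated list; one entry should name a CF version.
    conventions = attrs.get("Conventions", "").split()
    if not any(entry.startswith("CF-") for entry in conventions):
        problems.append("Conventions does not name a CF version")
    if attrs.get("forecast_type") not in ("forecast", "hindcast"):
        problems.append("forecast_type must be 'forecast' or 'hindcast'")
    try:
        parse_utc_timestamp(attrs.get("forecast_reference_time", ""))
    except ValueError:
        problems.append("forecast_reference_time is not YYYY-MM-DDThh:mm:ssZ")
    return problems
```

Only the attributes with a fully specified value are checked here; attributes whose controlled vocabulary is still to be provided (title, institution, source) are deliberately left out.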
Spatial Coordinates
Type (CMIP5) | Coordinate Name (CMIP5) | Dimension Names (CMIP5) | Axis | standard_name | long_name (CMIP5) | units (CF canonical units) | positive | valid_min (CMIP5) | valid_max (CMIP5) | Notes |
---|---|---|---|---|---|---|---|---|---|---|
double | lat | lat | Y | latitude | latitude | degrees_north | N/A | -90. | 90. | Bounds required [-90. , -89. , ..., 0., ... 90.] |
double | lon | lon | X | longitude | longitude | degrees_east | N/A | 0. | 360. | Bounds required Values (1x1deg grid) prescribed: dimension lon=360 [0. , 1. , ..., 358., 359.] |
double | plev | plev | Z | air_pressure | pressure | Pa | down | N/A | N/A | This is also referred to as isobaric level by some tools [925., 850., 700., 500., 400., 300., 200., 100., 50., 30., 10.] (NOTE: in hPa) |
double | depth | depth | Z | depth | depth | m | down | N/A | N/A | Only used for soil model levels NOTE: Number and depth of levels is not prescribed by C3S |
double | height | height | Z | height | height | m | up or down | CMIP5: 2mtemp: 1. | CMIP5: 2mtemp: 10. | Used for single level fields (height, soil,SST) e.g. 2 m (for Temperature) |
C3S: string | realization | C3S: realization_dim CF: a different name is needed for dim/variable | E | realization | realization | 1 | N/A | N/A | N/A | Members are not a physical quantity. Realization is a discrete coordinate and the members are its categorical values (ordered or non-ordered). |
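The plev row gives its units as Pa but lists the level values in hPa. As a small illustration of the conversion a provider would need before writing the coordinate, here is a sketch; the names `PLEV_HPA` and `hpa_to_pa` are hypothetical.

```python
# Pressure levels exactly as listed in the table above, in hPa.
PLEV_HPA = [925., 850., 700., 500., 400., 300., 200., 100., 50., 30., 10.]


def hpa_to_pa(levels_hpa):
    """Convert pressure levels from hPa to Pa (1 hPa = 100 Pa)."""
    return [level * 100.0 for level in levels_hpa]


# Values to store in the plev coordinate variable (units = "Pa").
plev_pa = hpa_to_pa(PLEV_HPA)
```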
Time Coordinates
Coordinate Name | Dimension Names | Axis | standard_name | long_name (SPECS) | calendar | units | positive | Notes |
---|---|---|---|---|---|---|---|---|
leadtime | time | N/A | forecast_period | "Time elapsed since the start of the forecast" | N/A | SPECS: days | N/A | The interval of time between the forecast reference time and the valid time |
time | time | T | time | "Verification time of the forecast" | standard | SPECS: "days since 1850-01-01" C3S: requested units can be relaxed to equivalent time units | N/A | Time for which the forecast is valid |
NOTE: about forecast_reference_time as a global attribute (not generic, but SPECS use, as it will have one variable/start time per file)
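The relationship between the two time coordinates and the forecast reference time can be sketched as follows. This assumes the SPECS units from the table (days, and "days since 1850-01-01") and relies on Python's proleptic Gregorian `datetime`, which agrees with the standard calendar for dates after 1850. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Origin taken from the units "days since 1850-01-01" in the table above.
TIME_ORIGIN = datetime(1850, 1, 1)
ONE_DAY = timedelta(days=1)


def days_since_origin(when):
    """Encode a datetime as fractional days since TIME_ORIGIN."""
    return (when - TIME_ORIGIN) / ONE_DAY


def time_coordinates(reference_time, valid_times):
    """Return (time, leadtime) value lists, both expressed in days.

    time     : valid time of each step, relative to TIME_ORIGIN
    leadtime : valid time minus the forecast reference time
    """
    time = [days_since_origin(t) for t in valid_times]
    leadtime = [(t - reference_time) / ONE_DAY for t in valid_times]
    return time, leadtime
```

For a forecast started at 2011-06-01T00:00:00Z, a step valid at 06:00 on the same day has a leadtime of 0.25 days.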
Cell boundaries
As described in section 7.1 Cell Boundaries of CF convention.
To represent cells we add the attribute bounds to the appropriate coordinate variable(s). The value of bounds is the name of the variable that contains the vertices of the cell boundaries. We refer to this type of variable as a "boundary variable." A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable. The additional dimension should be the most rapidly varying one, and its size is the maximum number of cell vertices. Since a boundary variable is considered to be part of a coordinate variable’s metadata, it is not necessary to provide it with attributes such as long_name and units.
Bounds Name | Dimensions | Comments |
---|---|---|
time_bounds | time,bounds | e.g. [0,24]; is that convention always valid? |
lat_bounds | lat, bounds | Values (1x1deg grid) prescribed: [-90., -89.], [-89., -88.], ... [89., 90.] |
lon_bounds | lon, bounds | Values (1x1deg grid) prescribed: [0., 1.], [1., 2.], ... [359., 360.] |
depth_bounds | depth,bounds | Should define the full vertical extent of the soil model layers |
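The prescribed lat_bounds and lon_bounds values can be generated by pairing consecutive cell edges. The sketch below assumes the edges sit on whole degrees, as the table suggests; the helper name `cell_bounds` is hypothetical, and the placement of the cell centres is not addressed here.

```python
def cell_bounds(edges):
    """Pair consecutive cell edges into [lower, upper] bounds.

    N + 1 edges give N cells, so the result has shape (N, 2): the extra
    dimension of size 2 is the most rapidly varying one.
    """
    return [[lower, upper] for lower, upper in zip(edges[:-1], edges[1:])]


lat_edges = [float(v) for v in range(-90, 91)]  # 181 edges -> 180 cells
lon_edges = [float(v) for v in range(0, 361)]   # 361 edges -> 360 cells

lat_bounds = cell_bounds(lat_edges)
lon_bounds = cell_bounds(lon_edges)
```

Because each cell's upper bound is the next cell's lower bound, the resulting cells are contiguous and cover the whole axis without gaps or overlaps.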
NOTE on bounds (time_bounds etc.) in the context of SPECS (i.e. one variable per file): variables with different time steps need different time variables, and hence different time_bounds variables. This would be a problem when merging data fields into the same file, but we are safe here because of the SPECS "one variable per file" rule.
Variables
NOTE: coordinates should list first of all the auxiliary coordinate(s) and then all the other coordinates
Static Fields
attributes | ||||||||
name (CMIP5) | dimensions | standard_name | long_name (CMIP5) | units | coordinates | cell_methods | grid_mapping | NOTES |
---|---|---|---|---|---|---|---|---|
sftlf | lat,lon | land_area_fraction | "Land Area Fraction" | 1 | "lat lon" | | | |
orog | lat,lon | surface_altitude | "Surface Altitude" | m | "lat lon" |
Surface Fields (defined at a given height level)
attributes | ||||||||
name (CMIP5) | dimensions | standard_name | long_name (CMIP5) | units | coordinates | cell_methods | grid_mapping | NOTES |
---|---|---|---|---|---|---|---|---|
tas | time,lat,lon | air_temperature | "Near-Surface Air Temperature" | K | "height time lat lon" | "time: point" | latitude_longitude what value has this variable? | height is usually 2m |
tasmax | time,lat,lon | air_temperature | "Daily Maximum Near-Surface Air Temperature" | K | "height time lat lon" | "time: maximum (interval: <value> <unit>)" C3S: required. CF: interval is optional | height is usually 2m (C3S: the interval is required to have a value <= 3 hours) | |
tasmin | time,lat,lon | air_temperature | "Daily Minimum Near-Surface Air Temperature" | K | "height time lat lon" | "time: minimum (interval: <value> <unit>)" C3S: required. CF: interval is optional | height is usually 2m (C3S: the interval is required to have a value <= 3 hours) | |
 | time,lat,lon | dew_point_temperature | | K | "height time lat lon" | "time: point" C3S: required CF: recommended | height is usually 2m | |
uas | time,lat,lon | x_wind | "Eastward Near-Surface Wind" | m s-1 | "height time lat lon" | "time: point" C3S: required CF: recommended | height is usually 10m | |
vas | time,lat,lon | y_wind | "Northward Near-Surface Wind" | m s-1 | "height time lat lon" | "time: point" C3S: required CF: recommended | height is usually 10m | |
 | time,lat,lon | wind_speed_of_gust | | m s-1 | "height time lat lon" | "time: maximum (interval: <value> <unit>)" C3S: required. CF: interval is optional | height is usually 10m (C3S: the interval is required to have a value <= 3 hours) | |
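The C3S requirement on the cell_methods interval (a value of at most 3 hours for maxima, minima and gusts) can be checked with a small parser. This is only a sketch: it handles the single-method form "time: <method> (interval: <value> <unit>)" shown in the table, the accepted unit spellings are an assumption, and the function names are hypothetical.

```python
import re

# Matches e.g. "time: maximum (interval: 3 hours)".
_CELL_METHOD = re.compile(
    r"^time:\s*(?P<method>\w+)\s*"
    r"\(interval:\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\w+)\)$"
)

# Unit spellings accepted by this sketch (an assumption, not a full unit parser).
_SECONDS_PER_UNIT = {
    "s": 1, "second": 1, "seconds": 1,
    "minute": 60, "minutes": 60,
    "h": 3600, "hour": 3600, "hours": 3600,
}


def interval_seconds(cell_methods):
    """Return the interval in seconds, or None if it is absent or unparseable."""
    match = _CELL_METHOD.match(cell_methods.strip())
    if match is None or match.group("unit") not in _SECONDS_PER_UNIT:
        return None
    return float(match.group("value")) * _SECONDS_PER_UNIT[match.group("unit")]


def satisfies_c3s_interval(cell_methods, limit_hours=3):
    """True if an interval is present and does not exceed `limit_hours`."""
    seconds = interval_seconds(cell_methods)
    return seconds is not None and seconds <= limit_hours * 3600
```

A string without an interval, which CF allows, is reported as not satisfying the C3S rule because C3S makes the interval mandatory.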
Surface Fields (not defined at a height level)
Finalised | Priority (i.e. should be defined first for MARS) | requested variables | Our Convention (in netcdf files) | |||||||
step | Parameter Identifier (as used in ITT) | Originating Centre | name | standard_name | units (as used in ITT) | Cell Methods | time_bounds | comments | ||
---|---|---|---|---|---|---|---|---|---|---|
N | 1 | 6 h inst | 151 | 98 | mean sea level pressure | air_pressure_at_sea_level | Pa | intervals must represent 6 hours | ||
N | 2 | 6 h inst | 164 | 98 | total cloud cover | cloud_area_fraction_assuming_maximum_random_overlap | 1 | intervals must represent 6 hours | ||
N | 2 | 6 h inst | 235 | 98 | skin temperature | surface_temperature | K | intervals must represent 6 hours | "skin_temperature" doesn't exist as a CF standard_name, so maybe the required one should be "surface_temperature" | |
N | 2 | 24 h inst | 31 | 98 | sea-ice cover | sea_ice_area_fraction | 1 | Intervals must represent 24 hours starting at 0Z (to be agreed) | ||
N | 1 | 24 h inst | 34 | 98 | sea surface temperature | open_sea_surface_temperature | K | Intervals must represent 24 hours starting at 0Z (to be agreed) | To cope with the fact that some providers send instantaneous 00UTC values and some others daily averages, it was agreed as a compromise to request 6h instantaneous SST values (so the value at 00h would be the same for everyone, and a daily average to account for the diurnal cycle could be obtained from the 6h values) | |
N | 2 | 24 h inst | 141 | 98 | snow depth (water equivalent) | lwe_thickness_of_surface_snow_amount | m | Intervals must represent 24 hours starting at 0Z (to be agreed) | Note it is snow amount, not snowfall amount. | |
N | 2 | 24 h inst | 33 | 98 | snow density | snow_density | kg m-3 | Intervals must represent 24 hours starting at 0Z (to be agreed) | A check is needed whether this should be an average or an instantaneous value | |
N | 2 | 24 h inst | 243 | 98 | forecast albedo | surface_albedo | 1 | Intervals must represent 24 hours starting at 0Z (to be agreed) | don't know how the cell_methods could be coded for this variable if it is obtained from ratios of daily accumulations of shortwave radiation. |
Soil Level Fields
Finalised | Priority (i.e. should be defined first for MARS) | requested variables | Our Convention (in netcdf files) | ||||||||
step | Parameter Identifier (as used in ITT) | Originating Centre | name | standard_name | units (as used in ITT) | Cell Methods | time_bounds | soil model layer(level) number | comments | ||
---|---|---|---|---|---|---|---|---|---|---|---|
N | 1 | 24 h inst | 39 | 98 | volum. soil moisture layer 1 | moisture_content_of_soil_layer | m3 m-3 | intervals must represent 24 hours starting at 0Z (to be agreed) | scalar value=1 | The number of soil levels shouldn't be prescribed (they will likely differ from model to model) and the vertical coordinate for them should be the height -need bounds- of each layer. In addition, there was an issue with the units, Anca should have the final conclusion about that. | |
N | 1 | 24 h inst | 40 | 98 | volum. soil moisture layer 2 | moisture_content_of_soil_layer | m3 m-3 | intervals must represent 24 hours starting at 0Z (to be agreed) | scalar value=2 | see above | |
N | 2 | 24 h inst | 41 | 98 | volum. soil moisture layer 3 | moisture_content_of_soil_layer | m3 m-3 | intervals must represent 24 hours starting at 0Z (to be agreed) | scalar value=3 | see above | |
N | 2 | 24 h inst | 42 | 98 | volum. soil moisture layer 4 | moisture_content_of_soil_layer | m3 m-3 | intervals must represent 24 hours starting at 0Z (to be agreed) | scalar value=4 | see above | |
N | 2 | 24 h inst | 139 | 98 | soil temperature level 1 | soil_temperature | K | intervals must represent 24 hours starting at 0Z (to be agreed) | scalar value=1 |
Accumulation Fields
Finalised | Priority (i.e. should be defined first for MARS) | requested variables | Our Convention (in netcdf files) | ||||||||
step | Parameter Identifier (as used in ITT) | ParamID | Originating Centre | name | standard_name | units (as used in ITT) | Cell Methods | time_bounds | comments | ||
---|---|---|---|---|---|---|---|---|---|---|---|
N | 1 | 24 h | 228 | 98 | total precipitation | precipitation_amount | m | time: sum (interval: 1hour) | intervals must represent 24 hours starting at 0Z (to be agreed) | is the "interval" needed in cell_methods with "time: sum"? | |
N | 2 | 24 h | 144 | 98 | snowfall | lwe_thickness_of_snowfall_amount | m | time: sum (interval: 1hour) | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 146 | 98 | surface sensible heat flux | surface_upward_sensible_heat_flux -may request integral_of_XXXXX_wrt_time instead (diff units) | J m-2 | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | If we are not going to request accumulations since the beginning of the forecast, maybe it is more natural for the providers to send daily averaged values (which affects the standard name -integral_of_XXXXX_wrt_time- and hence the units) | |
N | 2 | 24 h | 147 | 98 | surface latent heat flux | surface_upward_latent_heat_flux -may request integral_of_XXXXX_wrt_time instead (diff units) | J m-2 | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 169 | 98 | surface solar radiation downwards | surface_downwelling_shortwave_flux_in_air -may request integral_of_XXXXX_wrt_time instead (diff units) | J m-2 | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 175 | 98 | surface thermal radiation downwards | surface_downwelling_longwave_flux -may request integral_of_XXXXX_wrt_time instead (diff units) | J m-2 | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 176 | 98 | surface net solar radiation | surface_net_downward_shortwave_flux -may request integral_of_XXXXX_wrt_time instead (diff units) | J m-2 | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 177 | 98 | surface net thermal radiation | surface_net_downward_longwave_flux -may request integral_of_XXXXX_wrt_time instead (diff units) | J m-2 | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 178 | 98 | top solar radiation | toa_incoming_shortwave_flux -may request integral_of_XXXXX_wrt_time instead (diff units) | J m-2 | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 179 | 98 | top thermal radiation | toa_outgoing_longwave_flux -may request integral_of_XXXXX_wrt_time instead (diff units) | J m-2 | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 180 | 98 | east-west surface stress | surface_downward_eastward_stress -may request integral_of_XXXXX_wrt_time instead (diff units) | (N m-2) s | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 181 | 98 | north-south surface stress | surface_downward_northward_stress -may request integral_of_XXXXX_wrt_time instead (diff units) | (N m-2) s | TBD | intervals must represent 24 hours starting at 0Z (to be agreed) | as above | |
N | 2 | 24 h | 182 | 98 | evaporation | lwe_thickness_of_water_evaporation_amount | m | time: sum (interval: 1hour) | intervals must represent 24 hours starting at 0Z (to be agreed) | ||
N | 2 | 24 h | 205 | 98 | runoff | runoff_amount | m | time: sum (interval: 1hour) | intervals must represent 24 hours starting at 0Z (to be agreed) | ||
N | 2 | 24 h | 8 | 98 | surface runoff | surface_runoff_amount | m | time: sum (interval: 1hour) | intervals must represent 24 hours starting at 0Z (to be agreed) | ||
N | 2 | 24 h | 9 | 98 | sub-surface runoff | subsurface_runoff_amount | m | time: sum (interval: 1hour) | intervals must represent 24 hours starting at 0Z (to be agreed) |
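The open question in the comments above (accumulations in J m-2 versus daily averaged fluxes) comes down to a division by the length of the accumulation interval, since 1 W = 1 J s-1. A minimal sketch, assuming a 24 hour interval as in the table; the function names are hypothetical:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 s in the 24 h accumulation interval


def mean_flux_from_accumulation(accumulated_j_m2, interval_seconds=SECONDS_PER_DAY):
    """Convert a time-integrated flux (J m-2) into a mean flux (W m-2)."""
    return accumulated_j_m2 / interval_seconds


def accumulation_from_mean_flux(mean_w_m2, interval_seconds=SECONDS_PER_DAY):
    """Convert a mean flux (W m-2) into its time integral (J m-2)."""
    return mean_w_m2 * interval_seconds
```

For example, a daily accumulation of 8 640 000 J m-2 corresponds to a daily mean flux of 100 W m-2. The same factor applies to the surface stress rows, where (N m-2) s becomes N m-2.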
Pressure Level Fields
Finalised | Priority (i.e. should be defined first for MARS) | requested variables | Our Convention (in netcdf files) | |||||||
step | Parameter Identifier (as used in ITT) | Originating Centre | name | standard_name | units (as used in ITT) | Cell Methods | time_bounds | comments | ||
---|---|---|---|---|---|---|---|---|---|---|
N | 1 | 12 h inst | 129 | 98 | geopotential | geopotential | m2/s2 | intervals must represent 12 hours | Alternative is "geopotential_height" in m | |
N | 2 | 12 h inst | 130 | 98 | temperature | air_temperature | K | intervals must represent 12 hours | ||
N | 2 | 12 h inst | 133 | 98 | specific humidity | specific_humidity | 1 | intervals must represent 12 hours | ||
N | 2 | 12 h inst | 131 | 98 | U component of wind | x_wind | m/s | intervals must represent 12 hours | ||
N | 2 | 12 h inst | 132 | 98 | V component of wind | y_wind | m/s | intervals must represent 12 hours |
Additional Questions to be addressed
Question | Discussion | Decision |
---|---|---|
File format to be used? | Francisco Doblas-Reyes NetCDF4? With or without compression? Kevin Marsh netCDF4 classic model (with deflate =6 suggested by Pierre-Antoine) | |
File naming, | Kevin Marsh Pierre-Antoine Bretonniere proposed follow SPECS convention | |
forecast/hindcast matching and labelling | ||
File size recommendation (maximum size)? | Kevin Marsh Pierre-Antoine Bretonniere suggested 4GB recommended maximum size | Kevin Marsh recommend 4GB Max Size for data files |
Versioning of data files? | ||
DOI | Kevin Marsh DOI likely to be assigned at dataset level | Kevin Marsh DOI likely to be assigned at dataset level |
Variable short names to be specified? | Kevin Marsh Antonio S. Cofino Gonzalez suggested follow cmip5 short names | Kevin Marsh follow cmip5 short names |
Coordinate short names to be specified? | Kevin Marsh Antonio S. Cofino Gonzalez suggested follow cmip5 coordinate short names | Kevin Marsh follow cmip5 coordinate short names |
Extension to include ocean data for C3S? | Kevin Marsh yes, but not in the initial convention release | Kevin Marsh Not considered in initial release |
Grids, resolution etc to be specified? | Kevin Marsh Antonio S. Cofino Gonzalez agreed 1 degree grid specified with valid max/min, but actual grid points not specified | Kevin Marsh 1 degree grid specified with valid max/min, but actual grid points not specified |
MARS attributes to be specified? | Kevin Marsh These will be added by C3S, rather than data provider | Kevin Marsh These will be added by C3S |
standard name request/assignment process? | Kevin Marsh requested via standard name mailing list. Note that this process can take some considerable time. | Kevin Marsh requested via standard name mailing list |
Discussion about time coordinates
NOTE: The SPECS approach (2 1D time coordinates) has been chosen for the "providers" convention
The encoding of multiple time coordinates requires particular consideration. An explicit example of the structure is given below.
Example of encoding data with multiple time axes
double forecast_reference_time(forecast_reference_time) ;
    forecast_reference_time:bounds = "forecast_reference_time_bnds" ;
    forecast_reference_time:units = "hours since 1970-01-01 00:00:00" ;
    forecast_reference_time:standard_name = "forecast_reference_time" ;
    forecast_reference_time:calendar = "gregorian" ;
double leadtime(leadtime) ;
    leadtime:bounds = "leadtime_bnds" ;
    leadtime:units = "hours" ;
    leadtime:standard_name = "forecast_period" ;
    leadtime:calendar = "gregorian" ;
double time(forecast_reference_time,leadtime) ;
    time:axis = "T" ;
    time:bounds = "time_bnds" ;
    time:units = "hours since 1970-01-01 00:00:00" ;
    time:standard_name = "time" ;
float temp(forecast_reference_time,leadtime,pressure,latitude,longitude) ;
    temp:units = "K" ;
    temp:standard_name = "air_temperature" ;
    temp:coordinates = "time" ;
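In the structure above, time is a two-dimensional auxiliary coordinate whose values follow from the other two: each valid time is the forecast reference time plus the lead time. A minimal sketch of filling it, assuming all values are already expressed in hours as in the example; the function name `valid_time_grid` and the sample values are illustrative only:

```python
def valid_time_grid(reference_times, leadtimes):
    """Return time[i][j] = reference_times[i] + leadtimes[j], all in hours.

    The result has shape (forecast_reference_time, leadtime), matching the
    dimensions of the `time` variable in the example.
    """
    return [[ref + lead for lead in leadtimes] for ref in reference_times]


# Two start dates 24 h apart, three forecast steps each (illustrative values).
reference_times = [0.0, 24.0]
leadtimes = [6.0, 12.0, 18.0]
time = valid_time_grid(reference_times, leadtimes)
```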
Francisco Doblas-Reyes: I interpret this as the time coordinates being a hypercube, where there could be missing data; this won't be consistent with the CMIP files; I still find this confusing unless a discussion about what to do with the missing data is undertaken.
Eduardo Penabad: Wouldn't that be solved by clarifying that different variables within the same file could potentially have different time coordinates/dimensions?
Francisco Doblas-Reyes: Not sure. If, to simplify, you assume only one variable, and this variable has data in one file for two start dates, one with three forecast time steps and the other with only two, the time dimensions will be forecast_reference_time=2, leadtime=3, but one of the values of temp() will be missing, unless I haven't understood the model.
Antonio S. Cofino Gonzalez: discussion on multi-time dimension data
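The scenario raised in the discussion (two start dates, one with three forecast steps and one with only two) can be made concrete with a small sketch. It assumes the data are stored on the full (forecast_reference_time, leadtime) hypercube and that absent steps are padded; `FILL` stands in for whatever fill value a file would declare, and the function name and temperature values are purely illustrative.

```python
FILL = None  # stand-in for a netCDF fill value (assumption for illustration)


def pad_to_hypercube(series_by_start):
    """Pad each start date's series to the longest leadtime length with FILL."""
    n_lead = max(len(series) for series in series_by_start)
    return [
        list(series) + [FILL] * (n_lead - len(series))
        for series in series_by_start
    ]


# Start date 1 has three forecast steps, start date 2 only two.
temp = pad_to_hypercube([[280.1, 280.4, 280.9], [279.8, 280.2]])
```

The result has dimensions forecast_reference_time=2, leadtime=3, and the last element of the second row is the missing value the discussion refers to.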