The global climate projections in the Climate Data Store (CDS) are a quality-controlled subset of the wider CMIP5 archive and represent only a small part of it. A set of 50 core variables, the most widely used of the CMIP5 data, was identified for the CDS. These variables are provided from seven of the most popular CMIP5 experiments.
The CDS subset of CMIP5 data have been through a metadata quality-control procedure that ensures a high standard of reliability. Similar data may be found in the main CMIP5 archive, but those data come with no quality assurance and may have metadata errors or omissions. The quality-control process further reduces the CDS subset of CMIP5 data by excluding data that have metadata errors or inconsistencies. Passing quality control should not be confused with validity: a file can have fully compliant metadata yet contain gross errors in the data that have not been noted. In other words, the quality control is purely technical and does not include any scientific evaluation (for instance, consistency checks on the data themselves).
The CDS subset of CMIP5 data are provided as NetCDF files. NetCDF (Network Common Data Form) is a file format that is freely available and commonly used in the climate modelling community.
NetCDF files are accessible by many programming languages such as Python, R, IDL, C, C++ and Fortran.
A NetCDF file contains:
The metadata provided in NetCDF files adhere to the Climate and Forecast (CF) conventions (v1.4 for CMIP5 data). The rules within the CF conventions ensure consistency across data files, for example consistent variable naming and consistent use of variable units.
The CDS subset of the CMIP5 data have been through a set of quality-control checks before being made available through the CDS. The objective of the quality-control process is to ensure that all files in the CDS meet a minimum standard. Data files were required to pass all stages of the quality-control process before being made available through the CDS. Files that fail are either excluded from the CDS-CMIP5 subset or, where possible, corrected, with a note recorded in the history attribute of the file. The quality control checks for metadata errors or inconsistencies against the Climate and Forecast (CF) Conventions and against a set of CMIP5-specific conventions for file names and global file metadata.
Various software tools have been used to check the metadata of the CDS CMIP5 data:
The data within the files were not individually checked. However, where a variable from a given model was known to have a gross error (e.g. in the sign convention of a flux), those data were also omitted from the CDS-CMIP5 subset.
It is important to note that passing these quality-control tests should not be confused with validity: for example, a file can be fully CF compliant and have fully compliant CMIP5 metadata but still contain gross errors in the data that have not been noted.
For a detailed description of the quality control applied to the data, please see the accompanying documentation.
The models included in the CDS-CMIP5 subset are detailed in the table below; they include most of the models from the main CMIP5 archive. However, a small number of models were not included because their data carry a research-only restriction on use, whereas all data in the CDS are released without restriction. For this reason, the MIROC and MRI models from Japan are not included.
A full list of models is provided here.
For pressure-level data, model output is available on the pressure levels listed in the table below. Because the model output is standardised, all models provide the data on the same pressure levels.
| Frequency | Number of Levels | Pressure Levels (hPa) |
|---|---|---|
| Daily | 8 | 1000, 850, 700, 500, 250, 100, 50, 10 |
| Monthly | 17 | 1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10 |
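Because every model uses the same standard levels, an analysis that combines daily and monthly output can be restricted to the levels shared by both frequencies. The Python sketch below encodes the table and computes that intersection. The names `PRESSURE_LEVELS_HPA`, `common_levels` and `hpa_to_pa` are hypothetical helpers, and the hPa-to-Pa conversion is included only on the assumption that the pressure coordinate inside a file may be stored in Pa; check the coordinate's `units` attribute before relying on it.

```python
# Standard CMIP5 pressure levels (hPa) in the CDS subset, copied from the table above.
PRESSURE_LEVELS_HPA = {
    "daily": (1000.0, 850.0, 700.0, 500.0, 250.0, 100.0, 50.0, 10.0),
    "monthly": (1000.0, 925.0, 850.0, 700.0, 600.0, 500.0, 400.0, 300.0,
                250.0, 200.0, 150.0, 100.0, 70.0, 50.0, 30.0, 20.0, 10.0),
}


def common_levels(*frequencies):
    """Return the levels (hPa) available at every requested frequency, surface first."""
    if not frequencies:
        raise ValueError("at least one frequency is required")
    level_sets = [set(PRESSURE_LEVELS_HPA[f]) for f in frequencies]
    # Sort in descending pressure so the list runs from the surface upwards.
    return sorted(set.intersection(*level_sets), reverse=True)


def hpa_to_pa(level_hpa):
    """Convert a pressure level from hPa to Pa (1 hPa = 100 Pa)."""
    return level_hpa * 100.0


shared = common_levels("daily", "monthly")
print(shared)
print([hpa_to_pa(p) for p in shared])
```

Since the daily levels are a subset of the monthly ones, the shared set is simply the eight daily levels.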
The CDS-CMIP5 subset consists of the following CMIP5 experiments:
Further details can be found on the Earth System Documentation site.
Each modelling centre typically runs the same experiment with the same model several times, to confirm the robustness of results and to inform sensitivity studies by generating statistical information. A model and its collection of runs is referred to as an ensemble. Within these ensembles, three different categories of sensitivity study are performed, and each individual model run is labelled by three integers indexing its variant in each category.
Each member of an ensemble is identified by a triad of integers associated with the letters r, i and p, which index the “realization”, “initialization” and “physics” variations respectively. For instance, members “r1i1p1” and “r1i1p2” of the same model and experiment differ because the physical parameters of the model were changed for the second member relative to the first.
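The r/i/p identifier can be decoded mechanically. The Python sketch below splits a member such as “r1i1p2” into its three integers and reports which indices differ between two members. `EnsembleMember`, `parse_member` and `differing_indices` are hypothetical names used only for illustration.

```python
import re
from typing import List, NamedTuple


class EnsembleMember(NamedTuple):
    realization: int
    initialization: int
    physics: int


# r<int>i<int>p<int>, e.g. "r1i1p2"; integers may have several digits ("r10i1p1").
_MEMBER_RE = re.compile(r"r(\d+)i(\d+)p(\d+)")


def parse_member(member: str) -> EnsembleMember:
    """Split an ensemble member identifier into its r, i and p integers."""
    match = _MEMBER_RE.fullmatch(member)
    if match is None:
        raise ValueError(f"not an r<N>i<N>p<N> identifier: {member!r}")
    return EnsembleMember(*(int(g) for g in match.groups()))


def differing_indices(a: str, b: str) -> List[str]:
    """Name the categories (realization/initialization/physics) in which two members differ."""
    pa, pb = parse_member(a), parse_member(b)
    return [name for name, x, y in zip(EnsembleMember._fields, pa, pb) if x != y]


print(parse_member("r1i1p2"))
print(differing_indices("r1i1p1", "r1i1p2"))
```

For the example in the text, only the physics index differs. Comparing the parsed integers, rather than the raw strings, also keeps multi-digit indices such as “r10i1p1” unambiguous.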
It is very important to distinguish between variations in experiment specifications, which are globally coordinated across all the models contributing to CMIP5, and the variations which are adopted by each modelling team to assess the robustness of their own results. The “p” index refers to the latter, with the result that values have different meanings for different models, but in all cases these variations must be within the constraints imposed by the specifications of the experiment.
For the scenario experiments, the ensemble member identifier is preserved from the historical experiment providing the initial conditions, so RCP 4.5 ensemble member “r1i1p2” is a continuation of historical ensemble member “r1i1p2”.
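Because a scenario member keeps the identifier of the historical member that supplied its initial conditions, members that can be joined into one continuous series can be found by matching identifiers. The Python sketch below does this for two hypothetical lists of available members (for example historical and RCP 4.5 for one model). `continued_members` is an illustrative helper, not part of any CDS tool, and the member lists in the usage line are made up.

```python
import re


def _member_key(member):
    """Sort key: (r, i, p) as integers, so "r10i1p1" sorts after "r2i1p1"."""
    match = re.fullmatch(r"r(\d+)i(\d+)p(\d+)", member)
    if match is None:
        raise ValueError(f"not an r<N>i<N>p<N> identifier: {member!r}")
    return tuple(int(g) for g in match.groups())


def continued_members(historical, scenario):
    """Members present in both experiments; per the CMIP5 convention, each
    scenario member continues the historical member with the same identifier."""
    common = set(historical) & set(scenario)
    return sorted(common, key=_member_key)


# Hypothetical availability for one model.
print(continued_members(["r1i1p1", "r2i1p1", "r10i1p1"],
                        ["r10i1p1", "r1i1p1", "r3i1p1"]))
```

Sorting on the parsed integers avoids the string ordering in which “r10i1p1” would come before “r2i1p1”.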
CMIP5 files downloaded from the CDS follow this naming convention:
<variable>_<cmor_table>_<model>_<experiment>_<ensemble_member>_<temporal_range>.nc
Where
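The naming convention above can be split back into its fields with a short Python sketch. It assumes (this is not stated in the convention itself) that no individual field contains an underscore and that the temporal range has the form `start-end`. The example file name and the names `CMIP5FileName` and `parse_cmip5_filename` are hypothetical.

```python
from typing import NamedTuple


class CMIP5FileName(NamedTuple):
    variable: str
    cmor_table: str
    model: str
    experiment: str
    ensemble_member: str
    start: str  # first time step of <temporal_range>
    end: str    # last time step of <temporal_range>


def parse_cmip5_filename(filename: str) -> CMIP5FileName:
    """Split <variable>_<cmor_table>_<model>_<experiment>_<ensemble_member>_<temporal_range>.nc."""
    if not filename.endswith(".nc"):
        raise ValueError(f"not a NetCDF file name: {filename!r}")
    parts = filename[: -len(".nc")].split("_")
    # Assumption: exactly six underscore-separated fields, none containing "_".
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {filename!r}")
    variable, cmor_table, model, experiment, member, temporal_range = parts
    start, sep, end = temporal_range.partition("-")
    if not sep:
        raise ValueError(f"temporal range is not start-end: {temporal_range!r}")
    return CMIP5FileName(variable, cmor_table, model, experiment, member, start, end)


info = parse_cmip5_filename("tas_Amon_HadGEM2-ES_historical_r1i1p1_185912-188411.nc")
print(info.model, info.ensemble_member, info.start, info.end)
```

Model names such as “HadGEM2-ES” contain hyphens, so the sketch splits only on underscores and only splits the final field on its hyphen.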
A data availability matrix for the C3S CMIP5 subset is available at: https://cp-availability.ceda.ac.uk.
The data availability matrix filter allows users to search the CDS CMIP5 data subset for the availability of one or more variables within one or more experiments simultaneously. By specifying a minimum ensemble size, users restrict the results to models that have at least that number of ensemble members. This lets users determine whether a given combination of variables and experiments is available in enough ensemble members for their scientific analysis. The data availability matrix returns a list showing which models, experiments and ensembles meet all of the selected criteria. The results can be exported in either JSON or CSV format. In the CSV export, the first 11 rows are metadata and row 12 contains the table headers for the results. The JSON export has three main keys: provenance, query and results. Provenance contains metadata, query contains information about the selected parameters that produced the results, and results contains the matching entries.
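The export layouts described above can be read with the Python standard library. The sketch below separates the 11 metadata rows of the CSV export from the results (using row 12 as the header) and pulls the three top-level keys out of the JSON export. The function names are hypothetical, and the sketch assumes nothing about the column names or the internal structure of `results` beyond what is stated above; the sample data in the usage lines are made up.

```python
import csv
import io
import json

METADATA_ROWS = 11  # rows 1-11 of the CSV export are metadata; row 12 is the header


def read_csv_export(text):
    """Return (metadata_rows, records) from the text of a CSV export."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) <= METADATA_ROWS:
        raise ValueError("CSV export has no header row")
    metadata = rows[:METADATA_ROWS]
    header = rows[METADATA_ROWS]
    # Each remaining non-empty row becomes a dict keyed by the header names.
    records = [dict(zip(header, row)) for row in rows[METADATA_ROWS + 1:] if row]
    return metadata, records


def read_json_export(text):
    """Return (provenance, query, results) from the text of a JSON export."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("JSON export is not an object")
    missing = {"provenance", "query", "results"}.difference(doc)
    if missing:
        raise ValueError(f"JSON export is missing keys: {sorted(missing)}")
    return doc["provenance"], doc["query"], doc["results"]


# Made-up sample: 11 metadata rows, a header row, one result row.
sample = "".join(f"meta{i},value\n" for i in range(11)) + "model,experiment\nHadGEM2-ES,historical\n"
meta, records = read_csv_export(sample)
print(len(meta), records)
```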