...
The sixth phase of the Coupled Model Intercomparison Project (CMIP6) consists of 134 models from 53 modelling centres (Durack, 2020). CMIP6 data publication began in 2019 and the majority of the data publication was completed by 2022. The scientific analyses from CMIP6 will be used extensively in the Intergovernmental Panel on Climate Change (IPCC) 6th Assessment Report (AR6), due for release in 2021/22 (IPCC, 2020).
...
The CDS subset of CMIP6 data has been through a quality control procedure which ensures a high standard of dependability of the data. Additional data can be found in the main CMIP6 ESGF archive; however, these data come with very limited quality assurance and may have metadata errors or omissions.
...
Models, grids, calendars, and pressure levels
Models
The models included in the CDS-CMIP6 subset are detailed in the table below, including a brief description of each model where this information is readily available. Further details can be found on the Earth System Documentation site (ES-DOC) or the WDC-climate pages. Sometimes there are small differences between the model details reported in the CMIP6 metadata and those in the source documentation. This also applies to other sources of CMIP6 data and is not normally recorded in the errata. Models with such discrepancies are marked here with an asterisk, and further details are provided in a second table below. The grid IDs reported in the final column are explained further under the 'Grids' section.
CMIP6 models where there is a discrepancy between some model details reported in the available documentation and the metadata:
Grids
CMIP6 data is available either on the model's native grid or re-gridded to one or more target grids, with data variables generally provided near the centre of each grid cell (rather than at the boundaries). This re-gridding is normally done for models which use native grids other than regular lat-lon grids (e.g. cubed sphere or Gaussian). In these cases the output has been re-gridded to a regular lat-lon grid by the modelling centres. For CMIP6 there is a requirement to record both the native grid of the model and the approximate resolution of the final output data (archived in the CMIP6 repository, and available via the CDS) as a "nominal_resolution". This "nominal_resolution" enables users to identify which models have relatively high resolution output, which might be challenging to download and store locally. Information about the grids can be found in the model table above, under 'Model Details', and within the NetCDF file metadata.
The column 'Grids on the CDS ('gn', 'gr' or 'gr1')' lists which grid IDs are associated with the data from that model available on the CDS. These labels reflect whether a given set of model data (variable) uploaded to ESGF is:
- on the native grid of the model component ('gn'),
- regridded to the regular target grid specified for the particular variable ('gr'),
- or regridded to another target grid ('gr1').
The output from some models has multiple grid IDs associated with it, because different model components (atmosphere, land, ocean, cryosphere etc.) are treated differently. This does not necessarily mean the data itself is on a different grid. For example, the atmospheric variables may be on a regular native grid ('gn'), and the ocean variables with an irregular native grid may have been regridded to the atmosphere grid (hence are labelled 'gr'), so they are on the same grid even though their grid IDs differ. On the other hand, if a model is only listed as having output on the native grid ('gn'), this does not guarantee that all the data (variables) are on the same grid, as the native grid can differ between model components.
Note: some data (i.e. variables) have been submitted to ESGF on multiple grids. In these cases only one grid is made available on the CDS (this is decided on a case-by-case basis).
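The grid ID of a downloaded file can usually be read from its file name. The sketch below assumes the standard CMIP6 file name layout (variable, table, model, experiment, member, grid label and an optional time range, separated by underscores). The file name in the usage note and the function name are illustrative and not taken from the CDS.

```python
# Descriptions follow the list of grid IDs given above.
GRID_LABELS = {
    "gn": "native grid of the model component",
    "gr": "regridded to the regular target grid specified for the variable",
    "gr1": "regridded to another target grid",
}


def grid_label_from_filename(filename: str) -> str:
    """Return the grid label (sixth underscore-separated field) of a file name.

    Assumes the layout
    <variable>_<table>_<model>_<experiment>_<member>_<grid>[_<time range>].nc
    """
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) < 6:
        raise ValueError(f"unexpected file name layout: {filename!r}")
    return parts[5]


def describe_grid(filename: str) -> str:
    """Return a short description of the grid a file is on."""
    label = grid_label_from_filename(filename)
    return GRID_LABELS.get(label, f"unrecognised grid label {label!r}")
```

For example, `describe_grid("tas_Amon_UKESM1-0-LL_historical_r1i1p1f2_gn_185001-194912.nc")` reports the native grid. As noted above, two variables with different labels may still share the same grid, so the label alone does not settle that question.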
Calendars
Climate models sometimes use different calendars. For example, Hadley Centre models in CMIP6 use a 360-day calendar, where every month has exactly 30 days. Some models use a fixed 365-day calendar, and others include leap years. Depending on the time period and models selected, these variations can result in time dimensions of different lengths when daily data is downloaded, or even in failed data requests. When using the CDS user interface download form or API, users need to be careful to avoid selecting days which may not be available in the calendar of the given model (for example, requests referring to day 31 for the Hadley Centre models would fail, because they have a 360-day calendar). The CDS form for CMIP6 currently assumes a standard calendar, so it allows the selection of such missing days, and conversely may not allow selection of all days from models with non-standard calendars (but this data can be retrieved using the API).
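One way to avoid requesting days that do not exist is to filter the day selection against the model calendar before building a request. The following is a minimal sketch, assuming the calendar is identified by its CF-convention name ("360_day", "noleap" or "standard"); the function names are hypothetical and not part of the CDS API.

```python
import calendar


def days_in_month(cf_calendar: str, year: int, month: int) -> int:
    """Number of days in a month for a few common model calendars."""
    if cf_calendar == "360_day":
        return 30  # every month has exactly 30 days
    if cf_calendar in ("noleap", "365_day"):
        # Use a fixed non-leap year so that February always has 28 days.
        return calendar.monthrange(2001, month)[1]
    if cf_calendar in ("standard", "gregorian", "proleptic_gregorian"):
        # Python uses the proleptic Gregorian calendar, which matches the
        # standard calendar for modern dates.
        return calendar.monthrange(year, month)[1]
    raise ValueError(f"unsupported calendar: {cf_calendar!r}")


def valid_days(cf_calendar: str, year: int, month: int, requested_days):
    """Keep only the requested days that exist in the given calendar."""
    last_day = days_in_month(cf_calendar, year, month)
    return [day for day in requested_days if 1 <= day <= last_day]
```

With this sketch, `valid_days("360_day", 2000, 1, [29, 30, 31])` drops day 31, which is the case that would make a request for a 360-day model fail.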
Pressure levels
For pressure level data, the model output is available on the pressure levels listed in the table below. Note that since the model output is standardised, all models produce the data on the same pressure levels.
Frequency | Number of Levels | Pressure Levels (hPa) |
Daily | 8 | 1000., 850., 700., 500., 250., 100., 50., 10. |
Monthly | 19 | 1000., 925., 850., 700., 600., 500., 400., 300., 250., 200., 150., 100., 70., 50., 30., 20., 10., 5., 1. |
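When combining daily and monthly data it can help to know which levels are shared. The sketch below simply encodes the table above as a Python dictionary; the variable and function names are illustrative.

```python
# Pressure levels in hPa, copied from the table above.
PRESSURE_LEVELS_HPA = {
    "daily": [1000, 850, 700, 500, 250, 100, 50, 10],
    "monthly": [1000, 925, 850, 700, 600, 500, 400, 300, 250, 200,
                150, 100, 70, 50, 30, 20, 10, 5, 1],
}


def common_levels(frequency_a: str, frequency_b: str) -> list:
    """Levels available at both frequencies, in the order of frequency_a."""
    available_b = set(PRESSURE_LEVELS_HPA[frequency_b])
    return [level for level in PRESSURE_LEVELS_HPA[frequency_a]
            if level in available_b]
```

Every daily level also appears in the monthly list, so `common_levels("daily", "monthly")` returns all eight daily levels.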
Ensembles
Each modelling centre typically runs the same experiment with the same model several times, using slightly different settings, to confirm the robustness of results and to inform sensitivity studies through the generation of statistical information. A model and its collection of runs is referred to as an ensemble. Within these ensembles, four different categories of sensitivity studies are done, and the resulting individual model runs are labelled by four integers indexing the experiments in each category:
...
- The first category, labelled realization_index (referred to with letter r), performs experiments which differ only in random perturbations of the initial conditions of the experiment. Comparing different realizations allows estimation of the internal variability of the model climate.
- The second category, labelled initialization_index (referred to with letter i), refers to variation in initialisation parameters. Comparing differently initialised output provides an estimate of how sensitive the model is to initial conditions.
- The third category, labelled physics_index (referred to with letter p), refers to variations in the way in which sub-grid scale processes are represented. Comparing different simulations in this category provides an estimate of the structural uncertainty associated with choices in the model design.
- The fourth category, labelled forcing_index (referred to with letter f), is used to distinguish runs of a single CMIP6 experiment that have different forcings applied.
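In CMIP6 file names and metadata the four indices are combined into a single label such as r1i1p1f1. The sketch below splits such a label into the four integers described above. The class and function names are illustrative, and the example label is not tied to any particular model.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    realization_index: int
    initialization_index: int
    physics_index: int
    forcing_index: int


# Letter r, i, p and f, each followed by a positive integer index.
_VARIANT_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_variant_label(label: str) -> Variant:
    """Split a label such as 'r1i1p1f2' into its four integer indices."""
    match = _VARIANT_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid ensemble member label: {label!r}")
    return Variant(*(int(group) for group in match.groups()))
```

For example, `parse_variant_label("r3i1p1f2").realization_index` gives 3. Grouping runs that differ only in this index is a simple way to collect the members used to estimate internal variability.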
Parameter listings
Time-independent parameters are marked with a dash (-) in the time resolution column. Please note that some parameters defined at pressure levels, such as 1000 hPa temperature, may contain missing data where they are not defined (so the fields look incomplete over terrain) or may be filled with interpolated values (different modelling centres may take different approaches). This happens when the pressure level falls below the orography.
...
- CF-Checks: The CF-checker tool checks that each NetCDF4 file in a given dataset is compliant with the Climate and Forecast (CF) conventions. Compliance ensures that the files are interoperable across a range of software tools. When CF-checker 1.7 is run on the current data some remaining issues are highlighted, particularly for lat, lon and time bounds.
- PrePARE: The PrePARE software tool is provided by PCMDI (Program for Climate Model Diagnosis and Intercomparison) to verify that CMIP6 files conform to the CMIP6 data protocol. All CMIP6 data should meet this required standard; however, this check is included to ensure that all data supplied to the CDS have passed this QC test.
- nctime: The nctime checker checks the temporal axis of the NetCDF files. For each NetCDF file, the temporal element of the file name is compared with the time axis data within the file to ensure consistency. For a time series of data comprising several NetCDF files, nctime ensures that the entire time series is complete, with no temporal gaps or overlaps in either the file names or the time axes within the files.
- Errata: The dataset is checked to ensure that no outstanding Errata record exists at the time of publication.
- Data Ranges: A set of tests is performed on the extreme values of the variables to ensure that the values fall within physically realistic ranges.
- Handle record consistency checks: This check ensures that the version of the dataset used is the one most recently published by the modelling centre. It also checks for any inconsistency in the ESGF publication and excludes any datasets that may have inconsistent ESGF publication metadata.
- Exists at all partner sites: Each dataset is verified to exist at all three partner sites: CEDA, DKRZ and IPSL.
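To illustrate the kind of completeness test described for nctime above, the sketch below looks for gaps and overlaps between the time ranges that appear in the names of a set of monthly files. It is not the nctime implementation: it assumes ranges written as YYYYMM-YYYYMM, and the function names are hypothetical.

```python
def month_index(yyyymm: str) -> int:
    """Convert 'YYYYMM' to a count of months so that ranges can be compared."""
    return int(yyyymm[:4]) * 12 + int(yyyymm[4:6]) - 1


def find_gaps_and_overlaps(time_ranges):
    """Return (kind, earlier_range, later_range) for each problem found.

    time_ranges is a list of 'YYYYMM-YYYYMM' strings, one per file.
    Ranges are sorted by start month and each is compared with the next.
    """
    spans = sorted(
        (month_index(start), month_index(end), text)
        for text in time_ranges
        for start, end in [text.split("-")]
    )
    problems = []
    for (_, prev_end, prev_text), (next_start, _, next_text) in zip(spans, spans[1:]):
        if next_start > prev_end + 1:
            problems.append(("gap", prev_text, next_text))
        elif next_start <= prev_end:
            problems.append(("overlap", prev_text, next_text))
    return problems
```

A complete series such as `["185001-189912", "190001-194912"]` yields an empty list. A fuller check would also compare each range with the time axis stored inside the file, as the nctime description above notes.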
It is important to note that passing these quality control tests should not be confused with validity: for example, it is possible for a file to pass all QC steps but contain errors in the data that have not been identified by either data providers or data users.
In cases where the quality control picks up errors that are related to minor technical details of the conventions, or behaviour that is in line with expectations for climate model output despite being unexpected in a physical system, the data will be published with details of the errors referenced in the documentation. An example of the second type of error is given by negative salinity values, which occur in one model as a result of rapid release of fresh water from melting sea-ice. These negative values are part of the noise associated with the numerical simulation and reflect what is happening in the numerical model.
Citation, license and PID information
In general, the CMIP6 data Citation Service provides information for users on how to cite CMIP6 data, as well as information on the data licenses.
Users can decide at what level they want to refer to the CMIP6 datasets.
The highest level is the one provided by the CDS through the following DOI: 10.24381/cds.c866074c (also available on the right-hand side of the entry). Users can cite any data available in the CMIP6 catalogue entry in the CDS with this DOI.
The CMIP6 Citation Search is at http://bit.ly/CMIP6_Citation_Search. Citations for CMIP6 data available in the CDS are discoverable in the ESGF at model and experiment levels (please note that these linked files are CSV files, which can be viewed after downloading them).
The CMIP6 datasets are also labelled with so-called Persistent Identifiers (PIDs). PIDs are assigned to each version of every file and dataset. They are unique identifiers of the data and are available in the header of the NetCDF data files. The PIDs are also provided at dataset and file levels (please note that these files are CSV files, which can be viewed after downloading them).
Subsetting and downloading data
CDS users can apply temporal and spatial subsetting operations to CMIP6 datasets. Subsetting is performed by the "roocs" WPS framework, which runs at each of the partner sites: CEDA, DKRZ and IPSL. The WPS can receive requests for processing based on dataset identifiers, a temporal range, a bounding box and a range of vertical levels. Each request is converted to a job that is run asynchronously on the processing servers at the partner sites. NetCDF files are generated and the response contains download links to each of the files. Users of the CDS will be able to make subsetting selections using the web forms provided by the CDS catalogue web interface. More advanced users will be able to define their own API requests in the CDS Toolbox that will call the WPS. Output files will be automatically retrieved so that users can access them directly within the CDS.
When CMIP6 data is downloaded from the Climate Data Store, information about any additional processing applied to the data (such as temporal or spatial subsetting) is included in both PNG and JSON form. These provenance files are zipped up with the retrieved data and named provenance.png and provenance.json. They describe the software used and the methods applied to the data, following the W3C PROV standard. For more information about how to interpret these files, please see https://rook-wps.readthedocs.io/en/latest/prov.html.
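The provenance record can be inspected without unpacking the whole download. The sketch below uses only the Python standard library and assumes the downloaded archive is a zip file containing a member named provenance.json, as described above. The function name and the archive path in the usage note are hypothetical, and the internal structure of the JSON is not assumed.

```python
import json
import zipfile


def read_provenance(zip_source):
    """Return the parsed provenance.json from a downloaded zip archive.

    zip_source may be a path or a binary file-like object. The member is
    matched by the end of its name, in case it sits inside a sub-folder.
    """
    with zipfile.ZipFile(zip_source) as archive:
        matches = [name for name in archive.namelist()
                   if name.endswith("provenance.json")]
        if not matches:
            raise FileNotFoundError("no provenance.json found in the archive")
        with archive.open(matches[0]) as handle:
            return json.load(handle)
```

For example, `read_provenance("download.zip")` returns the parsed JSON content, which can then be interpreted using the documentation linked above.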
Additional resources, learning and publications
A training resource in Python is available via a Jupyter Notebook on the C3S data tutorials page here: https://ecmwf-projects.github.io/copernicus-training-c3s/projections-cmip6.html
Some publications that may be helpful in understanding different features or limitations of the data include:
...
Lehner, F., Deser, C., Maher, N., Marotzke, J., Fischer, E. M., Brunner, L., Knutti, R., and Hawkins, E.: Partitioning climate projection uncertainty with multiple large ensembles and CMIP5/6, Earth Syst. Dynam., 11, 491–508, 2020. https://doi.org/10.5194/esd-11-491-2020
John, A., Douville, H., Ribes, A. and Yiou, P., 2022. Quantifying CMIP6 model uncertainties in extreme precipitation projections. Weather and Climate Extremes, 36, p.100435. https://doi.org/10.1016/j.wace.2022.100435
Known issues
CDS users are directed to the CMIP6 ES-DOC Errata Service for known issues with the wider CMIP6 data pool. Data provided to the CDS should either contain no errors or have any minor errors listed in the Errata Service. The Errata Service is also a useful resource for CDS users because data may have been withheld from the CDS for justifiable reasons.
Some models currently have missing historical or scenario data for some variables; this is in the process of being resolved. Some details are given in the table below:
Model | Missing variable data details |
MPI-ESM-1-2-HAM |
|
EC-Earth3 |
|
EC-Earth3-Veg |
|
MIROC-ES2H |
|
EC-Earth3-Veg-LR |
|
NORESM2-LM |
|
GISS-E2-1-G |
|
...
References
Durack, P J. (2020) CMIP6_CVs. v6.2.53.5. Available at: https://github.com/WCRP-CMIP/CMIP6_CVs (Accessed: 26 October 2020).
...