Contributors: Nabiz Rahpoe (DWD)
Issued by: DWD / Nabiz Rahpoe
Date: 28/10/2021
Ref: C3S_D312b_Lot1.2.3.8-v1.1_202102_PQAD_TCWV_SSMIS_TCDR+ICDR_v1.1.1
Official reference number service contract: 2018/C3S_312b_Lot1_DWD/SC1
History of modifications
List of datasets covered by this document
Related documents
Acronyms
Scope of the document
This document is the Product Quality Assurance Document (PQAD) for Total Column Water Vapour (product C3S_D312b_Lot1.3.3.14 v1.0 and v1.1) based on SSM/I & SSMIS measurements. It provides a brief guide to the data quality, and describes the validation method.
The TCDR data products are provided as a brokered service from the EUMETSAT CM SAF, and the ICDRs (which are a continuation of the TCDR) are produced within the framework of the C3S project. This document refers extensively to the original CM SAF validation report, “CM SAF Validation Report SSM/I and SSMIS - HOAPS 4.0” [D2]. It can be found at the CM SAF web site http://www.cmsaf.eu.
Executive summary
The project C3S_312b_Lot1 includes brokering of total column water vapour gridded monthly mean and 6-hourly data from the EUMETSAT CM SAF (TCDR) and the production of an ICDR continuation. The ATBD document refers to the CM SAF documentation [D1] describing the methods and algorithms that are used by the CM SAF to generate the total column water vapour data products. The ICDRs are based on the same retrieval scheme and algorithm as the TCDR; no changes have been implemented apart from the continuation and extension of the time series processing.
The TCDR covers the time period January 1988 to December 2014, while the ICDR covers the time period from January 2015 to December 2020. The updated ICDR has the same characteristics as the TCDR, which has been generated within a CM SAF reprocessing activity.
In the scope of the validation activity, the comparison has been performed for accuracy and stability of the product and its continuation with the same method as within the CMSAF validation activity for total column water vapour monthly means (1988-2014) [D2]. In addition, the extended time series (ICDR) has been included for inter-comparison (2015-2020) to evaluate the overall bias and stability of the product toward reference sensors.
The general picture of the validation presented in this document shows an overall good performance of the HOAPS 4.0 TCDR plus the C3S ICDR (1988-2020). The target requirements (Table 1) for the inter-comparison of bias and RMSD have been met at the optimal level. The stability requirement has been met at least at the target category (see Table 2 for further details). A statistical test shows that the number of ICDR data values outside the 95% interval of the TCDR is within the expected critical range at the 5% significance level (Table 3). Chapter 3 gives details on methods and results of the validation.
1. Validated products
The validation includes the CM SAF product TCWV SSM/I and SSMIS from HOAPS 4.0 retrieval, containing gridded monthly mean and 6-hourly total column water vapour data for the full time period 1988-2020 containing the TCDR (1988-2014) and the follow-up ICDR generated within the C3S project (2015-2020).
For the validation activity, only the monthly mean data sets have been used, because no daily composite data sets are available as reference and because monthly means are statistically more representative for the inter-comparison.
2. Reference data set for validation
The validation of TCDR was primarily based on comparisons with ERA-Interim reanalysis, COSMIC (beta-version, ROM SAF), RSS_SSMI (SSM/I+SSMIS) V7, and TMI V7. The reference datasets are described and discussed in Sections 4.5-4.7 of CM SAF Validation Report [D2].
The validation of TCDR+ICDR is primarily based on comparisons with ERA-5, and merged microwave sensors RSS_SSMI V7 (SSM/I+SSMIS).
ERA-5 replaces the ERA-Interim reanalysis which stopped being produced on 31 August 2019.
This is the list of the reference sensors used for the current inter-comparison activity (TCDR+ICDR) and their temporal coverage:
- RSS_SSMI V7 (1988/01-2020/12)
- ERA-5 (1988-2019/06) – Due to current availability of the data set.
- ERA-5 (1988-2020) – see results for 1988 to 2019/06 in Section 3.4 and further details in the PQAR [D4] for the complete period
3. Validation methodology & results
3.1 TCDR performance
The validation methodology is outlined in Section 6.1 [D2]. The results for total column water vapour are presented in Section 6.7.1 and further discussed in Section 6.7.2 of the CM SAF Validation Report [D2]. For TCWV, the target requirements are listed in Table 1.
The HOAPS 4.0 validation report [D2, Section 6.7.2] gives the following summary of quality. The HOAPS-4.0 monthly mean TCWV data show the following absolute bias and RMSD results, when compared against ERA-Interim and the satellite-based RSS_SSMI and TMI products:
- average (absolute) biases of < 0.4 kg/m2 and
- RMSD of ≤ 1.1 kg/m2
Thus, the monthly TCDR product meets the optimal KPI for bias and the target KPI for RMSD.
The decadal stability of HOAPS 4.0 is 0.00±0.008 kg/m2/decade, which fulfills the requirements for ‘optimal category’ (<0.08 kg/m2/decade) as described in Section 7 [D2].
Table 1: KPIs for the Water Vapour TCWV TCDR as defined by CMSAF (See Table 6-6 in [D2]).
Category | Bias [kg/m2] | cRMSD [kg/m2] | Stability (bias trend) [kg/m2/decade] |
Threshold | 3 | 5 | 0.4 |
Target | 1.4 | 2 | 0.2 |
Optimal | 1.0 | 1 | 0.08 |
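The category limits of Table 1 can be read as a simple lookup. The following is a minimal sketch of such a lookup, not the procedure used by CM SAF: the names `KPI_LIMITS` and `kpi_category` are hypothetical, and it assumes that a requirement is met when the absolute value is less than or equal to the limit.

```python
# KPI limits copied from Table 1
# (kg/m2 for bias and cRMSD, kg/m2/decade for stability).
KPI_LIMITS = {
    "bias":      {"Optimal": 1.0,  "Target": 1.4, "Threshold": 3.0},
    "rmsd":      {"Optimal": 1.0,  "Target": 2.0, "Threshold": 5.0},
    "stability": {"Optimal": 0.08, "Target": 0.2, "Threshold": 0.4},
}


def kpi_category(metric, value):
    """Return the strictest Table 1 category that |value| satisfies.

    Assumption: a limit counts as met when |value| <= limit.
    """
    for category in ("Optimal", "Target", "Threshold"):
        if abs(value) <= KPI_LIMITS[metric][category]:
            return category
    return "Not met"


# TCDR values quoted in Section 3.1
print(kpi_category("bias", 0.4))       # bias below 0.4 kg/m2
print(kpi_category("rmsd", 1.1))       # RMSD up to 1.1 kg/m2
print(kpi_category("stability", 0.0))  # decadal stability
```

Under these assumptions the quoted TCDR values fall into the optimal category for bias and stability and into the target category for RMSD, in line with the summary in Section 3.1.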
The SSM/I and SSMIS 6-hourly daily composites fulfil the GCOS frequency requirement of 4-hourly observations when input data from different DMSP satellites are considered.
On the other hand, the spatial resolution of 50 km x 50 km does not fulfill the spatial resolution requirement set by GCOS (25 km), which is the only limitation of the data set.
3.2 TCDR+ICDR performance
The key performance indicators for TCDR+ICDR are calculated in the same manner as for the TCDR. The bias, RMSD and stability have been calculated against RSS_SSMI and ERA-5 and are presented in the following sections.
3.3 Comparison to RSS-SSMI
Figure 1: The time series of global mean differences of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) minus RSS_SSMI for the period 1988-2020 (black line) with the corresponding running mean over a 5-month window (thick black line). The target requirements for the stability are plotted as blue dotted lines. The linear fit is shown as a green line and the two-sided 95% interval of the TCDR is plotted as dashed red lines. The numerical values of the validation results are printed on the plot. The vertical grey dashed line marks the time point of the change from TCDR to ICDR. WVPA stands for Water Vapour Path, which is a synonym for TCWV (see PUGS [D3] for details).
The comparison of the global mean time series against the RSS_SSMI dataset is shown in Figure 1. It has been calculated by taking the monthly mean of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) and the monthly mean of RSS_SSMI for a given month (t1) to derive the bias. The steps to derive the bias are as follows:
\[ diff(\phi, \lambda, t_{1}) = TCWV(\phi, \lambda, t_{1})_{HOAPS} - TCWV(\phi, \lambda, t_{1})_{RSS} \]
Here \( diff(\phi, \lambda, t_{1}) \) is the two-dimensional difference map (latitude \( \phi \), longitude \( \lambda \)) for a given month. From this map, the zonal mean is calculated along the longitudes \( \lambda \), with \( N_{\lambda} \) the total number of longitude bins:
\[ zonal(\phi, t_{1}) = \frac{\sum_{\lambda}diff(\phi, \lambda, t_{1})}{N_{\lambda}} \]
The monthly global mean of the differences for a given month (t1) is derived by calculating the weighted mean of the zonal means along the latitudes \( \phi \), with the cosine of latitude as weights and \( N_{\phi} \) the total number of latitude bins:
\[ \langle global(t_{1}) \rangle = \frac{\sum_{\phi}zonal(\phi, t_{1}) \ast \cos(\phi)}{N_{\phi}} \]
The global mean time series is then constructed from the set of global monthly means:
\[ global(t):= \{\langle global(t_{1}) \rangle, \langle global(t_{2}) \rangle, ..., \langle global(t_{N}) \rangle\} \]
The bias is then defined as follows, with N the number of months:
\[ bias= \frac{\sum_{t}global(t)}{N} \]
The corresponding sample standard deviation and the spread, or interquartile range (IQR), are:
\[ \sigma_{bias}= \sqrt{\frac{\sum_{t}(global(t)-bias)^2}{N-1}} \]
\[ s:= spread= q_{0.75}-q_{0.25} \]
The spread is the difference between the upper and lower quartiles and is a more robust estimator when the dataset has extreme values or outliers. We use the sample standard deviation here, since the difference between the two is small. The monthly global mean difference time series is shown in Figure 1 (thin black line). The bias is -0.28 kg/m2 and its standard deviation σbias is 0.14 kg/m2.
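The averaging steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the validation: the function names and the list-of-lists grid layout are hypothetical, missing grid cells (for example over land) are marked with `None` and skipped, and the latitude-weighted mean is normalised by the sum of the cosine weights, which is an assumption about the intended weighting rather than a literal transcription of the formula.

```python
import math
import statistics


def global_mean_difference(hoaps, ref, lats_deg):
    """Cosine-weighted global mean of (hoaps - ref) for one month.

    hoaps, ref: 2-D lists indexed [lat][lon]; None marks a missing cell.
    lats_deg:   latitude of each row centre in degrees.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row_h, row_r, lat in zip(hoaps, ref, lats_deg):
        diffs = [h - r for h, r in zip(row_h, row_r)
                 if h is not None and r is not None]
        if not diffs:
            continue  # no valid cells in this latitude band
        zonal = sum(diffs) / len(diffs)      # zonal mean along longitude
        w = math.cos(math.radians(lat))      # latitude (area) weight
        weighted_sum += zonal * w
        weight_total += w
    # Assumption: normalise by the sum of the weights.
    return weighted_sum / weight_total


def series_stats(global_series):
    """Bias, sample standard deviation (N-1) and interquartile spread."""
    bias = statistics.fmean(global_series)
    sigma = statistics.stdev(global_series)
    quartiles = statistics.quantiles(global_series, n=4, method="inclusive")
    spread = quartiles[2] - quartiles[0]
    return bias, sigma, spread


# Toy example with made-up numbers: two latitude bands, two longitudes.
month = global_mean_difference(
    hoaps=[[20.0, 21.0], [30.0, None]],
    ref=[[20.3, 21.3], [30.3, 30.0]],
    lats_deg=[60.0, 0.0],
)
print(month)
print(series_stats([-0.4, -0.3, -0.2, -0.1]))
```

Applying `global_mean_difference` to every month and passing the resulting list to `series_stats` gives the three summary numbers (bias, σbias, spread) discussed in the text.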
An additional metric of variation is the root mean square of the differences (RMSD), defined as follows:
\[ RMSD = \sqrt{\frac{\sum_{t}global(t)^2}{N}} \]
The RMSD is approximately 0.31 kg/m2. Both parameters (bias = -0.28 kg/m2 and RMSD = 0.31 kg/m2) fulfil the optimal requirements for bias and RMSD (Table 1). For zero bias and large N, the RMSD and the standard deviation σbias coincide.
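The link between bias, standard deviation and RMSD can be made explicit with a short worked derivation that uses only the definitions given above. Expanding the square around the bias gives:
\[ \sum_{t} global(t)^2 = \sum_{t}(global(t)-bias)^2 + N \cdot bias^2 \]
so that
\[ RMSD^2 = bias^2 + \frac{N-1}{N} \sigma_{bias}^2 \]
With the rounded values quoted above (bias = -0.28 kg/m2, σbias = 0.14 kg/m2) and N large enough that (N-1)/N is close to 1:
\[ RMSD \approx \sqrt{0.28^2 + 0.14^2} = \sqrt{0.098} \approx 0.31 \ kg/m^2 \]
This agrees with the quoted RMSD and shows why the two quantities coincide when the bias is zero.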
Figure 2: The time series of the residuals(t) of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) minus RSS_SSMI for the period 1988-2020 (black stepped line) with corresponding autocorrelation function for different time lags as red line (small box in the lower left corner). The vertical grey dashed line presents the time point of the change from TCDR to ICDR.
A linear fit to the global mean difference time series is then used to calculate the stability, which is the slope of the linear function (0.03 kg/m2/decade):
\[ y(t)=0.03t-0.33 \]
To derive the uncertainty of the stability, the standard deviation of the residuals is required, which is derived from the time series of residuals (Figure 2):
\[ Residuals(t)=global(t)-y(t) \]
In practice, the uncertainty can be calculated with the following approximation (Weatherhead et al. 1998):
\[ \sigma_{stability} \left[ \frac{kg/m^2}{month^{\frac{3}{2}}} \right] \approx \frac{\sigma_{residuals}}{N^{\frac{3}{2}}} \sqrt{\frac{1+\rho(1)}{1-\rho(1)}} \]
where \( \sigma_{residuals} \), N, and \( \rho(1) \) are the standard deviation of the residuals, the number of months, and the lag-1 autocorrelation, respectively. The lag-1 autocorrelation is approximately 0.85 and is derived from the generalized autocorrelation function \( \rho(k) \) (Figure 2, small lower box):
\[ \rho(k) := Correlation(Residuals_{t}, Residuals_{t-k}) \]
With these ingredients, the uncertainty of the stability is estimated as \( \sigma_{stability} \approx \) 0.007 kg/m2/decade from \( \sigma_{residuals}= \) 0.14 kg/m2 (Figure 3).
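The stability estimate and its uncertainty can be sketched as follows. This is a hedged illustration, not the validation code: the function names are hypothetical, the fit is an ordinary least-squares line, and the conversion to kg/m2/decade by a factor of 120 months per decade is an assumption inferred from the quoted numbers. With N = 396 months (assumed from the period 1988-2020) and the rounded inputs from the text, the sketch gives a value close to the quoted 0.007 kg/m2/decade.

```python
import math
import statistics


def linear_fit(t, y):
    """Ordinary least-squares slope and intercept of y against t."""
    t_mean = statistics.fmean(t)
    y_mean = statistics.fmean(y)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def lag1_autocorrelation(residuals):
    """Pearson correlation between Residuals_t and Residuals_{t-1}."""
    a, b = residuals[1:], residuals[:-1]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def stability_uncertainty(sigma_res, n_months, rho1, months_per_decade=120):
    """Weatherhead-type approximation, scaled to a per-decade value.

    Assumption: the per-month result is multiplied by 120 to express
    it per decade.
    """
    per_month = (sigma_res / n_months ** 1.5
                 * math.sqrt((1.0 + rho1) / (1.0 - rho1)))
    return per_month * months_per_decade


# RSS_SSMI comparison, rounded inputs from the text; N = 396 is assumed.
print(stability_uncertainty(0.14, 396, 0.85))
```

In a full workflow, `linear_fit` would be applied to the global mean difference series, the residuals would be formed as the series minus the fitted line, and `lag1_autocorrelation` and the standard deviation of those residuals would feed `stability_uncertainty`.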
Figure 3: The histogram of the residuals of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) minus RSS_SSMI for the period 1988-2020 with corresponding values of mean residuals, standard deviation of the residuals, and the spread of the residuals (upper right corner).
Hence, the stability fulfils the optimal requirement of Table 1 with 100% probability coverage, as shown schematically in Figure 4 (grey dot).
Figure 4: Schematic plot of the stability performance of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) minus RSS_SSMI with corresponding target requirements and probability coverage (upper left corner).
3.4 Comparison to ERA-5
The same method as in Section 3.3 has been used for the comparison with the ERA-5 data set (1988-2019/06). The results are presented in Figures 5 to 8.
Figure 5: The time series of global mean differences of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) minus ERA-5 for the period 1988-2019/06 (black line) with corresponding running mean of 5-months window (thick black line). The target requirements for the stability are plotted as blue dotted lines. The linear fit is shown as green line and the two-sided 95% interval of the TCDR are plotted as dashed red lines. The numerical values of the validation result are printed on the plot. The grey dashed line marks the time point of the change from TCDR to ICDR.
Figure 6: The time series of the residuals(t) of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) minus ERA-5 for the period 1988-2019/06 (black stepped line) with corresponding autocorrelation function for different time lags as red line (small box in the lower left corner). The vertical grey dashed line presents the time point of the change from TCDR to ICDR.
Figure 7: The histogram of the residuals of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) minus ERA-5 for the period 1988-2019/06 with corresponding values of mean residuals, standard deviation of the residuals, and the spread of the residuals (upper right corner).
Figure 8: Schematic plot of the stability performance of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) minus ERA-5 with corresponding target requirements and probability coverage (upper left corner).
The bias is 0.55 kg/m2 and the RMSD is 0.84 kg/m2. Both metrics fulfil the optimal requirements for bias and RMSD (Table 1).
The stability is 0.057 kg/m2/decade. Its uncertainty is estimated as σstability ≈ 0.021 kg/m2/decade, with σresiduals = 0.62 kg/m2 (Figure 7) and lag-1 autocorrelation ρ(1) = 0.63. The stability fulfils the optimal requirement with 86% probability coverage (Figure 8) and the target requirement with 100% probability coverage.
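One plausible reading of the probability coverage is the fraction of a Gaussian distribution, centred on the estimated stability with width σstability, that lies inside the symmetric requirement interval. The sketch below follows that reading; it is an assumption rather than a documented part of the method, and the function names are hypothetical. With the rounded inputs for ERA-5 it gives a coverage of roughly 86% for the optimal limit.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def probability_coverage(stability, sigma, limit):
    """Probability mass of N(stability, sigma) inside [-limit, +limit].

    Assumption: Gaussian uncertainty and a symmetric requirement interval.
    """
    return (normal_cdf(limit, stability, sigma)
            - normal_cdf(-limit, stability, sigma))


# ERA-5 comparison: stability 0.057 +/- 0.021 kg/m2/decade
print(probability_coverage(0.057, 0.021, 0.08))  # optimal limit
print(probability_coverage(0.057, 0.021, 0.2))   # target limit
```

Because the inputs are rounded, the result can differ by about a percentage point from the values printed in the figures and in Table 2.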
Table 2: The validation results of HOAPS 4.0 TCDR (CMSAF) & ICDR (C3S) for RSS_SSMI & ERA-5 dataset with their corresponding target requirements fulfillment as defined in Table 1. In brackets the numerical values of probability coverage of requirements are shown if these are lower than 100%.
Reference Dataset | Bias [kg/m2] | σbias [kg/m2] | RMSD [kg/m2] | Lag-1 Autocorrelation ρ(1) | Stability (via σ) [kg/m2/decade] | Stability (via spread) [kg/m2/decade] |
RSS_SSMI: 1988-2020 | -0.28 Optimal | 0.14 | 0.31 Optimal | 0.85 | 0.033±0.007 Optimal | 0.033±0.010 Optimal |
ERA-5: 1988-2019/06 | 0.55 Optimal | 0.62 | 0.84 Optimal | 0.63 | 0.057±0.021 Optimal (86%) Target | 0.057±0.029 Optimal (78%) Target |
3.5 KPI test of ICDR vs. TCDR
In addition, a statistical test has been carried out to check whether the ICDR differences are consistent with the null hypothesis that they fall within the 95% interval of the TCDR differences (Figures 9 and 10). To this end, the 2.5 and 97.5 percentiles of the TCDR differences define a two-sided interval; under the null hypothesis, ICDR values are expected to fall outside this interval at a rate of α=5% (Type I error). The number of ICDR values falling outside the interval is counted and compared with the number of failures allowed at the α=5% significance level, and the null hypothesis is accepted or rejected based on a binomial test (accepted if Pbinomial > α=5%). The histograms of the TCDR and ICDR values are presented in Figure 9 (RSS_SSMI) and Figure 10 (ERA-5).
The test shows the following numbers of ICDR values falling outside the 95% confidence interval (between P2.5% and P97.5%):
- RSS_SSMI: 5 out of 71
- ERA-5: 3 out of 53
These results are lower than the critical numbers (see details in Table 3). The test gives cumulative probabilities of about 28% (RSS_SSMI) and 50% (ERA-5). Hence, both ICDRs fulfil the KPI requirement of Pbinomial > α=5% (significance level), and the observed data are consistent with the null hypothesis.
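The test can be sketched with the Python standard library. This is an illustrative reconstruction, not the code used for the report: the function names are hypothetical, and it assumes that the cumulative probability is the upper tail P(X ≥ K) of a binomial distribution with N trials and a failure rate of 5%. Under that assumption the counts above give about 28% and 50%.

```python
import math
import statistics


def count_outside(tcdr, icdr):
    """Count ICDR values outside the 2.5-97.5 percentile range of the TCDR."""
    cuts = statistics.quantiles(tcdr, n=40, method="inclusive")
    lower, upper = cuts[0], cuts[-1]   # 2.5% and 97.5% percentiles
    return sum(1 for v in icdr if v < lower or v > upper)


def binomial_tail(n, k, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


def max_allowed_failures(n, p=0.05, alpha=0.05):
    """Largest failure count k for which P(X >= k) still exceeds alpha."""
    k = 0
    while k < n and binomial_tail(n, k + 1, p) > alpha:
        k += 1
    return k


# Counts reported in the text
for name, n, k in (("RSS_SSMI", 71, 5), ("ERA-5", 53, 3)):
    print(name, round(100 * binomial_tail(n, k)), max_allowed_failures(n))
```

The value returned by `max_allowed_failures` corresponds to the number after "K >" in the critical-number column of Table 3: the null hypothesis is rejected only when the observed count exceeds it.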
Figure 9: Histogram of the distribution of global mean differences with respect to RSS_SSMI for TCDRs (blue) and ICDRs (orange), with the corresponding test values (upper right corner) for the ICDR failures based on the two-sided 95% confidence interval derived from the TCDR distribution. The lower threshold (P2.5%) and upper threshold (P97.5%) of the TCDR are plotted as vertical dashed lines with corresponding numerical values.
Figure 10: Histogram of the distribution of global mean differences with respect to ERA-5 for TCDRs (blue) and ICDRs (orange), with the corresponding test values (upper right corner) for the ICDR failures based on the two-sided 95% confidence interval derived from the TCDR distribution. The lower threshold (P2.5%) and upper threshold (P97.5%) of the TCDR are plotted as vertical dashed lines with corresponding numerical values.
Table 3: Results of the hypothesis test upon the number of ICDRs falling outside the 95% confidence interval of the TCDR.
Reference Dataset | TCDR lower threshold 2.5% [kg/m2] | TCDR upper threshold 97.5% [kg/m2] | Number of ICDRs | Critical number of failures at 5% rate | Observed K failures of ICDRs outside the 95% interval [2.5%,97.5%] | Cumulative probability P(N-K,N,95%) |
RSS_SSMI: 1988-2020 | -0.52 | 0.00085 | 71 | K > 7 | 5 | 28% Accept |
ERA-5: 1988-2019/06 | -0.6 | 1.7 | 53 | K > 5 | 3 | 50% Accept |
3.6 Summary & Conclusion
The target requirements for the inter-comparison of bias and RMSD have been met at the optimal category in comparison to both validation datasets, RSS_SSMI and ERA-5 (see Table 2 for details). The stability requirement has been met at the optimal category (RSS_SSMI 100%, ERA-5 86%) and at the target category with 100% (ERA-5). We also used the spread of the residuals instead of the sample standard deviation of the residuals, to check whether the choice of metric has a significant impact on the uncertainty estimate of the stability. We conclude that the choice of metric does not change the final conclusion on fulfilling the requirements (Table 2). The number of ICDR values outside the 95% interval of the TCDR is within the expected critical range at the 5% significance level (Table 3).
The general picture of the validation summarized here shows an overall good performance of HOAPS 4.0 TCDR (CMSAF) + ICDR (C3S) (1988-2020) relative to the reference datasets RSS_SSMI and ERA-5.
References
Weatherhead, E. C., Reinsel, G. C., Tiao, G. C., Meng, X.-L., Choi, D., Cheang, W.-K., Keller, T., DeLuisi, J., Wuebbles, D. J., Kerr, J. B., Miller, A. J., Oltmans, S. J., and Frederick, J. E.: Factors affecting the detection of trends: Statistical considerations and applications to environmental data, J. Geophys. Res., 103, 17149–17161, 1998.