Introduction
The GloFAS v4.0 hydrological model performance was evaluated in the model calibration context on the GloFAS v4 calibration hydrological model performance page. That evaluation used only the 1995 stations involved in the calibration, the full long-term run (produced within the calibration exercise), and the KGE with its three component scores.
On this page, the model performance is analysed over the final v4.0 reanalysis time series (https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview), which is not expected to differ noticeably from the one used in the calibration evaluation. In addition, all stations are considered here that have at least 1 year of observation data of sufficient quality in the 1979-2021 period (compared with at least 4 years for the calibration), supplemented by a separate station network without any larger noticeable impact of reservoirs or lakes. In total, 1987 stations were considered for the general v4.0 verification. For the v4.0 vs v3.1 model comparison, three station sets were used: all stations (1949), stations used in the calibration of both models (996), and stations used in neither calibration (233). Details on the station selection and other aspects of the verification, including the metrics used, are available on the GloFAS hydrological performance verification methodology page.
General v4.0 performance
OBS availability
For this evaluation, we used all stations with good quality river discharge observations and minimal human or lake influence that could be mapped onto the higher resolution v4.0 river network (i.e. for which the corresponding model river network location could be found). In total, 1987 stations could be considered. Figure 1 shows them with their available observation length (gaps are excluded when computing the length).
Figure 1. Number of years of available river discharge observations in the 1979-2021 reanalysis period.
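The observation length in Figure 1 is counted with gaps removed. A minimal sketch of that count follows. It assumes daily observations stored as a list in which missing days are None, and it converts valid days to years using 365.25 days per year. The function names and constants are hypothetical. The actual processing is described on the verification methodology page.

```python
from typing import Optional, Sequence

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years
MIN_YEARS = 1.0  # minimum record length used on this page (1979-2021 period)


def observation_years(daily_obs: Sequence[Optional[float]]) -> float:
    """Record length in years, counting only days with a valid observation (gaps removed)."""
    valid_days = sum(1 for q in daily_obs if q is not None)
    return valid_days / DAYS_PER_YEAR


def eligible_stations(obs_by_station: dict[str, Sequence[Optional[float]]]) -> list[str]:
    """Hypothetical filter: keep stations with at least MIN_YEARS of valid daily data."""
    return sorted(
        sid for sid, series in obs_by_station.items()
        if observation_years(series) >= MIN_YEARS
    )


# Station "A": 400 valid days around a 100-day gap; station "B": 300 valid days.
stations = {
    "A": [1.0] * 200 + [None] * 100 + [2.0] * 200,
    "B": [3.0] * 300,
}
print(eligible_stations(stations))  # ['A']
```

Station "A" qualifies despite its gap because only its valid days are counted.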
KGE
The general GloFAS v4.0 model performance is measured by the modified Kling-Gupta efficiency (KGE) in Figure 2. High skill (above 0.7) is shown over much of the higher latitude areas and also in some south-east Asian and central South American areas. The lowest KGE values are spread across some tropical areas, often in the central southern USA and Mexico, and across some areas in Africa, often in drier climates. These include some catchments with no skill at all (below -0.41).
Figure 2. KGE of the GloFAS v4 simulation.
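The following is a minimal sketch of the modified KGE. It assumes the common KGE' formulation, in which the variability term is the ratio of coefficients of variation. The exact GloFAS implementation is described on the verification methodology page. The sketch also shows that the -0.41 no-skill threshold equals 1 - √2. That value is commonly interpreted as the score of a mean-flow benchmark, i.e. a constant simulation at the observed mean, with its undefined correlation treated as 0.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def pearson_r(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation using population (co)variances."""
    ms, mo = fmean(sim), fmean(obs)
    cov = fmean([(s - ms) * (o - mo) for s, o in zip(sim, obs)])
    return cov / (pstdev(sim) * pstdev(obs))


def kge_modified(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Modified KGE: 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (gamma - 1)^2)."""
    r = pearson_r(sim, obs)
    beta = fmean(sim) / fmean(obs)  # bias ratio of the means
    gamma = (pstdev(sim) / fmean(sim)) / (pstdev(obs) / fmean(obs))  # ratio of coefficients of variation
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)


obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 4.0, 6.0, 8.0]  # perfect timing and shape, but double the mean flow
print(round(kge_modified(sim, obs), 3))  # 0.0: only the bias term (beta = 2) is penalised
print(round(1.0 - math.sqrt(2.0), 2))  # -0.41, the no-skill threshold
```

Doubling the flows leaves r and the coefficient-of-variation ratio at 1. The whole KGE loss therefore comes from the bias term.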
Bias, variability and correlation
The KGE component scores (Figures 3-5) highlight that much of the lower KGE skill comes from the often high and mainly positive bias, and also from larger variability errors. The bias score (the 0-centred bias ratio, i.e. the ratio minus 1) is above 1 for many catchments in the tropical belt. This means the simulated average is more than double the observed average (i.e. more than twice as high as it should be). On the other hand, the variability errors tend to be negative, and many tropical catchments have too low variability in the simulations. This is often a third to a half less than in the observations (-0.33 to -0.5), or even at least 50% less (darkest red).
Figure 5. Pearson correlation of the GloFAS v4 simulation.
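The 0-centred component scores can be sketched as the KGE ratios minus one, so 0 is perfect. A bias of 1 then means the simulated mean is double the observed mean, and a variability of -0.5 means the simulation has half the observed relative variability. This sketch assumes, as in the modified KGE, that the variability ratio compares coefficients of variation. The function name and the sample values are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Sequence


def zero_centred_components(sim: Sequence[float], obs: Sequence[float]) -> tuple[float, float]:
    """Return (bias, var): the KGE bias and variability ratios shifted so 0 is perfect."""
    beta = fmean(sim) / fmean(obs)  # bias ratio: simulated mean / observed mean
    gamma = (pstdev(sim) / fmean(sim)) / (pstdev(obs) / fmean(obs))  # CV ratio
    return beta - 1.0, gamma - 1.0


obs = [10.0, 20.0, 30.0, 40.0]
# Hypothetical simulation: double the mean flow, but a strongly damped signal.
sim = [45.0, 48.0, 52.0, 55.0]
bias, var = zero_centred_components(sim, obs)
print(f"bias={bias:.2f}, var={var:.2f}")  # bias=1.00, var=-0.83
```

This toy case reproduces the tropical pattern described above: a bias of 1 (double the mean) together with a strongly negative variability error.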
Timing
The timing error shows considerable spatial variability (Figure 6). Some of this probably comes from the potentially short sample period (as little as 1 year of observations), which makes the verification scores less robust. In addition, some of the larger errors in areas of high variability may come from catchments with lower quality simulations combined with a less clear signal, i.e. no clear peak and trough structure. In such cases, shifting the simulation in time changes the correlation only a little.
Figure 6. Timing error of the GloFAS v4 simulation.
General v4.0 vs v3.1 performance comparison
When comparing the v4 performance with the previous v3 model, we provide three flavours of the comparison. The first uses all possible stations, regardless of lake and reservoir impact. The other two include only points with at most small reservoir or lake influence. Of these two, one is the calibration comparison, i.e. with points used in both the v4 and v3 calibrations, while the other uses only points that were used in neither calibration.
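The three station selections can be expressed with simple set operations. The following is a minimal sketch. It assumes hypothetical sets of station identifiers: one for all stations, one for stations with at most small reservoir or lake influence, and one for each model's calibration stations.

```python
def comparison_sets(
    all_stations: set[str],
    low_influence: set[str],
    calib_v4: set[str],
    calib_v3: set[str],
) -> dict[str, set[str]]:
    """Hypothetical helper returning the three station selections for v4 vs v3."""
    low = all_stations & low_influence  # at most small reservoir or lake influence
    return {
        "all": set(all_stations),  # regardless of reservoir or lake impact
        "calibration": low & calib_v4 & calib_v3,  # used in BOTH calibrations
        "non_calibration": low - (calib_v4 | calib_v3),  # used in NEITHER calibration
    }


sets = comparison_sets(
    all_stations={"a", "b", "c", "d", "e"},
    low_influence={"a", "b", "c", "d"},
    calib_v4={"a", "b", "e"},
    calib_v3={"a", "c", "e"},
)
print(sorted(sets["calibration"]), sorted(sets["non_calibration"]))  # ['a'] ['d']
```

Stations used in only one of the two calibrations ("b" and "c" here) fall into neither the calibration nor the non-calibration set.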
KGE
The new higher resolution GloFAS v4 outperforms the earlier v3 almost everywhere (Figure 7). Exceptions are mainly in the eastern USA, Amazonia and western Europe. Elsewhere, apart from the odd catchment, v4 is better or much better. In many of the tropical catchments and also in central/southern North America, the KGE improvement is larger than 0.5 over a very large area. Including all stations, the cumulative KGE distributions show the median improving from about 0.31 to 0.65. The median of the station-wise KGE differences is +0.22, since the median of the differences need not equal the difference of the medians. Moreover, while about 25% of catchments had a KGE below -1 in v3, this has decreased to only 7% in v4.
Figure 7. KGE error difference maps between GloFAS v4 and v3 simulations (top row) and cumulative distributions of KGE for both v4 and v3 (bottom row). Using all points (1st column), using only calibration points for both models without larger reservoir or lake influence (2nd column) and non-calibration points for both models without larger reservoir or lake influence (3rd column).
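The following worked example uses made-up KGE values, not GloFAS results. It shows why the median of the station-wise KGE differences (+0.22) need not equal the difference between the two medians (0.65 - 0.31 = 0.34). It also shows how the share of stations below the KGE threshold of -1 is counted. The function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def summarise(kge_v3: Sequence[float], kge_v4: Sequence[float], threshold: float = -1.0) -> dict[str, float]:
    """Summary statistics for paired KGE values at the same stations."""
    diffs = [b - a for a, b in zip(kge_v3, kge_v4)]  # positive = v4 better at that station
    return {
        "median_v3": median(kge_v3),
        "median_v4": median(kge_v4),
        "median_of_diffs": median(diffs),
        "share_v3_below": sum(k < threshold for k in kge_v3) / len(kge_v3),
        "share_v4_below": sum(k < threshold for k in kge_v4) / len(kge_v4),
    }


# Hypothetical KGE values at the same four stations.
v3 = [-1.5, 0.2, 0.5, 0.9]
v4 = [0.3, 0.4, 0.8, 0.7]
s = summarise(v3, v4)
print(f"median v3={s['median_v3']:.2f}, median v4={s['median_v4']:.2f}, "
      f"median of diffs={s['median_of_diffs']:.2f}")
# median v3=0.35, median v4=0.55, median of diffs=0.25
print(f"below -1: v3={s['share_v3_below']:.0%}, v4={s['share_v4_below']:.0%}")
# below -1: v3=25%, v4=0%
```

Here the medians differ by 0.20, but the typical station-wise change is 0.25, because the ranking of stations changes between the two models.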
Bias
The bias, measured by the 0-centred version of the KGE's bias ratio component (bias), clearly contributes strongly to the improved KGE, with drastically reduced bias errors in v4 (Figure 8). The first row in Figure 8 shows the difference in abspbias, the absolute value of bias, i.e. the difference in bias error magnitude between v4 and v3. The large bias improvement is broadly the same for all station sets: the full list (Figure 8, 1st column), the calibration stations (Figure 8, 2nd column) and the non-calibration stations (Figure 8, 3rd column). The geographical distribution of the errors is very similar to the KGE picture in Figure 7. In general, the tropics show very large bias improvements, with v4 often more than halving the bias ratio error of v3.
Figure 8. Abspbias error difference maps between GloFAS v4 and v3 simulations (top row) and cumulative distributions of bias for both v4 and v3 (bottom row). Using all points (1st column), using only calibration points for both models without larger reservoir or lake influence (2nd column) and non-calibration points for both models without larger reservoir or lake influence (3rd column).
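The top-row maps compare error magnitudes rather than signed errors. The following is a minimal sketch. It assumes the mapped quantity is |v4| - |v3| of the 0-centred component, so negative values mean v4 is better. It also includes a helper that measures the share of stations where v4 at least halves the v3 error. Both function names and the sample values are hypothetical.

```python
from typing import Sequence


def abs_error_difference(v4: Sequence[float], v3: Sequence[float]) -> list[float]:
    """|v4| - |v3| per station for a 0-centred component; negative means v4 is better."""
    return [abs(a) - abs(b) for a, b in zip(v4, v3)]


def share_halved(v4: Sequence[float], v3: Sequence[float]) -> float:
    """Fraction of stations where the v4 error magnitude is at most half the v3 one."""
    pairs = list(zip(v4, v3))
    return sum(abs(a) <= 0.5 * abs(b) for a, b in pairs) / len(pairs)


# Hypothetical 0-centred bias values (bias ratio - 1) at four stations.
bias_v3 = [1.2, -0.4, 0.3, 0.1]
bias_v4 = [0.2, -0.1, 0.25, -0.15]
print([round(d, 2) for d in abs_error_difference(bias_v4, bias_v3)])  # [-1.0, -0.3, -0.05, 0.05]
print(share_halved(bias_v4, bias_v3))  # 0.5
```

The same helpers apply unchanged to the 0-centred variability component shown in Figure 9.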
Variability
The variability, measured by the 0-centred version of the KGE's variability ratio component (var), shows a fairly homogeneous geographical distribution globally (Figure 9, top row). Improvement by v4, i.e. a negative absvar difference, dominates the picture, except for the non-calibration stations, which seem more mixed. No area clearly emerges with a cluster of better variability in v3 (i.e. blue dots). It is also clear that the variability improvement is smaller than the bias improvement seen in Figure 8: there are far fewer dark red stations in Figure 9 than in Figure 8.
Figure 9. Absvar error difference maps between GloFAS v4 and v3 simulations (top row) and cumulative distributions of var for both v4 and v3 (bottom row). Using all points (1st column), using only calibration points for both models without larger reservoir or lake influence (2nd column) and non-calibration points for both models without larger reservoir or lake influence (3rd column).
Correlation
The correlation shows a very mixed picture globally, with slightly more positive than negative differences (Figure 10, top row). The most prominent cluster of correlation improvement is in central North America. The mixed picture is similar for all three station selections (the three columns).
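One way to quantify a mixed picture like this is to count the share of stations whose correlation difference is positive, negative or negligible. The following is a minimal sketch. It assumes the difference is r(v4) - r(v3), so positive means v4 correlates better. It treats differences within a hypothetical tolerance tol as no change. The function name and values are illustrative only.

```python
from typing import Sequence


def change_shares(r_v4: Sequence[float], r_v3: Sequence[float], tol: float = 0.01) -> dict[str, float]:
    """Shares of stations where r(v4) - r(v3) is above tol, below -tol, or within tol."""
    diffs = [a - b for a, b in zip(r_v4, r_v3)]
    n = len(diffs)
    better = sum(d > tol for d in diffs)
    worse = sum(d < -tol for d in diffs)
    return {"better": better / n, "worse": worse / n, "unchanged": (n - better - worse) / n}


# Hypothetical correlations at five stations.
r_v3 = [0.80, 0.65, 0.90, 0.70, 0.55]
r_v4 = [0.85, 0.60, 0.905, 0.78, 0.50]
print(change_shares(r_v4, r_v3))  # {'better': 0.4, 'worse': 0.4, 'unchanged': 0.2}
```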
The cumulative distributions confirm that v4 provides only a marginal improvement over v3 in correlation. For high correlations, v3 even seems very slightly better, while v4 is noticeably better for low to medium correlations. For the calibration stations the difference is even smaller, while for the non-calibration stations v3 actually seems to be better. The ups and downs of the simulations could apparently not be improved noticeably in the v4 model.
Regarding the actual correlation values, for the all-station case the median changes from 0.748 in v3 to 0.759 in v4, with 0.000 as the median of the correlation differences, i.e. no change on average. For the calibration stations, the median goes from 0.817 to 0.816 (a very slight decrease), with -0.002 as the median of the correlation differences. For the non-calibration stations it goes from 0.672 to 0.629, with -0.006 as the median of the correlation differences. These numbers confirm that the correlation aspect of the river discharge simulation improved only marginally in v4 when measured using all stations. The calibration station comparison shows no change at all, and the non-calibration comparison shows a small deterioration.
Figure 10. Correlation error difference maps between GloFAS v4 and v3 simulations (top row) and cumulative distributions of correlation for both v4 and v3 (bottom row). Using all points (1st column), using only calibration points for both models without larger reservoir or lake influence (2nd column) and non-calibration points for both models without larger reservoir or lake influence (3rd column).
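The cumulative distributions in the bottom rows of Figures 7-10 can be read as empirical CDFs of the station scores. Both models are evaluated at the same stations, so comparing the two curves at the same rank amounts to comparing sorted values. The following is a minimal sketch with made-up correlations and hypothetical function names. It mimics the correlation pattern described for Figure 10, with v4 higher at the low end and v3 slightly higher at the top.

```python
from typing import Sequence


def ecdf(values: Sequence[float]) -> list[tuple[float, float]]:
    """Return (value, fraction of stations with a score <= value) pairs, sorted by value."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def quantile_differences(v4: Sequence[float], v3: Sequence[float]) -> list[float]:
    """Differences between matching order statistics of two equal-size samples.

    Positive values mean the v4 distribution is higher at that rank.
    """
    if len(v4) != len(v3):
        raise ValueError("both models must be evaluated at the same stations")
    return [a - b for a, b in zip(sorted(v4), sorted(v3))]


# Hypothetical correlations at the same five stations.
r_v3 = [0.30, 0.55, 0.70, 0.85, 0.95]
r_v4 = [0.45, 0.60, 0.72, 0.84, 0.93]
print([round(d, 2) for d in quantile_differences(r_v4, r_v3)])  # [0.15, 0.05, 0.02, -0.01, -0.02]
```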