A recently developed precipitation forecast bias correction tool has the potential to improve river discharge forecasts produced by the European Flood Awareness System (EFAS), first tests at ECMWF have shown. ECMWF is the computational centre for EFAS, which is part of the EU-funded Copernicus Emergency Management Service (CEMS).
Bias-correction method
Observed near-real-time and predicted meteorological forcings, such as precipitation forecasts, are key elements of the EFAS forecast production chain. ECMWF has recently tested the quantile mapping method to bias-correct the precipitation forecasts used in EFAS. The method was applied to ECMWF medium-range ensemble forecasts using the bias correction tool developed at the US National Oceanic and Atmospheric Administration (NOAA) with support from Tom Hamill, who led the NOAA work (https://github.com/ThomasMoreHamill/Multi-model_PQPF). The tests cover the period of June to November 2016. To apply quantile mapping, one generates forecast and observed cumulative distribution functions (CDFs) from available forecast and analysed data for the grid point of interest. For the precipitation forecast value at a certain grid point, the associated quantile is determined from the forecast CDF. The forecast value is then replaced with the analysed value associated with that same quantile. In a second step, weighted probabilities can be generated for the ensemble forecast distribution based on closest-member histograms.
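The two steps described above can be sketched in a few lines of Python. This is a minimal illustration using empirical CDFs at a single grid point, not the NOAA tool itself. The function names `quantile_map` and `closest_member_weights`, the sample values and the simple interpolation choices are all assumptions made for the example.

```python
import numpy as np


def quantile_map(forecast_value, forecast_sample, analysis_sample):
    """Replace a forecast value with the analysed value at the same quantile.

    Both samples are past values for one grid point (hypothetical training data).
    """
    fcst_sorted = np.sort(np.asarray(forecast_sample, dtype=float))
    # Empirical forecast CDF: evenly spaced probabilities for the sorted sample.
    # Note: many tied values (e.g. dry days) would need special handling.
    probs = np.linspace(0.0, 1.0, fcst_sorted.size)
    # Quantile of the forecast value in the forecast CDF (clamped to [0, 1]).
    quantile = np.interp(forecast_value, fcst_sorted, probs)
    # Analysed value associated with that same quantile.
    return float(np.quantile(np.asarray(analysis_sample, dtype=float), quantile))


def closest_member_weights(past_ensembles, past_analyses):
    """Estimate weights for sorted ensemble members from a closest-member histogram.

    past_ensembles: array of shape (cases, members); past_analyses: shape (cases,).
    """
    ens = np.sort(np.asarray(past_ensembles, dtype=float), axis=1)
    ana = np.asarray(past_analyses, dtype=float)[:, None]
    # Rank of the sorted member closest to the analysis in each past case.
    closest_rank = np.argmin(np.abs(ens - ana), axis=1)
    counts = np.bincount(closest_rank, minlength=ens.shape[1])
    return counts / counts.sum()


# Hypothetical training data: forecasts are twice as wet as the analyses.
past_forecasts = np.arange(0.0, 20.0, 2.0)   # 0, 2, ..., 18 mm
past_analyses = np.arange(0.0, 10.0, 1.0)    # 0, 1, ..., 9 mm
print(quantile_map(8.0, past_forecasts, past_analyses))
```

In this made-up case a forecast of 8 mm sits at the same quantile as an analysed value of about 4 mm, so the wet bias is removed. The weights returned by `closest_member_weights` could then be attached to the sorted, bias-corrected members when computing probabilities.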
Hydrological forecast improvements
The hydrological experiments carried out at ECMWF were initialised at 06:00 UTC, using the analysis both as initial conditions and as proxy observations. Three experiments for summer and autumn 2016 were conducted to evaluate the possible gain in skill when bias-correcting the precipitation forecast forcing for the LISFLOOD hydrological model used in EFAS. The first is a benchmark simulation which uses daily ECMWF ensemble forecasts (ENS) of precipitation downscaled to the EFAS 5 km grid, i.e. a setup identical to the one used operationally in EFAS. In the second and third experiments, the precipitation forecast was statistically bias-corrected with (a) quantile mapping (qm) and (b) quantile mapping combined with an objective weighting of the sorted ensemble members (weighted qm). In general, the bias-corrected 1- to 10-day probabilistic 24-hour river discharge forecasts show higher skill than the benchmark forecast when measured against the river discharge analyses. As an example, Figure 1 shows the results for a forecast for the Hortobagy station on the Sebes Koros river in Hungary. Here the weighted qm forecast outperforms both the benchmark and the qm forecast.
We have used the Continuous Ranked Probability Score (CRPS) and the Continuous Ranked Probability Skill Score (CRPSS) to evaluate the overall quality of the bias-corrected forecasts. Over the whole European domain, the highest skill is achieved in the weighted qm experiment, while the benchmark has the lowest skill at all lead times (up to 10 days), as measured by the median CRPS. In the weighted qm experiment, CRPSS values were higher than in the benchmark across Europe for lead times of up to 3 days. Up to 7 days ahead, weighted qm remains better than the benchmark, with the exception of some small areas in the domain. For short lead times, absolute positive differences between bias-corrected and benchmark CRPSS values for individual catchments are larger for medium-sized catchments (1,000–5,000 km²) than for larger catchments (> 5,000 km²). This is as expected, given the faster response times and greater sensitivity of smaller catchments. Similar conclusions are reached with the Brier score and reliability diagrams: in all cases, the weighted qm experiment outperforms the benchmark experiment.
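For readers unfamiliar with these scores, the sketch below shows one common way to compute the CRPS of an ensemble forecast against a single verifying value, and a skill score relative to a benchmark. It uses the standard ensemble estimator of the CRPS and is not necessarily the exact formulation used in the EFAS tests. The function names and the discharge values are hypothetical.

```python
import numpy as np


def ensemble_crps(members, observation):
    """CRPS of an ensemble against one verifying value (lower is better)."""
    members = np.asarray(members, dtype=float)
    # Mean absolute error of the members against the verifying value.
    term_obs = np.mean(np.abs(members - observation))
    # Half the mean absolute difference between all pairs of members.
    term_spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return float(term_obs - term_spread)


def crpss(crps_experiment, crps_benchmark):
    """Skill score: 1 is perfect, 0 matches the benchmark, negative is worse."""
    return 1.0 - crps_experiment / crps_benchmark


# Hypothetical discharge ensembles (m3/s) verified against an analysed value.
analysed_discharge = 120.0
benchmark_members = [150.0, 160.0, 170.0, 180.0]
corrected_members = [110.0, 120.0, 130.0, 140.0]

crps_benchmark = ensemble_crps(benchmark_members, analysed_discharge)
crps_corrected = ensemble_crps(corrected_members, analysed_discharge)
print(crps_benchmark, crps_corrected, crpss(crps_corrected, crps_benchmark))
```

A positive CRPSS here means the corrected ensemble is closer to the analysed discharge than the benchmark ensemble. In a domain-wide evaluation, such scores would be aggregated over many forecasts and stations, for example by taking the median.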
Outlook
The work presented here illustrates how ECMWF data can be used in the context of specific applications. However, the findings need to be interpreted with caution, as the tests covered a period that is relatively short for hydrological verification (six months) and fell in a relatively dry part of the year in Europe (summer and autumn). A much longer experiment, ideally covering several years, will be needed to make a more robust assessment. Areas of further research worth exploring include examining catchment properties to understand the differences in performance gain, investigating spatial patterns of skill, and using alternative skill assessment metrics.