Data and methods for the MiKlip decadal forecasts 2019-2028

Data

The analysis uses yearly means of near-surface temperature. The data of the decadal climate prediction system (MiKlip system) consist of predictions started in the past, which serve to validate the system (hindcasts), and of predictions for the next ten years. The forecast system consists of an initialisation scheme, which incorporates observations, and the global circulation model MPI-ESM (Müller et al., 2012; Pohlmann et al., 2013; Marotzke et al., 2016). The predictions were produced with the 'PreopHR' configuration of MPI-ESM 1.2. The prediction data comprise ten ensemble members, initialised every year from 1960 to 2018. Each simulation is integrated for ten lead years. For a spatially higher-resolved evaluation over Europe (13°W-30°E and 35°-75°N), the global-model data were dynamically downscaled with the regional climate model CCLM5 (Rockel et al., 2008; Mieruch et al., 2014).

With the global-model data, the North Atlantic (NA) region between 60°-10°W and 50°-65°N is investigated in addition to the global evaluation. For this, HadCRUT4 (Morice et al., 2012) serves as observational data set; it is available on a global 5°x5° grid for 1960-2018. For the year 2018, the anomalies from January to November are used, since December was not yet available at the date of publication. The forecast skill of the regional model is assessed against the observational data set CRU TS 4.01 (Harris et al., 2014) for 1960-2017. For a consistent validation, the model data of the prediction system are re-gridded to the grid of the observations (5°x5° and 0.5°x0.5°, respectively). Data analysis is performed both for each grid point separately and for spatial averages of the considered regions, i.e. the global mean and the NA for the global model and the average over the European region for the regional model.

Temperature anomalies, bias adjustment, and spatial and temporal averaging

Temperature anomalies with respect to the period 1981-2010 (WMO reference period) are calculated for both predictions and observations. The systematic temperature difference between model and observation (bias) generally changes with lead time (model drift). This functional behaviour is estimated from the hindcasts in order to correct the deviations (Pasternack et al., 2018). It is additionally assumed that the functional behaviour of the bias changes with initialisation time. The adjustment is trained on the period 1960-2017 and applied to the hindcasts and the forecast. The method of Pasternack et al. (2018) corrects the mean bias as well as the conditional bias and the ensemble spread; the latter ensures that the forecast uncertainty is represented by the ensemble spread.
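The full method of Pasternack et al. (2018) also adjusts the conditional bias and the ensemble spread; as a minimal illustrative sketch of the core idea, the following Python function removes a lead-time-dependent mean bias (drift) estimated from the hindcasts. The function name and array layout are our own assumptions, not part of the MiKlip system.

```python
import numpy as np

def mean_bias_adjust(hindcasts, observations, forecast):
    """Remove the lead-time-dependent mean bias (drift) estimated
    from the hindcasts (a simplified sketch of drift correction).

    hindcasts:    (n_starts, n_lead) hindcast anomalies
    observations: (n_starts, n_lead) verifying observed anomalies
    forecast:     (n_lead,) raw forecast anomalies
    """
    # Mean model-minus-observation difference per lead year = drift curve
    drift = np.nanmean(hindcasts - observations, axis=0)
    return forecast - drift
```

Averaging over all start years yields one drift value per lead year, which is then subtracted from the forecast at the corresponding lead year.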

The adjusted temperature anomalies of the decadal forecast are analysed as four-year running means. Predictions are thus made for the lead years 1-4, 2-5, 3-6, …, 7-10. For the yearly forecast, only the mean of the first lead year is evaluated.
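The seven four-year windows over the ten lead years can be computed as a simple running mean; a sketch (the function name is our own):

```python
import numpy as np

def four_year_running_means(annual_values):
    """annual_values: ten annual anomalies for lead years 1-10.
    Returns the seven 4-year means for lead years 1-4, 2-5, ..., 7-10."""
    x = np.asarray(annual_values, dtype=float)
    # A length-4 moving average; 'valid' keeps only full windows.
    return np.convolve(x, np.ones(4) / 4.0, mode="valid")
```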

For the calculation of the spatial averages over the regions (global mean, NA, Europe), observations and model simulations use the same spatial mask, which retains only grid cells with sufficient observational data during the evaluation period.
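A masked spatial average might look as follows. Note that the area weighting by the cosine of latitude is our assumption (standard for regular latitude-longitude grids); the text itself only requires that model and observations use the same mask.

```python
import numpy as np

def masked_area_mean(field, lats, mask):
    """Spatial average of `field` (nlat, nlon) over grid cells where
    `mask` is True (i.e. enough observational data). Cells are
    weighted by cos(latitude), an assumption for a regular grid."""
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    weights = np.where(mask, weights, 0.0)
    return np.sum(np.where(mask, field, 0.0) * weights) / np.sum(weights)
```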

Validation and prediction skill

Prediction skill is validated with the hindcasts of the MiKlip system, which were produced for the past. The longest period available for validation for every lead-time range (years 1-4 to years 7-10) is 1967-2017. For the skill assessment, hindcasts are compared with observations. The assessment cannot be carried out for grid points without observations during the validation period (missing values); these grid points are greyed out on the map. The skill of the decadal prediction is compared with a reference forecast. The difference of these forecast skills, i.e. the improvement of the decadal forecast over the reference forecast, is called the skill score [%]. If the skills of the decadal forecast system and the reference forecast are identical, the skill score is 0%; for a perfect decadal prediction, it is 100%. The reference forecasts are the climatology of the observations for the period 1981-2010 and the uninitialised historical climate projection, which differs from the decadal prediction system only in the missing initialisation scheme. Bootstrapping is used to test whether the skill improvement over the reference forecast is random (significance test): years from the validation period are sampled 1000 times with replacement and validated in the same way. The significance level is 95%.
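The bootstrap described above can be sketched as follows. The `score_fn` interface, the random seed, and the exact form of the significance criterion are our assumptions for illustration; the text only states 1000 resamples with replacement and a 95% level.

```python
import numpy as np

def bootstrap_skill(score_fn, forecast, reference, obs,
                    n_boot=1000, seed=0):
    """Bootstrap distribution of a skill score over the validation
    years: sample the years with replacement n_boot times and
    recompute the score each time. The improvement over the
    reference could then be judged non-random at the 95% level if,
    e.g., the 5th percentile of the resampled scores exceeds zero."""
    rng = np.random.default_rng(seed)
    n = len(obs)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample years with replacement
        scores[b] = score_fn(forecast[idx], reference[idx], obs[idx])
    return scores
```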

Ensemble mean forecast

An ensemble average is calculated from the ensemble members and used for forecast and validation. For the spatial averages, the 10th and 90th percentiles of the ensemble distribution are shown in addition to the ensemble mean. To validate the prediction skill of the ensemble average, the skill score of the mean squared error between hindcast and observation (MSESS) is used (Goddard et al., 2013; Illing et al., 2013; Kadow et al., 2014). The MSESS assesses whether the decadal prediction reproduces the observations better than the reference forecast of the climatology (Fig. 1) and the uninitialised historical climate projection (Fig. 2).
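The MSESS follows the usual mean-squared-error skill score definition (see e.g. Goddard et al., 2013); a minimal sketch:

```python
import numpy as np

def msess(hindcast, reference, obs):
    """Mean squared error skill score:
    MSESS = 1 - MSE(hindcast, obs) / MSE(reference, obs).
    0 means no improvement over the reference forecast,
    1 (i.e. 100%) a perfect forecast."""
    mse_h = np.mean((hindcast - obs) ** 2)
    mse_r = np.mean((reference - obs) ** 2)
    return 1.0 - mse_h / mse_r
```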

Figure 1: MSESS of the decadal forecast (ensemble mean of near-surface temperature) for lead years 1-4: positive/negative values indicate a skill improvement/decline of the decadal prediction relative to the reference forecast of the climatology, both validated against HadCRUT4 observations.
Figure 2: MSESS of the decadal forecast (ensemble mean of near-surface temperature) for lead years 1-4: positive/negative values indicate a skill improvement/decline of the decadal prediction relative to the reference forecast of the uninitialised historical climate projections, both validated against HadCRUT4 observations.

Probabilistic forecast

For the probabilistic forecast, the period 1981-2010 is split into three equally frequent temperature categories (temperature below normal, normal, and above normal). Based on the distribution of the ensemble simulations, a forecast probability for each category and lead-year period (years 1-4, …, years 7-10) is calculated. Because of the small number of ensemble members, the probabilities are estimated with a Dirichlet-multinomial model with a flat Dirichlet prior (Agresti and Hitchcock, 2005).
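With a flat Dirichlet prior (one pseudo-count per category), the posterior-mean probability of category k reduces to (n_k + 1) / (N + 3) for three categories. A sketch, assuming tercile bounds derived from the 1981-2010 climatology (function name and interface are our own):

```python
import numpy as np

def category_probabilities(ensemble_values, lower_tercile, upper_tercile):
    """Forecast probabilities for below-normal / normal / above-normal
    from a Dirichlet-multinomial model with flat prior (alpha = 1 per
    category): p_k = (n_k + 1) / (N + 3), which avoids zero
    probabilities for a small ensemble."""
    v = np.asarray(ensemble_values, dtype=float)
    counts = np.array([
        np.sum(v < lower_tercile),                          # below normal
        np.sum((v >= lower_tercile) & (v <= upper_tercile)), # normal
        np.sum(v > upper_tercile),                          # above normal
    ])
    return (counts + 1.0) / (counts.sum() + 3.0)
```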

The decadal forecast is validated against observations with the ranked probability skill score (RPSS) (Ferro, 2007; Ferro et al., 2008), which assesses the predicted probabilities of the ordered categories. The RPSS compares whether the decadal prediction system reproduces the observations better than the reference forecast of the climatology (Fig. 3) and the uninitialised historical climate projection (Fig. 4).
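A sketch of the plain RPSS over the three ordered categories follows; note that Ferro (2007) additionally proposes a small-ensemble correction, which is not reproduced here.

```python
import numpy as np

def rps(probs, obs_category):
    """Ranked probability score of one probabilistic forecast: sum of
    squared differences between the cumulative forecast probabilities
    and the cumulative observation indicator over the ordered
    categories."""
    p_cum = np.cumsum(probs)
    o_cum = np.cumsum(np.eye(len(probs))[obs_category])
    return float(np.sum((p_cum - o_cum) ** 2))

def rpss(forecast_probs, reference_probs, obs_categories):
    """RPSS = 1 - mean RPS(forecast) / mean RPS(reference):
    0 means no improvement over the reference, 1 a perfect forecast."""
    rps_f = np.mean([rps(p, o) for p, o in zip(forecast_probs, obs_categories)])
    rps_r = np.mean([rps(p, o) for p, o in zip(reference_probs, obs_categories)])
    return 1.0 - rps_f / rps_r
```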

Figure 3: RPSS of the decadal forecast of near-surface temperature for lead years 1-4: positive/negative values indicate a skill improvement/decline of the decadal prediction relative to the reference forecast of the climatology, both validated against HadCRUT4 observations.
Figure 4: RPSS of the decadal forecast of near-surface temperature for lead years 1-4: positive/negative values indicate a skill improvement/decline of the decadal prediction relative to the reference forecast of the uninitialised historical climate projections, both validated against HadCRUT4 observations.

References