For the analysis, yearly means of near-surface temperature are used. The data of the decadal climate prediction system (MiKlip system) consist of predictions started in the past to validate the system (hindcasts) and of predictions for the next ten years. The forecast system consists of an initialisation scheme, which incorporates observations, and the global circulation model MPI-ESM (Müller et al., 2012; Pohlmann et al., 2013; Marotzke et al., 2016). The predictions were performed with the 'PreopLR' configuration of MPI-ESM 1.2. The prediction data comprise ten ensemble members, initialised every year from 1960 to 2017; each simulation is integrated for ten lead years. For a spatially higher-resolved evaluation over Europe, data of the global model in the 'baseline1' configuration (MPI-ESM 1.0) were dynamically downscaled with the regional climate model CCLM (Rockel et al., 2008; Mieruch et al., 2014).
In addition to the global evaluation, the data of the global model are used to investigate the North Atlantic (NA) region between 60°-10°W and 50°-65°N. For this purpose, HadCRUT4 (Morice et al., 2012), available on a global 5°x5° grid, serves as the observational dataset. The forecast skill of the regional model is assessed against the observational dataset CRU TS 4.01 (Harris et al., 2014). For a consistent validation, the model data of the prediction system are regridded to the grid of the respective observations (5°x5° and 0.5°x0.5°). The analysis is carried out both for each grid point separately and for spatial averages of the considered regions: the global mean and the NA region for the global model, and the European average for the regional model.
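Spatial averages over a latitude-longitude grid are normally area weighted. As a minimal sketch — the cosine-latitude weighting is an assumption here, not stated in the text — such an average could be computed as:

```python
import numpy as np

def area_weighted_mean(field, lats):
    """Spatial mean of a (lat, lon) field, weighting each latitude row by
    cos(latitude) to account for the shrinking grid-cell area towards the
    poles. NOTE: the weighting scheme is an assumption for illustration."""
    zonal = field.mean(axis=1)            # average over longitude first
    weights = np.cos(np.deg2rad(lats))    # one weight per latitude row
    return np.average(zonal, weights=weights)
```

On an equiangular grid this gives tropical grid cells more weight than polar ones, as their true area demands.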
Temperature anomalies and temporal averaging
Temperature anomalies with respect to the period 1981-2010 (WMO reference period) are calculated for both predictions and observations. The systematic temperature difference between model and observations (bias) generally changes with lead time (model drift). This lead-time dependence is estimated from the hindcasts in order to correct for it (Pasternack et al., 2017). It is further assumed that the bias also changes with initialisation time over 1960-2017. The method of Pasternack et al. (2017) adjusts the mean bias and additionally the conditional bias and the ensemble spread; the latter ensures that the forecast uncertainty is represented by the ensemble spread.
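The full recalibration of Pasternack et al. (2017) fits a parametric model of the bias in both lead time and start time. As an illustration of the first step only — a plain lead-time-dependent mean-bias correction, with an array layout assumed for the sketch — one might write:

```python
import numpy as np

def drift_correct(hindcasts, observations):
    """Remove the lead-time-dependent mean bias (drift).
    hindcasts:    array (n_starts, n_lead) of hindcast anomalies
    observations: array (n_starts, n_lead), aligned so observations[i, j]
                  verifies the hindcast started in year i at lead year j.
    NOTE: a minimal sketch; the actual DeFoReSt method also adjusts the
    conditional bias and the ensemble spread."""
    bias = (hindcasts - observations).mean(axis=0)  # one mean bias per lead year
    return hindcasts - bias
```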
The adjusted temperature anomalies are analysed as 4-year running means for the decadal forecast; predictions are thus issued for lead years 1-4, 2-5, 3-6, …, 7-10. For the yearly forecast, only the mean of the first lead year is evaluated.
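The 4-year running means over the ten lead years can be formed with a simple moving average:

```python
import numpy as np

def running_means(lead_values, window=4):
    """4-year running means over the lead years: with ten lead years this
    yields seven windows, lead years 1-4, 2-5, ..., 7-10."""
    kernel = np.ones(window) / window
    return np.convolve(lead_values, kernel, mode="valid")
```

For a ten-element input the result has seven elements, one per lead-year window.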
Validation and prediction skill
The prediction skill is validated with hindcasts of the MiKlip system, which were produced for the past. The longest period available for validation across all lead-year ranges (years 1-4 to years 7-10) is 1967-2016. For the skill assessment, the hindcasts are compared with observations. Grid points without observations for the validation period (missing values) cannot be assessed and are greyed out on the map. The skill of the decadal prediction is compared with that of a reference forecast. The difference between these skills, i.e. the improvement of the decadal forecast over the reference forecast, is called the skill score [%]. If the decadal forecast system and the reference forecast are equally skilful, the skill score is 0%; for a perfect decadal prediction, the skill score is 100%. Two reference forecasts are used: the observed climatology of the period 1981-2010 and the uninitialised historical climate projection, which differs from the decadal prediction system only by the missing initialisation scheme. Bootstrapping is used to test whether the skill improvement over the reference forecast could have arisen by chance (significance test): years from the validation period are sampled 1000 times with replacement and the validation is repeated for each sample. The significance level is 95%.
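The bootstrap test described above can be sketched as follows; the decision rule (the lower 5% quantile of the resampled skill scores staying above zero) is an assumption about how the 95% level is applied:

```python
import numpy as np

def bootstrap_test(score_fn, fc, ref, obs, n=1000, level=0.95, seed=0):
    """Sample the validation years with replacement n times, recompute the
    skill score for each sample, and call the improvement over the reference
    significant if the lower (1 - level) quantile of the resampled scores
    stays above 0. NOTE: a sketch; the exact resampling scheme of the MiKlip
    evaluation is not specified in the text."""
    rng = np.random.default_rng(seed)
    m = len(obs)
    scores = np.empty(n)
    for i in range(n):
        idx = rng.integers(0, m, size=m)   # resample years with replacement
        scores[i] = score_fn(fc[idx], ref[idx], obs[idx])
    return np.quantile(scores, 1.0 - level) > 0.0
```

`score_fn` is any skill score comparing forecast, reference, and observations over the resampled years.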
Ensemble mean forecast
The ensemble mean is calculated from the ensemble members and used for the forecast and its validation. For the spatial averages, the 10th and 90th percentiles of the ensemble distribution are shown in addition to the ensemble mean. The prediction skill of the ensemble mean is validated with the skill score of the mean squared error between hindcast and observation (MSESS) (Goddard et al., 2013; Illing et al., 2014; Kadow et al., 2015). The MSESS assesses whether the decadal prediction reproduces the observations better than the reference forecast of climatology (Fig. 1) or the uninitialised historical climate projection (Fig. 2).
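The MSESS can be written down compactly; multiplying it by 100 recovers the skill score in percent:

```python
import numpy as np

def msess(hindcast, reference, obs):
    """Mean squared error skill score: 1 for a perfect hindcast, 0 when the
    hindcast does no better than the reference forecast, negative when it
    does worse."""
    mse_h = np.mean((hindcast - obs) ** 2)
    mse_r = np.mean((reference - obs) ** 2)
    return 1.0 - mse_h / mse_r
```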
Probabilistic forecast
For the probabilistic forecast, the period 1981-2010 is divided into three equally frequent temperature categories (temperature below normal, normal, and above normal). Based on the distribution of the ensemble simulations, a forecast probability for each category and lead-year range (years 1-4, …, years 7-10) is calculated. Because of the small number of ensemble members, the probabilities are estimated with a Dirichlet-multinomial model with a flat Dirichlet prior (Agresti and Hitchcock, 2005).
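With a flat Dirichlet prior (alpha = 1 per category), the posterior mean probability of category k, given n_k of N ensemble members falling into it, is (n_k + 1)/(N + 3). A sketch, with the category boundaries (the terciles of the 1981-2010 reference distribution) passed as arguments:

```python
import numpy as np

def tercile_probabilities(ensemble, lower, upper):
    """Forecast probabilities for the three temperature categories (below
    normal, normal, above normal) from a small ensemble, using a
    Dirichlet-multinomial model with a flat Dirichlet prior:
    p_k = (n_k + 1) / (N + 3)."""
    counts = np.array([
        np.sum(ensemble < lower),
        np.sum((ensemble >= lower) & (ensemble <= upper)),
        np.sum(ensemble > upper),
    ])
    return (counts + 1.0) / (ensemble.size + 3.0)
```

The prior keeps every probability strictly positive, so a category missed by all ten members still gets a small but nonzero forecast probability.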
The decadal forecast is validated against observations with the ranked probability skill score (RPSS) (Ferro, 2007; Ferro et al., 2008), which assesses the predicted category probabilities. The RPSS measures whether the decadal prediction system reproduces the observations better than the reference forecast of climatology (Fig. 3) or the uninitialised historical climate projection (Fig. 4).
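A plain RPSS sketch over ordered categories (the ensemble-size adjustment discussed by Ferro et al. (2008) is omitted here):

```python
import numpy as np

def rps(probs, obs_cat):
    """Ranked probability score for one forecast: probs are the ordered
    category probabilities, obs_cat the index of the observed category."""
    cum_p = np.cumsum(probs)
    cum_o = np.cumsum(np.eye(len(probs))[obs_cat])  # cumulative one-hot obs
    return np.sum((cum_p - cum_o) ** 2)

def rpss(fc_probs, ref_probs, obs_cats):
    """RPSS = 1 - RPS_fc / RPS_ref, averaged over all forecast cases:
    1 = perfect, 0 = no improvement over the reference forecast."""
    rps_fc = np.mean([rps(p, o) for p, o in zip(fc_probs, obs_cats)])
    rps_ref = np.mean([rps(p, o) for p, o in zip(ref_probs, obs_cats)])
    return 1.0 - rps_fc / rps_ref
```

Because the RPS works on cumulative probabilities, it penalises placing probability two categories away from the observation more than one category away, which suits the ordered below/normal/above categories.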
Agresti, Alan, and David B. Hitchcock, 2005: Bayesian inference for categorical data analysis. Statistical Methods and Applications, 14(3), 297-330.
Boer, G. J.; Smith, D. M.; Cassou, C.; Doblas-Reyes, F.; Danabasoglu, G.; Kirtman, B.; Kushnir, Y.; Kimoto, M.; Meehl, G. A.; Msadek, R.; Mueller, W. A.; Taylor, K. E.; Zwiers, F.; Rixen, M.; Ruprich-Robert, Y. & Eade, R., 2016: The Decadal Climate Prediction Project (DCPP) contribution to CMIP6, Geoscientific Model Development, 9, 3751-3777, 10.5194/gmd-9-3751-2016
Ferro, C.A.T., 2007: Comparing Probabilistic forecasting systems with the brier score. – Wea. Forecast. 22(5), 1076–1088, DOI: 10.1175/WAF1034.1.
Ferro, C.A.T., D.S. Richardson, A.P. Weigel, 2008: On the effect of ensemble size on the discrete and continuous ranked probability scores. – Meteor. Appl. 15, 19–24, DOI:10.1002/met.45.
Goddard, L.; Kumar, A.; Solomon, A.; Smith, D.; Boer, G.; Gonzalez, P.; Kharin, V.; Merryfield, W.; Deser, C.; Mason, S.; Kirtman, B.; Msadek, R.; Sutton, R.; Hawkins, E.; Fricker, T.; Hegerl, G.; Ferro, C.; Stephenson, D.; Meehl, G.; Stockdale, T.; Burgman, R.; Greene, A.; Kushnir, Y.; Newman, M.; Carton, J.; Fukumori, I. & Delworth, T., 2013: A verification framework for interannual-to-decadal predictions experiments, Climate Dynamics, Springer-Verlag, 40, 245-272, doi:10.1007/s00382-012-1481-2
Harris, I., Jones, P.D., Osborn, T.J. and Lister, D.H., 2014: Updated high-resolution grids of monthly climatic observations – the CRU TS3.10 Dataset. Int. J. Climatol., 34: 623–642. doi: 10.1002/joc.3711
Illing, S.; Kadow, C.; Kunst, O. & Cubasch, U., 2014: MurCSS: A Tool for Standardized Evaluation of Decadal Hindcast Systems. Journal of Open Research Software, 2(1), e24, doi:10.5334/jors.bf
Kadow, C.; Illing, S.; Kunst, O.; Rust, H. W.; Pohlmann, H.; Müller, W. A. & Cubasch, U., 2015: Evaluation of forecasts by accuracy and spread in the MiKlip decadal climate prediction system, Meteorologische Zeitschrift, Schweizerbart Science Publishers, 10.1127/metz/2015/0639
Marotzke, J.; Müller, W. A.; Vamborg, F. S. E.; Becker, P.; Cubasch, U.; Feldmann, H.; Kaspar, F.; Kottmeier, C.; Marini, C.; Polkova, I.; Prömmel, K.; Rust, H. W.; Stammer, D.; Ulbrich, U.; Kadow, C.; Köhl, A.; Kröger, J.; Kruschke, T.; Pinto, J. G.; Pohlmann, H.; Reyers, M.; Schröder, M.; Sienz, F.; Timmreck, C. & Ziese, M., 2016: MiKlip - a National Research Project on Decadal Climate Prediction, Bulletin of the American Meteorological Society, 10.1175/BAMS-D-15-00184.1
Mieruch, S., Feldmann, H., Schädler, G., Lenz, C.-J., Kothe, S., and Kottmeier, C., 2014: The regional MiKlip decadal forecast ensemble for Europe: the added value of downscaling, Geosci. Model Dev., 7, 2983-2999, doi:10.5194/gmd-7-2983-2014
Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones, 2012: Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 dataset, J. Geophys. Res., 117, D08101, doi:10.1029/2011JD017187
Müller, W. A., J. Baehr, H. Haak, J. H. Jungclaus, J. Kröger, D. Matei, D. Notz, H. Pohlmann, J.-S. von Storch, and J. Marotzke, 2012: Forecast skill of multi-year seasonal means in the decadal prediction system of the Max Planck Institute for Meteorology. Geophys. Res. Lett., doi:10.1029/2012GL053326
Pasternack, A., Bhend, J., Liniger, M. A., Rust, H. W., Müller, W. A., and Ulbrich, U., 2017: Parametric Decadal Climate Forecast Recalibration (DeFoReSt 1.0), Geosci. Model Dev. Discuss., doi:10.5194/gmd-2017-162, in review (accepted for Geoscientific Model Development (GMD)).
Pohlmann, H., W. A. Müller, K. Kulkarni, M. Kameswarrao, D. Matei, F. S. E. Vamborg, C. Kadow, S. Illing, J. Marotzke, 2013: Improved forecast skill in the tropics in the new MiKlip decadal climate predictions. Geophys. Res. Lett., 40, 5798-5802, doi:10.1002/2013GL058
Rockel, B., Will, A., and A. Hense, 2008: The Regional Climate Model COSMO-CLM (CCLM), Meteorol. Z., 17, 347- 348, doi:10.1127/0941-2948/2008/0309