Volume 11, issue 1
https://doi.org/10.5194/ascmo-11-23-2025
© Author(s) 2025. This work is distributed under the Creative Commons Attribution 4.0 License.
Proper scoring rules for multivariate probabilistic forecasts based on aggregation and transformation
Romain Pic
Université Marie et Louis Pasteur, CNRS, LmB (UMR 6623), 25000 Besançon, France
Clément Dombry
Université Marie et Louis Pasteur, CNRS, LmB (UMR 6623), 25000 Besançon, France
Philippe Naveau
Laboratoire des Sciences du Climat et de l'Environnement, UMR 8212, CEA-CNRS-UVSQ, EstimR, IPSL & U Paris-Saclay, Gif-sur-Yvette, France
Maxime Taillardat
CNRM, Université de Toulouse, Météo-France, CNRS, Toulouse, France
Related authors
Yoann Robin, Mathieu Vrac, Aurélien Ribes, Occitane Barbaux, and Philippe Naveau
EGUsphere, https://doi.org/10.5194/egusphere-2025-1121, 2025
Short summary
We describe an improved method and the associated free licensed package ANKIALE (ANalysis of Klimate with bayesian Inference: AppLication to extreme Events) for estimating the statistics of temperature extremes. This method uses climate model simulations (including multiple scenarios simultaneously) to provide a prior of the real-world changes, constrained by the observations. The method and the tool are illustrated via an application to temperature over Europe until 2100, for four scenarios.
Cedric Gacial Ngoungue Langue, Helene Brogniez, and Philippe Naveau
EGUsphere, https://doi.org/10.5194/egusphere-2024-3481, 2025
This preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).
Short summary
This work evaluates the representation of total column water vapor and total cloud cover in General Circulation Models, ERA5 reanalysis and satellite data records from the European Space Agency Climate Change Initiative. A new technique, called "multiresolution analysis," is applied to this evaluation, which enables an analysis of model behavior across different temporal frequencies, from daily to decadal scales, including subseasonal and seasonal variations.
Pauline Rivoire, Olivia Martius, Philippe Naveau, and Alexandre Tuel
Nat. Hazards Earth Syst. Sci., 23, 2857–2871, https://doi.org/10.5194/nhess-23-2857-2023, 2023
Short summary
Heavy precipitation can lead to floods and landslides, resulting in widespread damage and significant casualties. Some of its impacts can be mitigated if reliable forecasts and warnings are available. In this article, we assess the capacity of the precipitation forecast provided by ECMWF to predict heavy precipitation events on a subseasonal-to-seasonal (S2S) timescale over Europe. We find that the forecast skill of such events is generally higher in winter than in summer.
Jonathan Demaeyer, Jonas Bhend, Sebastian Lerch, Cristina Primo, Bert Van Schaeybroeck, Aitor Atencia, Zied Ben Bouallègue, Jieyu Chen, Markus Dabernig, Gavin Evans, Jana Faganeli Pucer, Ben Hooper, Nina Horat, David Jobst, Janko Merše, Peter Mlakar, Annette Möller, Olivier Mestre, Maxime Taillardat, and Stéphane Vannitsem
Earth Syst. Sci. Data, 15, 2635–2653, https://doi.org/10.5194/essd-15-2635-2023, 2023
Short summary
A benchmark dataset is proposed to compare different statistical postprocessing methods used in forecasting centers to properly calibrate ensemble weather forecasts. This dataset is based on ensemble forecasts covering a portion of central Europe and includes the corresponding observations. Examples on how to download and use the data are provided, a set of evaluation methods is proposed, and a first benchmark of several methods for the correction of 2 m temperature forecasts is performed.
Manuela Irene Brunner and Philippe Naveau
Hydrol. Earth Syst. Sci., 27, 673–687, https://doi.org/10.5194/hess-27-673-2023, 2023
Short summary
Reservoir regulation affects various streamflow characteristics. Still, information on when water is stored in and released from reservoirs is hardly available. We develop a statistical model to reconstruct reservoir operation signals from observed streamflow time series. By applying this approach to 74 catchments in the Alps, we find that reservoir management varies by catchment elevation and that seasonal redistribution from summer to winter is strongest in high-elevation catchments.
Antoine Grisart, Mathieu Casado, Vasileios Gkinis, Bo Vinther, Philippe Naveau, Mathieu Vrac, Thomas Laepple, Bénédicte Minster, Frederic Prié, Barbara Stenni, Elise Fourré, Hans Christian Steen-Larsen, Jean Jouzel, Martin Werner, Katy Pol, Valérie Masson-Delmotte, Maria Hoerhold, Trevor Popp, and Amaelle Landais
Clim. Past, 18, 2289–2301, https://doi.org/10.5194/cp-18-2289-2022, 2022
Short summary
This paper presents a compilation of high-resolution (11 cm) water isotopic records, including published and new measurements, for the last 800 000 years from the EPICA Dome C ice core, Antarctica. Using this new combined water isotopes (δ18O and δD) dataset, we study the variability and possible influence of diffusion at the multi-decadal to multi-centennial scale. We observe a stronger variability at the onset of the interglacial interval corresponding to a warm period.
Julie Bessac and Philippe Naveau
Adv. Stat. Clim. Meteorol. Oceanogr., 7, 53–71, https://doi.org/10.5194/ascmo-7-53-2021, 2021
Short summary
We propose a new forecast evaluation scheme in the context of models that incorporate errors of the verification data. We rely on existing scoring rules and incorporate uncertainty and error of the verification data through a hidden variable and the conditional expectation of scores. By considering scores to be random variables, one can access the entire range of their distribution and illustrate that the commonly used mean score can be a misleading representative of the distribution.
Guillaume Evin, Matthieu Lafaysse, Maxime Taillardat, and Michaël Zamo
Nonlin. Processes Geophys., 28, 467–480, https://doi.org/10.5194/npg-28-467-2021, 2021
Short summary
Forecasting the height of new snow is essential for avalanche hazard surveys, road and ski resort management, tourism attractiveness, etc. Météo-France operates a probabilistic forecasting system using a numerical weather prediction system and a snowpack model. It provides better forecasts than direct diagnostics but exhibits significant biases. Post-processing methods can be applied to provide automatic forecasting products from this system.
Jakob Zscheischler, Philippe Naveau, Olivia Martius, Sebastian Engelke, and Christoph C. Raible
Earth Syst. Dynam., 12, 1–16, https://doi.org/10.5194/esd-12-1-2021, 2021
Short summary
Compound extremes such as heavy precipitation and extreme winds can lead to large damage. To date it is unclear how well climate models represent such compound extremes. Here we present a new measure to assess differences in the dependence structure of bivariate extremes. This measure is applied to assess differences in the dependence of compound precipitation and wind extremes between three model simulations and one reanalysis dataset in a domain in central Europe.
Stephan Hemri, Sebastian Lerch, Maxime Taillardat, Stéphane Vannitsem, and Daniel S. Wilks
Nonlin. Processes Geophys., 27, 519–521, https://doi.org/10.5194/npg-27-519-2020, 2020
Cited articles
Agnolucci, P., Rapti, C., Alexander, P., De Lipsis, V., Holland, R. A., Eigenbrod, F., and Ekins, P.: Impacts of rising temperatures and farm management practices on global yields of 18 crops, Nature Food, 1, 562–571, https://doi.org/10.1038/s43016-020-00148-x, 2020. a
Al Masry, Z., Pic, R., Dombry, C., and Devalland, C.: A new methodology to predict the oncotype scores based on clinico-pathological data with similar tumor profiles, Breast Cancer Res. Tr., https://doi.org/10.1007/s10549-023-07141-5, 2023. a
Alexander, C., Coulon, M., Han, Y., and Meng, X.: Evaluating the discrimination ability of proper multi-variate scoring rules, Ann. Oper. Res., 334, 857–883, https://doi.org/10.1007/s10479-022-04611-9, 2022. a
Allen, S.: sallen12/MultivCalibration: MultivCalibration v.1.0 (v.1.0), Zenodo [code, data set], https://doi.org/10.5281/zenodo.10201289, 2023. a
Allen, S., Bhend, J., Martius, O., and Ziegel, J.: Weighted Verification Tools to Evaluate Univariate and Multivariate Probabilistic Forecasts for High-Impact Weather Events, Weather Forecast., 38, 499–516, https://doi.org/10.1175/waf-d-22-0161.1, 2023a. a, b, c
Allen, S., Ginsbourger, D., and Ziegel, J.: Evaluating Forecasts for High-Impact Events Using Transformed Kernel Scores, SIAM/ASA Journal on Uncertainty Quantification, 11, 906–940, https://doi.org/10.1137/22m1532184, 2023b. a, b, c
Anderson, J. L.: A Method for Producing and Evaluating Probabilistic Forecasts from Ensemble Model Integrations, J. Climate, 9, 1518–1530, https://doi.org/10.1175/1520-0442(1996)009<1518:amfpae>2.0.co;2, 1996. a
Basse-O'Connor, A., Pilipauskaitė, V., and Podolskij, M.: Power variations for fractional type infinitely divisible random fields, Electron. J. Probab., 26, 1–35, https://doi.org/10.1214/21-EJP617, 2021. a
Ben Bouallègue, Z., Clare, M. C. A., Magnusson, L., Gascón, E., Maier-Gerber, M., Janoušek, M., Rodwell, M., Pinault, F., Dramsch, J. S., Lang, S. T. K., Raoult, B., Rabier, F., Chevallier, M., Sandu, I., Dueben, P., Chantry, M., and Pappenberger, F.: The Rise of Data-Driven Weather Forecasting: A First Statistical Assessment of Machine Learning–Based Weather Forecasts in an Operational-Like Context, B. Am. Meteorol. Soc., 105, E864–E883, https://doi.org/10.1175/bams-d-23-0162.1, 2024a. a
Ben Bouallègue, Z., Weyn, J. A., Clare, M. C. A., Dramsch, J., Dueben, P., and Chantry, M.: Improving Medium-Range Ensemble Weather Forecasts with Hierarchical Ensemble Transformers, Artificial Intelligence for the Earth Systems, 3, e230027, https://doi.org/10.1175/aies-d-23-0027.1, 2024b. a, b
Benassi, A., Cohen, S., and Istas, J.: On roughness indices for fractional fields, Bernoulli, 10, 357–373, https://doi.org/10.3150/bj/1082380223, 2004. a
Berlinet, A. and Thomas-Agnan, C.: Reproducing kernel Hilbert spaces in probability and statistics, with a preface by Persi Diaconis, Kluwer Academic Publishers, Boston, MA, ISBN 1-4020-7679-7, https://doi.org/10.1007/978-1-4419-9096-9, 2004. a
Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., and Tian, Q.: Accurate medium-range global weather forecasting with 3D neural networks, Nature, 619, 533–538, https://doi.org/10.1038/s41586-023-06185-3, 2023. a
Bjerregård, M. B., Møller, J. K., and Madsen, H.: An introduction to multivariate probabilistic forecast evaluation, Energy and AI, 4, 100058, https://doi.org/10.1016/j.egyai.2021.100058, 2021. a, b
Bolin, D. and Wallin, J.: Local scale invariance and robustness of proper scoring rules, Stat. Science, 38, 140–159, https://doi.org/10.1214/22-sts864, 2023. a
Bosse, N. I., Abbott, S., Cori, A., van Leeuwen, E., Bracher, J., and Funk, S.: Scoring epidemiological forecasts on transformed scales, PLOS Comput. Biol., 19, e1011393, https://doi.org/10.1371/journal.pcbi.1011393, 2023. a
Brehmer, J.: Elicitability and its Application in Risk Management, arXiv [thesis], https://doi.org/10.48550/ARXIV.1707.09604, 2017. a
Brehmer, J. R. and Strokorb, K.: Why scoring functions cannot assess tail properties, Electronic Journal of Statistics, 13, https://doi.org/10.1214/19-ejs1622, 2019. a
Bremnes, J. B.: Ensemble Postprocessing Using Quantile Function Regression Based on Neural Networks and Bernstein Polynomials, Mon. Weather Rev., 148, 403–414, https://doi.org/10.1175/mwr-d-19-0227.1, 2019. a
Brier, G. W.: Verification of Forecasts Expressed in Terms of Probability, Mon. Weather Rev., 78, 1–3, https://doi.org/10.1175/1520-0493(1950)078<0001:vofeit>2.0.co;2, 1950. a
Bröcker, J.: Reliability, sufficiency, and the decomposition of proper scores, Q. J. Roy. Meteor. Soc., 135, 1512–1519, https://doi.org/10.1002/qj.456, 2009. a
Bröcker, J. and Ben Bouallègue, Z.: Stratified rank histograms for ensemble forecast verification under serial dependence, Q. J. Roy. Meteor. Soc., 146, 1976–1990, https://doi.org/10.1002/qj.3778, 2020. a
Bröcker, J. and Smith, L. A.: Scoring Probabilistic Forecasts: The Importance of Being Proper, Weather Forecast., 22, 382–388, https://doi.org/10.1175/waf966.1, 2007. a
Buschow, S.: Measuring Displacement Errors with Complex Wavelets, Weather Forecast., 37, 953–970, https://doi.org/10.1175/waf-d-21-0180.1, 2022. a
Buschow, S. and Friederichs, P.: Using wavelets to verify the scale structure of precipitation forecasts, Adv. Stat. Clim. Meteorol. Oceanogr., 6, 13–30, https://doi.org/10.5194/ascmo-6-13-2020, 2020. a
Buschow, S. and Friederichs, P.: SAD: Verifying the scale, anisotropy and direction of precipitation forecasts, Q. J. Roy. Meteor. Soc., 147, 1150–1169, https://doi.org/10.1002/qj.3964, 2021. a
Casati, B., Dorninger, M., Coelho, C. A. S., Ebert, E. E., Marsigli, C., Mittermaier, M. P., and Gilleland, E.: The 2020 International Verification Methods Workshop Online: Major Outcomes and Way Forward, B. Am. Meteorol. Soc., 103, E899–E910, https://doi.org/10.1175/bams-d-21-0126.1, 2022. a
Chapman, W. E., Delle Monache, L., Alessandrini, S., Subramanian, A. C., Ralph, F. M., Xie, S.-P., Lerch, S., and Hayatbini, N.: Probabilistic Predictions from Deterministic Atmospheric River Forecasts with Deep Learning, Mon. Weather Rev., 150, 215–234, https://doi.org/10.1175/mwr-d-21-0106.1, 2022. a
Chen, K., Han, T., Gong, J., Bai, L., Ling, F., Luo, J.-J., Chen, X., Ma, L., Zhang, T., Su, R., Ci, Y., Li, B., Yang, X., and Ouyang, W.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead, arXiv [preprint], https://doi.org/10.48550/ARXIV.2304.02948, 2023. a
Christensen, H. M., Moroz, I. M., and Palmer, T. N.: Evaluation of ensemble forecast uncertainty using a new proper score: Application to medium‐range and seasonal forecasts, Q. J. Roy. Meteor. Soc., 141, 538–549, https://doi.org/10.1002/qj.2375, 2014. a, b, c
Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B., and Wilby, R.: The Schaake Shuffle: A Method for Reconstructing Space–Time Variability in Forecasted Precipitation and Temperature Fields, J. Hydrometeorol., 5, 243–262, https://doi.org/10.1175/1525-7541(2004)005<0243:tssamf>2.0.co;2, 2004. a
Dawid, A. P.: Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach, J. R. Stat. Soc. Ser. A-G., 147, 278, https://doi.org/10.2307/2981683, 1984. a
Dawid, A. P. and Musio, M.: Theory and applications of proper scoring rules, METRON, 72, 169–183, https://doi.org/10.1007/s40300-014-0039-y, 2014. a
Dawid, A. P., Musio, M., and Ventura, L.: Minimum Scoring Rule Inference, Scand. J. Stat., 43, 123–138, https://doi.org/10.1111/sjos.12168, 2015. a
Delle Monache, L., Eckel, F. A., Rife, D. L., Nagarajan, B., and Searight, K.: Probabilistic Weather Prediction with an Analog Ensemble, Mon. Weather Rev., 141, 3498–3516, https://doi.org/10.1175/mwr-d-12-00281.1, 2013. a
Demaeyer, J.: EUPPBench postprocessing benchmark dataset – gridded data – Part I (v1.0), Zenodo [data set], https://doi.org/10.5281/zenodo.7429236, 2022. a
Demaeyer, J., Bhend, J., Lerch, S., Primo, C., Van Schaeybroeck, B., Atencia, A., Ben Bouallègue, Z., Chen, J., Dabernig, M., Evans, G., Faganeli Pucer, J., Hooper, B., Horat, N., Jobst, D., Merše, J., Mlakar, P., Möller, A., Mestre, O., Taillardat, M., and Vannitsem, S.: The EUPPBench postprocessing benchmark dataset v1.0, Earth Syst. Sci. Data, 15, 2635–2653, https://doi.org/10.5194/essd-15-2635-2023, 2023. a, b, c
Diebold, F. X. and Mariano, R. S.: Comparing Predictive Accuracy, J. Bus. Econ. Stat., 13, 253–263, https://doi.org/10.1080/07350015.1995.10524599, 1995. a, b
Ebert, E. E.: Fuzzy verification of high‐resolution gridded forecasts: a review and proposed framework, Meteorol. Appl., 15, 51–64, https://doi.org/10.1002/met.25, 2008. a
Ehm, W. and Gneiting, T.: Local proper scoring rules of order two, Ann. Stat., 40, 609–637, https://doi.org/10.1214/12-aos973, 2012. a
EUMETNET: MeteoAlarm, https://www.meteoalarm.org/en/live/, last access: 16 October 2024. a
Ferro, C. A. T., Richardson, D. S., and Weigel, A. P.: On the effect of ensemble size on the discrete and continuous ranked probability scores, Meteorol. Appl., 15, 19–24, https://doi.org/10.1002/met.45, 2008. a
Friederichs, P. and Hense, A.: A Probabilistic Forecast Approach for Daily Precipitation Totals, Weather Forecast., 23, 659–673, https://doi.org/10.1175/2007waf2007051.1, 2008. a, b
Gilleland, E.: Spatial Forecast Verification: Baddeley's Delta Metric Applied to the ICP Test Cases, Weather Forecast., 26, 409–415, https://doi.org/10.1175/waf-d-10-05061.1, 2011. a
Gilleland, E., Ahijevych, D., Brown, B. G., Casati, B., and Ebert, E. E.: Intercomparison of Spatial Forecast Verification Methods, Weather Forecast., 24, 1416–1430, https://doi.org/10.1175/2009waf2222269.1, 2009. a, b, c
Gneiting, T. and Katzfuss, M.: Probabilistic Forecasting, Annu. Rev. Stat. Appl., 1, 125–151, https://doi.org/10.1146/annurev-statistics-062713-085831, 2014. a, b
Gneiting, T., Balabdaoui, F., and Raftery, A. E.: Probabilistic Forecasts, Calibration and Sharpness, J. R. Stat. Soc. B, 69, 243–268, https://doi.org/10.1111/j.1467-9868.2007.00587.x, 2007. a, b
Gneiting, T., Stanberry, L. I., Grimit, E. P., Held, L., and Johnson, N. A.: Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds, TEST, 17, 211–235, https://doi.org/10.1007/s11749-008-0114-x, 2008. a, b
Gneiting, T., Lerch, S., and Schulz, B.: Probabilistic solar forecasting: Benchmarks, post-processing, verification, Sol. Energy, 252, 72–80, https://doi.org/10.1016/j.solener.2022.12.054, 2023. a
Good, I. J.: Rational Decisions, J. Roy. Stat. Soc. B Met., 14, 107–114, https://doi.org/10.1111/j.2517-6161.1952.tb00104.x, 1952. a, b
Han, F. and Szunyogh, I.: A Technique for the Verification of Precipitation Forecasts and Its Application to a Problem of Predictability, Mon. Weather Rev., 146, 1303–1318, https://doi.org/10.1175/mwr-d-17-0040.1, 2018. a
Heinrich‐Mertsching, C., Thorarinsdottir, T. L., Guttorp, P., and Schneider, M.: Validation of point process predictions with proper scoring rules, Scand. J. Stat., 51, 1533–1566, https://doi.org/10.1111/sjos.12736, 2024. a, b, c, d
Hersbach, H.: Decomposition of the Continuous Ranked Probability Score for Ensemble Prediction Systems, Weather Forecast., 15, 559–570, https://doi.org/10.1175/1520-0434(2000)015<0559:dotcrp>2.0.co;2, 2000. a
Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz‐Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.: The ERA5 global reanalysis, Q. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803, 2020. a
Holzmann, H. and Eulert, M.: The role of the information set for forecasting – with applications to risk management, Ann. Appl. Stat., 8, 595–621, https://doi.org/10.1214/13-aoas709, 2014. a
Hu, W., Ghazvinian, M., Chapman, W. E., Sengupta, A., Ralph, F. M., and Delle Monache, L.: Deep Learning Forecast Uncertainty for Precipitation over the Western United States, Mon. Weather Rev., 151, 1367–1385, https://doi.org/10.1175/mwr-d-22-0268.1, 2023. a
Jolliffe, I. T. and Primo, C.: Evaluating Rank Histograms Using Decompositions of the Chi-Square Test Statistic, Mon. Weather Rev., 136, 2133–2139, https://doi.org/10.1175/2007mwr2219.1, 2008. a
Jordan, A., Krüger, F., and Lerch, S.: Evaluating Probabilistic Forecasts with scoringRules, J. Stat. Softw., 90, 1–37, https://doi.org/10.18637/jss.v090.i12, 2019. a, b
Jordan, T. H., Chen, Y.-T., Gasparini, P., Madariaga, R., Main, I., Marzocchi, W., Papadopoulos, G., Sobolev, G., Yamaoka, K., and Zschau, J.: OPERATIONAL EARTHQUAKE FORECASTING. State of Knowledge and Guidelines for Utilization, Ann. Geophys.-Italy, 54, 316–391, https://doi.org/10.4401/ag-5350, 2011. a
Jose, V. R.: A Characterization for the Spherical Scoring Rule, Theor. Decis., 66, 263–281, https://doi.org/10.1007/s11238-007-9067-x, 2007. a
Keisler, R.: Forecasting Global Weather with Graph Neural Networks, arXiv [preprint], https://doi.org/10.48550/ARXIV.2202.07575, 2022. a
Kullback, S. and Leibler, R. A.: On Information and Sufficiency, Ann. Math. Stat., 22, 79–86, https://doi.org/10.1214/aoms/1177729694, 1951. a
Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S., and Battaglia, P.: GraphCast: Learning skillful medium-range global weather forecasting, arXiv [preprint], https://doi.org/10.48550/ARXIV.2212.12794, 2022. a
Lerch, S. and Thorarinsdottir, T. L.: Comparison of non-homogeneous regression models for probabilistic wind speed forecasting, Tellus A, 65, 21206, https://doi.org/10.3402/tellusa.v65i0.21206, 2013. a
Lerch, S., Thorarinsdottir, T. L., Ravazzolo, F., and Gneiting, T.: Forecaster's Dilemma: Extreme Events and Forecast Evaluation, Stat. Sci., 32, 106–127, https://doi.org/10.1214/16-sts588, 2017. a
Matheron, G.: Principles of geostatistics, Econ. Geol., 58, 1246–1266, https://doi.org/10.2113/gsecongeo.58.8.1246, 1963. a, b
Matheson, J. E. and Winkler, R. L.: Scoring Rules for Continuous Probability Distributions, Manage. Sci., 22, 1087–1096, 1976. a
Meng, X., Taylor, J. W., Ben Taieb, S., and Li, S.: Scores for Multivariate Distributions and Level Sets, Oper. Res., 344–362, https://doi.org/10.1287/opre.2020.0365, 2023. a
Murphy, A. H. and Winkler, R. L.: A General Framework for Forecast Verification, Mon. Weather Rev., 115, 1330–1338, https://doi.org/10.1175/1520-0493(1987)115<1330:agfffv>2.0.co;2, 1987. a
Nowotarski, J. and Weron, R.: Recent advances in electricity price forecasting: A review of probabilistic forecasting, Renew. Sust. Energ. Rev., 81, 1548–1568, https://doi.org/10.1016/j.rser.2017.05.234, 2018. a
Palmer, T. N.: Towards the probabilistic Earth‐system simulator: a vision for the future of climate and weather prediction, Q. J. Roy. Meteor. Soc., 138, 841–861, https://doi.org/10.1002/qj.1923, 2012. a
Parry, M., Dawid, A. P., and Lauritzen, S.: Proper local scoring rules, Ann. Stat., 40, 561–592, https://doi.org/10.1214/12-aos971, 2012. a, b, c
Pathak, J., Subramanian, S., Harrington, P., Raja, S., Chattopadhyay, A., Mardani, M., Kurth, T., Hall, D., Li, Z., Azizzadenesheli, K., Hassanzadeh, P., Kashinath, K., and Anandkumar, A.: FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators, arXiv [preprint], https://doi.org/10.48550/arXiv.2202.11214, 2022. a, b
Pic, R.: aggregation-transformation, Zenodo [code], https://doi.org/10.5281/zenodo.14982271, 2024. a
Pinson, P.: Wind Energy: Forecasting Challenges for Its Operational Management, Stat. Sci., 28, 564–585, https://doi.org/10.1214/13-sts445, 2013. a
Pinson, P. and Girard, R.: Evaluating the quality of scenarios of short-term wind power generation, Appl. Energ., 96, 12–20, https://doi.org/10.1016/j.apenergy.2011.11.004, 2012. a, b
Radanovics, S., Vidal, J.-P., and Sauquet, E.: Spatial Verification of Ensemble Precipitation: An Ensemble Version of SAL, Weather Forecast., 33, 1001–1020, https://doi.org/10.1175/waf-d-17-0162.1, 2018. a, b
Rasp, S. and Lerch, S.: Neural Networks for Postprocessing Ensemble Weather Forecasts, Mon. Weather Rev., 146, 3885–3900, https://doi.org/10.1175/mwr-d-18-0187.1, 2018. a
Rasp, S., Hoyer, S., Merose, A., Langmore, I., Battaglia, P., Russel, T., Sanchez-Gonzalez, A., Yang, V., Carver, R., Agrawal, S., Chantry, M., Ben Bouallègue, Z., Dueben, P., Bromberg, C., Sisk, J., Barrington, L., Bell, A., and Sha, F.: WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models, J. Adv. Model. Earth Sy., 16, e2023MS004019, https://doi.org/10.1029/2023MS004019, 2024. a
Rivoire, P., Martius, O., Naveau, P., and Tuel, A.: Assessment of subseasonal-to-seasonal (S2S) ensemble extreme precipitation forecast skill over Europe, Nat. Hazards Earth Syst. Sci., 23, 2857–2871, https://doi.org/10.5194/nhess-23-2857-2023, 2023. a
Roberts, N. M. and Lean, H. W.: Scale-Selective Verification of Rainfall Accumulations from High-Resolution Forecasts of Convective Events, Mon. Weather Rev., 136, 78–97, https://doi.org/10.1175/2007mwr2123.1, 2008. a
Roulston, M. S. and Smith, L. A.: Evaluating Probabilistic Forecasts Using Information Theory, Mon. Weather Rev., 130, 1653–1660, https://doi.org/10.1175/1520-0493(2002)130<1653:epfuit>2.0.co;2, 2002. a
Schefzik, R., Thorarinsdottir, T. L., and Gneiting, T.: Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling, Stat. Sci., 28, 616–640, https://doi.org/10.1214/13-sts443, 2013. a
Scheuerer, M. and Möller, D.: Probabilistic wind speed forecasting on a grid based on ensemble model output statistics, Ann. Appl. Stat., 9, 1328–1349, https://doi.org/10.1214/15-aoas843, 2015. a
Schlather, M., Malinowski, A., Menck, P. J., Oesting, M., and Strokorb, K.: Analysis, Simulation and Prediction of Multivariate Random Fields with Package RandomFields, J. Stat. Softw., 63, 1–25, https://doi.org/10.18637/jss.v063.i08, 2015. a
Schorlemmer, D., Werner, M. J., Marzocchi, W., Jordan, T. H., Ogata, Y., Jackson, D. D., Mak, S., Rhoades, D. A., Gerstenberger, M. C., Hirata, N., Liukis, M., Maechling, P. J., Strader, A., Taroni, M., Wiemer, S., Zechar, J. D., and Zhuang, J.: The Collaboratory for the Study of Earthquake Predictability: Achievements and Priorities, Seismol. Res. Lett., 89, 1305–1313, https://doi.org/10.1785/0220180053, 2018. a
Schulz, B. and Lerch, S.: Machine Learning Methods for Postprocessing Ensemble Forecasts of Wind Gusts: A Systematic Comparison, Mon. Weather Rev., 150, 235–257, https://doi.org/10.1175/mwr-d-21-0150.1, 2022. a
Shannon, C. E.: A Mathematical Theory of Communication, Bell Syst. Tech. J., 27, 623–656, https://doi.org/10.1002/j.1538-7305.1948.tb00917.x, 1948. a
Smola, A., Gretton, A., Song, L., and Schölkopf, B.: A Hilbert Space Embedding for Distributions, in: Algorithmic Learning Theory, edited by: Hutter, M., Servedio, R. A., and Takimoto, E., 13–31, Springer Berlin Heidelberg, Berlin, Heidelberg, ISBN 978-3-540-75225-7, 2007. a
Stein, J. and Stoop, F.: Neighborhood-Based Ensemble Evaluation Using the CRPS, Mon. Weather Rev., 150, 1901–1914, https://doi.org/10.1175/mwr-d-21-0224.1, 2022. a
Steinwart, I. and Christmann, A.: Support Vector Machines, Information Science and Statistics, Springer, New York, ISBN 978-0-387-77241-7, 2008. a
Steinwart, I. and Ziegel, J. F.: Strictly proper kernel scores and characteristic kernels on compact spaces, Appl. Comput. Harmon. A., 51, 510–542, https://doi.org/10.1016/j.acha.2019.11.005, 2021. a
Székely, G.: E-statistics: The Energy of Statistical Samples, Technical Report, Bowling Green State University, https://doi.org/10.13140/RG.2.1.5063.9761, 2003. a
Taillardat, M.: Skewed and Mixture of Gaussian Distributions for Ensemble Postprocessing, Atmosphere, 12, 966, https://doi.org/10.3390/atmos12080966, 2021. a
Taillardat, M. and Mestre, O.: From research to applications – examples of operational ensemble post-processing in France using machine learning, Nonlin. Processes Geophys., 27, 329–347, https://doi.org/10.5194/npg-27-329-2020, 2020. a
Taillardat, M., Mestre, O., Zamo, M., and Naveau, P.: Calibrated Ensemble Forecasts Using Quantile Regression Forests and Ensemble Model Output Statistics, Mon. Weather Rev., 144, 2375–2393, https://doi.org/10.1175/mwr-d-15-0260.1, 2016. a
Talagrand, O., Vautard, R., and Strauss, B.: Evaluation of probabilistic prediction systems, in: Workshop on Predictability, 20–22 October 1997, 1–26, ECMWF, Shinfield Park, Reading, 1997. a
Thorarinsdottir, T. L. and Schuhen, N.: Verification: Assessment of Calibration and Accuracy, 155–186, Elsevier, https://doi.org/10.1016/b978-0-12-812372-0.00006-6, 2018. a, b, c, d
Thorarinsdottir, T. L., Gneiting, T., and Gissibl, N.: Using Proper Divergence Functions to Evaluate Climate Models, SIAM/ASA Journal on Uncertainty Quantification, 1, 522–534, https://doi.org/10.1137/130907550, 2013. a
Tsyplakov, A.: Evaluating Density Forecasts: A Comment, SSRN Electronic Journal, 1907799, https://doi.org/10.2139/ssrn.1907799, 2011. a
Tsyplakov, A.: Evaluation of Probabilistic Forecasts: Proper Scoring Rules and Moments, SSRN Electronic Journal, 2236605, https://doi.org/10.2139/ssrn.2236605, 2013. a
Tsyplakov, A.: Evaluation of probabilistic forecasts: Conditional auto-calibration, https://www.sas.upenn.edu/~fdiebold/papers2/Tsyplakov_Auto_calibration_sent_eswc2020.pdf (last access: 6 March 2025), 2020. a
Vannitsem, S., Bremnes, J. B., Demaeyer, J., Evans, G. R., Flowerdew, J., Hemri, S., Lerch, S., Roberts, N., Theis, S., Atencia, A., Ben Bouallègue, Z., Bhend, J., Dabernig, M., De Cruz, L., Hieta, L., Mestre, O., Moret, L., Plenković, I. O., Schmeits, M., Taillardat, M., Van den Bergh, J., Van Schaeybroeck, B., Whan, K., and Ylhaisi, J.: Statistical Postprocessing for Weather Forecasts: Review, Challenges, and Avenues in a Big Data World, B. Am. Meteorol. Soc., 102, E681–E699, https://doi.org/10.1175/bams-d-19-0308.1, 2021. a
Wernli, H., Paulat, M., Hagen, M., and Frei, C.: SAL – A Novel Quality Measure for the Verification of Quantitative Precipitation Forecasts, Mon. Weather Rev., 136, 4470–4487, https://doi.org/10.1175/2008mwr2415.1, 2008. a, b
Winkelbauer, A.: Moments and Absolute Moments of the Normal Distribution, arXiv [preprint], https://doi.org/10.48550/ARXIV.1209.4340, 2014. a
Winkler, R. L.: Rewarding Expertise in Probability Assessment, 127–140, Springer Netherlands, ISBN 9789401012768, https://doi.org/10.1007/978-94-010-1276-8_10, 1977. a, b
Winkler, R. L., Muñoz, J., Cervera, J. L., Bernardo, J. M., Blattenberger, G., Kadane, J. B., Lindley, D. V., Murphy, A. H., Oliver, R. M., and Ríos-Insua, D.: Scoring rules and the evaluation of probabilities, Test, 5, 1–60, https://doi.org/10.1007/bf02562681, 1996. a, b
Zamo, M. and Naveau, P.: Estimation of the Continuous Ranked Probability Score with Limited Information and Applications to Ensemble Weather Forecasts, Math. Geosci., 50, 209–234, https://doi.org/10.1007/s11004-017-9709-7, 2017. a
Ziel, F. and Berk, K.: Multivariate Forecasting Evaluation: On Sensitive and Strictly Proper Scoring Rules, arXiv [preprint], https://doi.org/10.48550/arXiv.1910.07325, 2019. a, b, c
Short summary
Correctly forecasting weather is crucial for decision-making in various fields. Standard multivariate verification tools have limitations, and a single tool cannot fully characterize predictive performance. We formalize a framework based on aggregation and transformation to build interpretable verification tools. These tools target specific features of forecasts, improving predictive performance characterization and bridging the gap between theoretical and physics-based tools.
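The aggregation step of the framework can be illustrated with a small numerical sketch. The Python code below is not the authors' implementation (their reference code is the Zenodo archive cited above); it is a minimal illustration, assuming only NumPy and the standard Monte Carlo estimator of the energy score, of how a gridded ensemble forecast can first be aggregated into a few weighted regional means before a proper score is computed on the lower-dimensional result. The names energy_score and aggregate_then_score, as well as the toy data, are illustrative only.

```python
import numpy as np

def energy_score(ens, obs):
    """Monte Carlo estimate of the energy score for an ensemble forecast:
    mean distance of members to the observation minus half the mean
    pairwise distance between members."""
    m = ens.shape[0]
    obs_term = np.mean(np.linalg.norm(ens - obs, axis=1))
    spread_term = np.sum(
        np.linalg.norm(ens[:, None, :] - ens[None, :, :], axis=2)
    ) / (2 * m**2)
    return obs_term - spread_term

def aggregate_then_score(ens_field, obs_field, weights):
    """Project the gridded members and observation onto weighted regional
    means (the aggregation step), then score the aggregated quantities."""
    return energy_score(ens_field @ weights, obs_field @ weights)

# Toy example: a 20-member ensemble on a 50-point grid, scored both on the
# raw field and after aggregation into two regional means.
rng = np.random.default_rng(0)
members, grid = 20, 50
ens = rng.normal(size=(members, grid))
obs = rng.normal(size=grid)
weights = np.zeros((grid, 2))
weights[:25, 0] = 1 / 25   # mean over the first half of the grid
weights[25:, 1] = 1 / 25   # mean over the second half
print(energy_score(ens, obs))                   # score on the raw field
print(aggregate_then_score(ens, obs, weights))  # score after aggregation
```

In the same spirit, the aggregation matrix could be replaced by any transformation of interest (for example a threshold exceedance or a logarithm applied componentwise) to target a specific feature of the forecast before scoring.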