Recent heatwaves such as the 2021 Pacific Northwest heatwave have shattered temperature records across the globe. The likelihood of experiencing extreme temperature events today is already strongly increased by anthropogenic climate change, but it remains challenging to determine to what degree prevalent atmospheric and land surface conditions aggravated the intensity of a specific heatwave event. Quantifying the respective contributions is therefore paramount for process understanding but also for attribution and future projection statements conditional on the state of atmospheric circulation or land surface conditions. We here propose and evaluate a statistical framework based on extreme value theory, which enables us to learn the respective statistical relationship between extreme temperature and process variables in initial-condition large ensemble climate model simulations. Elements of statistical learning theory are implemented in order to integrate the effect of the governing regional circulation pattern. The learned statistical models can be applied to reanalysis data to quantify the relevance of physical process variables in observed heatwave events. The method also allows us to make conditional attribution statements and answer “what if” questions. For instance, how much would a heatwave intensify given the same dynamic conditions but at a different warming level? How much additional warming is needed for the same heatwave intensity to occur under average circulation conditions? Changes in the exceedance probability under varying large- and regional-scale conditions can also be assessed. We show that each additional degree of global warming increases the 7 d maximum temperature for the Pacific Northwest area by almost

Heatwave events pose a substantial risk to ecosystems

The objective of this study is to disentangle the effects of different physical drivers on low-likelihood heatwave events. It aims to estimate how much prevalent dynamic and thermodynamic configurations contributed to the intensity of observed extreme heat events, and to allow “what if” questions to be answered, for example how much the intensity of a specific event would be altered under different climatic conditions. The focus on low-likelihood heat extremes and the limited availability of observational data require one to learn the statistical relationships between heat extreme intensity and selected covariates from simulated heatwave data, such as those provided by single-model initial-condition large ensemble climate simulations. The primary dataset used in this study is an 84-member large ensemble of the Community Earth System Model version 1.2 (CESM12), and non-stationary extreme value modelling provides the methodological framework for the analysis. The main focus is on the Pacific Northwest (PNW) area, but statistical models are also estimated for western Europe (WEU) and western Russia (WRU), where well-studied record-breaking heat extremes occurred in the past decades (area-of-interest definitions are provided in Supplement Table S2).

The intensity of extreme heatwave events at the mid-latitudes is determined by a multitude of climatological factors across various spatio-temporal scales

Representing both dynamic and thermodynamic processes across the spectrum of spatio-temporal scales in the same model is required to accurately explain the intensity of heat extremes

Smoothed global mean surface temperature (GMST) as a proxy for long-term climatic changes and the respective thermodynamic forcing of local heat extremes

Soil moisture (SM) as a proxy for local land surface and soil conditions

Geopotential height of the 500 hPa pressure surface (

The statistical analysis of extreme climate events requires careful curation of data and application of methods, as results may depend heavily on seemingly negligible processing steps. The need for timely and robust extreme event attribution statements gave rise to a set of best-practice protocols, such as

For the training of the statistical model, we rely on Earth system model simulations, as only this data source provides the required number of extreme heatwave samples in which the respective processes are represented to an adequate degree in past, present and future global climate states. The following analysis is primarily based on an exhaustive simulation set of the initial-condition large ensemble of CESM12 and on additional simulations of the Coupled Model Intercomparison Project Phase 6 (CMIP6;

Simulations of the CESM12 model

CMIP6 data were selected upon availability from the archive

Length of the pre-industrial control run, the number of historical and transient future ensemble members per model and forcing scenario, and the resulting combined number of model years.

We here define the heatwave predictand variable as the 5-year maxima of 7 d running mean temperature (Tx7d), averaged over a spatial domain of interest (e.g. the PNW region shown in the yellow box in Fig.
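The predictand definition can be illustrated with a short Python sketch (the study itself used R). The sketch makes three assumptions. First, the daily series has already been averaged over the domain of interest and comes with a matching array of calendar years. Second, 7 d running means are computed within each calendar year only. Third, an incomplete final block is dropped. The function name `tx7d_block_maxima` and these choices are hypothetical and are not taken from the study.

```python
import numpy as np

def tx7d_block_maxima(daily_temp, years, block_years=5, window=7):
    """Hypothetical helper: block maxima of the `window`-day running mean.

    daily_temp : 1-D array of domain-averaged daily temperatures
    years      : 1-D array (same length) with the calendar year of each day
    Returns one maximum per block of `block_years` consecutive years;
    an incomplete trailing block is dropped.
    """
    daily_temp = np.asarray(daily_temp, dtype=float)
    years = np.asarray(years)
    kernel = np.ones(window) / window
    annual_max = []
    for yr in np.unique(years):  # unique years, sorted ascending
        x = daily_temp[years == yr]
        # running means within one year only (windows never span year boundaries)
        running = np.convolve(x, kernel, mode="valid")
        annual_max.append(running.max())
    annual_max = np.array(annual_max)
    n_blocks = len(annual_max) // block_years
    blocks = annual_max[: n_blocks * block_years].reshape(n_blocks, block_years)
    return blocks.max(axis=1)
```

Taking the maximum of the annual maxima within each block gives the same result as a direct block maximum here, because no running window crosses a year boundary.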

In order to compare effect sizes and apply the statistical models to reanalysis data, the predictor variables should have comparable statistical metrics across datasets, i.e. a similar mean and variance. The predictor variables should also be close to orthogonal in order to reliably quantify the respective effect sizes and avoid collinearity artefacts. However, geopotential height and soil moisture both correlate with global mean temperature change. Absolute values of SM further show large differences across datasets (in both mean and variability; see Fig.
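The following is a minimal sketch of such pre-processing. It assumes that each covariate series (e.g. SM or a Z500 grid point) is linearly regressed on smoothed GMST within one dataset, and that the residuals are then scaled to unit variance. The function `detrend_and_scale` is hypothetical, and the study's exact detrending and scaling choices may differ.

```python
import numpy as np

def detrend_and_scale(covariate, gmst):
    """Hypothetical helper: remove the linear GMST dependence and standardise.

    Returns the scaled residuals and the (intercept, slope) of the regression.
    """
    covariate = np.asarray(covariate, dtype=float)
    gmst = np.asarray(gmst, dtype=float)
    X = np.column_stack([np.ones_like(gmst), gmst])  # intercept + GMST
    coef, *_ = np.linalg.lstsq(X, covariate, rcond=None)
    resid = covariate - X @ coef                     # GMST signal removed
    return resid / resid.std(ddof=1), coef           # unit variance, zero mean
```

Because the regression includes an intercept, the residuals have zero mean and are uncorrelated with GMST by construction. This is the property that keeps the processed covariates close to orthogonal to the warming signal.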

Unprocessed SM

Figure

Statistical theory provides various concepts which are specifically tailored to the analysis of extreme values in a larger dataset, following the philosophy of “letting the tail speak for itself” (

The non-stationary GEV distribution in Eq. (

We model the location parameter

The scale parameter

The shape parameter
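The non-stationary structure can be sketched as a penalised GEV likelihood in Python. The sketch assumes four things:

- a location parameter that is linear in GMST, SM and a few circulation grid-point covariates;
- a log-scale parameter that is linear in GMST;
- a constant, non-zero shape parameter;
- an L2 (ridge) penalty `lam` on the circulation coefficients, which is one possible form of regularisation and not necessarily the study's.

The names `gev_nll` and `fit_gev`, the Nelder–Mead optimiser and the OLS starting values are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def gev_nll(theta, y, gmst, sm, zfield, lam=0.0):
    """Penalised negative log-likelihood of a non-stationary GEV (illustrative).

    theta = (mu0, mu_gmst, mu_sm, beta_1..beta_p, log_sigma0, log_sigma_gmst, xi)
    location:  mu        = mu0 + mu_gmst*gmst + mu_sm*sm + zfield @ beta
    log-scale: log sigma = log_sigma0 + log_sigma_gmst*gmst
    shape:     constant xi (xi != 0; the Gumbel limit is not handled here)
    """
    p = zfield.shape[1]
    mu0, mu_g, mu_s = theta[:3]
    beta = theta[3:3 + p]
    ls0, ls_g, xi = theta[3 + p:]
    mu = mu0 + mu_g * gmst + mu_s * sm + zfield @ beta
    log_sigma = ls0 + ls_g * gmst
    t = 1.0 + xi * (y - mu) / np.exp(log_sigma)
    if abs(xi) < 1e-8 or np.any(t <= 0):
        return np.inf  # outside the parameter space handled by this sketch
    nll = np.sum(log_sigma + (1.0 + 1.0 / xi) * np.log(t) + t ** (-1.0 / xi))
    return nll + lam * np.sum(beta ** 2)  # assumed ridge penalty on beta

def fit_gev(y, gmst, sm, zfield, lam=0.0, restarts=3):
    # Starting values: OLS for location coefficients, residual spread for scale.
    X = np.column_stack([np.ones_like(gmst), gmst, sm, zfield])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    start = np.concatenate([coef, [np.log(np.std(y - X @ coef)), 0.0, -0.1]])
    for _ in range(restarts):  # restart Nelder-Mead from its previous solution
        res = minimize(gev_nll, start, args=(y, gmst, sm, zfield, lam),
                       method="Nelder-Mead",
                       options={"maxiter": 10000, "maxfev": 10000,
                                "xatol": 1e-5, "fatol": 1e-5})
        start = res.x
    return res.x
```

Fitting synthetic block maxima generated from known parameters is a simple way to check that such a likelihood is coded consistently. Increasing `lam` shrinks the circulation coefficients towards zero, which guards against overfitting when many grid points are used.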

In order to adequately sample the upper tail of the temperature distribution, a 5-year block size was found to be required, considering the extensive autocorrelation in temperature data and the short summer period when temperatures peak (an extended discussion of the block size is provided in Supplement Sect. S2.1). Five-year block-maximum events were extracted from the period 1980–2089 as training data of the GEV models, and events from stationary simulation periods (pre-industrial and historical 1850–1900) were used to obtain optimal starting values for the


For the GEV model in Eq. (

The GEV model with a location parameter as in Eq. (

Model evaluation was conducted on a distinct testing period, 2090–2100, and a sub-set of events with a particularly strong

Applying a statistical model whose parameters were estimated from climate model data to specific heatwave events in reanalysis data implicitly assumes that the training (climate model) and evaluation (reanalysis) data share the same statistical characteristics. In this section, the necessary measures taken after model estimation are briefly summarised. The pre-processing (detrending and scaling) ensures that

In the following, we present the estimated parameters of the full GEV model, which show how the physical process variables alter heatwave intensity. Furthermore, in the second part of this section, the improvement in the representation of temperature extremes gained by including the dynamical field (the full model) relative to using local process variables (local

Before drawing conclusions from a statistical model, it is vital to check whether the respective model is representative of the underlying data. Diagnostics like quantile–quantile plots in Supplement Sect. S3.1 confirm that the full GEV model is capable not only of describing the data used for estimating the respective parameters but also of performing reasonably on the testing dataset as well as for events with a dominant
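For a non-stationary GEV, a common way to build such quantile–quantile plots is to first map each block maximum to a standard Gumbel scale using its covariate-dependent parameters, and then compare the sorted values with theoretical Gumbel quantiles. The Python sketch below shows this standard transformation. It assumes the fitted location, scale and shape for each event are available. The function name is hypothetical, and we do not know whether the supplement uses exactly this construction.

```python
import numpy as np

def gumbel_qq(y, mu, sigma, xi):
    """Hypothetical helper: (theoretical, empirical) standard-Gumbel quantiles.

    If y ~ GEV(mu, sigma, xi), then log(1 + xi*(y - mu)/sigma) / xi follows a
    standard Gumbel distribution, so the points should lie near the 1:1 line.
    Observations outside the fitted support yield NaN.
    """
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    z = (y - mu) / sigma
    g = z if abs(xi) < 1e-8 else np.log1p(xi * z) / xi
    empirical = np.sort(g)
    n = empirical.size
    p = np.arange(1, n + 1) / (n + 1)       # plotting positions
    theoretical = -np.log(-np.log(p))       # standard Gumbel quantiles
    return theoretical, empirical
```

The same transformation can be applied to events from the testing period to check out-of-sample performance.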

The linear and additive structure of the statistical model in Eq. (

GEV parameter estimates at the PNW location. Point estimates (central horizontal line in the box) and bootstrap 95 % (dark shading) and 99 % (light shading) CI of the

We first analyse the scaling relationship of local extreme and global average temperature, as this constitutes the dominant long-term thermodynamic effect on heatwave intensity. As linear detrending removed the global warming signal in the SM and

The estimates

The spatially resolved effect of the geopotential height field on local extreme temperature is represented in the coefficient fields of

The log-scale parameter in Eq. (

In summary, the estimated parameters associated with the process variables are consistent with physical understanding and earlier research. However, it is not per se evident whether the model is missing additional crucial predictor information and whether the linear additive model structure can represent the relationship between predictors and predictand. In Supplement Sect. S3.2 the potential effects of seasonality and low-frequency climate variability on Tx7d data are analysed. No relevant signal could be detected, except for a tendency to overestimate the intensity of heat extremes at the end of the summer period, suggesting that all relevant first-order process variables are considered.

With respect to the model structure, prior process understanding and sufficient data provide the basis and justification for a non-stationary modelling approach

An interpretation of the GEV parameter estimates of the full GEV model is provided in the previous section. It remains to be shown that this complex model adds value compared to simpler model structures, i.e. with fewer predictor variables. We evaluate the full GEV model with respect to the local

Overall, the skill of the full model is higher from the regression perspective, indicated by higher average coefficient of determination

The parameter estimates of the statistical GEV model discussed in Sect.

In the first step, we present the state of the physical process variables preconditioning the heatwave event and quantify their respective effect sizes using the estimated statistical models. In the second step, the conditional event intensity is put into perspective: for instance, according to the statistical model, how would the intensity change at a different warming level? In the third step, we assess the respective likelihood changes. The following section discusses not only the added value of the method but also its limitations.

The Pacific Northwest heatwave in late June 2021 was an unprecedented event considering the location and time of occurrence

ERA5-Land

As the PNW 2021 event falls into a period of strong global warming, the GMST effect amounts to

There remains a considerable gap between the estimated location parameter

Given the statistical model, how would the event intensity be altered under alternative prevalent conditions, in contrast to the actual drivers of the PNW heatwave? A major driver of the 2021 heatwave event is the anti-cyclone, whose effect is larger than in all previous 5-year ERA5 heatwave events (highest among the dots in Fig.

The model also suggests that, in order to fully compensate for the
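Because the location parameter is linear and additive in the covariates, simple “what if” arithmetic follows directly from the estimated coefficients. The Python sketch below makes three assumptions: Z500 anomalies are standardised, so that zero represents average circulation; only location parameters are compared; and any GMST dependence of the scale is ignored. All function names are hypothetical, and no values from the study are used.

```python
import numpy as np

def circulation_effect(beta_field, z500_anomaly):
    """Location contribution of the circulation: sum of coefficient x anomaly."""
    return float(np.sum(np.asarray(beta_field) * np.asarray(z500_anomaly)))

def location_shift_for_warming(mu_gmst, delta_gmst):
    """Change in GEV location for a change in smoothed GMST, all else fixed."""
    return mu_gmst * delta_gmst

def compensating_warming(mu_gmst, circ_effect, circ_effect_reference=0.0):
    """GMST increase that shifts the location as much as the circulation
    anomaly does relative to a reference (0 = average circulation)."""
    return (circ_effect - circ_effect_reference) / mu_gmst
```

Under these assumptions, dividing the circulation contribution by the GMST coefficient gives the additional warming needed for the same location under average circulation.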

Our statistical model detects a considerable contribution of atmospheric circulation to the PNW 2021 heatwave intensity. Other methods used to disentangle the contribution of atmospheric circulation and warming to heatwaves include the analogue method

The non-stationary GEV model also provides a (parametric) probability distribution quantifying the stochastic, unexplained variability in heat extremes, conditional on the respective process variables. In the following paragraphs, we analyse the conditional likelihood of the PNW heatwave, expressed in terms of its annual exceedance probability (AEP). The AEP is the probability of observing an event at least as intense as the PNW heatwave (according to the estimated GEV distribution and conditional on the respective covariates). Under stationary conditions, the AEP corresponds to the inverse of the return period. As the GEV distribution is derived from 5-year block maxima, the annual exceedance probability
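The following Python sketch shows the probability calculation, assuming that the covariate-conditional GEV parameters are already available. The conversion from a 5-year block exceedance probability to an AEP assumes independent, identically distributed annual maxima within a block, so that P(M5 ≤ x) = P(M1 ≤ x)^5. This is a common textbook conversion and may differ from the one used in the study. The function names are hypothetical.

```python
import numpy as np

def block_exceedance_prob(x, mu, sigma, xi):
    """P(block maximum > x) for a GEV(mu, sigma, xi) block-maximum distribution."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-8:                      # Gumbel limit
        return float(-np.expm1(-np.exp(-z)))
    t = 1.0 + xi * z
    if t <= 0:
        # beyond a finite endpoint: below the lower one (xi > 0) every block
        # maximum exceeds x; above the upper one (xi < 0) none does
        return 1.0 if xi > 0 else 0.0
    return float(-np.expm1(-t ** (-1.0 / xi)))

def annual_exceedance_prob(x, mu, sigma, xi, block_years=5):
    """AEP assuming i.i.d. annual maxima: 1 - (1 - p_block)**(1/block_years)."""
    p_block = block_exceedance_prob(x, mu, sigma, xi)
    if p_block >= 1.0:
        return 1.0
    return float(-np.expm1(np.log1p(-p_block) / block_years))
```

The `expm1`/`log1p` formulation keeps very small probabilities accurate, which matters for low-likelihood events. Under stationary conditions, the corresponding return period in years is `1 / annual_exceedance_prob(...)`.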

If only GMST is considered as a driver for changing heatwave intensity – as in the GMST-only GEV model – the estimated AEP for a given Tx7d threshold can be displayed as a function over time. Figure

AEP for heatwave intensity

The AEP of the 2021 heatwave event is estimated to be

Figure

The conditional AEP

AEP of the 2021 PNW event

The following limitations should be considered regarding conditional AEP estimates: first, non-stationary extreme value theory strictly holds only for slowly varying covariates (like the GMST covariate), such that block maxima arise from a larger set of observations which are identically distributed given the covariate

In this publication we introduce and evaluate a statistical framework to disentangle and quantify the effect of three major physical drivers of heatwave intensity and likelihood: the long-term warming trend, the regional-scale atmospheric circulation and local soil moisture anomalies. These relationships are integrated into a non-stationary extreme value model whose parameters are estimated across a large set of heatwave events in long climate model simulations and large initial-condition ensembles. The framework is then applied to reanalysis data in order to estimate the respective contributions in observed heatwave events, specifically the 2021 Pacific Northwest heatwave. The climate change signal is first separated from the circulation and soil moisture covariates, such that a linear and additive model structure is representative of the statistical relationship of heatwave intensity with the considered process variables. It is shown that, by detrending and scaling, the covariates become comparable in mean and variance across climate model datasets. Thus, the pre-processing allows the relationship structure learned in climate model simulations to be transferred to heatwave events in reanalysis data. The statistical model benefits from the regional circulation field information by pre-training parameters in a stationary pre-industrial environment and regularising these estimates, thus guarding against overfitting.

Estimated GEV parameters provide valuable information on the relationship of local extreme temperatures with global warming, indicating that heatwave events intensify beyond the summer mean temperature trends. In the case of the PNW area, the 7 d annual temperature maxima increase by almost

For the PNW 2021 heatwave event, the dominant contribution of the anti-cyclone and the amplification by global warming is confirmed by our analysis. The event magnitude is estimated to be increased by 2.1

The statistical model is also a tool to approach “what if” questions, such as by how much the event would intensify given the same dynamic conditions but at a different warming level or how much additional warming is needed for the same event intensity to occur under average circulation conditions. Changes in the exceedance probability under varying large- and regional-scale conditions can also be assessed. For instance, we estimate that the additional warming at a

Given the capabilities and limitations of the method, it can serve as an additional tool to characterise and assess both simulated and observed low-likelihood heatwave extremes. The method can further be extended by not only applying the model to observational data but also integrating observations into the estimation procedure, such that the distribution estimated from climate model data is also constrained by observations

All original CMIP6, ERA5 and ERA5-Land reanalysis data used in this study are publicly available.

CMIP6:

ERA5:

ERA5-Land:

Pre-processed data (including CESM12 large ensemble model data) are available at

The supplement related to this article is available online at:

JZ: conceptualisation, data curation, methodology, software, formal analysis, writing, visualisation. EMF: conceptualisation, methodology, formal analysis, writing, supervision.

The contact author has declared that neither of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to thank the associate editor Likun Zhang, Jonathan Koh and three anonymous reviewers for their helpful and constructive feedback and suggestions. We further thank Christoph Frei for the discussion of GEV properties and scoring measures and Urs Beyerle, Ruth Lorenz and Lukas Brunner for the maintenance of the CESM and CMIP6 climate model data archive. We thank the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modelling groups for producing and making available the model output.
We also thank all the scientists, software engineers and administrators who contributed to the development of CESM.
The analysis was carried out in R

This research has been supported by the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (grant no. 200020_178778) and the European Commission, Horizon 2020 (grant no. 101003469).

This paper was edited by Likun Zhang and reviewed by Jonathan Koh and three anonymous referees.