We present a method for the analysis and compact description of large-scale multivariate weather extremes. Spatial patterns of extreme events are identified using the tail pairwise dependence matrix (TPDM) proposed by Cooley and Thibaud (2019). We also introduce the cross-TPDM to identify patterns of common extremes in two variables. An extremal pattern index (EPI) is developed to provide a pattern-based aggregation of temperature. A heat wave definition based on the EPI is able to detect the most important heat waves over Europe. As an extension to simultaneous extremes in two variables, we propose the threshold-based EPI (TEPI), which captures the compound character of spatial extremes. We investigate daily temperature maxima and precipitation deficits at different accumulation times and find evidence that preceding precipitation deficits have a significant influence on the development of heat waves and that heat waves often co-occur with short-term drought conditions. Using the European heat waves of 2003 and 2010 as examples, we show that the TEPI is suitable for describing the large-scale compound character of heat waves.

Extreme weather events over Europe, such as heat waves, droughts, and heavy rainfall, have repeatedly attracted attention in recent years because of their severe impacts on socioeconomic systems. The most prominent European heat waves of the 21st century occurred in 2003, 2010, and 2018. The high temperatures, which lasted from several weeks to months, combined with concurrent drought periods, had disastrous effects on socioeconomic and natural systems, such as an increased number of wildfires, crop failures, and increased mortality.

Several studies indicate that an increase in the global mean temperature and its variability leads to more frequent heat waves and droughts with longer durations and greater severity.

Several definitions and indices for droughts and heat waves can be found in the literature. Heat wave indices usually require the daily maximum temperature to exceed a certain threshold for several consecutive days.

Detection and attribution (D&A) provides powerful statistical tools for analyzing the complex causes and effects of climate change. Reviews of the application of D&A to meteorological data are given in

There is a variety of statistical methods of differing complexity for information compression. A relatively simple approach, for example, is to aggregate the data spatially, but this usually neglects the spatial dependence structure of the underlying extreme values. Other techniques include filtering with Fourier or wavelet transforms and data-adaptive methods such as principal component analysis (PCA). However, most data compression methods focus on the bulk of the underlying distribution, as described by first- and second-order moments, and provide little to no information about its tail. Data-adaptive methods have been widely used to study the spatial structure of heat waves. For example,

For the D&A of extreme events, mathematical methods are needed that specifically target extremes. The framework in which these methods operate is multivariate extreme value theory (MEVT), which has broad applications in mathematical and meteorological research.

Another promising approach comes from

In this paper, we introduce the extremal pattern index (EPI) for spatial weather extremes. The EPI is constructed based on the leading patterns derived from a PCA for extremes, as proposed by

We use daily maximum 2 m temperature (maxT2m) from the COSMO-REA6 reanalysis to derive the TPDM and identify spatial patterns associated with temperature extremes. The resulting EPI describes the intensity and spatial extent of heat waves in the area under consideration. To test the suitability of the EPI for describing heat waves, we focus on the European heat waves of 2003 and 2010, which have been studied extensively, and compare our findings with the existing literature.

We further explore the potential use and benefits of the TPDM in describing co-occurring heat waves and meteorological droughts. We again use maxT2m and define the precipitation deficit (PD) as the additive inverse of daily precipitation accumulated over periods ranging from 11 to 47 d. To identify the pairwise extremal dependence between both variables, we introduce the cross-TPDM and provide an estimator for it. We decompose the estimated cross-TPDM using a singular value decomposition (SVD), which provides insight into the co-occurrence of extreme spatial patterns in two variables. Finally, we apply an analogous definition of the EPI based on the SVD of the cross-TPDM. To target the index at co-occurring events, we define a threshold-based variant of the EPI estimator (TEPI).

The remainder of the paper is organized as follows. First, the theoretical framework of regular variation underlying the definition of the TPDM is presented in Sect.

The definition of the TPDM is based on regularly varying random variables. Regularly varying variables are defined by the tail behavior of their distribution function

Now let

To understand the derivation and definition of the TPDM, we explore the properties of a regularly varying random variable. This exploration is done through the lens of weak convergence

Let

Pre-processing is required when using meteorological data. Again we follow

The eigenvalue decomposition (EVD) of a covariance matrix is a standard tool in climate data analysis for reducing the dimensionality of high-dimensional weather and climate data.
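The extremal PCA steps can be illustrated with a minimal sketch, assuming a TPDM estimate `sigma` is already available and assuming the transformation t(y) = log(1 + exp(y)) of Cooley and Thibaud (2019) between the real line and the positive reals. The helper names (`softplus`, `inv_softplus`, `extremal_pca`, `pcs`, `reconstruct`) and the synthetic inputs are hypothetical and are not taken from the ExtrPatt package.

```python
import numpy as np

# Hypothetical helpers; assumes t(y) = log(1 + exp(y)) maps R -> (0, inf).
def softplus(y):
    return np.logaddexp(0.0, y)  # log(exp(0) + exp(y)) = log(1 + exp(y))

def inv_softplus(x):
    # log(exp(x) - 1) = x + log(1 - exp(-x)), numerically stable for x > 0
    return x + np.log(-np.expm1(-x))

def extremal_pca(sigma):
    """Eigendecomposition of a symmetric TPDM, sorted by decreasing eigenvalue."""
    eigval, eigvec = np.linalg.eigh(sigma)  # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    return eigval[order], eigvec[:, order]

def pcs(x_pos, eigvec):
    """PC time series: project t^{-1}(x) onto the eigenvectors (rows = time steps)."""
    return inv_softplus(x_pos) @ eigvec

def reconstruct(v, eigvec, k):
    """Reconstruction from the leading k modes, mapped back to (0, inf)."""
    return softplus(v[:, :k] @ eigvec[:, :k].T)

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=(200, 6))   # stand-in for positive, Frechet-scale data
a = rng.uniform(size=(6, 6))
sigma = a @ a.T                            # stand-in symmetric positive semidefinite matrix
eigval, eigvec = extremal_pca(sigma)
v = pcs(x, eigvec)
x_full = reconstruct(v, eigvec, k=6)       # all modes: recovers x
x_k2 = reconstruct(v, eigvec, k=2)         # leading two modes only
```

Because the eigenvectors are orthonormal, keeping all modes returns the input exactly; truncating to the leading modes yields a compressed description that still lies in the positive orthant.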

The TPDM is defined on the basis of positive real numbers. The transformation to Fréchet margins leads to the input variable
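A rank-based transformation to Fréchet margins can be sketched as follows. We assume a tail index α = 2, as used by Cooley and Thibaud (2019), and the empirical distribution function rank/(n + 1); the paper's pre-processing may estimate the marginal distributions differently, and the function name `to_frechet` is hypothetical.

```python
import numpy as np

def to_frechet(x, alpha=2.0):
    """Hypothetical helper: map each column of x (n_time x n_grid) to Frechet(alpha) margins."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Ranks 1..n per column; ties are broken by order of appearance (stable sort)
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0) + 1
    u = ranks / (n + 1.0)                   # empirical CDF values strictly inside (0, 1)
    return (-np.log(u)) ** (-1.0 / alpha)   # Frechet quantile function

rng = np.random.default_rng(0)
temps = rng.normal(25.0, 4.0, size=(1000, 3))   # synthetic stand-in for maxT2m
xf = to_frechet(temps)
```

The transformation is monotone at each grid point, so the ordering of the original values is preserved while all margins share the same heavy tail.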

The TPDM presented above provides information on the pairwise dependencies between the margins of a random vector. Here we introduce the cross-TPDM as a measure of extremal dependence between two random vectors, such as spatial fields of maxT2m and PD.

Let

Let

The SVD of a cross-covariance matrix is a standard tool in climate data analysis for reducing the dimensionality of high-dimensional weather and climate data.
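The sketch below estimates a cross-TPDM and decomposes it by SVD. It is an illustration under stated assumptions, not the estimator defined in the paper: we assume both fields are already on Fréchet(2) margins, select exceedances by the L2 norm of the stacked vector (x, y), reuse the TPDM normalization m = p + q of Cooley and Thibaud (2019), and take the off-diagonal block of the resulting matrix. The function name `cross_tpdm` is hypothetical.

```python
import numpy as np

def cross_tpdm(xf, yf, prob=0.98):
    """Hypothetical cross-TPDM sketch for xf (n x p) and yf (n x q) on a common time axis."""
    z = np.hstack([xf, yf])
    r = np.linalg.norm(z, axis=1)              # L2 radius of the stacked vector
    exc = r > np.quantile(r, prob)             # keep only the largest radii
    w = z[exc] / r[exc, None]                  # angular components on the unit sphere
    p = xf.shape[1]
    m = z.shape[1]                             # assumed normalization m = p + q
    return m * (w[:, :p].T @ w[:, p:]) / exc.sum()   # p x q off-diagonal block

rng = np.random.default_rng(1)
n, p, q = 5000, 4, 3
xf = (-np.log(rng.uniform(size=(n, p)))) ** -0.5   # synthetic Frechet(2) samples
yf = (-np.log(rng.uniform(size=(n, q)))) ** -0.5
s_xy = cross_tpdm(xf, yf)
# Left singular vectors: patterns in the first field; right: patterns in the second field
u, s, vt = np.linalg.svd(s_xy, full_matrices=False)
```

Unlike the TPDM, the cross-TPDM is in general rectangular and not symmetric, which is why an SVD rather than an EVD is used.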

The decomposition of the cross-TPDM with

The cross-TPDM is defined on the basis of positive real numbers. The transformation to Fréchet margins leads to the input variables

To assess how many EVs or SVs are needed to represent the main features of the data, we use the fraction of explained covariance, which is
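For the EVD, the fraction of covariance explained by the leading K modes is commonly computed as the sum of the K largest eigenvalues divided by the sum of all eigenvalues. The sketch below assumes this standard form; for the SVD case, it offers the option of using squared singular values (a common convention in maximum covariance analysis), without claiming that this is the exact definition used here. `explained_fraction` is a hypothetical name.

```python
import numpy as np

def explained_fraction(values, squared=False):
    """Cumulative explained fraction of the leading modes (hypothetical helper).

    values: eigenvalues of a TPDM, or singular values of a cross-TPDM.
    squared=True uses values**2, e.g., the squared covariance fraction for an SVD.
    """
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    if squared:
        v = v ** 2
    return np.cumsum(v) / v.sum()

frac = explained_fraction([4.0, 2.0, 1.0, 1.0])   # [0.5, 0.75, 0.875, 1.0]
n_modes = int(np.searchsorted(frac, 0.9) + 1)     # smallest K reaching 90 %: 4
```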

To examine the extent to which each pattern is dominated by individual events during the training period, and to find a number of modes that yields a representative subset of patterns, we perform 2-fold cross-validation. Several measures aim to find the “optimal” number of modes after PCA.

The reconstructions follow Sect.

We define an extremal pattern index (EPI) based on the PCs using the

The SVD of the cross-TPDM provides two sets of ECs, one for the left and one for the right singular vectors. We define the EPI for the cross-TPDM analogously as

This study relies on the COSMO-REA6 regional reanalysis.

We use daily maxT2m and accumulated PD. PD is defined as the additive inverse of precipitation accumulated over various periods between 11 and 47 d using a centered moving window (e.g., for an accumulation period of 11 d, PD at a given date includes that date and the 5 d preceding and following it). Centered accumulation of precipitation is less common in the literature, but we consider it appropriate for our application; a backward-looking accumulation would instead focus exclusively on the precipitation deficit before a heat wave (e.g., PD from 1 to 30 June in conjunction with maxT2m on 30 June). The spatial domain of COSMO-REA6 follows the CORDEX EUR-11 specifications.
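The centered accumulation can be sketched with a cumulative sum. We assume a plain sum over the window (a moving average differs only by the constant factor 1/window, which a per-grid-point standardization removes) and mark days without a full window as missing; how the paper treats the series edges is not specified here, and `precip_deficit` is a hypothetical name.

```python
import numpy as np

def precip_deficit(precip, window=11):
    """Hypothetical helper: centered accumulation of daily precipitation, sign-flipped.

    precip: array with time on axis 0. For window=11, day i uses days i-5 .. i+5.
    Days without a complete window are set to NaN.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of days")
    precip = np.asarray(precip, dtype=float)
    n, half = precip.shape[0], window // 2
    csum = np.concatenate([np.zeros((1,) + precip.shape[1:]), np.cumsum(precip, axis=0)])
    out = np.full(precip.shape, np.nan)
    # Sum over days i-half .. i+half equals csum[i+half+1] - csum[i-half]
    out[half:n - half] = csum[window:] - csum[:n + 1 - window]
    return -out   # additive inverse: larger values mean drier conditions

daily = np.arange(20, dtype=float)   # toy precipitation series (mm/day)
pd5 = precip_deficit(daily, window=5)
```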

Our pre-processing requires three steps: (1) To reduce spatial non-stationarity, we standardize the data at the grid point level. Along with standardization, we remove the seasonal cycle present in both maxT2m and PD. To this end, we estimate the mean and standard deviation at each grid point and for each day of the year and use them for standardization. (2) The spatial fields in COSMO-REA6 contain
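Step (1) can be sketched as follows, assuming a raw (unsmoothed) day-of-year climatology of mean and standard deviation at each grid point and at least two years per day of the year; the paper may smooth the climatology or pool neighboring days, which is not reproduced here. `standardize_by_doy` is a hypothetical name.

```python
import numpy as np

def standardize_by_doy(values, doy):
    """Hypothetical helper: standardize each grid point by its day-of-year mean and std.

    values: (n_days, n_grid); doy: (n_days,) integer day-of-year labels.
    """
    values = np.asarray(values, dtype=float)
    doy = np.asarray(doy)
    out = np.empty_like(values)
    for d in np.unique(doy):
        sel = doy == d
        mu = values[sel].mean(axis=0)          # climatological mean for this day
        sd = values[sel].std(axis=0, ddof=1)   # climatological standard deviation
        out[sel] = (values[sel] - mu) / sd
    return out

rng = np.random.default_rng(0)
doy = np.tile(np.arange(365), 10)   # 10 synthetic years without leap days
values = 10 * np.sin(2 * np.pi * doy / 365)[:, None] + rng.normal(size=(3650, 4))
z = standardize_by_doy(values, doy)
```

After this step, every grid point has zero mean and unit variance for each day of the year, so the seasonal cycle in mean and spread no longer dominates the extremes.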

The choice of the threshold for estimating the TPDM is always critical. If the threshold is too low, there is a risk that the asymptotic regime of the extreme value model has not yet been reached, which biases the estimates. Conversely, a threshold that is too high leaves too few data points to which the model can be fitted, which leads to large uncertainties in the estimates. To determine a suitable threshold, we consider the rank of the TPDM and assess its stability as a function of the threshold in Fig.
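The threshold check can be sketched as follows. The sketch uses the TPDM estimator of Cooley and Thibaud (2019) for data on Fréchet(2) margins, in which only observations whose L2 norm exceeds a high empirical quantile enter the estimate, and computes a numerical rank from the singular values. The quantile levels, the relative tolerance `rtol`, and the function names are illustrative assumptions. Since the estimate is a sum of rank-one terms, one per exceedance, its rank cannot exceed the number of exceedances, so a very high threshold necessarily lowers the rank.

```python
import numpy as np

def tpdm(xf, prob=0.98):
    """Hypothetical helper: TPDM estimate for xf (n_time x d) on Frechet(2) margins."""
    r = np.linalg.norm(xf, axis=1)     # radial component
    exc = r > np.quantile(r, prob)     # radial threshold at the prob quantile
    w = xf[exc] / r[exc, None]         # angular components with ||w||_2 = 1
    d = xf.shape[1]
    return d * (w.T @ w) / exc.sum()

def numerical_rank(sigma, rtol=1e-8):
    s = np.linalg.svd(sigma, compute_uv=False)   # singular values, descending
    return int(np.sum(s > rtol * s[0]))

rng = np.random.default_rng(2)
xf = (-np.log(rng.uniform(size=(2000, 50)))) ** -0.5   # synthetic Frechet(2) data
for prob in (0.90, 0.95, 0.98, 0.99):
    print(prob, numerical_rank(tpdm(xf, prob)))
```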

Rank of the TPDM, calculated for maxT2m and varying thresholds

Scatter plots of Fréchet-transformed maxT2m and PD (black dots) at two grid points in COSMO-REA6 at Paris and Bonn. PD has been calculated from 15 d accumulated precipitation. The circle indicates the respective 98 % quantile, and the values inside the circle (gray shading) are not used for the estimation of the (cross-)TPDM.

Figure

Figure

As

First 12 (from top to bottom, left to right) EVs of the TPDM for maxT2m in northern hemispheric summer months (JJA). Each vector is transformed to the positive real numbers

We start our analysis with the estimation of the TPDM for maxT2m. The estimation is based on maxT2m values exceeding the local 98 % quantile on the reduced grid (see Sect.

Higher-order EVs show large-scale patterns associated with the typical dipole and multipole structures also known from PCA. Since the EVs are transformed on

An important question is how representative the EVs are and beyond which number of modes additional modes no longer add significant information and overfitting prevails. In Figs.

Explained variance

Our calculation of the

Overview of the temporal evolution of the 2003 heat wave. Top: pattern of mean anomalies within the periods

We now examine the heat waves of 2003 and 2010 in more detail using our framework. Both heat waves have been studied and analyzed extensively in the literature. The 2010 heat wave has the highest peak EPI in the period considered, which indicates that it was the most extreme event in terms of its combination of intensity and spatial extent. The 2003 heat wave, on the other hand, did not reach the highest peak but is characterized by its long and pronounced duration. Both heat waves had significant impacts on socioeconomic systems, underlining their importance for understanding extreme heat events.

Overview of the temporal evolution of the 2010 heat wave. Top: pattern of mean anomalies within the periods

In Fig.

Periods of the strongest heat waves defined by EPI and GRD events. Periods shown in black are among the 15 strongest events according to the respective index (EPImean, GRDsize, and GRDmean). Periods in bold are heat waves ranked lower than 15th, and underlined periods have no counterpart in the respective other index (EPI or GRD).

To further evaluate the suitability of

We define an EPI event as a period when

The framework of Cooley and Thibaud requires independent and identically distributed samples. This assumption is generally not satisfied for daily values of maxT2m and PD. However, our method turns out to be quite robust in this respect. We evaluate this in Appendix Fig. A3 by comparing the EPI calculated from the patterns of daily maxT2m with the EPI calculated from the patterns of weekly maxima of maxT2m. The EPI is robust to small deviations in individual patterns. The range of EPI values changes slightly since the first 10 modes explain a higher proportion of the variance.
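Weekly maxima for such a robustness check can be formed as non-overlapping 7-day block maxima. This sketch assumes consecutive blocks starting at the first day of the series and drops an incomplete final week; how the weeks are aligned in the paper is not specified here, and `weekly_maxima` is a hypothetical name.

```python
import numpy as np

def weekly_maxima(daily):
    """Hypothetical helper: non-overlapping 7-day block maxima along axis 0.

    A trailing incomplete week is dropped.
    """
    daily = np.asarray(daily, dtype=float)
    n_weeks = daily.shape[0] // 7
    blocks = daily[: n_weeks * 7].reshape((n_weeks, 7) + daily.shape[1:])
    return blocks.max(axis=1)

wk = weekly_maxima(np.arange(15.0))   # weeks 0-6 and 7-13; day 14 is dropped
```

Block maxima reduce the serial dependence between consecutive samples at the cost of a sevenfold smaller sample size.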

We have shown that the decomposition of the TPDM provides a set of spatial patterns that describe extremal dependencies in maxT2m and leads to a suitable definition of a heat wave index. We now turn to the cross-TPDM, which encodes the extremal covariability of maxT2m and PD. Commonly used drought indices, e.g., the standardized precipitation index (SPI), are calculated for relatively long accumulation times from 3 to 48 months.

To allow the computation of the cross-TPDM for many different accumulation times, we need to lower the computational costs by further reducing the grid size. For this purpose, we increase the neighborhood in the COSMO-REA6 grid to

Not only the duration of a dry spell but also its timing relative to the heat wave can be important for dynamical interpretation. For example, there is evidence that a dry phase can promote the development of an intense heat period.

First and first

We assess statistical significance based on the singular values. To this end, we generate a 100-member bootstrap sample in which maxT2m and PD of a season are drawn independently, so that the temporal relationship between the two variables is destroyed while seasonality is preserved. In Fig.
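A minimal sketch of such a season-wise bootstrap, under assumptions: the data are arranged as (season, day, grid point) arrays on Fréchet(2) margins, seasons of maxT2m and PD are resampled independently with replacement, and the cross-TPDM is estimated as the off-diagonal block of the TPDM of the stacked vector with normalization m = p + q (an assumption borrowed from the TPDM estimator of Cooley and Thibaud, 2019). The function names and the toy data are hypothetical. An observed leading singular value above, e.g., the 95 % quantile of the bootstrap values would then be judged significant.

```python
import numpy as np

def leading_sv(xf, yf, prob=0.98):
    """Hypothetical helper: leading singular value of a cross-TPDM sketch."""
    z = np.hstack([xf, yf])
    r = np.linalg.norm(z, axis=1)
    exc = r > np.quantile(r, prob)
    w = z[exc] / r[exc, None]
    p = xf.shape[1]
    s_xy = z.shape[1] * (w[:, :p].T @ w[:, p:]) / exc.sum()
    return np.linalg.svd(s_xy, compute_uv=False)[0]

def season_bootstrap(x_seasons, y_seasons, n_boot=100, prob=0.98, seed=0):
    """Null distribution of the leading singular value.

    x_seasons: (n_seasons, n_days, p); y_seasons: (n_seasons, n_days, q).
    Whole seasons are drawn independently for x and y, which breaks the
    day-to-day pairing of the two variables (apart from occasional coincident
    draws) while keeping the within-season structure of each variable intact.
    """
    rng = np.random.default_rng(seed)
    n_seasons = x_seasons.shape[0]
    null = np.empty(n_boot)
    for b in range(n_boot):
        ix = rng.integers(0, n_seasons, size=n_seasons)
        iy = rng.integers(0, n_seasons, size=n_seasons)
        xb = x_seasons[ix].reshape(-1, x_seasons.shape[-1])
        yb = y_seasons[iy].reshape(-1, y_seasons.shape[-1])
        null[b] = leading_sv(xb, yb, prob)
    return null

rng = np.random.default_rng(3)
base = (-np.log(rng.uniform(size=(20, 90, 3)))) ** -0.5   # synthetic Frechet(2) seasons
x_seasons, y_seasons = base, base.copy()                  # perfectly dependent toy case
observed = leading_sv(x_seasons.reshape(-1, 3), y_seasons.reshape(-1, 3))
null = season_bootstrap(x_seasons, y_seasons, n_boot=50)
significant = observed > np.quantile(null, 0.95)
```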

For short accumulation times, the association is maximal for instantaneous anomalies. For longer accumulation times, it is maximal for negative shifts, i.e., when anomalies in PD precede the maxT2m anomalies. This indicates that a heat wave preferentially forms when it is preceded by a drought period. Considering the sum of more singular values (see Appendix Fig.

Second, third, and fourth left and right SVs of the cross-TPDM, associated with anomalies in maxT2m

Scatter plot of

We now turn to the large-scale patterns of extremes in maxT2m and PD. In Fig.

As in Sect.

Our investigation focuses on three cases with different accumulation periods and lead times for PD anomalies: one with 11 d accumulations and instantaneous anomalies, one with 35 d accumulations and a PD anomaly lead time of 8 d, and one with 93 d accumulations and a PD anomaly lead time of 38 d. The first case represents short-term coincident events, while the second and third cases represent monthly and seasonal PD driving a heat wave. In all cases, we find significant statistical dependence between maxT2m and PD (Fig.

Both the 2003 and 2010 heat waves show a strong signal in instantaneous TEPI. TEPI

Furthermore, the 2003 heat wave was preceded by a PD that was mainly evident on a seasonal timescale, whereas the 2010 heat wave was preceded by a more pronounced PD on a monthly timescale. In both cases, the high agreement of

Thus, the heat waves in 2003 and 2010 display different dynamics concerning common extremes of PD and maxT2m. Our TEPI definition effectively captures these characteristics, demonstrating that both events meet the IPCC definition of compound events as “two or more extreme events occurring simultaneously or sequentially”.

We thus demonstrate that our definition of TEPI for different accumulation times and temporal shifts summarizes many characteristics of compound extremes and represents a suitable index.

In this study, we apply the promising approach of

We define the extremal pattern index (EPI) based on the PC time series and obtain a pattern-based spatial aggregation index. This allows us to identify spatially related extreme events with minimal reliance on predefined thresholds and regions of influence in grid point space. Although we cannot completely eliminate the use of a threshold, as it is fundamental for estimating extremal dependence with the TPDM, all data, including those below the threshold, enter the EPI estimator. We show that, as a result, little information relevant to the heat wave description is lost.

Using the 2003 and 2010 European heat waves as examples, we examine how well the EPI captures the essential characteristics of heat waves. The EPI effectively illustrates the temporal progression of both heat waves, in agreement with the findings of previous analyses. Some characteristics, such as the collapse of the 2010 heat wave between 10 and 14 August, are clearly identified by the EPI yet have received little attention in the existing literature. Besides spatial extent and intensity, heat wave definitions usually also require temporal persistence. This condition can also be imposed on the EPI: in Sect. 7, for example, we define a heat wave (i.e., an EPI event) as a period in which the EPI exceeds a threshold for at least 5 consecutive days. This EPI-based definition gives results that are highly consistent with a conventional heat wave definition, which shows that the EPI is a suitable alternative to conventional heat wave indices with minimal reliance on user-defined thresholds. The EPI is thus a powerful tool for, e.g., attribution studies, climate monitoring, and the characterization of extreme weather conditions.
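The persistence criterion can be sketched as a run-length search on the index time series. The threshold value and the strict inequality are assumptions for illustration; only the minimum duration of 5 consecutive days is taken from the text, and `find_events` is a hypothetical name.

```python
import numpy as np

def find_events(index, threshold, min_len=5):
    """Hypothetical helper: (start, end) pairs, end inclusive, of runs where
    index > threshold for at least min_len consecutive time steps."""
    above = np.asarray(index) > threshold
    # Pad with False so that runs touching either end of the series are closed
    padded = np.concatenate([[False], above, [False]]).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)       # first day of each run
    ends = np.flatnonzero(edges == -1) - 1    # last day of each run
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s + 1 >= min_len]

epi = np.array([0, 3, 3, 3, 3, 3, 0, 3, 3])
events = find_events(epi, threshold=2)   # only the 5-day run qualifies: [(1, 5)]
```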

Since pronounced heat waves over Europe cover a large part of the area studied, the spatial mean describes the course of the heat waves relatively well. Nevertheless, the higher modes of the PCs contribute significantly to the description of heat waves, as we discuss in Sect.

To include temporal persistence as an additional condition, one could retain only those patterns that exhibit a certain persistence and then aggregate them over time. However, too strict a condition could be detrimental.

After initially focusing on the description of heat waves using maxT2m, we extend our approach beyond individual variables and deal with common extremes in maxT2m and PD. To this end, we propose the cross-TPDM as an analog of the cross-covariance matrix that describes pairwise extremal dependencies between the two variables. Previous studies of compound events have usually considered monthly or even seasonal variables.

Last but not least, we extend the EPI from a single variable to simultaneous extremes in two variables. We introduce a threshold-based estimator, the TEPI, which ensures simultaneous extremes of the underlying patterns. To demonstrate the utility of the TEPI, we again focus on the 2003 and 2010 heat waves and use it to analyze their different dynamics in terms of compound extremes in maxT2m and PD. We analyze both heat waves using two event types: short-term coincident events and long-term precipitation deficits triggering a heat wave. In both cases, the TEPI reproduces the known characteristics of these events, demonstrating its value as a tool for analyzing compound extremes. The TEPI thus allows a focus on specific events. A disadvantage of our formulation of the TEPI is the hard condition of exceeding a threshold (see Fig.

There are several indicators for assessing climate indices

The computational cost of calculating the TPDM for high-resolution climate models is a general problem that affects methods dealing with spatial dependencies. It could possibly be addressed by a clever mathematical reformulation or by increasing the available memory. As shown in Fig.

The TPDM assumes asymptotic dependence. However, we recognize that this assumption may not hold for maxT2m, PD, and their interdependencies. As

Code for data pre-processing and estimation of (cross-)TPDM and (T)EPI is provided as the R package “ExtrPatt”. It is available at

The COSMO REA6 dataset is available via the DWD Climate Data Centre at

SvS and PF developed the idea for this study. Data processing, analysis, and visualization were performed by SvS, who also led the writing, with contributions from PF. Both authors contributed to the discussion and interpretation of the results.

The contact author has declared that neither of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

This article is part of the special issue “Past and future European atmospheric extreme events under climate change”. It is not associated with a conference.

This work was conducted as part of the ClimXtreme project Module B/CoDEx. We are grateful to our project partners Marco Oesting and Carolin Forster from the University of Stuttgart and Sebastian Buschow, University of Bonn, for valuable suggestions and fruitful discussions.

This research has been supported by the Bundesministerium für Bildung und Forschung (ClimXtreme Module B/CoDEx) (grant no. 01LP1902A).

This paper was edited by Dan Cooley and reviewed by three anonymous referees.