In this study we detect and quantify changes in the distribution of the annual maximum daily maximum temperature (TXx) in a large observation-based gridded data set of European daily temperature during the years 1950–2018. Several statistical models are considered, each of which analyses TXx using a generalized extreme-value (GEV) distribution with the GEV parameters varying smoothly over space. In contrast to several previous studies which fit independent GEV models at the grid-box level, our models pool information from neighbouring grid boxes for more efficient parameter estimation. The GEV location and scale parameters are allowed to vary in time using the log of atmospheric

The greenhouse effect, whereby increasing levels of greenhouse gases in the Earth's atmosphere lead to a warming of the climate system, has long been understood

Temperature extremes, which may manifest in more intense heatwaves and enhance the risk of fires, pose a risk to human health

Several previous studies consider changes in the probability distribution of daily temperature and infer that similar changes should also hold for extremes.

In this paper we consider statistical models for the variable TXx at approximately 12 000 locations of a gridded data set in a large subset of Europe. We consider the question of whether, over various large sub-regions of Europe, there is evidence for changes in the distributions of TXx and, if so, how such changes are best described. Our approach can, informally, be viewed as macroscopic, since we are interested in detecting changes in TXx on a large scale rather than at any one specific geographic location.
We fit statistical models that allow for changes in both the location and scale of the TXx distributions. A change in the location of the TXx distribution corresponds to a horizontal shift in the distribution, with the mean and all quantiles being shifted by the same amount.
A change in scale corresponds to a horizontal stretching or compression of the distribution, which in turn changes measures of variability, such as the variance of TXx.
Figure

The solid black curve shows a hypothetical probability density function of TXx. The dashed curve illustrates the effect of a shift in the location of the distribution towards hotter temperatures, while the dotted curve illustrates a change in the scale, leading to greater variability in TXx.
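The distinction between a location change and a scale change can be checked numerically. The following is a minimal sketch, assuming the standard GEV quantile function; the parameter values are hypothetical and chosen only for illustration, not taken from any fitted model in this paper.

```python
import math


def gev_quantile(p, mu, sigma, xi):
    """Quantile of a GEV(mu, sigma, xi) distribution at probability p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    y = -math.log(p)
    if abs(xi) < 1e-12:  # Gumbel limit as xi -> 0
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


# Hypothetical parameter values (degrees Celsius), for illustration only.
base = (30.0, 2.0, -0.2)
shifted = (31.5, 2.0, -0.2)    # location increased by 1.5
stretched = (30.0, 2.6, -0.2)  # scale multiplied by 1.3

probs = (0.1, 0.5, 0.9, 0.99)

# A location change moves every quantile by the same amount.
shift_effect = [gev_quantile(p, *shifted) - gev_quantile(p, *base) for p in probs]

# A scale change multiplies the distance between any two quantiles.
spread_base = gev_quantile(0.9, *base) - gev_quantile(0.1, *base)
spread_stretched = gev_quantile(0.9, *stretched) - gev_quantile(0.1, *stretched)
```

In this sketch every entry of `shift_effect` equals the location change, while the ratio `spread_stretched / spread_base` equals the factor applied to the scale parameter.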

Most of the studies mentioned above treat the data occurring at different geographic locations in an independent manner, fitting separate statistical models to the data at each location. One difficulty with this approach in the context of extremes is that, as extreme observations are by definition rare, we will only have a small sample at each location, making precise estimation of trends problematic. Although it may be unreasonable to assume a common trend at every geographic location of a large spatial domain, we would nonetheless expect nearby regions to be similarly affected by climate change. There are several classes of models, such as varying coefficient models

The lack of availability of high-resolution, continental-scale, temporally complete and homogenized observational data, together with the impracticality of performing large-scale controlled experiments on the climate system, means that climate researchers often rely on gridded data products

Gridding of station data is performed using aggregation of stations within spatial boxes, often with estimates of uncertainty. This yields estimates of area-averaged data that are comparable with climate model data over a similarly sized grid box, making them widely used for climate model evaluation

The structure of the paper is as follows. Section

We use the daily E-OBS data, publicly available through the European Climate Assessment and Dataset (ECA&D) project. E-OBS is based on observational data from an underlying network of weather stations interpolated onto a regular

E-OBS is frequently used as a benchmark at the European scale

Both

In addition to inhomogeneities, a further issue with observation-based gridded data is that in regions with very low station density, grid-box areal averages may be poorly estimated and have large interpolation uncertainties. However, these problems are less severe for a spatially smooth variable such as temperature in comparison to precipitation

A plot of the station network density used in E-OBS can be found in

For atmospheric

The spatial domain considered, showing the maximum value of the variable TXx (annual maximum daily maximum temperature) at each grid box during the period 1950–2018.

Our approach is based on fitting generalized extreme-value (GEV) distributions to the TXx values at each grid box. Another possible and theoretically well-founded approach to modelling extremes is the peaks-over-threshold method

Just as variations in the mean of a large number of independent and identically distributed random variables are naturally modelled by a normal (Gaussian) random variable, variations in the sample maximum are most naturally modelled by a GEV random variable with distribution function
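For reference, the standard textbook form of the GEV distribution function is given below. The symbols $\mu$ (location), $\sigma > 0$ (scale) and $\xi$ (shape) are the conventional ones and are an assumption here, since the notation of the original equation may differ.

```latex
G(x) =
\begin{cases}
\exp\left\{-\left[1 + \xi\,\dfrac{x-\mu}{\sigma}\right]_{+}^{-1/\xi}\right\}, & \xi \neq 0,\\[2ex]
\exp\left\{-\exp\left(-\dfrac{x-\mu}{\sigma}\right)\right\}, & \xi = 0,
\end{cases}
\qquad [a]_{+} = \max(a, 0).
```

In this parameterization, $\xi > 0$ gives the heavy-tailed Fréchet class, $\xi = 0$ the Gumbel class, and $\xi < 0$ the Weibull class, which has a finite upper end point at $\mu - \sigma/\xi$.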

The three classes (

Suppose that, in grid box

Having estimated

From Eq. (

An alternative to maximum-likelihood estimation that is more robust to small sample sizes is the method of L-moments or, equivalently, probability-weighted moments
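As an illustration of the L-moment approach, the sketch below estimates GEV parameters from the first three sample L-moments, using the rational approximation for the shape parameter due to Hosking et al. (1985). It is a generic sketch rather than the procedure used in this paper; the function names are hypothetical, and the shape is returned in the convention where negative values correspond to a bounded upper tail.

```python
import math
import random


def sample_lmoments(x):
    """First three sample L-moments via unbiased probability-weighted moments."""
    xs = sorted(x)
    n = len(xs)
    if n < 3:
        raise ValueError("at least three observations are required")
    b0 = sum(xs) / n
    b1 = sum(i * xs[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * xs[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    return b0, 2.0 * b1 - b0, 6.0 * b2 - 6.0 * b1 + b0


def gev_lmoment_fit(x):
    """Return (mu, sigma, xi) estimated by the method of L-moments."""
    l1, l2, l3 = sample_lmoments(x)
    t3 = l3 / l2  # L-skewness
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c  # Hosking's shape, k = -xi
    if abs(k) < 1e-9:  # Gumbel limit
        sigma = l2 / math.log(2.0)
        mu = l1 - 0.5772156649 * sigma
        return mu, sigma, 0.0
    g = math.gamma(1.0 + k)
    sigma = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    mu = l1 - sigma * (1.0 - g) / k
    return mu, sigma, -k


def simulate_gev(n, mu, sigma, xi, seed=0):
    """Draw n GEV variates by inverse-transform sampling (xi != 0)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        out.append(mu + sigma * ((-math.log(u)) ** (-xi) - 1.0) / xi)
    return out


# Hypothetical "true" parameters, used only to check that the fit recovers them.
sample = simulate_gev(4000, mu=30.0, sigma=2.0, xi=-0.2, seed=1)
mu_hat, sigma_hat, xi_hat = gev_lmoment_fit(sample)
```

With only 69 annual maxima per grid box, as in the 1950–2018 record, the estimates would be considerably noisier than in this large simulated sample.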

Maximum-likelihood estimates of the GEV parameters, fitted to the TXx values (

In this section we describe the statistical models that we fit to the E-OBS data. For computational convenience, and also to allow for the possibility that different models may be better suited to different regions, we partition our spatial domain into eight sub-regions, which are defined in Table

The various sub-regions of the domain that the models from Table

For statistical modelling of TXx, the log-likelihood function in Eq. (

The dependency of the GEV parameters on time can be linked to that of a climatological covariate, and for this purpose we will use the atmospheric concentration of

We assume that the annual maximum temperature in grid box

The Gaussian Markov random field (GMRF) penalty allows us to formalize the belief that grid boxes that are near to each other are more likely to have parameter values that are similar than those that are far apart. In order to define the GMRF penalty, we are required to specify a neighbourhood structure for our domain. Specifically, for each grid box
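To make the idea of a neighbourhood structure concrete, the following sketch builds first-order neighbours (boxes sharing an edge) on a regular rectangular grid and evaluates a quadratic penalty equal to the sum of squared differences between neighbouring parameter values. Both the choice of neighbourhood and the exact form of the penalty are assumptions made for illustration; the definitions used in this paper may differ.

```python
def grid_neighbours(n_rows, n_cols):
    """Map each grid-box index to the indices of the boxes sharing an edge with it."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbours[r * n_cols + c] = [
                rr * n_cols + cc
                for rr, cc in candidates
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbours


def gmrf_penalty(values, neighbours):
    """Sum of squared differences over neighbouring pairs, each pair counted once."""
    total = 0.0
    for i, js in neighbours.items():
        for j in js:
            if i < j:
                total += (values[i] - values[j]) ** 2
    return total


def penalized_objective(log_likelihood, values, neighbours, smoothing):
    """Log-likelihood minus a smoothness penalty (hypothetical scaling of 1/2)."""
    return log_likelihood - 0.5 * smoothing * gmrf_penalty(values, neighbours)


nb = grid_neighbours(2, 3)
flat_field = [1.0] * 6                        # spatially constant parameter
rough_field = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # parameter varying between boxes
```

A spatially constant field incurs no penalty, so a larger smoothing parameter pushes the fitted parameter surface towards agreement between neighbouring grid boxes.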

The objective function in Eq. (

Although commonplace, the conditional independence assumption implied by Eq. (

The model that has been described so far in this section contains only a single covariate in the GEV location parameter, and we have seen how the effect of this covariate on the annual maximum temperature can be modelled as smoothly varying over space by using the GMRF penalty. The value,

The most complex model is Mod4, which corresponds to the following formulas for the GEV parameters:

Comparison of Mod1–Mod5 according to the inclusion of a trend in

To illustrate the effect and benefit of using the GMRF smoothing penalty, we compare, for region UKRI, the independent grid-box fits based on maximizing Eq. (

Plots

In Sect.

Comparison of model scores, defined in Appendix

Approximate 95 % confidence intervals for the spatially averaged 100-year return-level differences (

The difference (

Risk ratios and approximate 95 % confidence interval limits, calculated by Monte Carlo simulation as described in Sect.

Changes in the GEV location and scale parameters over the period 1950–2018 (2018 parameter values minus 1950 values) and approximate 95 % confidence interval limits, calculated by Monte Carlo simulation as described in Sect.

Approximate 95 % confidence intervals for the spatially averaged changes in the GEV location and scale parameters (2018 parameter value minus 1950 parameter value) for each region defined in Table

Plots showing the distribution (density function) of TXx based on spatially averaged values of the fitted GEV parameters in 1950 (solid black curve) and 2018 (dashed curve) for each region defined in Table

Another way that we quantify changes in the distribution of the annual maximum temperatures is via risk ratios
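One common form of risk ratio compares the probability of exceeding a fixed threshold under two sets of GEV parameters. The sketch below takes the threshold to be the 100-year return level under the earlier parameters, so that the baseline exceedance probability is 0.01. This choice of threshold, the function names and the parameter values are assumptions for illustration and may not match the definition used in this paper.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """Distribution function of GEV(mu, sigma, xi)."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Below the lower end point (xi > 0) or above the upper end point (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def return_level(period, mu, sigma, xi):
    """Level exceeded with probability 1/period in any one year."""
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


def risk_ratio(threshold, params_then, params_now):
    """Ratio of exceedance probabilities of threshold: now relative to then."""
    p_then = 1.0 - gev_cdf(threshold, *params_then)
    p_now = 1.0 - gev_cdf(threshold, *params_now)
    if p_then == 0.0:
        raise ValueError("threshold cannot be exceeded under the earlier parameters")
    return p_now / p_then


# Hypothetical parameter values (degrees Celsius), for illustration only.
params_1950 = (30.0, 2.0, -0.2)
params_2018 = (31.5, 2.2, -0.2)

z100 = return_level(100.0, *params_1950)
rr = risk_ratio(z100, params_1950, params_2018)
```

A value of `rr` above 1 indicates that the event which was a 1-in-100-year event under the earlier parameters has become more likely under the later ones.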

All the models were fitted using the R

Table

Table

Figure

Changes in the GEV location and scale parameters over the period 1950 to 2018 calculated using Mod4 are shown in Fig.

In the same notation as Sect.

We have considered the problem of detecting and quantifying large-scale changes in the distributions of the annual maximum daily maximum temperature (TXx) in a large subset of Europe during the years 1950–2018. Our approach was to divide the full domain into eight sub-regions over which several statistical models were fitted. In each of the models considered, TXx at each grid box was modelled using a generalized extreme-value (GEV) distribution with the GEV location and scale parameters allowed to vary in time using atmospheric

We use several scoring rules that evaluate the performance of models based on their ability to predict unseen data, i.e. data that were held out from the model-fitting procedure. A scoring rule is a function,

Histograms of probability integral transform (PIT) values by region for Mod4. If the model is correct, the PIT values are uniformly distributed.

A negatively oriented scoring rule

One of the simplest and most common scoring rules is the squared error score

Another commonly used scoring rule is the continuous ranked probability score (CRPS) defined by
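Both scores can be approximated from a sample drawn from the predictive distribution. The sketch below uses the standard identity CRPS(F, y) = E|X − y| − ½ E|X − X′|, where X and X′ are independent draws from F, applied to the empirical distribution of the sample. The function names are hypothetical, and this estimator is an assumption: the paper may instead evaluate the CRPS in closed form or by numerical integration.

```python
def squared_error_score(sample, y):
    """Squared difference between the predictive mean and the observation y."""
    mean = sum(sample) / len(sample)
    return (mean - y) ** 2


def crps_from_sample(sample, y):
    """CRPS of the empirical distribution of sample, evaluated at observation y.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'| with expectations replaced by
    averages over the sample (all ordered pairs, including i == j).
    """
    m = len(sample)
    term_obs = sum(abs(x - y) for x in sample) / m
    term_spread = sum(abs(xi - xj) for xi in sample for xj in sample) / (m * m)
    return term_obs - 0.5 * term_spread


# A point forecast: the CRPS reduces to the absolute error.
point_score = crps_from_sample([1.0], 3.0)

# A two-member forecast straddling the observation.
pair_score = crps_from_sample([0.0, 2.0], 1.0)
```

Both scores are negatively oriented, so smaller values indicate better predictive performance. The double loop costs O(m²) operations, which is adequate for small samples.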

We use each of the scoring rules described above as part of a cross-validation scheme to evaluate a model's performance. This is described in Appendix

For each of the regions in Table

Suppose that in the cross-validation procedure described above, model

Hypothesis testing procedure for testing pair-wise exchangeability of model scores.

Set

Compute the randomized score difference;

The value of
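A generic version of such a test can be sketched as a paired permutation test: under exchangeability, swapping the two models' scores within any pair leaves the distribution of the scores unchanged, which is equivalent to flipping the sign of that pair's score difference. The code below is a sketch under that assumption, with hypothetical function and argument names; the exact test statistic and number of randomizations used in this paper may differ.

```python
import random


def paired_permutation_pvalue(scores_a, scores_b, n_perm=2000, seed=0):
    """Two-sided p-value for exchangeability of paired model scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score vectors must have the same length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    count = 0
    for _ in range(n_perm):
        # Randomly swap the two models' labels within each pair.
        randomized = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(randomized) >= abs(observed):
            count += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)


# Hypothetical scores: model B is uniformly better (lower score) than model A.
scores_a = [1.0 + 0.01 * i for i in range(30)]
scores_b = [0.0] * 30
p_different = paired_permutation_pvalue(scores_a, scores_b)
p_identical = paired_permutation_pvalue(scores_a, scores_a)
```

A small p-value is evidence against exchangeability, that is, evidence that one model scores systematically better than the other.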

Probability plots by region for Mod4.

In this Appendix we perform some visual checks for Mod4 to see whether this model provides a reasonable fit to the data and is not merely the best of a bad bunch of models. As it is not feasible to provide plots for every grid box, we consider the performance as a whole over the sub-regions as defined in Table

One simple way to check for any systematic discrepancies between a fitted model and the observed data is to use the probability integral transform (PIT).
The PIT states that, if
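In practice, the PIT check amounts to evaluating the fitted distribution function at each observation and examining whether the resulting values look uniform on (0, 1). The sketch below does this for GEV distributions using simulated data; the parameter values and function names are hypothetical and are not those of the fitted models.

```python
import math
import random


def gev_cdf(x, mu, sigma, xi):
    """Distribution function of GEV(mu, sigma, xi)."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def pit_values(observations, params):
    """PIT value F(y) for each observation y with its own (mu, sigma, xi)."""
    return [gev_cdf(y, *p) for y, p in zip(observations, params)]


def histogram_counts(values, n_bins=10):
    """Counts of values falling in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


# Simulate observations from known (hypothetical) GEV parameters, so that the
# "model" is correct by construction and the PIT values should look uniform.
rng = random.Random(42)
true_params = (30.0, 2.0, -0.2)
uniforms = [min(max(rng.random(), 1e-12), 1.0 - 1e-12) for _ in range(2000)]
observations = [
    true_params[0]
    + true_params[1] * ((-math.log(u)) ** (-true_params[2]) - 1.0) / true_params[2]
    for u in uniforms
]
pits = pit_values(observations, [true_params] * len(observations))
counts = histogram_counts(pits)
```

When the parameters vary by year and grid box, each observation is transformed with its own fitted parameters, which is why `pit_values` takes one parameter tuple per observation.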

Another standard method for checking a non-stationary extreme-value model fit is via probability or quantile plots

Quantile–quantile plots by region for Mod4.

The quantile plot compares fitted-model and empirical quantiles and plots the pairs

Finally, we inspect the spatial distribution of the Pearson residuals obtained from a model fit. For grid box

The data analysed in this paper are publicly available from

All the authors are responsible for the conceptualization of the research. GA carried out the formal analysis, developed the methodology and wrote the original draft. All the authors discussed the results and helped with review and editing of the final draft. Both IP and GCH acted as supervisors to GA.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We are grateful to Ben Youngman for helpful advice regarding use of the evgam package and to two anonymous reviewers for helpful comments on an earlier draft of the manuscript.

Graeme Auld is supported financially by the Ratchadapisek Somphot Fund for Postdoctoral Fellowship, Chulalongkorn University, and was supported during his PhD, when much of this research took place, by the EPSRC (grant no. 1935526). Gabriele C. Hegerl has been supported by NERC grant EMERGENCE (grant no. NE/S004661/1).

This paper was edited by Seung-Ki Min and reviewed by two anonymous referees.