Supplement of Downscaling probability of long heatwaves based on seasonal mean daily maximum temperatures

Abstract. A methodology for estimating and downscaling the probability associated with the duration of heatwaves is presented and applied as a case study for Indian wheat crops. These probability estimates make use of empirical-statistical downscaling and statistical modelling of probability of occurrence and streak length statistics, and we present projections based on large multi-model ensembles of global climate models from the Coupled Model Intercomparison Project Phase 5 and three different emissions scenarios: Representative Concentration Pathways (RCPs) 2.6, 4.5, and 8.5. Our objective was to estimate the probabilities for heatwaves with more than 5 consecutive days with daily maximum temperature above 35 ∘C, which represent a condition that limits wheat yields. Such heatwaves are already quite frequent under current climate conditions, and downscaled estimates of the probability of occurrence in 2010 is in the range of 20 %–84 % depending on the location. For the year 2100, the high-emission scenario RCP8.5 suggests more frequent occurrences, with a probability in the range of 36 %–88 %. Our results also point to increased probabilities for a hot day to turn into a heatwave lasting more than 5 days, from roughly 8 %–20 % at present to 9 %–23 % in 2100 assuming future emissions according to the RCP8.5 scenario; however, these estimates were to a greater extent subject to systematic biases. We also demonstrate a downscaling methodology based on principal component analysis that can produce reasonable results even when the data are sparse with variable quality.

## Remove periods with mostly missing data and stations with few valid data Y <-subset(tmax,it=c (1970,2015)) nv <-apply(coredata(Y),2,FUN= nv ) ##nv = number of valid data points Y <-subset(Y,is=nv > 10290) There were some stations with little data or short records which have been omitted here. We have only kept records with plenty of valid data (more than 10290 data points) between 1970 and 2015.

Spell statistics for heatwaves
For all Indian wheat varieties, the main challenge is the high temperatures in the final growing phase, late in the season from February to April. All temperatures above optimal decrease the yield. Here the upper temperature threshold was set to T > 35 • C for 5 days; ## The fraction of events lasting more than 5 days f.gt.5[i] <-sum(heatwave > 5)/sum(is.finite(heatwave)) #f.gt.5 [i] <-sum(subset(sds,is=1) > 5)/sum (is.finite(subset(sds,is=1))) #f.gt.5 [i] <-sum(subset(sds,is=1) > 5)/count (subset(Y,is=i),threshold=35) cc <-coredata(heatwave); cc[cc <= 5] <-NA; cc[is.finite(cc)] <-1 hw <-zoo(cc[is.finite(cc)],order.by=index(heatwave)[is.finite(cc)]) ## The portion of seasons with a 5-day or longer 35C heatwave nf.gt.5[i] <-length(rownames(table(year(hw))))/n.yrs ## Probability of a 5-day or longer 35C heatwave in February-March-April Pr.Tmax.gt.35[i] <-1-pnorm (35,mean=mean(subset(Y,it=month.abb[2:4] The printout from the analysis of spell duration (consecutive days with more than 35 degrees) reveals that a number of stations still had gaps of missing data. To deal with this in an ad hoc manner (it's difficult to estimate the heatwave durations with missing data), a linear interpolation in time was used to fill those gaps. This is a step that will introduce errors and an additional layer of uncertainties that will affect the whole analysis.
The results of the test for the probability for one day or more with maximum daily temperature exceeding 35 • C against the observed frequency of such hot days suggests that the estimated probability assuming a geometric distribution gives higher values that are less sensitiveto the location differences. The probabilities here are the probability of heatwaves per season. The two estimates differ because the former takes heatwaves greater than 1 day as one case where the length is greater than zero, and therefore the two are not entirely comparable. However, based on this, the observed frequency on the x-axis is expected to be higher than the probability on the y-axis.

Seasonal aggregates
We want to use seasonal statistics of the temperature and duration of hot spells for downscaling the hot spell statistics. Here lws.fma (also referred to as L H in the main paper) refers to the mean length of warm spells for February-March-April, and the probability of five-day spells are estimated from this using the geometric distribution. Aggregates over seasons are expected to be more resistant to errors in the data and caused by the interpolation than for the individual events. ## There are some missing data which will cause some technical problems in the analysis ## but these are only a few data points, and we can interpolate their values in order to ## get around the stumbling blocks that the missing values cause. The function pcafill makes ## use of a spatio-temporal covariance matrix for filling in the gaps. tmax.fma <-pcafill ( There were gaps of missing data also for the aggregated data, and the PCA required complete datasets with no missing data. We used a PCA-based strategy to fill in the gaps through the use of PCA applied to subsets (blocks of the data matrix) of the data with complete coverage, and regression to fill in the gaps. This is explained in Benestad et al (2015)  We see that there is typically one or less five-day heatwave per February-March-April season in most locations. One exception is Surat with an average of 6 events each season. There is also a tendency of higher numbers along the southeastern coast, however, none of these are important places for wheat crops. The mean frequency is about 2 in the interior northern part on our map.

Normal Q−Q Plot
Theoretical Quantiles

Sample Quantiles
We see that there is an indication that the mean number of five-day heatwaves n h5 is influenced by the mean daily maximum temperature for the same season T [ max].

Relationship between the mean temperature and duration of heat waves
The following chunks of R-code show how we calibrated the regression model to quantify the statistical link between the mean temperature and the mean duration. ## Figure 2. print( Figure 2 )  The results are saved locally, so the downscaling (which takes some time) needs to be done only once. Repeated runs with this script will be faster.

Diagnosis and evaluation of the downscaled ensembles
The skill of simulating the trend and inter-annual variability for the different sites can be summarised in the following figure: diagnose ( The plot shows how well the trend for the common interval  corresponded between the downscaled projections and the actual observations (x-axis) and how well the magnitude of the interannual variability was reproduced (y-axis). The colours of the symbols correspond with those in the map, and the locations performing less well are those in the southern part of India which had lower weights in the leading PCA. The points in upper right corner represent the locations where the downscaled results were associated with weaker trends and weaker interannual variations than seen in the observations. The points within the 'target' represent sites with skillful downscaling for the entire ensemble, and include sites more relevant for sites with wheat-crops.
We can also evaluate the downscaling of the different GCMs by examining the cross-validation correlations for each PC. In this case, it involved a five-fold cross-validation, meaning that the records were divided into five segments and four were used to calibrate the models whereas the last one was used for independent evaluation. (1,1) The results from the cross-validation indicate that the leading PCA for the stations was skillfully simulated, and that modes 2-4 could be described as moderately skillful. PCA 5, which carries lowest weights, was associated with negative skill.

Example of downscaled results
There is an abundance of information hidden in the compact downscaled results,

The figure shows the observed Feb-Apr mean daily maximum temperature T [ max] as a black time series and the ensemble statistics in red. The light central line is the ensemble mean and the gray-dashed lines mark the 90% confidence region. The map insert shows the site of the observations. This plot also presents diagnostics for this particular site, and the 'target diagram' indicates that the downscaled simulations were associated with stronger trends than observations (which probably is erronious due to the suspect data point in the start of the observations) but reasonable range of interannual variability.
This evaluation suggests that the downscaled results are reasonably well estimated.  There is a scatter in the mean-stdv points, and even a systematic dependency that can be considered statistical significant at the 5%-level. However the slope |m| is small compared to the intercept |c|: |b| |c| and the best fit is strongly influenced by outliers.

Region for wheat crops
Define regions for wheat crops that are used in maps.

Identify locations with questionable results
Identify locations with large differences between observed and estimated frequency of 5-day heatwaves.
The first map shows a comparison between the observed seasonal frequency of heatwaves and the projected probability of at least one heatwave (T>35 • C more than 5 days). The locations with an orange triangle are those with a poor match in Table 1 The second map shows the differences between the observed portion of hot days with a duration longer than five days and the projected probability that a hot day turns into a more than five day long heatwave. The locations with an orange triangle are those with a poor match in Table 2

)) > 0.5 ## identify the locations with large error pch <rep(19,length(err)); col <rep( darkgreen ,length(err)) pch[ile] <-17; col[ile] <-orange plot(lon(lws),lat(lws),pch=pch,col=col,xlab= ,ylab= ) text(lon(lws),lat(lws),substr(loc(lws),1,7),cex=0.7,col= grey ) data(geoborders) lines(geoborders)
lines (west_lon,west_lat,lwd=3,col=rgb(0,0,0,0.05))  lines(cent_lon,cent_lat,lwd=3,col=rgb(0,0,0,0.05) The maps above highlight the sites with a difference greater than 50% between observed and projected probabilities for the present. Based on these results, it looks like the approach to estimate the probability of one or more heatwaves in a season (Table 1, top map) is somewhat more successful than calculating the probability that a hot day will turn into a heatwave (Table 2, lower map), although it may be related to the short data series rather than the underlying statistical assumptions. It is important to keep in mind that the frequency of observed heatwaves are uncertain due to problems with data availability and quality. We nevertheless get some crude results for probabilities despite sparse data of questionable quality for some sites. These results may be some of the best information there is for the probability of future heatwaves in India.