Dynamical downscaling of earth system models is intended to produce high-resolution climate information at regional to local scales. Current models, while adequate for describing temperature distributions at relatively small scales, struggle to describe precipitation distributions. To better match the distribution of observed precipitation over Norway, we consider approaches to statistical adjustment of the output from a regional climate model forced with ERA-40 reanalysis boundary conditions. As a second step, we try to correct downscalings of historical climate model runs using these transformations built from downscaled ERA-40 data. Unless such calibrations are successful, it is difficult to argue that scenario-based downscaled climate projections are realistic and useful for decision makers. We study both full quantile calibrations and several methods that correct individual quantiles separately using random field models. Results based on cross-validation show that while a full quantile calibration is not very effective in this case, individual quantiles can be corrected satisfactorily if the spatial structure in the data is accounted for. Interestingly, different methods are favoured depending on whether ERA-40 data or historical climate model runs are adjusted.

The intensification of climate research over the past decade has produced a
steadily increasing number of data sets combining different global
circulation or earth system models, CO

The comparison of climate models to weather data raises interesting
statistical problems. For a statistician, the most natural definition
of the climate is that it is the distribution of weather (and other
earth system variables) over multidecadal timescales

A multitude of models have emerged for projection of future climate change at
different spatial (and temporal) scales. Essential in the process of going
from the coarse resolution of the global models to finer spatial scales are
the regional climate models (RCMs). Such models propagate information from a
coarse-scale model along the boundary of a higher-resolution area of
interest, using a more detailed terrain description, model solutions using
finer resolution, and improved physical process parameterizations. The
boundary conditions may be computed either from a global weather model forced
with historical observations to reconstruct a consistent state of
the atmosphere (reanalysis), or from a global climate model. A regional model
using reanalysis boundary conditions is sometimes said to be run in “weather
forecasting mode”, and is the closest one can hope to get to observed weather
using a regional climate model.

One major purpose of regional climate models is to give end users such as stakeholders and
decision makers a representation of future weather, preferably a reliable
projection, at a practically useful spatio-temporal scale. In the
insurance industry, for instance, the interest lies in projections of high precipitation
under various possible future scenarios to assess the changing risk of damage
to buildings or flooding

It has long been understood that regional models tend to be regionally biased
in terms of precipitation (e.g.

There are a variety of bias correction methods in the literature
(

In this paper we consider approaches to statistical adjustment of the
regional model output, obtaining a calibrated product that is closer in
distribution to the observed data than the original output. We first
investigate the Doksum shift function

The paper continues as follows. In Sect.

All code needed to run the analysis on the data is available at

The data used in this study constitute 40 years of daily
precipitation values for the Norwegian mainland, covering the period 1961 to
2000. The data set is twofold: one part consists of dynamically downscaled
model data (ERA-40 reanalysis and climate model), and the other is a gridded
product based on in situ observations. A more thorough description of the
data is given in

Reanalysis data represent the best physically consistent estimate available
of the historical state of the atmosphere. They are produced retrospectively
by feeding various sources of past meteorological observations
into a current meteorological forecast model. ERA-40 reanalysis data

Downscaled ERA-40 data are collected from the ENSEMBLES project website
(

Precipitation is measured daily at stations irregularly
distributed across Norway. Based on all observations of precipitation
available at every time step, high-resolution precipitation grids (

In order to compare the two data sets, the

The global Bergen Climate Model, BCM, data set

The results of the evaluation of the regional model

We characterize the transfer function between two distribution functions, in
our case those of a model and of observations, using Doksum's shift function

If the shift function is constant, the two distributions differ only in location (in particular, if that constant is zero, the two distributions are identical). If it is linear, a location-scale transformation is implied.
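As an empirical illustration (not code from the paper), the shift function Δ(x) = G⁻¹(F(x)) − x can be estimated from two samples by combining the model ECDF with the observed quantile function; a roughly constant estimate then signals a pure location difference:

```python
import numpy as np

def empirical_shift(model, obs, x):
    """Empirical Doksum shift Delta(x) = G^{-1}(F(x)) - x, where F is the
    ECDF of the model sample and G^{-1} the observed quantile function."""
    model = np.sort(np.asarray(model, dtype=float))
    p = np.searchsorted(model, x, side="right") / model.size  # F(x)
    p = np.clip(p, 0.0, 1.0)
    return np.quantile(obs, p) - x

# Synthetic check: a pure location difference gives a constant shift.
rng = np.random.default_rng(0)
model = rng.normal(0.0, 1.0, 10_000)
obs = model + 2.0                       # shifted copy of the model sample
xs = np.array([-1.0, 0.0, 1.0])
print(empirical_shift(model, obs, xs))  # all entries close to 2
```

The calibrated value x + Δ(x) = G⁻¹(F(x)) is exactly the quantile-mapping transform used for the full quantile calibration.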

Fisher tests of the 95 % quantile of the winter season
uncalibrated dERA40 test data (left panel), the calibrated dERA40 test data
(middle panel), and the calibrated dBCM test data (right panel). The plots
show significance level

Assume next that a region is divided into

Calibration of a new value

Assume that we want to apply the calibration in Eq. (

In

In our current setup, the dERA40 and OBS data are further divided into a training set and a test set. The training set is used to fit the calibration model. The transfer function thus obtained is applied to dERA40 data for the test period, which is then compared to observations for the test period. Here the training data are chosen to be the first 80 % of the total data, i.e. the years from 1961 to 1992. The test data are chosen to be the last 20 % of the data, i.e. the years from 1993 to 2000.
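The chronological 80/20 split can be sketched as follows (the year index is constructed here for illustration; the cut points match those stated above):

```python
import numpy as np

# 40 years of data, 1961-2000; first 80 % of the years for training,
# last 20 % for testing, exactly as in the setup described above.
years = np.arange(1961, 2001)
n_train = int(0.8 * years.size)                 # 32 training years
train_years, test_years = years[:n_train], years[n_train:]
print(train_years[0], train_years[-1])          # 1961 1992
print(test_years[0], test_years[-1])            # 1993 2000
```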

The dERA40 data are calibrated using Doksum's shift function as described in
Sect.

Assuming independence between the test statistics for different grid squares,
if all null hypotheses are true, we would expect about 39 spuriously
significant results at the 95 % confidence level in a plot with 777 grid cells. We have
carried out the same kind of comparisons as in
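The expected count of spurious rejections, and a rough scale for its sampling variability, follow directly from the binomial model implied by independence:

```python
import math

# Under H0 in every cell, the number of rejections at level alpha = 0.05
# among 777 independent tests is Binomial(777, 0.05).
n_cells, alpha = 777, 0.05
expected = n_cells * alpha                       # 38.85, i.e. about 39
sd = math.sqrt(n_cells * alpha * (1 - alpha))    # about 6.1
print(round(expected), round(sd, 1))
```

So observed rejection counts within roughly two standard deviations of 39 would be unremarkable under the global null.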

The main reason for the difficulty of making a full quantile calibration is that the bulk of the distribution is concentrated at very small precipitation values, and the Kolmogorov–Smirnov statistic tends to focus on these well-estimated parts of the distribution, where very small differences in amounts suffice for rejection. It would be natural to hope to borrow strength from nearby grid squares using a spatial model. However, the high variability of the quantile correction for large values (arising because there are relatively few high observations of precipitation) makes it difficult to fit a spatial functional model. Instead we focus on directly calibrating quantities of greater interest for adaptation, namely high quantiles, where the full calibration did somewhat better.

We now focus on calibrating a fixed quantile,

The goal is now to predict the spatial field

We will use the notation

As a baseline method, we use the empirical quantile at each location
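The baseline amounts to a per-cell empirical quantile over the time dimension; a minimal sketch (array shapes and the synthetic data are illustrative assumptions, not the paper's data) is:

```python
import numpy as np

# Baseline: the empirical 95 % quantile at each grid cell, computed
# independently over time. Shapes (days, cells) are assumptions here.
rng = np.random.default_rng(1)
precip = rng.gamma(shape=0.5, scale=4.0, size=(365 * 5, 777))  # synthetic
q95 = np.quantile(precip, 0.95, axis=0)  # one quantile per grid cell
print(q95.shape)                         # (777,)
```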

As a reference model, we use the smoothing spline method that performed well
in

As a first method for doing the calibration, we do linear regression with

Pointwise 95 % quantiles of OBS (left), dERA40 (middle), and dBCM (right) for the first 5-year period in the cross-validation.

Cross-validation mean square error for the 95 % quantile. The best model for each fold is displayed in bold.

There is clearly spatial dependence in the data, which we want to incorporate
in the model to improve the predictions. We can do this by assuming that the
regression coefficients are spatially dependent, using a stochastic model as
follows

Cross-validation mean square error for the 95 % quantile using dBCM predictions. The best model (excluding Model 0) for each fold is displayed in bold. For this comparison, Model 0 can be seen as the target value we want to reach with the dBCM-based predictions.

A more computationally efficient alternative to the covariance-based model
would be to use a Markov random field prior on

The model parameters

The predictor of

To apply the model to other data one simply replaces

Example calibration for the pointwise 95 % quantiles using Model 2s with the dBCM covariate (right). The result is for the final 5-year time period in the cross-validation study. The observed quantiles (left) and uncalibrated dBCM (middle) are shown as references.

Average mean square error for models trained on dERA40 data. The values in the table are averages across the eight folds for each season.

A somewhat surprising feature of the data is that the quantiles of the
observed data,

In this section, we evaluate the performance of the methods described in
Sect.

The results in Sect.

The results using the various models can be seen in Table

Overall, Model 1s performs best for the dERA40 predictions, whereas Model 2s
performs best for the dBCM predictions. For the dBCM predictions, Model 2s
has satisfactory performance compared with the target performance for that
case which is given by the Model 0 results. An example prediction using this
model can be seen in Fig.

The results for the different seasons are summarized in
Table

The low quality of Norwegian precipitation in the HIRHAM regional model
forced by reanalysis

Instead of adjusting the entire distribution, we achieve better performance by focusing on adjusting an individual quantile. In that case we obtained error rates indicating that the corrected downscaled climate model performs almost as well as the reanalysis-forced downscaling, suggesting that this approach can be a useful tool in downscaling climate projections of precipitation over Norway.

There is a case in between the full quantile adjustment and the individual quantile adjustment, namely simultaneous adjustment of several quantiles. This will be the subject of further research.

The sensitivity of regional dynamic downscalings to the lateral boundary
conditions is well known (e.g.

Dynamically downscaled BCM and ERA-40 reanalysis data are accessible from the
ENSEMBLES project website

P. Guttorp formulated the Doksum shift calibration method and provided
the literature review. E. Orskaug and O. Haug collected and prepared data for the
study, and together with I. Scheel and A. Frigessi carried out and documented the
analyses in Sect.

We are grateful for helpful discussions with Douglas Maraun, Leibniz
Institute of Marine Sciences, Kiel University. This research was partially funded
through Statistics for Innovation (sfi)