Weather forecasts from ensemble prediction systems (EPS) are improved by statistical models trained on past EPS forecasts and the corresponding atmospheric observations. Recently, these corrections have moved from univariate to multivariate methods. The focus has been on (quasi-)horizontal atmospheric variables. This paper extends the correction methods to EPS forecasts of vertical profiles in two steps. First, univariate distributional regression methods correct the probability distributions separately at each vertical level. In the second step, copula coupling reinstates the dependence among neighboring levels by using the rank order structure of the EPS forecasts. The method is applied to EPS data from the European Centre for Medium-Range Weather Forecasts (ECMWF) at model levels interpolated to four locations in Germany, where radiosondes are launched four times a day to measure profiles of temperature and other variables. A winter case study and a summer case study exemplify that univariate postprocessing fails to preserve stable layers, which are crucial for many atmospheric processes. Quantile resampling and a resampling that preserves the relative distances between individual EPS members improve the calibration of the raw temperature profile forecasts, as shown by rank histograms. They also improve the multivariate energy score and variogram score and retain the stable layers. Improvements occur at all times of the day and in all seasons. They are largest within the atmospheric boundary layer and for shorter lead times.

Ensemble prediction systems (EPS) are an important tool in modern weather forecasting for providing estimates of the range of possible forecast outcomes. The individual members of an EPS are based on numerical weather prediction (NWP) models, which simulate the fluid dynamic and thermodynamic behavior of the atmosphere and its lower boundary. NWP models are not perfect because they only approximately represent physical laws, cannot resolve processes at all temporal and spatial scales, and have to start from an inexactly known initial state. Their imperfection was actually the motivation behind EPS, which are meant to provide a realistic and comprehensive spectrum of the possible future weather.

Statistical postprocessing techniques, which learn from past measurements and
NWP EPS forecasts, can remove systematic errors of the EPS forecasting
distribution

Postprocessing the vertical structure of the atmosphere, on the other hand, has so far been mostly neglected, with the exception of

Forecasts of temperature profiles also suffer from systematic errors of the
underlying NWP models. Additionally, the forecast uncertainty represented by the EPS has systematic errors and is typically underdispersive

To develop and demonstrate methods for the correction of vertical temperature profiles forecast by NWP models, we use the lower two-thirds of the troposphere from the surface to 400 hPa. Many processes that affect its lowest part, the planetary boundary layer, are parameterized in an NWP model, and substantial systematic errors occur

Temperature profiles are measured by radiosondes: a sensor package carried aloft by a balloon transmits data back to a ground station. The vertical resolution of the data as available in global databases varies and is on the order of 100

We use forecasts from the EPS of the European Centre for Medium-Range Weather Forecasts (ECMWF) with 50 perturbed ensemble members and one
unperturbed control run

In the remainder, the radiosonde measurements

The goal of postprocessing is to alleviate systematic errors in the vertical temperature profile and produce a probabilistic forecast that is calibrated and sharp. For the univariate case, “calibrated” means that the verifying observations are equally likely to fall into the bins into which the ordered NWP ensemble members partition the real line. The sharper a calibrated predictive distribution is, the smaller the uncertainty of the forecast will be.
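The calibration criterion above can be checked empirically with a rank histogram: for each forecast case, record the rank of the verifying observation among the ordered ensemble members and count how often each of the m+1 ranks occurs. The following is a minimal sketch in Python (standard library only); the function names are illustrative, and ties are broken at random, a common convention rather than something stated in the text.

```python
import random


def observation_rank(members, obs, rng=random):
    """Rank (1..m+1) of the observation among the m ensemble members.

    Ties between the observation and members are broken at random.
    """
    below = sum(1 for x in members if x < obs)
    ties = sum(1 for x in members if x == obs)
    # ties + 1 admissible positions: ranks below+1 ... below+ties+1
    return below + 1 + rng.randint(0, ties)


def rank_histogram(ensembles, observations, rng=random):
    """Count observation ranks over many forecast cases (m+1 bins)."""
    m = len(ensembles[0])
    counts = [0] * (m + 1)
    for members, obs in zip(ensembles, observations):
        counts[observation_rank(members, obs, rng) - 1] += 1
    return counts
```

For a calibrated ensemble the counts are roughly equal; a U-shaped histogram indicates an underdispersive ensemble, as is typical for raw EPS output.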

Postprocessing will proceed in two steps. First, to correct the marginal
distributions of temperature at all vertical height levels simultaneously,
nonhomogeneous Gaussian regression

Nonhomogeneous Gaussian regression

A similar model is chosen for the logarithm of the scale parameter,

This section presents an alternative method that removes the constraint that NGR
models have to be fitted to each location separately. In the current case

The motivation behind SAMOS is that, with many locations or in operational use, fitting many models can become computationally expensive. However, a single NGR-like regression model can be fitted to all locations at once when, in a first step, all site-specific characteristics are (nearly) eliminated from the observations and the ensemble data. The model is fitted not to the raw values but to standardized anomalies, formed by subtracting the respective climatological mean and dividing by the climatological standard deviation. This SAMOS approach is described in
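A minimal sketch of the SAMOS idea in Python (standard library only). It assumes, as in the usual SAMOS formulation but not confirmed by this section, that the ensemble mean is standardized with the ensemble climatology, that the ensemble standard deviation is divided by the climatological ensemble standard deviation, and that the anomaly-space model is an NGR with a log link for the scale. The coefficient tuple `coef` and all function names are hypothetical, and the coefficients here are constant, whereas NGRVC lets them vary. In practice the coefficients would be estimated by minimizing `anomaly_nll` over all sites and levels at once, e.g. with a numerical optimizer.

```python
import math
from statistics import NormalDist


def to_anomaly(x, clim_mean, clim_sd):
    """Standardized anomaly: subtract the climatological mean, divide by the climatological sd."""
    return (x - clim_mean) / clim_sd


def anomaly_params(ens_mean, ens_sd, ens_clim, coef):
    """NGR-like location and scale in anomaly space (one coefficient set for all sites)."""
    b0, b1, c0, c1 = coef  # hypothetical regression coefficients
    clim_mean, clim_sd = ens_clim
    m_anom = to_anomaly(ens_mean, clim_mean, clim_sd)
    s_anom = ens_sd / clim_sd  # spread is only rescaled, not shifted
    mu = b0 + b1 * m_anom
    sigma = math.exp(c0 + c1 * math.log(s_anom))  # log link keeps sigma > 0
    return mu, sigma


def predictive_distribution(ens_mean, ens_sd, obs_clim, ens_clim, coef):
    """Back-transform the anomaly forecast to temperature at one site/level/time."""
    mu, sigma = anomaly_params(ens_mean, ens_sd, ens_clim, coef)
    obs_mean, obs_sd = obs_clim
    return NormalDist(obs_mean + obs_sd * mu, obs_sd * sigma)


def anomaly_nll(coef, cases):
    """Negative Gaussian log-likelihood in anomaly space, summed over all cases.

    cases: iterable of (obs, ens_mean, ens_sd, obs_clim, ens_clim) tuples,
    where obs_clim and ens_clim are (mean, sd) climatologies.
    """
    total = 0.0
    for obs, ens_mean, ens_sd, obs_clim, ens_clim in cases:
        mu, sigma = anomaly_params(ens_mean, ens_sd, ens_clim, coef)
        z = to_anomaly(obs, *obs_clim)
        total += math.log(sigma) + 0.5 * math.log(2 * math.pi) + 0.5 * ((z - mu) / sigma) ** 2
    return total
```

Because the climatologies absorb the site-, level- and season-specific means and variances, the four coefficients can be shared across all stations, which is the computational advantage described above.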

Here, we estimate the climatological properties of the observations with a
GAM(LSS); for the expected value as

The pressure levels of the radiosonde observations and of the EPS data are not the same. We linearly interpolated the vertically better resolved observations to the pressure levels of the EPS data (see Sect.
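The interpolation step can be sketched as follows, assuming interpolation that is linear in pressure; the text does not say which vertical coordinate was used, and linear interpolation in log-pressure is a common alternative. The function name is hypothetical, and target levels outside the observed range are returned as `None` instead of being extrapolated.

```python
from bisect import bisect_left


def interpolate_to_levels(obs_pressure, obs_temp, target_pressure):
    """Linearly interpolate a radiosonde profile to target pressure levels (hPa).

    obs_pressure must be strictly decreasing (surface first), as in an ascent.
    """
    # Reverse so that pressure increases, which bisect requires.
    p = obs_pressure[::-1]
    t = obs_temp[::-1]
    out = []
    for pt in target_pressure:
        if pt < p[0] or pt > p[-1]:
            out.append(None)  # outside the sounding: no extrapolation
            continue
        i = bisect_left(p, pt)  # first index with p[i] >= pt
        if p[i] == pt:
            out.append(t[i])
            continue
        w = (pt - p[i - 1]) / (p[i] - p[i - 1])
        out.append(t[i - 1] + w * (t[i] - t[i - 1]))
    return out
```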

Both NGRVC (Sect.

The ECC procedure consists of three steps.

Sample quantiles from the predicted marginal distributions.

Obtain the rank structure of the raw ensemble.

Arrange the quantiles from step 1 with the ranks of step 2.
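The three steps above can be sketched in Python (standard library only), with one Gaussian predictive distribution per level, e.g. from NGRVC or SAMOS. Two details are assumptions rather than statements from the text: ECC-Q uses the equidistant probability levels i/(m+1), and ECC-T maps the cdf values of the fitted raw-ensemble Gaussian through the inverse cdf of the postprocessed distribution. All function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ecc_q(dist, m):
    """ECC-Q: m quantiles at the equidistant levels i/(m+1), i = 1..m."""
    return [dist.inv_cdf(i / (m + 1)) for i in range(1, m + 1)]


def ecc_r(dist, m, seed=None):
    """ECC-R: m random draws from the predictive distribution."""
    return dist.samples(m, seed=seed)


def ecc_t(dist, raw):
    """ECC-T: fit a Gaussian to the raw members, evaluate its cdf at each
    member, then map these probabilities through the postprocessed inverse cdf."""
    fitted = NormalDist(mean(raw), stdev(raw))  # assumes members are not all equal
    return [dist.inv_cdf(fitted.cdf(x)) for x in raw]


def ecc_reorder(samples, raw):
    """Steps 2 and 3: give the k-th smallest sample to the member with raw rank k."""
    order = sorted(range(len(raw)), key=lambda j: raw[j])  # ties broken by member index
    out = [0.0] * len(raw)
    for value, j in zip(sorted(samples), order):
        out[j] = value
    return out


def ecc_profiles(raw_profiles, level_dists, method="Q", seed=None):
    """raw_profiles[j][k]: raw temperature of member j at level k;
    level_dists[k]: postprocessed NormalDist at level k."""
    m = len(raw_profiles)
    out = [[0.0] * len(level_dists) for _ in range(m)]
    for k, dist in enumerate(level_dists):
        raw = [profile[k] for profile in raw_profiles]
        if method == "Q":
            samples = ecc_q(dist, m)
        elif method == "R":
            samples = ecc_r(dist, m, None if seed is None else seed + k)
        elif method == "T":
            samples = ecc_t(dist, raw)
        else:
            raise ValueError(f"unknown method {method!r}")
        for j, value in enumerate(ecc_reorder(samples, raw)):
            out[j][k] = value
    return out
```

Because ECC-T applies a monotone transformation to each raw member, its output already has the raw rank order, and the reordering step leaves it unchanged. When both distributions are Gaussian, the transformation is linear, which is why ECC-T preserves the relative distances between members.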

In step 1 temperature quantiles have to be sampled from the
univariate distributions obtained by either NGRVC (Sect.

ECC-Q transforms

ECC-R draws samples randomly from the predictive distribution.

ECC-T first fits a Gaussian distribution to the raw ensemble and then
evaluates its cumulative distribution function (cdf) at the values of the raw ensemble, leading to

The second step is to obtain the ranks of the raw ensemble temperatures at each level: the ensemble member with the lowest temperature gets rank

The final step of the ECC procedure takes the quantiles from step one and
arranges them using the ranks of step two. For instance, the lowest temperature within the quantiles of step one will be associated with the member that has rank

Several metrics are used to evaluate the performance of the forecasting methods in achieving the goal of probabilistic forecasts: to maximize the sharpness of the predictive distribution subject to calibration. The metrics are applied out-of-sample using 5-fold cross-validation to test on independent data and avoid overfitting.

Calibration and sharpness of univariate predictions are evaluated using the
probability integral transform (PIT) histogram and the sharpness diagram,
respectively

The rank histogram and the PIT histogram are common tools to verify the
calibration of the forecasts in the univariate case.
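For Gaussian predictive distributions, the PIT value of a case is simply the predictive cdf evaluated at the observation, and sharpness can be summarized by the width of central prediction intervals. A minimal Python sketch (standard library only) with hypothetical function names and ten equal-width bins as an illustrative choice:

```python
from statistics import NormalDist


def pit_values(dists, observations):
    """PIT = predictive cdf evaluated at the verifying observation."""
    return [d.cdf(y) for d, y in zip(dists, observations)]


def pit_histogram(pits, n_bins=10):
    """Counts of PIT values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for u in pits:
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    return counts


def interval_width(dist, level=0.5):
    """Width of the central prediction interval with the given coverage."""
    return dist.inv_cdf(0.5 + level / 2) - dist.inv_cdf(0.5 - level / 2)
```

A calibrated forecast gives a flat PIT histogram; forecasts that are too narrow produce a U shape, and among calibrated forecasts the one with narrower intervals is preferred.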

As numerical measures we apply the continuous ranked probability score (CRPS)
and the log score (LS) for univariate predictions
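Both scores are negatively oriented (smaller is better). For a Gaussian predictive distribution they have closed forms, and the CRPS of a raw ensemble can be computed directly from its members. A minimal Python sketch (standard library only) with hypothetical function names; the ensemble version uses the 1/(2m²) form of the spread term, and a "fair" variant dividing by 2m(m−1) also exists.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) for observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * _STD.cdf(z) - 1) + 2 * _STD.pdf(z) - 1 / math.sqrt(math.pi))


def log_score_gaussian(mu, sigma, y):
    """Log score: negative log predictive density at y."""
    z = (y - mu) / sigma
    return math.log(sigma) + 0.5 * math.log(2 * math.pi) + 0.5 * z * z


def crps_ensemble(members, y):
    """CRPS of the empirical distribution of an m-member ensemble."""
    m = len(members)
    spread = sum(abs(xi - xj) for xi in members for xj in members)
    return sum(abs(x - y) for x in members) / m - spread / (2 * m * m)
```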

The energy score
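The energy score (ES) and the variogram score (VS) for an ensemble of vertical profiles can be sketched in Python (standard library only) as below. The variogram order p = 0.5 and unit weights for all level pairs are illustrative assumptions, not settings taken from this paper; the sum runs over pairs i < j (summing over all ordered pairs only doubles the value). Function names are hypothetical.

```python
import math


def _euclid(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def energy_score(members, obs):
    """ES for m member profiles (lists of level values) and an observed profile."""
    m = len(members)
    term1 = sum(_euclid(x, obs) for x in members) / m
    term2 = sum(_euclid(xi, xj) for xi in members for xj in members) / (2 * m * m)
    return term1 - term2


def variogram_score(members, obs, p=0.5):
    """VS of order p with unit weights, summed over level pairs i < j."""
    d, m = len(obs), len(members)
    score = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            ens_vario = sum(abs(x[i] - x[j]) ** p for x in members) / m
            score += (abs(obs[i] - obs[j]) ** p - ens_vario) ** 2
    return score
```

For a single level the ES reduces to the ensemble CRPS. The VS compares differences between levels, so it reacts to errors in the vertical temperature gradient that the ES, which is dominated by errors in the margins, tends to miss.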

All scores are computed out-of-sample by 5-fold cross-validation using the

The uncertainty in the numerical scores was estimated by bootstrapping: a random sample of the same size as the original data was drawn with replacement, and its scores were averaged. This procedure was repeated 500 times, and the 500 resulting values are presented as box-and-whisker plots. To preserve the vertical dependence structure while bootstrapping, we treated a vertical profile as the smallest indivisible unit: rather than bootstrapping temperatures on individual vertical levels and merging these samples into new profiles, we bootstrapped entire vertical temperature profiles for different times, locations and lead times.
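The profile-wise bootstrap described above can be sketched in Python (standard library only). The input layout (one list of per-level scores per profile) and the function names are assumptions for illustration; the 5th and 95th percentiles of the bootstrapped means give the 90 % intervals reported in the table of scores.

```python
import random
from statistics import quantiles


def bootstrap_profiles(level_scores, n_boot=500, seed=1):
    """Bootstrap mean scores, resampling whole profiles with replacement.

    level_scores[i] holds the scores at all levels of profile i, so the
    levels of one profile are always drawn together.
    """
    rng = random.Random(seed)
    n = len(level_scores)
    means = []
    for _ in range(n_boot):
        picks = [rng.randrange(n) for _ in range(n)]  # whole profiles
        values = [s for i in picks for s in level_scores[i]]
        means.append(sum(values) / len(values))
    return means


def percentile_interval(values):
    """5th and 95th percentiles, i.e. a 90 % bootstrap interval."""
    cuts = quantiles(values, n=20)  # cut points at 5 %, 10 %, ..., 95 %
    return cuts[0], cuts[-1]
```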

The presentation of the results proceeds from the univariate case using the two NGR extensions (NGRVC and SAMOS) to the multivariate case, which is introduced with case studies of winter and summer profiles, respectively, and then verified for the whole data set. The response variable for all approaches is the temperature measured by the radiosondes.

Example of the effects of the location parameter

Allowing the coefficients of the NGR to vary makes it possible to correct the
expected value of the Gaussian forecast distribution of the EPS diurnally,
seasonally and vertically, as described by
Eq. (

SAMOS, a further extension to NGR, makes it possible to correct all vertical
levels with one postprocessing model, but requires climatologies of expected
value and standard deviation in order to compute the prerequisite standardized anomalies (cf. Sect.

Effects of the climatology of observations y for station Bergen

The effects of the climatology for response

A climatology is also computed for the ensemble forecasts. The climatology of
the ensemble mean is modeled with a single GAMLSS. The standardized anomalies of

Climatological values for mean and standard deviation from
Sect.

PIT histograms of the probability distributions of the

Several verification measures are used to compare calibration and sharpness of the distributions resulting from univariate postprocessing with SAMOS and NGRVC, respectively. Calibration is first evaluated with PIT
histograms. Figure

PIT histograms by pressure ranges for the probability distributions of the

Figure

After calibration the predictive distributions slightly lose sharpness:
Fig.

Out-of-sample negative log-likelihood (LS) and CRPS of the raw ensemble (RAW), SAMOS and NGRVC probability distributions for all stations combined. Numbers in brackets give the 5th and 95th percentiles of the bootstrapped scores, i.e., the 90 % bootstrap confidence intervals. Bold font indicates the best-performing models.

Boxplots of the widths of prediction intervals of 50 %

Ensemble copula coupling (Sect.

Vertical temperature profiles as observed (black bold solid line) and the members (magenta) of the

Zoom into the rectangle of Fig.

The two case studies show ensemble copula coupling at work in typical winter and summer settings, respectively. The winter morning case is characterized by a strong surface-based temperature inversion, topped by a dry-adiabatically stratified layer, which is capped by a second inversion, as shown by the radiosonde measurements (bold black line) in Fig.

Observation (black bold solid) and forecasts (magenta) of the
vertical temperature gradient for the 6 December 2016, 06:00 Z winter case. Top row is for lead time

How well do the different methods correct the raw profiles? The univariate NGRVC method (second column of Fig.

The multivariate ensemble copula coupling method with sampling from the
quantiles of the raw ensemble (ECC-Q) restores the overall shape of the member profiles and thus allows the cross-over near the surface of the blue lines in the

The vertical temperature gradient determines how easily an air parcel can be
displaced in the vertical, which in turn determines exchange processes of, e.g., pollutants and the formation of clouds. Figure

As in Fig.

As in Fig.

The second case study of a summer noon profile exemplifies the performance for a deep convective boundary layer capped by a thick stable layer and a nearly moist-adiabatic stratification aloft where the profile parallels the green saturated adiabat in Fig.

Multivariate rank histograms of the samples of the
raw ensemble (RAW), quantile sample of the NGRVC-postprocessed ensemble
(NGRVC-Q), quantile sample of ECC (ECC-Q), random sample of ECC (ECC-R) and
the transformed RAW sample (ECC-T). Top row: average rank histograms;
bottom: band-depth histograms. Data include all available stations, all
lead times and the 31 pressure levels between 1020 and 400

The performance of the different correction methods is evaluated both
graphically with rank histograms and numerically with the energy score and
variogram score (cf. Sect.

Figure

Bootstrapped continuous ranked probability score
(CRPS, first column), energy score (ES, second column) and variogram score
(VS, third column) for the raw ensemble (RAW), the quantile sample of the
NGRVC-postprocessed ensemble (NGRVC-Q), the quantile sample of ECC (ECC-Q), the
random sample of ECC (ECC-R) and the transformed sample of ECC
(ECC-T). The first row shows results for all pressure ranges and lead times
combined, the second row stratified by three pressure ranges but over all lead
times, and the third row by three lead times. Note that the range of the
scores changes between rows. The bottom row shows the scores for the
temperature gradient

Figure

When the scores are stratified into three pressure ranges (second row of
Fig.

A stratification by lead time (third row of Fig.

The last row of Fig.

Postprocessing ensembles of numerical weather prediction model forecasts with
statistical models trained on past ensemble forecasts and “truth”,
i.e., observations, improves these forecasts further and has thus been a
burgeoning field of research, of which

Postprocessing the vertical structure of (ensemble) NWP forecasts, however, has remained largely unexplored. Since the vertical structure strongly influences exchange processes, onset and cessation of convection, formation of clouds and precipitation, improvement over the raw ensemble forecasts is arguably as important as for horizontal fields.

We take a different approach modeled on results for quasi-horizontal
postprocessing and postprocess the vertical profiles using a combination of
univariate calibration and copula coupling. ECC needs probability distributions, also known as margins, at each pressure level. The margins are obtained by two univariate techniques, which are enhanced variants of the classical nonhomogeneous Gaussian regression (NGR). When the coefficients of NGR vary diurnally, seasonally and in the vertical (NGRVC,

Stable layers, for example, are crucial because they impede vertical motion and exchange: they determine the top of the boundary layer and the mixing volume for
pollutants. While their scale might be too small to have predictability to

Using the univariately corrected margins for further, multivariate postprocessing with copula coupling better reproduces such (potentially very thin) stable layers. With ECC, the rank order structure of the ensemble of NWP forecasts from the ECMWF-EPS is conserved. Several sampling strategies can be combined with copula coupling; we used three. Random sampling (ECC-R) is inadvisable, as it may deliver worse results than the raw ensemble forecasts. In contrast, quantile resampling (ECC-Q) and a rescaling of the raw ensemble that conserves the relative distances between ensemble members (ECC-T), and thus also accounts for extreme ensemble members, are successful. Of the two, ECC-T performs better overall than ECC-Q in all three verification measures used: rank histograms, energy score and variogram score.
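Whether a postprocessed member profile retains a stable layer can be checked from its vertical temperature gradient. The Python sketch below (standard library only) uses a deliberately simple criterion, flagging only inversions, i.e. layers where temperature increases with height. The level heights in metres, the K per 100 m unit and the function names are hypothetical; with pressure-level data, heights would first have to be derived, e.g. from the hypsometric equation.

```python
def vertical_gradient(temps, heights):
    """Temperature gradient between adjacent levels in K per 100 m."""
    return [
        100.0 * (t2 - t1) / (z2 - z1)
        for t1, t2, z1, z2 in zip(temps, temps[1:], heights, heights[1:])
    ]


def inversion_layers(temps, heights):
    """(bottom, top) heights of layers where temperature increases with height."""
    grads = vertical_gradient(temps, heights)
    return [(heights[i], heights[i + 1]) for i, g in enumerate(grads) if g > 0]
```

Applied to every member profile, such a check shows whether a surface inversion like the one in the winter case survives postprocessing.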

The largest improvements are obtained for the profiles in the atmospheric
boundary layer over all lead times and all seasons and also in the two case
studies shown. Consequently, this is also the layer with the largest potential for improving the NWP model, which is well known

The main motivation for using GAMLSS models is to be able to set up additive
predictors of nonlinear smooth functions for each parameter of a distribution

We assume that the response variable

The functional terms

In the present study GAMLSS models are estimated by maximizing a penalized
log-likelihood. The models were fitted with R package

ECMWF-EPS data of ensemble mean and standard deviation
used for univariate postprocessing are available from ECMWF upon
request. Charges may apply. Data at model levels of individual ensemble
members, which ECMWF does not store, are available from the authors after
receiving permission from ECMWF

Data were processed in R using the following packages: mgcv, gamlss, crch and scoringRules.

DS, TS and GJM defined the scientific scope of this study. DS performed the statistical modelling and evaluated the results. GJM supported the meteorological analysis. TS contributed to the development of the statistical methods. All authors contributed to the paper by writing significant parts. Furthermore, all authors discussed the results and commented on the manuscript.

The authors declare that they have no conflict of interest.

We thank the editor Chris Forest and four anonymous referees for their efforts.

This research has been supported by the Austrian Research Promotion Agency (FFG) (grant no. 846620) and the Austrian Science Fund (FWF) (grant no. P31836).

This paper was edited by Chris Forest and reviewed by four anonymous referees.