Nonhomogeneous post-processing is often used to improve the predictive
performance of probabilistic ensemble forecasts. A common quantity used to develop,
test, and demonstrate new methods is the near-surface air temperature, which is
frequently assumed to follow a Gaussian response distribution. However,
Gaussian regression models with only a few covariates are often unable to
account for site-specific local features, leading to uncalibrated forecasts and skewed residuals. This residual skewness remains even if many covariates are incorporated.
Therefore, a simple refinement of the classical nonhomogeneous Gaussian
regression model is proposed that assumes a skewed
response distribution to account for possible skewness.
This study presents a comprehensive analysis of the performance of nonhomogeneous
post-processing for the

Probabilistic weather forecasts have become state-of-the-art in recent years

Statistical post-processing techniques

The two most important properties of probabilistic forecasts are sharpness and
calibration

However, in recent publications, the probability integral transform histograms (PIT;

So far, most studies assume a Gaussian response distribution for their
temperature post-processing methods

Section

The nonhomogeneous Gaussian regression (NGR) framework as proposed by

Classical NGR

This study compares three different distributions for temperature
post-processing: (i) the frequently-used Gaussian distribution, (ii) the
symmetric logistic distribution, and (iii) the generalized logistic
distribution type I (Fig.

Density function of the skewed logistic distribution, illustrating
the third moment (

The skewed logistic distribution has the cumulative distribution function (CDF):
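
As a minimal numerical sketch of the generalized logistic distribution type I named above, the following Python functions use the standard textbook form of its CDF, F(y) = (1 + exp(-(y - mu)/sigma))^(-nu). The parameter names `mu`, `sigma`, and `nu` (location, scale, shape) are assumptions chosen for illustration and may not match the notation of this article.

```python
import math


def _log1pexp_neg(z):
    """Return log(1 + exp(-z)), computed stably for large |z|."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def skewed_logistic_cdf(y, mu=0.0, sigma=1.0, nu=1.0):
    """CDF of the generalized logistic distribution type I (textbook form)."""
    if sigma <= 0 or nu <= 0:
        raise ValueError("sigma and nu must be positive")
    z = (y - mu) / sigma
    return math.exp(-nu * _log1pexp_neg(z))


def skewed_logistic_pdf(y, mu=0.0, sigma=1.0, nu=1.0):
    """Density: nu / sigma * exp(-z) * (1 + exp(-z)) ** -(nu + 1)."""
    if sigma <= 0 or nu <= 0:
        raise ValueError("sigma and nu must be positive")
    z = (y - mu) / sigma
    log_f = math.log(nu) - math.log(sigma) - z - (nu + 1.0) * _log1pexp_neg(z)
    return math.exp(log_f)


def skewed_logistic_quantile(p, mu=0.0, sigma=1.0, nu=1.0):
    """Inverse CDF for 0 < p < 1, obtained by solving F(y) = p for y."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu - sigma * math.log(p ** (-1.0 / nu) - 1.0)
```

With `nu = 1` the functions reduce to the symmetric logistic distribution; values of `nu` different from 1 make the distribution asymmetric.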

As an example, values for

Results are presented at

Study area and selected stations in Germany (GER), Switzerland (CH), Italy (IT), and Austria (AUT). The markers indicate stations classified as Alpine (triangle), foreland (star), and plain (square). Large symbols represent stations that are discussed in detail in this article: Innsbruck, Austria (large triangle), and Hamburg, Germany (large square).

Temperature observations are provided by automatic weather stations (10 min
mean values). As input,

In this article, detailed case studies will be shown for Innsbruck, Austria
(Alpine site), and Hamburg, Germany (plain site;
cf., Fig.

Similar to previous works (cf.,

While the Gaussian and the logistic distribution have only two parameters
(

Covariates used in the linear predictors of the distributional
parameters

Different scores are used to assess the predictive performance of the models
tested. The overall performance is evaluated by the logarithmic score (LS;

Of particular interest for this study is the performance of the post-processing
models in terms of sharpness and calibration
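
To make the scores named above concrete, the following sketch evaluates the LS and the CRPS for a single Gaussian or logistic forecast. It assumes that the LS is the negative log-density at the observation (so smaller is better) and uses the well-known closed-form CRPS expressions for both distributions. The function names and the arguments `mu` and `sigma` (location and scale) are illustrative and not taken from this article.

```python
import math


def _log1pexp(x):
    """Return log(1 + exp(x)), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_log_score(y, mu, sigma):
    """Negative log-density of N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


def logistic_log_score(y, mu, sigma):
    """Negative log-density of the logistic distribution at y."""
    z = (y - mu) / sigma
    return z + math.log(sigma) + 2.0 * _log1pexp(-z)


def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of a Gaussian forecast."""
    z = (y - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def logistic_crps(y, mu, sigma):
    """Closed-form CRPS of a logistic forecast."""
    z = (y - mu) / sigma
    # z - 2 * log(Lambda(z)) - 1, with log(Lambda(z)) = -log(1 + exp(-z))
    return sigma * (z + 2.0 * _log1pexp(-z) - 1.0)
```

Averaging these per-forecast values over a verification period gives the mean scores that are typically reported.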

Calibration is visually evaluated using probability integral transform (PIT) histograms
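
The PIT-based check can be sketched as follows for logistic forecasts: each observation is passed through its predictive CDF, the resulting PIT values are binned, and the bin frequencies are compared with a uniform distribution. The reliability index below is assumed to be the sum of absolute deviations of the relative bin frequencies from 1/I for I bins, which is a commonly used definition; the exact formula used in this study is not shown in this section. All function names are illustrative.

```python
import math


def logistic_cdf(y, mu, sigma):
    """CDF of the logistic distribution, evaluated stably."""
    z = (y - mu) / sigma
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def pit_values(obs, mus, sigmas):
    """PIT value of each observation under its own predictive distribution."""
    return [logistic_cdf(y, m, s) for y, m, s in zip(obs, mus, sigmas)]


def pit_histogram(pit, n_bins=10):
    """Relative frequency of PIT values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pit:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [c / len(pit) for c in counts]


def reliability_index(freqs):
    """Sum of absolute deviations from the uniform frequency 1 / n_bins."""
    uniform = 1.0 / len(freqs)
    return sum(abs(f - uniform) for f in freqs)
```

A perfectly flat histogram yields an index of 0. Computing the histogram separately per season, lead time, or hour of the day only requires filtering the inputs before calling `pit_values`.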

This section presents a detailed analysis of the different statistical models.
Section

All results presented in Sect.

Raw ensemble forecasts for Alpine sites cannot be directly used because the topography is not well resolved. Therefore, raw ensemble forecasts are typically characterized by small 80 % prediction interval widths (PIWs) around 3

To show the performance of the proposed approach, the analysis for one selected
site with a distinct Alpine character is shown (large triangle,
Fig.

Performance measures at the selected Alpine site (left)
and plain site (right) for all three models
(Gaussian: squares; logistic: circles; skewed logistic: triangles).
From the top down, the LS, CRPS,

When comparing the logistic model with the benchmark Gaussian model, the
logistic model shows small improvements in LS, especially during
nighttime. Similar behavior can be seen for the sharpness (

PIT histograms at the Alpine site for the Gaussian (black/dark line)
and skewed logistic (green/bright line) models
for the 2 d ahead forecasts (left to right:
06:00, 12:00, 18:00, and 00:00 UTC) corresponding to forecasts

Figure

Both the Gaussian and the logistic model already show an almost uniform distribution, although special features can be identified for one particular hour of the day.
The convex shape of the Gaussian model for the all year period at

Joint time series of the empirical skewness

Fig.

To see the benefits of a nonsymmetric response distribution in
a different environment, the same study is shown for a selected
plain site (large square, Fig.

Similar to the Alpine site, a pronounced diurnal cycle is visible for all models in
terms of LS and CRPS (Fig.

In comparison with the Alpine site, the plain site shows overall better forecast performance for all measures except RI, for which both stations show similar scores, indicating that both are, on average, well calibrated. Moreover, almost all scores (LS, CRPS, and PIW) are smaller than for the Alpine site, even for the longest lead time. This is mainly due to the overall better performance of the NWP for regions with few or no topographical features. In such situations, the performance of the NWP is already adequate and the EPS provides covariates containing more information. Thus, the benefit of statistical post-processing is much smaller than for sites in complex terrain. In this example, the Gaussian assumption seems to be an appropriate choice, and the improvements from the logistic or skewed logistic distribution are only minor.

Figure

Performance measures in terms of LS, CRPS,

As in Fig.

Median of (left to right) the logarithmic score (LS), the continuous ranked probability score (CRPS), the reliability index (RI), and three prediction intervals (PIs) reporting the prediction interval width (PIW) and the prediction interval coverage (PIC) for Alpine, foreland, and plain sites (top to bottom), evaluated for each model type (Gaussian, logistic, and skewed logistic).
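
The two prediction interval measures can be illustrated with a short sketch for logistic forecasts. It assumes central prediction intervals, bounded by the quantiles at (1 - c)/2 and (1 + c)/2 for a nominal coverage c (for example c = 0.8), and reports the mean interval width (PIW) and the fraction of observations falling inside the interval (PIC). The function names and the use of the mean rather than another summary are assumptions made for illustration.

```python
import math


def logistic_quantile(p, mu, sigma):
    """Quantile function of the logistic distribution for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu + sigma * math.log(p / (1.0 - p))


def prediction_interval(mu, sigma, coverage=0.8):
    """Central prediction interval with the given nominal coverage."""
    lower = logistic_quantile((1.0 - coverage) / 2.0, mu, sigma)
    upper = logistic_quantile((1.0 + coverage) / 2.0, mu, sigma)
    return lower, upper


def piw_and_pic(obs, mus, sigmas, coverage=0.8):
    """Return (mean interval width, empirical coverage) over all forecasts."""
    widths = []
    hits = 0
    for y, m, s in zip(obs, mus, sigmas):
        lower, upper = prediction_interval(m, s, coverage)
        widths.append(upper - lower)
        hits += lower <= y <= upper
    return sum(widths) / len(widths), hits / len(widths)
```

For a calibrated forecast, the empirical coverage should be close to the nominal coverage, while a smaller width at equal coverage indicates a sharper forecast.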

Figure

In the following, the long-term training approach presented here, which uses 3 years of training data (2012, 2013, and 2014), is compared to the widely used sliding window approach, which uses only the previous 30 or 60 d for training. The validation period chosen is 2015 in order to have at least 1 year of out-of-sample data. Skewed logistic models are not estimated for sliding windows: due to the parametrization of the skewed logistic distribution and the relatively short training periods, reliable parameter estimates can no longer be ensured. Therefore, only results for the Gaussian and logistic models are shown. The estimation of all sliding window models is based on the R package “crch”
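
The data handling behind a sliding training window can be sketched as follows. This is not the estimation procedure of the R package mentioned above: as a deliberately simplified stand-in for a full regression fit, the sketch estimates only a constant bias and a constant residual spread from the ensemble mean within each window. The names `obs`, `ens_mean`, and all function names are hypothetical.

```python
from statistics import fmean, pstdev


def sliding_window_splits(n_days, window=30):
    """Yield (training indices, test index): train on the `window` days before each test day."""
    if window < 1:
        raise ValueError("window must be at least 1 day")
    for t in range(window, n_days):
        yield range(t - window, t), t


def fit_bias_and_spread(obs, ens_mean):
    """Toy Gaussian fit: mean forecast error and its standard deviation."""
    errors = [o - m for o, m in zip(obs, ens_mean)]
    return fmean(errors), pstdev(errors)


def sliding_window_forecasts(obs, ens_mean, window=30):
    """Return (day index, corrected mean, sigma) for every day after the first window."""
    forecasts = []
    for train, t in sliding_window_splits(len(obs), window):
        bias, sigma = fit_bias_and_spread(
            [obs[i] for i in train], [ens_mean[i] for i in train]
        )
        forecasts.append((t, ens_mean[t] + bias, sigma))
    return forecasts
```

A long-term approach would instead fit once on a fixed multi-year training set and apply the fitted coefficients to the whole validation period, so that only the selection of training indices changes in this sketch.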

Performance measures in terms of LS, CRPS,

As in Fig.

Figures

PIT histograms at the Alpine site for the Gaussian (black/dark line)
and logistic (green/bright line) sliding 60 d models using CRPS optimization
for the 2 d ahead forecasts (left to right:
06:00, 12:00, 18:00, and 00:00 UTC) corresponding to forecasts

Therefore, Fig.

Nonhomogeneous regression is a widely used statistical method for post-processing numerical ensemble forecasts. It was originally developed to improve probabilistic air temperature forecasts and assumes a Gaussian response distribution.

However, several studies
have shown that marginal temperature distributions can be skewed or
nonsymmetric

Moreover, skewness is expected to decrease if additional covariates (e.g., individual ensemble members, seasonal effects, and different ensemble forecast quantities) are
included in the Gaussian model (see, e.g.,

In this study, the skewed logistic distribution was used and compared to the (symmetric) logistic and Gaussian distributions for probabilistic
post-processing of the

The two logistic distributions perform better for 1 d up to 4 d ahead forecasts for the majority of stations and lead times, particularly in terms of sharpness and logarithmic score (LS), and do so without degrading calibration, as assessed by the reliability index (RI) and probability integral transform (PIT) histograms. The improvement diminishes as the complexity of the topography decreases.

When PIT histograms are used to check calibration, they have to be examined separately for different seasons, lead times, and hours of the day. Averaging over the whole year or over multiple times of day may mask shortcomings, especially in complex terrain, and the distinct patterns shown in the results might easily be overlooked.

A comparison to sliding window models, in which a fixed number of previous days is used for training, shows that the sliding window approach yields sharp forecasts that are, however, uncalibrated according to the PIT histograms. A longer sliding window of 60 d rather than 30 d decreases the sharpness of the probabilistic forecasts; however, the forecasts remain uncalibrated, which indicates skewness in the residuals. Consequently, longer training windows would have even larger issues with residual skewness. To overcome this, the current study uses a long-term training approach of 3 years and accounts for seasonality. Accounting for seasonality removes most of the skewness while still improving sharpness without degrading calibration.

The sliding training approach has the advantage of being able to react quickly to changes in the ensemble model if two statistically different time periods exist. The long-term approach would require refitting the regression coefficients for the new period after a change has occurred, or the change would have to be treated explicitly in the statistical models if two periods are mixed during training.

In conclusion, the Gaussian assumption for probabilistic temperature post-processing may be appropriate for regions where the ensemble provides sufficient information regarding the marginal distribution of the response. However, if the covariates used in the regression model miss some features, residual skewness becomes challenging. An alternative response distribution, such as the proposed skewed logistic distribution, allows one to directly address unresolved skewness and increases the predictive performance of the probabilistic forecasts.

The results of the models including smooth splines were obtained using the R package
“bamlss”

The third moment (skewness,

This study is based on the PhD work of MG under the supervision of GJM and AZ. Simulations were performed by MG and RS; this involved substantial effort from RS, who adjusted the BAMLSS framework. Verification and visualization were performed by MG, who also prepared the paper and the initial concept. All authors collaborated closely in discussing the results and commented on the paper.

The authors declare that they have no conflict of interest.

Results were partly achieved utilizing the high-performance computing infrastructure at the University of Innsbruck using the supercomputer LEO.

This project was partially funded by doctoral funding from the University of Innsbruck, Vizerektorat für Forschung, and the Austrian Research Promotion Agency (FFG), project “Prof-Cast” (grant no. 858537).

This paper was edited by Dan Cooley and reviewed by Gregory Herman and two anonymous referees.