A new probabilistic post-processing method for wind vectors is presented in a distributional regression framework employing the bivariate Gaussian distribution. In contrast to previous studies, all parameters of the distribution are modeled simultaneously using flexible regression splines, namely the location and scale parameters for both wind components and the correlation coefficient between them. To capture a possible mismatch between the predicted and observed wind direction, ensemble forecasts of both wind components are included using flexible two-dimensional smooth functions. This encompasses a smooth rotation of the wind direction conditional on the season and the forecasted ensemble wind direction.

The performance of the new method is tested for stations located in plains, in mountain foreland, and within an alpine valley, employing ECMWF ensemble forecasts as explanatory variables for all distribution parameters. The rotation-allowing model shows distinct improvements in terms of predictive skill for all sites compared to a baseline model that post-processes each wind component separately. Moreover, different correlation specifications are tested, and small improvements compared to the model setup with no estimated correlation could be found for stations located in alpine valleys.

Accurate forecasts of wind speed and direction are of great importance for
decision-making processes and risk management in today's society and will
likely become more important in the future. This is not only because of the
rapid change in climate and the resulting increase in severe storms

Probabilistic weather forecasts are usually issued in the form of ensemble
predictions. To account for the underlying uncertainty in the atmosphere,
numerical ensemble prediction systems (EPSs) provide a set of weather forecasts
using slightly perturbed initial conditions and different model
parameterizations

To account for the circular characteristics of wind, or to utilize information
on both wind speed and direction, an intuitive post-processing approach is to
model a bivariate process for the zonal and meridional wind components.
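The relation between the two wind components and the more familiar wind speed and direction can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the direction convention (the direction the wind blows from, in degrees clockwise from north) is an assumption that is common in meteorology but not stated in the text above.

```python
import math

def wind_speed_direction(u, v):
    """Return (speed, direction in degrees) for zonal u and meridional v.

    Assumed convention: direction is where the wind blows FROM,
    measured clockwise from north (0 = northerly, 90 = easterly).
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction

def wind_components(speed, direction):
    """Inverse mapping under the same assumed convention."""
    rad = math.radians(direction)
    return -speed * math.sin(rad), -speed * math.cos(rad)

# A pure westerly wind: positive zonal component, no meridional component.
speed, direction = wind_speed_direction(10.0, 0.0)
```

Because direction is circular (0 and 360 degrees coincide) while the two components are ordinary real-valued quantities, working with the components avoids the wrap-around problem altogether.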

For stations in complex terrain, a possible drawback of the bivariate
post-processing approach of

As an alternative to bivariate calibration methods, wind direction can also be
employed in univariate settings. In a post-processing approach for wind speed,

In this study, we directly model the zonal and meridional wind components,
employing the bivariate Gaussian distribution as suggested by

The paper is structured as follows: Sect.

In Sect.

Overview of bivariate Gaussian model specifications. For the
“baseline model” (BLM-0; see Sect.

The zonal and meridional components of the horizontal wind vector are
represented by a bivariate Gaussian distribution. Its likelihood function

To be able to utilize the information of cyclic covariates such as wind
direction, in addition to linear covariates, we follow

The baseline model (BLM-0) combines two univariate heteroscedastic
regression models that post-process each wind component separately, with the
correlation fixed at zero. Hence, the location and scale parts of each
component use their direct EPS counterparts as covariates, namely EPS-forecasted zonal
wind information (

Equation (

In the second model, labeled the rotation-allowing model (RAM-0), we
extend the BLM-0 setup by employing the zonal and meridional wind
information of the ensemble for the linear predictors of all location and scale
parameters. That means we use the ensemble information of both the zonal and
meridional wind components for the two components of the response

By explicitly modeling the correlation, we further extend the RAM-0
setup within this section. For the estimation of the correlation structure,
different model specifications are tested. The most advanced specification,
RAM-ADV, assumes that the correlation mainly depends on the mean
ensemble wind direction (

Other implementations tested for the correlation parameter are an
intercept-only model (RAM-IC), a model with a cyclic effect solely
depending on wind direction (RAM-DIR), the
RAM-0 independent-component model (Sect.

The validation and comparison of the different model specifications are
performed for

Overview of the study area with selected stations classified as plain, foreland, and alpine station sites. The labeled stations with a white background, Hamburg and Innsbruck, are discussed in detail in Sect.

Covariates are derived from the global 50-member EPS of the European Centre for
Medium-Range Weather Forecasts (ECMWF). These EPS forecasts have a horizontal
resolution of approximately

Empirical wind distributions of observations (OBS) and mean ensemble
forecasts (ENS) for Innsbruck and Hamburg. The probability of occurrence is
color-coded and the wind speed is represented by contour lines (m s

This section presents the results of the statistical post-processing models.
The structure is as follows. First, the estimated effects of the baseline
model, BLM-0 (Sect.

The model estimation is performed on data of the first

For BLM-0, the cyclic seasonal effects for stations Hamburg and
Innsbruck are shown in Fig.

Cyclic seasonal intercept and slope effects according to
Eq. (

For Hamburg, for both location parameters

By contrast, for Innsbruck the estimated effects show a distinct annual
cycle for the location parameters

Figure

Estimated mean effects for the derived post-processed wind direction
at Innsbruck

To investigate the predictive performance of the two competing setups,
Fig.

Predictive performance in terms of the logarithmic score (LS) and the energy score (ES) based on the full predictive bivariate distribution for the out-of-sample validation period. The two specifications
BLM-0 (Eq.

After investigating the two competing location or scale setups, we now focus on
an extension of the RAM-0 model by explicitly estimating the underlying
correlation structure. Different model specifications for the correlation
parameter

Figure

Distribution of the correlation parameters for the underlying
dependence structure of the raw ensemble and for the fitted correlation
according to the models specified in Table

Figure

Skill scores aggregated over all forecast steps from

Multivariate rank histograms for raw and post-processed ensemble
forecasts according to the correlation model setups RAM-0 and
RAM-ADV. The results are shown for Innsbruck

To validate the calibration of the post-processed predictions, multivariate
rank histograms

After the previous model comparison at two weather stations,
Fig.

Aggregated skill scores (LS:

The post-processing employed by the simplest model, BLM-0, already shows a
distinct improvement over the raw EPS with the largest values for alpine valley
sites. In terms of the ES, the skill scores range between mean values of

In this study, we model the zonal and meridional wind components employing the bivariate Gaussian distribution in a distributional regression framework. In contrast to previous studies, all distribution parameters, namely the location and scale parameters for both wind components and the correlation coefficient between them, are estimated simultaneously. The overall performance of the models is evaluated for three groups of station types classified as topographically plain, mountain foreland, and alpine valley sites.

Section

The rotation-allowing model (RAM-0) utilizes the zonal and meridional ensemble wind forecasts for both components of the two-dimensional location and scale parameters. This allows the statistical model to adjust for potential misspecifications in the ensemble wind direction by a smooth rotation conditional on the day of the year and the forecasted wind direction. For stations in complex terrain, this may be particularly advantageous due to unresolved topographical features.
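Why using both ensemble components permits a rotation can be seen in a minimal sketch. It assumes fixed, hypothetical coefficients; in the model described above the corresponding effects vary smoothly with the day of the year and the forecasted wind direction, which this sketch does not attempt to reproduce.

```python
import math

def location_predictor(u_ens, v_ens, coef_u, coef_v):
    """Location parameters computed from BOTH ensemble wind components.

    coef_u and coef_v are hypothetical triples
    (intercept, weight on u_ens, weight on v_ens).
    """
    mu_u = coef_u[0] + coef_u[1] * u_ens + coef_u[2] * v_ens
    mu_v = coef_v[0] + coef_v[1] * u_ens + coef_v[2] * v_ens
    return mu_u, mu_v

def rotation_coefficients(angle_deg, scale=1.0):
    """Coefficients that rotate the forecast vector counterclockwise
    by angle_deg and rescale its length by `scale`."""
    a = math.radians(angle_deg)
    coef_u = (0.0, scale * math.cos(a), -scale * math.sin(a))
    coef_v = (0.0, scale * math.sin(a), scale * math.cos(a))
    return coef_u, coef_v

# Baseline-like setup: each component only sees its own counterpart,
# so the forecast can be shifted and rescaled but not rotated.
blm = location_predictor(5.0, 0.0, (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Rotation-allowing setup: cross terms turn a westerly forecast
# into a southerly one (rotation by 90 degrees).
coef_u, coef_v = rotation_coefficients(90.0)
ram = location_predictor(5.0, 0.0, coef_u, coef_v)
```

With the cross terms set to zero the predictor reduces to the baseline-like case, so the rotation-allowing form can be read as a strict generalization of it.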

The estimated effects confirm a distinct wind rotation for the valley site
(Innsbruck), while for the station in the plain (Hamburg) barely any
adjustments of the forecasted wind direction are needed (see
Fig.

These findings are supported by an additional comparison against the model
inspired by

Two exemplary forecasts showing the respective observation (black
cross), the climatological estimate (gray dashed line), the EPS member
forecasts (gray points) and their empirical density (brown line), and the
estimated bivariate distributions for the setups RAM-0 and
RAM-ADV, without (green line) and with (blue line) modeled correlation, respectively. The climatological estimate uses the mean, the standard deviation, and the correlation of the observed wind components as bivariate distribution parameters. The lines show the

Several different model specifications for the correlation parameter have been
tested, among others a flexible setup employing wind direction and speed as
potential covariates for the correlation parameter by nonlinear smooth effects
following the idea of

To illustrate why explicitly estimating the dependence structure may not yield
more pronounced improvements, Fig.

The study shows that the flexible rotation-allowing models bring significant performance benefits for stations located in complex terrain as well as for stations in the plain. Therefore, we propose using a similar setup that employs both EPS wind components in a smooth rotation-allowing framework. For the correlation, we have not found a clear distinction between the different correlation models tested for stations located in the plain and the foreland. For stations located within an alpine valley, minor improvements could be found. Although somewhat unexpected, this result has clear advantages for operational usage: estimating a single bivariate response distribution with the correlation forced to zero is equivalent to post-processing each wind component separately in a univariate setup with marginal Gaussian response distributions. Such a univariate approach simplifies the required statistical models and reduces computational time with only a small loss of predictive skill, at least for the stations tested in this study.
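The equivalence between a bivariate Gaussian with zero correlation and two separate univariate Gaussians can be checked numerically. The following sketch uses the standard textbook log-density of the bivariate Gaussian; the function names and the numerical values are purely illustrative and are not taken from the study.

```python
import math

def log_density_bivariate(u, v, mu_u, mu_v, sigma_u, sigma_v, rho):
    """Log-density of a bivariate Gaussian with correlation rho."""
    zu = (u - mu_u) / sigma_u
    zv = (v - mu_v) / sigma_v
    one_minus_rho2 = 1.0 - rho * rho
    return (-math.log(2.0 * math.pi * sigma_u * sigma_v)
            - 0.5 * math.log(one_minus_rho2)
            - (zu * zu - 2.0 * rho * zu * zv + zv * zv)
            / (2.0 * one_minus_rho2))

def log_density_univariate(y, mu, sigma):
    """Log-density of a univariate Gaussian."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z

# Illustrative observation and distribution parameters (made up).
u_obs, v_obs = 2.0, -1.0
mu_u, mu_v, sigma_u, sigma_v = 1.5, -0.5, 2.0, 1.0

joint = log_density_bivariate(u_obs, v_obs, mu_u, mu_v,
                              sigma_u, sigma_v, rho=0.0)
separate = (log_density_univariate(u_obs, mu_u, sigma_u)
            + log_density_univariate(v_obs, mu_v, sigma_v))
```

With `rho=0.0` the two quantities agree up to floating-point error, so fitting the joint model by maximum likelihood amounts to fitting the two components independently. A nonzero `rho` adds the cross term and the normalization factor that couple the components.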

The bivariate Gaussian model estimation is performed in R 3.5.2

Generalized additive models

To account for seasonal variations of the intercept and the linear coefficients, seasonal cyclic splines are used. If the covariates provide sufficient information, a time-adaptive training scheme might not be required. However, if the bias and/or the slope coefficient are not constant throughout the year, or if the covariate's skill varies over the year, these terms are required for the statistical model to capture seasonal features.
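The idea of seasonally varying coefficients can be sketched as follows. Note the substitution: instead of cyclic splines, this sketch uses a single annual harmonic, which is a simpler stand-in that shares the key property of being continuous across the turn of the year. All names and parameter values are hypothetical.

```python
import math

def seasonal_coefficient(day_of_year, mean, cos_amp, sin_amp, period=365.25):
    """Coefficient that varies smoothly and cyclically over the year.

    A single harmonic replaces the cyclic spline here (an assumption
    made for brevity, not the study's actual basis).
    """
    phase = 2.0 * math.pi * day_of_year / period
    return mean + cos_amp * math.cos(phase) + sin_amp * math.sin(phase)

def location(day_of_year, ens_mean, intercept_pars, slope_pars):
    """Location parameter with seasonally varying intercept and slope."""
    b0 = seasonal_coefficient(day_of_year, *intercept_pars)
    b1 = seasonal_coefficient(day_of_year, *slope_pars)
    return b0 + b1 * ens_mean

# Hypothetical parameters: (annual mean, cosine amplitude, sine amplitude).
intercept_pars = (0.2, 0.5, 0.0)
slope_pars = (0.9, -0.1, 0.05)

winter = location(15, 4.0, intercept_pars, slope_pars)
summer = location(197, 4.0, intercept_pars, slope_pars)
```

The same ensemble forecast is corrected differently in winter and in summer, while one set of parameters is estimated from the whole multi-year training data set.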

We therefore fit one statistical model over a training data set including
several years of data, but allow the coefficient included in the linear
predictor(s)

To compare the different bivariate Gaussian models of this study, we employ
skill scores. A skill score quantifies the improvement of a score $S$ over that
of a reference, $S_\mathrm{ref}$. For all measures with a perfect score of zero,
the skill score simplifies to $\mathrm{SS} = 1 - S / S_\mathrm{ref}$.
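For a negatively oriented score with a perfect value of zero, the skill score is one minus the ratio of the model's mean score to the reference's mean score. A minimal sketch follows; the function names and the example values are hypothetical and do not come from the study.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of scores."""
    return sum(values) / len(values)

def skill_score(score, reference_score):
    """Skill score for a score whose perfect value is zero.

    Positive values indicate an improvement over the reference,
    zero means no improvement, and one is a perfect forecast.
    """
    if reference_score == 0:
        raise ValueError("reference score must be nonzero")
    return 1.0 - score / reference_score

# Made-up per-forecast scores for a model and a reference.
model_scores = [1.0, 2.0, 3.0]
reference_scores = [2.0, 4.0, 6.0]

ss = skill_score(mean(model_scores), mean(reference_scores))
```

In this example the model halves the mean score of the reference, which gives a skill score of 0.5.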

In this study we use the logarithmic score (LS,

The calculation of the ES is based on the R package

The logarithmic score is defined based on the log-density (or log-likelihood):

To benchmark the models as presented in this study, we compare our
specifications to those of

As Fig.

In this study, we apply

Since one of the major aspects within this study is the rotation of the wind
direction, we compare our models to a model inspired by

As Fig.

Figure

This study is based on the PhD work of MNL under supervision of GJM and AZ. The majority of the work for this study was performed by MNL with strong guidance of RS. All the authors worked closely together in discussing the results and commented on the paper.

The authors declare that they have no conflict of interest.

This project was funded by the Austrian Research Promotion Agency (FFG), grant no. 858537. We also thank the Zentralanstalt für Meteorologie und Geodynamik (ZAMG) for providing access to the data. Furthermore, we are grateful to the editor and the reviewers for their valuable comments.

This research has been supported by the Austrian Research Promotion Agency (FFG) (grant no. 858537).

This paper was edited by Christopher Paciorek and reviewed by Sebastian Lerch and one anonymous referee.