Several multi-site stochastic generators of zonal and meridional components of wind are proposed in this paper. A regime-switching framework is introduced to account for the alternation of intensity and variability that is observed in wind conditions due to the existence of different weather types. This framework partitions the time series into periods within which the series is described by a single model. The regime-switching is modeled by a discrete variable that can be introduced as a latent (or hidden) variable or as an observed variable. In the latter case a clustering algorithm is used to extract the regimes before the model is fitted. Conditional on the regimes, the observed wind conditions are assumed to evolve as a linear Gaussian vector autoregressive (VAR) model. Various questions are explored, such as the modeling of the regime in a multi-site context, the extraction of relevant clusterings from extra variables or from the local wind data, and the link between weather types extracted from wind data and large-scale weather regimes derived from a descriptor of the atmospheric circulation. We also discuss the relative advantages of hidden and observed regime-switching models. For artificial stochastic generation of wind sequences, we show that the proposed models reproduce the average space–time motions of wind conditions, and we highlight the advantage of regime-switching models in reproducing the alternation of intensity and variability in wind conditions.

In this section, we present the context of our work and then the data used to compare the proposed Markov-switching autoregressive models.

Stochastic weather generators have been used to generate artificial sequences
of small-scale meteorological data with statistical properties similar to the
data set used for calibration. Various wind condition generators at a single
site have been proposed in the literature; see

In the northeastern Atlantic, the spatiotemporal dynamics of the wind field
is complex. This area is under the influence of an unstable atmospheric jet
stream whose large-scale fluctuations induce local alternations between
periods with high wind intensity and strong temporal variability, and less
intense and variable periods. Scientists have proposed describing the North
Atlantic atmospheric dynamics through a finite number of preferred states,
namely, weather regimes or weather types

Depending on the availability of good descriptors of the current weather
state, regime-switching can be introduced with either observed or latent
regimes. Regimes are said to be observed when they are identified a priori,
before the modeling of the local dynamics. In this case, clustering methods
are run on adequate variables to obtain relevant regimes: either the local
variables or extra variables characterizing the large-scale weather
situation, such as descriptors of the large-scale atmospheric circulation

When the regimes are said to be latent, they are introduced as a hidden
variable in the model. This framework is more complex from a statistical
point of view, and the conditional distribution of wind given the regime
has to be simple and tractable. Hidden Markov models (HMMs) have been widely
used for meteorological data

To the best of our knowledge, no comparison between observed and latent
regime-switching has been proposed in the field of stochastic generators of
wind conditions. In

In the multi-site context, the regime can either be common to all sites
(i.e., scalar; see

The paper is organized as follows. MS-AR models are introduced in
Sect.

The data under study are zonal (west–east) and meridional (north–south)
surface wind components

Left: spatial hierarchical clustering of the moving variance
associated with wind speed with four clusters (symbols). Right: joint and
marginal distribution of

We focus on gridded locations between latitudes

Components

In this section, we introduce the proposed models and discuss their parameter estimation in cases of both observed and latent regimes.

In this paper, we consider the following class of models. Let

For both kinds of models, covariates can be included. The easiest way is to
include them in the intercept parameter

To avoid over-parameterization of the conditional models, we first work with a reduced data set. In the following, all the proposed models will be fitted on the subset of sites (1, 6, 10, 13, 18), the extension to a wider region being left for future studies.
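The class of models considered in this section can be illustrated with a minimal simulation sketch: a discrete Markov chain selects the regime, and within each regime the wind components follow a Gaussian VAR. The sketch assumes an autoregressive order of one for brevity, and the function name, the two-regime setting and all parameter values are hypothetical rather than fitted values from this study.

```python
import numpy as np

def simulate_ms_var(trans, intercepts, ar_mats, chol_covs, n_steps, seed=0):
    """Simulate a regime-switching VAR(1) (illustrative sketch only).

    trans:      (M, M) transition matrix of the regime chain (rows sum to 1)
    intercepts: (M, d) regime-dependent intercepts
    ar_mats:    (M, d, d) regime-dependent autoregressive matrices
    chol_covs:  (M, d, d) lower Cholesky factors of the innovation covariances
    """
    rng = np.random.default_rng(seed)
    n_regimes, dim = intercepts.shape
    regimes = np.zeros(n_steps, dtype=int)   # the chain starts in regime 0
    y = np.zeros((n_steps, dim))             # the series starts at the origin
    for t in range(1, n_steps):
        # Draw the new regime given the previous one
        regimes[t] = rng.choice(n_regimes, p=trans[regimes[t - 1]])
        k = regimes[t]
        # Gaussian innovation with regime-dependent covariance
        noise = chol_covs[k] @ rng.standard_normal(dim)
        y[t] = intercepts[k] + ar_mats[k] @ y[t - 1] + noise
    return regimes, y

# Hypothetical two-regime example with two components (u, v) at one site
trans = np.array([[0.95, 0.05], [0.10, 0.90]])
intercepts = np.array([[0.0, 0.0], [1.0, 0.5]])
ar_mats = np.array([0.9 * np.eye(2), 0.6 * np.eye(2)])
chol_covs = np.array([0.3 * np.eye(2), 1.0 * np.eye(2)])
regimes, y = simulate_ms_var(trans, intercepts, ar_mats, chol_covs, n_steps=500)
```

In this toy setting the first regime is persistent and slowly varying, while the second has a weaker autoregressive coefficient and a larger innovation variance, which mimics the alternation of variability described above.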

First, let us suppose that the complete set of observations

For each

Concerning the Markov chain

Time series of wind speed in January 2012 and a posteriori regimes
from the fitting of an H-MS-VAR. The lighter the grey, the smaller
the determinant of

When only observations of process

The EM algorithm cycles through two steps: the expectation step and the
maximization step
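As an illustration of the expectation step, the sketch below computes the smoothing probabilities of the regimes with scaled forward-backward recursions, which is the usual way of carrying out this step for hidden Markov models. It assumes that the conditional densities of each observation in each regime have already been evaluated; the function name and the placeholder inputs are hypothetical.

```python
import numpy as np

def smoothing_probabilities(trans, init, emis):
    """Scaled forward-backward recursions (illustrative sketch only).

    trans: (M, M) transition matrix; init: (M,) initial distribution;
    emis:  (T, M) conditional density of each observation in each regime
           (for an autoregressive model, given the previous observations).
    Returns the (T, M) smoothing probabilities and the log-likelihood.
    """
    n_steps, n_regimes = emis.shape
    alpha = np.zeros((n_steps, n_regimes))
    scale = np.zeros(n_steps)
    # Forward pass: filtering probabilities, normalized at each step
    alpha[0] = init * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n_steps):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Backward pass, using the same scaling constants
    beta = np.ones((n_steps, n_regimes))
    for t in range(n_steps - 2, -1, -1):
        beta[t] = trans @ (emis[t + 1] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta
    return gamma, np.log(scale).sum()

rng = np.random.default_rng(1)
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
init = np.array([0.5, 0.5])
emis = rng.uniform(0.1, 1.0, size=(50, 2))   # placeholder densities, not real data
gamma, loglik = smoothing_probabilities(trans, init, emis)
```

In the maximization step, these probabilities would serve as weights when the regime-dependent parameters are updated.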

Leftmost panel: matrix in which the station numbers are printed;
then, from left to right, conditional probabilities of occurrence of regime

In this paper, we use

When the current weather state is not estimated a priori, it is introduced as
a latent variable. Hidden regime-switching models have been used in various
fields; see

Here, the assumption of a common regional regime is investigated, and we show
that this assumption is acceptable when the considered area is homogeneous.
The homogeneous single-site MS-AR model introduced in

The sequences of regimes are compared in Fig.

Moreover, we can notice an eastward propagation in wind events, the darkest
regimes often being observed at western stations (station 1) prior to eastern
sites (10 and 18). The bottom panel of Fig.

In Fig.

At each site, the physical interpretation of each regime is similar. Indeed,
the first regime corresponds mainly to anticyclonic conditions with easterly
winds and a slowly varying intensity (the variance of the innovation of the
AR model is lower than in the two other regimes, and the first AR coefficient
is larger; see Table

Parameter values obtained when fitting an H-MS-VAR at the
different sites: diagonal of the transition matrix

The two other regimes correspond to cyclonic conditions with westerly winds
and a higher temporal variability in the intensity (see Fig.

Top panel: moving mean of wind speed computed on 2-day intervals
(nine time steps) for each regime of the H-MS-VAR model fitted at
site

In Fig.

Coefficients of the autoregressive process

The assumption of a regional regime seems appropriate in the considered area and is thus kept for the modeling of the multi-site wind in the following.

In contrast to the previous section, one may derive the regimes separately from the fitting of the conditional model. For such a priori regime-switching models, the observed regimes can be derived with appropriate clustering methods. We seek weather states that are distinct from one another and within which the data are homogeneous. Clustering can be run either on the local variables under study or on extra variables: the former leads to weather states that are more appropriate to the local data, while the latter can provide more meteorologically consistent regimes, for example, with more information about the large-scale situation. In this subsection, we propose three clusterings, which differ by the clustering method and/or by the variables used to derive the a priori regimes.
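Once a priori regimes are available, the conditional model can be fitted regime by regime. The sketch below is one simple way to do this, assuming a VAR of order one, a least-squares fit within each regime and a transition matrix estimated from transition counts; the function name and the synthetic labels and data are hypothetical.

```python
import numpy as np

def fit_observed_regime_var(y, regimes, n_regimes):
    """Fit a VAR(1) within each a priori regime (illustrative sketch only).

    y: (T, d) observations; regimes: (T,) integer labels in 0..n_regimes-1.
    Assumes every regime is visited at least once before the last time step.
    Returns the transition matrix and, per regime, the tuple
    (intercept, autoregressive matrix, innovation covariance).
    """
    # Transition matrix from the counts of consecutive regime pairs
    counts = np.zeros((n_regimes, n_regimes))
    np.add.at(counts, (regimes[:-1], regimes[1:]), 1)
    trans = counts / counts.sum(axis=1, keepdims=True)

    params = []
    for k in range(n_regimes):
        idx = np.flatnonzero(regimes[1:] == k) + 1        # times t >= 1 in regime k
        design = np.column_stack([np.ones(idx.size), y[idx - 1]])
        coef, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        resid = y[idx] - design @ coef
        cov = resid.T @ resid / idx.size                  # ML estimate of the covariance
        params.append((coef[0], coef[1:].T, cov))
    return trans, params

# Synthetic example: two regimes with different persistence and variability
rng = np.random.default_rng(0)
regimes = np.repeat([0, 1, 0, 1], 250)                    # hypothetical a priori labels
y = np.zeros((1000, 2))
for t in range(1, 1000):
    a, s = (0.9, 0.3) if regimes[t] == 0 else (0.5, 1.0)
    y[t] = a * y[t - 1] + s * rng.standard_normal(2)
trans, params = fit_observed_regime_var(y, regimes, n_regimes=2)
```

The determinant of each fitted innovation covariance, for example `np.linalg.det(params[k][2])`, can then be used to compare the regimes.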

As a first clustering, we use a classification into four large-scale weather
regimes that is commonly used in climate studies to characterize the
wintertime atmospheric dynamics over the North Atlantic/European sector

The positive phase of the North Atlantic Oscillation (hereafter NAO

The negative phase of the NAO (NAO

The Scandinavian blocking (BL), characterized by a strong anticyclone over northern Europe able to totally block the westerly flow over western Europe

The Atlantic Ridge (AR), characterized by a strong west–east pressure dipole bringing polar air masses over western Europe

To derive these regimes, we use the same methodology as in

To derive observed regimes from local wind variables, one can first use a

The hidden structure of the Markov chain provides more stable regimes than
with a

Then two sets of descriptors of the data (i.e., local variables) are
proposed. The first partition, denoted

Time series of wind speed in January 2012 and a priori regimes
extracted with the methods proposed above. The darker the grey, the
smaller the determinant of

The proposed clusterings are compared through various analyses. We seek a
clustering that is physically meaningful and appropriate in terms of
conditional autoregressive models. For a proper comparison, the regimes of
every clustering are ordered from the most persistent to the least
persistent. This is done according to the determinant of the matrix

Sequences of regimes from the proposed clusterings are shown in
Fig.

In Fig.

Since different descriptors are used,

The regimes of

Average fields of

To examine the associations between the different classifications, a multiple
correspondence analysis is performed on the four categorical variables that
represent the classifications. This analysis can be viewed as an analog of
principal component analysis for categorical variables, in which the associations
between the variables are measured with the chi-squared distance. The regimes
of each classification are projected on the first two components and
displayed in Fig.

First factorial plane of the multiple correspondence analysis of the four classifications. Each regime of the four classifications is depicted.
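A multiple correspondence analysis of this kind can be sketched as a correspondence analysis of the indicator matrix that stacks the one-hot codings of the classifications. The code below is a minimal sketch under that standard formulation, not necessarily the implementation used for the figure; the function name and the random labels are hypothetical.

```python
import numpy as np

def mca_category_coordinates(labels, n_components=2):
    """Correspondence analysis of an indicator matrix (illustrative sketch only).

    labels: (T, Q) integer array, one column per classification.
    Assumes every category of every classification occurs at least once.
    Returns the principal coordinates of all categories and their masses.
    """
    blocks = []
    for q in range(labels.shape[1]):
        n_cat = labels[:, q].max() + 1
        blocks.append(np.eye(n_cat)[labels[:, q]])        # one-hot coding
    indicator = np.hstack(blocks)
    prob = indicator / indicator.sum()
    row_mass = prob.sum(axis=1)
    col_mass = prob.sum(axis=0)
    # Standardized residuals: their squared norm is the chi-squared inertia
    expected = np.outer(row_mass, col_mass)
    resid = (prob - expected) / np.sqrt(expected)
    _, sing, vt = np.linalg.svd(resid, full_matrices=False)
    # Principal coordinates of the categories on the leading components
    coords = (vt[:n_components].T * sing[:n_components]) / np.sqrt(col_mass)[:, None]
    return coords, col_mass

rng = np.random.default_rng(2)
labels = rng.integers(0, 3, size=(400, 4))   # four hypothetical classifications
coords, col_mass = mca_category_coordinates(labels)
```

Plotting the rows of `coords` gives one point per regime of each classification; regimes that tend to occur at the same time steps lie close together.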

Quantitative criteria are considered in order to complete this analysis. The
optimal value of the complete log-likelihood of the model is generally a good
measure of the statistical relevance of a model. The complete log-likelihood,
given in Eq. (

Note that the first term is a function of the total time spent in each regime
and of the determinant of the associated innovation covariance matrix (the
one-step-ahead forecast error is linked to this quantity). The
longer the time spent in a regime with a small determinant of the innovation
covariance, the greater the log-likelihood (see Table
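The dependence on the time spent in each regime can be illustrated numerically. The sketch below relies on a standard property of Gaussian models rather than on the paper's own equation: when the innovation covariance of each regime is estimated by maximum likelihood, the maximized Gaussian part of the log-likelihood reduces to a sum over regimes of the time spent in the regime multiplied by a term in the log-determinant of its covariance. The function name and the numerical values are hypothetical.

```python
import numpy as np

def gaussian_term(time_in_regime, covs):
    """Maximized Gaussian part of the log-likelihood (illustrative sketch only).

    Computes -1/2 * sum_k n_k * (log det S_k + d * (1 + log(2 pi))),
    where n_k is the time spent in regime k, S_k the ML estimate of the
    innovation covariance in that regime and d the dimension.
    """
    dim = covs[0].shape[0]
    total = 0.0
    for n_k, cov in zip(time_in_regime, covs):
        _, logdet = np.linalg.slogdet(cov)
        total += -0.5 * n_k * (logdet + dim * (1.0 + np.log(2.0 * np.pi)))
    return total

covs = [0.1 * np.eye(2), 1.0 * np.eye(2)]         # hypothetical innovation covariances
mostly_calm = gaussian_term([800, 200], covs)     # most time in the low-determinant regime
mostly_rough = gaussian_term([200, 800], covs)    # most time in the high-determinant regime
```

With these values `mostly_calm` is larger than `mostly_rough`: for the same covariances, spending more time in the regime with the smaller determinant increases this term.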

Joint probability of occurrence of the three local regimes identified by the proposed models in rows and the four large-scale regimes in columns.
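A joint probability table of this kind can be obtained by counting the co-occurrences of the two label sequences. The sketch below assumes that local and large-scale regimes are available at the same time steps as integer labels; the function name and the random labels are hypothetical.

```python
import numpy as np

def joint_probability(local, large_scale, n_local, n_large):
    """Empirical joint probability of two regime sequences (sketch only).

    Rows index the local regimes, columns the large-scale regimes.
    """
    table = np.zeros((n_local, n_large))
    np.add.at(table, (local, large_scale), 1)   # count co-occurrences
    return table / table.sum()

rng = np.random.default_rng(3)
local = rng.integers(0, 3, size=1000)           # three hypothetical local regimes
large_scale = rng.integers(0, 4, size=1000)     # four hypothetical large-scale regimes
table = joint_probability(local, large_scale, n_local=3, n_large=4)
```

Dividing each row of the table by its sum gives the conditional probabilities of the large-scale regimes given the local regime.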

Left: joint and marginal distributions of simulated data at site

The clustering

In this section we quantitatively compare the large-scale regimes described
by

For the three clusterings, the local regimes seem to appear in preferential
large-scale weather regimes. The strongest link with

The regimes of H-MS-VAR and

Top: correlation between

Moving standard deviation of the value

In this section, we compare models VAR(2),

First, marginal statistics at the central site

The space–time correlation function of the multivariate process

To study patterns at an instantaneous timescale, we focus on the ability of
the models to reproduce the alternation of temporal variability. Indeed, the
alternation of different weather states induces an alternation in the
intensity and temporal variability of wind. In Fig.

Similar diagnostics to Fig.

In Sect.

In this paper we have introduced observed and latent regime-switching
frameworks, and we have shown that each type of regime-switching model has
its own advantages. Models with observed switchings may account for relevant
regimes that correspond to characteristic meteorological conditions in
Europe. The choice of the clustering method and of the descriptors of the
data is crucial, as discussed in Sect.

The hidden regime-switching framework seems to overcome this insufficiency by providing regimes that are driven by the conditional distribution and are therefore adapted to the estimation. With hidden regime-switching models, however, the estimation procedure may become challenging when sophisticated marginal models are considered. The extracted regimes are driven mainly by the local data and the proposed conditional distribution, and consequently they might be less physically interpretable than regimes derived from other clusterings. Nevertheless, for the proposed model and the data set studied here, the associated regimes were not physically inconsistent. Moreover, the use of hidden regime-switching models saves the effort of choosing an appropriate a priori clustering.

Concerning the proposed observed regime-switching models, there seems to be a
compromise between physically interpretable regimes and a good description of
the conditional model by a VAR, as highlighted in Sect.

Future work may involve investigating reduced parameterizations of the autoregressive coefficients and of the innovation covariance matrices, thus helping to adapt the model to a larger data set. Indeed, the number of parameters is already high with the small data set under consideration, and attempts to use parametric shapes for the parameters suggest that considerable effort will be needed to obtain consistent results. Furthermore, the autoregressive matrices generally show privileged predictors that depend on the regime, which motivates the use of constraint matrices in each regime.

The submitted paper has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a US Department of Energy Office of Science laboratory, is operated under contract no. DE-AC02-06CH11357.

Edited by: W. Kleiber

Reviewed by: one anonymous referee