Many marine activities, such as designing ocean structures and planning marine operations, require the characterization of sea-state climate. This study investigates the statistical relationship between wind and sea states, considering its spatiotemporal behavior. A transfer function is established between wind fields over the North Atlantic (predictors) and the significant wave height (predictand) at three locations: southwest of the French coast (Gironde), the English Channel, and the Gulf of Maine. The developed method considers both wind seas and swells by including local and global predictors. The spatiotemporal structure of the global predictors is defined with a fully data-driven approach to account for the non-local and non-instantaneous relationship between wind and waves. Weather types are constructed using a regression-guided clustering method, and the resulting clusters correspond to different wave systems (swells and wind seas). Then, in each weather type, a penalized linear regression model is fitted between the predictors and the predictand. The validation analysis demonstrates the model's skill in predicting the significant wave height, with a root mean square error of approximately 0.3 m at the three considered locations. Additionally, the study discusses the physical insights underlying the proposed method.

A sea state is a statistical description of the sea surface waves generated by wind at a given time and location. The sea state is characterized by a superposition of wind seas and swells (

High-quality wave data are essential for many marine applications, such as designing coastal and offshore structures and planning marine operations. Sea states are characterized using observations, numerical models, and statistical models.
Traditional in situ measurements obtained from buoys provide the most reliable data for sea-state parameters; however, they are only available for the last decades and are limited spatially

Various studies have compared statistical and numerical models for ocean wave parameters and other climate variables.

This study presents a statistical approach for estimating the relationship between wind conditions and ocean waves. The approach is based on weather types, which are constructed using a regression-guided clustering algorithm. These weather types are then used to link the space–time wind fields over the North Atlantic (predictors) and the significant wave height (predictand) at three locations: southwest of the French coast (Gironde), the English Channel, and the Gulf of Maine. Within each weather type, a regression with ridge regularization is fitted between wind conditions and significant wave height. The proposed methodology considers both wind seas and swells and provides additional information about the spatiotemporal relationship between wind and waves. The main contribution of this work is an entirely data-driven approach for estimating the travel time of waves from any source point to a target point, which is essential for the definition of predictors. To the best of our knowledge, the only other approach in the literature that can be used for this purpose is ESTELA

Mean Climate Forecast System Reanalysis (CFSR) zonal and meridional wind components over the period 2014–2019.

Local model results (Eq.

This paper is structured as follows. Section 2 describes the data, and Sect. 3 defines the local predictors. Section 4 describes the construction of the global predictors. Section 5 presents the statistical downscaling (SD) model that combines the local and global predictors, and Sect. 6 presents its results. Finally, Sect. 7 concludes the study.

The atmospheric data used in this work to construct predictors are extracted from the Climate Forecast System Reanalysis (CFSR)

To comprehensively evaluate the method across a range of observed sea states, we consider three different locations: Gironde (45.2

The historical wave data used in this work for the Gironde and English Channel locations are taken from the sea-state hindcast database HOMERE ^{®} model forced by CFSR wind. The database covers the English Channel and the Bay of Biscay with an unstructured computational mesh. It contains 37 parameters and the frequency spectra at high spatial resolution, ranging from 200 to 10 km, with a 1 h time step. For the Gulf of Maine location, we consider the IOWAGA database ^{®} model forced by CFSR and ECMWF wind. To validate and interpret the results of the SD method, we consider the energy spectral partitioning, which identifies different wave systems. HOMERE uses the watershed algorithm

The temporal resolution of both predictors and predictand is coarsened from hourly to 3 h to facilitate the analysis. Both datasets cover a common period of 26 years, from 1994 to 2019. The 1994–2013 period is used for calibration, and the 2014–2019 period is used for validation.
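The preprocessing described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, and averaging within 3 h bins is an assumption, since the text does not state whether the hourly data were averaged or subsampled.

```python
from datetime import datetime, timedelta

def to_3h_means(times, values):
    """Average hourly values within consecutive 3 h bins (00-03, 03-06, ...).

    Assumption: the 3 h series is a bin average; subsampling every third
    hour would be an equally plausible reading of the text.
    """
    bins = {}
    for t, v in zip(times, values):
        # Start of the 3 h bin that contains time t.
        start = t.replace(hour=t.hour - t.hour % 3, minute=0, second=0, microsecond=0)
        bins.setdefault(start, []).append(v)
    return sorted((start, sum(vs) / len(vs)) for start, vs in bins.items())

def split_calibration_validation(series, last_calibration_year=2013):
    """Split (time, value) pairs into calibration (1994-2013) and validation (2014-2019)."""
    calibration = [(t, v) for t, v in series if t.year <= last_calibration_year]
    validation = [(t, v) for t, v in series if t.year > last_calibration_year]
    return calibration, validation

# Toy hourly series straddling the calibration/validation boundary.
hourly_times = [datetime(2013, 12, 31, 18) + timedelta(hours=h) for h in range(12)]
hourly_values = [float(h) for h in range(12)]
series_3h = to_3h_means(hourly_times, hourly_values)
calibration, validation = split_calibration_validation(series_3h)
```

Splitting by calendar year keeps the validation period strictly after the calibration period, so the validation scores are not affected by temporal leakage.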

Wind speed, duration, and the fetch impact the characteristics of the wind sea

To investigate the capability of local variables to explain

Wind projection representation. The initial wind vector (V) at each source point is transformed into a component (B) aligned with the bearing (b) of the target point, as determined by a great circle path (dashed blue line).
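The projection in this caption can be sketched in a few lines. The sketch assumes that `u` and `v` are the eastward (zonal) and northward (meridional) wind components, that the bearing is the initial great-circle bearing from the source point to the target point measured clockwise from north, and that the Earth is a sphere. The function names are illustrative and do not come from the study.

```python
import math

def initial_bearing(lat_src, lon_src, lat_tgt, lon_tgt):
    """Initial great-circle bearing from source to target (radians, clockwise from north)."""
    phi1, phi2 = math.radians(lat_src), math.radians(lat_tgt)
    dlon = math.radians(lon_tgt - lon_src)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.atan2(y, x)

def target_projected_wind(u, v, lat_src, lon_src, lat_tgt, lon_tgt):
    """Component of the wind vector (u, v) along the bearing toward the target.

    The unit vector along bearing b is (sin b, cos b) in (east, north)
    coordinates, so the projection is a plain dot product.
    """
    b = initial_bearing(lat_src, lon_src, lat_tgt, lon_tgt)
    return u * math.sin(b) + v * math.cos(b)
```

Under these assumptions, a positive value means that the wind at the source point blows toward the target along the great-circle direction, and a negative value means that it blows away from it.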

The model is fitted using data from 1994 to 2013 and is assessed in a validation period from 2014 to 2019 using the Pearson correlation

Results of the local model as a function of the peak period of the three considered locations are shown in Fig.

Relationship between the target-projected wind and the spread parameter at varying values of the angle difference

Mean target-projected wind for Gironde in the winter (DJF), spring (MAM), summer (JJA), and autumn (SON) over the period 2014–2019.

Target-projected wind at point located in (45.5

To take swells into account, a global predictor that describes wind conditions over the North Atlantic has to be considered. Wind data have two components, zonal and meridional. Each component, at each point in space and time, carries some information about the waves observed at the target point at a given date. However, using all of them as inputs to a statistical model is computationally challenging given the high dimensionality of the data, and it may lead to results that are hard to interpret because wind conditions at nearby locations and times are strongly correlated. This section defines the global predictor related to the spatiotemporal domain of the wave generation area.

Estimated travel time of waves (top panel) and the temporal width (bottom panel) using Eq. (

Results of cross-validation using two weather types: RMSE (green line) and classification accuracy (purple line) versus the logarithm of

Following

To reduce the dimension of the atmospheric variables and to create a more interpretable model, wind components at each grid point are projected into the bearing of the target point in a great circle path (Fig.

The parameter

Estimated global coefficients

Time series of

According to the dispersion relation, the group velocity of waves is expressed as
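For reference, in deep water the linear dispersion relation gives a group velocity of c_g = gT/(4π), where T is the wave period. The sketch below uses this relation to compute a physics-based travel time along a great-circle path. It is only an order-of-magnitude illustration: the deep-water assumption, the spherical Earth radius, and all function names are assumptions introduced here, and the study estimates travel times from data rather than from this formula.

```python
import math

G = 9.81                 # gravitational acceleration (m s^-2)
EARTH_RADIUS_M = 6.371e6  # mean Earth radius (m), spherical approximation

def deep_water_group_velocity(period_s):
    """Group velocity c_g = g T / (4 pi) of deep-water linear waves (m s^-1)."""
    return G * period_s / (4.0 * math.pi)

def great_circle_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula (m)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def travel_time_hours(distance_m, period_s):
    """Time for wave energy of a given period to cover a distance at c_g (hours)."""
    return distance_m / deep_water_group_velocity(period_s) / 3600.0
```

For example, a swell with a 10 s period has a group velocity of about 7.8 m/s under this relation, so it needs roughly a day and a half to cover 1000 km.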

At each location

Mean of

RMSE versus the number of WTs for the validation period.

The parameters

Figure

Regions located 35

After defining the predictors, this section presents the statistical downscaling model. Firstly, the linear model that combines the local and the global predictor is considered:

Model (

Using the global predictor to construct weather types leads to clusters that only account for the global atmospheric circulation and not for the local environment (not shown). This subsection describes a regression-guided clustering method that considers both the global predictor and the predictand.
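One possible reading of a clustering guided by the regression is a clusterwise scheme that alternates between fitting a ridge regression in each cluster and reassigning each sample to the cluster whose model predicts it best. The sketch below illustrates that idea only; it is not necessarily the algorithm used in the study, and the function names, the random initialization, and the default parameter values are assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta

def clusterwise_ridge(X, y, n_types=2, lam=1.0, n_iter=20, seed=0):
    """Alternate between per-cluster ridge fits and residual-based reassignment."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_types, size=len(y))  # random initial weather types
    for _ in range(n_iter):
        models = []
        for k in range(n_types):
            mask = labels == k
            if mask.sum() < 2:
                # Fallback for a (nearly) empty cluster: refit on all samples.
                mask = np.ones(len(y), dtype=bool)
            models.append(ridge_fit(X[mask], y[mask], lam))
        # Squared residual of every sample under every cluster model.
        resid = np.stack([(y - (X @ b + a)) ** 2 for b, a in models], axis=1)
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels, models
```

The result of such a scheme depends on the initialization, so several random starts may be needed. Because the labels are assigned using the observed predictand, applying the fitted models to new dates would also require a rule for assigning weather types from the predictors alone, which is outside this sketch.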

After estimating the coefficients, the contribution of a source point

We expect swell systems coming from contributions from distant areas, whereas wind sea will be associated with local contributions. A natural question that arises is whether we can identify these wave systems by using

The statistical downscaling model described in the previous section has

The most usual approach to choosing the regularization parameter

Mean of

Monthly and annual (December–January–February) frequency of occurrence of WTs in the calibration period. The continuous black line corresponds to the mean annual winter (DJF) time series of the NAO (North Atlantic Oscillation) index, and the horizontal black line marks an NAO index of zero: the NAO index is negative when the continuous black line lies below the horizontal line.

For brevity, we only present the weather type results for Gironde, as they were found to be consistent with those at the other two locations.
Figure

Contingency table of

Figure

Observed versus predicted values of

Time series of observed and predicted values of

This section presents the results of the methodology. As in the previous section, the weather type results were found to be consistent across the three studied locations; therefore, only the results for Gironde are displayed. The overall results of the methodology are then provided for all three locations (see Fig.

The clusters obtained in the last section seem to be interpretable and correspond to sea-state classes obtained from the energy partitioning algorithm provided by HOMERE

Left panel: histogram of observed versus predicted

Figure

Figure

The monthly variability of WTs is shown in the left panel of Fig.

The results of the model described in Eq. (

This study proposes a method that describes the spatiotemporal relationship between wind and the significant wave height (

The statistical downscaling model combines the local and global predictors to predict

In this paper, we introduced a methodology based on observed weather types, constructed prior to the regression problem using a clustering algorithm. For future research, these weather types could be treated as latent variables within a mixture regression framework, which can be estimated using the expectation maximization (EM) algorithm. This approach would evaluate variables according to

The hindcast HOMERE data are available on their website:

Conceptualization: SO, VM, NR, and PA; methodology: SO, VM, NR, and PA; data curation: SO and NR; data visualization: SO; software: SO; supervision: VM, NR, and PA; and writing of the original draft: SO. All authors approved the final submitted paper.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors express their gratitude to the editor and anonymous reviewers for their valuable input and extensive feedback, which greatly enhanced the quality of the work.

This paper was edited by Xiaolan Wang and reviewed by three anonymous referees.