1. Introduction
Global climate change is profoundly destabilizing Earth's systems, with growing attention to the extreme weather events driven by warming trends. In the Western North Pacific, the translation speed of tropical cyclones has slowed since the 1980s, resulting in prolonged storm durations that exacerbate storm surge intensity and increase coastal vulnerability [1]. The anticipated intensification of powerful typhoons is likely to compound the already elevated risk level in the South China Sea region [2]. Tropical cyclones not only trigger secondary disasters such as storm surges and flooding but also have profound impacts on the ecological environment and socio-economic development of coastal areas [3]. For instance, in the fall of 2024, consecutive typhoons—including the destructive Typhoon No. 11 (Yagi)—severely affected Hainan Province, resulting in major losses of life and property [4]. Against this backdrop, there is heightened concern regarding the destructive potential of extreme sea conditions, particularly their impact on coastal regions during severe weather events.
Wave forecasting is a crucial reference for marine engineering, offshore activities, and coastal infrastructure design [5]. Under severe weather conditions, extreme wave heights can pose serious risks to the safety of marine facilities and lead to significant loss of life and property in coastal regions [6]. While most existing research has concentrated on the prediction of significant wave height (SWH), limited attention has been paid to predicting maximum wave height (MWH). This is because statistically defined extreme events—such as waves exceeding specified thresholds—occur infrequently in time-series data, reducing their influence on conventional accuracy metrics [7]. However, MWH refers to the highest single wave observed within a specific period, rather than a probabilistic extreme, and it often coincides with the most hazardous sea states during tropical cyclones. Accurate prediction of MWH is therefore vital for improving marine safety and disaster resilience. In regions like Hainan Province along the South China Sea, strengthening MWH monitoring and early warning systems is key to reducing tropical cyclone–related impacts and enhancing disaster preparedness.
Currently, wave height forecasting primarily relies on Numerical Weather Prediction (NWP) models, particularly the widely used WAVEWATCH III (WW3) and Simulating WAves Nearshore (SWAN) models [8,9,10]. Umesh and Behera [11] improved wave height forecasting in the nearshore waters of Eastern India by using a nested SWAN–Simulating WAves till SHore (SWASH) model and optimizing grid resolution through sensitivity analysis, thereby enhancing the accuracy of nearshore predictions. Vijayan et al. [12] improved the accuracy of hurricane wave modeling in the Gulf of Mexico by dynamically coupling the SWAN and ADvanced CIRCulation (ADCIRC) models, and validated their reliability during Category 5 Hurricane Michael. Cicon et al. [13] conducted probabilistic forecasting of rogue waves in the Northeast Pacific using the WW3 model and confirmed that the crest–trough correlation r—a spectral shape parameter related to wave bandwidth—exhibited the highest univariate correlation with rogue wave probability, outperforming conventional metrics such as the Benjamin–Feir Index (BFI).
Although NWP models can provide global wave forecasts, they are typically limited to specific grid points at low resolution, making them inadequate for localized, short-term, fine-scale predictions [14,15]. In recent years, machine learning models have gained attention for their outstanding performance in short-term predictions, offering a new approach to wave forecasting [16,17,18]. Savitha et al. [19] applied sequential learning algorithms, namely the Minimal Resource Allocation Network (MRAN) and the Growing and Pruning Radial Basis Function (GAP-RBF) network, to forecast daily wave heights, evaluating model performance at stations with different terrain. The results showed that these models outperformed traditional methods in both generalization ability and prediction accuracy. Gracia et al. [20] proposed an integrated model combining a Multilayer Perceptron (MLP) and a Gradient Boosting Decision Tree (GBDT) and validated its effectiveness at a buoy station in a Spanish estuary port. The results showed that combining machine learning with numerical models significantly improved prediction accuracy. Afzal et al. [21] used a Support Vector Machine (SVM) to predict SWH and analyzed its seasonal variation using Generalized Extreme Value (GEV) theory. The study showed that the SVM model achieved a prediction accuracy of 99.80%, outperforming Linear Regression (LR) and Artificial Neural Networks (ANNs) in predicting SWH. However, conventional machine learning models capture only simple local features, struggle with complex data, and generalize poorly. In contrast, deep learning can automatically extract features, handle complex data, and exhibit stronger learning and generalization capabilities on large datasets [22].
With the rapid development of deep learning technology, its potential in wave height forecasting has gradually emerged. Jörges et al. [23] proposed a model based on Long Short-Term Memory (LSTM) networks for both short-term and long-term SWH forecasting in coastal areas. The study showed that the LSTM model, after incorporating bathymetric data, significantly outperformed a Deep Feedforward Neural Network (FNN). Elbisy and Elbisy [24] explored the combination of ANNs and Multivariate Additive Regression Trees (MARTs), proposing several improved models; the MART model excelled in both accuracy and efficiency. Wei and Davison [25] proposed a model based on Convolutional Neural Networks (CNNs) to forecast nearshore waves and fluid dynamics, validating its high accuracy in complex wave propagation and circulation prediction. Gao et al. [26] developed a Convolutional Long Short-Term Memory (ConvLSTM) model to predict SWH, mean period, and mean wavelength in the Northwest Pacific wave field, with computational efficiency hundreds of times higher than that of traditional numerical models. Chen and Huang [27] proposed a model based on Convolutional Gated Recurrent Units (CGRU), which effectively extracts spatiotemporal features from X-band ocean radar backscatter image sequences to estimate SWH. Experimental results demonstrated that the CGRU-based model significantly outperformed methods based on Signal-to-Noise Ratio (SNR) and CNN on rainy-day image sequences: the Root Mean Square Deviation (RMSD) was reduced from nearly 0.90 m to 0.54 m, while the underestimation issue was substantially alleviated.
While deep learning algorithms continue to evolve, their effectiveness often depends on the availability of high-quality observational data. In this context, buoy data serve as a vital input source for improving model accuracy, particularly under complex and dynamic ocean conditions. Chen and Wang [28] predicted typhoon wave heights along the Taiwan coast using a Support Vector Regression (SVR) model and showed that incorporating nearby buoy data significantly improved prediction accuracy. Dogan et al. [29] developed a model based on Bidirectional Recurrent Neural Networks (Bi-RNN) and LSTM, achieving high-precision predictions by utilizing buoy-observed wave parameters. Wang and Ying [30] developed a hybrid LSTM–Gated Recurrent Unit (GRU)–Kernel Density Estimation (KDE) model that integrates multiple feature data to predict wave heights and outperforms traditional methods in multi-step prediction. Minuzzi and Farina [31] proposed an LSTM model combining ECMWF Reanalysis 5th Generation (ERA5) reanalysis data and buoy data for short-term real-time wave height forecasting, demonstrating strong applicability. Breunung and Balachandran [32] trained a neural network on buoy-recorded data to predict anomalous wave heights in real time, reducing the uncertainty associated with traditional theories' dependence on the causes of anomalous waves.
The accuracy of wave height prediction models is strongly influenced by the quality of input data, particularly buoy observations, which offer direct and high-frequency measurements [29,33,34,35]. As deep learning techniques continue to evolve, a growing body of research has explored their integration with buoy data, demonstrating significant potential in operational wave forecasting [36]. As shown in Table 1, existing studies differ considerably in terms of data sources and prediction targets, revealing heterogeneous modeling strategies. Nevertheless, several research gaps remain. First, although SWH has been extensively studied, MWH—which often better reflects the potential risk in marine environments—has received comparatively less attention [37]. Second, although reanalysis products such as ERA5 provide spatially comprehensive datasets, their inherent latency limits real-time applicability [38]. Meanwhile, buoy deployment remains sparse in complex and nearshore sea areas, which restricts the prediction accuracy of models trained on limited observational data [39]. Third, techniques such as the Time Distortion Index (TDI) and Dynamic Time Warping (DTW) have been introduced in some studies for time-series model evaluation [40], and DTW has been applied to improve multi-step SWH forecasting [41]. However, the use of DTW in MWH prediction remains uncommon, limiting the ability of existing models to quantify prediction delays and time-alignment performance. Addressing these challenges requires not only algorithmic improvements but also advances in data integration, sensor coverage, and evaluation frameworks.
To address the above problems, this study establishes a comprehensive research framework (Figure 1) encompassing data collection, feature selection, model development, and performance evaluation. Hourly observational data collected from a moored buoy deployed in the Qiongzhou Strait are used to compensate for nearshore data gaps. LightGBM is employed to identify and select key features, thereby improving the relevance of inputs for MWH prediction. Building on these inputs, two improved models combining Temporal Convolutional Networks (TCN) and Bidirectional Gated Recurrent Units (BiGRU) (STCN-BiGRU and BiTCN-BiGRU) are proposed for static multi-step MWH prediction. In addition, the TDI is introduced—for the first time in this context—as an evaluation metric to better quantify the time-alignment capability of MWH prediction models. This integrated approach establishes a novel framework for improving both the accuracy and temporal consistency of MWH prediction, thereby providing valuable technical support for disaster warning and coastal engineering design.
The remainder of this study is organized as follows: Section 2 introduces the data, focusing on the study area, data sources, and data preprocessing; Section 3 describes the methods and models in detail, including the introduction of the TDI, and explains the experimental design and related settings; Section 4 presents the experimental results, validating model performance under normal weather conditions—specifically, a test period during which no tropical cyclones were recorded in the South China Sea; Section 5 summarizes the study, discusses its limitations, and suggests future improvements and applications.
3. Method and Framework
This section outlines the proposed framework for MWH prediction, which integrates a feature selection stage with deep learning-based temporal modeling. The key components, including model architecture, activation functions, and optimization strategies, are explained in detail to illustrate the modeling pipeline.
3.1. LightGBM for Feature Selection
In this study, LightGBM was used to analyze the contribution of each feature in the moored buoy observational data to the prediction of MWH, with the aim of identifying the most critical features and optimizing model performance. LightGBM, an efficient GBDT algorithm, optimizes based on an additive model, adding a new learner in each iteration [43]. The iterative update formula for LightGBM is as follows:

$$\hat{y}_i^{(m)} = \hat{y}_i^{(m-1)} + \eta f_m(x_i)$$

In the LightGBM framework, the model prediction after the m-th iteration is denoted as $\hat{y}_i^{(m)}$, where $\eta$ represents the learning rate and $f_m$ is the decision tree constructed during that iteration. At each boosting step, LightGBM leverages both the first-order gradient and the second-order derivative (Hessian) of the loss function to determine the optimal tree-splitting strategy, thereby accelerating convergence and improving model robustness.
Owing to its high computational efficiency, low memory footprint, and scalability to large datasets, LightGBM has become a widely adopted algorithm in various machine learning tasks. Compared with conventional GBDT implementations, LightGBM achieves superior performance through optimizations such as histogram-based learning and leaf-wise tree growth. In this study, LightGBM was chosen as the primary tool for feature selection and dimensionality reduction due to its proven effectiveness in identifying key predictors relevant to MWH forecasting.
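To make this step concrete, the following is a minimal sketch of gain-based feature ranking with LightGBM. The DataFrame, column names, synthetic data, and hyperparameters are illustrative stand-ins, not the study's actual buoy dataset or configuration.

```python
# Minimal sketch: ranking buoy features by LightGBM gain importance.
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-in for hourly moored-buoy observations (names illustrative).
buoy_df = pd.DataFrame({
    "swh": rng.gamma(2.0, 0.5, n),           # significant wave height (m)
    "wind_speed": rng.gamma(3.0, 2.0, n),    # wind speed (m/s)
    "wave_period": rng.normal(6.0, 1.0, n),  # mean wave period (s)
})
# Toy target: MWH loosely proportional to SWH plus noise.
buoy_df["mwh"] = 1.6 * buoy_df["swh"] + 0.02 * buoy_df["wind_speed"] \
    + rng.normal(0.0, 0.1, n)

feature_cols = [c for c in buoy_df.columns if c != "mwh"]
model = lgb.LGBMRegressor(
    n_estimators=500,
    learning_rate=0.05,  # the eta in the boosting update above
    num_leaves=31,       # leaf-wise growth, LightGBM's default strategy
)
model.fit(buoy_df[feature_cols], buoy_df["mwh"])

# Gain-based importance: total loss reduction attributed to each feature.
importance = pd.Series(
    model.booster_.feature_importance(importance_type="gain"),
    index=feature_cols,
).sort_values(ascending=False)
print(importance)  # top-ranked features would be kept as model inputs
```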
3.2. BiGRU
This study chooses BiGRU as the core model for MWH prediction. Compared with traditional RNN and LSTM, BiGRU combines the simple structure of GRU with bidirectional propagation, enabling more efficient capture of temporal dependencies at lower computational cost. The RNN, one of the earliest recurrent neural network architectures, transmits information between time steps through its internal recurrent structure to capture dependencies in sequence data [44]. Its core formula is:

$$h_t = \sigma\left( W_h h_{t-1} + W_x x_t + b \right)$$

Here, $h_t$ represents the hidden state at the current time step, $h_{t-1}$ denotes the hidden state from the previous time step, and $x_t$ is the input at the current time step. $W_h$, $W_x$, and $b$ are the hidden-state weight matrix, the input-to-hidden weight matrix, and the bias term, respectively. $\sigma$ is the activation function (typically Sigmoid).
However, RNNs suffer from vanishing or exploding gradients when processing long time-series data, limiting their ability to model long-term dependencies. To address this, LSTM introduces a gating mechanism, comprising a forget gate, an input gate, and an output gate, which significantly alleviates the vanishing gradient problem and allows LSTM to excel at modeling long sequences [45]. However, the complex structure of LSTM results in higher computational costs. In contrast, GRU simplifies the model structure by merging the input and forget gates while retaining the reset and update gates, reducing computational cost [46]. Accordingly, GRU consists of three components: the update gate, the reset gate, and the candidate hidden state, with the core formulas given by:

$$z_t = \sigma\left( W_z h_{t-1} + U_z x_t + b_z \right)$$

$$r_t = \sigma\left( W_r h_{t-1} + U_r x_t + b_r \right)$$

$$\tilde{h}_t = \tanh\left( W_h \left( r_t \odot h_{t-1} \right) + U_h x_t + b_h \right)$$

$$h_t = \left( 1 - z_t \right) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

Here, $z_t$ represents the output of the update gate, $r_t$ the output of the reset gate, $\tilde{h}_t$ the candidate hidden state, $h_t$ the current hidden state, $h_{t-1}$ the previous hidden state, and $x_t$ the input at the current time step; $W$, $U$, and $b$ (with gate-specific subscripts) denote the hidden-state weight matrices, input weight matrices, and bias terms of the different gates, and ⊙ denotes element-wise multiplication.
BiGRU extends the GRU by incorporating a bidirectional propagation mechanism, which allows it to simultaneously process both forward and backward sequential data, thereby enhancing the model’s ability to capture temporal dependencies.
Figure 4 illustrates the structure of the BiGRU model: the BiGRU architecture is shown on the left, and a single GRU unit on the right. In the BiGRU model, forward and backward GRU units work together to process sequential data, significantly enhancing the model's temporal modeling capability. Compared with RNN and LSTM, BiGRU not only simplifies the structure but also improves the model's ability to handle complex prediction tasks by leveraging bidirectional sequential information. Overall, RNN is better suited to short-term sequence tasks, while LSTM handles long sequences effectively but at higher computational cost. In contrast, BiGRU, by modeling temporal data in both directions, improves prediction accuracy while reducing computational cost, making it well suited to the MWH prediction task in this study.
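For illustration, a BiGRU of this kind can be written in a few lines of PyTorch. The layer sizes, window length, and single-output head below are assumptions for the sketch, not the configuration used in this study.

```python
# Minimal BiGRU sketch in PyTorch (illustrative sizes).
import torch
import torch.nn as nn

class BiGRU(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64, horizon: int = 1):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True,
                          bidirectional=True)       # forward + backward passes
        self.head = nn.Linear(2 * hidden, horizon)  # concatenated directions

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.gru(x)              # out: (batch, time, 2 * hidden)
        return self.head(out[:, -1, :])   # predict from the last time step

x = torch.randn(8, 24, 6)                 # 8 samples, 24 h window, 6 features
print(BiGRU(n_features=6)(x).shape)       # -> torch.Size([8, 1])
```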
3.3. TCN
TCN is a time-series prediction model based on CNN, designed to address issues such as vanishing gradients and low computational efficiency encountered by traditional methods like RNN when handling long sequences. Compared with RNN, TCN replaces recurrent layers with causal and dilated convolutions, enabling parallel computation across time steps and thus improving training efficiency [47]. Dilated convolutions enlarge the receptive field, allowing the model to capture dependencies over long time spans, while causal convolutions ensure that the model relies only on past input information for prediction, thus preventing leakage of future data.
Figure 5 illustrates the TCN architecture. By combining these two convolution types, TCN captures temporal dependencies effectively while processing long sequences stably and without significant computational burden, mitigating the gradient vanishing and explosion problems common in RNN. Furthermore, TCN improves model performance through techniques such as Dropout, activation functions, and weight normalization: Dropout improves generalization by randomly dropping neurons; activation functions introduce nonlinear mappings to enhance representational power; and weight normalization standardizes layer weights, promoting stable weight updates during training and accelerating convergence.
Due to the limited sample size in this study, the model may suffer from reduced generalization ability, potentially affecting prediction stability and accuracy [48]. To mitigate this issue, the TCN architecture was enhanced by incorporating the Gaussian Error Linear Unit (GELU) activation function and Layer Normalization (Layer Norm), as shown in Figure 6. GELU offers a smooth activation profile that dynamically adjusts output based on input magnitude, combining the benefits of dropout and the Rectified Linear Unit (ReLU) [49]. Compared with ReLU, which outputs zero for negative inputs and passes positive values unchanged, GELU reduces sharp gradient fluctuations and is more sensitive to subtle input variations, thereby improving training stability [50]. To further enhance model robustness, Layer Norm was applied. Unlike Batch Normalization, which normalizes features across the batch dimension and relies on batch statistics, Layer Norm normalizes across the feature dimension for each data point independently [51,52]. This characteristic makes it more effective for small-batch training and sequential modeling tasks [53]. By stabilizing the feature distribution at each layer, Layer Norm facilitates faster convergence and alleviates internal covariate shift [54]. Although Weight Normalization can also improve convergence by reparameterizing the weight vectors, Layer Norm has demonstrated superior performance in small-sample scenarios with noisy or biased inputs [55].
In addition, mirroring the bidirectional design of BiGRU, this study arranges two identical TCNs in parallel, with one processing the sequence in reverse. By simultaneously handling forward and reverse temporal information, the model captures the dependencies in time-series data more comprehensively.
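The sketch below illustrates, under assumed layer sizes, a causal dilated convolution block with the GELU and Layer Norm modification described above, together with a bidirectional wrapper that runs an identical stack on the time-reversed sequence. The residual connection and channel-wise fusion are illustrative design choices, not necessarily those of Figure 6.

```python
# Sketch (assumed sizes): causal, dilated TCN block with GELU + Layer Norm,
# plus a bidirectional wrapper over forward and reversed sequences.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalBlock(nn.Module):
    def __init__(self, channels: int, kernel: int = 3, dilation: int = 1,
                 p_drop: float = 0.1):
        super().__init__()
        self.pad = (kernel - 1) * dilation        # left-pad only => causal
        self.conv = nn.Conv1d(channels, channels, kernel, dilation=dilation)
        self.norm = nn.LayerNorm(channels)        # per-step feature norm
        self.act = nn.GELU()
        self.drop = nn.Dropout(p_drop)

    def forward(self, x):                          # x: (batch, channels, time)
        h = self.conv(F.pad(x, (self.pad, 0)))     # no future leakage
        h = self.norm(h.transpose(1, 2)).transpose(1, 2)
        return x + self.drop(self.act(h))          # residual connection

class BiTCN(nn.Module):
    """Two identical TCN stacks; one sees the sequence reversed."""
    def __init__(self, channels: int, n_blocks: int = 3):
        super().__init__()
        def stack():
            # Dilations 1, 2, 4, ... enlarge the receptive field per block.
            return nn.Sequential(*[CausalBlock(channels, dilation=2 ** i)
                                   for i in range(n_blocks)])
        self.fwd, self.bwd = stack(), stack()

    def forward(self, x):                          # x: (batch, channels, time)
        rev = torch.flip(x, dims=[-1])
        return torch.cat([self.fwd(x),
                          torch.flip(self.bwd(rev), dims=[-1])], dim=1)

x = torch.randn(4, 16, 24)                         # 16 channels, 24 time steps
print(BiTCN(16)(x).shape)                          # -> torch.Size([4, 32, 24])
```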
3.4. TCN-BiGRU
To effectively capture the temporal dependencies of MWH, this study proposes two models combining TCN and BiGRU: STCN-BiGRU and BiTCN-BiGRU. The key distinction between these two models lies in the structural design of the TCN layer.
In the STCN-BiGRU model, the TCN layer uses a unidirectional structure to capture long-range dependencies in the time series. In contrast, the BiTCN-BiGRU model employs a parallel structure, with one TCN processing the forward data and the other handling the reverse data. This parallel setup enables the model to more comprehensively process temporal data and improve adaptability to diverse data patterns.
Both STCN-BiGRU and BiTCN-BiGRU models integrate a BiGRU layer following the TCN layer. As the core component of the model, BiGRU further processes and enhances the temporal features extracted from the TCN layer. BiGRU uses bidirectional information transmission, allowing it to simultaneously learn temporal dependencies from both forward and reverse sequences. In the MWH prediction task, the BiGRU layer leverages both forward and backward temporal information to enhance the model’s prediction accuracy and stability.
The role of TCN is to efficiently capture long-range dependencies in time series, while BiGRU further captures temporal information through bidirectional propagation [56,57,58,59,60]. The combination of the two complements each other's strengths, offering an efficient and accurate solution for MWH prediction tasks. To further enhance the model's generalization ability, a Dropout layer is incorporated to randomly discard neurons, preventing overfitting and improving the model's adaptability to unseen data. Finally, the output layer maps the extracted features to the prediction space: a fully connected layer performs a weighted sum of the features and, combined with a linear activation function, outputs the predicted MWH for lead times of 1, 3, and 6 h.
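A compact sketch of this composition is given below, corresponding to the STCN-BiGRU variant with a unidirectional TCN front end; swapping in the BiTCN wrapper above would give the BiTCN-BiGRU variant. It reuses the `CausalBlock` from the Section 3.3 sketch, and all sizes are illustrative rather than the tuned values.

```python
# Sketch of the STCN-BiGRU composition (illustrative sizes); reuses the
# CausalBlock from the Section 3.3 sketch as the unidirectional TCN stage.
import torch
import torch.nn as nn

class TCNBiGRU(nn.Module):
    def __init__(self, n_features: int, channels: int = 32,
                 hidden: int = 64, horizons: int = 3, p_drop: float = 0.2):
        super().__init__()
        self.inp = nn.Conv1d(n_features, channels, kernel_size=1)
        self.tcn = nn.Sequential(*[CausalBlock(channels, dilation=2 ** i)
                                   for i in range(3)])
        self.bigru = nn.GRU(channels, hidden, batch_first=True,
                            bidirectional=True)
        self.drop = nn.Dropout(p_drop)               # guards against overfitting
        self.head = nn.Linear(2 * hidden, horizons)  # linear activation

    def forward(self, x):                           # x: (batch, time, features)
        h = self.tcn(self.inp(x.transpose(1, 2)))   # TCN: long-range features
        out, _ = self.bigru(h.transpose(1, 2))      # BiGRU: bidirectional pass
        return self.head(self.drop(out[:, -1, :]))  # 1 h, 3 h, 6 h ahead

x = torch.randn(8, 24, 6)                           # 24 h windows, 6 features
print(TCNBiGRU(n_features=6)(x).shape)              # -> torch.Size([8, 3])
```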
During model optimization, hyperparameters are tuned via grid search. The key parameters and their ranges include the number of hidden layers, the number of units in each hidden layer, the batch size, the dropout rate, the number of convolutional kernels, the kernel size, and the dilation rate. These hyperparameters are selected to enhance the model's predictive performance and generalization ability. During training, the Adam optimizer is used to improve training efficiency, the Mean Squared Error (MSE) loss function is adopted, and an EarlyStopping strategy (patience = 5) is employed to prevent overfitting, ensuring the model's stability and generalization capability. Hyperparameter tuning and model training are performed on the training set, while the validation set is used to monitor generalization performance and guide the selection of the optimal configuration. The final model is then evaluated on the test set.
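This tuning procedure can be sketched as a plain grid-search loop with Adam, MSE loss, and patience-5 early stopping. The toy data, stand-in model, and grid values below are placeholders for the actual feature windows, TCN-BiGRU models, and search space described above.

```python
# Hedged sketch of grid search + Adam + MSE + early stopping (patience = 5).
import itertools
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in data and model so the loop runs end to end.
xs, ys = torch.randn(256, 24 * 6), torch.randn(256, 3)
train_loader = DataLoader(TensorDataset(xs[:200], ys[:200]), batch_size=32)
val_loader = DataLoader(TensorDataset(xs[200:], ys[200:]), batch_size=32)

def build_model(hidden: int, dropout: float) -> nn.Module:
    return nn.Sequential(nn.Linear(24 * 6, hidden), nn.GELU(),
                         nn.Dropout(dropout), nn.Linear(hidden, 3))

grid = {"hidden": [32, 64], "dropout": [0.1, 0.3], "lr": [1e-3, 1e-4]}
best_val, best_cfg = float("inf"), None

for hidden, dropout, lr in itertools.product(*grid.values()):
    model = build_model(hidden, dropout)
    opt = torch.optim.Adam(model.parameters(), lr=lr)   # Adam optimizer
    loss_fn = nn.MSELoss()                              # MSE loss
    patience, bad_epochs, run_best = 5, 0, float("inf")

    for epoch in range(100):
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()

        model.eval()                                    # validation pass
        with torch.no_grad():
            val = sum(loss_fn(model(xb), yb).item()
                      for xb, yb in val_loader) / len(val_loader)

        if val < run_best:                              # improvement resets patience
            run_best, bad_epochs = val, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                  # early stopping
                break

    if run_best < best_val:
        best_val, best_cfg = run_best, dict(hidden=hidden, dropout=dropout, lr=lr)

print("best config:", best_cfg, "val MSE:", round(best_val, 4))
```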
3.5. Model Evaluation Metrics
3.5.1. Traditional Evaluation Metrics
This study employs the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and coefficient of determination ($R^2$) as evaluation metrics to assess the model's performance in predicting MWH. The specific formulas are as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

$$\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 }$$

$$R^2 = 1 - \frac{ \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 }{ \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2 }$$

Here, $y_i$ represents the actual value for the i-th data point, $\hat{y}_i$ is the corresponding predicted value, $\bar{y}$ is the mean of the actual values, and $n$ is the total number of data points.

Both MAE and RMSE measure the difference between the predicted and actual values of the MWH; smaller values indicate better prediction performance. MAE averages the absolute differences between predicted and actual values, directly reflecting the model's accuracy. RMSE is particularly sensitive to large errors and thus reflects the model's ability to handle extreme values and its overall stability. $R^2$ indirectly measures the model's ability to reduce the sum of squared residuals: a higher $R^2$ means the model better captures the variation in the target variable, with values approaching 1 indicating a good fit and values approaching 0 a poor fit.
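For reference, the three metrics follow directly from their definitions; the helper below is an illustrative implementation.

```python
# Illustrative helper computing the three metrics from their definitions.
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                        # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))                 # penalizes large errors
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": float(mae), "RMSE": float(rmse), "R2": float(r2)}

print(evaluate(np.array([1.2, 2.5, 3.1]), np.array([1.0, 2.7, 3.0])))
```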
3.5.2. Time Distortion Index
In this study, in addition to traditional evaluation metrics such as MAE, RMSE, and $R^2$, the TDI is introduced for a more comprehensive assessment of the MWH model's performance. While these traditional error metrics are widely used in time-series forecasting, they do not effectively capture prediction delay, i.e., the temporal misalignment between predicted and actual values: low error metrics may still be accompanied by temporal lags. To address this issue, the TDI quantifies the extent of deformation along the time axis, offering a more thorough evaluation.
The calculation of TDI is based on the DTW algorithm, which aligns time series non-linearly and computes the minimal alignment distance [61]. However, DTW only reflects the degree of alignment between sequences and does not directly quantify time distortion [62,63]. TDI addresses this by analyzing the DTW alignment path, evaluating changes in time steps, and thus quantifying the extent of time distortion [64]. The calculation of TDI involves the following steps: (1) compute the DTW alignment path: obtain the optimal alignment path between the two time series and the corresponding changes in time steps; (2) quantify time distortion: analyze the time-step variations along the alignment path and accumulate the degree of time distortion; (3) standardize: normalize the TDI to ensure comparability across datasets.
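The sketch below implements these three steps with a textbook DTW recursion; the normalization in the final step is one plausible choice (mean absolute index offset divided by series length) and may differ from the exact formulation in [64].

```python
# Hedged sketch of a TDI-style computation from the DTW alignment path.
import numpy as np

def dtw_path(a: np.ndarray, b: np.ndarray):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):                # (1) cumulative DTW cost matrix
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, (i, j) = [], (n, m)                # backtrack the optimal path
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = [(i - 1, j - 1), (i - 1, j), (i, j - 1)][step]
    return path[::-1]

def tdi(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    path = dtw_path(y_true, y_pred)
    # (2) quantify distortion as index offsets; (3) normalize by series length.
    return float(np.mean([abs(i - j) for i, j in path]) / len(y_true))

t = np.linspace(0, 4 * np.pi, 100)
print(tdi(np.sin(t), np.sin(t - 0.5)))       # lagged copy -> positive TDI
```

A perfectly time-aligned prediction yields a TDI of zero, while systematic lags push the alignment path away from the diagonal and increase the index.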
TDI thus provides an effective means of quantifying prediction delays, complementing traditional error metrics such as MAE and RMSE, which focus solely on magnitude discrepancies. It is particularly valuable in time-series forecasting tasks where temporal alignment between predictions and observations is critical. In the context of this study, TDI offers a rigorous basis for evaluating the timing accuracy of MWH forecasts, enabling a more comprehensive assessment of model performance.