Multi-Horizon Significant Wave Height Forecasting with Multiscale Decomposition and Topological Feature Selection

Liu, Zeping; Shi, Guoyou; Lv, Mina; Wu, Tao; Wang, Xinjian

doi:10.3390/jmse14121095


by Zeping Liu ¹,², Guoyou Shi ¹,²,*, Mina Lv ¹,², Tao Wu ¹,² and Xinjian Wang

¹ Navigation College, Dalian Maritime University, Dalian 116026, China
² Key Laboratory of Navigation Safety Guarantee of Liaoning Province, Navigation College, Dalian Maritime University, Dalian 116026, China
* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(12), 1095; https://doi.org/10.3390/jmse14121095 (registering DOI)

Submission received: 14 May 2026 / Revised: 6 June 2026 / Accepted: 10 June 2026 / Published: 13 June 2026

(This article belongs to the Section Ocean Engineering)


Abstract

Accurate multi-horizon Significant Wave Height (SWH) forecasting is vital for offshore safety and efficiency. Beyond scheduling maintenance windows, reliable lead-time predictions provide critical early warnings to protect personnel and high-value assets from hazardous high-wave conditions. However, the non-stationary and multi-scale nature of sea states poses challenges for consistent long-term accuracy. To address this challenge, we propose a robust three-stage framework for decomposition, feature selection, and multi-horizon forecasting. Specifically, Optimal Variational Mode Decomposition (OVMD) is adopted to construct multiscale and multi-view representations of nonlinear SWH sequences, while a Triangulated Maximally Filtered Graph (TMFG) constructs a sparse dependency network to select informative and non-redundant predictors from decomposed components and environmental variables. A hybrid prediction model then combines a Temporal Convolutional Network (TCN) for local multi-scale patterns with a Bidirectional Gated Recurrent Unit (BiGRU) for long-range dependencies. Experiments on real-world buoy observations show that the proposed approach improves accuracy and robustness over commonly used statistical and deep-learning baselines across short-, medium-, and long-term horizons. Ablation studies confirm that integrating modal decomposition with sparse feature selection enhances model robustness, offering reliable decision support for offshore window planning and high-wave condition monitoring.

Keywords: significant wave height; multi-horizon forecasting; multiscale decomposition; hybrid deep learning

1. Introduction

As the global energy sector accelerates its transition toward sustainable development, the ocean has become a strategic frontier for renewable energy exploitation [1]. The proportion of renewable energy in global electricity generation is projected to increase from 30% in 2023 to 35% by 2025 [2]. According to the European Commission’s updated strategy [3], the European Union aims to deploy at least 1 GW of ocean energy by 2030 and reach 40 GW by 2050. Among various forms of marine energy, offshore wind power technology and markets are relatively mature, whereas wave energy is considered a highly promising emerging field due to its higher energy density and predictability [4,5]. The safe and cost-effective operation of marine energy facilities relies heavily on accurate wave forecasting. In particular, installation and maintenance activities are constrained by sea-state windows, where decisions such as go/no-go scheduling require reliable predictions at different lead times. Since ocean waves represent a dominant component of environmental loading, developing wave-forecasting technologies with high accuracy and robustness is essential for reducing operational risk and downtime, and for improving overall energy utilization and project economics [6,7,8,9].

Wave forecasting has traditionally depended on physics-based numerical models. Third-generation spectral models, including WAM [10], WaveWatch III [11], and SWAN [12], represent the standard physical forecasting systems for both open-ocean and nearshore environments [13,14]. Although coupled models are capable of reproducing wave evolution during extreme events such as hurricanes [15,16], these physics-based approaches exhibit notable limitations. Specifically, their substantial computational requirements and dependence on high-precision wind-field inputs constrain their effectiveness for rapid-response or high-resolution site-specific forecasting.

In contrast, data-driven methods learn patterns directly from historical observations [17]. Early statistical models, such as AR [18] and ARIMA [19], improved computational efficiency but were limited by assumptions of stationarity and linearity. To capture the inherent nonlinearity of ocean waves, traditional machine learning techniques were introduced. Approaches utilizing Support Vector Machines (SVM) [20], Artificial Neural Networks (ANN) [21], and ensemble learning [22] have demonstrated superior accuracy compared to statistical baselines. However, these traditional machine learning methods typically require complex manual feature engineering and often struggle to capture long-term temporal dependencies in complex wave sequences.

In recent years, a new generation of artificial intelligence technologies, represented by deep learning, has brought about a paradigm shift in wave forecasting through powerful automatic feature extraction and end-to-end learning capabilities. Fan et al. [23] utilized Long Short-Term Memory (LSTM) networks to predict significant wave heights across various marine environments, demonstrating that LSTM can achieve effective results under differing ocean conditions. Lou et al. [24] designed two wave height prediction models based on LSTM tailored for open sea and nearshore navigation conditions, both of which yielded satisfactory results. Li et al. [25] employed a Gated Recurrent Unit (GRU) network for 1-h and 3-h wave forecasts, with experiments showing that its performance surpassed benchmark models such as LSTM.

However, single models often struggle to capture the complex, multi-scale patterns inherent in wave data [26]. To address this, hybrid architectures integrating complementary deep learning techniques have become the state of the art. Zhang et al. [27] proposed a CNN-LSTM model that demonstrated superior long-term robustness compared to SVM and standalone LSTM baselines. Wang et al. [28] integrated LSTM and GRU with Kernel Density Estimation (KDE), effectively outperforming single models in both point and interval forecasting. Ahmed et al. [29] developed a CLSTM-BiGRU system, which was validated to exceed benchmark performance across multiple wave energy sites. Similar hybrid deep learning frameworks have also proven effective in diverse time-series domains, ranging from tidal [30] and wind power [31] to cryptocurrency forecasting [32]. In this context, integrating temporal convolutional networks (TCN) and Bidirectional GRU (BiGRU) offers a particularly promising solution. TCN excels at capturing local, multi-scale features via dilated convolutions [33], while BiGRU effectively models global dependencies by processing information bidirectionally [34]. This complementarity allows the hybrid model to capture both rapid fluctuations and long-term trends, yielding consistently strong performance from short- to long-horizon forecasts, and thus making it well suited for wave prediction.

Wave sequences, influenced by a combination of diverse forcing factors such as wind fields and tropical cyclones, exhibit inherent non-stationarity characterized by multi-scale quasi-periodic signals and sporadic extreme peaks [35,36]. This complexity impedes models from effectively distinguishing meaningful wave components from noise, which restricts the accuracy of medium- to long-term forecasting [37]. Signal decomposition has therefore evolved from a simple preprocessing step to a critical strategy for multiscale feature extraction and component decoupling. Zhou et al. [38] combined Empirical Mode Decomposition (EMD) with LSTM, achieving improved accuracy compared to standalone models in the Atlantic. Lou et al. [39] proposed an EMD-TCN framework and validated its effectiveness across eight buoy stations. Song et al. [40] constructed an EEMD-LSTM model tailored for deep-ocean environments, verifying its superiority over comparative models across 1- to 18-h forecast windows. Wang et al. [41] used the Improved Empirical Wavelet Transform (IEWT) to enhance LSTM performance across various horizons. More recently, Chen et al. [42] introduced a VMD-LSTM-TCN model, demonstrating that Variational Mode Decomposition (VMD) handles non-stationarity more effectively than EMD-based methods. Indeed, VMD is increasingly favored for its superior stability in diverse fields, such as wind speed [43], financial [44], power [45] and traffic forecasting [46]. However, the standard VMD algorithm relies on a manually preset number of modes K, which significantly affects decomposition quality. To reduce the sensitivity of VMD to a manually preset mode number, this study adopts an Optimal VMD (OVMD) strategy, in which the number of decomposed modes is determined on the training set according to reconstruction error and center-frequency separation. This strategy enables the construction of multiscale representations without using information from the testing set.

While incorporating multidimensional environmental variables enhances physical interpretability, it frequently results in the curse of dimensionality, which causes computational redundancy and reduced performance [47,48]. Consequently, feature selection is critical for optimizing model performance. Lu et al. [49] enhanced prediction efficiency by utilizing Pearson correlation to filter out weakly correlated variables. Li et al. [50] proposed a MIC-LSTM framework, demonstrating that the Maximal Information Coefficient (MIC) captures nonlinear dependencies more effectively than linear metrics. Similarly, Zhou et al. [51] optimized model inputs using Mutual Information and Spearman correlation. However, these traditional methods generally assess feature relevance independently or in pairs, and may therefore provide limited insight into the global dependency structure among input variables. To overcome this limitation, the Triangulated Maximally Filtered Graph (TMFG) is introduced. This graph-theoretic approach filters information by constructing a global dependency network. To the best of our knowledge, TMFG has not yet been applied to wave forecasting.

Despite recent progress, three limitations remain in decomposition-based SWH forecasting. First, most VMD/EMD-based models directly feed all decomposed components into temporal predictors, which may introduce redundant information and obscure the dependency structure among multiscale wave components and environmental variables. Second, commonly used feature selection methods mainly rely on pairwise relevance or linear projection, and therefore may fail to preserve global dependency relationships among predictors. Third, multi-horizon SWH forecasting requires simultaneous modeling of local short-term fluctuations and longer temporal dependencies, yet single temporal models often suffer from performance degradation as the forecast horizon increases. To address these issues, this study develops a framework integrating multiscale decomposition, feature selection, and forecasting. OVMD is adopted to construct multiscale representations of nonlinear SWH sequences from multiple frequency views, TMFG is introduced to select informative predictors by exploiting topological relationships among decomposed SWH components and environmental variables, and a TCN-BiGRU predictor is designed to capture local fluctuations and long-range dependencies within historical input windows. The main contributions of this study are summarized as follows:

OVMD decomposes SWH into intrinsic modes to capture both macroscopic trends and microscopic details, thereby constructing multi-view features.
A TMFG-based topological feature selection strategy is introduced to identify informative and non-redundant predictors while preserving global dependency structures among candidate variables.
A cascaded TCN-BiGRU predictor is designed to model local temporal fluctuations and contextual dependencies within the selected feature sequence, improving forecasting robustness across short-, medium-, and long-horizon SWH prediction.

This study evaluates the overall performance of the proposed model on multiple real-world buoy station datasets and compares it with mainstream benchmark models to verify its advantages in prediction accuracy, generalization ability, and robustness. The remainder of this paper is organized as follows: Section 2 elaborates on the architectural design and specific implementation process of the proposed OVMD-TMFG-TCN-BiGRU model. Section 3 introduces the experimental design, including the datasets used and evaluation metrics. Section 4 presents and discusses the experimental results. Section 5 summarizes the full text and provides an outlook on future research directions. For clarity and readability, the main abbreviations and acronyms used throughout this paper are summarized in Table 1.

2. Methods

2.1. General Framework

The overall architecture of the proposed hybrid forecasting model is systematically illustrated in Figure 1. As shown in the flowchart, the framework operates through a hierarchical pipeline comprising Input, Feature Engineering, Prediction, and Evaluation stages. The specific workflow is executed as follows:

Input & Decomposition: The process begins with historical SWH observations and environmental feature sequences within a fixed input window. In the Feature Engineering block, OVMD is applied to the available historical SWH segment to construct multiscale representations. This step is crucial for multi-view feature construction. It isolates multi-scale Intrinsic Mode Functions (IMFs) from the raw signal, thereby constructing a comprehensive feature space that encompasses both macroscopic trends and microscopic frequency details.
Topological Feature Selection: After decomposition, the generated IMFs are integrated with environmental feature sequences. The TMFG algorithm is then employed during the Feature Engineering stage to address the high-dimensional feature space. This algorithm constructs a sparse dependency network and retains predictors directly connected to the target SWH node for the downstream model.
Cascaded Prediction: The topologically selected features are subsequently input into the Prediction block, which utilizes a cascaded TCN-BiGRU architecture. In this sequential design, the TCN layer first serves as a local feature extractor to capture high-frequency variations, which are then fed into the BiGRU layer to model long-term global dependencies. This hierarchical approach enables the model to learn progressively from local details to macroscopic trends.
Output & Evaluation: Finally, the model generates the predicted SWH at the specified forecasting horizon. In this study, the forecasting task is conducted separately for each lead time, including 1 h, 6 h, 12 h, 24 h, and 48 h. Therefore, for each buoy and each forecasting horizon, the model produces one predicted SWH sequence by sliding the input window through the testing period. The predicted sequence is then compared with the corresponding observed SWH sequence at the same lead time using RMSE, MAE, MAPE, R, and NSEC.

Figure 1. Flowchart of the OVMD-TMFG-TCN-BiGRU prediction model.

The proposed framework integrates decomposition, topological feature selection, and temporal prediction into a cohesive pipeline. In this architecture, OVMD reduces the non-stationarity of raw data, TMFG minimizes the computational burden arising from high-dimensional redundancy, and the cascaded TCN-BiGRU structure ensures deep extraction of temporal features. This synergistic design enables the model to maintain robust performance and generalization ability across forecasting horizons ranging from short- to long-term.

To further clarify the decomposition, feature selection, and forecasting workflow, the process of the proposed OVMD-TMFG-TCN-BiGRU framework is summarized in Algorithm 1. The OVMD mode number and the TMFG-selected feature subset are determined using only the training set and then fixed for validation and testing.

Algorithm 1 The Process Flow of OVMD-TMFG-TCN-BiGRU

Require: SWH sequence S, environmental feature sequences Z, training set D_train, validation set D_val, testing set D_test, candidate mode numbers K, forecasting horizons H = {1 h, 6 h, 12 h, 24 h, 48 h}
Ensure: Predicted SWH sequences Ŷ_h
// OVMD decomposition
1: K* = Select optimal mode number from K using D_train
2: U = OVMD(S, K*) using the fixed K*
3: X_cand = Concatenate(U, S, Z)
// TMFG-based feature selection
4: W = DependenceMatrix(X_cand from D_train)
5: G_TMFG = TMFG(W)
6: F* = Select predictors directly connected to the target SWH node
7: X_selected = SelectFeatures(X_cand, F*)
8: Fix K* and F* for validation and testing
// TCN-BiGRU forecasting
9: for each h in H do
10: X_{train,h} = ConstructSamples(X_selected from D_train, h)
11: M_h = Train TCN-BiGRU(X_{train,h})
12: X_{test,h} = ConstructSamples(X_selected from D_test, h)
13: Ŷ_h = M_h(X_{test,h})
14: end for
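
The ConstructSamples step of Algorithm 1 is not spelled out in the text. The following is a minimal sketch of one plausible reading, in which each sample is a fixed-length input window and its label is the SWH value `horizon` steps after the last time step in the window. The function name `construct_samples`, the argument names, and the array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def construct_samples(features, target, window, horizon):
    """Build sliding-window samples for one forecasting horizon.

    features: array of shape (T, F) with the selected predictors.
    target:   array of shape (T,) with the SWH series.
    Returns X of shape (n, window, F) and y of shape (n,), where y[i] is the
    target value `horizon` steps after the last time step of window i.
    (Hypothetical helper; names and layout are assumptions.)
    """
    features = np.asarray(features)
    target = np.asarray(target)
    n = len(target) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this window and horizon")
    X = np.stack([features[i:i + window] for i in range(n)])
    y = np.array([target[i + window - 1 + horizon] for i in range(n)])
    return X, y


# Toy usage: 10 hourly steps, 2 features, 3-step window, 2-step-ahead target.
demo_features = np.arange(20).reshape(10, 2)
demo_target = np.arange(10)
X_demo, y_demo = construct_samples(demo_features, demo_target, window=3, horizon=2)
```

Because every window ends at the prediction origin, no sample contains observations later than the time from which the forecast is issued.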

2.2. Optimal Variational Mode Decomposition (OVMD)

VMD is a method designed to decompose complex, nonlinear, and non-stationary signals into a series of Intrinsic Mode Functions (IMFs) [52]. Each mode oscillates around an adaptive central frequency, which facilitates tasks such as feature extraction, denoising, and prediction. The VMD decomposition can be formulated as the following constrained variational model:

\begin{cases} \min_{\{u_{k}\}, \{ω_{k}\}} \left\{ \sum_{k = 1}^{K} \left\| \partial_{t} \left[ \left( δ (t) + \frac{j}{π t} \right) * u_{k} (t) \right] e^{- j ω_{k} t} \right\|_{2}^{2} \right\} \\ \text{s.t.} \quad \sum_{k = 1}^{K} u_{k} = f \end{cases}

(1)

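where f denotes the original signal, u_{k} represents the k-th mode function, ω_{k} is its corresponding center frequency, δ (t) is the Dirac delta function, * denotes convolution, and j is the imaginary unit. Introducing a quadratic penalty term and a Lagrangian multiplier converts this constrained problem into an unconstrained one, whose augmented Lagrangian is expressed as follows: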

L (\{u_{k}\}, \{ω_{k}\}, λ) = α \sum_{k = 1}^{K} \left\| \partial_{t} \left[ \left( δ (t) + \frac{j}{π t} \right) * u_{k} (t) \right] e^{- j ω_{k} t} \right\|_{2}^{2} + \left\| f (t) - \sum_{k = 1}^{K} u_{k} (t) \right\|_{2}^{2} + \left\langle λ (t), f (t) - \sum_{k = 1}^{K} u_{k} (t) \right\rangle

(2)

where α represents the quadratic penalty parameter, while λ denotes the Lagrangian multiplier. Subsequently, to solve the aforementioned equation, the Alternating Direction Method of Multipliers (ADMM) is employed to obtain the mode components u_{k} and center frequencies ω_{k}:

{\hat{u}}_{k}^{n + 1} (ω) = \frac{\hat{f} (ω) - \sum_{i \neq k} {\hat{u}}_{i} (ω) + \hat{λ} (ω) / 2}{1 + 2 α {(ω - ω_{k})}^{2}}

(3)

ω_{k}^{n + 1} = \frac{\int_{0}^{\infty} ω | {\hat{u}}_{k} (ω) |^{2} d ω}{\int_{0}^{\infty} | {\hat{u}}_{k} (ω) |^{2} d ω}

(4)
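
The updates in Equations (3) and (4) can be illustrated with a short NumPy sketch. This is a simplified illustration rather than the procedure used in the study: it works on the one-sided spectrum of a real signal, omits the mirror extension found in common VMD implementations, and runs a fixed number of iterations instead of checking a convergence tolerance. The function name `vmd_sketch` and the default values of `alpha`, `tau`, and `n_iter` are assumptions.

```python
import numpy as np


def vmd_sketch(f, K, alpha=2000.0, tau=0.0, n_iter=200):
    """Simplified VMD iteration on the one-sided spectrum (illustrative only)."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    freqs = np.fft.rfftfreq(N)                  # cycles per sample, in [0, 0.5]
    f_hat = np.fft.rfft(f)
    u_hat = np.zeros((K, len(freqs)), dtype=complex)
    omega = np.linspace(0.0, 0.5, K + 2)[1:-1]  # evenly spread initial centers
    lam = np.zeros(len(freqs), dtype=complex)   # Lagrangian multiplier spectrum
    for _ in range(n_iter):
        for k in range(K):
            # Equation (3): Wiener-like filter applied to the residual spectrum.
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Equation (4): center frequency as the power-weighted mean frequency.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = (freqs * power).sum() / (power.sum() + 1e-12)
        # Dual ascent on the multiplier; with tau = 0 it stays at zero.
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
    modes = np.fft.irfft(u_hat, n=N, axis=1)
    return modes, omega


# Toy usage: two sinusoids at 0.05 and 0.2 cycles per sample.
t = np.arange(1000)
signal = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
modes, centers = vmd_sketch(signal, K=2)
recon_error = np.linalg.norm(signal - modes.sum(axis=0)) / np.linalg.norm(signal)
```

On this toy signal the two center frequencies settle near the two sinusoid frequencies, although the modes are not returned in frequency order.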

It can be observed from Equation (2) that the mode number K has a direct influence on the decomposition performance of VMD. An excessively small K may lead to insufficient decomposition and loss of important multiscale information, whereas an excessively large K may introduce redundant modes and increase the risk of over-decomposition. Therefore, this study adopts an optimal VMD strategy to determine a suitable mode number before constructing the multiscale feature set.

To avoid information leakage, the selection of K was performed only on the training set, while the validation and testing sets were not used in this process. Following the leakage-avoidance principle emphasized in decomposition-based forecasting studies, the decomposition procedure and parameter selection were restricted to the available historical data rather than the entire dataset. Candidate values of K were searched within a predefined range. For each candidate K, the decomposed IMFs were reconstructed to calculate the reconstruction error, and the center-frequency distribution of the decomposed modes was examined. The optimal K was selected as the smallest candidate value for which the reconstruction error no longer decreased substantially and the center frequencies remained stable and distinguishable. The reconstructed signal \hat{X} (t) can be expressed as follows:

\hat{X} (t) = \sum_{k = 1}^{K} u_{k} (t), \quad t = 1, \dots, L

(5)

When the signal is sufficiently decomposed, further increasing K does not lead to significant changes in the reconstruction error, while the center frequencies of the newly added frequency components tend to stabilize [53]. Therefore, careful consideration of both factors allows determination of an optimal K value.
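
The selection rule described above can be sketched as a small function that receives, for each candidate K, the reconstruction error and the center frequencies already computed on the training set. The text does not give numerical thresholds, so `err_drop_tol` and `min_freq_gap` below are hypothetical values chosen only for illustration, as are the function name and the example numbers.

```python
import numpy as np


def select_mode_number(candidates, recon_errors, center_freqs,
                       err_drop_tol=0.05, min_freq_gap=0.01):
    """Return the smallest K after which the error stops decreasing substantially.

    candidates:   increasing list of candidate K values.
    recon_errors: reconstruction error for each candidate (training set only).
    center_freqs: list of center-frequency arrays, one per candidate.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    for i, K in enumerate(candidates[:-1]):
        # Relative improvement obtained by moving to the next candidate.
        rel_drop = (recon_errors[i] - recon_errors[i + 1]) / max(recon_errors[i], 1e-12)
        # Modes count as distinguishable if neighbouring centers are far enough apart.
        gaps = np.diff(np.sort(np.asarray(center_freqs[i], dtype=float)))
        separated = gaps.size == 0 or gaps.min() >= min_freq_gap
        if rel_drop < err_drop_tol and separated:
            return K
    return candidates[-1]


# Made-up example values, for illustration only.
example_candidates = [2, 3, 4, 5]
example_errors = [0.30, 0.12, 0.115, 0.114]
example_freqs = [
    [0.02, 0.20],
    [0.02, 0.10, 0.25],
    [0.02, 0.10, 0.24, 0.26],
    [0.02, 0.10, 0.24, 0.25, 0.26],
]
best_K = select_mode_number(example_candidates, example_errors, example_freqs)
```

In this made-up example the error drops sharply from K = 2 to K = 3 and barely changes afterwards, so the rule returns K = 3.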

2.3. Triangulated Maximally Filtered Graph (TMFG)

The triangulated maximally filtered graph (TMFG) is a graph-based information filtering method that constructs a sparse dependency structure from high-dimensional feature relationships [54]. In this study, TMFG is used to identify direct dependency relationships between candidate predictors and the target SWH node. Compared with conventional feature selection methods based mainly on pairwise relevance, TMFG provides a sparse network representation that retains structurally important relationships among decomposed SWH components and environmental variables.

The construction process of TMFG follows an iterative greedy procedure. Initially, a dependency matrix is calculated among all candidate features, where each node represents a candidate predictor and each edge weight denotes the dependency strength between two variables. The initialization step involves selecting a clique with four nodes that has the highest total edge weight as the seed structure of the graph. Subsequently, at each iteration, the algorithm uses a gain function to evaluate the insertion of each remaining node into each existing triangular face. The node and triangular face with the maximum gain are selected. The selected node is then inserted into the current graph and connected to the three vertices of the selected triangle, as illustrated in Figure 2. This process continues until all candidate nodes have been embedded into the sparse triangulated graph. Specifically, the gain function can be expressed as follows:

S (v_{h}, t) = W (v_{h}, v_{a}) + W (v_{h}, v_{b}) + W (v_{h}, v_{c})

(6)

where t = \{v_{a}, v_{b}, v_{c}\} is a triangular face of the current graph, v_{h} is a candidate node, and W (v_{h}, v_{a}) denotes the edge weight between nodes v_{h} and v_{a}. Furthermore, to ensure that the algorithm selects the node with the maximum gain for insertion during each iteration, thereby optimizing the graph structure, the TMFG algorithm maintains a cache of the maximum gain value and the corresponding optimal node for each triangle. The maximum gain value, MaxGain, is expressed as:

\mathrm{MaxGain} = \left[ \max_{v \in \{v_{1}, \dots, v_{k}\}} S (v, t_{1}), \max_{v \in \{v_{1}, \dots, v_{k}\}} S (v, t_{2}), \dots, \max_{v \in \{v_{1}, \dots, v_{k}\}} S (v, t_{m}) \right]

(7)

where \{v_{1}, v_{2}, \dots, v_{k}\} denotes the set of remaining uninserted nodes, and \{t_{1}, t_{2}, \dots, t_{m}\} represents the set of all current triangles. The index corresponding to the maximum gain is given by:

\mathrm{BestVertex} = \left[ \arg \max_{v \in \{v_{1}, \dots, v_{k}\}} S (v, t_{1}), \arg \max_{v \in \{v_{1}, \dots, v_{k}\}} S (v, t_{2}), \dots, \arg \max_{v \in \{v_{1}, \dots, v_{k}\}} S (v, t_{m}) \right]

(8)

After the TMFG sparse graph is constructed, feature selection is performed according to the direct connectivity between each candidate predictor and the target SWH node. In the TMFG network, non-zero elements in the sparse inverse covariance matrix indicate direct conditional dependencies between variables, whereas zero elements indicate conditional independence. Therefore, the candidate predictors directly connected to the target SWH node are retained as the final input subset for the forecasting model. In this way, TMFG is not used merely as a visualization tool. Instead, it serves as a graph-based feature selector that preserves direct dependency relationships while removing redundant or weakly connected variables.
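
The greedy construction and the neighbor-based selection rule can be sketched as follows. This is an illustrative sketch under several assumptions: the dependency matrix is taken to be the absolute Pearson correlation (the dependence measure is not specified here), the four-node seed is found by exhaustive search, which is only practical for a small number of candidates, and gains are recomputed at every iteration instead of being cached as MaxGain and BestVertex. The function names `tmfg_edges` and `neighbors_of` are hypothetical.

```python
import itertools

import numpy as np


def tmfg_edges(W):
    """Greedy TMFG construction on a symmetric weight matrix W (p x p, p >= 4)."""
    p = W.shape[0]
    # Seed: the four-node clique with the highest total edge weight.
    seed = max(itertools.combinations(range(p), 4),
               key=lambda c: sum(W[a, b] for a, b in itertools.combinations(c, 2)))
    edges = set(itertools.combinations(seed, 2))
    faces = list(itertools.combinations(seed, 3))
    remaining = set(range(p)) - set(seed)
    while remaining:
        # Gain of Equation (6): summed weight between node v and the face vertices.
        v, face = max(((v, t) for v in remaining for t in faces),
                      key=lambda vt: W[vt[0], list(vt[1])].sum())
        a, b, c = face
        edges.update(tuple(sorted((v, x))) for x in face)
        # Inserting v splits the chosen face into three new triangular faces.
        faces.remove(face)
        faces.extend([(a, b, v), (a, c, v), (b, c, v)])
        remaining.remove(v)
    return edges


def neighbors_of(edges, target):
    """Indices of nodes directly connected to the target node."""
    return sorted({a if b == target else b for a, b in edges if target in (a, b)})


# Toy usage: column 0 plays the role of the target SWH series.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
W_demo = np.abs(np.corrcoef(X_demo, rowvar=False))
graph_edges = tmfg_edges(W_demo)
selected = neighbors_of(graph_edges, target=0)
```

A TMFG on p nodes always contains 3p - 6 edges, so the graph stays sparse regardless of how many candidate predictors are supplied, and the retained subset is simply the neighborhood of the target node.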

Based on the above selection rule, the novelty of using TMFG in this study lies in its graph-based feature-selection criterion. Conventional methods such as Pearson correlation, Mutual Information (MI), and Maximal Information Coefficient (MIC) generally rank candidate variables according to their individual relevance to the target SWH. Although MI and MIC can capture nonlinear dependence, these methods still mainly operate from a pairwise relevance perspective and do not explicitly model the dependency structure among candidate predictors. This is important for OVMD-based forecasting, because decomposed IMF components and environmental variables may contain redundant or overlapping information. In contrast, TMFG constructs a sparse dependency graph involving all candidate predictors and the target SWH node, and feature selection is performed according to direct topological connectivity with the target node. Thus, TMFG aims to preserve structurally informative predictors while removing redundant or weakly connected variables.

To avoid information leakage, the TMFG-based feature selection rule is fitted only on the training set, and the selected feature subset is then fixed for validation and testing.

2.4. Cascaded Temporal Forecasting Architecture

To address the temporal complexity inherent in wave height prediction, a hybrid deep learning model is constructed. This model integrates the strengths of the TCN and the BiGRU [55,56]. The TCN is effective at aggregating local short-term fluctuations, while the BiGRU captures long-term dependencies. By combining these two architectures, the approach seeks to leverage their complementary advantages to improve overall prediction accuracy. The hybrid model is trained using an end-to-end joint training paradigm, with the Adam optimizer employed for iterative parameter updates. Figure 3 presents the overall architecture of the model, which comprises a six-layer network structure. The specific functions and parameter settings for each layer are detailed as follows:

The first TCN layer: The first layer initially receives the pre-processed input and is constructed from TCN modules, the residual block structure of which is illustrated in Figure 3a. It operates by using local receptive fields to extract short- and mid-range temporal patterns from the input sequence. The dilated architecture increases the receptive field without altering the sequence length. This parallel and stable feature extractor addresses the long-term dependency problem commonly encountered in RNN-based models. In each residual block, the convolution kernel size is set to 3, with a dilation factor of 1, the number of convolution filters is set to 25, and the ReLU function is adopted as the activation.
The second TCN layer: This layer receives time-series feature maps of identical length produced by the first layer. Composed of TCN modules with distinct parameters and functions, it expands the receptive field and synthesizes compound patterns from the preceding features. In the residual block, the kernel size is set to 5, the dilation factor d is set to 2, the number of filters is set to 50, and the activation function is set to ReLU. This parameter configuration enables the second layer to capture intermediate- to long-term dependencies through a larger kernel and dilation factor. Stacking these two layers facilitates hierarchical feature extraction across different temporal scales.
The first BiGRU layer: The third layer processes high-level temporal features spanning extended time windows, which are produced by the second layer. As shown in Figure 3b, this layer consists of forward and backward GRU components that model the sequence in both temporal directions. The outputs are concatenated along the last dimension to generate the input for the subsequent layer. This design introduces bidirectional context into the convolutional features extracted by the TCN, thereby enhancing the model’s ability to capture long-term dependencies while preserving temporal length. In this configuration, a BiGRU layer with 32 hidden units is employed, utilizing tanh as the state nonlinearity and returning the entire sequence as output.
The second BiGRU layer: Building upon the bidirectional context incorporated by the third layer, the fourth layer performs further gated transformations and temporal aggregation to refine more abstract and stable temporal semantics. Its parameters and activation functions are identical to those of the third layer.
Feature Concatenation Layer: The fifth layer routes the original input directly after the fourth layer to preserve low-level feature information. Its schematic diagram is illustrated in Figure 3c. The purpose of this design is to provide supplementary information to subsequent layers, preventing the loss of critical low-level features within the deep network while simultaneously facilitating smoother gradient propagation. The output of this layer consists of the projected low-level features, concatenated along the feature dimension with the high-level temporal features from the fourth layer, serving as the input to the subsequent fully connected layer.
Dense Layer: The sixth layer serves as the output layer of the neural network, functioning to map the network’s feature representations to the final prediction results. Its structure, as depicted in Figure 3d, is composed of an input layer, a hidden layer, and an output layer, with the number of neurons determined by the input dimensions.

Figure 3. TCN-BiGRU network structure.

2.4.1. Temporal Convolutional Network (TCN)

The TCN is a fully convolutional one-dimensional architecture developed for sequence data, which incorporates the parallel computing capabilities of Convolutional Neural Networks (CNNs) into temporal modeling. In comparison to Recurrent Neural Networks (RNNs) and their variants, such as LSTM and GRU, TCN generally offers greater computational parallelism, more stable gradient propagation, and reduced memory consumption in long-sequence tasks. The core principle of TCN is to establish a sufficiently large receptive field using causal and dilated convolutions to capture long-range dependencies, while maintaining trainability and robustness through residual connections and regularization techniques.

In the temporal dimension, TCN enforces causality: the output y_t at any time step t depends solely on the current and historical inputs x₀, x₁, …, x_t, independent of future observations. In implementation, causal convolution is employed with appropriate zero-padding in the forward direction to prevent information leakage, thereby ensuring the model satisfies temporal constraints while maintaining an invariant sequence length. To effectively expand the receptive field without significantly increasing parameters and computational load, TCN introduces dilated convolutions with a dilation factor d in each convolutional layer, as shown in Figure 4. For a 1-D sequence x and a filter f : \{0, \dots, k - 1\} \to ℝ of length k, the dilated convolution operation F on sequence element s is defined as follows:

F (s) = (x *_{d} f) (s) = \sum_{i = 0}^{k - 1} f (i) \cdot x_{s - d \cdot i}

(9)

When d = 1, the operation degenerates into a standard convolution. By progressively increasing the dilation factors along the network depth, the TCN’s effective receptive field grows approximately exponentially with the number of layers, enabling coverage of long-term dependencies even at relatively shallow depths.
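
Equation (9) can be checked numerically with a direct NumPy transcription. The sketch below treats the sequence as zero before its first element, which is one common way to realize causal padding. The function name `dilated_causal_conv` is an illustrative assumption.

```python
import numpy as np


def dilated_causal_conv(x, f, d):
    """Evaluate F(s) = sum_i f[i] * x[s - d*i] for every position s.

    Values before the start of the sequence are taken as zero, so the
    output has the same length as the input and never looks ahead.
    """
    x = np.asarray(x, dtype=float)
    k = len(f)
    pad = d * (k - 1)
    xp = np.concatenate([np.zeros(pad), x])  # left padding only
    return np.array([
        sum(f[i] * xp[pad + s - d * i] for i in range(k))
        for s in range(len(x))
    ])


x_demo = [1, 2, 3, 4, 5]
y_standard = dilated_causal_conv(x_demo, f=[1, 1], d=1)  # x[s] + x[s-1]
y_dilated = dilated_causal_conv(x_demo, f=[1, 1], d=2)   # x[s] + x[s-2]
```

With d = 2 each output combines the current value with the value two steps earlier, so the same two-tap filter spans three time steps instead of two.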

To address vanishing and exploding gradient issues during deep network training and to improve optimization stability, the TCN utilizes residual blocks as its primary structural components. As shown in Figure 3a, each residual block consists of two layers of dilated causal convolutions, each followed by a nonlinear activation function. Regularization methods, including weight normalization and Dropout, are implemented after each convolution to further stabilize training. The block incorporates a residual connection by summing the input with the output, using a 1 × 1 convolution to match channel dimensions when required. This architecture supports deep representation learning and facilitates effective gradient propagation across layers, which enables the network to achieve robust convergence and generalization, even with large receptive fields.
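
As a worked example of how kernel size and dilation set the receptive field, the sketch below applies the standard formula for stacked dilated causal convolutions to the two TCN layers configured in Section 2.4 (kernel 3 with dilation 1, then kernel 5 with dilation 2). It assumes that both convolutions inside a residual block share that block's kernel size and dilation. The function name is hypothetical.

```python
def receptive_field(blocks, convs_per_block=2):
    """Receptive field (in time steps) of stacked dilated causal convolutions.

    blocks: list of (kernel_size, dilation) pairs, one per residual block.
    Each convolution widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel_size, dilation in blocks:
        rf += convs_per_block * (kernel_size - 1) * dilation
    return rf


first_block_rf = receptive_field([(3, 1)])           # 1 + 2*2*1
two_block_rf = receptive_field([(3, 1), (5, 2)])     # 1 + 2*2*1 + 2*4*2
```

Under these assumptions the first block alone covers 5 time steps and the two stacked blocks cover 21, which shows how the second layer's larger kernel and dilation extend the temporal context seen by the BiGRU layers.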

2.4.2. Bidirectional Gated Recurrent Unit (BiGRU)

The BiGRU is a sequence modeling architecture that enables bidirectional information flow by combining a forward GRU and a backward GRU. BiGRU captures forward and backward dependencies within the available historical input window. The backward GRU processes the same historical window in reverse order and does not access observations beyond the prediction origin. The complementary bidirectional mechanism substantially improves performance in sequence data modeling tasks, as shown in the structural diagram in Figure 3b.

The GRU unit serves as the core component of the BiGRU. In comparison to the LSTM, the GRU provides similar modeling capabilities while requiring fewer parameters due to its more efficient gating structure.

The GRU regulates information flow within a sequence by introducing two gating mechanisms: the Reset Gate and the Update Gate. This design effectively mitigates the vanishing gradient problem commonly observed in traditional RNNs. Figure 5 presents a schematic diagram of the GRU unit. The reset gate determines the extent to which information from the previous hidden state is discarded. The calculation of the reset gate is defined as follows:

r_{t} = σ (W_{r} \cdot [h_{t - 1}, x_{t}] + b_{r})

(10)

where

x_{t}

denotes the input at the current time step,

h_{t - 1}

represents the hidden state of the previous time step, and

[h_{t - 1}, x_{t}]

indicates the concatenation of these two vectors.

W_{r}

and

b_{r}

are the weight matrix and bias term for the reset gate, respectively, and

σ

refers to the Sigmoid activation function, which maps the output values to the interval between 0 and 1. The update gate controls the amount of information from the previous hidden state

h_{t - 1}

that can be directly transmitted to the current hidden state

h_{t}

. Its calculation formula is given by:

z_{t} = σ (W_{z} \cdot [h_{t - 1}, x_{t}] + b_{z})

(11)

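Under the convention of Equation (13), a value of the update gate closer to 0 indicates that a greater proportion of the information from the previous state is retained, whereas a value closer to 1 gives more weight to the candidate state. By integrating these two gates, the GRU computes the current candidate hidden state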

{\tilde{h}}_{t}

and the final hidden state

h_{t}

:

{\tilde{h}}_{t} = \tanh (W_{h} \cdot [r_{t} ⊙ h_{t - 1}, x_{t}] + b_{h})

(12)

h_{t} = (1 - z_{t}) ⊙ h_{t - 1} + z_{t} ⊙ {\tilde{h}}_{t}

(13)
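Equations (10)–(13) can be traced step by step with a small pure-Python sketch of a single GRU update. The helper names (`sigmoid`, `affine`, `gru_step`) and the parameter dictionary are hypothetical, and the all-zero weights serve only to make the arithmetic easy to follow.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def affine(W, v, b):
    """Matrix-vector product W v plus bias b, using plain lists."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]

def gru_step(x_t, h_prev, p):
    concat = h_prev + x_t                                     # [h_{t-1}, x_t]
    r = sigmoid(affine(p["W_r"], concat, p["b_r"]))           # Eq. (10), reset gate
    z = sigmoid(affine(p["W_z"], concat, p["b_z"]))           # Eq. (11), update gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t      # [r_t ⊙ h_{t-1}, x_t]
    h_cand = [math.tanh(a) for a in affine(p["W_h"], gated, p["b_h"])]  # Eq. (12)
    return [(1 - zi) * hi + zi * ci                           # Eq. (13)
            for zi, hi, ci in zip(z, h_prev, h_cand)]

# Hidden size 2, input size 1; all-zero parameters are an illustrative choice.
zeros_W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {"W_r": zeros_W, "b_r": [0.0, 0.0],
          "W_z": zeros_W, "b_z": [0.0, 0.0],
          "W_h": zeros_W, "b_h": [0.0, 0.0]}
h_new = gru_step([1.0], [0.4, -0.2], params)
```

With zero weights both gates equal 0.5 and the candidate state is zero, so the new hidden state is half of the previous one.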

The hidden state of the BiGRU can be expressed as:

\begin{cases} {\vec{h}}_{t} = GRU (x_{t}, {\vec{h}}_{t - 1}) \\ {\overset{\leftarrow}{h}}_{t} = GRU (x_{t}, {\overset{\leftarrow}{h}}_{t + 1}) \\ h_{t} = f (W_{h_{t}}^{\to} {\vec{h}}_{t} + W_{h_{t}}^{\leftarrow} {\overset{\leftarrow}{h}}_{t} + b_{t}) \end{cases}

(14)

where

{\vec{h}}_{t}

and

{\overset{\leftarrow}{h}}_{t}

denote the forward and backward hidden representations within the available historical input window, respectively.

W_{h_{t}}^{\to}

and

W_{h_{t}}^{\leftarrow}

represent the forward and backward weights of the hidden layer, respectively.
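The bidirectional pass over a fixed historical window can be sketched as follows. For brevity, `toy_cell` is a hypothetical stand-in for the GRU update (a fixed exponential smoother, not a trained GRU), the combination function f is assumed to be tanh, and the weights are scalars. The point of the sketch is that the backward pass traverses the same window in reverse and never reads values beyond the prediction origin.

```python
import math

def toy_cell(x_t, h_prev):
    # Stand-in for GRU(x_t, h_prev): a fixed exponential smoother, NOT a real GRU.
    return 0.5 * h_prev + 0.5 * x_t

def bidirectional_states(window, cell, w_fwd=1.0, w_bwd=1.0, bias=0.0):
    n = len(window)
    fwd, h = [], 0.0
    for x_t in window:                  # oldest -> newest
        h = cell(x_t, h)
        fwd.append(h)
    bwd, h = [0.0] * n, 0.0
    for t in range(n - 1, -1, -1):      # newest -> oldest, same window only
        h = cell(window[t], h)
        bwd[t] = h
    # Combine both directions per time step (f = tanh is an assumption).
    combined = [math.tanh(w_fwd * a + w_bwd * b + bias) for a, b in zip(fwd, bwd)]
    return fwd, bwd, combined

window = [1.0, 2.0, 3.0]                # toy historical window
fwd, bwd, combined = bidirectional_states(window, toy_cell)
```

Here the last window element is the first input of the backward pass, so `bwd[-1]` depends only on that element, while `fwd[-1]` summarizes the whole window.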

3. Data Description and Model Evaluation Criteria

3.1. Data Description

Buoys are the primary instruments for collecting wave data, operating by monitoring their motion in water through integrated high-precision sensors that measure parameters such as wave height, period, and direction. The model developed in this study utilizes buoy data from three stations near the Gulf of Mexico—42012, 42036, and 42055—with data sourced from the National Data Buoy Center (https://www.ndbc.noaa.gov). These datasets, labeled A, B, and C, cover the five-year period from 2020 to 2024 and include variables such as significant wave height (SWH), Wind Direction (WDIR), Wind Speed (WSPD), Gust Speed (GST), Water Temperature (WTMP), Average Wave Period (APD), Dominant Wave Period (DPD), Mean Wave Direction (MWD), Air Temperature (ATMP), Atmospheric Pressure (PRES), and Dew Point Temperature (DEWP). The buoy records were resampled at an hourly interval. The station coordinates are reported with a precision of 0.01 degrees. Table 2 presents the statistical characteristics of the data. Figure 6 displays the geographical locations of the stations and the statistical features of selected data from each station. The three stations differ in both geographical location and meteorological conditions: stations A and B are nearshore buoys, while station C is offshore. The significant differences in their characteristic data support the use of these stations for validating the model’s broad applicability and generalization capability.

3.2. Chronological Forecasting Protocol

The datasets were divided chronologically into training, validation, and testing subsets to reflect practical forecasting conditions. Specifically, the records from 2020 to 2022 were used for model training, the records from 2023 were used for validation and hyperparameter tuning, and the records from 2024 were used for final testing. No random shuffling was applied. The normalization parameters were calculated from the training set only and then applied to the validation and testing sets. The mode number of OVMD and the TMFG feature selection rule were determined using only the training set, and the selected settings were then fixed for validation and testing. For each forecast horizon, the input sample was constructed from the historical input window before the prediction origin, and the testing set was used only for final evaluation. The forecasting task was conducted separately for each lead time, including 1 h, 6 h, 12 h, 24 h, and 48 h. For each buoy dataset and each forecasting horizon, the historical input window was moved chronologically through the testing period to generate one predicted SWH sequence. The accuracy at each lead time was then evaluated by comparing this predicted sequence with the corresponding observed SWH sequence at the same forecasting horizon.
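The protocol above can be summarized in a short sketch: a split by calendar year, scaling parameters fitted on the training years only, and direct sample construction for a given lead time. The function names are hypothetical, and min-max scaling is an assumption, since the type of normalization is not specified here.

```python
def chronological_split(records):
    """records: list of (year, value) pairs already in time order; no shuffling."""
    train = [v for y, v in records if 2020 <= y <= 2022]
    val = [v for y, v in records if y == 2023]
    test = [v for y, v in records if y == 2024]
    return train, val, test

def fit_minmax(train):
    return min(train), max(train)       # parameters come from the training set only

def apply_minmax(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]

def make_samples(series, lookback, horizon):
    """Pair each historical window ending at the prediction origin with the value `horizon` steps ahead."""
    samples = []
    for origin in range(lookback - 1, len(series) - horizon):
        window = series[origin - lookback + 1 : origin + 1]
        samples.append((window, series[origin + horizon]))
    return samples

# Toy records (illustrative values, not buoy data).
records = [(2020, 1.0), (2021, 3.0), (2022, 2.0), (2023, 5.0), (2024, 0.5)]
train, val, test = chronological_split(records)
lo, hi = fit_minmax(train)
val_scaled = apply_minmax(val, lo, hi)
test_scaled = apply_minmax(test, lo, hi)

samples = make_samples([float(i) for i in range(10)], lookback=3, horizon=2)
```

Because the scaling parameters are fixed by the training years, validation and testing values may fall outside [0, 1], as they do in this toy case. This is the expected consequence of avoiding leakage.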

3.3. Evaluation Criteria

To comprehensively evaluate the forecasting performance of the proposed model, five statistical metrics are employed: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Correlation Coefficient (R), and Nash-Sutcliffe Efficiency coefficient (NSEC).

MAPE measures the relative deviation of predictions from observed values. In this study, MAPE is calculated in decimal form to assess the relative error proportion. MAE quantifies the average absolute difference between predicted and observed wave heights, serving as a robust metric for overall accuracy. RMSE, as the square root of the mean squared error, is sensitive to large errors and retains the same unit as the original data, which makes it particularly suitable for capturing the high volatility of wave data. R assesses the strength of the linear relationship between predicted and observed series. Values approaching 1 suggest that the model accurately captures the temporal trends of the waves. Finally, NSEC evaluates the model’s predictive performance relative to the observed variance. An NSEC value closer to 1 indicates higher model efficiency and reliability compared to using the mean of the observed data.

RMSE = \sqrt{\frac{1}{z} \sum_{j = 1}^{z} {(w_{j} - {\hat{w}}_{j})}^{2}}

(15)

MAE = \frac{1}{z} \sum_{j = 1}^{z} |w_{j} - {\hat{w}}_{j}|

(16)

MAPE = \frac{1}{z} \sum_{j = 1}^{z} |\frac{w_{j} - {\hat{w}}_{j}}{w_{j}}|

(17)

R = \frac{\sum_{j = 1}^{z} (w_{j} - w_{avg}) ({\hat{w}}_{j} - {\hat{w}}_{avg})}{\sqrt{[\sum_{j = 1}^{z} {(w_{j} - w_{avg})}^{2}] [\sum_{j = 1}^{z} {({\hat{w}}_{j} - {\hat{w}}_{avg})}^{2}]}}

(18)

NSEC = 1 - \frac{\sum_{j = 1}^{z} {(w_{j} - {\hat{w}}_{j})}^{2}}{\sum_{j = 1}^{z} {(w_{j} - w_{avg})}^{2}}

(19)

where

w_{j}

and

w_{avg}

refer to the measured values and their mean, while

{\hat{w}}_{j}

and

{\hat{w}}_{avg}

represent the predicted values and their mean.
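For reference, the five metrics can be computed as in the following sketch. The function name `evaluate` and the toy values are illustrative assumptions. MAPE is returned in decimal form, and NSEC is computed against the mean of the observations, in line with the prose description of the metric.

```python
import math

def evaluate(obs, pred):
    z = len(obs)
    err = [w - p for w, p in zip(obs, pred)]
    w_avg = sum(obs) / z
    p_avg = sum(pred) / z
    ss_obs = sum((w - w_avg) ** 2 for w in obs)
    ss_pred = sum((p - p_avg) ** 2 for p in pred)
    cov = sum((w - w_avg) * (p - p_avg) for w, p in zip(obs, pred))
    return {
        "RMSE": math.sqrt(sum(e * e for e in err) / z),
        "MAE": sum(abs(e) for e in err) / z,
        "MAPE": sum(abs(e / w) for e, w in zip(err, obs)) / z,   # requires w != 0
        "R": cov / math.sqrt(ss_obs * ss_pred),
        "NSEC": 1 - sum(e * e for e in err) / ss_obs,
    }

# Toy observed and predicted wave heights in metres (illustrative values only).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
scores = evaluate(obs, pred)
```

Because MAPE divides by the observed value, samples with SWH close to zero can dominate this metric, which is worth keeping in mind when comparing calm and energetic periods.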

4. Experimental Results

This section presents experimental studies conducted on three datasets (A, B, and C) with distinct characteristics. First, the modeling results of the individual components are analyzed, including the experimental parameter settings, the parameter selection and decomposition results of OVMD, and the feature selection results obtained using TMFG. Subsequently, to verify the effectiveness of the proposed OVMD-TMFG-TCN-BiGRU forecasting method, several representative models are selected for comparative analysis, namely EMD-TCN, EEMD-LSTM, TCN, BiGRU, SVM, ANN and Transformer. In addition, a persistence baseline is included as a zero-training reference to evaluate the lead-time-dependent degradation of forecasting accuracy. At the same time, ablation experiments are conducted on the OVMD-TMFG-TCN-BiGRU model to validate the contribution of each component. To evaluate the model’s forecasting capability across short-term, medium-term, and long-term horizons, the experimental time horizons are set to 1, 6, 12, 24, and 48 h, and five evaluation metrics (RMSE, MAE, MAPE, R, and NSEC) are employed to comprehensively assess the predictive performance of each method.

All experiments were conducted on a computing system equipped with an AMD Ryzen 7 5800H processor (8 cores) and 12 GB of RAM. The software environment consisted of the Windows 11 operating system, utilizing PyTorch 2.3.1 as the deep learning framework, along with Python 3.12 and CUDA 11.0.

4.1. Experimental Parameter Settings

In this experiment, several comparative models were employed for verification, including EMD-TCN, EEMD-LSTM, TCN, BiGRU, SVM, ANN, Transformer, and a persistence baseline, along with the variants used in the ablation studies. The lookback window length was set to 24 h for all deep learning models, and the forecast horizons were set to 1, 6, 12, 24, and 48 h. The parameter settings for both the proposed model and the comparative models are presented in Table 3. The persistence baseline predicts the future SWH using the observed value at the prediction origin. Since this baseline has no trainable parameters, it is used only as a reference method and is not listed in Table 3. To ensure a fair comparison, hyperparameters were determined by referring to the associated literature and further tuned using the validation set. The testing set was used only for final evaluation. For parameters not explicitly specified, empirical settings were adopted in this study.
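The persistence baseline is simple enough to state in a few lines. In the sketch below, the function name and the toy series are illustrative assumptions; the forecast for time t + h is the value observed at the prediction origin t.

```python
def persistence_forecast(series, horizon):
    """Return (predictions, targets) for a lead time of `horizon` steps (horizon >= 1)."""
    preds = series[:-horizon]      # value observed at each prediction origin t
    targets = series[horizon:]     # value observed at t + horizon
    return preds, targets

# Toy hourly SWH values in metres (illustrative only).
swh = [1.0, 1.2, 1.5, 1.4, 1.1]
preds, targets = persistence_forecast(swh, horizon=2)
mae = sum(abs(t - p) for p, t in zip(preds, targets)) / len(preds)
```

Since no parameters are fitted, any error of this baseline reflects only how quickly the sea state changes over the chosen lead time.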

4.2. Result of OVMD Decomposition

To construct multiscale representations of SWH, OVMD was applied to datasets A, B, and C. The number of intrinsic mode functions, denoted as K, is a critical parameter that directly affects decomposition quality. A small K may lead to insufficient decomposition and the loss of important signal components, whereas an excessively large K may introduce redundant modes and reduce the stability of the decomposed components. Therefore, in this study, K was determined by jointly considering the reconstruction error and the center frequency distribution of the decomposed modes.

Candidate values of K were searched from 3 to 15. For each candidate K, the original SWH sequence was reconstructed from the decomposed IMF components, and the reconstruction error was calculated using RMSE. A lower RMSE indicates smaller information loss and better reconstruction quality, whereas a higher RMSE suggests potential signal distortion or insufficient decomposition. Figure 7, Figure 8 and Figure 9 show the RMSE and center frequency distributions of datasets A, B, and C under different K values. As K increases, the reconstruction error initially decreases rapidly, indicating improved decomposition accuracy. However, after a certain value of K, the decrease in RMSE becomes limited, suggesting that additional modes provide only marginal information and may introduce redundant decomposition.

The center frequency distributions were further examined to avoid over-decomposition. When K is too large, newly generated IMFs tend to occupy similar frequency bands and become close to adjacent components, which indicates spectral overlap and redundancy. In this study, the final K was selected near the inflection point of the reconstruction error curve while maintaining distinguishable center frequency separation. Based on this criterion, the optimal mode numbers were set to K = 10 for dataset A, K = 11 for dataset B, and K = 11 for dataset C.
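One possible way to express this selection rule in code is sketched below. The rule is described qualitatively in the text, so the relative-improvement tolerance `rel_tol`, the minimum center-frequency gap `min_gap`, the function name, and all numbers are hypothetical choices for illustration only. The reconstruction RMSE values and center frequencies are assumed to have been computed beforehand for each candidate K.

```python
def select_k(rmse_by_k, freqs_by_k, rel_tol=0.05, min_gap=0.01):
    """Pick the first K after which RMSE stops improving noticeably,
    restricted to K values whose center frequencies stay well separated."""
    def separated(k):
        f = sorted(freqs_by_k[k])
        return all(b - a >= min_gap for a, b in zip(f, f[1:]))

    ks = [k for k in sorted(rmse_by_k) if separated(k)]
    for k, k_next in zip(ks, ks[1:]):
        gain = (rmse_by_k[k] - rmse_by_k[k_next]) / rmse_by_k[k]
        if gain < rel_tol:             # adding another mode brings only marginal gain
            return k
    return ks[-1]

# Made-up diagnostics for K = 3..7 (not values from this study).
rmse_by_k = {3: 0.20, 4: 0.10, 5: 0.06, 6: 0.058, 7: 0.057}
freqs_by_k = {
    3: [0.01, 0.10, 0.30],
    4: [0.01, 0.05, 0.15, 0.30],
    5: [0.01, 0.04, 0.10, 0.20, 0.35],
    6: [0.01, 0.04, 0.10, 0.20, 0.30, 0.40],
    7: [0.01, 0.04, 0.10, 0.20, 0.30, 0.40, 0.405],   # last two modes nearly overlap
}
best_k = select_k(rmse_by_k, freqs_by_k)
```

In this toy case K = 7 is excluded because two center frequencies nearly coincide, and K = 5 is returned because moving to K = 6 reduces the RMSE by less than 5%.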

The computational cost of the OVMD mode-number selection process was also evaluated under the same computing environment as the forecasting experiments. This selection process requires repeated decomposition of the training set under different candidate values of K. In our experiments, the total time required to determine the optimal K was controlled within 5 min, and the average execution time for a single OVMD decomposition was 15.59 s. Since the forecasting task in this study is conducted at an hourly interval, this computational cost does not hinder near-real-time hourly SWH forecasting. More importantly, the OVMD mode-number selection is performed only during the offline training stage. Once the optimal K is determined, it is fixed during validation, testing, and practical forecasting. Similarly, TMFG-based feature subset determination and TCN-BiGRU model training are also completed offline. Therefore, repeated OVMD parameter optimization, TMFG fitting, and model training are not required during online inference. The online forecasting stage only applies the fixed decomposition configuration, the fixed feature subset, and the trained prediction model to generate forecasts, which supports the computational feasibility of the proposed framework for practical hourly SWH forecasting.

Figure 10, Figure 11 and Figure 12 present the final OVMD decomposition results for datasets A, B, and C, respectively. Although the selected K values differ slightly among the three datasets, the resulting OVMD components consistently exhibit a hierarchical frequency structure. In all three datasets, the first several IMFs mainly correspond to high-frequency components, reflecting short-term fluctuations in the SWH series. The intermediate IMFs capture medium-frequency oscillations and describe wave variability at intermediate temporal scales. The final oscillatory modes before the trend component represent low-frequency variations with slower temporal changes. Specifically, IMF8–IMF9 for dataset A and IMF8–IMF10 for datasets B and C can be regarded as low-frequency components, while IMF10 for dataset A and IMF11 for datasets B and C represent the long-term trend components. Comparative analysis across the three datasets indicates that, despite differences in the selected K, the IMFs maintain a coherent multiscale structure. This suggests that OVMD can consistently decompose SWH records into components with distinct temporal characteristics, thereby providing a multiscale feature basis for subsequent TMFG-based feature selection and forecasting. Meanwhile, differences in amplitude and local fluctuation patterns among the three datasets may be related to their geographical locations, water depths, and regional marine environments.

From the perspective of wave dynamics, the hierarchical IMF structure can be interpreted as a multiscale representation of SWH variability. The high-frequency IMFs mainly describe rapid short-term fluctuations, which may be associated with local wind forcing, short-period wind waves, and measurement-scale variability. The intermediate-frequency IMFs reflect more persistent oscillatory variations and may be related to the combined effects of evolving wind-sea conditions and swell modulation. The low-frequency IMFs and trend components represent slowly varying background sea-state evolution, which may be influenced by larger-scale meteorological forcing, longer-period wave systems, and regional marine conditions. Therefore, although the IMFs should not be regarded as one-to-one physical wave modes, they provide temporal-scale information that is consistent with the multiscale nature of wave dynamics and can support subsequent feature selection and forecasting.

4.3. Result of Feature Selection Using TMFG

The candidate feature set for TMFG-based selection was constructed by integrating the auxiliary buoy variables described in Section 3.1, the multiscale IMF components generated by OVMD, and the historical SWH sequence. Following the TMFG-based selection strategy described in Section 2.3, a sparse dependency network was constructed for each dataset using only the training set. In this network, candidate predictors directly connected to the target SWH node were regarded as informative features and retained as the final input subset for the forecasting model. Figure 13 illustrates the correlation matrix for dataset A, the corresponding TMFG network structure, and the feature screening results. In the TMFG network, the connected nodes represent direct dependency relationships retained by the sparse filtering process. Based on this target-node adjacency criterion, the optimal feature subsets for datasets A, B, and C were identified, as shown in Table 4.
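The target-node adjacency criterion can be illustrated with a compact, simplified TMFG construction. The sketch assumes a symmetric non-negative similarity matrix (for example, absolute correlations), seeds the graph with the four nodes of largest total similarity, and then greedily attaches each remaining node to the triangular face that maximizes its summed similarity. The function names, the seeding simplification, and the toy matrix are assumptions; the procedure used in this study is the one described in Section 2.3.

```python
from itertools import combinations

def tmfg_edges(W):
    """Simplified greedy TMFG on a symmetric similarity matrix W (needs >= 4 nodes)."""
    n = len(W)
    strength = lambda i: sum(W[i][j] for j in range(n) if j != i)
    seed = sorted(range(n), key=strength, reverse=True)[:4]
    edges = {frozenset(pair) for pair in combinations(seed, 2)}   # initial tetrahedron
    faces = list(combinations(seed, 3))
    remaining = set(range(n)) - set(seed)
    while remaining:
        # Choose the (node, face) pair with the largest summed similarity.
        v, face = max(((u, f) for u in remaining for f in faces),
                      key=lambda uf: sum(W[uf[0]][a] for a in uf[1]))
        edges.update(frozenset((v, a)) for a in face)
        faces.remove(face)                                         # split face into three
        faces.extend((v,) + pair for pair in combinations(face, 2))
        remaining.remove(v)
    return edges

def select_features(W, names, target):
    """Keep the candidates that share an edge with the target node."""
    t = names.index(target)
    neighbours = set()
    for edge in tmfg_edges(W):
        if t in edge:
            neighbours |= edge - {t}
    return sorted(names[i] for i in neighbours)

# Toy similarity matrix (illustrative values, not results from this study).
names = ["SWH", "IMF1", "IMF2", "WSPD", "DPD", "PRES"]
upper = {(0, 1): 0.9, (0, 2): 0.8, (0, 3): 0.7, (0, 4): 0.3, (0, 5): 0.1,
         (1, 2): 0.6, (1, 3): 0.5, (1, 4): 0.2, (1, 5): 0.1,
         (2, 3): 0.4, (2, 4): 0.35, (2, 5): 0.15,
         (3, 4): 0.25, (3, 5): 0.2, (4, 5): 0.05}
W = [[0.0] * 6 for _ in range(6)]
for (i, j), w in upper.items():
    W[i][j] = W[j][i] = w
selected = select_features(W, names, "SWH")
```

The resulting graph has 3(n − 2) edges. In this toy matrix PRES is attached to a face that does not contain SWH, so it is not retained as a predictor.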

The selected environmental variables are also physically consistent with SWH evolution. WSPD and GST represent local wind forcing and wind fluctuation intensity, which directly affect wind-wave generation and short-term wave growth. DPD and APD describe characteristic wave periods and provide information on wave energy distribution and sea-state maturity. Their repeated selection across the three datasets indicates that wave-period information is important for distinguishing locally generated wind waves from more developed sea states. WTMP and ATMP are selected at some stations, suggesting that local air–sea thermal conditions may provide supplementary information on regional meteorological and oceanic variability. Overall, the TMFG-selected variables are consistent with the physical factors that influence SWH evolution, while the selected IMF components provide multiscale descriptions of historical SWH dynamics.

4.4. Overall Performance Comparison

The overall predictive performance of the OVMD-TMFG-TCN-BiGRU model was evaluated across different forecast horizons and datasets. Representative lead times were selected to cover varying prediction horizons: 1 h and 6 h for short-term forecasting, 12 h and 24 h for medium-term forecasting, and 48 h for long-term forecasting. Using TCN, BiGRU, SVM, ANN, EEMD-LSTM, EMD-TCN, Transformer and the persistence baseline as benchmark methods, the experimental results on datasets A, B, and C are presented in Table 5, Table 6 and Table 7. The results indicate that the OVMD-TMFG-TCN-BiGRU outperforms the benchmark methods on nearly all evaluation metrics and forecast horizons; the single exception, the 6 h MAPE at station C, is discussed in Section 4.4.1.

4.4.1. Short-Term Forecasting

Table 5 summarizes the prediction results for the 1 h and 6 h horizons across stations A, B, and C, and Figure 14 visually illustrates the evaluation results of the nine models. The OVMD-TMFG-TCN-BiGRU model achieved the best results on all evaluation metrics at the 1 h horizon and on all but one metric at the 6 h horizon. In addition, Table 5 shows that, among the compared models, only the proposed model, EMD-TCN, and EEMD-LSTM consistently outperform the persistence reference in short-term forecasting. This indicates that simple sea state persistence remains a strong reference for very short lead times, while decomposition-assisted models can better extract multiscale wave fluctuation information and therefore provide additional predictive skill beyond persistence. Specifically, taking station A as an example, compared to other baseline models at the 1 h horizon, the proposed method achieved average reductions in RMSE, MAE, and MAPE of 64.9%, 67.0%, and 66.2%, respectively, while R and NSEC increased by an average of 0.0173 and 0.0414, respectively. At the 6 h horizon, although the predictive performance of all models declined, the OVMD-TMFG-TCN-BiGRU outperformed the other benchmark models, with the sole exception occurring at station C, where its MAPE was higher than that of EEMD-LSTM by 0.001. Furthermore, compared to single models such as TCN and BiGRU, hybrid models like OVMD-TMFG-TCN-BiGRU exhibit a smaller performance degradation from 1 h to 6 h, demonstrating the superior multi-step forecasting capability of hybrid architectures.

Figure 15 compares the observed values with the predictions of the OVMD-TMFG-TCN-BiGRU model and the comparative models at the 1 h and 6 h horizons for stations A, B, and C. The results show that all models closely align with the observed curves for the 1 h prediction, indicating satisfactory predictive performance at this time step. However, when the forecasting horizon extends to 6 h, all models experience a decline in performance, as evidenced by increased fluctuations in the prediction curves. These fluctuations highlight the challenges of multi-step forecasting in highly dynamic environments with rapid and unpredictable variations. Single models, such as TCN and BiGRU, display greater fluctuation amplitudes, whereas hybrid models like EMD-TCN maintain lower overall errors. Notably, the OVMD-TMFG-TCN-BiGRU model achieves the closest alignment with the measured curves.

To further assess model stability, scatter plots were employed to visually evaluate the prediction performance of the OVMD-TMFG-TCN-BiGRU model and other comparative models, as illustrated in Figure 16. In these scatter plots, the x-axis denotes the measured sample values, while the y-axis indicates the predicted values generated by the models. Ideally, data points should align closely along the diagonal line y = x, which would indicate perfect agreement between predicted and measured values. The figure demonstrates that, for the 1 h prediction, data points for all models cluster near the diagonal, suggesting that each model effectively captures the primary trends in single-step forecasting. As the prediction horizon extends to 6 h, the scatter points increasingly deviate from the diagonal. Nevertheless, the OVMD-TMFG-TCN-BiGRU model’s predicted values exhibit greater consistency with the measured values than those of the other models.

4.4.2. Medium-Term Forecasting

Table 6 presents the prediction results of the nine models at the three stations for the 12 h and 24 h horizons, while Figure 17 summarizes the evaluation metrics for learning-based models. The RMSE, MAE, and MAPE for medium-term forecasting are significantly higher than those for short-term forecasting, and performance at 24 h further deteriorates compared to 12 h. This trend indicates a decline in model accuracy as the prediction horizon increases. Nevertheless, the OVMD-TMFG-TCN-BiGRU model consistently outperforms the other models across all metrics, exhibiting less performance degradation as the forecast horizon increases. For example, at station C, the model’s RMSE for the 12 h and 24 h forecasts was reduced by an average of 56.7% and 55.9%, respectively, compared to other models, while the NSEC increased by 0.259 and 0.452, respectively. The NSEC for this model decreased by only 0.0501 from 12 h to 24 h, which is substantially less than the average decrease of 0.2437 observed in other models. These results demonstrate that the OVMD-TMFG-TCN-BiGRU model maintains superior stability as the time step increases.

Figure 18 illustrates the comparison curves between predicted and observed values for the 12 h and 24 h forecasts at stations A, B, and C. In intervals of high wave height, the OVMD-TMFG-TCN-BiGRU model shows larger errors than in short-term forecasting, yet these errors remain within an acceptable range. In contrast, although the prediction curves of EMD-TCN and EEMD-LSTM generally follow the overall trend of the measured values, they display substantial deviations at specific points. The OVMD-TMFG-TCN-BiGRU curve is smoother and more accurately represents actual wave height conditions. Additionally, the prediction curves of BiGRU, TCN, SVM, ANN, Transformer, and the persistence baseline show significant deviations from the measured curves.

To further investigate the predictive performance of the models during medium-term forecasting, Figure 19 presents the corresponding scatter plots. Compared to the short-term scatter plots in Figure 16, the points are markedly more dispersed, indicating that predictive performance degrades as the forecast horizon extends. However, in contrast to other algorithms, the variation in the dispersion of predicted values for the OVMD-TMFG-TCN-BiGRU model remains relatively small. As wave height increases, the scatter points for most comparison models fall below the diagonal line, revealing a tendency to underestimate peak values. In contrast, the proposed model shows smaller dispersion than the comparative models under relatively high sea states, although prediction uncertainty still increases with wave height.

4.4.3. Long-Term Forecasting

Table 7 and Figure 20 present the prediction results for the 48 h horizon. As the prediction horizon extends to 48 h, all models exhibit a substantial decline in predictive performance. The NSEC for the TCN, BiGRU, SVM, ANN, Transformer and the persistence baseline falls below 0.1 at most observation stations, indicating that these models have lost most of their explanatory power and perform only marginally better than predicting the mean of the observations. The persistence reference performs reasonably at short horizons because recent SWH observations contain strong continuity. However, its performance deteriorates rapidly as the forecasting horizon increases, indicating that medium- and long-term SWH forecasting cannot rely solely on sea-state persistence. In contrast, although the OVMD-TMFG-TCN-BiGRU model’s performance decreases compared to short- and medium-term forecasts, it continues to demonstrate the highest predictive accuracy. For example, at station B, the RMSE, MAE, and MAPE for the OVMD-TMFG-TCN-BiGRU model are 0.3379, 0.2458, and 0.3332, respectively, which are considerably lower than the average values of 0.5970, 0.4234, and 0.5583 for the other models. Additionally, the R and NSEC values for this model reach 0.8583 and 0.7348, respectively, exceeding the corresponding averages of 0.4099 and 0.1317 achieved by the comparative models. These results indicate that the proposed model effectively maintains connections with historical data and captures time-series relationships, even in long-term forecasting.

Figure 21 compares the performance curves of the seven models for the 48 h prediction across the various stations. Although the performance of the OVMD-TMFG-TCN-BiGRU model exhibits a slight decline compared to short- and medium-term forecasts, its prediction curve maintains the highest degree of fit with the measured data. It is worth noting that the model’s performance in capturing high wave heights shows some attenuation. In contrast, the other comparative models exhibit significantly more pronounced deviations between predicted and measured values.

Figure 22 presents the correlation analysis between the measured and predicted values. Compared to Figure 16 and Figure 19, the dispersion of data points for all models is significantly increased. Simultaneously, as wave height increases in this forecast, the data points for the OVMD-TMFG-TCN-BiGRU model exhibit a notable trend of distributing below the diagonal line, which is particularly evident at higher wave heights. This reflects a challenge in predicting high wave heights, yet the model consistently achieves the minimum dispersion and deviation relative to its counterparts.

The degradation at longer forecasting horizons can be attributed to several factors. First, SWH has strong short-term persistence, but this persistence weakens as the lead time increases, which is also reflected by the rapid deterioration of the persistence baseline. Second, medium- and long-term SWH evolution is increasingly affected by future wind forcing, storm development, swell propagation, and remote wave systems, which cannot be fully inferred from a fixed historical input window. Third, nonlinear interactions among wind sea, swell, and local bathymetric effects become more difficult to represent as the forecasting horizon extends. Therefore, all models exhibit reduced accuracy at the 48 h horizon.

4.4.4. Extreme-Wave-Event and Storm Peak Forecasting Analysis

To further evaluate the robustness of the proposed model under severe sea-state conditions, a storm-peak evaluation and an extreme-wave-event case study were conducted on dataset B. Dataset B was selected because it contains the most pronounced observed SWH peak during the independent testing year among the three datasets. The storm-peak samples were identified according to the observed SWH rather than the predicted values. Specifically, the 95th percentile of observed SWH in the testing set of dataset B was used as the threshold, and samples exceeding this threshold were defined as storm-peak samples. This definition allows the evaluation to focus on high-wave conditions that are most relevant to practical offshore applications.
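The storm-peak subset can be extracted as in the following sketch, in which the threshold is taken from the observed values only. The function names are hypothetical, the percentile uses linear interpolation between order statistics (an assumption, since the interpolation rule is not stated), and the toy series is not buoy data.

```python
import math

def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def storm_peak_errors(obs, pred, q=95):
    threshold = percentile(obs, q)                 # threshold from observations only
    errs = [o - p for o, p in zip(obs, pred) if o > threshold]
    # Assumes at least one observation exceeds the threshold.
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return threshold, len(errs), rmse, mae

# Toy series: observations from 0.1 m to 2.0 m, predictions biased low by 10%.
obs = [0.1 * i for i in range(1, 21)]
pred = [0.9 * o for o in obs]
threshold, n_peak, rmse, mae = storm_peak_errors(obs, pred)
```

Selecting samples by observed rather than predicted SWH keeps the subset identical for every model, so the resulting errors are directly comparable.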

To keep the storm-peak analysis concise while covering different types of forecasting methods, three representative benchmark models were selected for comparison. EMD-TCN was selected as a decomposition-based hybrid deep learning baseline, TCN was selected as a single deep learning baseline, and the persistence baseline was included as a training-free reference. These benchmark models showed relatively strong performance within their respective methodological categories in the overall comparison and represent different levels of model complexity.

Table 8 summarizes the forecasting accuracy of the proposed model and the representative benchmark models on the storm-peak samples of dataset B. RMSE and MAE were used to evaluate the storm-peak prediction performance. As shown in Table 8, the proposed OVMD-TMFG-TCN-BiGRU model achieves the lowest RMSE and MAE across all forecasting horizons. At the 1 h horizon, the proposed model obtains an RMSE of 0.0971 and an MAE of 0.0699, which are lower than those of EMD-TCN, TCN, and the persistence baseline. As the forecasting horizon increases, the storm-peak prediction errors of all models increase, indicating that high-wave forecasting becomes more difficult at longer lead times. Nevertheless, the proposed model maintains a clear advantage. At the 48 h horizon, its RMSE and MAE are 0.7775 and 0.6156, respectively, which remain lower than those of the representative benchmark models. These results indicate that the proposed framework provides more reliable storm-peak forecasting under severe sea-state conditions.

In addition to the storm-peak subset evaluation, a representative extreme-wave-event case was selected from dataset B. The selected event corresponds to the continuous high-wave process containing the maximum observed SWH in the testing period. A time window around the observed peak was used for visualization. Figure 23 compares the observed SWH with the 12 h predictions of the proposed model and the representative benchmark models during this event. The 12 h horizon was selected because it represents a challenging medium-term forecasting condition while retaining practical relevance for offshore operation planning. As shown in Figure 23, the observed SWH increases rapidly before the storm peak and then decreases after the peak. The proposed OVMD-TMFG-TCN-BiGRU model follows this growth and decay process more closely than the benchmark models. Although the maximum peak magnitude is still underestimated, the proposed model produces a smaller deviation near the observed storm peak than EMD-TCN, TCN, and the persistence baseline. In contrast, TCN substantially underestimates the high-wave process, while the persistence baseline shows an evident lag and remains high after the observed peak. These results indicate that the proposed framework provides more robust forecasting performance under severe sea-state conditions, although accurate peak prediction remains challenging.

Extreme-wave forecasting nevertheless remains difficult. Storm peaks are often associated with rapidly evolving wind forcing, swell propagation, and nonlinear wave growth processes, which cannot be fully inferred from historical buoy observations alone. Therefore, although the proposed model improves storm-peak prediction accuracy, future work may incorporate numerical weather prediction products or wave model outputs to improve forecasting reliability under severe sea states.

4.4.5. Statistical Significance Analysis

To assess whether the improvement of the proposed model over the benchmark models is statistically significant, the Wilcoxon signed-rank test was employed to compare paired absolute error sequences. For each forecasting horizon, the absolute error sequences from the three stations were pooled. The absolute error of the proposed model was defined as the absolute difference between the predicted and observed SWH values, and the same definition was applied to each benchmark model. The null hypothesis assumes that there is no significant difference between the paired absolute error sequences, whereas the one-sided alternative hypothesis assumes that the proposed model yields smaller absolute prediction errors than the corresponding benchmark model. A significance level of 0.05 was adopted.

Table 9 summarizes the Wilcoxon signed-rank test results between the proposed model and the benchmark models across the five forecasting horizons. For each benchmark model, the table reports the mean Z-value, the Z-value range, the p-value summary, and the number of horizons with statistically significant differences. All benchmark comparisons produce negative Z-values at all forecasting horizons, indicating that the absolute prediction errors of the proposed model are generally smaller than those of the benchmark models. Moreover, all p-values are lower than 0.001, and statistically significant differences are observed for all five horizons for each benchmark model. These results remain significant after considering multiple comparisons, indicating that the improvements of the proposed model are statistically robust rather than being caused by random variations in the testing samples.
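The test procedure described in this subsection can be sketched as follows. This is a plain-Python illustration of the one-sided Wilcoxon signed-rank test with the normal approximation, not the code used to produce Table 9. The function name and the toy error sequences are assumptions. The sketch applies no tie or continuity correction, so its Z-values may differ slightly from those of a statistics package.

```python
import math
from statistics import NormalDist

def wilcoxon_signed_rank(err_model, err_benchmark):
    """One-sided Wilcoxon signed-rank test using the normal approximation.

    H1: the model's absolute errors are smaller than the benchmark's.
    Zero differences are dropped; tied |differences| receive average ranks.
    Returns (z, p); a negative z means the model errors tend to be smaller.
    """
    diffs = [a - b for a, b in zip(err_model, err_benchmark) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / std
    p = NormalDist().cdf(z)  # lower-tail probability for the one-sided H1
    return z, p

# Toy absolute errors for one horizon; pooling stations means concatenating
# their paired sequences before calling the test.
model_err = [0.10 + 0.01 * (i % 5) for i in range(30)]
bench_err = [0.20 + 0.02 * (i % 7) for i in range(30)]
z, p = wilcoxon_signed_rank(model_err, bench_err)
print(f"Z = {z:.3f}, p = {p:.2e}, significant at 0.05: {p < 0.05}")
```

The sign convention matches the reading of Table 9: differences are taken as proposed minus benchmark, so a negative Z-value corresponds to smaller absolute errors for the proposed model.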

4.5. Ablation Experiments

To validate the effectiveness of the proposed method, four ablated variants were constructed, each excluding one key component of the full model. These systematic ablation experiments allow the independent contribution and functional role of each component to be assessed. All variants were tested on the three datasets to further quantify the impact of individual modules within the OVMD-TMFG-TCN-BiGRU architecture on the final predictive performance.

w/o OVMD (without OVMD): This variant omits OVMD signal decomposition when constructing the feature subsets and is used to evaluate the impact of OVMD on predictive performance.
w/o TMFG (without TMFG): This variant omits TMFG-based feature selection and is used to evaluate how its absence affects predictive performance.
w/o TCN (without TCN): This variant removes TCN from the feature extraction process. The resulting performance change indicates the role of TCN in extracting local and multiscale temporal features and in enhancing prediction accuracy.
w/o BiGRU (without BiGRU): This variant removes the BiGRU layer used to model global dependencies. The resulting performance change indicates the contribution of BiGRU in capturing bidirectional long-range temporal dependencies and in improving prediction precision.
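One way to organise the four variants listed above is a set of component switches, with each variant turning off exactly one switch. The class and field names below are hypothetical and serve only to illustrate the one-component-at-a-time design.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class AblationConfig:
    """Switches for the four components (field names are hypothetical)."""
    use_ovmd: bool = True
    use_tmfg: bool = True
    use_tcn: bool = True
    use_bigru: bool = True

VARIANTS = {
    "OVMD-TMFG-TCN-BiGRU": AblationConfig(),
    "w/o OVMD": AblationConfig(use_ovmd=False),
    "w/o TMFG": AblationConfig(use_tmfg=False),
    "w/o TCN": AblationConfig(use_tcn=False),
    "w/o BiGRU": AblationConfig(use_bigru=False),
}

def disabled_components(config):
    """Names of the switches that are turned off in a configuration."""
    return [f.name for f in fields(config) if not getattr(config, f.name)]

for name, config in VARIANTS.items():
    print(f"{name}: disabled = {disabled_components(config)}")
```

Because every variant differs from the full model in a single switch, any performance gap can be attributed to that component alone.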

4.5.1. Impact of OVMD on the Model

Figure 24 presents a bar chart comparing the full model with the variant without OVMD in order to assess the contribution of OVMD. Omitting OVMD markedly reduces performance on all three datasets, with the most pronounced impact observed in the 24 h and 48 h forecasts. As shown in Table 10, for the 24 h forecast, the proposed model reduces RMSE, MAE, and MAPE by 0.283, 0.200, and 0.255, respectively, and increases R and NSEC by 0.313 and 0.497, respectively, relative to the model without OVMD. These results indicate that OVMD decomposition substantially enhances model performance, especially for medium- and long-term forecasting.
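For reference, the five metrics compared here can be computed as in the sketch below. It uses the standard textbook definitions of RMSE, MAE, MAPE, the Pearson correlation coefficient R, and the Nash-Sutcliffe efficiency coefficient (NSEC). These are assumed to match the definitions used in this paper. The function name is illustrative. MAPE is returned as a fraction, which appears consistent with the scale of the tabulated values.

```python
import math

def evaluation_metrics(observed, predicted):
    """RMSE, MAE, MAPE (as a fraction), Pearson R, and NSEC for paired series."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sse = sum(e * e for e in errors)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # Assumes no observed value is zero (SWH minima in Table 2 are positive).
        "MAPE": sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n,
        "R": cov / math.sqrt(var_o * var_p),
        # NSEC = 1 - SSE / total variance of the observations about their mean.
        "NSEC": 1.0 - sse / var_o,
    }

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
metrics = evaluation_metrics(obs, pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RMSE and MAE carry the units of SWH, whereas MAPE, R, and NSEC are dimensionless. An NSEC of 1 corresponds to a perfect forecast, and a value of 0 means the forecast is no better than the observed mean.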

To visualize how OVMD improves fitting capability, Figures 25–27 present comparative line charts. Both the OVMD-TMFG-TCN-BiGRU model and the w/o OVMD model fit the measured values well in short-term forecasting. In the medium- and long-term forecasts, however, the model lacking OVMD deviates markedly from the measured curves, while the proposed model maintains a closer fit.

4.5.2. The Impact of TMFG Feature Selection on the Model

To assess the impact of TMFG feature selection, the OVMD-TMFG-TCN-BiGRU model was compared with its counterpart lacking TMFG, as shown in Table 11 and Figure 28. The results indicate that including TMFG consistently improves predictive performance across all forecasting horizons. In short- and medium-term forecasting, the two models perform similarly, but the proposed model is consistently more accurate. In long-term forecasting, the OVMD-TMFG-TCN-BiGRU model achieves notable NSEC improvements of 7.8%, 12.0%, and 14.2% on datasets A, B, and C, respectively. These results underscore the effectiveness of TMFG in removing redundant features while preserving essential information, thereby improving the model’s robustness and generalization, especially for long-term forecasting.

To further clarify the difference between TMFG and conventional feature selection methods, Pearson correlation, mutual information (MI), and maximal information coefficient (MIC) were introduced for comparison on the three datasets. All comparison methods used the same candidate feature set generated by OVMD and the same TCN-BiGRU forecasting model. The only difference among these methods was the feature selection strategy. To ensure a fair comparison, Pearson correlation, MI, and MIC retained the same number of features as TMFG for each dataset. Figure 29 presents the average RMSE and NSEC values of different feature selection methods over datasets A, B, and C across all forecasting horizons.

As shown in Figure 29, TMFG achieves better average forecasting performance at most forecasting horizons, with lower RMSE and higher NSEC. It should be noted that at the 6 h horizon, the average performance of TMFG is slightly lower than that of MIC, indicating that feature selection based on nonlinear relevance can still be competitive in some short- and medium-term forecasting cases. Overall, TMFG provides more stable performance across most horizons. When multiscale IMF components and environmental variables are jointly used for forecasting, TMFG can retain variables that have direct structural connections with the target SWH node through a sparse topological dependency network. In contrast, Pearson correlation, MI, and MIC mainly rank variables according to individual relevance or dependency strength. Therefore, TMFG is more effective in reducing redundant features while preserving structurally informative feature combinations, which further supports its role in improving the overall stability of the forecasting model.
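The idea of keeping only variables that are directly connected to the target node can be illustrated with a simplified TMFG construction. The sketch below is an assumption-laden illustration and not the implementation used in this study. The seed rule (the four nodes with the largest total similarity) is a simplification, the similarity values are invented toy numbers, and the function names are hypothetical.

```python
from itertools import combinations

def tmfg_edges(weights):
    """Greedy TMFG on a symmetric similarity matrix (list of lists).

    Starts from a tetrahedron and repeatedly inserts the remaining node into
    the triangular face with the largest summed similarity to that node.
    Returns a set of frozenset({i, j}) edges; n nodes yield 3 * (n - 2) edges.
    """
    n = len(weights)
    if n < 4:
        raise ValueError("TMFG needs at least four nodes")
    strength = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]
    seed = sorted(range(n), key=lambda i: strength[i], reverse=True)[:4]
    edges = {frozenset(pair) for pair in combinations(seed, 2)}
    faces = list(combinations(seed, 3))
    remaining = set(range(n)) - set(seed)
    while remaining:
        # Pick the (node, face) pair with the largest gain in total similarity.
        _, node, face = max(
            ((sum(weights[u][x] for x in f), u, f)
             for u in sorted(remaining) for f in faces),
            key=lambda item: item[0],
        )
        a, b, c = face
        edges.update(frozenset((node, x)) for x in face)
        faces.remove(face)  # the chosen face is split into three new faces
        faces.extend([(a, b, node), (a, c, node), (b, c, node)])
        remaining.remove(node)
    return edges

def select_features(weights, names, target):
    """Variables that share an edge with the target node in the TMFG."""
    t = names.index(target)
    return sorted(names[j] for e in tmfg_edges(weights) if t in e for j in e if j != t)

names = ["SWH", "IMF1", "IMF2", "WSPD", "GST", "PRES"]
similarity = [  # toy symmetric values, e.g. absolute correlations
    [1.00, 0.90, 0.80, 0.70, 0.60, 0.05],
    [0.90, 1.00, 0.50, 0.40, 0.30, 0.10],
    [0.80, 0.50, 1.00, 0.30, 0.20, 0.10],
    [0.70, 0.40, 0.30, 1.00, 0.90, 0.30],
    [0.60, 0.30, 0.20, 0.90, 1.00, 0.40],
    [0.05, 0.10, 0.10, 0.30, 0.40, 1.00],
]
edges = tmfg_edges(similarity)
selected = select_features(similarity, names, "SWH")
print(len(edges), selected)
```

In this toy example PRES attaches to a face that does not contain SWH, so it is not selected. A ranking-based method with the same feature budget would simply drop the variable with the lowest individual score, without using the graph structure.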

4.5.3. Impact of TCN and BiGRU on the Model

To assess the contributions of TCN and BiGRU, the complete OVMD-TMFG-TCN-BiGRU model was systematically compared with two ablated variants, the w/o TCN and w/o BiGRU models, on the three datasets. Table 12 and Figure 30 present the primary performance metrics for each model across the prediction horizons. In short-term forecasting, the MAE of the OVMD-TMFG-TCN-BiGRU model is markedly lower than that of the ablated variants. For instance, on dataset A, the MAE is reduced by approximately 18.5% compared to the w/o TCN and w/o BiGRU models, with similar improvements observed on datasets B and C. In medium-term forecasting, all three models generally maintain high prediction accuracy, but the complete model consistently achieves the best results. This advantage is particularly evident in MAE and MAPE, indicating superior stability and robustness. As the prediction horizon extends to long-term forecasting, all models experience some performance degradation, yet the complete model maintains higher accuracy than the ablated variants. This superiority is especially apparent in the NSEC metric; for example, on dataset A, the NSEC improved by approximately 14.2%. Collectively, the ablation experiments demonstrate that the hybrid model, which combines TCN and BiGRU, effectively leverages their respective strengths in local temporal feature extraction and long-range dependency modeling. This integration enhances prediction accuracy and generalization across different time scales, thereby supporting the structural design of the OVMD-TMFG-TCN-BiGRU model.

4.5.4. Sensitivity Analysis of Rolling Time Window Size

To justify the selection of the 24 h lookback window, a sensitivity analysis was conducted using input window lengths of 12 h, 24 h, 48 h, and 72 h. All other model settings were kept unchanged. Figure 31 shows the RMSE and NSEC values under different input window lengths for the three datasets and five forecasting horizons. Overall, the forecasting performance varies only slightly when the input window length changes from 12 h to 72 h, indicating that the proposed framework is relatively robust to the choice of lookback window length.

The 24 h window generally provides stable performance across different datasets and horizons. Although a longer input window may contain more historical information, it can also introduce redundant or less relevant temporal patterns and increase computational cost. In contrast, the 12 h window may be insufficient to capture longer sea-state evolution for medium- and long-horizon forecasting. Therefore, the 24 h lookback window was selected as a balanced setting between forecasting accuracy, temporal information coverage, and computational efficiency.
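The role of the lookback length in such a sensitivity analysis can be made concrete with a small windowing sketch. It assumes hourly samples and a direct forecasting setup in which each input window is paired with the single value a given number of steps after the window ends. Both the setup and the function name are illustrative assumptions.

```python
def make_windows(series, lookback, horizon):
    """Pair each lookback-length window with the value `horizon` steps after it.

    Returns a list of (window, target) tuples, where window is a list of
    `lookback` consecutive samples and target is a single later sample.
    """
    samples = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((window, target))
    return samples

# Toy hourly record; values equal their index so the pairing is easy to check.
hourly = list(range(200))
counts = {L: len(make_windows(hourly, L, horizon=48)) for L in (12, 24, 48, 72)}
first_window, first_target = make_windows(hourly, 24, horizon=48)[0]
print(counts)
print(first_window[0], first_window[-1], first_target)
```

A longer lookback leaves fewer usable samples from the same record, because more history is needed before the first target. A comparison across window lengths is therefore most informative when all settings are scored on the same target timestamps.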

5. Conclusions

Addressing the need for high-precision multi-horizon significant wave height forecasting to support offshore operational decisions, including maintenance and installation window planning and high-wave monitoring, this study proposes a hybrid deep forecasting framework that integrates Optimal Variational Mode Decomposition (OVMD), Triangulated Maximally Filtered Graph (TMFG)-based feature selection, and a cascaded temporal predictor combining a Temporal Convolutional Network (TCN) with a Bidirectional Gated Recurrent Unit (BiGRU). Systematic experiments and ablation analyses on multiple real-world buoy datasets lead to the following conclusions:

The proposed OVMD-TMFG-TCN-BiGRU framework effectively handles nonlinear and non-stationary SWH sequences. OVMD is used to decompose the original SWH series into components with different temporal scales, thereby constructing multi-view representations from high-frequency fluctuations to low-frequency trends. TMFG constructs a sparse dependency network to select informative and non-redundant predictors from decomposed components and environmental variables. In the forecasting module, TCN extracts local temporal patterns, while BiGRU captures forward and backward dependencies within the available historical input window. Their combination improves the representation of temporal information across different forecast horizons.
Across three buoy stations and forecasting lead times from 1 h to 48 h, the proposed method achieves consistently better accuracy and robustness than representative statistical, machine-learning, and deep-learning baselines under multiple evaluation metrics. The improvements are more pronounced for medium- and long-horizon forecasts, and the proposed model exhibits smaller performance degradation as the lead time increases, indicating stronger stability across varying sea states and time scales. In addition, the storm-peak evaluation and extreme-wave-event case study further demonstrate that the proposed model maintains better forecasting accuracy under severe sea-state conditions. Although the peak magnitude is still underestimated during rapidly evolving extreme-wave events, the proposed framework captures the growth and decay process more effectively than the representative benchmark models.
Ablation results confirm that OVMD, TMFG, and the TCN-BiGRU predictor provide essential and complementary contributions. Removing OVMD leads to clear error increases for medium- to long-horizon forecasts, highlighting the role of multi-scale decomposition in extracting informative structures from non-stationary wave records. The exclusion of TMFG has only a limited effect at short forecasting horizons, whereas it leads to clear performance degradation at longer lead times. This finding indicates that feature selection based on dependency networks is beneficial for identifying effective feature combinations. Removing either TCN or BiGRU increases errors and dispersion, indicating that jointly modeling local multi-scale patterns and long-range dependencies is critical for reliable multi-horizon forecasting.

Overall, the proposed framework shows promising applicability for significant wave height forecasting and can provide decision support for offshore operations. Several limitations remain. First, although the three buoy stations used in this study have different water depths and nearshore/offshore characteristics, they are all located within the Gulf of Mexico. Therefore, the present validation mainly demonstrates cross-station robustness within the same broad oceanic region, while the cross-region generalization capability of the proposed framework remains to be further verified. Second, the model is mainly data-driven and relies on historical input windows; therefore, future wind forcing, storm evolution, swell propagation, and other external wave-generation processes are not explicitly represented, which may limit its accuracy at longer forecasting horizons. Third, this study focuses on deterministic point prediction, while uncertainty information is also important for operational decision making under extreme sea states. Future work will therefore focus on evaluating the proposed framework using buoy datasets from different oceanic regions with distinct wave climates and environmental forcing conditions, incorporating external forecast products and physically informed constraints to improve medium- and long-horizon forecasting, and extending the framework to probabilistic forecasting and uncertainty quantification.

Author Contributions

Z.L.: investigation, writing – original manuscript preparation, software; G.S.: supervision, methodology; M.L.: visualization; T.W.: data curation; X.W.: funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 52571403 (General Program) and Grant No. 52101399 (Young Scientists Fund).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 2. TMFG topological move diagram.

Figure 4. Architecture of dilated causal convolutions.

Figure 5. Schematic diagram of GRU unit.

Figure 6. Original data from three stations.

Figure 7. Error metrics and center frequencies for dataset A under different decomposition numbers.

Figure 8. Error metrics and center frequencies for dataset B under different decomposition numbers.

Figure 9. Error metrics and center frequencies for dataset C under different decomposition numbers.

Figure 10. Decomposition results of SWH for dataset A.

Figure 11. Decomposition results of SWH for dataset B.

Figure 12. Decomposition results of SWH for dataset C.

Figure 13. TMFG-based feature selection results for dataset A. The thickness of each line represents the correlation strength between variables, with thicker lines indicating stronger correlations.

Figure 14. Visualization of short-term prediction performance.

Figure 15. Comparison curves of short-term predicted values versus measured values.

Figure 16. Scatter plots comparing measured values and short-term predicted values.

Figure 17. Visualization of medium-term prediction performance.

Figure 18. Comparison curves of medium-term predicted values versus measured values.

Figure 19. Scatter plots comparing measured values and medium-term predicted values.

Figure 20. Visualization of long-term prediction performance.

Figure 21. Comparison curves of long-term predicted values versus measured values.

Figure 22. Scatter plots comparing measured values and long-term predicted values.

Figure 23. Extreme-wave-event forecasting case on dataset B.

Figure 24. Bar chart of the OVMD ablation study results.

Figure 25. Line chart of OVMD ablation results for short-term prediction.

Figure 26. Line chart of OVMD ablation results for medium-term prediction.

Figure 27. Line chart of OVMD ablation results for long-term prediction.

Figure 28. Bar chart of the TMFG ablation study results.

Figure 29. Average RMSE and NSEC comparison of different feature selection methods across forecasting horizons.

Figure 30. Bar chart of the TCN and BiGRU ablation study results.

Figure 31. Sensitivity analysis of lookback window length based on RMSE and NSEC.

Table 1. List of abbreviations and acronyms used in the paper.

Abbreviation	Definition	Abbreviation	Definition
SWH	Significant Wave Height	EMD	Empirical Mode Decomposition
WDIR	Wind Direction	EEMD	Ensemble Empirical Mode Decomposition
WSPD	Wind Speed	TCN	Temporal Convolutional Network
GST	Gust Speed	BiGRU	Bidirectional Gated Recurrent Unit
APD	Average Wave Period	GRU	Gated Recurrent Unit
PRES	Atmospheric Pressure	LSTM	Long Short-Term Memory
ATMP	Air Temperature	SVM	Support Vector Machine
WTMP	Water Temperature	ANN	Artificial Neural Network
DEWP	Dew Point Temperature	RMSE	Root Mean Square Error
DPD	Dominant Wave Period	MAPE	Mean Absolute Percentage Error
MWD	Mean Wave Direction	MAE	Mean Absolute Error
TMFG	Triangulated Maximally Filtered Graph	R	Correlation Coefficient
OVMD	Optimal Variational Mode Decomposition	NSEC	Nash-Sutcliffe Efficiency Coefficient

Table 2. Statistical properties of the data utilized in this study.

Station	Latitude/°N	Longitude/°W	Period	Depth/m	Max SWH/m	Min SWH/m
A	30.060	87.548	2020–2024	23.5	8.19	0.08
B	28.500	84.505	2020–2024	53.3	7.03	0.09
C	24.140	94.122	2020–2024	3608	7.69	0.11

Table 3. Main parameters of models.

Model	Parameter Name	Value
OVMD-TMFG-TCN-BiGRU/w/o OVMD/w/o TMFG/w/o TCN/w/o BiGRU	Kernel size	(3, 5)
	Number of filters	(25, 50)
	Time steps	24
	Learning rate	0.001
	Batch size	128
	Activation	ReLU
	Optimizer	Adam
EMD-TCN/TCN	Kernel size	3
	Number of filters	24
	Time steps	24
	Learning rate	0.001
	Batch size	128
	Activation	ReLU
	Optimizer	Adam
EEMD-LSTM	Number of neurons	24
	Time steps	24
	Learning rate	0.001
	Batch size	128
	Activation	Tanh
	Optimizer	Adam
BiGRU	Number of neurons	64
	Time steps	24
	Learning rate	0.01
	Batch size	128
	Activation	Tanh
	Optimizer	Adam
SVM	Regularization Parameter	1.0
	Kernel	rbf
	Gamma	scale
	Epsilon	0.1
ANN	Number of neurons	128
	Time steps	24
	Learning rate	0.001
	Batch size	128
	Activation	ReLU
	Optimizer	Adam
Transformer	Number of layers	2
	Number of heads	4
	Time steps	24
	Learning rate	0.001
	Batch size	128
	Activation	GELU
	Optimizer	Adam

Table 4. Results of feature selection.

Dataset	Feature Selection Result
A	IMF1, IMF2, IMF3, IMF4, IMF5, IMF8, IMF9, IMF10, WSPD, GST, DPD, APD, WTMP
B	IMF1, IMF2, IMF3, IMF4, IMF5, IMF6, IMF7, IMF8, IMF9, IMF10, IMF11, WSPD, GST, DPD, APD
C	IMF1, IMF2, IMF3, IMF4, IMF5, IMF6, IMF7, IMF10, IMF11, WSPD, GST, DPD, APD, ATMP, WTMP

Table 5. Performance metrics of short-term prediction for all models across the three stations.

Station	Model	Forecast Horizon	RMSE	MAE	MAPE	R	NSEC
A	OVMD-TMFG-TCN-BiGRU	1 h	0.0343	0.0228	0.0324	0.9982	0.9963
		6 h	0.0878	0.0574	0.0838	0.9879	0.9758
	EMD-TCN	1 h	0.0659	0.0450	0.0627	0.9935	0.9864
		6 h	0.1014	0.0657	0.0934	0.9837	0.9677
	EEMD-LSTM	1 h	0.0673	0.0459	0.0632	0.9932	0.9858
		6 h	0.1104	0.0789	0.1114	0.9851	0.9618
	TCN	1 h	0.1073	0.0764	0.1081	0.9827	0.9639
		6 h	0.2673	0.1883	0.2509	0.8814	0.7759
	BiGRU	1 h	0.1057	0.0896	0.1389	0.9851	0.9649
		6 h	0.2663	0.1872	0.2558	0.8824	0.7775
	SVM	1 h	0.1447	0.1145	0.1943	0.9755	0.9343
		6 h	0.2778	0.2034	0.2938	0.8736	0.7579
	ANN	1 h	0.1918	0.1538	0.2545	0.9484	0.8846
		6 h	0.3240	0.2302	0.3149	0.8193	0.6707
	Transformer	1 h	0.1000	0.0661	0.0848	0.9863	0.9686
		6 h	0.3031	0.2067	0.2380	0.8835	0.7119
	Persistence	1 h	0.0919	0.0581	0.0693	0.9867	0.9735
		6 h	0.2691	0.1815	0.2205	0.8864	0.7728
B	OVMD-TMFG-TCN-BiGRU	1 h	0.0376	0.0257	0.0301	0.9985	0.9967
		6 h	0.1166	0.0822	0.0934	0.9851	0.9684
	EMD-TCN	1 h	0.0576	0.0358	0.0394	0.9963	0.9923
		6 h	0.1484	0.0992	0.1114	0.9741	0.9488
	EEMD-LSTM	1 h	0.0544	0.0342	0.0380	0.9966	0.9931
		6 h	0.1503	0.0996	0.1076	0.9741	0.9474
	TCN	1 h	0.1035	0.0789	0.1069	0.9907	0.9751
		6 h	0.3028	0.2108	0.2675	0.8879	0.7867
	BiGRU	1 h	0.1337	0.1013	0.1468	0.9808	0.9584
		6 h	0.3002	0.2042	0.2550	0.8906	0.7905
	SVM	1 h	0.1749	0.1283	0.1846	0.9688	0.9289
		6 h	0.3209	0.2176	0.2801	0.8785	0.7605
	ANN	1 h	0.1847	0.1260	0.1484	0.9628	0.9206
		6 h	0.3578	0.2410	0.2807	0.8394	0.7023
	Transformer	1 h	0.1572	0.1107	0.1390	0.9741	0.9425
		6 h	0.3201	0.2207	0.2455	0.9016	0.7618
	Persistence	1 h	0.1033	0.0645	0.0651	0.9876	0.9752
		6 h	0.3246	0.2092	0.2156	0.8774	0.7548
C	OVMD-TMFG-TCN-BiGRU	1 h	0.0428	0.0302	0.0322	0.9985	0.9964
		6 h	0.0958	0.0647	0.0640	0.9918	0.9819
	EMD-TCN	1 h	0.0648	0.0408	0.0367	0.9962	0.9917
		6 h	0.1285	0.0794	0.0689	0.9848	0.9674
	EEMD-LSTM	1 h	0.0811	0.0569	0.0579	0.9941	0.9870
		6 h	0.1084	0.0684	0.0630	0.9884	0.9768
	TCN	1 h	0.1377	0.0864	0.0803	0.9816	0.9626
		6 h	0.3309	0.2071	0.1717	0.8973	0.7840
	BiGRU	1 h	0.1388	0.0924	0.0821	0.9837	0.9620
		6 h	0.3214	0.2057	0.1778	0.9115	0.7963
	SVM	1 h	0.1982	0.1359	0.1379	0.9627	0.9225
		6 h	0.3343	0.2091	0.1891	0.8898	0.7795
	ANN	1 h	0.2018	0.1437	0.1410	0.9596	0.9196
		6 h	0.3663	0.2377	0.2070	0.8761	0.7353
	Transformer	1 h	0.1365	0.0942	0.0940	0.9846	0.9633
		6 h	0.2992	0.1881	0.1634	0.9119	0.8234
	Persistence	1 h	0.1100	0.0682	0.0557	0.9881	0.9761
		6 h	0.2882	0.1736	0.1445	0.9180	0.8361
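
The five scores reported in Tables 5 to 7 can be computed from paired observations and forecasts. The sketch below uses the standard textbook definitions, with MAPE kept as a fraction (an assumption that matches the magnitudes in the tables) and NSEC computed as one minus the ratio of the squared error to the spread of the observations around their mean. The paper's own formulas are given in Section 3.3 and may differ in detail; the function name `evaluate` and the sample values are illustrative only.

```python
import math
from typing import Dict, Sequence


def evaluate(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE, MAPE (as a fraction), Pearson R and NSEC."""
    n = len(obs)
    errors = [p - o for o, p in zip(obs, pred)]
    sq_err = sum(e * e for e in errors)

    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE requires non-zero observations; SWH minima in Table 2 are above zero.
    mape = sum(abs(e) / abs(o) for e, o in zip(errors, obs)) / n

    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_p = sum((p - mean_p) ** 2 for p in pred)
    r = cov / math.sqrt(ss_o * ss_p)

    # NSEC is 1 for a perfect forecast and 0 for a forecast no better than the mean.
    nsec = 1.0 - sq_err / ss_o
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R": r, "NSEC": nsec}


# Illustrative values only, not buoy data.
scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print({name: round(value, 4) for name, value in scores.items()})
```

Because NSEC compares the squared error with the variability of the observations, it can become negative when a forecast is worse than simply predicting the mean.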

Table 6. Performance metrics of medium-term prediction for all models across the three stations.

Station	Model	Forecast Horizon	RMSE	MAE	MAPE	R	NSEC
A	OVMD-TMFG-TCN-BiGRU	12 h	0.1356	0.0901	0.1270	0.9722	0.9424
		24 h	0.2305	0.1433	0.1903	0.9151	0.8334
	EMD-TCN	12 h	0.1807	0.1287	0.1696	0.9564	0.8976
		24 h	0.2730	0.1845	0.2225	0.8941	0.7664
	EEMD-LSTM	12 h	0.1887	0.1361	0.1812	0.9563	0.8884
		24 h	0.2808	0.1833	0.2213	0.8877	0.7529
	TCN	12 h	0.3754	0.2653	0.3721	0.7502	0.5581
		24 h	0.4798	0.3297	0.4486	0.5427	0.2784
	BiGRU	12 h	0.3752	0.2680	0.3832	0.7504	0.5586
		24 h	0.4807	0.3244	0.4198	0.5471	0.2758
	SVM	12 h	0.3768	0.2650	0.3723	0.7458	0.5548
		24 h	0.4880	0.3399	0.4769	0.5234	0.2536
	ANN	12 h	0.4015	0.2762	0.3767	0.7035	0.4945
		24 h	0.5092	0.3787	0.5770	0.4701	0.1872
	Transformer	12 h	0.4063	0.2630	0.2828	0.7403	0.4823
		24 h	0.4677	0.3092	0.3667	0.6004	0.3143
	Persistence	12 h	0.4109	0.2746	0.3396	0.7353	0.4707
		24 h	0.5797	0.3971	0.5015	0.4738	−0.0529
B	OVMD-TMFG-TCN-BiGRU	12 h	0.1651	0.1212	0.1454	0.9714	0.9366
		24 h	0.2216	0.1573	0.2027	0.9446	0.8858
	EMD-TCN	12 h	0.2063	0.1451	0.1598	0.9551	0.9010
		24 h	0.2893	0.2194	0.2718	0.9210	0.8054
	EEMD-LSTM	12 h	0.1914	0.1361	0.1561	0.9599	0.9148
		24 h	0.2811	0.2141	0.2619	0.9223	0.8163
	TCN	12 h	0.4265	0.3034	0.3950	0.7630	0.5771
		24 h	0.5394	0.3902	0.5409	0.5746	0.3237
	BiGRU	12 h	0.4308	0.2956	0.3725	0.7553	0.5684
		24 h	0.5346	0.3835	0.5402	0.5901	0.3356
	SVM	12 h	0.4277	0.2829	0.3522	0.7642	0.5746
		24 h	0.5420	0.3562	0.4357	0.5786	0.3171
	ANN	12 h	0.4265	0.3034	0.3950	0.7630	0.5771
		24 h	0.5394	0.3902	0.5409	0.5746	0.3237
	Transformer	12 h	0.4591	0.2979	0.2967	0.7673	0.5099
		24 h	0.5732	0.3649	0.3657	0.5708	0.2363
	Persistence	12 h	0.4823	0.3167	0.3355	0.7296	0.4592
		24 h	0.6587	0.4305	0.4747	0.4962	−0.0074
C	OVMD-TMFG-TCN-BiGRU	12 h	0.1352	0.0849	0.0770	0.9818	0.9640
		24 h	0.2089	0.1384	0.1273	0.9566	0.9139
	EMD-TCN	12 h	0.1695	0.1130	0.1074	0.9719	0.9433
		24 h	0.3210	0.1962	0.1928	0.8951	0.7969
	EEMD-LSTM	12 h	0.1682	0.1095	0.1004	0.9756	0.9442
		24 h	0.2964	0.1826	0.1912	0.9163	0.8268
	TCN	12 h	0.4236	0.2665	0.2331	0.8168	0.6461
		24 h	0.5570	0.3520	0.3089	0.6382	0.3883
	BiGRU	12 h	0.4206	0.2572	0.2154	0.8275	0.6511
		24 h	0.5561	0.3461	0.2804	0.6782	0.3904
	SVM	12 h	0.4476	0.2763	0.2413	0.7904	0.6050
		24 h	0.5839	0.3621	0.3025	0.5968	0.3280
	ANN	12 h	0.4672	0.2952	0.2565	0.7625	0.5695
		24 h	0.5853	0.3674	0.3031	0.6117	0.3246
	Transformer	12 h	0.4324	0.2648	0.2275	0.8142	0.6313
		24 h	0.5921	0.3736	0.3183	0.5818	0.3089
	Persistence	12 h	0.4202	0.2613	0.2194	0.8259	0.6519
		24 h	0.5839	0.3687	0.3132	0.6642	0.3292

Table 7. Performance metrics of long-term prediction for all models across the three stations.

Station	Model	Forecast Horizon	RMSE	MAE	MAPE	R	NSEC
A	OVMD-TMFG-TCN-BiGRU	48 h	0.3479	0.2462	0.3309	0.7956	0.6208
	EMD-TCN	48 h	0.3903	0.2736	0.3641	0.7281	0.5226
	EEMD-LSTM	48 h	0.3974	0.2758	0.3691	0.7188	0.5051
	TCN	48 h	0.5469	0.4208	0.6615	0.3220	0.0629
	BiGRU	48 h	0.5391	0.4052	0.6187	0.3314	0.0893
	SVM	48 h	0.5539	0.4073	0.5716	0.3095	0.0386
	ANN	48 h	0.5500	0.4244	0.6737	0.2694	0.0523
	Transformer	48 h	0.5425	0.3897	0.5373	0.3065	0.0778
	Persistence	48 h	0.7346	0.5341	0.7296	0.1572	−0.6865
B	OVMD-TMFG-TCN-BiGRU	48 h	0.3379	0.2458	0.3332	0.8583	0.7348
	EMD-TCN	48 h	0.3945	0.2779	0.3729	0.8107	0.6385
	EEMD-LSTM	48 h	0.3942	0.2864	0.3627	0.8022	0.6391
	TCN	48 h	0.6347	0.4574	0.6376	0.2694	0.0642
	BiGRU	48 h	0.6197	0.4447	0.6230	0.3289	0.1079
	SVM	48 h	0.6412	0.4359	0.5229	0.2902	0.0450
	ANN	48 h	0.6363	0.4581	0.6334	0.2682	0.0595
	Transformer	48 h	0.6311	0.4496	0.6072	0.2978	0.0748
	Persistence	48 h	0.8245	0.5770	0.7063	0.2122	−0.5754
C	OVMD-TMFG-TCN-BiGRU	48 h	0.3759	0.2469	0.2345	0.8476	0.7174
	EMD-TCN	48 h	0.4487	0.2738	0.2754	0.7900	0.5973
	EEMD-LSTM	48 h	0.4564	0.2924	0.2622	0.7731	0.5833
	TCN	48 h	0.6673	0.4355	0.3771	0.4196	0.1230
	BiGRU	48 h	0.6810	0.4399	0.3618	0.4695	0.0865
	SVM	48 h	0.6769	0.4510	0.3999	0.3792	0.0976
	ANN	48 h	0.6998	0.4803	0.4101	0.4228	0.0354
	Transformer	48 h	0.6792	0.4509	0.3846	0.4035	0.0915
	Persistence	48 h	0.7551	0.5041	0.4586	0.4393	−0.1188
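
The negative NSEC values of the Persistence baseline in Tables 6 and 7 follow from how the metrics relate to each other. Assuming the standard definition NSEC = 1 − MSE/σ², where σ² is the variance of the observations, every row for a given station and horizon should imply the same σ² = RMSE²/(1 − NSEC). For a persistence forecast on an approximately stationary series, MSE ≈ 2σ²(1 − R), so NSEC ≈ 2R − 1, which is negative whenever R falls below 0.5. The sketch below checks both relations against three Station A rows from Table 7; the helper name `implied_variance` is illustrative.

```python
# (RMSE, R, NSEC) for Station A at the 48 h horizon, copied from Table 7.
table7_station_a = {
    "OVMD-TMFG-TCN-BiGRU": (0.3479, 0.7956, 0.6208),
    "Transformer": (0.5425, 0.3065, 0.0778),
    "Persistence": (0.7346, 0.1572, -0.6865),
}


def implied_variance(rmse: float, nsec: float) -> float:
    """Observation variance implied by NSEC = 1 - RMSE^2 / variance."""
    return rmse ** 2 / (1.0 - nsec)


variances = {
    name: implied_variance(rmse, nsec)
    for name, (rmse, _, nsec) in table7_station_a.items()
}
for name, value in variances.items():
    print(f"{name}: implied variance = {value:.4f} m^2")

# Persistence under approximate stationarity: NSEC is close to 2R - 1.
_, r_persist, nsec_persist = table7_station_a["Persistence"]
nsec_from_r = 2.0 * r_persist - 1.0
print(f"2R - 1 = {nsec_from_r:.4f}, reported NSEC = {nsec_persist:.4f}")
```

All three rows imply a variance of roughly 0.32 m², and the persistence NSEC reconstructed from R agrees with the reported value to about three decimal places, which suggests the reported metrics are internally consistent for this station and horizon.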

Table 8. Forecasting accuracy on storm-peak samples of dataset B.

Forecast Horizon	Model	RMSE_Peak	MAE_Peak
1 h	OVMD-TMFG-TCN-BiGRU	0.0971	0.0699
	EMD-TCN	0.1525	0.1102
	TCN	0.2411	0.1734
	Persistence	0.2831	0.2114
6 h	OVMD-TMFG-TCN-BiGRU	0.2555	0.1956
	EMD-TCN	0.3670	0.2806
	TCN	0.7969	0.6248
	Persistence	0.8890	0.6895
12 h	OVMD-TMFG-TCN-BiGRU	0.3312	0.2559
	EMD-TCN	0.5015	0.3987
	TCN	1.1785	0.9492
	Persistence	1.3203	1.0630
24 h	OVMD-TMFG-TCN-BiGRU	0.4676	0.3461
	EMD-TCN	0.5778	0.4598
	TCN	1.5950	1.3697
	Persistence	1.6696	1.3750
48 h	OVMD-TMFG-TCN-BiGRU	0.7775	0.6156
	EMD-TCN	0.8791	0.6655
	TCN	1.9720	1.7965
	Persistence	1.9173	1.6680

Table 9. Summary of Wilcoxon signed-rank test results across forecasting horizons.

Benchmark Model	Mean Z-Value	Z-Value Range	p-Value Summary
EMD-TCN	−34.2893	−47.5312 to −13.4611	<0.001 for all horizons
EEMD-LSTM	−41.1268	−67.2927 to −20.4219	<0.001 for all horizons
TCN	−88.2854	−100.1076 to −65.6662	<0.001 for all horizons
BiGRU	−88.2287	−103.1514 to −67.2136	<0.001 for all horizons
SVM	−91.7340	−114.7246 to −70.9846	<0.001 for all horizons
ANN	−94.6545	−115.3260 to −65.5838	<0.001 for all horizons
Transformer	−87.0389	−104.0576 to −67.9830	<0.001 for all horizons
Persistence	−84.0683	−90.1019 to −77.1228	<0.001 for all horizons
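
The Z-values in Table 9 come from the Wilcoxon signed-rank test. The sketch below shows the usual normal approximation of that test on paired absolute errors. It rests on several assumptions that this section does not confirm: that the pairs are absolute errors of the proposed model and of a benchmark on the same samples, that a negative Z means the proposed model has the smaller errors, and that no tie correction is applied to the variance. The function name `wilcoxon_z` and the sample errors are illustrative.

```python
import math
from typing import List, Sequence


def wilcoxon_z(errors_a: Sequence[float], errors_b: Sequence[float]) -> float:
    """Normal-approximation Z for paired |errors_a| versus |errors_b|."""
    diffs = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]
    diffs = [d for d in diffs if d != 0.0]  # zero differences carry no sign
    n = len(diffs)

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum of ranks where model A has the larger absolute error.
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4.0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return (w_plus - mean_w) / sd_w


# Illustrative errors: model A is always closer to the truth than model B.
proposed_errors = [0.01 * k for k in range(1, 21)]
baseline_errors = [0.03 * k for k in range(1, 21)]
z_value = wilcoxon_z(proposed_errors, baseline_errors)
print(round(z_value, 2))
```

Since the standard deviation of the statistic grows more slowly than its mean as the sample size increases, a consistent advantage over thousands of test samples produces Z-values of very large magnitude, in line with the ranges listed in Table 9.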

Table 10. Ablation results of OVMD.

Station	Forecast Horizon	Model	RMSE	MAE	MAPE	R	NSEC
A	1 h	OVMD-TMFG-TCN-BiGRU	0.0343	0.0228	0.0324	0.9982	0.9963
		w/o OVMD	0.0917	0.0640	0.0869	0.9885	0.9736
	6 h	OVMD-TMFG-TCN-BiGRU	0.0878	0.0574	0.0838	0.9879	0.9758
		w/o OVMD	0.2456	0.1805	0.2640	0.9045	0.8108
	12 h	OVMD-TMFG-TCN-BiGRU	0.1356	0.0901	0.1270	0.9722	0.9424
		w/o OVMD	0.3568	0.2420	0.3143	0.7791	0.6009
	24 h	OVMD-TMFG-TCN-BiGRU	0.2426	0.1538	0.1877	0.9146	0.8155
		w/o OVMD	0.4656	0.3172	0.4133	0.5862	0.3206
	48 h	OVMD-TMFG-TCN-BiGRU	0.3479	0.2462	0.3309	0.7956	0.6208
		w/o OVMD	0.5394	0.3965	0.5784	0.3304	0.0881
B	1 h	OVMD-TMFG-TCN-BiGRU	0.0376	0.0257	0.0301	0.9985	0.9967
		w/o OVMD	0.0976	0.0709	0.0915	0.9907	0.9778
	6 h	OVMD-TMFG-TCN-BiGRU	0.1166	0.0822	0.0934	0.9851	0.9684
		w/o OVMD	0.2895	0.2043	0.2672	0.9026	0.8050
	12 h	OVMD-TMFG-TCN-BiGRU	0.1651	0.1212	0.1454	0.9714	0.9366
		w/o OVMD	0.4311	0.3109	0.4232	0.7635	0.5678
	24 h	OVMD-TMFG-TCN-BiGRU	0.2216	0.1573	0.2027	0.9446	0.8858
		w/o OVMD	0.5226	0.3731	0.5176	0.6110	0.3651
	48 h	OVMD-TMFG-TCN-BiGRU	0.3379	0.2458	0.3332	0.8583	0.7348
		w/o OVMD	0.6271	0.4479	0.6144	0.2958	0.0864
C	1 h	OVMD-TMFG-TCN-BiGRU	0.0428	0.0302	0.0322	0.9985	0.9964
		w/o OVMD	0.1092	0.0723	0.0640	0.9887	0.9765
	6 h	OVMD-TMFG-TCN-BiGRU	0.0958	0.0647	0.0640	0.9918	0.9819
		w/o OVMD	0.2921	0.1797	0.1491	0.9217	0.8317
	12 h	OVMD-TMFG-TCN-BiGRU	0.1352	0.0849	0.0770	0.9818	0.9640
		w/o OVMD	0.4010	0.2460	0.2154	0.8363	0.6829
	24 h	OVMD-TMFG-TCN-BiGRU	0.2089	0.1384	0.1273	0.9566	0.9139
		w/o OVMD	0.5337	0.3579	0.3517	0.6806	0.4385
	48 h	OVMD-TMFG-TCN-BiGRU	0.3759	0.2469	0.2345	0.8476	0.7174
		w/o OVMD	0.6513	0.4353	0.3997	0.4621	0.1645
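
The ablation rows in Table 10 are easier to compare as relative RMSE reductions. The sketch below computes, for Station A, the percentage by which the full model lowers RMSE compared with the variant without OVMD. The RMSE pairs are copied from Table 10; the helper name `rmse_reduction` is illustrative and the percentage is a derived quantity, not a figure reported in the table.

```python
# (RMSE of full model, RMSE without OVMD) for Station A, copied from Table 10.
table10_station_a = {
    "1 h": (0.0343, 0.0917),
    "6 h": (0.0878, 0.2456),
    "12 h": (0.1356, 0.3568),
    "24 h": (0.2426, 0.4656),
    "48 h": (0.3479, 0.5394),
}


def rmse_reduction(full_model: float, ablated: float) -> float:
    """Percentage RMSE reduction of the full model relative to the ablated one."""
    return 100.0 * (ablated - full_model) / ablated


reductions = {
    horizon: rmse_reduction(full, ablated)
    for horizon, (full, ablated) in table10_station_a.items()
}
for horizon, value in reductions.items():
    print(f"{horizon}: {value:.1f}% lower RMSE with OVMD")
```

For Station A the reduction stays above 60% up to the 12 h horizon and falls to roughly 48% at 24 h and 36% at 48 h, so the benefit of decomposition remains substantial but narrows as the horizon lengthens.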

Table 11. Ablation results of TMFG.

Station	Forecast Horizon	Model	RMSE	MAE	MAPE	R	NSEC
A	1 h	OVMD-TMFG-TCN-BiGRU	0.0343	0.0228	0.0324	0.9982	0.9963
		w/o TMFG	0.0581	0.0400	0.0561	0.9948	0.9894
	6 h	OVMD-TMFG-TCN-BiGRU	0.0878	0.0574	0.0838	0.9879	0.9758
		w/o TMFG	0.1011	0.0702	0.0985	0.9868	0.9679
	12 h	OVMD-TMFG-TCN-BiGRU	0.1356	0.0901	0.1270	0.9722	0.9424
		w/o TMFG	0.1968	0.1374	0.1766	0.9449	0.8785
	24 h	OVMD-TMFG-TCN-BiGRU	0.2426	0.1538	0.1877	0.9146	0.8155
		w/o TMFG	0.2574	0.1637	0.2021	0.9003	0.7924
	48 h	OVMD-TMFG-TCN-BiGRU	0.3479	0.2462	0.3309	0.7956	0.6208
		w/o TMFG	0.3679	0.2627	0.3729	0.7675	0.5758
B	1 h	OVMD-TMFG-TCN-BiGRU	0.0376	0.0257	0.0301	0.9985	0.9967
		w/o TMFG	0.0453	0.0297	0.0333	0.9976	0.9952
	6 h	OVMD-TMFG-TCN-BiGRU	0.1166	0.0822	0.0934	0.9851	0.9684
		w/o TMFG	0.1358	0.0931	0.1094	0.9799	0.9571
	12 h	OVMD-TMFG-TCN-BiGRU	0.1651	0.1212	0.1454	0.9714	0.9366
		w/o TMFG	0.1848	0.1304	0.1628	0.9617	0.9206
	24 h	OVMD-TMFG-TCN-BiGRU	0.2216	0.1573	0.2027	0.9446	0.8858
		w/o TMFG	0.2498	0.1775	0.2303	0.9324	0.8550
	48 h	OVMD-TMFG-TCN-BiGRU	0.3379	0.2458	0.3332	0.8583	0.7348
		w/o TMFG	0.3902	0.2797	0.3427	0.8096	0.6464
C	1 h	OVMD-TMFG-TCN-BiGRU	0.0428	0.0302	0.0322	0.9985	0.9964
		w/o TMFG	0.0516	0.0322	0.0297	0.9978	0.9947
	6 h	OVMD-TMFG-TCN-BiGRU	0.0958	0.0647	0.0640	0.9918	0.9819
		w/o TMFG	0.1043	0.0647	0.0583	0.9896	0.9785
	12 h	OVMD-TMFG-TCN-BiGRU	0.1352	0.0849	0.0770	0.9818	0.9640
		w/o TMFG	0.1415	0.0896	0.0823	0.9801	0.9605
	24 h	OVMD-TMFG-TCN-BiGRU	0.2089	0.1384	0.1273	0.9566	0.9139
		w/o TMFG	0.2536	0.1693	0.1649	0.9365	0.8732
	48 h	OVMD-TMFG-TCN-BiGRU	0.3759	0.2469	0.2345	0.8476	0.7174
		w/o TMFG	0.4310	0.2684	0.2645	0.8115	0.6284

Table 12. Ablation results of TCN and BiGRU.

Station	Forecast Horizon	Model	RMSE	MAE	MAPE	R	NSEC
A	1 h	OVMD-TMFG-TCN-BiGRU	0.0343	0.0228	0.0324	0.9982	0.9963
		w/o TCN	0.0451	0.0305	0.0405	0.9969	0.9936
		w/o BiGRU	0.0440	0.0298	0.0392	0.9970	0.9939
	6 h	OVMD-TMFG-TCN-BiGRU	0.0878	0.0574	0.0838	0.9879	0.9758
		w/o TCN	0.0967	0.0640	0.0905	0.9853	0.9706
		w/o BiGRU	0.0978	0.0674	0.0951	0.9876	0.9700
	12 h	OVMD-TMFG-TCN-BiGRU	0.1356	0.0901	0.1270	0.9722	0.9424
		w/o TCN	0.1597	0.1118	0.1658	0.9594	0.9200
		w/o BiGRU	0.1735	0.1248	0.1679	0.9594	0.9056
	24 h	OVMD-TMFG-TCN-BiGRU	0.2426	0.1538	0.1877	0.9146	0.8155
		w/o TCN	0.2666	0.1741	0.2122	0.8932	0.7773
		w/o BiGRU	0.2524	0.1616	0.2022	0.9034	0.8004
	48 h	OVMD-TMFG-TCN-BiGRU	0.3479	0.2462	0.3309	0.7956	0.6208
		w/o TCN	0.3838	0.2588	0.3159	0.7566	0.5385
		w/o BiGRU	0.3793	0.2765	0.4111	0.7445	0.5492
B	1 h	OVMD-TMFG-TCN-BiGRU	0.0376	0.0257	0.0301	0.9985	0.9967
		w/o TCN	0.0508	0.0331	0.0365	0.9970	0.9940
		w/o BiGRU	0.0398	0.0290	0.0389	0.9985	0.9963
	6 h	OVMD-TMFG-TCN-BiGRU	0.1166	0.0822	0.0934	0.9851	0.9684
		w/o TCN	0.1288	0.0878	0.1017	0.9825	0.9614
		w/o BiGRU	0.1405	0.0937	0.1048	0.9780	0.9541
	12 h	OVMD-TMFG-TCN-BiGRU	0.1651	0.1212	0.1454	0.9714	0.9366
		w/o TCN	0.1906	0.1335	0.1532	0.9591	0.9155
		w/o BiGRU	0.1907	0.1316	0.1562	0.9571	0.9154
	24 h	OVMD-TMFG-TCN-BiGRU	0.2216	0.1573	0.2027	0.9446	0.8858
		w/o TCN	0.2609	0.1889	0.2217	0.9267	0.8418
		w/o BiGRU	0.2650	0.1950	0.2334	0.9252	0.8368
	48 h	OVMD-TMFG-TCN-BiGRU	0.3379	0.2458	0.3332	0.8583	0.7348
		w/o TCN	0.3615	0.2644	0.3099	0.8630	0.6964
		w/o BiGRU	0.3853	0.2662	0.3277	0.8131	0.6551
C	1 h	OVMD-TMFG-TCN-BiGRU	0.0428	0.0302	0.0322	0.9985	0.9964
		w/o TCN	0.0469	0.0359	0.0418	0.9985	0.9957
		w/o BiGRU	0.0542	0.0346	0.0303	0.9971	0.9942
	6 h	OVMD-TMFG-TCN-BiGRU	0.0958	0.0647	0.0640	0.9918	0.9819
		w/o TCN	0.1013	0.0715	0.0756	0.9912	0.9798
		w/o BiGRU	0.1246	0.0822	0.0734	0.9854	0.9694
	12 h	OVMD-TMFG-TCN-BiGRU	0.1352	0.0849	0.0770	0.9818	0.9640
		w/o TCN	0.1433	0.0932	0.0868	0.9800	0.9595
		w/o BiGRU	0.1481	0.0946	0.0858	0.9784	0.9567
	24 h	OVMD-TMFG-TCN-BiGRU	0.2089	0.1384	0.1273	0.9566	0.9139
		w/o TCN	0.2235	0.1435	0.1319	0.9520	0.9015
		w/o BiGRU	0.2869	0.1943	0.1938	0.9184	0.8377
	48 h	OVMD-TMFG-TCN-BiGRU	0.3759	0.2469	0.2345	0.8476	0.7174
		w/o TCN	0.4177	0.2655	0.2616	0.8195	0.6511
		w/o BiGRU	0.4093	0.2670	0.2367	0.8284	0.6649

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Z.; Shi, G.; Lv, M.; Wu, T.; Wang, X. Multi-Horizon Significant Wave Height Forecasting with Multiscale Decomposition and Topological Feature Selection. J. Mar. Sci. Eng. 2026, 14, 1095. https://doi.org/10.3390/jmse14121095

