Article

Multi-Horizon Significant Wave Height Forecasting with Multiscale Decomposition and Topological Feature Selection

1
Navigation College, Dalian Maritime University, Dalian 116026, China
2
Key Laboratory of Navigation Safety Guarantee of Liaoning Province, Navigation College, Dalian Maritime University, Dalian 116026, China
*
Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2026, 14(12), 1095; https://doi.org/10.3390/jmse14121095 (registering DOI)
Submission received: 14 May 2026 / Revised: 6 June 2026 / Accepted: 10 June 2026 / Published: 13 June 2026
(This article belongs to the Section Ocean Engineering)

Abstract

Accurate multi-horizon Significant Wave Height (SWH) forecasting is vital for offshore safety and efficiency. Beyond scheduling maintenance windows, reliable lead-time predictions provide critical early warnings to protect personnel and high-value assets from hazardous high-wave conditions. However, the non-stationary and multi-scale nature of sea states poses challenges for consistent long-term accuracy. To address this challenge, we propose a robust three-stage framework for decomposition, feature selection, and multi-horizon forecasting. Specifically, Optimal Variational Mode Decomposition (OVMD) is adopted to construct multiscale and multi-view representations of nonlinear SWH sequences, while a Triangulated Maximally Filtered Graph (TMFG) constructs a sparse dependency network to select informative and non-redundant predictors from decomposed components and environmental variables. A hybrid prediction model then combines a Temporal Convolutional Network (TCN) for local multi-scale patterns with a Bidirectional Gated Recurrent Unit (BiGRU) for long-range dependencies. Experiments on real-world buoy observations show that the proposed approach improves accuracy and robustness over commonly used statistical and deep-learning baselines across short-, medium-, and long-term horizons. Ablation studies confirm that integrating modal decomposition with sparse feature selection enhances model robustness, offering reliable decision support for offshore window planning and high-wave condition monitoring.

1. Introduction

As the global energy sector accelerates its transition toward sustainable development, the ocean has become a strategic frontier for renewable energy exploitation [1]. The proportion of renewable energy in global electricity generation is projected to increase from 30% in 2023 to 35% by 2025 [2]. According to the European Commission’s updated strategy [3], the European Union aims to deploy at least 1 GW of ocean energy by 2030 and reach 40 GW by 2050. Among various forms of marine energy, offshore wind power technology and markets are relatively mature, whereas wave energy is considered a highly promising emerging field due to its higher energy density and predictability [4,5]. The safe and cost-effective operation of marine energy facilities relies heavily on accurate wave forecasting. In particular, installation and maintenance activities are constrained by sea-state windows, where decisions such as go/no-go scheduling require reliable predictions at different lead times. Since ocean waves represent a dominant component of environmental loading, developing wave-forecasting technologies with high accuracy and robustness is essential for reducing operational risk and downtime, and for improving overall energy utilization and project economics [6,7,8,9].
Wave forecasting has traditionally depended on physics-based numerical models. Third-generation spectral models, including WAM [10], WaveWatch III [11], and SWAN [12], represent the standard physical forecasting systems for both open-ocean and nearshore environments [13,14]. Although coupled models are capable of reproducing wave evolution during extreme events such as hurricanes [15,16], these physics-based approaches exhibit notable limitations. Specifically, their substantial computational requirements and dependence on high-precision wind-field inputs constrain their effectiveness for rapid-response or high-resolution site-specific forecasting.
In contrast, data-driven methods learn patterns directly from historical observations [17]. Early statistical models, such as AR [18] and ARIMA [19], improved computational efficiency but were limited by assumptions of stationarity and linearity. To capture the inherent nonlinearity of ocean waves, traditional machine learning techniques were introduced. Approaches utilizing Support Vector Machines (SVM) [20], Artificial Neural Networks (ANN) [21], and ensemble learning [22] have demonstrated superior accuracy compared to statistical baselines. However, these traditional ML methods typically require complex manual feature engineering and often struggle to capture long-term temporal dependencies in complex wave sequences.
In recent years, a new generation of artificial intelligence technologies, represented by deep learning, has brought about a paradigm shift in wave forecasting through powerful automatic feature extraction and end-to-end learning capabilities. Fan et al. [23] utilized Long Short-Term Memory (LSTM) networks to predict significant wave heights across various marine environments, demonstrating that LSTM can achieve effective results under differing ocean conditions. Lou et al. [24] designed two wave height prediction models based on LSTM tailored for open sea and nearshore navigation conditions, both of which yielded satisfactory results. Li et al. [25] employed a Gated Recurrent Unit (GRU) network for 1-h and 3-h wave forecasts, with experiments showing that its performance surpassed benchmark models such as LSTM.
However, single models often struggle to capture the complex, multi-scale patterns inherent in wave data [26]. To address this, hybrid architectures integrating complementary deep learning techniques have become the state of the art. Zhang et al. [27] proposed a CNN-LSTM model that demonstrated superior long-term robustness compared to SVM and standalone LSTM baselines. Wang et al. [28] integrated LSTM and GRU with Kernel Density Estimation (KDE), effectively outperforming single models in both point and interval forecasting. Ahmed et al. [29] developed a CLSTM-BiGRU system, which was validated to exceed benchmark performance across multiple wave energy sites. Similar hybrid deep learning frameworks have also proven effective in diverse time-series domains, ranging from tidal [30] and wind power [31] to cryptocurrency forecasting [32]. In this context, integrating temporal convolutional networks (TCN) and Bidirectional GRU (BiGRU) offers a particularly promising solution. TCN excels at capturing local, multi-scale features via dilated convolutions [33], while BiGRU effectively models global dependencies by processing information bidirectionally [34]. This complementarity allows the hybrid model to capture both rapid fluctuations and long-term trends, yielding consistently strong performance from short- to long-horizon forecasts, and thus making it well suited for wave prediction.
Wave sequences, influenced by a combination of diverse forcing factors such as wind fields and tropical cyclones, exhibit inherent non-stationarity characterized by multi-scale quasi-periodic signals and sporadic extreme peaks [35,36]. This complexity impedes models from effectively distinguishing meaningful wave components from noise, which restricts the accuracy of medium- to long-term forecasting [37]. Signal decomposition has therefore evolved from a simple preprocessing step to a critical strategy for multiscale feature extraction and component decoupling. Zhou et al. [38] combined Empirical Mode Decomposition (EMD) with LSTM, achieving improved accuracy compared to standalone models in the Atlantic. Lou et al. [39] proposed an EMD-TCN framework and validated its effectiveness across eight buoy stations. Song et al. [40] constructed an EEMD-LSTM model tailored for deep-ocean environments, verifying its superiority over comparative models across 1- to 18-h forecast windows. Wang et al. [41] used the Improved Empirical Wavelet Transform (IEWT) to enhance LSTM performance across various horizons. More recently, Chen et al. [42] introduced a VMD-LSTM-TCN model, demonstrating that Variational Mode Decomposition (VMD) handles non-stationarity more effectively than EMD-based methods. Indeed, VMD is increasingly favored for its superior stability in diverse fields, such as wind speed [43], financial [44], power [45] and traffic forecasting [46]. However, the standard VMD algorithm relies on a manually preset number of modes K, which significantly affects decomposition quality. To reduce the sensitivity of VMD to a manually preset mode number, this study adopts an Optimal VMD (OVMD) strategy, in which the number of decomposed modes is determined on the training set according to reconstruction error and center-frequency separation. This strategy enables the construction of multiscale representations without using information from the testing set.
While incorporating multidimensional environmental variables enhances physical interpretability, it frequently results in the curse of dimensionality, which causes computational redundancy and reduced performance [47,48]. Consequently, feature selection is critical for optimizing model performance. Lu et al. [49] enhanced prediction efficiency by utilizing Pearson correlation to filter out weakly correlated variables. Li et al. [50] proposed a MIC-LSTM framework, demonstrating that the Maximal Information Coefficient (MIC) captures nonlinear dependencies more effectively than linear metrics. Similarly, Zhou et al. [51] optimized model inputs using Mutual Information and Spearman correlation. However, these traditional methods generally assess feature relevance independently or in pairs, and may therefore provide limited insight into the global dependency structure among input variables. To overcome this limitation, the Triangulated Maximally Filtered Graph (TMFG) is introduced. This graph-theoretic approach filters information by constructing a global dependency network. To the best of our knowledge, TMFG has not yet been applied to wave forecasting.
Despite recent progress, three limitations remain in decomposition-based SWH forecasting. First, most VMD/EMD-based models directly feed all decomposed components into temporal predictors, which may introduce redundant information and obscure the dependency structure among multiscale wave components and environmental variables. Second, commonly used feature selection methods mainly rely on pairwise relevance or linear projection, and therefore may fail to preserve global dependency relationships among predictors. Third, multi-horizon SWH forecasting requires simultaneous modeling of local short-term fluctuations and longer temporal dependencies, yet single temporal models often suffer from performance degradation as the forecast horizon increases. To address these issues, this study develops a framework integrating multiscale decomposition, feature selection, and forecasting. OVMD is adopted to construct multiscale representations of nonlinear SWH sequences from multiple frequency views, TMFG is introduced to select informative predictors by exploiting topological relationships among decomposed SWH components and environmental variables, and a TCN-BiGRU predictor is designed to capture local fluctuations and long-range dependencies within historical input windows. The main contributions of this study are summarized as follows:
  • OVMD decomposes SWH into intrinsic modes to capture both macroscopic trends and microscopic details, thereby constructing multi-view features.
  • A TMFG-based topological feature selection strategy is introduced to identify informative and non-redundant predictors while preserving global dependency structures among candidate variables.
  • A cascaded TCN-BiGRU predictor is designed to model local temporal fluctuations and contextual dependencies within the selected feature sequence, improving forecasting robustness across short-, medium-, and long-horizon SWH prediction.
This study evaluates the overall performance of the proposed model on multiple real-world buoy station datasets and compares it with mainstream benchmark models to verify its advantages in prediction accuracy, generalization ability, and robustness. The remainder of this paper is organized as follows: Section 2 elaborates on the architectural design and specific implementation process of the proposed OVMD-TMFG-TCN-BiGRU model. Section 3 introduces the experimental design, including the datasets used and evaluation metrics. Section 4 presents and discusses the experimental results. Section 5 concludes the paper and outlines future research directions. For clarity and readability, the main abbreviations and acronyms used throughout this paper are summarized in Table 1.

2. Methods

2.1. General Framework

The overall architecture of the proposed hybrid forecasting model is systematically illustrated in Figure 1. As shown in the flowchart, the framework operates through a hierarchical pipeline comprising Input, Feature Engineering, Prediction, and Evaluation stages. The specific workflow is executed as follows:
  • Input & Decomposition: The process begins with historical SWH observations and environmental feature sequences within a fixed input window. In the Feature Engineering block, OVMD is applied to the available historical SWH segment to construct multiscale representations. This step is crucial for multi-view feature construction. It isolates multiscale Intrinsic Mode Functions (IMFs) from the raw signal, thereby constructing a comprehensive feature space that encompasses both macroscopic trends and microscopic frequency details.
  • Topological Feature Selection: After decomposition, the generated Intrinsic Mode Functions (IMFs) are integrated with environmental feature sequences. The TMFG algorithm is then employed during the Feature Engineering stage to address the high-dimensional feature space. This algorithm constructs a sparse dependency network and retains predictors directly connected to the target SWH node for the downstream model.
  • Cascaded Prediction: The topologically selected features are subsequently input into the Prediction block, which utilizes a cascaded TCN-BiGRU architecture. In this sequential design, the TCN layer first serves as a local feature extractor to capture high-frequency variations, which are then fed into the BiGRU layer to model long-term global dependencies. This hierarchical approach enables the model to learn progressively from local details to macroscopic trends.
  • Output & Evaluation: Finally, the model generates the predicted SWH at the specified forecasting horizon. In this study, the forecasting task is conducted separately for each lead time, including 1 h, 6 h, 12 h, 24 h, and 48 h. Therefore, for each buoy and each forecasting horizon, the model produces one predicted SWH sequence by sliding the input window through the testing period. The predicted sequence is then compared with the corresponding observed SWH sequence at the same lead time using RMSE, MAE, MAPE, R, and NSEC.
Figure 1. Flowchart of the OVMD-TMFG-TCN-BiGRU prediction model.
The proposed framework integrates decomposition, topological feature selection, and temporal prediction into a cohesive pipeline. In this architecture, OVMD reduces the non-stationarity of raw data, TMFG minimizes the computational burden arising from high-dimensional redundancy, and the cascaded TCN-BiGRU structure ensures deep extraction of temporal features. This synergistic design enables the model to maintain robust performance and generalization ability across forecasting horizons ranging from short- to long-term.
To further clarify the decomposition, feature selection, and forecasting workflow, the process of the proposed OVMD-TMFG-TCN-BiGRU framework is summarized in Algorithm 1. The OVMD mode number and the TMFG-selected feature subset are determined using only the training set and then fixed for validation and testing.
Algorithm 1 The Process Flow of OVMD-TMFG-TCN-BiGRU
Require: SWH sequence S, environmental feature sequences Z, training set D_train, validation set D_val, testing set D_test, candidate mode numbers K, forecasting horizons H = {1 h, 6 h, 12 h, 24 h, 48 h}
Ensure: Predicted SWH sequences Ŷ_h
// OVMD decomposition
1: K* = Select optimal mode number from K using D_train
2: U = OVMD(S, K*) using the fixed K*
3: X_cand = Concatenate(U, S, Z)
// TMFG-based feature selection
4: W = DependenceMatrix(X_cand from D_train)
5: G_TMFG = TMFG(W)
6: F* = Select predictors directly connected to the target SWH node
7: X_selected = SelectFeatures(X_cand, F*)
8: Fix K* and F* for validation and testing
// TCN-BiGRU forecasting
9: for each h in H do
10:  X_{train,h} = ConstructSamples(X_selected from D_train, h)
11:  M_h = Train TCN-BiGRU(X_{train,h})
12:  X_{test,h} = ConstructSamples(X_selected from D_test, h)
13:  Ŷ_h = M_h(X_{test,h})
14: end for
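The sample-construction step of Algorithm 1 can be illustrated with a minimal sketch. It assumes hourly sampling, so that a horizon of h hours equals h time steps, and it pairs each input window with the SWH value h steps after the window ends. The function name `construct_samples` and the `window` argument are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def construct_samples(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    window: int,
    horizon: int,
) -> Tuple[List[List[Sequence[float]]], List[float]]:
    """Pair each input window with the target value `horizon` steps ahead.

    features[t] is the selected feature vector at time t and target[t] is
    the SWH at time t. A window ending at time t covers t-window+1 .. t,
    and its label is target[t + horizon] (assumed convention).
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")

    inputs: List[List[Sequence[float]]] = []
    labels: List[float] = []
    # The last usable prediction origin is the one whose label still exists.
    for t in range(window - 1, len(target) - horizon):
        inputs.append(list(features[t - window + 1 : t + 1]))
        labels.append(target[t + horizon])
    return inputs, labels
```

Because each label lies strictly after its window, calling this function separately on the training and testing segments keeps the two sets of samples disjoint in time.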

2.2. Optimal Variational Mode Decomposition (OVMD)

VMD is a method designed to decompose complex, nonlinear, and non-stationary signals into a series of Intrinsic Mode Functions (IMFs) [52]. Each mode oscillates around an adaptive central frequency, which facilitates tasks such as feature extraction, denoising, and prediction. The VMD decomposition can be formulated as the following constrained variational model:
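$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = f \tag{1}$$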
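where $f$ denotes the original signal, $u_k$ represents the $k$-th mode function, and $\omega_k$ is its corresponding center frequency. To convert this constrained problem into an unconstrained one, a quadratic penalty term and a Lagrangian multiplier are introduced, and the augmented Lagrangian is expressed as follows: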
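$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$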
where $\alpha$ represents the quadratic penalty parameter, while $\lambda$ denotes the Lagrangian multiplier. Subsequently, to solve the aforementioned equation, the Alternating Direction Method of Multipliers (ADMM) is employed to obtain the mode components $u_k$ and center frequencies $\omega_k$:
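$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} \tag{3}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \, |\hat{u}_k(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{u}_k(\omega)|^2 \, d\omega} \tag{4}$$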
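The two update rules above can be sketched on a discretized, uniformly spaced non-negative frequency grid, where the integrals reduce to sums. This is only an illustration of a single update step under that assumption, not a full VMD solver; the function names and the list-based spectrum representation are hypothetical.

```python
from typing import List, Sequence


def update_mode(
    f_hat: Sequence[complex],
    other_modes_sum: Sequence[complex],
    lam_hat: Sequence[complex],
    omegas: Sequence[float],
    omega_k: float,
    alpha: float,
) -> List[complex]:
    """Mode-spectrum update: the residual spectrum is filtered around omega_k."""
    return [
        (f_hat[i] - other_modes_sum[i] + lam_hat[i] / 2)
        / (1 + 2 * alpha * (omegas[i] - omega_k) ** 2)
        for i in range(len(omegas))
    ]


def update_center_frequency(
    u_hat_k: Sequence[complex], omegas: Sequence[float]
) -> float:
    """Center-frequency update: power-weighted mean frequency of the mode."""
    power = [abs(u) ** 2 for u in u_hat_k]
    total = sum(power)
    if total == 0:
        raise ValueError("mode spectrum has zero power")
    # On a uniform grid the frequency step cancels between the two sums.
    return sum(w * p for w, p in zip(omegas, power)) / total
```

A larger `alpha` narrows the filter in `update_mode`, so each mode concentrates more tightly around its center frequency.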
It can be observed from Equation (2) that the mode number K has a direct influence on the decomposition performance of VMD. An excessively small K may lead to insufficient decomposition and loss of important multiscale information, whereas an excessively large K may introduce redundant modes and increase the risk of over-decomposition. Therefore, this study adopts an optimal VMD strategy to determine a suitable mode number before constructing the multiscale feature set.
To avoid information leakage, the selection of K was performed only on the training set, while the validation and testing sets were not used in this process. Following the leakage-avoidance principle emphasized in decomposition-based forecasting studies, the decomposition procedure and parameter selection were restricted to the available historical data rather than the entire dataset. Candidate values of K were searched within a predefined range. For each candidate K, the decomposed IMFs were reconstructed to calculate the reconstruction error, and the center-frequency distribution of the decomposed modes was examined. The optimal K was selected as the smallest candidate value for which the reconstruction error no longer decreased substantially and the center frequencies remained stable and distinguishable. The reconstructed signal $\hat{X}(t)$ can be expressed as follows:
$$\hat{X}(t) = \sum_{k=1}^{K} u_k(t), \quad t = 1, \dots, L \tag{5}$$
When the signal is sufficiently decomposed, further increasing K does not lead to significant changes in the reconstruction error, while the center frequencies of the newly added frequency components tend to stabilize [53]. Therefore, careful consideration of both factors allows determination of an optimal K value.
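The selection rule described above can be sketched as follows, assuming that the reconstruction error and the center frequencies have already been computed on the training set for every candidate K. The thresholds `err_tol` and `min_sep` are hypothetical values chosen for illustration, and only the "distinguishable" part of the center-frequency criterion is checked here; the paper does not specify numerical thresholds in this section.

```python
from typing import Dict, Sequence


def select_mode_number(
    recon_error: Dict[int, float],
    center_freqs: Dict[int, Sequence[float]],
    err_tol: float = 0.05,   # assumed: relative error drop regarded as negligible
    min_sep: float = 0.01,   # assumed: minimum gap between center frequencies
) -> int:
    """Return the smallest K after which the error stops decreasing markedly
    and whose center frequencies are all separated by at least min_sep."""
    candidates = sorted(recon_error)
    if not candidates:
        raise ValueError("no candidate mode numbers supplied")
    for k, k_next in zip(candidates, candidates[1:]):
        err = recon_error[k]
        # Relative improvement obtained by moving to the next candidate.
        gain = 0.0 if err == 0 else (err - recon_error[k_next]) / err
        freqs = sorted(center_freqs[k])
        separated = all(b - a >= min_sep for a, b in zip(freqs, freqs[1:]))
        if gain < err_tol and separated:
            return k
    # No candidate met both criteria, so fall back to the largest one.
    return candidates[-1]
```

Both inputs are meant to come from the training segment only, which keeps the chosen K independent of the validation and testing data.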

2.3. Triangulated Maximally Filtered Graph (TMFG)

The triangulated maximally filtered graph (TMFG) is a graph-based information filtering method that constructs a sparse dependency structure from high-dimensional feature relationships [54]. In this study, TMFG is used to identify direct dependency relationships between candidate predictors and the target SWH node. Compared with conventional feature selection methods based mainly on pairwise relevance, TMFG provides a sparse network representation that retains structurally important relationships among decomposed SWH components and environmental variables.
The construction process of TMFG follows an iterative greedy procedure. Initially, a dependency matrix is calculated among all candidate features, where each node represents a candidate predictor and each edge weight denotes the dependency strength between two variables. The initialization step involves selecting a clique with four nodes that has the highest total edge weight as the seed structure of the graph. Subsequently, at each iteration, the algorithm uses a gain function to evaluate the insertion of each remaining node into each existing triangular face. The node and triangular face with the maximum gain are selected. The selected node is then inserted into the current graph and connected to the three vertices of the selected triangle, as illustrated in Figure 2. This process continues until all candidate nodes have been embedded into the sparse triangulated graph. Specifically, the gain function can be expressed as follows:
$$S(v_h, t) = W(v_h, v_a) + W(v_h, v_b) + W(v_h, v_c) \tag{6}$$
where $W(v_h, v_a)$ denotes the edge weight between nodes $v_h$ and $v_a$, and $v_a$, $v_b$, and $v_c$ are the three vertices of triangle $t$. Furthermore, to ensure that the algorithm selects the node with the maximum gain for insertion during each iteration, thereby optimizing the graph structure, the TMFG algorithm maintains a cache of the maximum gain value and the corresponding optimal node for each triangle. The maximum gain value, MaxGain, is expressed as:
$$\mathrm{MaxGain} = \left[ \max_{v \in \{v_1, \dots, v_k\}} S(v, t_1),\ \max_{v \in \{v_1, \dots, v_k\}} S(v, t_2),\ \dots,\ \max_{v \in \{v_1, \dots, v_k\}} S(v, t_m) \right] \tag{7}$$
where $\{v_1, v_2, \dots, v_k\}$ denotes the set of remaining uninserted nodes, and $\{t_1, t_2, \dots, t_m\}$ represents the set of all current triangles. The index corresponding to the maximum gain is given by:
$$\mathrm{BestVertex} = \left[ \arg\max_{v \in \{v_1, \dots, v_k\}} S(v, t_1),\ \arg\max_{v \in \{v_1, \dots, v_k\}} S(v, t_2),\ \dots,\ \arg\max_{v \in \{v_1, \dots, v_k\}} S(v, t_m) \right] \tag{8}$$
After the TMFG sparse graph is constructed, feature selection is performed according to the direct connectivity between each candidate predictor and the target SWH node. In the TMFG network, non-zero elements in the sparse inverse covariance matrix indicate direct conditional dependencies between variables, whereas zero elements indicate conditional independence. Therefore, the candidate predictors directly connected to the target SWH node are retained as the final input subset for the forecasting model. In this way, TMFG is not used merely as a visualization tool. Instead, it serves as a graph-based feature selector that preserves direct dependency relationships while removing redundant or weakly connected variables.
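The greedy construction and the neighbor-based selection rule can be sketched in a few lines. This is a minimal, brute-force illustration: it assumes a symmetric dependency matrix `W` is already available (for example, absolute correlations computed on the training set), it searches all four-node cliques for the seed, and it recomputes gains at every step instead of maintaining the MaxGain cache. The function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple


def tmfg_edges(W: Sequence[Sequence[float]]) -> Set[FrozenSet[int]]:
    """Build a TMFG-style sparse graph and return its undirected edges."""
    n = len(W)
    if n < 4:
        raise ValueError("at least four nodes are required")

    def clique_weight(nodes: Tuple[int, ...]) -> float:
        return sum(W[a][b] for a, b in combinations(nodes, 2))

    # Seed: the four-node clique with the highest total edge weight.
    seed = max(combinations(range(n), 4), key=clique_weight)
    edges = {frozenset(pair) for pair in combinations(seed, 2)}
    faces: List[Tuple[int, int, int]] = list(combinations(seed, 3))
    remaining = set(range(n)) - set(seed)

    while remaining:
        # Gain of inserting node v into face (a, b, c).
        _, v, face = max(
            (
                (W[v][a] + W[v][b] + W[v][c], v, (a, b, c))
                for v in sorted(remaining)
                for (a, b, c) in faces
            ),
            key=lambda item: item[0],
        )
        a, b, c = face
        remaining.remove(v)
        faces.remove(face)
        # The chosen face is split into three new triangular faces.
        faces.extend([(a, b, v), (a, c, v), (b, c, v)])
        edges.update(frozenset((v, x)) for x in face)
    return edges


def select_neighbors(edges: Set[FrozenSet[int]], target: int) -> List[int]:
    """Return the nodes directly connected to the target node."""
    return sorted(next(iter(e - {target})) for e in edges if target in e)
```

For n candidate nodes the resulting graph has 3(n − 2) edges, so a predictor that is strongly tied only to other predictors, and not to the target node, is left out of the selected subset.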
Based on the above selection rule, the novelty of using TMFG in this study lies in its graph-based feature-selection criterion. Conventional methods such as Pearson correlation, Mutual Information (MI), and Maximal Information Coefficient (MIC) generally rank candidate variables according to their individual relevance to the target SWH. Although MI and MIC can capture nonlinear dependence, these methods still mainly operate from a pairwise relevance perspective and do not explicitly model the dependency structure among candidate predictors. This is important for OVMD-based forecasting, because decomposed IMF components and environmental variables may contain redundant or overlapping information. In contrast, TMFG constructs a sparse dependency graph involving all candidate predictors and the target SWH node, and feature selection is performed according to direct topological connectivity with the target node. Thus, TMFG aims to preserve structurally informative predictors while removing redundant or weakly connected variables.
To avoid information leakage, the TMFG-based feature selection rule is fitted only on the training set, and the selected feature subset is then fixed for validation and testing.

2.4. Cascaded Temporal Forecasting Architecture

To address the temporal complexity inherent in wave height prediction, a hybrid deep learning model is constructed. This model integrates the strengths of the TCN and the BiGRU [55,56]. The TCN is effective at aggregating local short-term fluctuations, while the BiGRU captures long-term dependencies. By combining these two architectures, the approach seeks to leverage their complementary advantages to improve overall prediction accuracy. The hybrid model is trained using an end-to-end joint training paradigm, with the Adam optimizer employed for iterative parameter updates. Figure 3 presents the overall architecture of the model, which comprises a six-layer network structure. The specific functions and parameter settings for each layer are detailed as follows:
  • The first TCN layer: The first layer initially receives the pre-processed input and is constructed from TCN modules, the residual block structure of which is illustrated in Figure 3a. It operates by using local receptive fields to extract short- and mid-range temporal patterns from the input sequence. The dilated architecture increases the receptive field without altering the sequence length. This parallel and stable feature extractor addresses the long-term dependency problem commonly encountered in RNN-based models. In each residual block, the convolution kernel size is set to 3, with a dilation factor of 1, the number of convolution filters is set to 25, and the ReLU function is adopted as the activation.
  • The second TCN layer: This layer receives time-series feature maps of identical length produced by the first layer. Composed of TCN modules with distinct parameters and functions, it expands the receptive field and synthesizes compound patterns from the preceding features. In the residual block, the kernel size is set to 5, the dilation factor d is set to 2, the number of filters is set to 50, and the activation function is set to ReLU. This parameter configuration enables the second layer to capture intermediate- to long-term dependencies through a larger dilation factor. Stacking these two layers facilitates hierarchical feature extraction across different temporal scales.
  • The first BiGRU layer: The third layer processes high-level temporal features spanning extended time windows, which are produced by the second layer. As shown in Figure 3b, this layer consists of forward and backward GRU components that model the sequence in both temporal directions. The outputs are concatenated along the last dimension to generate the input for the subsequent layer. This design introduces bidirectional context into the convolutional features extracted by the TCN, thereby enhancing the model’s ability to capture long-term dependencies while preserving temporal length. In this configuration, a BiGRU layer with 32 hidden units is employed, utilizing tanh as the state nonlinearity and returning the entire sequence as output.
  • The second BiGRU layer: Building upon the bidirectional context incorporated by the third layer, the fourth layer performs further gated transformations and temporal aggregation to refine more abstract and stable temporal semantics. Its parameters and activation functions are identical to those of the third layer.
  • Feature Concatenation Layer: The fifth layer routes the original input directly to the output of the fourth layer, preserving low-level feature information. Its schematic diagram is illustrated in Figure 3c. The purpose of this design is to provide supplementary information to subsequent layers, preventing the loss of critical low-level features within the deep network while simultaneously facilitating smoother gradient propagation. The output of this layer consists of the projected low-level features, concatenated along the feature dimension with the high-level temporal features from the fourth layer, serving as the input to the subsequent fully connected layer.
  • Dense Layer: The sixth layer serves as the output layer of the neural network, functioning to map the network’s feature representations to the final prediction results. Its structure, as depicted in Figure 3d, is composed of an input layer, a hidden layer, and an output layer, with the number of neurons determined by the input dimensions.
Figure 3. TCN-BiGRU network structure.

2.4.1. Temporal Convolutional Network (TCN)

The TCN is a fully convolutional one-dimensional architecture developed for sequence data, which incorporates the parallel computing capabilities of Convolutional Neural Networks (CNNs) into temporal modeling. In comparison to Recurrent Neural Networks (RNNs) and their variants, such as LSTM and GRU, TCN generally offers greater computational parallelism, more stable gradient propagation, and reduced memory consumption in long-sequence tasks. The core principle of TCN is to establish a sufficiently large receptive field using causal and dilated convolutions to capture long-range dependencies, while maintaining trainability and robustness through residual connections and regularization techniques.
In the temporal dimension, TCN enforces causality: the output $y_t$ at any time step $t$ depends solely on the current and historical inputs $x_0, x_1, \dots, x_t$, independent of future observations. In implementation, causal convolution is employed with appropriate zero-padding in the forward direction to prevent information leakage, thereby ensuring the model satisfies temporal constraints while maintaining an invariant sequence length. To effectively expand the receptive field without significantly increasing parameters and computational load, TCN introduces dilated convolutions with a dilation factor $d$ in each convolutional layer, as shown in Figure 4. For a 1-D sequence $x$ and a filter $f: \{0, \dots, k-1\} \to \mathbb{R}$ of length $k$, the dilated convolution operation $F$ on sequence element $s$ is defined as follows:
$$F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \cdot i} \tag{9}$$
When d = 1, the operation degenerates into a standard convolution. By progressively increasing the dilation factors along the network depth, the TCN’s effective receptive field grows approximately exponentially with the number of layers, enabling coverage of long-term dependencies even at relatively shallow depths.
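The dilated causal convolution defined above can be written out directly for a single channel. The sketch below treats samples before the start of the sequence as zeros, which corresponds to the zero-padding described earlier; the function name is illustrative and the loop form is chosen for readability, not speed.

```python
from typing import List, Sequence


def dilated_causal_conv(x: Sequence[float], f: Sequence[float], d: int) -> List[float]:
    """Compute F(s) = sum_i f[i] * x[s - d*i] for every position s.

    Indices before the start of the sequence are treated as zeros, so the
    output has the same length as the input and never looks at the future.
    """
    if d < 1:
        raise ValueError("dilation factor must be at least 1")
    out: List[float] = []
    for s in range(len(x)):
        acc = 0.0
        for i, weight in enumerate(f):
            j = s - d * i
            if j >= 0:  # skip taps that fall into the zero padding
                acc += weight * x[j]
        out.append(acc)
    return out
```

For example, with `x = [1, 2, 3, 4, 5]` and an all-ones filter of length 3, a dilation of 1 gives `[1, 3, 6, 9, 12]`, while a dilation of 2 gives `[1, 2, 4, 6, 9]`: each output then combines samples that are two steps apart.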
To address vanishing and exploding gradient issues during deep network training and to improve optimization stability, the TCN utilizes residual blocks as its primary structural components. As shown in Figure 3a, each residual block consists of two layers of dilated causal convolutions, each followed by a nonlinear activation function. Regularization methods, including weight normalization and Dropout, are implemented after each convolution to further stabilize training. The block incorporates a residual connection by summing the input with the output, using a 1 × 1 convolution to match channel dimensions when required. This architecture supports deep representation learning and facilitates effective gradient propagation across layers, which enables the network to achieve robust convergence and generalization, even with large receptive fields.
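As a rough worked check, the receptive field of the two TCN layers configured in Section 2.4 can be computed from their kernel sizes and dilation factors. The sketch assumes that each TCN layer is a single residual block containing two dilated causal convolutions, as described above; if the actual implementation stacks blocks differently, the figure changes accordingly.

```python
from typing import Sequence, Tuple


def receptive_field(
    blocks: Sequence[Tuple[int, int]], convs_per_block: int = 2
) -> int:
    """Receptive field, in time steps, of stacked dilated causal convolutions.

    Each block is a (kernel_size, dilation) pair, and every convolution in a
    block extends the receptive field by (kernel_size - 1) * dilation steps.
    """
    rf = 1
    for kernel_size, dilation in blocks:
        rf += convs_per_block * (kernel_size - 1) * dilation
    return rf


# Configuration from Section 2.4: (kernel 3, dilation 1) then (kernel 5, dilation 2).
tcn_config = [(3, 1), (5, 2)]
rf_two_convs = receptive_field(tcn_config)                     # 1 + 4 + 16 = 21
rf_one_conv = receptive_field(tcn_config, convs_per_block=1)   # 1 + 2 + 8 = 11
```

Under these assumptions the second layer contributes most of the temporal coverage, which is consistent with its role of capturing intermediate- to long-term patterns.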

2.4.2. Bidirectional Gated Recurrent Unit (BiGRU)

The BiGRU is a sequence modeling architecture that enables bidirectional information flow by combining a forward GRU and a backward GRU, as shown in the structural diagram in Figure 3b. The BiGRU captures forward and backward dependencies within the available historical input window. The backward GRU processes the same historical window in reverse order and does not access observations beyond the prediction origin. The two directions are complementary, which generally benefits sequence modeling tasks.
The GRU unit serves as the core component of the BiGRU. In comparison to the LSTM, the GRU provides similar modeling capabilities while requiring fewer parameters due to its more efficient gating structure.
The GRU regulates information flow within a sequence through two gating mechanisms: the reset gate and the update gate. This design mitigates the vanishing gradient problem commonly observed in traditional RNNs. Figure 5 presents a schematic diagram of the GRU unit. The reset gate determines the extent to which information from the previous hidden state is discarded, and is calculated as follows:
$$r_t = \sigma\left(W_r [h_{t-1}, x_t] + b_r\right)$$
where $x_t$ denotes the input at the current time step, $h_{t-1}$ represents the hidden state of the previous time step, and $[h_{t-1}, x_t]$ indicates the concatenation of these two vectors. $W_r$ and $b_r$ are the weight matrix and bias term for the reset gate, respectively, and $\sigma$ refers to the Sigmoid activation function, which maps the output values to the interval between 0 and 1. The update gate controls the amount of information from the previous hidden state $h_{t-1}$ that can be directly transmitted to the current hidden state $h_t$. Its calculation formula is given by:
$$z_t = \sigma\left(W_z [h_{t-1}, x_t] + b_z\right)$$
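Under the convention used in the final hidden-state equation below, a value of the update gate closer to 1 places more weight on the new candidate state $\tilde{h}_t$, whereas a value closer to 0 retains a greater proportion of the previous state $h_{t-1}$. By integrating these two gates, the GRU computes the current candidate hidden state $\tilde{h}_t$ and the final hidden state $h_t$: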
$$\tilde{h}_t = \tanh\left(W_h [r_t \odot h_{t-1}, x_t] + b_h\right)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
The hidden state of the BiGRU can be expressed as:
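$$\overrightarrow{h}_t = \mathrm{GRU}\left(x_t, \overrightarrow{h}_{t-1}\right)$$
$$\overleftarrow{h}_t = \mathrm{GRU}\left(x_t, \overleftarrow{h}_{t-1}\right)$$
$$h_t = f\left(W_{\overrightarrow{h}_t} \overrightarrow{h}_t + W_{\overleftarrow{h}_t} \overleftarrow{h}_t + b_t\right)$$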
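where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ denote the forward and backward hidden representations within the available historical input window, respectively. $W_{\overrightarrow{h}_t}$ and $W_{\overleftarrow{h}_t}$ represent the forward and backward weights of the hidden layer, respectively. For the backward GRU, the previous state is the one computed at the preceding step of the reversed processing order.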
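The gate equations and the bidirectional pass can be sketched in NumPy as follows. This is only an illustrative sketch under stated assumptions: the parameter dictionary, the random initialization, the hidden size, and the helper names are hypothetical, and the two directions are combined by simple concatenation rather than by the weighted combination $f(\cdot)$ given above.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, p):
    """One GRU update following the reset, update, and candidate equations."""
    hx = np.concatenate([h_prev, x_t])                       # [h_{t-1}, x_t]
    r = sigmoid(p["W_r"] @ hx + p["b_r"])                    # reset gate
    z = sigmoid(p["W_z"] @ hx + p["b_z"])                    # update gate
    h_cand = np.tanh(p["W_h"] @ np.concatenate([r * h_prev, x_t]) + p["b_h"])
    return (1.0 - z) * h_prev + z * h_cand                   # final hidden state


def init_params(n_in, n_hid, rng):
    """Hypothetical small random initialization, for illustration only."""
    p = {}
    for gate in ("r", "z", "h"):
        p["W_" + gate] = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
        p["b_" + gate] = np.zeros(n_hid)
    return p


def run_gru(window, p, n_hid):
    h = np.zeros(n_hid)
    states = []
    for x_t in window:
        h = gru_step(x_t, h, p)
        states.append(h)
    return np.stack(states)


def bigru(window, p_fwd, p_bwd, n_hid):
    """Forward and backward passes over the same historical window."""
    fwd = run_gru(window, p_fwd, n_hid)
    bwd = run_gru(window[::-1], p_bwd, n_hid)[::-1]          # re-align with time
    return np.concatenate([fwd, bwd], axis=1)                # assumed combination


rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
window = rng.standard_normal((24, n_in))                     # 24 h lookback, 3 features
p_fwd = init_params(n_in, n_hid, rng)
p_bwd = init_params(n_in, n_hid, rng)
states = bigru(window, p_fwd, p_bwd, n_hid)                  # shape (24, 8)
```

Because both passes read only the rows of `window`, the backward direction in this sketch cannot access observations beyond the prediction origin, in line with the description above.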

3. Data Description and Model Evaluation Criteria

3.1. Data Description

Buoys are the primary instruments for collecting wave data, operating by monitoring their motion in water through integrated high-precision sensors that measure parameters such as wave height, period, and direction. The model developed in this study utilizes buoy data from three stations near the Gulf of Mexico—42012, 42036, and 42055—with data sourced from the National Data Buoy Center (https://www.ndbc.noaa.gov). These datasets, labeled A, B, and C, cover the five-year period from 2020 to 2024 and include variables such as significant wave height (SWH), Wind Direction (WDIR), Wind Speed (WSPD), Gust Speed (GST), Water Temperature (WTMP), Average Wave Period (APD), Dominant Wave Period (DPD), Mean Wave Direction (MWD), Air Temperature (ATMP), Atmospheric Pressure (PRES), and Dew Point Temperature (DEWP). The buoy records were resampled at an hourly interval. The station coordinates are reported with a precision of 0.01 degrees. Table 2 presents the statistical characteristics of the data. Figure 6 displays the geographical locations of the stations and the statistical features of selected data from each station. The three stations differ in both geographical location and meteorological conditions: stations A and B are nearshore buoys, while station C is offshore. The significant differences in their characteristic data support the use of these stations for validating the model’s broad applicability and generalization capability.

3.2. Chronological Forecasting Protocol

The datasets were divided chronologically into training, validation, and testing subsets to reflect practical forecasting conditions. Specifically, the records from 2020 to 2022 were used for model training, the records from 2023 were used for validation and hyperparameter tuning, and the records from 2024 were used for final testing. No random shuffling was applied. The normalization parameters were calculated from the training set only and then applied to the validation and testing sets. The mode number of OVMD and the TMFG feature selection rule were determined using only the training set, and the selected settings were then fixed for validation and testing. For each forecast horizon, the input sample was constructed from the historical input window before the prediction origin, and the testing set was used only for final evaluation. The forecasting task was conducted separately for each lead time, including 1 h, 6 h, 12 h, 24 h, and 48 h. For each buoy dataset and each forecasting horizon, the historical input window was moved chronologically through the testing period to generate one predicted SWH sequence. The accuracy at each lead time was then evaluated by comparing this predicted sequence with the corresponding observed SWH sequence at the same forecasting horizon.
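The chronological protocol can be illustrated with the short NumPy sketch below. It is a simplified example rather than the code used in this study: the series is synthetic, min-max scaling is assumed because the normalization type is not specified here, and the helper names are hypothetical. Input windows are also built within each subset separately, which is one possible reading of the protocol.

```python
import numpy as np


def chronological_split(times, values):
    """2020-2022 for training, 2023 for validation, 2024 for testing."""
    years = times.astype("datetime64[Y]").astype(int) + 1970
    return values[years <= 2022], values[years == 2023], values[years == 2024]


def make_samples(series, lookback, horizon):
    """Inputs cover [t - lookback + 1, t]; the target is the value at t + horizon."""
    X, y = [], []
    for t in range(lookback - 1, len(series) - horizon):
        X.append(series[t - lookback + 1 : t + 1])
        y.append(series[t + horizon])
    return np.array(X), np.array(y)


# Synthetic hourly stand-in for one buoy record (not real data).
times = np.arange("2020-01-01", "2025-01-01", dtype="datetime64[h]")
rng = np.random.default_rng(0)
swh = 1.0 + 0.5 * np.abs(rng.standard_normal(times.size))

train, val, test = chronological_split(times, swh)

# Scaling parameters come from the training subset only (assumed min-max scaling).
lo, hi = train.min(), train.max()


def scale(x):
    return (x - lo) / (hi - lo)


# One sample set per lead time; no shuffling is applied anywhere.
samples = {h: make_samples(scale(test), lookback=24, horizon=h)
           for h in (1, 6, 12, 24, 48)}
X_test_6h, y_test_6h = samples[6]
```

Each lead time receives its own sample set, mirroring the separate forecasting task per horizon described above.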

3.3. Evaluation Criteria

To comprehensively evaluate the forecasting performance of the proposed model, five statistical metrics are employed: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Correlation Coefficient (R), and Nash-Sutcliffe Efficiency coefficient (NSEC).
MAPE measures the relative deviation of predictions from observed values. In this study, MAPE is calculated in decimal form to assess the relative error proportion. MAE quantifies the average absolute difference between predicted and observed wave heights, serving as a robust metric for overall accuracy. RMSE, as the square root of the mean squared error, is sensitive to large errors and retains the same unit as the original data, which makes it particularly suitable for capturing the high volatility of wave data. R assesses the strength of the linear relationship between predicted and observed series. Values approaching 1 suggest that the model accurately captures the temporal trends of the waves. Finally, NSEC evaluates the model’s predictive performance relative to the observed variance. An NSEC value closer to 1 indicates higher model efficiency and reliability compared to using the mean of the observed data.
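$$\mathrm{RMSE} = \sqrt{\frac{1}{z} \sum_{j=1}^{z} \left(w_j - \hat{w}_j\right)^2}$$
$$\mathrm{MAE} = \frac{1}{z} \sum_{j=1}^{z} \left| w_j - \hat{w}_j \right|$$
$$\mathrm{MAPE} = \frac{1}{z} \sum_{j=1}^{z} \left| \frac{w_j - \hat{w}_j}{w_j} \right|$$
$$R = \frac{\sum_{j=1}^{z} \left(w_j - w_{\mathrm{avg}}\right)\left(\hat{w}_j - \hat{w}_{\mathrm{avg}}\right)}{\sqrt{\sum_{j=1}^{z} \left(w_j - w_{\mathrm{avg}}\right)^2 \sum_{j=1}^{z} \left(\hat{w}_j - \hat{w}_{\mathrm{avg}}\right)^2}}$$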
$$\mathrm{NSEC} = 1 - \frac{\sum_{j=1}^{z} \left(w_j - \hat{w}_j\right)^2}{\sum_{j=1}^{z} \left(w_j - w_{\mathrm{avg}}\right)^2}$$
where $w_j$ and $w_{\mathrm{avg}}$ refer to the measured values and their mean, $\hat{w}_j$ and $\hat{w}_{\mathrm{avg}}$ represent the predicted values and their mean, and $z$ is the number of samples.
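For reference, the five metrics can be computed with a few lines of NumPy, as in the sketch below. The function name and the toy values are illustrative assumptions. MAPE is returned in decimal form, and the NSEC denominator uses the mean of the observed values, following the standard Nash-Sutcliffe definition and the description given above.

```python
import numpy as np


def swh_metrics(w, w_hat):
    """RMSE, MAE, MAPE (decimal), R, and NSEC for observed w and predicted w_hat."""
    w = np.asarray(w, dtype=float)
    w_hat = np.asarray(w_hat, dtype=float)
    err = w - w_hat
    dev_obs = w - w.mean()
    dev_pred = w_hat - w_hat.mean()
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / w))),       # assumes no zero observations
        "R": float(np.sum(dev_obs * dev_pred)
                   / np.sqrt(np.sum(dev_obs ** 2) * np.sum(dev_pred ** 2))),
        "NSEC": float(1.0 - np.sum(err ** 2) / np.sum(dev_obs ** 2)),
    }


# Toy values chosen only to make the arithmetic easy to follow.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = swh_metrics(observed, predicted)
```

A forecast that always returns the observed mean yields an NSEC of 0 under this definition, which is why values near 0 indicate little skill beyond the climatological mean.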

4. Experimental Results

This section presents experimental studies conducted on three datasets (A, B, and C) with distinct characteristics. First, the modeling results of the individual components are analyzed, including the experimental parameter settings, the parameter selection and decomposition results of OVMD, and the feature selection results obtained using TMFG. Subsequently, to verify the effectiveness of the proposed OVMD-TMFG-TCN-BiGRU forecasting method, several representative models are selected for comparative analysis, namely EMD-TCN, EEMD-LSTM, TCN, BiGRU, SVM, ANN and Transformer. In addition, a persistence baseline is included as a zero-training reference to evaluate the lead-time-dependent degradation of forecasting accuracy. At the same time, ablation experiments are conducted on the OVMD-TMFG-TCN-BiGRU model to validate the contribution of each component. To evaluate the model’s forecasting capability across short-term, medium-term, and long-term horizons, the experimental time horizons are set to 1, 6, 12, 24, and 48 h, and five evaluation metrics (RMSE, MAE, MAPE, R, and NSEC) are employed to comprehensively assess the predictive performance of each method.
All experiments were conducted on a computing system equipped with an AMD Ryzen 7 5800H processor (8 cores) and 12 GB of RAM. The software environment consisted of the Windows 11 operating system, utilizing PyTorch 2.3.1 as the deep learning framework, along with Python 3.12 and CUDA 11.0.

4.1. Experimental Parameter Settings

In this experiment, several comparative models were employed for verification, including EMD-TCN, EEMD-LSTM, TCN, BiGRU, SVM, ANN, Transformer, and a persistence baseline, along with the variants used in the ablation studies. The lookback window length was set to 24 h for all deep learning models, and the forecast horizons were set to 1, 6, 12, 24, and 48 h. The parameter settings for both the proposed model and the comparative models are presented in Table 3. The persistence baseline predicts the future SWH using the observed value at the prediction origin. Since this baseline has no trainable parameters, it is used only as a reference method and is not listed in Table 3. To ensure a fair comparison, hyperparameters were determined by referring to the associated literature and further tuned using the validation set. The testing set was used only for final evaluation. For parameters not explicitly specified, empirical settings were adopted in this study.
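The persistence baseline described above amounts to shifting the observed series by the lead time. A minimal sketch is given below; the function name and the toy series are illustrative assumptions.

```python
import numpy as np


def persistence_forecast(swh, horizon):
    """Pair each persistence forecast with its verifying observation.

    The forecast issued at origin t for time t + horizon is the value observed at t.
    """
    swh = np.asarray(swh, dtype=float)
    forecast = swh[:-horizon]       # values at the prediction origins
    observed = swh[horizon:]        # values observed `horizon` steps later
    return forecast, observed


toy_swh = [1.0, 2.0, 3.0, 4.0, 5.0]           # toy hourly series
forecast_2h, observed_2h = persistence_forecast(toy_swh, horizon=2)
persistence_mae = float(np.mean(np.abs(observed_2h - forecast_2h)))
```

Because the forecast is a lagged copy of the observations, its error depends only on how quickly the series changes over the lead time.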

4.2. Result of OVMD Decomposition

To construct multiscale representations of SWH, OVMD was applied to datasets A, B, and C. The number of intrinsic mode functions, denoted as K, is a critical parameter that directly affects decomposition quality. A small K may lead to insufficient decomposition and the loss of important signal components, whereas an excessively large K may introduce redundant modes and reduce the stability of the decomposed components. Therefore, in this study, K was determined by jointly considering the reconstruction error and the center frequency distribution of the decomposed modes.
Candidate values of K were searched from 3 to 15. For each candidate K, the original SWH sequence was reconstructed from the decomposed IMF components, and the reconstruction error was calculated using RMSE. A lower RMSE indicates smaller information loss and better reconstruction quality, whereas a higher RMSE suggests potential signal distortion or insufficient decomposition. Figure 7, Figure 8 and Figure 9 show the RMSE and center frequency distributions of datasets A, B, and C under different K values. As K increases, the reconstruction error initially decreases rapidly, indicating improved decomposition accuracy. However, after a certain value of K, the decrease in RMSE becomes limited, suggesting that additional modes provide only marginal information and may introduce redundant decomposition.
The center frequency distributions were further examined to avoid over-decomposition. When K is too large, newly generated IMFs tend to occupy similar frequency bands and become close to adjacent components, which indicates spectral overlap and redundancy. In this study, the final K was selected near the inflection point of the reconstruction error curve while maintaining distinguishable center frequency separation. Based on this criterion, the optimal mode numbers were set to K = 10 for dataset A, K = 11 for dataset B, and K = 11 for dataset C.
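One possible way to turn the two criteria above into a selection rule is sketched below. This is a hedged illustration, not the exact procedure used in this study: the relative-improvement tolerance `rel_tol`, the minimum center-frequency gap `min_gap`, and the input dictionaries are all hypothetical, and the reconstruction errors and center frequencies are assumed to have been computed beforehand for each candidate K.

```python
import numpy as np


def select_mode_number(rmse_by_k, centers_by_k, rel_tol=0.05, min_gap=0.01):
    """Pick K near the elbow of the RMSE curve while modes stay separated."""
    ks = sorted(rmse_by_k)
    chosen = ks[0]
    for k in ks:
        gaps = np.diff(np.sort(np.asarray(centers_by_k[k], dtype=float)))
        if gaps.min() < min_gap:
            break                   # adjacent modes overlap: stop before this K
        chosen = k
        if k + 1 in rmse_by_k:
            gain = (rmse_by_k[k] - rmse_by_k[k + 1]) / rmse_by_k[k]
            if gain < rel_tol:
                break               # one more mode reduces the error only marginally
    return chosen


# Made-up values that mimic a rapid decrease followed by a plateau.
rmse_by_k = {3: 0.30, 4: 0.20, 5: 0.12, 6: 0.115, 7: 0.113}
centers_by_k = {k: np.linspace(0.02, 0.40, k) for k in rmse_by_k}
k_elbow = select_mode_number(rmse_by_k, centers_by_k)

# Same errors, but two modes at K = 5 now sit in almost the same frequency band.
overlapping = dict(centers_by_k)
overlapping[5] = [0.02, 0.10, 0.20, 0.30, 0.305]
k_separated = select_mode_number(rmse_by_k, overlapping)
```

In the first case the rule stops at the elbow of the error curve; in the second it stops one step earlier because the additional mode would be spectrally redundant.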
The computational cost of the OVMD mode-number selection process was also evaluated under the same computing environment as the forecasting experiments. This selection process requires repeated decomposition of the training set under different candidate values of K. In our experiments, the total time required to determine the optimal K was controlled within 5 min, and the average execution time for a single OVMD decomposition was 15.59 s. Since the forecasting task in this study is conducted at an hourly interval, this computational cost does not hinder near-real-time hourly SWH forecasting. More importantly, the OVMD mode-number selection is performed only during the offline training stage. Once the optimal K is determined, it is fixed during validation, testing, and practical forecasting. Similarly, TMFG-based feature subset determination and TCN-BiGRU model training are also completed offline. Therefore, repeated OVMD parameter optimization, TMFG fitting, and model training are not required during online inference. The online forecasting stage only applies the fixed decomposition configuration, the fixed feature subset, and the trained prediction model to generate forecasts, which supports the computational feasibility of the proposed framework for practical hourly SWH forecasting.
Figure 10, Figure 11 and Figure 12 present the final OVMD decomposition results for datasets A, B, and C, respectively. Although the selected K values differ slightly among the three datasets, the resulting OVMD components consistently exhibit a hierarchical frequency structure. In all three datasets, the first several IMFs mainly correspond to high-frequency components, reflecting short-term fluctuations in the SWH series. The intermediate IMFs capture medium-frequency oscillations and describe wave variability at intermediate temporal scales. The final oscillatory modes before the trend component represent low-frequency variations with slower temporal changes. Specifically, IMF8–IMF9 for dataset A and IMF8–IMF10 for datasets B and C can be regarded as low-frequency components, while IMF10 for dataset A and IMF11 for datasets B and C represent the long-term trend components. Comparative analysis across the three datasets indicates that, despite differences in the selected K, the IMFs maintain a coherent multiscale structure. This suggests that OVMD can consistently decompose SWH records into components with distinct temporal characteristics, thereby providing a multiscale feature basis for subsequent TMFG-based feature selection and forecasting. Meanwhile, differences in amplitude and local fluctuation patterns among the three datasets may be related to their geographical locations, water depths, and regional marine environments.
From the perspective of wave dynamics, the hierarchical IMF structure can be interpreted as a multiscale representation of SWH variability. The high-frequency IMFs mainly describe rapid short-term fluctuations, which may be associated with local wind forcing, short-period wind waves, and measurement-scale variability. The intermediate-frequency IMFs reflect more persistent oscillatory variations and may be related to the combined effects of evolving wind-sea conditions and swell modulation. The low-frequency IMFs and trend components represent slowly varying background sea-state evolution, which may be influenced by larger-scale meteorological forcing, longer-period wave systems, and regional marine conditions. Therefore, although the IMFs should not be regarded as one-to-one physical wave modes, they provide temporal-scale information that is consistent with the multiscale nature of wave dynamics and can support subsequent feature selection and forecasting.

4.3. Result of Feature Selection Using TMFG

The candidate feature set for TMFG-based selection was constructed by integrating the auxiliary buoy variables described in Section 3.1, the multiscale IMF components generated by OVMD, and the historical SWH sequence. Following the TMFG-based selection strategy described in Section 2.3, a sparse dependency network was constructed for each dataset using only the training set. In this network, candidate predictors directly connected to the target SWH node were regarded as informative features and retained as the final input subset for the forecasting model. Figure 13 illustrates the correlation matrix for dataset A, the corresponding TMFG network structure, and the feature screening results. In the TMFG network, the connected nodes represent direct dependency relationships retained by the sparse filtering process. Based on this target-node adjacency criterion, the optimal feature subsets for datasets A, B, and C were identified, as shown in Table 4.
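The target-node adjacency criterion can be illustrated with a compact greedy TMFG construction. The sketch below is a simplified version under stated assumptions: the seed clique is taken as the four nodes with the largest total similarity, which simplifies the usual seeding rule, the similarity matrix contains made-up values, and node 0 stands in for the target SWH while nodes 1 to 5 are hypothetical candidate predictors.

```python
import numpy as np
from itertools import combinations


def tmfg_edges(similarity):
    """Greedy TMFG: repeatedly insert the best node into a triangular face."""
    S = np.array(similarity, dtype=float)
    np.fill_diagonal(S, 0.0)
    n = S.shape[0]
    seed = [int(i) for i in np.argsort(S.sum(axis=1))[-4:]]   # simplified seeding
    edges = {tuple(sorted(e)) for e in combinations(seed, 2)}
    faces = list(combinations(seed, 3))
    remaining = sorted(set(range(n)) - set(seed))
    while remaining:
        # Choose the node and face whose three new edges add the most weight.
        _, v, f = max(
            ((S[v, list(f)].sum(), v, f) for v in remaining for f in faces),
            key=lambda item: item[0],
        )
        edges |= {tuple(sorted((v, a))) for a in f}
        faces.remove(f)
        faces += [(v, f[0], f[1]), (v, f[0], f[2]), (v, f[1], f[2])]
        remaining.remove(v)
    return edges


def neighbours(edges, target):
    """Nodes directly connected to the target in the filtered graph."""
    return sorted({b if a == target else a for a, b in edges if target in (a, b)})


# Made-up symmetric similarities (for example, absolute correlations).
S = np.array([
    [0.00, 0.90, 0.80, 0.50, 0.05, 0.05],
    [0.90, 0.00, 0.70, 0.60, 0.50, 0.20],
    [0.80, 0.70, 0.00, 0.60, 0.40, 0.30],
    [0.50, 0.60, 0.60, 0.00, 0.45, 0.40],
    [0.05, 0.50, 0.40, 0.45, 0.00, 0.35],
    [0.05, 0.20, 0.30, 0.40, 0.35, 0.00],
])
edges = tmfg_edges(S)
selected = neighbours(edges, target=0)
```

In this toy case the graph keeps 3n - 6 = 12 edges, and nodes 4 and 5 are attached to faces that do not contain the target, so they are not retained as predictors.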
The selected environmental variables are also physically consistent with SWH evolution. WSPD and GST represent local wind forcing and wind fluctuation intensity, which directly affect wind-wave generation and short-term wave growth. DPD and APD describe characteristic wave periods and provide information on wave energy distribution and sea-state maturity. Their repeated selection across the three datasets indicates that wave-period information is important for distinguishing locally generated wind waves from more developed sea states. WTMP and ATMP are selected at some stations, suggesting that local air–sea thermal conditions may provide supplementary information on regional meteorological and oceanic variability. Overall, the TMFG-selected variables are consistent with the physical factors that influence SWH evolution, while the selected IMF components provide multiscale descriptions of historical SWH dynamics.

4.4. Overall Performance Comparison

The overall predictive performance of the OVMD-TMFG-TCN-BiGRU model was evaluated across different forecast horizons and datasets. In the experiment, representative time steps were selected to cover varying prediction horizons: short-term forecasting is represented by 1 h and 6 h, medium-term forecasting by 12 h and 24 h, and long-term forecasting by 48 h. Using TCN, BiGRU, SVM, ANN, EEMD-LSTM, EMD-TCN, Transformer and the persistence baseline as benchmark methods, the experimental results on datasets A, B, and C are presented in Table 5, Table 6 and Table 7. The results indicate that the OVMD-TMFG-TCN-BiGRU model outperforms the benchmark methods on nearly all evaluation metrics and time scales; the only exception is the 6 h MAPE at station C, discussed in Section 4.4.1.

4.4.1. Short-Term Forecasting

Table 5 summarizes the prediction results for the 1 h and 6 h horizons across stations A, B, and C, and Figure 14 visually illustrates the evaluation results of the nine models. The OVMD-TMFG-TCN-BiGRU model achieved the best results on nearly all evaluation metrics for both time steps. In addition, Table 5 shows that, among the compared models, only the proposed model, EMD-TCN, and EEMD-LSTM consistently outperform the persistence reference in short-term forecasting. This indicates that simple sea-state persistence remains a strong reference for very short lead times, while decomposition-assisted models can better extract multiscale wave fluctuation information and therefore provide additional predictive skill beyond persistence. Specifically, taking station A as an example, compared to the other models at the 1 h horizon, the proposed method achieved average reductions in RMSE, MAE, and MAPE of 64.9%, 67.0%, and 66.2%, respectively, while R and NSEC increased by an average of 0.0173 and 0.0414, respectively. At the 6 h horizon, although the predictive performance of all models declined, the OVMD-TMFG-TCN-BiGRU model still outperformed the benchmark models, with the sole exception occurring at station C, where its MAPE was higher than that of EEMD-LSTM by 0.001. Furthermore, compared to single models such as TCN and BiGRU, hybrid models like OVMD-TMFG-TCN-BiGRU exhibit smaller performance degradation, demonstrating the superior multi-step forecasting capability of hybrid architectures.
Figure 15 illustrates the observed values and the values predicted by the OVMD-TMFG-TCN-BiGRU model and the comparative models at the 1 h and 6 h forecast horizons for stations A, B, and C. The results show that all models closely align with the observed curves for the 1 h prediction, indicating satisfactory predictive performance at this time step. However, when the forecasting horizon extends to 6 h, all models experience a decline in performance, as evidenced by increased fluctuations in the prediction curves. These fluctuations highlight the challenges of multi-step forecasting in highly dynamic environments with rapid and unpredictable variations. Single models, such as TCN and BiGRU, display greater fluctuation amplitudes, whereas hybrid models like EMD-TCN maintain lower overall errors. Notably, the OVMD-TMFG-TCN-BiGRU model achieves the closest alignment with the measured curves.
To further assess model stability, scatter plots were employed to visually evaluate the prediction performance of the OVMD-TMFG-TCN-BiGRU model and other comparative models, as illustrated in Figure 16. In these scatter plots, the x-axis denotes the measured sample values, while the y-axis indicates the predicted values generated by the models. Ideally, data points should align closely along the diagonal line y = x, which would indicate perfect agreement between predicted and measured values. The figure demonstrates that, for the 1 h prediction, data points for all models cluster near the diagonal, suggesting that each model effectively captures the primary trends in single-step forecasting. As the prediction horizon extends to 6 h, the scatter points increasingly deviate from the diagonal. Nevertheless, the OVMD-TMFG-TCN-BiGRU model’s predicted values exhibit greater consistency with the measured values than those of the other models.

4.4.2. Medium-Term Forecasting

Table 6 presents the prediction results of the nine models at the three stations for the 12 h and 24 h horizons, while Figure 17 summarizes the evaluation metrics for learning-based models. The RMSE, MAE, and MAPE for medium-term forecasting are significantly higher than those for short-term forecasting, and performance at 24 h further deteriorates compared to 12 h. This trend indicates a decline in model accuracy as the prediction horizon increases. Nevertheless, the OVMD-TMFG-TCN-BiGRU model consistently outperforms the other models across all metrics, exhibiting less performance degradation as the forecast horizon increases. For example, at station C, the model’s RMSE for the 12 h and 24 h forecasts was reduced by an average of 56.7% and 55.9%, respectively, compared to other models, while the NSEC increased by 0.259 and 0.452, respectively. The NSEC for this model decreased by only 0.0501 from 12 h to 24 h, which is substantially less than the average decrease of 0.2437 observed in other models. These results demonstrate that the OVMD-TMFG-TCN-BiGRU model maintains superior stability as the time step increases.
Figure 18 illustrates the comparison curves between predicted and observed values for the 12 h and 24 h forecasts at stations A, B, and C. During intervals of high wave height, the OVMD-TMFG-TCN-BiGRU model shows larger errors than in short-term forecasts, yet these errors remain within an acceptable range. In contrast, although the prediction curves of EMD-TCN and EEMD-LSTM generally follow the overall trend of the measured values, they display substantial deviations at specific points. The OVMD-TMFG-TCN-BiGRU curve is smoother and more accurately represents actual wave height conditions. Additionally, the prediction curves of the BiGRU, TCN, SVM, ANN, Transformer and the persistence baseline show significant deviations from the measured value curves.
To further investigate the predictive performance of the models during medium-term forecasting, Figure 19 presents the corresponding scatter plots. Compared to the short-term scatter plots in Figure 16, the dispersion in the plots is markedly greater, indicating that predictive performance degrades as the forecast horizon extends. However, in contrast to other algorithms, the variation in the dispersion of predicted values for the OVMD-TMFG-TCN-BiGRU model remains relatively small. As wave height increases, the scatter points for most comparison models fall below the diagonal line, revealing a tendency to underestimate peak values. In contrast, the proposed model shows smaller dispersion than the comparative models under relatively high sea states, although prediction uncertainty still increases with wave height.

4.4.3. Long-Term Forecasting

Table 7 and Figure 20 present the prediction results for the 48 h horizon. As the prediction horizon extends to 48 h, all models exhibit a substantial decline in predictive performance. The NSEC for the TCN, BiGRU, SVM, ANN, Transformer and the persistence baseline falls below 0.1 at most observation stations, indicating that these models have lost most of their explanatory power and perform only slightly better than predicting the mean of the observed data. The persistence reference performs reasonably at short horizons because recent SWH observations contain strong continuity. However, its performance deteriorates rapidly as the forecasting horizon increases, indicating that medium- and long-term SWH forecasting cannot rely solely on sea-state persistence. In contrast, although the OVMD-TMFG-TCN-BiGRU model’s performance decreases compared to short- and medium-term forecasts, it continues to demonstrate the highest predictive accuracy. For example, at station B, the RMSE, MAE, and MAPE for the OVMD-TMFG-TCN-BiGRU model are 0.3379, 0.2458, and 0.3332, respectively, which are considerably lower than the average values of 0.5970, 0.4234, and 0.5583 for the other models. Additionally, the R and NSEC values for this model reach 0.8583 and 0.7348, respectively, exceeding the corresponding averages of 0.4099 and 0.1317 achieved by the comparative models. These results indicate that the proposed model retains useful information from the historical data and captures time-series relationships, even in long-term forecasting.
Figure 21 compares the performance curves of the seven models for the 48 h prediction across the various stations. Although the performance of the OVMD-TMFG-TCN-BiGRU model exhibits a slight decline compared to short- and medium-term forecasts, its prediction curve maintains the highest degree of fit with the measured data. It is worth noting that the model’s performance in capturing high wave heights shows some attenuation. In contrast, the other comparative models exhibit significantly more pronounced deviations between predicted and measured values.
Figure 22 presents the correlation analysis between the measured and predicted values. Compared to Figure 16 and Figure 19, the dispersion of data points for all models is significantly increased. At the same time, the data points for the OVMD-TMFG-TCN-BiGRU model increasingly fall below the diagonal line as wave height increases, which indicates underestimation at higher wave heights. This reflects a challenge in predicting high wave heights, yet the model consistently achieves the minimum dispersion and deviation relative to its counterparts.
The degradation at longer forecasting horizons can be attributed to several factors. First, SWH has strong short-term persistence, but this persistence weakens as the lead time increases, which is also reflected by the rapid deterioration of the persistence baseline. Second, medium- and long-term SWH evolution is increasingly affected by future wind forcing, storm development, swell propagation, and remote wave systems, which cannot be fully inferred from a fixed historical input window. Third, nonlinear interactions among wind sea, swell, and local bathymetric effects become more difficult to represent as the forecasting horizon extends. Therefore, all models exhibit reduced accuracy at the 48 h horizon.

4.4.4. Extreme-Wave-Event and Storm Peak Forecasting Analysis

To further evaluate the robustness of the proposed model under severe sea-state conditions, a storm-peak evaluation and an extreme-wave-event case study were conducted on dataset B. Dataset B was selected because it contains the most pronounced observed SWH peak during the independent testing year among the three datasets. The storm-peak samples were identified according to the observed SWH rather than the predicted values. Specifically, the 95th percentile of observed SWH in the testing set of dataset B was used as the threshold, and samples exceeding this threshold were defined as storm-peak samples. This definition allows the evaluation to focus on high-wave conditions that are most relevant to practical offshore applications.
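The storm-peak subset definition can be expressed in a few lines. The sketch below is illustrative only: the function name and the toy series are assumptions, and the key point is that the threshold and the sample mask are derived from the observed values alone.

```python
import numpy as np


def storm_peak_scores(observed, predicted, q=95):
    """RMSE and MAE restricted to samples above the q-th percentile of observed SWH."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    threshold = np.percentile(observed, q)
    mask = observed > threshold                 # selection never uses predictions
    err = observed[mask] - predicted[mask]
    return {
        "threshold": float(threshold),
        "n_samples": int(mask.sum()),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }


# Toy series: 100 values from 0.1 m to 10.0 m, with a constant 0.2 m underestimate.
toy_observed = np.arange(1, 101) / 10.0
toy_predicted = toy_observed - 0.2
peak_scores = storm_peak_scores(toy_observed, toy_predicted)
```

Applying the same mask to every model keeps the comparison on identical samples, since the subset is fixed by the observations.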
To keep the storm-peak analysis concise while covering different types of forecasting methods, three representative benchmark models were selected for comparison. EMD-TCN was selected as a decomposition-based hybrid deep learning baseline, TCN was selected as a single deep learning baseline, and the persistence baseline was included as a training-free reference. These benchmark models showed relatively strong performance within their respective methodological categories in the overall comparison and represent different levels of model complexity.
Table 8 summarizes the forecasting accuracy of the proposed model and the representative benchmark models on the storm-peak samples of dataset B. RMSE and MAE were used to evaluate the storm-peak prediction performance. As shown in Table 8, the proposed OVMD-TMFG-TCN-BiGRU model achieves the lowest RMSE and MAE across all forecasting horizons. At the 1 h horizon, the proposed model obtains an RMSE of 0.0971 and an MAE of 0.0699, which are lower than those of EMD-TCN, TCN, and the persistence baseline. As the forecasting horizon increases, the storm-peak prediction errors of all models increase, indicating that high-wave forecasting becomes more difficult at longer lead times. Nevertheless, the proposed model maintains a clear advantage. At the 48 h horizon, its RMSE and MAE are 0.7775 and 0.6156, respectively, which remain lower than those of the representative benchmark models. These results indicate that the proposed framework provides more reliable storm-peak forecasting under severe sea-state conditions.
In addition to the storm-peak subset evaluation, a representative extreme-wave-event case was selected from dataset B. The selected event corresponds to the continuous high-wave process containing the maximum observed SWH in the testing period. A time window around the observed peak was used for visualization. Figure 23 compares the observed SWH with the 12 h predictions of the proposed model and the representative benchmark models during this event. The 12 h horizon was selected because it represents a challenging medium-term forecasting condition while retaining practical relevance for offshore operation planning. As shown in Figure 23, the observed SWH increases rapidly before the storm peak and then decreases after the peak. The proposed OVMD-TMFG-TCN-BiGRU model follows this growth and decay process more closely than the benchmark models. Although the maximum peak magnitude is still underestimated, the proposed model produces a smaller deviation near the observed storm peak than EMD-TCN, TCN, and the persistence baseline. In contrast, TCN substantially underestimates the high-wave process, while the persistence baseline shows an evident lag and remains high after the observed peak. These results indicate that the proposed framework provides more robust forecasting performance under severe sea-state conditions, although accurate peak prediction remains challenging.
Nevertheless, extreme-wave forecasting remains challenging. Storm peaks are often associated with rapidly evolving wind forcing, swell propagation, and nonlinear wave growth processes, which cannot be fully inferred from historical buoy observations alone. Therefore, although the proposed model improves storm-peak prediction accuracy, future work may further incorporate numerical weather prediction products or wave model outputs to improve forecasting reliability under severe sea states.

4.4.5. Statistical Significance Analysis

To assess whether the improvement of the proposed model over the benchmark models is statistically significant, the Wilcoxon signed-rank test was applied to paired absolute error sequences. For each forecasting horizon, the absolute error sequences from the three stations were pooled. The absolute error of each model was defined as the absolute difference between its predicted SWH and the observed SWH. The null hypothesis states that the paired absolute errors of the two models do not differ systematically, whereas the one-sided alternative hypothesis states that the proposed model yields smaller absolute prediction errors than the corresponding benchmark model. A significance level of 0.05 was adopted.
Table 9 summarizes the Wilcoxon signed-rank test results between the proposed model and the benchmark models across the five forecasting horizons. For each benchmark model, the table reports the mean Z-value, the Z-value range, the p-value summary, and the number of horizons with statistically significant differences. All benchmark comparisons produce negative Z-values at all forecasting horizons, indicating that the absolute prediction errors of the proposed model are generally smaller than those of the benchmark models. Moreover, all p-values are lower than 0.001, and statistically significant differences are observed for all five horizons for each benchmark model. These results remain significant after considering multiple comparisons, indicating that the improvements of the proposed model are statistically robust rather than being caused by random variations in the testing samples.
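The paired one-sided test described above can be sketched with `scipy.stats.wilcoxon`. The helper name `compare_absolute_errors` and the synthetic predictions are assumptions for illustration; the study's pooled station data and Z-value reporting are not reproduced here.

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_absolute_errors(obs, pred_proposed, pred_benchmark, alpha=0.05):
    """One-sided Wilcoxon signed-rank test on paired absolute errors."""
    err_proposed = np.abs(pred_proposed - obs)
    err_benchmark = np.abs(pred_benchmark - obs)
    # alternative="less": the differences (proposed - benchmark) tend to be below zero.
    result = wilcoxon(err_proposed, err_benchmark, alternative="less")
    return float(result.pvalue), bool(result.pvalue < alpha)

# Synthetic example: the "proposed" predictions are closer to the observations.
rng = np.random.default_rng(42)
obs = 1.0 + rng.random(500)
pred_proposed = obs + 0.05 * rng.standard_normal(500)
pred_benchmark = obs + 0.20 * rng.standard_normal(500)

p_value, significant = compare_absolute_errors(obs, pred_proposed, pred_benchmark)
print(p_value, significant)
```

When the test is repeated for several benchmarks and horizons, a correction such as Bonferroni (comparing each p-value with alpha divided by the number of tests) is one common way to account for multiple comparisons; which correction the study applied is not stated in this section.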

4.5. Ablation Experiments

To validate the effectiveness of the proposed method, four ablated variants were constructed, each excluding one key component. Systematic ablation experiments with these variants allow the independent contribution and functional role of each component within the overall architecture to be assessed. All models were tested on the three datasets to quantify the impact of the individual modules of the OVMD-TMFG-TCN-BiGRU architecture on the final predictive performance.
  • w/o OVMD (without OVMD): This model constructs the feature subsets without OVMD signal decomposition and is used to evaluate the impact of OVMD on predictive performance.
  • w/o TMFG (without TMFG): This model omits TMFG-based feature selection and is used to evaluate the effect of feature selection on predictive performance.
  • w/o TCN (without TCN): This model removes TCN from the feature extraction process and is used to analyze the role of TCN in extracting local and multiscale temporal features.
  • w/o BiGRU (without BiGRU): This model removes the BiGRU layer and is used to evaluate the contribution of BiGRU in capturing bidirectional long-range temporal dependencies.

4.5.1. Impact of OVMD on the Model

Figure 24 presents a bar chart comparing the performance of models that use different feature extraction methods to assess the enhancement effect of OVMD. A comparison between the OVMD-TMFG-TCN-BiGRU model and the w/o OVMD model shows that omitting OVMD markedly reduces performance on all three datasets, with the most pronounced impact observed in the 24 h and 48 h forecasts. As shown in Table 10, for the 24 h forecast, the proposed model achieved reductions in RMSE, MAE, and MAPE of 0.283, 0.200, and 0.255, respectively, and increases in R and NSEC of 0.313 and 0.497, respectively, compared to the w/o OVMD model. These results indicate that OVMD decomposition substantially enhances model performance, especially for medium- and long-term forecasting.
To visualize the effect of OVMD on fitting capability, Figures 25, 26 and 27 present comparative line charts. Both the OVMD-TMFG-TCN-BiGRU model and the w/o OVMD model fit the measured values well in short-term forecasting. However, distinct differences emerge in the medium- and long-term forecasts, where the w/o OVMD model deviates significantly from the measured curves, while the proposed model maintains a closer fit.
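For reference, the five metrics compared in the ablation tables can be computed as in the sketch below. The function name `forecast_metrics` is illustrative, and MAPE is returned as a fraction rather than a percentage, which is an assumption about the reporting convention.

```python
import numpy as np

def forecast_metrics(obs, pred):
    """RMSE, MAE, MAPE, correlation coefficient R, and Nash-Sutcliffe efficiency."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / obs))),  # fraction; assumes obs has no zeros
        "R": float(np.corrcoef(obs, pred)[0, 1]),
        # NSEC equals 1 for a perfect forecast and 0 for a forecast no better than the mean.
        "NSEC": float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)),
    }

# Toy values: every prediction is 0.1 m too high.
obs = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
pred = obs + 0.1
scores = forecast_metrics(obs, pred)
print(scores)
```

A constant bias leaves R at 1 while lowering NSEC, which is one reason the two metrics are reported together.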

4.5.2. The Impact of TMFG Feature Selection on the Model

To assess the impact of TMFG feature selection, the OVMD-TMFG-TCN-BiGRU model was compared with its counterpart lacking TMFG, as shown in Table 11 and Figure 28. The results indicate that the inclusion of TMFG consistently enhances predictive performance across all forecasting horizons. In short- and medium-term forecasting, the performance metrics of the two models are similar, but the proposed model is consistently more accurate. In long-term forecasting, the OVMD-TMFG-TCN-BiGRU model achieves notable NSEC improvements of 7.8%, 12.0%, and 14.2% on datasets A, B, and C, respectively. These results underscore the effectiveness of TMFG in filtering redundant features and preserving essential information, thereby improving the model’s robustness and generalization, especially for long-term forecasting.
To further clarify the difference between TMFG and conventional feature selection methods, Pearson correlation, mutual information (MI), and maximal information coefficient (MIC) were introduced for comparison on the three datasets. All comparison methods used the same candidate feature set generated by OVMD and the same TCN-BiGRU forecasting model. The only difference among these methods was the feature selection strategy. To ensure a fair comparison, Pearson correlation, MI, and MIC retained the same number of features as TMFG for each dataset. Figure 29 presents the average RMSE and NSEC values of different feature selection methods over datasets A, B, and C across all forecasting horizons.
As shown in Figure 29, TMFG achieves better average forecasting performance at most forecasting horizons, with lower RMSE and higher NSEC. It should be noted that at the 6 h horizon, the average performance of TMFG is slightly lower than that of MIC, indicating that feature selection based on nonlinear relevance can still be competitive in some short- and medium-term forecasting cases. Overall, TMFG provides more stable performance across most horizons. When multiscale IMF components and environmental variables are jointly used for forecasting, TMFG can retain variables that have direct structural connections with the target SWH node through a sparse topological dependency network. In contrast, Pearson correlation, MI, and MIC mainly rank variables according to individual relevance or dependency strength. Therefore, TMFG is more effective in reducing redundant features while preserving structurally informative feature combinations, which further supports its role in improving the overall stability of the forecasting model.
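The contrast between TMFG-based selection and relevance ranking can be sketched as follows. This is a simplified greedy TMFG construction, assuming absolute Pearson correlation as the similarity measure and a seed tetrahedron chosen by largest total similarity; the variable names and synthetic data are illustrative and do not reproduce the authors' implementation.

```python
import numpy as np
from itertools import combinations

def tmfg_edges(weights):
    """Greedy TMFG construction on a symmetric, non-negative similarity matrix."""
    n = weights.shape[0]
    if n < 4:
        raise ValueError("TMFG needs at least four nodes")
    w = np.array(weights, dtype=float)
    np.fill_diagonal(w, 0.0)

    # Seed tetrahedron: the four nodes with the largest total similarity
    # (a simplification of the seeding rule in the original TMFG formulation).
    seed = [int(i) for i in np.argsort(w.sum(axis=1))[-4:]]
    edges = {tuple(sorted(pair)) for pair in combinations(seed, 2)}
    faces = list(combinations(seed, 3))
    remaining = [v for v in range(n) if v not in seed]

    while remaining:
        # Choose the (node, triangular face) pair that adds the most weight.
        _, node, face = max(
            (w[v, list(f)].sum(), v, f) for v in remaining for f in faces
        )
        edges.update(tuple(sorted((node, u))) for u in face)
        a, b, c = face
        faces.remove(face)  # the old face is split into three new faces
        faces.extend([(node, a, b), (node, a, c), (node, b, c)])
        remaining.remove(node)
    return edges

def neighbors_of(edges, target):
    """Indices of the nodes directly connected to the target node."""
    return sorted({u for edge in edges if target in edge for u in edge if u != target})

# Synthetic candidate features (hypothetical names, not the study's data).
names = ["SWH", "IMF1", "IMF2", "IMF3", "WSPD", "GST", "PRES"]
rng = np.random.default_rng(1)
base = rng.standard_normal((1000, 6))
swh = base[:, 0] + 0.5 * base[:, 1] + 0.3 * base[:, 3] + 0.2 * rng.standard_normal(1000)
data = np.column_stack([swh, base])

corr = np.abs(np.corrcoef(data, rowvar=False))
edges = tmfg_edges(corr)
selected = [names[i] for i in neighbors_of(edges, target=0)]

# Relevance ranking that keeps the same number of features, as in the comparison above.
order = np.argsort(corr[0, 1:])[::-1][:len(selected)] + 1
pearson_top = [names[i] for i in order]
print(selected, pearson_top)
```

For n nodes the resulting planar graph has 3(n - 2) edges, so the target keeps only a few direct neighbors, whereas the ranking approach considers each feature's relevance in isolation.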

4.5.3. Impact of TCN and BiGRU on the Model

To verify the effectiveness of the proposed method, a systematic comparison was conducted on the three datasets between the complete OVMD-TMFG-TCN-BiGRU model and two ablated variants: the w/o TCN and w/o BiGRU models. Table 12 and Figure 30 present the primary performance metrics for each model across the prediction horizons. In short-term forecasting, the MAE of the OVMD-TMFG-TCN-BiGRU model is markedly lower than that of the ablated variants. For instance, on dataset A, the MAE is reduced by approximately 18.5% compared to the w/o TCN and w/o BiGRU models, with similar improvements observed on datasets B and C. In medium-term forecasting, all three models generally maintain high prediction accuracy, but the OVMD-TMFG-TCN-BiGRU model consistently achieves the best results across all tasks. This advantage is particularly evident in the MAE and MAPE metrics, indicating superior stability and robustness. As the prediction horizon extends to long-term forecasting, all models experience some performance degradation, yet the complete model maintains higher accuracy than the ablated variants. This superiority is especially apparent in the NSEC metric. For example, on dataset A, the NSEC improved by approximately 14.2%. Collectively, the ablation experiments demonstrate that the hybrid model, which combines TCN and BiGRU, effectively leverages their respective strengths in local temporal feature extraction and long-range dependency modeling. This integration enhances prediction accuracy and generalization across different time scales, thereby supporting the structural design of the OVMD-TMFG-TCN-BiGRU model.
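The TCN branch removed in the w/o TCN variant is built on dilated causal convolutions (Figure 4). The sketch below shows a single-channel version of that operation in NumPy, together with the receptive field of a stack with one convolution per level; the dilation schedule and function names are assumptions, since the section does not list the dilations used.

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """1-D causal convolution: output at time t uses x[t], x[t - d], x[t - 2d], ..."""
    x = np.asarray(x, dtype=float)
    pad = (len(kernel) - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])  # left padding only, so no future leakage
    out = np.zeros_like(x)
    for j, weight in enumerate(kernel):
        # Tap j reads the input j * dilation steps in the past.
        start = pad - j * dilation
        out += weight * padded[start:start + len(x)]
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal convolutions, one convolution per level."""
    return 1 + (kernel_size - 1) * sum(dilations)

# An impulse at t = 0 spreads only to later time steps, spaced by the dilation.
impulse = np.zeros(10)
impulse[0] = 1.0
response = causal_dilated_conv(impulse, kernel=[1.0, 1.0, 1.0], dilation=2)

# Hypothetical dilation schedule with the kernel size 3 listed in Table 3.
rf = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
print(response, rf)
```

With this assumed schedule the receptive field is 31 steps, which would cover a 24 h lookback window of hourly samples; shallower stacks see less history and leave more of the long-range modeling to the recurrent layer.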

4.5.4. Sensitivity Analysis of Rolling Time Window Size

To justify the selection of the 24 h lookback window, a sensitivity analysis was conducted using input window lengths of 12 h, 24 h, 48 h, and 72 h. All other model settings were kept unchanged. Figure 31 shows the RMSE and NSEC values under different input window lengths for the three datasets and five forecasting horizons. Overall, the forecasting performance varies only slightly when the input window length changes from 12 h to 72 h, indicating that the proposed framework is relatively robust to the choice of lookback window length.
The 24 h window generally provides stable performance across different datasets and horizons. Although a longer input window may contain more historical information, it can also introduce redundant or less relevant temporal patterns and increase computational cost. In contrast, the 12 h window may be insufficient to capture longer sea-state evolution for medium- and long-horizon forecasting. Therefore, the 24 h lookback window was selected as a balanced setting between forecasting accuracy, temporal information coverage, and computational efficiency.
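The lookback windows compared in this analysis can be built as in the sketch below, which pairs each input window with the value a fixed number of steps ahead (direct forecasting of a single horizon). It is univariate for brevity, whereas the study uses multiple input features; the function name and the placeholder series are assumptions.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Return (inputs, targets): each input holds `lookback` past values and
    each target is the value `horizon` steps after the end of its window."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this lookback and horizon")
    inputs = np.stack([series[i:i + lookback] for i in range(n_samples)])
    first_target = lookback + horizon - 1
    targets = series[first_target:first_target + n_samples]
    return inputs, targets

# Placeholder hourly series; real SWH records would be used instead.
series = np.arange(200, dtype=float)

shapes = {}
for lookback in (12, 24, 48, 72):
    inputs, targets = make_windows(series, lookback=lookback, horizon=12)
    shapes[lookback] = (inputs.shape, targets.shape)
print(shapes)
```

Longer lookback windows yield fewer training samples and larger inputs for the same record, which is part of the accuracy and cost trade-off discussed above.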

5. Conclusions

Addressing the need for high-precision multi-horizon significant wave height forecasting to support offshore operational decisions, including maintenance and installation window planning and high-wave monitoring, this study proposes a hybrid deep forecasting framework that integrates Optimal Variational Mode Decomposition (OVMD), Triangulated Maximally Filtered Graph (TMFG)-based feature selection, and a cascaded temporal predictor combining a Temporal Convolutional Network (TCN) with a Bidirectional Gated Recurrent Unit (BiGRU). Systematic experiments and ablation analyses on multiple real-world buoy datasets lead to the following conclusions:
  • The proposed OVMD-TMFG-TCN-BiGRU framework effectively handles nonlinear and non-stationary SWH sequences. OVMD is used to decompose the original SWH series into components with different temporal scales, thereby constructing multi-view representations from high-frequency fluctuations to low-frequency trends. TMFG constructs a sparse dependency network to select informative and non-redundant predictors from decomposed components and environmental variables. In the forecasting module, TCN extracts local temporal patterns, while BiGRU captures forward and backward dependencies within the available historical input window. Their combination improves the representation of temporal information across different forecast horizons.
  • Across three buoy stations and forecasting lead times from 1 h to 48 h, the proposed method achieves consistently better accuracy and robustness than representative statistical, machine-learning, and deep-learning baselines under multiple evaluation metrics. The improvements are more pronounced for medium- and long-horizon forecasts, and the proposed model exhibits smaller performance degradation as the lead time increases, indicating stronger stability across varying sea states and time scales. In addition, the storm-peak evaluation and extreme-wave-event case study further demonstrate that the proposed model maintains better forecasting accuracy under severe sea-state conditions. Although the peak magnitude is still underestimated during rapidly evolving extreme-wave events, the proposed framework captures the growth and decay process more effectively than the representative benchmark models.
  • Ablation results confirm that OVMD, TMFG, and the TCN-BiGRU predictor provide essential and complementary contributions. Removing OVMD leads to clear error increases for medium- to long-horizon forecasts, highlighting the role of multi-scale decomposition in extracting informative structures from non-stationary wave records. The exclusion of TMFG has only a limited effect at short forecasting horizons, whereas it leads to clear performance degradation at longer lead times. This finding indicates that feature selection based on dependency networks is beneficial for identifying effective feature combinations. Removing either TCN or BiGRU increases errors and dispersion, indicating that jointly modeling local multi-scale patterns and long-range dependencies is critical for reliable multi-horizon forecasting.
Overall, the proposed framework shows promising applicability for significant wave height forecasting and can provide decision support for offshore operations. Several limitations remain. First, although the three buoy stations used in this study have different water depths and nearshore/offshore characteristics, they are all located within the Gulf of Mexico. Therefore, the present validation mainly demonstrates cross-station robustness within the same broad oceanic region, while the cross-region generalization capability of the proposed framework remains to be further verified. Second, the model is mainly data-driven and relies on historical input windows; therefore, future wind forcing, storm evolution, swell propagation, and other external wave-generation processes are not explicitly represented, which may limit its accuracy at longer forecasting horizons. Third, this study focuses on deterministic point prediction, while uncertainty information is also important for operational decision making under extreme sea states. Future work will therefore focus on evaluating the proposed framework using buoy datasets from different oceanic regions with distinct wave climates and environmental forcing conditions, incorporating external forecast products and physically informed constraints to improve medium- and long-horizon forecasting, and extending the framework to probabilistic forecasting and uncertainty quantification.

Author Contributions

Z.L.: investigation, writing (original manuscript preparation), software; G.S.: supervision, methodology; M.L.: visualization; T.W.: data curation; X.W.: funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the National Natural Science Foundation of China (NSFC) under Grant No. 52571403 (General Program) and Grant No. 52101399 (Young Scientists Fund).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Astariz, S.; Iglesias, G. The economics of wave energy: A review. Renew. Sustain. Energy Rev. 2015, 45, 397–408. [Google Scholar] [CrossRef]
  2. International Energy Agency. Renewables 2023: Analysis and Forecasts to 2028; IEA: Paris, France, 2024. [Google Scholar]
  3. European Commission. EU Strategy on Offshore Renewable Energy (Updated Regional Goals, December 2024). Directorate-General for Energy; European Commission: Brussels, Belgium, 2024. [Google Scholar]
  4. Aderinto, T.; Li, H. Ocean wave energy converters: Status and challenges. Energies 2018, 11, 1250. [Google Scholar] [CrossRef]
  5. Falcão, A.F.D.O. Wave energy utilization: A review of the technologies. Renew. Sustain. Energy Rev. 2010, 14, 899–918. [Google Scholar] [CrossRef]
  6. Sá, M.D.M.; Da Fonseca, F.X.C.; Amaral, L.; Castro, R. Optimising O&M scheduling in offshore wind farms considering weather forecast uncertainty and wake losses. Ocean Eng. 2024, 301, 117518. [Google Scholar] [CrossRef]
  7. Taylor, J.W.; Jeon, J. Probabilistic forecasting of wave height for offshore wind turbine maintenance. Eur. J. Oper. Res. 2018, 267, 877–890. [Google Scholar] [CrossRef]
  8. Wu, M.; Stefanakos, C.; Gao, Z.; Haver, S. Prediction of short-term wind and wave conditions for marine operations using a multi-step-ahead decomposition-ANFIS model and quantification of its uncertainty. Ocean Eng. 2019, 188, 106300. [Google Scholar] [CrossRef]
  9. Wang, X.; Yuan, Y.; Fang, S.; Zhang, Z.; Wang, J. A novel causal inference method of exit choice behaviour analysis for passenger ships during emergency evacuation. Reliab. Eng. Syst. Saf. 2026, 272, 112489. [Google Scholar] [CrossRef]
  10. Group, T.W. The WAM model—A third generation ocean wave prediction model. J. Phys. Oceanogr. 1988, 18, 1775–1810. [Google Scholar] [CrossRef]
  11. Tolman, H.L. A third-generation model for wind waves on slowly varying, unsteady, and inhomogeneous depths and currents. J. Phys. Oceanogr. 1991, 21, 782–797. [Google Scholar] [CrossRef]
  12. Booij, N.; Ris, R.C.; Holthuijsen, L.H. A third-generation wave model for coastal regions: 1. Model description and validation. J. Geophys. Res. Oceans 1999, 104, 7649–7666. [Google Scholar] [CrossRef]
  13. Mentaschi, L.; Besio, G.; Cassola, F.; Mazzino, A. Problems in RMSE-based wave model validations. Ocean Model. 2013, 72, 53–58. [Google Scholar] [CrossRef]
  14. Rogers, W.E.; Campbell, T.J. Implementation of Curvilinear Coordinate System in the WAVEWATCH III Model; Naval Research Laboratory: Washington, DC, USA, 2009. [Google Scholar]
  15. Bilskie, M.V.; Asher, T.G.; Miller, P.W.; Fleming, J.G.; Hagen, S.C.; Luettich, R.A. Real-time simulated storm surge predictions during Hurricane Michael (2018). Weather Forecast. 2022, 37, 1085–1102. [Google Scholar] [CrossRef]
  16. Sarker, M.A. Numerical modelling of waves and surge from Cyclone Chapala (2015) in the Arabian sea. Ocean Eng. 2018, 158, 299–310. [Google Scholar] [CrossRef]
  17. Ma, J.; Cao, L.; Feng, Y.; Karatuğ, Ç.; Buber, M.; Wang, X. Intelligent analysis of ship collision accidents via Low-Rank Adaptation-based fine-tuning of medium-scale Large Language Models. Reliab. Eng. Syst. Saf. 2026, 275, 112774. [Google Scholar] [CrossRef]
  18. Soares, C.G.; Ferreira, A.M.; Cunha, C. Linear models of the time series of significant wave height on the Southwest Coast of Portugal. Coast. Eng. 1996, 29, 149–167. [Google Scholar] [CrossRef]
  19. Kang, B.H.; Kim, T.H.; Kong, G.Y. A novel method for long-term time series analysis of significant wave height. In Proceedings of the 2016 Techno-Ocean (Techno-Ocean), Kobe, Japan, 6–8 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 478–484. [Google Scholar] [CrossRef]
  20. Mahjoobi, J.; Mosabbeb, E.A. Prediction of significant wave height using regressive support vector machines. Ocean Eng. 2009, 36, 339–347. [Google Scholar] [CrossRef]
  21. Agrawal, J.D.; Deo, M.C. On-line wave prediction. Mar. Struct. 2002, 15, 57–74. [Google Scholar] [CrossRef]
  22. Callens, A.; Morichon, D.; Abadie, S.; Delpey, M.; Liquet, B. Using Random forest and Gradient boosting trees to improve wave forecast at a specific location. Appl. Ocean Res. 2020, 104, 102339. [Google Scholar] [CrossRef]
  23. Fan, S.; Xiao, N.; Dong, S. A novel model to predict significant wave height based on long short-term memory network. Ocean Eng. 2020, 205, 107298. [Google Scholar] [CrossRef]
  24. Lou, R.; Wang, W.; Li, X.; Zheng, Y.; Lv, Z. Prediction of ocean wave height suitable for ship autopilot. IEEE Trans. Intell. Transp. Syst. 2021, 23, 25557–25566. [Google Scholar] [CrossRef]
  25. Li, X.; Cao, J.; Guo, J.; Liu, C.; Wang, W.; Jia, Z.; Su, T. Multi-step forecasting of ocean wave height using gate recurrent unit networks with multivariate time series. Ocean Eng. 2022, 248, 110689. [Google Scholar] [CrossRef]
  26. Hajirahimi, Z.; Khashei, M. Hybrid structures in time series modeling and forecasting: A review. Eng. Appl. Artif. Intell. 2019, 86, 83–106. [Google Scholar] [CrossRef]
  27. Zhang, J.; Luo, F.; Quan, X.; Wang, Y.; Shi, J.; Shen, C.; Zhang, C. Improving wave height prediction accuracy with deep learning. Ocean Model. 2024, 188, 102312. [Google Scholar] [CrossRef]
  28. Wang, M.; Ying, F. Point and interval prediction for significant wave height based on LSTM-GRU and KDE. Ocean Eng. 2023, 289, 116247. [Google Scholar] [CrossRef]
  29. Ahmed, A.A.M.; Jui, S.J.J.; Al-Musaylh, M.S.; Raj, N.; Saha, R.; Deo, R.C.; Saha, S.K. Hybrid deep learning model for wave height prediction in Australia’s wave energy region. Appl. Soft Comput. 2024, 150, 111003. [Google Scholar] [CrossRef]
  30. Almaliki, A.H.; Khattak, A. Short- and long-term tidal level forecasting: A novel hybrid TCN + LSTM framework. J. Sea Res. 2025, 204, 102577. [Google Scholar] [CrossRef]
  31. Faruque, M.O.; Hossain, M.A.; Alam, S.M.M.; Khalid, M. Constraint-aware wind power forecasting with an optimized hybrid machine learning model. Energy Convers. Manag. X 2025, 27, 101026. [Google Scholar] [CrossRef]
  32. Mahdi, E.; Martin-Barreiro, C.; Cabezas, X. A novel hybrid approach using an attention-based transformer+GRU model for predicting cryptocurrency prices. Mathematics 2025, 13, 1484. [Google Scholar] [CrossRef]
  33. Kong, X.; Chen, Z.; Liu, W.; Ning, K.; Zhang, L.; Marier, S.M.; Liu, Y.; Chen, Y.; Xia, F. Deep learning for time series forecasting: A survey. Int. J. Mach. Learn. Cybern. 2025, 16, 5079–5112. [Google Scholar] [CrossRef]
  34. Meng, F.; Song, T.; Xu, D.; Xie, P.; Li, Y. Forecasting tropical cyclones wave height using bidirectional gated recurrent unit. Ocean Eng. 2021, 234, 108795. [Google Scholar] [CrossRef]
  35. Colosi, L.V.; Bôas, A.B.V.; Gille, S.T. The seasonal cycle of significant wave height in the ocean: Local versus remote forcing. J. Geophys. Res. Oceans 2021, 126, e2021JC017198. [Google Scholar] [CrossRef]
  36. Grossmann-Matheson, G.; Young, I.R.; Meucci, A.; Alves, J.H. Global tropical cyclone extreme wave height climatology. Sci. Rep. 2024, 14, 4167. [Google Scholar] [CrossRef]
  37. Olivetti, L.; Messori, G. Advances and prospects of deep learning for medium-range extreme weather forecasting. Geosci. Model Dev. 2024, 17, 2347–2358. [Google Scholar] [CrossRef]
  38. Zhou, S.; Bethel, B.J.; Sun, W.; Zhao, Y.; Xie, W.; Dong, C. Improving significant wave height forecasts using a joint empirical mode decomposition–long short-term memory network. J. Mar. Sci. Eng. 2021, 9, 744. [Google Scholar] [CrossRef]
  39. Lou, R.; Lv, Z.; Guizani, M. Wave height prediction suitable for maritime transportation based on green ocean of things. IEEE Trans. Artif. Intell. 2022, 4, 328–337. [Google Scholar] [CrossRef]
  40. Song, T.; Wang, J.; Huo, J.; Wei, W.; Han, R.; Xu, D.; Meng, F. Prediction of significant wave height based on EEMD and deep learning. Front. Mar. Sci. 2023, 10, 1089357. [Google Scholar] [CrossRef]
  41. Wang, J.; Bethel, B.J.; Xie, W.; Dong, C. A hybrid model for significant wave height prediction based on an improved empirical wavelet transform decomposition and long-short term memory network. Ocean Model. 2024, 189, 102367. [Google Scholar] [CrossRef]
  42. Chen, J.; Li, S.; Zhu, J.; Liu, M.; Li, R.; Cui, X.; Li, L. Significant wave height prediction based on variational mode decomposition and dual network model. Ocean Eng. 2025, 323, 120533. [Google Scholar] [CrossRef]
  43. Xu, R.; Fang, H.; Zeng, H.; Wu, B. A novel interpretable wind speed forecasting based on the multivariate variational mode decomposition and temporal fusion transformer. Energy 2025, 331, 136497. [Google Scholar] [CrossRef]
  44. Yu, Y.; Dai, D.; Yang, Q.; Zeng, Q.; Lin, Y.; Chen, Y. An intelligent framework based on optimized variational mode decomposition and temporal convolutional network: Applications to stock index multi-step forecasting. Expert Syst. Appl. 2025, 268, 126222. [Google Scholar] [CrossRef]
  45. Ma, K.; Nie, X.; Yang, J.; Zha, L.; Li, G.; Li, H. A power load forecasting method in port based on VMD-ICSS-hybrid neural network. Appl. Energy 2025, 377, 124246. [Google Scholar] [CrossRef]
  46. Ma, C.; Hu, Y.; Xu, X. Hybrid deep learning model with VMD-BiLSTM-GRU networks for short-term traffic flow prediction. Data Sci. Manag. 2024, 8, 257–269. [Google Scholar] [CrossRef]
  47. Li, G.; Yu, Z.; Yang, K.; Lin, M.; Chen, C.L.P. Exploring feature selection with limited labels: A comprehensive survey of semi-supervised and unsupervised approaches. IEEE Trans. Knowl. Data Eng. 2024, 36, 6124–6144. [Google Scholar] [CrossRef]
  48. Cao, W.; Wang, X.; Shu, Y.; Li, H.; Zhou, J.; Yang, Z. An integrated method of advanced optimisation and adaptive ensemble learning for ship fuel consumption prediction. Transp. Res. Part C Emerg. Technol. 2026, 188, 105659. [Google Scholar] [CrossRef]
  49. Lu, P.; Chen, Y.; Chen, M.; Wang, Z.; Zheng, Z.; Wang, T.; Kong, R. An improved stacking-based model for wave height prediction. Electron. Res. Arch. 2024, 32, 4543–4562. [Google Scholar] [CrossRef]
  50. Li, Y.; Qin, X.; Zhu, D. Nearshore significant wave height prediction based on MIC-LSTM model. Earth Sci. Inform. 2023, 16, 3963–3979. [Google Scholar] [CrossRef]
  51. Zhou, J.; Zhou, L.; Zhao, Y.; Wu, K. Significant wave height prediction based on improved fuzzy C-means clustering and bivariate kernel density estimation. Renew. Energy 2025, 245, 122787. [Google Scholar] [CrossRef]
  52. Wang, H.; Chen, S.; Zhai, W. Variational generalized nonlinear mode decomposition: Algorithm and applications. Mech. Syst. Signal Process. 2024, 206, 110913. [Google Scholar] [CrossRef]
  53. Ni, Q.; Ji, J.C.; Feng, K.; Halkon, B. A fault information-guided variational mode decomposition (FIVMD) method for rolling element bearings diagnosis. Mech. Syst. Signal Process. 2022, 164, 108216. [Google Scholar] [CrossRef]
  54. Liu, Q.; Yahyapour, R.; Murray, R. A novel clustering-forecast method with nonlinear logo information filtering networks. Int. J. Intell. Syst. 2025, 2025, 6410414. [Google Scholar] [CrossRef]
  55. Chen, Y.; Li, D.; Huang, X.; Hong, J.; Mu, C.; Wu, L.; Li, K. Exploring life warning solution of lithium-ion batteries in real-world scenarios: TCN-transformer fusion model for battery pack SOH estimation. Energy 2025, 335, 138053. [Google Scholar] [CrossRef]
  56. Wang, Z.; Zhang, H.; Li, B.; Fan, X.; Ma, Z.; Zhou, J. An IMFO-LSTM_BIGRU combined network for long-term multiple battery states prediction for electric vehicles. Energy 2024, 309, 133069. [Google Scholar] [CrossRef]
Figure 2. TMFG topological move diagram.
Figure 4. Architecture of dilated causal convolutions.
Figure 5. Schematic diagram of GRU unit.
Figure 6. Original data from three stations.
Figure 7. Error metrics and center frequencies for dataset A under different decomposition numbers.
Figure 8. Error metrics and center frequencies for dataset B under different decomposition numbers.
Figure 9. Error metrics and center frequencies for dataset C under different decomposition numbers.
Figure 10. Decomposition results of SWH for dataset A.
Figure 11. Decomposition results of SWH for dataset B.
Figure 12. Decomposition results of SWH for dataset C.
Figure 13. TMFG-based feature selection results for dataset A. The thickness of each line represents the correlation strength between variables, with thicker lines indicating stronger correlations.
Figure 14. Visualization of short-term prediction performance.
Figure 15. Comparison curves of short-term predicted values versus measured values.
Figure 16. Scatter plots comparing measured values and short-term predicted values.
Figure 17. Visualization of medium-term prediction performance.
Figure 18. Comparison curves of medium-term predicted values versus measured values.
Figure 19. Scatter plots comparing measured values and medium-term predicted values.
Figure 20. Visualization of long-term prediction performance.
Figure 21. Comparison curves of long-term predicted values versus measured values.
Figure 22. Scatter plots comparing measured values and long-term predicted values.
Figure 23. Extreme-wave-event forecasting case on dataset B.
Figure 24. Bar chart of the OVMD ablation study results.
Figure 25. Line chart of OVMD ablation results for short-term prediction.
Figure 26. Line chart of OVMD ablation results for medium-term prediction.
Figure 27. Line chart of OVMD ablation results for long-term prediction.
Figure 28. Bar chart of the TMFG ablation study results.
Figure 29. Average RMSE and NSEC comparison of different feature selection methods across forecasting horizons.
Figure 30. Bar chart of the TCN and BiGRU ablation study results.
Figure 31. Sensitivity analysis of lookback window length based on RMSE and NSEC.
Table 1. List of abbreviations and acronyms used in the paper.
| Abbreviation | Definition | Abbreviation | Definition |
|---|---|---|---|
| SWH | Significant Wave Height | EMD | Empirical Mode Decomposition |
| WDIR | Wind Direction | EEMD | Ensemble Empirical Mode Decomposition |
| WSPD | Wind Speed | TCN | Temporal Convolutional Network |
| GST | Gust Speed | BiGRU | Bidirectional Gated Recurrent Unit |
| APD | Average Wave Period | GRU | Gated Recurrent Unit |
| PRES | Atmospheric Pressure | LSTM | Long Short-Term Memory |
| ATMP | Air Temperature | SVM | Support Vector Machine |
| WTMP | Water Temperature | ANN | Artificial Neural Network |
| DEWP | Dew Point Temperature | RMSE | Root Mean Square Error |
| DPD | Dominant Wave Period | MAPE | Mean Absolute Percentage Error |
| MWD | Mean Wave Direction | MAE | Mean Absolute Error |
| TMFG | Triangulated Maximally Filtered Graph | R | Correlation Coefficient |
| OVMD | Optimal Variational Mode Decomposition | NSEC | Nash-Sutcliffe Efficiency Coefficient |
Table 2. Statistical properties of the data utilized in this study.
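| Station | Latitude/°N | Longitude/°W | Period | Depth/m | Max SWH/m | Min SWH/m |
| --- | --- | --- | --- | --- | --- | --- |
| A | 30.060 | 87.548 | 2020–2024 | 23.5 | 8.19 | 0.08 |
| B | 28.500 | 84.505 | 2020–2024 | 53.3 | 7.03 | 0.09 |
| C | 24.140 | 94.122 | 2020–2024 | 3608 | 7.69 | 0.11 |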
Table 3. Main parameters of models.
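| Model | Parameter Name | Value |
| --- | --- | --- |
| OVMD-TMFG-TCN-BiGRU, w/o OVMD, w/o TMFG, w/o TCN, w/o BiGRU | Kernel size | (3, 5) |
| | Number of filters | (25, 50) |
| | Time steps | 24 |
| | Learning rate | 0.001 |
| | Batch size | 128 |
| | Activation | ReLU |
| | Optimizer | Adam |
| EMD-TCN, TCN | Kernel size | 3 |
| | Number of filters | 24 |
| | Time steps | 24 |
| | Learning rate | 0.001 |
| | Batch size | 128 |
| | Activation | ReLU |
| | Optimizer | Adam |
| EEMD-LSTM | Number of neurons | 24 |
| | Time steps | 24 |
| | Learning rate | 0.001 |
| | Batch size | 128 |
| | Activation | Tanh |
| | Optimizer | Adam |
| BiGRU | Number of neurons | 64 |
| | Time steps | 24 |
| | Learning rate | 0.01 |
| | Batch size | 128 |
| | Activation | Tanh |
| | Optimizer | Adam |
| SVM | Regularization Parameter | 1.0 |
| | Kernel | rbf |
| | Gamma | Scale |
| | Epsilon | 0.1 |
| ANN | Number of neurons | 128 |
| | Time steps | 24 |
| | Learning rate | 0.001 |
| | Batch size | 128 |
| | Activation | ReLU |
| | Optimizer | Adam |
| Transformer | Number of layers | 2 |
| | Number of heads | 4 |
| | Time steps | 24 |
| | Learning rate | 0.001 |
| | Batch size | 128 |
| | Activation | GeLU |
| | Optimizer | Adam |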
Table 4. Results of feature selection.
| Dataset | Feature Selection Result |
| --- | --- |
| A | IMF1, IMF2, IMF3, IMF4, IMF5, IMF8, IMF9, IMF10, WSPD, GST, DPD, APD, WTMP |
| B | IMF1, IMF2, IMF3, IMF4, IMF5, IMF6, IMF7, IMF8, IMF9, IMF10, IMF11, WSPD, GST, DPD, APD |
| C | IMF1, IMF2, IMF3, IMF4, IMF5, IMF6, IMF7, IMF10, IMF11, WSPD, GST, DPD, APD, ATMP, WTMP |
Table 5. Performance metrics of short-term prediction for all models across the three stations.
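| Station | Model | Time Steps | RMSE | MAE | MAPE | R | NSEC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | OVMD-TMFG-TCN-BiGRU | 1 h | 0.0343 | 0.0228 | 0.0324 | 0.9982 | 0.9963 |
| | | 6 h | 0.0878 | 0.0574 | 0.0838 | 0.9879 | 0.9758 |
| | EMD-TCN | 1 h | 0.0659 | 0.0450 | 0.0627 | 0.9935 | 0.9864 |
| | | 6 h | 0.1014 | 0.0657 | 0.0934 | 0.9837 | 0.9677 |
| | EEMD-LSTM | 1 h | 0.0673 | 0.0459 | 0.0632 | 0.9932 | 0.9858 |
| | | 6 h | 0.1104 | 0.0789 | 0.1114 | 0.9851 | 0.9618 |
| | TCN | 1 h | 0.1073 | 0.0764 | 0.1081 | 0.9827 | 0.9639 |
| | | 6 h | 0.2673 | 0.1883 | 0.2509 | 0.8814 | 0.7759 |
| | BiGRU | 1 h | 0.1057 | 0.0896 | 0.1389 | 0.9851 | 0.9649 |
| | | 6 h | 0.2663 | 0.1872 | 0.2558 | 0.8824 | 0.7775 |
| | SVM | 1 h | 0.1447 | 0.1145 | 0.1943 | 0.9755 | 0.9343 |
| | | 6 h | 0.2778 | 0.2034 | 0.2938 | 0.8736 | 0.7579 |
| | ANN | 1 h | 0.1918 | 0.1538 | 0.2545 | 0.9484 | 0.8846 |
| | | 6 h | 0.3240 | 0.2302 | 0.3149 | 0.8193 | 0.6707 |
| | Transformer | 1 h | 0.1000 | 0.0661 | 0.0848 | 0.9863 | 0.9686 |
| | | 6 h | 0.3031 | 0.2067 | 0.2380 | 0.8835 | 0.7119 |
| | Persistence | 1 h | 0.0919 | 0.0581 | 0.0693 | 0.9867 | 0.9735 |
| | | 6 h | 0.2691 | 0.1815 | 0.2205 | 0.8864 | 0.7728 |
| B | OVMD-TMFG-TCN-BiGRU | 1 h | 0.0376 | 0.0257 | 0.0301 | 0.9985 | 0.9967 |
| | | 6 h | 0.1166 | 0.0822 | 0.0934 | 0.9851 | 0.9684 |
| | EMD-TCN | 1 h | 0.0576 | 0.0358 | 0.0394 | 0.9963 | 0.9923 |
| | | 6 h | 0.1484 | 0.0992 | 0.1114 | 0.9741 | 0.9488 |
| | EEMD-LSTM | 1 h | 0.0544 | 0.0342 | 0.038 | 0.9966 | 0.9931 |
| | | 6 h | 0.1503 | 0.0996 | 0.1076 | 0.9741 | 0.9474 |
| | TCN | 1 h | 0.1035 | 0.0789 | 0.1069 | 0.9907 | 0.9751 |
| | | 6 h | 0.3028 | 0.2108 | 0.2675 | 0.8879 | 0.7867 |
| | BiGRU | 1 h | 0.1337 | 0.1013 | 0.1468 | 0.9808 | 0.9584 |
| | | 6 h | 0.3002 | 0.2042 | 0.255 | 0.8906 | 0.7905 |
| | SVM | 1 h | 0.1749 | 0.1283 | 0.1846 | 0.9688 | 0.9289 |
| | | 6 h | 0.3209 | 0.2176 | 0.2801 | 0.8785 | 0.7605 |
| | ANN | 1 h | 0.1847 | 0.126 | 0.1484 | 0.9628 | 0.9206 |
| | | 6 h | 0.3578 | 0.241 | 0.2807 | 0.8394 | 0.7023 |
| | Transformer | 1 h | 0.1572 | 0.1107 | 0.1390 | 0.9741 | 0.9425 |
| | | 6 h | 0.3201 | 0.2207 | 0.2455 | 0.9016 | 0.7618 |
| | Persistence | 1 h | 0.1033 | 0.0645 | 0.0651 | 0.9876 | 0.9752 |
| | | 6 h | 0.3246 | 0.2092 | 0.2156 | 0.8774 | 0.7548 |
| C | OVMD-TMFG-TCN-BiGRU | 1 h | 0.0428 | 0.0302 | 0.0322 | 0.9985 | 0.9964 |
| | | 6 h | 0.0958 | 0.0647 | 0.0640 | 0.9918 | 0.9819 |
| | EMD-TCN | 1 h | 0.0648 | 0.0408 | 0.0367 | 0.9962 | 0.9917 |
| | | 6 h | 0.1285 | 0.0794 | 0.0689 | 0.9848 | 0.9674 |
| | EEMD-LSTM | 1 h | 0.0811 | 0.0569 | 0.0579 | 0.9941 | 0.9870 |
| | | 6 h | 0.1084 | 0.0684 | 0.0630 | 0.9884 | 0.9768 |
| | TCN | 1 h | 0.1377 | 0.0864 | 0.0803 | 0.9816 | 0.9626 |
| | | 6 h | 0.3309 | 0.2071 | 0.1717 | 0.8973 | 0.7840 |
| | BiGRU | 1 h | 0.1388 | 0.0924 | 0.0821 | 0.9837 | 0.9620 |
| | | 6 h | 0.3214 | 0.2057 | 0.1778 | 0.9115 | 0.7963 |
| | SVM | 1 h | 0.1982 | 0.1359 | 0.1379 | 0.9627 | 0.9225 |
| | | 6 h | 0.3343 | 0.2091 | 0.1891 | 0.8898 | 0.7795 |
| | ANN | 1 h | 0.2018 | 0.1437 | 0.1410 | 0.9596 | 0.9196 |
| | | 6 h | 0.3663 | 0.2377 | 0.2070 | 0.8761 | 0.7353 |
| | Transformer | 1 h | 0.1365 | 0.0942 | 0.0940 | 0.9846 | 0.9633 |
| | | 6 h | 0.2992 | 0.1881 | 0.1634 | 0.9119 | 0.8234 |
| | Persistence | 1 h | 0.1100 | 0.0682 | 0.0557 | 0.9881 | 0.9761 |
| | | 6 h | 0.2882 | 0.1736 | 0.1445 | 0.9180 | 0.8361 |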
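The five scores reported in Table 5 can be computed from paired observed and predicted SWH series. The following is a minimal Python sketch using the standard textbook definitions. The function names are illustrative, and treating MAPE as a fraction rather than a percentage is an assumption based on the magnitude of the tabulated values, not something the table states.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error as a fraction (assumes no zero observations)."""
    return sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def nsec(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean of the observations."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


# Toy values for illustration only; these are not taken from the buoy data.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
    "R": pearson_r(observed, predicted),
    "NSEC": nsec(observed, predicted),
}
```

Under these definitions, NSEC falls below zero whenever the squared forecast error exceeds the variance of the observations around their mean.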
Table 6. Performance metrics of medium-term prediction for all models across the three stations.
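| Station | Model | Time Steps | RMSE | MAE | MAPE | R | NSEC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | OVMD-TMFG-TCN-BiGRU | 12 h | 0.1356 | 0.0901 | 0.127 | 0.9722 | 0.9424 |
| | | 24 h | 0.2305 | 0.1433 | 0.1903 | 0.9151 | 0.8334 |
| | EMD-TCN | 12 h | 0.1807 | 0.1287 | 0.1696 | 0.9564 | 0.8976 |
| | | 24 h | 0.2730 | 0.1845 | 0.2225 | 0.8941 | 0.7664 |
| | EEMD-LSTM | 12 h | 0.1887 | 0.1361 | 0.1812 | 0.9563 | 0.8884 |
| | | 24 h | 0.2808 | 0.1833 | 0.2213 | 0.8877 | 0.7529 |
| | TCN | 12 h | 0.3754 | 0.2653 | 0.3721 | 0.7502 | 0.5581 |
| | | 24 h | 0.4798 | 0.3297 | 0.4486 | 0.5427 | 0.2784 |
| | BiGRU | 12 h | 0.3752 | 0.2680 | 0.3832 | 0.7504 | 0.5586 |
| | | 24 h | 0.4807 | 0.3244 | 0.4198 | 0.5471 | 0.2758 |
| | SVM | 12 h | 0.3768 | 0.2650 | 0.3723 | 0.7458 | 0.5548 |
| | | 24 h | 0.4880 | 0.3399 | 0.4769 | 0.5234 | 0.2536 |
| | ANN | 12 h | 0.4015 | 0.2762 | 0.3767 | 0.7035 | 0.4945 |
| | | 24 h | 0.5092 | 0.3787 | 0.5770 | 0.4701 | 0.1872 |
| | Transformer | 12 h | 0.4063 | 0.2630 | 0.2828 | 0.7403 | 0.4823 |
| | | 24 h | 0.4677 | 0.3092 | 0.3667 | 0.6004 | 0.3143 |
| | Persistence | 12 h | 0.4109 | 0.2746 | 0.3396 | 0.7353 | 0.4707 |
| | | 24 h | 0.5797 | 0.3971 | 0.5015 | 0.4738 | −0.0529 |
| B | OVMD-TMFG-TCN-BiGRU | 12 h | 0.1651 | 0.1212 | 0.1454 | 0.9714 | 0.9366 |
| | | 24 h | 0.2216 | 0.1573 | 0.2027 | 0.9446 | 0.8858 |
| | EMD-TCN | 12 h | 0.2063 | 0.1451 | 0.1598 | 0.9551 | 0.9010 |
| | | 24 h | 0.2893 | 0.2194 | 0.2718 | 0.9210 | 0.8054 |
| | EEMD-LSTM | 12 h | 0.1914 | 0.1361 | 0.1561 | 0.9599 | 0.9148 |
| | | 24 h | 0.2811 | 0.2141 | 0.2619 | 0.9223 | 0.8163 |
| | TCN | 12 h | 0.4265 | 0.3034 | 0.3950 | 0.7630 | 0.5771 |
| | | 24 h | 0.5394 | 0.3902 | 0.5409 | 0.5746 | 0.3237 |
| | BiGRU | 12 h | 0.4308 | 0.2956 | 0.3725 | 0.7553 | 0.5684 |
| | | 24 h | 0.5346 | 0.3835 | 0.5402 | 0.5901 | 0.3356 |
| | SVM | 12 h | 0.4277 | 0.2829 | 0.3522 | 0.7642 | 0.5746 |
| | | 24 h | 0.5420 | 0.3562 | 0.4357 | 0.5786 | 0.3171 |
| | ANN | 12 h | 0.4265 | 0.3034 | 0.3950 | 0.7630 | 0.5771 |
| | | 24 h | 0.5394 | 0.3902 | 0.5409 | 0.5746 | 0.3237 |
| | Transformer | 12 h | 0.4591 | 0.2979 | 0.2967 | 0.7673 | 0.5099 |
| | | 24 h | 0.5732 | 0.3649 | 0.3657 | 0.5708 | 0.2363 |
| | Persistence | 12 h | 0.4823 | 0.3167 | 0.3355 | 0.7296 | 0.4592 |
| | | 24 h | 0.6587 | 0.4305 | 0.4747 | 0.4962 | −0.0074 |
| C | OVMD-TMFG-TCN-BiGRU | 12 h | 0.1352 | 0.0849 | 0.0770 | 0.9818 | 0.9640 |
| | | 24 h | 0.2089 | 0.1384 | 0.1273 | 0.9566 | 0.9139 |
| | EMD-TCN | 12 h | 0.1695 | 0.1130 | 0.1074 | 0.9719 | 0.9433 |
| | | 24 h | 0.3210 | 0.1962 | 0.1928 | 0.8951 | 0.7969 |
| | EEMD-LSTM | 12 h | 0.1682 | 0.1095 | 0.1004 | 0.9756 | 0.9442 |
| | | 24 h | 0.2964 | 0.1826 | 0.1912 | 0.9163 | 0.8268 |
| | TCN | 12 h | 0.4236 | 0.2665 | 0.2331 | 0.8168 | 0.6461 |
| | | 24 h | 0.5570 | 0.3520 | 0.3089 | 0.6382 | 0.3883 |
| | BiGRU | 12 h | 0.4206 | 0.2572 | 0.2154 | 0.8275 | 0.6511 |
| | | 24 h | 0.5561 | 0.3461 | 0.2804 | 0.6782 | 0.3904 |
| | SVM | 12 h | 0.4476 | 0.2763 | 0.2413 | 0.7904 | 0.6050 |
| | | 24 h | 0.5839 | 0.3621 | 0.3025 | 0.5968 | 0.3280 |
| | ANN | 12 h | 0.4672 | 0.2952 | 0.2565 | 0.7625 | 0.5695 |
| | | 24 h | 0.5853 | 0.3674 | 0.3031 | 0.6117 | 0.3246 |
| | Transformer | 12 h | 0.4324 | 0.2648 | 0.2275 | 0.8142 | 0.6313 |
| | | 24 h | 0.5921 | 0.3736 | 0.3183 | 0.5818 | 0.3089 |
| | Persistence | 12 h | 0.4202 | 0.2613 | 0.2194 | 0.8259 | 0.6519 |
| | | 24 h | 0.5839 | 0.3687 | 0.3132 | 0.6642 | 0.3292 |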
Table 7. Performance metrics of long-term prediction for all models across the three stations.
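| Station | Model | Time Steps | RMSE | MAE | MAPE | R | NSEC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | OVMD-TMFG-TCN-BiGRU | 48 h | 0.3479 | 0.2462 | 0.3309 | 0.7956 | 0.6208 |
| | EMD-TCN | 48 h | 0.3903 | 0.2736 | 0.3641 | 0.7281 | 0.5226 |
| | EEMD-LSTM | 48 h | 0.3974 | 0.2758 | 0.3691 | 0.7188 | 0.5051 |
| | TCN | 48 h | 0.5469 | 0.4208 | 0.6615 | 0.3220 | 0.0629 |
| | BiGRU | 48 h | 0.5391 | 0.4052 | 0.6187 | 0.3314 | 0.0893 |
| | SVM | 48 h | 0.5539 | 0.4073 | 0.5716 | 0.3095 | 0.0386 |
| | ANN | 48 h | 0.5500 | 0.4244 | 0.6737 | 0.2694 | 0.0523 |
| | Transformer | 48 h | 0.5425 | 0.3897 | 0.5373 | 0.3065 | 0.0778 |
| | Persistence | 48 h | 0.7346 | 0.5341 | 0.7296 | 0.1572 | −0.6865 |
| B | OVMD-TMFG-TCN-BiGRU | 48 h | 0.3379 | 0.2458 | 0.3332 | 0.8583 | 0.7348 |
| | EMD-TCN | 48 h | 0.3945 | 0.2779 | 0.3729 | 0.8107 | 0.6385 |
| | EEMD-LSTM | 48 h | 0.3942 | 0.2864 | 0.3627 | 0.8022 | 0.6391 |
| | TCN | 48 h | 0.6347 | 0.4574 | 0.6376 | 0.2694 | 0.0642 |
| | BiGRU | 48 h | 0.6197 | 0.4447 | 0.6230 | 0.3289 | 0.1079 |
| | SVM | 48 h | 0.6412 | 0.4359 | 0.5229 | 0.2902 | 0.0450 |
| | ANN | 48 h | 0.6363 | 0.4581 | 0.6334 | 0.2682 | 0.0595 |
| | Transformer | 48 h | 0.6311 | 0.4496 | 0.6072 | 0.2978 | 0.0748 |
| | Persistence | 48 h | 0.8245 | 0.5770 | 0.7063 | 0.2122 | −0.5754 |
| C | OVMD-TMFG-TCN-BiGRU | 48 h | 0.3759 | 0.2469 | 0.2345 | 0.8476 | 0.7174 |
| | EMD-TCN | 48 h | 0.4487 | 0.2738 | 0.2754 | 0.7900 | 0.5973 |
| | EEMD-LSTM | 48 h | 0.4564 | 0.2924 | 0.2622 | 0.7731 | 0.5833 |
| | TCN | 48 h | 0.6673 | 0.4355 | 0.3771 | 0.4196 | 0.1230 |
| | BiGRU | 48 h | 0.6810 | 0.4399 | 0.3618 | 0.4695 | 0.0865 |
| | SVM | 48 h | 0.6769 | 0.4510 | 0.3999 | 0.3792 | 0.0976 |
| | ANN | 48 h | 0.6998 | 0.4803 | 0.4101 | 0.4228 | 0.0354 |
| | Transformer | 48 h | 0.6792 | 0.4509 | 0.3846 | 0.4035 | 0.0915 |
| | Persistence | 48 h | 0.7551 | 0.5041 | 0.4586 | 0.4393 | −0.1188 |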
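The Persistence rows in Table 7 carry negative NSEC values, which means the persistence forecast has a larger squared error than a constant forecast at the mean of the observations. The sketch below illustrates how this can arise. It assumes the usual definition of persistence (the value at time t is reused as the forecast for t + h) and a series sampled at one step per hour; both are assumptions here, and the function names are illustrative.

```python
def persistence_forecast(series, horizon):
    """Pair each observation at t + horizon with the value at t used as its forecast."""
    if horizon < 1 or horizon >= len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    observed = series[horizon:]
    predicted = series[:-horizon]
    return observed, predicted


def nsec(obs, pred):
    """Nash-Sutcliffe efficiency coefficient."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


# Toy oscillating series (not buoy data): persistence is always out of phase
# at a one-step horizon, so its error exceeds the variance of the observations.
toy_series = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
obs_1, pred_1 = persistence_forecast(toy_series, horizon=1)
obs_2, pred_2 = persistence_forecast(toy_series, horizon=2)
nsec_1 = nsec(obs_1, pred_1)  # negative: worse than predicting the mean
nsec_2 = nsec(obs_2, pred_2)  # 1.0: the toy series repeats every two steps
```

The toy series is deliberately artificial. For real SWH records the skill of persistence typically decays as the horizon grows, which is consistent with the pattern in Tables 5 to 7.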
Table 8. Forecasting accuracy on storm-peak samples of dataset B.
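| Forecast Horizon | Model | RMSE_Peak | MAE_Peak |
| --- | --- | --- | --- |
| 1 h | OVMD-TMFG-TCN-BiGRU | 0.0971 | 0.0699 |
| | EMD-TCN | 0.1525 | 0.1102 |
| | TCN | 0.2411 | 0.1734 |
| | Persistence | 0.2831 | 0.2114 |
| 6 h | OVMD-TMFG-TCN-BiGRU | 0.2555 | 0.1956 |
| | EMD-TCN | 0.3670 | 0.2806 |
| | TCN | 0.7969 | 0.6248 |
| | Persistence | 0.8890 | 0.6895 |
| 12 h | OVMD-TMFG-TCN-BiGRU | 0.3312 | 0.2559 |
| | EMD-TCN | 0.5015 | 0.3987 |
| | TCN | 1.1785 | 0.9492 |
| | Persistence | 1.3203 | 1.0630 |
| 24 h | OVMD-TMFG-TCN-BiGRU | 0.4676 | 0.3461 |
| | EMD-TCN | 0.5778 | 0.4598 |
| | TCN | 1.5950 | 1.3697 |
| | Persistence | 1.6696 | 1.3750 |
| 48 h | OVMD-TMFG-TCN-BiGRU | 0.7775 | 0.6156 |
| | EMD-TCN | 0.8791 | 0.6655 |
| | TCN | 1.9720 | 1.7965 |
| | Persistence | 1.9173 | 1.6680 |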
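Peak-restricted scores such as RMSE_Peak and MAE_Peak in Table 8 are typically obtained by evaluating errors only on samples where the observed SWH is high. The sketch below shows one way to do this. The selection rule (observed value at or above a fixed threshold) and the threshold value are hypothetical; the table does not state how the storm-peak samples were chosen.

```python
import math


def peak_metrics(obs, pred, threshold):
    """Return (RMSE, MAE) over samples whose observed value is >= threshold.

    The threshold-based selection is an illustrative assumption, not the
    paper's documented procedure.
    """
    pairs = [(o, p) for o, p in zip(obs, pred) if o >= threshold]
    if not pairs:
        raise ValueError("no samples at or above the threshold")
    rmse_peak = math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))
    mae_peak = sum(abs(o - p) for o, p in pairs) / len(pairs)
    return rmse_peak, mae_peak


# Toy values in metres, for illustration only.
observed = [0.5, 1.0, 3.0, 4.0]
predicted = [0.6, 1.1, 2.5, 3.0]
rmse_peak, mae_peak = peak_metrics(observed, predicted, threshold=2.5)
```

In this toy case the two low-wave samples are forecast well and the two high-wave samples are underpredicted, so the peak-only errors are much larger than the errors over the full series.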
Table 9. Summary of Wilcoxon signed-rank test results across forecasting horizons.
| Benchmark Model | Mean Z-Value | Z-Value Range | p-Value Summary |
| --- | --- | --- | --- |
| EMD-TCN | −34.2893 | −47.5312 to −13.4611 | <0.001 for all horizons |
| EEMD-LSTM | −41.1268 | −67.2927 to −20.4219 | <0.001 for all horizons |
| TCN | −88.2854 | −100.1076 to −65.6662 | <0.001 for all horizons |
| BiGRU | −88.2287 | −103.1514 to −67.2136 | <0.001 for all horizons |
| SVM | −91.734 | −114.7246 to −70.9846 | <0.001 for all horizons |
| ANN | −94.6545 | −115.3260 to −65.5838 | <0.001 for all horizons |
| Transformer | −87.0389 | −104.0576 to −67.9830 | <0.001 for all horizons |
| Persistence | −84.0683 | −90.1019 to −77.1228 | <0.001 for all horizons |
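Z-values like those in Table 9 usually come from the large-sample normal approximation of the Wilcoxon signed-rank statistic. The sketch below computes that approximation in plain Python. It assumes the paired samples are per-sample absolute errors of the proposed model and of a benchmark, and that the difference is taken as proposed minus benchmark; neither the pairing nor the sign convention is stated in the table. Tied absolute differences receive average ranks, but no tie correction is applied to the variance.

```python
import math


def wilcoxon_z(errors_a, errors_b):
    """Normal-approximation Z for the Wilcoxon signed-rank test on paired samples."""
    # Zero differences are discarded, as in the classical formulation.
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum of ranks belonging to positive differences, then standardize.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean_w) / sd_w


# Toy errors (not from the paper): model A is better on every sample.
errors_model_a = [0.1 * i for i in range(1, 11)]
errors_model_b = [0.1 * i + 0.01 * i for i in range(1, 11)]
z_value = wilcoxon_z(errors_model_a, errors_model_b)
```

With this sign convention, a strongly negative Z indicates that the first model's errors are systematically smaller than the second's. The magnitude grows with sample size, so large test sets can produce very large absolute Z-values.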
Table 10. Ablation results of OVMD.
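| Station | Time Steps | Model | RMSE | MAE | MAPE | R | NSEC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | 1 h | OVMD-TMFG-TCN-BiGRU | 0.0343 | 0.0228 | 0.0324 | 0.9982 | 0.9963 |
| | | w/o OVMD | 0.0917 | 0.0640 | 0.0869 | 0.9885 | 0.9736 |
| | 6 h | OVMD-TMFG-TCN-BiGRU | 0.0878 | 0.0574 | 0.0838 | 0.9879 | 0.9758 |
| | | w/o OVMD | 0.2456 | 0.1805 | 0.2640 | 0.9045 | 0.8108 |
| | 12 h | OVMD-TMFG-TCN-BiGRU | 0.1356 | 0.0901 | 0.1270 | 0.9722 | 0.9424 |
| | | w/o OVMD | 0.3568 | 0.2420 | 0.3143 | 0.7791 | 0.6009 |
| | 24 h | OVMD-TMFG-TCN-BiGRU | 0.2426 | 0.1538 | 0.1877 | 0.9146 | 0.8155 |
| | | w/o OVMD | 0.4656 | 0.3172 | 0.4133 | 0.5862 | 0.3206 |
| | 48 h | OVMD-TMFG-TCN-BiGRU | 0.3479 | 0.2462 | 0.3309 | 0.7956 | 0.6208 |
| | | w/o OVMD | 0.5394 | 0.3965 | 0.5784 | 0.3304 | 0.0881 |
| B | 1 h | OVMD-TMFG-TCN-BiGRU | 0.0376 | 0.0257 | 0.0301 | 0.9985 | 0.9967 |
| | | w/o OVMD | 0.0976 | 0.0709 | 0.0915 | 0.9907 | 0.9778 |
| | 6 h | OVMD-TMFG-TCN-BiGRU | 0.1166 | 0.0822 | 0.0934 | 0.9851 | 0.9684 |
| | | w/o OVMD | 0.2895 | 0.2043 | 0.2672 | 0.9026 | 0.8050 |
| | 12 h | OVMD-TMFG-TCN-BiGRU | 0.1651 | 0.1212 | 0.1454 | 0.9714 | 0.9366 |
| | | w/o OVMD | 0.4311 | 0.3109 | 0.4232 | 0.7635 | 0.5678 |
| | 24 h | OVMD-TMFG-TCN-BiGRU | 0.2216 | 0.1573 | 0.2027 | 0.9446 | 0.8858 |
| | | w/o OVMD | 0.5226 | 0.3731 | 0.5176 | 0.6110 | 0.3651 |
| | 48 h | OVMD-TMFG-TCN-BiGRU | 0.3379 | 0.2458 | 0.3332 | 0.8583 | 0.7348 |
| | | w/o OVMD | 0.6271 | 0.4479 | 0.6144 | 0.2958 | 0.0864 |
| C | 1 h | OVMD-TMFG-TCN-BiGRU | 0.0428 | 0.0302 | 0.0322 | 0.9985 | 0.9964 |
| | | w/o OVMD | 0.1092 | 0.0723 | 0.0640 | 0.9887 | 0.9765 |
| | 6 h | OVMD-TMFG-TCN-BiGRU | 0.0958 | 0.0647 | 0.0640 | 0.9918 | 0.9819 |
| | | w/o OVMD | 0.2921 | 0.1797 | 0.1491 | 0.9217 | 0.8317 |
| | 12 h | OVMD-TMFG-TCN-BiGRU | 0.1352 | 0.0849 | 0.0770 | 0.9818 | 0.9640 |
| | | w/o OVMD | 0.4010 | 0.2460 | 0.2154 | 0.8363 | 0.6829 |
| | 24 h | OVMD-TMFG-TCN-BiGRU | 0.2089 | 0.1384 | 0.1273 | 0.9566 | 0.9139 |
| | | w/o OVMD | 0.5337 | 0.3579 | 0.3517 | 0.6806 | 0.4385 |
| | 48 h | OVMD-TMFG-TCN-BiGRU | 0.3759 | 0.2469 | 0.2345 | 0.8476 | 0.7174 |
| | | w/o OVMD | 0.6513 | 0.4353 | 0.3997 | 0.4621 | 0.1645 |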
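One way to read the ablation results in Table 10 is as a relative RMSE reduction of the full model over the variant without OVMD. The sketch below computes that percentage for station A using the RMSE values from the table. The helper name and the choice of relative reduction as the summary statistic are illustrative, not taken from the paper.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Station A RMSE values copied from Table 10.
rmse_without_ovmd = {"1 h": 0.0917, "6 h": 0.2456, "12 h": 0.3568, "24 h": 0.4656, "48 h": 0.5394}
rmse_full_model = {"1 h": 0.0343, "6 h": 0.0878, "12 h": 0.1356, "24 h": 0.2426, "48 h": 0.3479}

reduction_percent = {
    horizon: relative_reduction(rmse_without_ovmd[horizon], rmse_full_model[horizon])
    for horizon in rmse_without_ovmd
}
```

For station A this gives a reduction of about 62.6% at 1 h and about 35.5% at 48 h, so the relative benefit of the decomposition stage narrows as the horizon lengthens, even though the absolute RMSE gap widens.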
Table 11. Ablation results of TMFG.
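| Station | Time Steps | Model | RMSE | MAE | MAPE | R | NSEC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | 1 h | OVMD-TMFG-TCN-BiGRU | 0.0343 | 0.0228 | 0.0324 | 0.9982 | 0.9963 |
| | | w/o TMFG | 0.0581 | 0.0400 | 0.0561 | 0.9948 | 0.9894 |
| | 6 h | OVMD-TMFG-TCN-BiGRU | 0.0878 | 0.0574 | 0.0838 | 0.9879 | 0.9758 |
| | | w/o TMFG | 0.1011 | 0.0702 | 0.0985 | 0.9868 | 0.9679 |
| | 12 h | OVMD-TMFG-TCN-BiGRU | 0.1356 | 0.0901 | 0.1270 | 0.9722 | 0.9424 |
| | | w/o TMFG | 0.1968 | 0.1374 | 0.1766 | 0.9449 | 0.8785 |
| | 24 h | OVMD-TMFG-TCN-BiGRU | 0.2426 | 0.1538 | 0.1877 | 0.9146 | 0.8155 |
| | | w/o TMFG | 0.2574 | 0.1637 | 0.2021 | 0.9003 | 0.7924 |
| | 48 h | OVMD-TMFG-TCN-BiGRU | 0.3479 | 0.2462 | 0.3309 | 0.7956 | 0.6208 |
| | | w/o TMFG | 0.3679 | 0.2627 | 0.3729 | 0.7675 | 0.5758 |
| B | 1 h | OVMD-TMFG-TCN-BiGRU | 0.0376 | 0.0257 | 0.0301 | 0.9985 | 0.9967 |
| | | w/o TMFG | 0.0453 | 0.0297 | 0.0333 | 0.9976 | 0.9952 |
| | 6 h | OVMD-TMFG-TCN-BiGRU | 0.1166 | 0.0822 | 0.0934 | 0.9851 | 0.9684 |
| | | w/o TMFG | 0.1358 | 0.0931 | 0.1094 | 0.9799 | 0.9571 |
| | 12 h | OVMD-TMFG-TCN-BiGRU | 0.1651 | 0.1212 | 0.1454 | 0.9714 | 0.9366 |
| | | w/o TMFG | 0.1848 | 0.1304 | 0.1628 | 0.9617 | 0.9206 |
| | 24 h | OVMD-TMFG-TCN-BiGRU | 0.2216 | 0.1573 | 0.2027 | 0.9446 | 0.8858 |
| | | w/o TMFG | 0.2498 | 0.1775 | 0.2303 | 0.9324 | 0.8550 |
| | 48 h | OVMD-TMFG-TCN-BiGRU | 0.3379 | 0.2458 | 0.3332 | 0.8583 | 0.7348 |
| | | w/o TMFG | 0.3902 | 0.2797 | 0.3427 | 0.8096 | 0.6464 |
| C | 1 h | OVMD-TMFG-TCN-BiGRU | 0.0428 | 0.0302 | 0.0322 | 0.9985 | 0.9964 |
| | | w/o TMFG | 0.0516 | 0.0322 | 0.0297 | 0.9978 | 0.9947 |
| | 6 h | OVMD-TMFG-TCN-BiGRU | 0.0958 | 0.0647 | 0.0640 | 0.9918 | 0.9819 |
| | | w/o TMFG | 0.1043 | 0.0647 | 0.0583 | 0.9896 | 0.9785 |
| | 12 h | OVMD-TMFG-TCN-BiGRU | 0.1352 | 0.0849 | 0.0770 | 0.9818 | 0.9640 |
| | | w/o TMFG | 0.1415 | 0.0896 | 0.0823 | 0.9801 | 0.9605 |
| | 24 h | OVMD-TMFG-TCN-BiGRU | 0.2089 | 0.1384 | 0.1273 | 0.9566 | 0.9139 |
| | | w/o TMFG | 0.2536 | 0.1693 | 0.1649 | 0.9365 | 0.8732 |
| | 48 h | OVMD-TMFG-TCN-BiGRU | 0.3759 | 0.2469 | 0.2345 | 0.8476 | 0.7174 |
| | | w/o TMFG | 0.4310 | 0.2684 | 0.2645 | 0.8115 | 0.6284 |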
Table 12. Ablation results of TCN and BiGRU.
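| Station | Time Steps | Model | RMSE | MAE | MAPE | R | NSEC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | 1 h | OVMD-TMFG-TCN-BiGRU | 0.0343 | 0.0228 | 0.0324 | 0.9982 | 0.9963 |
| | | w/o TCN | 0.0451 | 0.0305 | 0.0405 | 0.9969 | 0.9936 |
| | | w/o BiGRU | 0.0440 | 0.0298 | 0.0392 | 0.9970 | 0.9939 |
| | 6 h | OVMD-TMFG-TCN-BiGRU | 0.0878 | 0.0574 | 0.0838 | 0.9879 | 0.9758 |
| | | w/o TCN | 0.0967 | 0.0640 | 0.0905 | 0.9853 | 0.9706 |
| | | w/o BiGRU | 0.0978 | 0.0674 | 0.0951 | 0.9876 | 0.9700 |
| | 12 h | OVMD-TMFG-TCN-BiGRU | 0.1356 | 0.0901 | 0.1270 | 0.9722 | 0.9424 |
| | | w/o TCN | 0.1597 | 0.1118 | 0.1658 | 0.9594 | 0.9200 |
| | | w/o BiGRU | 0.1735 | 0.1248 | 0.1679 | 0.9594 | 0.9056 |
| | 24 h | OVMD-TMFG-TCN-BiGRU | 0.2426 | 0.1538 | 0.1877 | 0.9146 | 0.8155 |
| | | w/o TCN | 0.2666 | 0.1741 | 0.2122 | 0.8932 | 0.7773 |
| | | w/o BiGRU | 0.2524 | 0.1616 | 0.2022 | 0.9034 | 0.8004 |
| | 48 h | OVMD-TMFG-TCN-BiGRU | 0.3479 | 0.2462 | 0.3309 | 0.7956 | 0.6208 |
| | | w/o TCN | 0.3838 | 0.2588 | 0.3159 | 0.7566 | 0.5385 |
| | | w/o BiGRU | 0.3793 | 0.2765 | 0.4111 | 0.7445 | 0.5492 |
| B | 1 h | OVMD-TMFG-TCN-BiGRU | 0.0376 | 0.0257 | 0.0301 | 0.9985 | 0.9967 |
| | | w/o BiGRU | 0.0398 | 0.0290 | 0.0389 | 0.9985 | 0.9963 |
| | | w/o TCN | 0.0508 | 0.0331 | 0.0365 | 0.9970 | 0.9940 |
| | 6 h | OVMD-TMFG-TCN-BiGRU | 0.1166 | 0.0822 | 0.0934 | 0.9851 | 0.9684 |
| | | w/o TCN | 0.1288 | 0.0878 | 0.1017 | 0.9825 | 0.9614 |
| | | w/o BiGRU | 0.1405 | 0.0937 | 0.1048 | 0.9780 | 0.9541 |
| | 12 h | OVMD-TMFG-TCN-BiGRU | 0.1651 | 0.1212 | 0.1454 | 0.9714 | 0.9366 |
| | | w/o TCN | 0.1906 | 0.1335 | 0.1532 | 0.9591 | 0.9155 |
| | | w/o BiGRU | 0.1907 | 0.1316 | 0.1562 | 0.9571 | 0.9154 |
| | 24 h | OVMD-TMFG-TCN-BiGRU | 0.2216 | 0.1573 | 0.2027 | 0.9446 | 0.8858 |
| | | w/o TCN | 0.2609 | 0.1889 | 0.2217 | 0.9267 | 0.8418 |
| | | w/o BiGRU | 0.2650 | 0.1950 | 0.2334 | 0.9252 | 0.8368 |
| | 48 h | OVMD-TMFG-TCN-BiGRU | 0.3379 | 0.2458 | 0.3332 | 0.8583 | 0.7348 |
| | | w/o TCN | 0.3615 | 0.2644 | 0.3099 | 0.8630 | 0.6964 |
| | | w/o BiGRU | 0.3853 | 0.2662 | 0.3277 | 0.8131 | 0.6551 |
| C | 1 h | OVMD-TMFG-TCN-BiGRU | 0.0428 | 0.0302 | 0.0322 | 0.9985 | 0.9964 |
| | | w/o TCN | 0.0469 | 0.0359 | 0.0418 | 0.9985 | 0.9957 |
| | | w/o BiGRU | 0.0542 | 0.0346 | 0.0303 | 0.9971 | 0.9942 |
| | 6 h | OVMD-TMFG-TCN-BiGRU | 0.0958 | 0.0647 | 0.0640 | 0.9918 | 0.9819 |
| | | w/o TCN | 0.1013 | 0.0715 | 0.0756 | 0.9912 | 0.9798 |
| | | w/o BiGRU | 0.1246 | 0.0822 | 0.0734 | 0.9854 | 0.9694 |
| | 12 h | OVMD-TMFG-TCN-BiGRU | 0.1352 | 0.0849 | 0.0770 | 0.9818 | 0.9640 |
| | | w/o TCN | 0.1433 | 0.0932 | 0.0868 | 0.9800 | 0.9595 |
| | | w/o BiGRU | 0.1481 | 0.0946 | 0.0858 | 0.9784 | 0.9567 |
| | 24 h | OVMD-TMFG-TCN-BiGRU | 0.2089 | 0.1384 | 0.1273 | 0.9566 | 0.9139 |
| | | w/o TCN | 0.2235 | 0.1435 | 0.1319 | 0.9520 | 0.9015 |
| | | w/o BiGRU | 0.2869 | 0.1943 | 0.1938 | 0.9184 | 0.8377 |
| | 48 h | OVMD-TMFG-TCN-BiGRU | 0.3759 | 0.2469 | 0.2345 | 0.8476 | 0.7174 |
| | | w/o TCN | 0.4177 | 0.2655 | 0.2616 | 0.8195 | 0.6511 |
| | | w/o BiGRU | 0.4093 | 0.2670 | 0.2367 | 0.8284 | 0.6649 |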
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Z.; Shi, G.; Lv, M.; Wu, T.; Wang, X. Multi-Horizon Significant Wave Height Forecasting with Multiscale Decomposition and Topological Feature Selection. J. Mar. Sci. Eng. 2026, 14, 1095. https://doi.org/10.3390/jmse14121095
