Significant Wave Height Prediction Using LSTM Augmented by Singular Spectrum Analysis and Residual Correction

Ning, Chunlin; Li, Huanyong; Wang, Zongsheng; Li, Chao; Zeng, Lingkun; Shao, Wenmiao; Nie, Shiqiang

doi:10.3390/jmse13091635


by Chunlin Ning ¹,²,³,⁴,⁵,*; Huanyong Li ¹,²; Zongsheng Wang ¹; Chao Li ²,³,⁴,⁵; Lingkun Zeng ²,⁶; Wenmiao Shao ²,⁷; and Shiqiang Nie ²,⁷

¹ College of Ocean Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
² First Institute of Oceanography, Ministry of Natural Resources, Qingdao 266061, China
³ Key Laboratory of Marine Science and Numerical Modeling, Ministry of Natural Resources, Qingdao 266061, China
⁴ Shandong Key Laboratory of Marine Science and Numerical Modeling, Qingdao 266061, China
⁵ Laboratory for Regional Oceanography and Numerical Modeling, Qingdao Marine Science and Technology Center, Qingdao 266237, China
⁶ College of Oceanography and Space Informatics, China University of Petroleum, Qingdao 266580, China
⁷ Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, China
* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(9), 1635; https://doi.org/10.3390/jmse13091635

Submission received: 30 June 2025 / Revised: 10 August 2025 / Accepted: 11 August 2025 / Published: 27 August 2025

(This article belongs to the Section Physical Oceanography)


Abstract

Significant wave height (SWH) is a key physical parameter influencing the safety of shipping, fisheries, and marine engineering projects, and is closely related to climate change and marine disasters. Existing models struggle to balance a high prediction accuracy with low parameter counts, and are challenging to deploy on platforms such as buoys. To address these issues, this study proposes an innovative method for SWH prediction by combining Singular Spectrum Analysis (SSA) with a residual correction mechanism in a Long Short-Term Memory (LSTM) network. This method utilizes SSA to decompose SWH time series, accurately extracting its main feature modes as inputs to the LSTM network and significantly enhancing the model’s ability to capture time-series data. Additionally, a residual correction module is introduced to fine-tune the prediction results, effectively improving the model’s 12 h forecasting accuracy. The experimental results show that for 1, 3, 6, and 12 h SWH predictions, by incorporating SSA and the residual correction module, the model reduces the Mean Squared Error (MSE), Root-Mean-Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) by 60–95%, and increases the coefficient of determination (R²) by 2–60%. The proposed model has only 10% of the parameters for LSTM based on Variational Mode Decomposition (VMD), striking an excellent balance between prediction accuracy and computational efficiency. This study provides a new methodology for deploying SWH prediction models on platforms such as buoys, and holds significant application value in marine disaster warning and environmental monitoring.

Keywords: significant wave height prediction; SSA; LSTM; residual correction

1. Introduction

Significant wave height, as an important parameter in marine meteorology, has a significant impact on various fields, such as marine engineering, shipping, fisheries, and climate research. Accurate predictions of significant wave height can help vessels avoid harsh sea conditions, improving their navigation safety and reducing their likelihood of accidents, especially in shipping and marine engineering. Additionally, significant wave height prediction results provide key decision-making support for fishery management, marine resource development, and coastal infrastructure construction [1,2,3,4]. Due to the significant dynamic variability and complex spatial distribution patterns of wave height, achieving accurate predictions remains a core challenge in oceanography. Currently, the main methods for predicting significant wave height include wave spectrum inversion and numerical wave models, as well as traditional machine learning and deep learning approaches.

The wave spectrum inversion method is based on existing wave spectrum data, and it calculates the wave characteristics through inversion, making it suitable for wave forecasting based on observational data, especially in cases where direct physical modeling is not feasible. Its accuracy depends on the quality of the data and the precision of spectral analysis. Torsethaugen et al. proposed a bimodal wave spectrum model that includes both locally generated wind waves and swells, and modeled the two peaks using an extended JONSWAP spectrum [5]. Ochi and Hubble proposed a six-parameter dual-component ocean wave spectrum model, which combines a three-parameter formula to cover the entire evolution of storm seas and establishes a statistical relationship between parameters and significant wave height, thus generating a spectrum family adapted to specific sea conditions [6]. Because the wave spectrum inversion method relies on existing wave spectrum data, it struggles to handle nonlinear effects and dynamic changes in wave systems, and it also has strict requirements with regard to data quality. To overcome these limitations, the numerical wave model method was introduced. This method simulates the generation, propagation, and evolution of waves by solving physical equations, offering higher accuracy and applicability, particularly under complex sea conditions. The WAMDI Group proposed the WAM model, which integrates wind input, nonlinear transfer, and whitecap dissipation source functions, and demonstrated its wave forecasting capability in multiple sea areas [7]. Tolman developed the third-generation full-spectrum wind–wave model based on the previous two generations, detailing the model’s development, operation, and numerical methods [8]. Booij et al. introduced the third-generation numerical wave model SWAN, which is used to compute random, short-crested waves in shallow-water areas and coastal regions with ambient currents. Experimental validation showed that the model’s results were highly consistent with theoretical solutions and experimental observations [9]. Numerical models can forecast waves by simulating multiple factors in a marine environment, but they rely heavily on computational resources. They are very sensitive to the choice of initial and boundary conditions, and can be affected by the accuracy of input data, often yielding poor results in certain localized areas.

With the increase in computational power and data availability, traditional machine learning methods have gradually been introduced into wave height prediction. Traditional machine learning methods can handle large and complex datasets, and can perform predictions by learning underlying patterns from historical data. Many researchers have explored the use of machine learning methods such as Random Forest and Support Vector Machines (SVMs) for predicting significant wave height. Etemad et al. compared the performance of the M5’ model tree and an Artificial Neural Network (ANN) in predicting significant wave height for Lake Superior, finding that the M5’ model tree provided more interpretable rules and slightly outperformed the ANN in terms of prediction accuracy [10]. Mahjoobi et al. proposed using SVM to predict significant wave height and compared it with an ANN and other models. The results showed that the SVM performed excellently in terms of both accuracy and computational efficiency [11]. Malekmohamadi et al. evaluated the effectiveness of various soft computing methods for wave height prediction and found that the SVM, ANN, and Adaptive Network-based Fuzzy Inference System (ANFIS) provided effective predictions, while Bayesian Networks (BNs) performed poorly [12]. Feng et al. developed a machine learning model based on a Multi-Layer Perceptron (MLP) for wave forecasting in Lake Michigan, and the results showed that this model outperformed the traditional SWAN physical model in terms of both prediction accuracy and computational efficiency [13]. Chen et al. used Support Vector Regression (SVR) combined with local meteorological and neighboring wave data to improve the 1 h and 3 h prediction accuracy of coastal significant wave height during typhoon events [14]. 
Although traditional machine learning methods can achieve high prediction accuracy in situations with limited information and small sample sizes, they face the issue of error accumulation as the dataset grows, which limits their performance on larger-scale datasets [15].

In recent years, deep learning methods have been widely applied in the prediction of significant wave height. Specifically, time-series models such as Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRUs) have achieved significant results in wave height prediction. Sadeghifar et al. used an RNN for wave height prediction along the southern coast of the Caspian Sea, and the results showed that the RNN model outperformed previous neural network methods in terms of prediction accuracy across different time scales [16]. Despite the impressive performance of RNNs, issues such as vanishing and exploding gradients arise when processing long sequences, leading to the development of variants such as LSTM. LSTM, by introducing a gating mechanism with forget, input, and output gates, can dynamically control the retention and forgetting of information, significantly enhancing this method’s ability to model long-term dependencies. This structured improvement tailored to the characteristics of time-series data has made LSTM one of the most popular methods for predictive tasks in recent research [17]. Fan et al. used LSTM to predict significant wave height over different durations. The results showed that, compared to Backpropagation Neural Networks (BPNNs), Extreme Learning Machines (ELMs), SVMs, Residual Networks (ResNets), and Random Forest (RF), LSTM achieved better predictive performance [18]. Jörges et al. developed an LSTM-based model for significant wave height prediction and reconstruction. The experiments demonstrated that LSTM excelled in both the reconstruction and prediction of wave heights [19]. Minuzzi et al. used LSTM to predict 6, 12, 18, and 24 h significant wave height at seven different locations in the South-West Atlantic, and compared the LSTM predictions with those from the ERA5 numerical model. The results indicated that LSTM outperformed the ERA5 model [20]. Gao et al. proposed an LSTM-based significant wave height prediction model, which showed higher prediction accuracy than traditional numerical models when applied to three stations in the Bohai Sea [21]. Meng et al. proposed an LSTM-based long-term wave sequence prediction method, and improved the prediction accuracy of irregular waves through multiple prediction steps [22].

Some researchers have effectively improved the accuracy of time-series predictions by integrating LSTM models with other algorithmic architectures or data preprocessing techniques. Fu et al. proposed a hierarchical hybrid model that combines CEEMDAN-RCMSE feature decomposition with VMD secondary decomposition, significantly improving marine wave height prediction accuracy through multi-scale LSTM dynamic modeling and a weighted ensemble strategy [23]. Ni et al. introduced a deep learning model that integrates Principal Component Analysis (PCA) with LSTM for short-term wave height prediction. The results showed that this model outperformed other data-driven methods in terms of prediction accuracy [24]. Martin and Felix proposed an RNN-LSTM-based method for predicting significant wave height. This model can make accurate predictions across different time intervals, and performed better than traditional persistence models over longer prediction periods [25]. Although significant breakthroughs have been made in improving significant wave height prediction accuracy, there remains a clear research gap in lightweight modeling for platforms such as buoys. While signal processing techniques like VMD have effectively enhanced prediction performance through modal forecasting and re-integration, they significantly increase the number of model parameters. This leads to a sharp rise in the demand for computational resources and energy consumption during the training process. The contradiction between computational complexity and energy efficiency severely limits the feasibility of deploying these models on resource-constrained platforms such as marine observation buoys [26,27].

This study aims to effectively improve the prediction accuracy of short- to medium-term significant wave height under the constraint of low model-parameter counts. Current time-series prediction models, such as LSTM, GRUs, and TCNs, generally exhibit insufficient accuracy in significant wave height prediction tasks, making it difficult for them to meet practical application requirements. Although integrating data decomposition techniques is an effective strategy to address this issue, the application of traditional data decomposition methods often leads to a significant increase in model parameters. To resolve this contradiction, this study innovatively constructs an LSTM prediction model based on Singular Spectrum Analysis (SSA) and a residual correction mechanism. The core innovation of this model lies in using the feature modes obtained from SSA decomposition as the input features to the LSTM network. This design balances prediction performance and model complexity. Owing to its ability to capture key fluctuation patterns through feature extraction while effectively controlling its parameter scale, the model achieves the dual objectives of improved prediction accuracy and model lightweighting. This model provides a new methodology for significant wave height prediction on platforms such as buoys, and holds significant application value in marine disaster warning, shipping safety, and marine environmental monitoring.

2. Materials and Methods

2.1. Materials

2.1.1. Data Source

The National Data Buoy Center (NDBC), under the National Oceanic and Atmospheric Administration (NOAA), is entrusted with data collection using a monitoring network composed of nearly 100 buoys and Coastal-Marine Automated Network (C-MAN) stations. This study uses data from 2016 to 2018 from NDBC site 44013, located approximately 16 nautical miles east of Boston, Massachusetts. The geographical coordinates of the site are 42°20′44″ N, 70°39′4″ W (i.e., latitude 42.346° N, longitude 70.651° W), and the buoy’s depth is 64.6 m. The climate in this region exhibits significant seasonal characteristics: the average wind speed ranges from 10 to 15 knots (18 to 28 km/h), with stronger winds observed in spring and autumn. The prevailing wind direction is southwest, a phenomenon closely related to the local seasonal climate patterns and the land–sea breeze effect. In terms of temperature, the annual average temperature is about 11 °C, with summer highs reaching over 22 °C and winter lows around 0 °C. The corresponding sea surface temperature reaches 20 °C in summer and falls to about 5 °C in winter. Furthermore, humidity shows a clear seasonal contrast, with summer humidity often exceeding 80%, while winter humidity is lower. The overall atmospheric pressure remains relatively stable, with an annual average of approximately 1013 hPa. A wind rose for this location during the study period is shown in Figure 1.

2.1.2. Dataset Construction and Processing

A dataset was constructed using 26,051 data points from NOAA buoy 44013, spanning from 1 January 2016 to 31 December 2018. Each data entry included ten variables: wind direction (WDIR), wind speed (WSPD), gust speed (GST), wave period (DPD), average wave period (APD), wave direction (MWD), atmospheric pressure (PRES), air temperature (ATMP), water temperature (WTMP), and significant wave height (SWH).

The dataset had an hourly sampling interval, with a total of 1153 missing values, accounting for 4.4% of the data. The proportion of missing values was relatively small, with no long periods of continuous missing data, and the distribution was uniform. Missing values were handled through interpolation. The processed dataset was divided into training, validation, and test sets, accounting for 64%, 16%, and 20% of the data, respectively. Table 1 presents the details of the dataset used in this study.
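The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the text says only that missing values were "handled through interpolation", so linear interpolation is an assumption here, as is the choice to split the series chronologically. The helper names `interpolate_gaps` and `chronological_split` are hypothetical.

```python
def interpolate_gaps(values):
    """Fill None entries linearly between the nearest valid neighbours.

    Leading and trailing gaps are filled with the nearest valid value.
    Linear interpolation is an assumption; the paper does not name the method.
    """
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("series contains no valid samples")
    filled = list(values)
    for i in range(valid[0]):                      # leading gap
        filled[i] = values[valid[0]]
    for i in range(valid[-1] + 1, len(values)):    # trailing gap
        filled[i] = values[valid[-1]]
    for left, right in zip(valid, valid[1:]):      # interior gaps
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def chronological_split(n_samples, train_frac=0.64, val_frac=0.16):
    """Return (start, end) index pairs for train, validation, and test sets.

    The 64/16/20 proportions come from the text; keeping the time order
    (no shuffling) is an assumption.
    """
    train_end = int(n_samples * train_frac)
    val_end = train_end + int(n_samples * val_frac)
    return (0, train_end), (train_end, val_end), (val_end, n_samples)


swh = [0.8, None, None, 1.4, 1.2]
print(interpolate_gaps(swh))
print(chronological_split(26051))
```

With 26,051 hourly records, these fractions give roughly 16,700 training, 4,200 validation, and 5,200 test samples; the exact counts used in the study are those reported in Table 1.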

2.1.3. Feature Factor Selection

The generation of, and variation in, waves are influenced by multiple factors, with wind speed, wind direction, atmospheric pressure, and temperature playing crucial roles in wave generation and propagation. Studies have shown that these meteorological elements can serve as input variables for predicting wave height. Hashim et al. investigated the major climatic parameters affecting offshore wave height prediction, and used ANFIS to select influencing factors, finding that wind speed, wind direction, air temperature, and sea surface temperature were the most important input parameters [28]. Pang et al. incorporated wind speed and wind direction into an LSTM wave height model, significantly improving both short-term and long-term significant wave height prediction accuracy through multi-dimensional input [29].

Although existing research has extensively explored the impact of these factors on waves, quantitative analyses of their correlation with wave height remain relatively scarce. Traditional studies mostly use empirical models or physics-based analytical methods; however, these approaches do not enable in-depth investigation of the degree of correlation between influencing factors and wave height; thus, certain key features may not have been fully considered [30]. Common correlation analysis methods include Pearson, Spearman, and Kendall correlation analyses. Pearson correlation analysis is primarily used to measure the linear relationship between variables, with the assumption that the data follows a normal distribution [31]. In contrast, while Kendall’s rank correlation analysis can effectively capture nonlinear relationships, it faces limitations in practical applications, such as high computational complexity and inefficiency when processing large-scale datasets [32]. On the other hand, Spearman’s rank correlation analysis, as a non-parametric method, can effectively address issues of nonlinear relationships and outliers, without relying on distributional assumptions of the data [33].

Spearman’s rank correlation analysis is a non-parametric statistical method used to measure the monotonic relationship between two variables. This method involves ranking the data of each variable and then calculating the differences between the ranks to assess the degree of association between the variables. Spearman’s rank correlation coefficient ranges from −1 to +1, where +1 indicates a perfect positive correlation, −1 indicates a perfect negative correlation, and 0 indicates no correlation.

The calculation formula for Spearman’s rank correlation analysis is shown in Equation (1) [34].

ρ = 1 - \frac{6 \sum d_{i}^{2}}{N (N^{2} - 1)}

(1)

where d_{i} is the rank difference for the i-th data pair in the two variables and N is the number of data pairs; the calculation formula for d_{i} is shown in Equation (2).

d_{i} = R (X_{i}) - R (Y_{i})

(2)

where R (X_{i}) is the rank of the i-th data point in the first variable, and R (Y_{i}) is the rank of the i-th data point in the second variable.
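As a concrete illustration of Equations (1) and (2), the following sketch computes Spearman's coefficient directly from rank differences. It is an illustrative implementation rather than the code used in the study; the function names and the sample values are made up. Note that Equation (1) is exact only when there are no tied values; the sketch assigns average ranks to ties, in which case the result is an approximation.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation via Equation (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    # Equation (2): d_i = R(X_i) - R(Y_i)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    # Equation (1)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up sample values for wind speed (m/s) and significant wave height (m).
wspd = [3.1, 5.0, 7.2, 4.4, 9.8]
swh = [0.6, 0.9, 1.4, 1.1, 2.0]
print(round(spearman_rho(wspd, swh), 3))  # 0.9
```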

Since wave generation is a highly nonlinear process, the linear relationship between historical features is relatively weak. Therefore, features with positive correlations are selected as input features [18]. This study conducted Spearman’s rank correlation analysis on the ten features and, based on statistical conventions, set a correlation coefficient of 0.5 as the threshold for determining significant correlations. This value is widely regarded as a reasonable cutoff for measuring moderate-strength correlations between variables [35]. As shown in the Spearman’s rank correlation matrix in Figure 2, the correlation coefficients between WSPD and GST with significant wave height are 0.56 and 0.57, respectively, both exceeding the threshold of 0.5, indicating a relatively high correlation between these features and significant wave height. WSPD reflects wind speed variations, which are generally strongly associated with the generation and propagation of waves, thereby establishing a significant relationship with significant wave height. In contrast, GST only reflects instantaneous peak values over short time periods. Although its correlation coefficient is also relatively high, it primarily represents short-term fluctuations and fails to effectively capture the long-term variations of significant wave height [36]. Therefore, despite the correlation between GST and significant wave height, it is unsuitable as an independent predictor for a prediction horizon exceeding one hour, in this study.

In comparison, historical significant wave height data provides valuable temporal information that aids the model in capturing the changing trends of significant wave height, which is critical for long-term forecasting. Significant wave height exhibits clear temporal characteristics, and historical data reflects the underlying patterns of wave height variations, providing strong contextual information for the model. Therefore, by selecting WSPD and historical significant wave height data as input features, the model can account for both the immediate changes in wind speed and the temporal dynamics of past wave heights, thereby enhancing the accuracy and stability of the predictions.

2.2. Methods

2.2.1. SSA Principle

SSA originates from Karhunen–Loève theory [37]. A one-dimensional wave height sequence x is arranged into a two-dimensional time-delay matrix, according to the embedding dimension M and a certain time lag. PCA is performed on the phase space of the original wave height sequence x in the time-delay matrix, to obtain the eigenvalues and eigenvectors of the matrix. Generally, the eigenvectors obtained through this method have orthogonal properties, with different eigenvectors corresponding to different fluctuation signals in the wave height sequence x. Given the wave height sequence x, embedding dimension M, and time lag 1, the time-delay matrix X is arranged as shown in Equation (3).

X = \begin{bmatrix} x_{1} & x_{2} & \dots & x_{N - M + 1} \\ x_{2} & x_{3} & \dots & x_{N - M + 2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{M} & x_{M + 1} & \dots & x_{N} \end{bmatrix} = \begin{bmatrix} X_{1, 0} & X_{1, 1} & \dots & X_{1, N - M} \\ X_{2, 0} & X_{2, 1} & \dots & X_{2, N - M} \\ \vdots & \vdots & \ddots & \vdots \\ X_{M, 0} & X_{M, 1} & \dots & X_{M, N - M} \end{bmatrix}

(3)

where N is the sequence length and M is the embedding dimension. Many practical studies show that M is generally taken as N/3 [38]. The i-th state vector of the delay matrix X is given by Equation (4).

X_{i} = \begin{bmatrix} x_{i + 1} \\ x_{i + 2} \\ \vdots \\ x_{i + M} \end{bmatrix} = \begin{bmatrix} X_{1, i} \\ X_{2, i} \\ \vdots \\ X_{M, i} \end{bmatrix}, \quad i = 0, 1, 2, \dots, N - M

(4)

Equation (4) yields N − M + 1 states, and the elements of the delay matrix X correspond to the elements of the original wave height sequence x, according to the relationship X_{j, i} = x_{j + i}. The covariance between the variables in Equation (3) represents the autocovariance of different lagged elements of the original sequence x_{i}. The constructed lagged autocovariance matrix T_{x} is shown in Equation (5).

T_{x} = \begin{bmatrix} C (0) & C (1) & C (2) & \dots & C (M - 1) \\ C (1) & C (0) & C (1) & \dots & C (M - 2) \\ C (2) & C (1) & C (0) & \dots & C (M - 3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ C (M - 1) & C (M - 2) & C (M - 3) & \dots & C (0) \end{bmatrix}

(5)

T_{x} is a Toeplitz matrix and a real symmetric matrix. The main diagonal elements of T_{x} are the variances of the wave height sequence x. C (j) is the autocovariance of the wave height sequence x with a delay of j, where 0 ≤ j ≤ M − 1. C (j) is calculated using the Yule–Walker estimation method [39], as shown in Equation (6).

C (j) = \frac{1}{N - j} \sum_{i = 1}^{N - j} x_{i} x_{i + j}, \quad j = 0, 1, 2, \dots, M - 1

(6)

T_{x} E^{k} = λ_{k} E^{k}, k = 1, 2, 3, \dots, M

(7)

The eigenvalues and eigenvectors of the matrix T_{x} are obtained by Equation (7). The eigenvector E^{k} of T_{x} represents a time series composed of M components, reflecting the temporal evolution pattern of the wave height sequence x. The projection of the state vector X_{i} onto the k-th eigenvector is then defined as shown in Equation (8).

a_{i}^{k} = X_{i} E^{k} = \sum_{j = 1}^{M} x_{i + j} E_{j}^{k}, \quad 0 \leq i \leq N - M

(8)

The components of x can be reconstructed from a subset of the eigenvectors and time coefficients, as shown in Equation (9).

x_{i}^{k} = \begin{cases} \frac{1}{M} \sum_{j = 1}^{M} a_{i - j}^{k} E_{j}^{k}, & M \leq i \leq N - M + 1 \\ \frac{1}{i} \sum_{j = 1}^{i} a_{i - j}^{k} E_{j}^{k}, & 1 \leq i \leq M - 1 \\ \frac{1}{N - i + 1} \sum_{j = i - N + M}^{M} a_{i - j}^{k} E_{j}^{k}, & N - M + 2 \leq i \leq N \end{cases}

(9)

The sum of all M reconstructed components x^{k} equals the wave height sequence x. The eigenvalues λ_{k} of the matrix T_{x} are arranged in descending order, i.e., λ_{1} \geq λ_{2} \geq λ_{3} \geq \dots \geq λ_{M} \geq 0. The sum of the reconstructed components x^{k} corresponding to the first p eigenvalues yields a reconstructed sequence that reflects the overall characteristics of the original wave height sequence while removing noise and random errors from the original sequence, as shown in Equation (10).

x \approx y = \sum_{k = 1}^{p} x^{k}, \quad 1 \leq p \leq M

(10)
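The decomposition in Equations (3)–(10) can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `ssa_components`, the synthetic test series, and the window length M = 24 are all hypothetical, and indices are 0-based rather than 1-based as in the equations.

```python
import numpy as np


def ssa_components(x, M):
    """Toeplitz-covariance SSA: return eigenvalues and M reconstructed components."""
    x = np.asarray(x, dtype=float)
    N = x.size
    if not 1 <= M <= N:
        raise ValueError("embedding dimension M must satisfy 1 <= M <= N")
    K = N - M + 1  # number of state vectors

    # Eq. (3): time-delay matrix with X[j, i] = x[i + j] (0-based indices)
    X = np.stack([x[j:j + K] for j in range(M)])

    # Eq. (6): lagged autocovariance estimates C(0), ..., C(M-1)
    C = np.array([np.dot(x[:N - j], x[j:]) / (N - j) for j in range(M)])

    # Eq. (5): Toeplitz matrix T[a, b] = C(|a - b|)
    lags = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
    T = C[lags]

    # Eq. (7): eigen-decomposition, sorted by descending eigenvalue
    eigvals, E = np.linalg.eigh(T)
    order = np.argsort(eigvals)[::-1]
    eigvals, E = eigvals[order], E[:, order]

    # Eq. (8): time coefficients A[k, i] = sum_j X[j, i] * E[j, k]
    A = E.T @ X

    # Eq. (9): each x[i + j] receives A[k, i] * E[j, k]; average over the
    # number of (i, j) pairs that map to the same time index
    comps = np.zeros((M, N))
    counts = np.zeros(N)
    for j in range(M):
        comps[:, j:j + K] += E[j, :, None] * A
        counts[j:j + K] += 1
    return eigvals, comps / counts


# Synthetic hourly series standing in for significant wave height.
rng = np.random.default_rng(0)
t = np.arange(240)
series = 1.5 + 0.5 * np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(t.size)

eigvals, comps = ssa_components(series, M=24)
leading = comps[:10]                 # first p = 10 components, cf. Eq. (10)
smoothed = leading.sum(axis=0)       # denoised reconstruction y
```

Because the eigenvectors form an orthonormal basis, summing all M components recovers the input series up to floating-point error, which is a convenient sanity check on any implementation.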

2.2.2. LSTM Principle

LSTM is a special type of recurrent neural network designed to address the vanishing- or exploding-gradient problems encountered by standard RNNs during the training of long sequences [40]. LSTM controls the flow of information by introducing three gating mechanisms—the input, forget, and output gates—thereby retaining long-term dependencies in time series. Specifically, the input gate determines the input of new information, the forget gate decides which information should be discarded, and the output gate controls the output of information at the current time step. LSTM effectively captures and memorizes information over long time spans, and it is widely applied in tasks such as speech recognition, natural language processing, and time-series prediction, particularly excelling in scenarios that require consideration of long-term dependencies. The specific structure is shown in Figure 3.

The calculation formula for the forget gate is shown below.

f_{t} = σ [W_{f} (h_{t - 1}, x_{t}) + b_{f}]

(11)

where W_{f} is the weight matrix of the forget gate; b_{f} is the bias term; σ is the sigmoid activation function; x_{t} is the current input; and h_{t - 1} is the hidden output at the previous moment.

The input gate calculation formulas are as follows:

i_{t} = σ [W_{i} (h_{t - 1}, x_{t}) + b_{i}]

(12)

\tilde{C}_{t} = \tanh [W_{c} (h_{t - 1}, x_{t}) + b_{c}]

(13)

C_{t} = f_{t} C_{t - 1} + i_{t} \tilde{C}_{t}

(14)

where W_{i} is the weight matrix of the input gate sigmoid layer; b_{i} is the bias term of the input gate sigmoid layer; W_{c} is the weight matrix of the input gate tanh layer; b_{c} is the bias term of the input gate tanh layer; \tilde{C}_{t} is the candidate cell state; and C_{t} is the new cell state.

The output gate calculation formula is as follows:

o_{t} = σ [W_{o} (h_{t - 1}, x_{t}) + b_{o}]

(15)

where W_{o} is the weight matrix of the output gate, and b_{o} is the bias term of the output gate.

The final output of the LSTM is shown below.

h_{t} = o_{t} \tanh (C_{t})

(16)
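A single time step of the standard LSTM cell that Equations (11)–(16) describe can be written out in NumPy as below. This is a didactic sketch, not the training code used in the study: the dictionary layout of the parameters and the helper `init_params` are hypothetical, and the weights act on the concatenation of h_{t-1} and x_t.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_dim, hidden_dim, rng):
    """Hypothetical helper: small random weights, zero biases."""
    params = {}
    for gate in ("f", "i", "c", "o"):
        params["W_" + gate] = 0.1 * rng.standard_normal((hidden_dim, hidden_dim + input_dim))
        params["b_" + gate] = np.zeros(hidden_dim)
    return params


def lstm_step(x_t, h_prev, c_prev, params):
    """Advance the cell by one time step and return (h_t, C_t)."""
    z = np.concatenate([h_prev, x_t])                         # (h_{t-1}, x_t)
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])          # forget gate, Eq. (11)
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])          # input gate, Eq. (12)
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])      # candidate state, Eq. (13)
    c_t = f_t * c_prev + i_t * c_tilde                        # cell state, Eq. (14)
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])          # output gate, Eq. (15)
    h_t = o_t * np.tanh(c_t)                                  # hidden output, Eq. (16)
    return h_t, c_t


# Run a random 6-step sequence of 12-dimensional inputs through a 4-unit cell.
rng = np.random.default_rng(0)
params = init_params(input_dim=12, hidden_dim=4, rng=rng)
h, c = np.zeros(4), np.zeros(4)
for x_t in rng.standard_normal((6, 12)):
    h, c = lstm_step(x_t, h, c, params)
```

The products in the cell-state and output updates are element-wise, so every gate has the same dimension as the hidden state.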

2.2.3. Design and Implementation of SWH Prediction Model

Due to the non-stationarity of the significant wave height sequence, directly using the raw wave data without preprocessing for prediction would make it difficult for the model to capture key patterns, severely affecting its prediction performance. To address this, this study introduces Singular Spectrum Analysis (SSA) for data preprocessing. SSA is well-suited to identifying multi-time-scale variation features in data, and can effectively separate the main trend components from periodic fluctuation patterns. Through this feature extraction mechanism, the LSTM model can focus on learning the more predictive wave features, significantly improving its prediction accuracy.

This study first constructs an SSA-LSTM model for preliminary significant wave height prediction. It uses 6 h of significant wave height data to predict the significant wave height 1 h ahead, 18 h of data for the 3 h prediction, 40 h of data for the 6 h prediction, and 80 h of data for the 12 h prediction, yielding preliminary prediction results. For the 12 h significant wave height prediction, to further uncover the unstable features in the original wave height data and correct any patterns and information not captured in the initial prediction, a second, two-layer LSTM network is used to model the residual sequence generated. The residual correction module is applied to predict the test set residuals, constructing an SSA-LSTM-R model for the 12 h significant wave height prediction, further improving the model’s prediction accuracy and reducing overall errors.
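The pairing of look-back window and forecast horizon described above can be sketched as a sliding-window routine. This is an illustration only: the function name `make_windows` is hypothetical, and it assumes that the label is the single wave height value observed at the end of the horizon, which the text does not state explicitly.

```python
# Look-back length (hours) for each forecast horizon (hours), as given in the text.
LOOKBACK_HOURS = {1: 6, 3: 18, 6: 40, 12: 80}


def make_windows(features, target, horizon):
    """Build (input window, label) pairs from hourly data.

    features: list of per-hour feature vectors
    target:   list of per-hour significant wave heights
    horizon:  forecast horizon in hours (1, 3, 6, or 12)
    Assumption: the label is the value `horizon` hours after the window ends.
    """
    lookback = LOOKBACK_HOURS[horizon]
    inputs, labels = [], []
    last_start = len(target) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        inputs.append(features[start:end])
        labels.append(target[end + horizon - 1])
    return inputs, labels


# Toy series: the target at hour t is simply t, so labels are easy to check.
target = list(range(100))
features = [[float(v)] for v in target]
windows, labels = make_windows(features, target, horizon=3)
print(len(windows), len(windows[0]), labels[0])  # 80 18 20
```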

The architectures of the SSA-LSTM and SSA-LSTM-R models are shown in Figure 4 and Figure 5, respectively.

The processes in Figure 4 and Figure 5 are primarily composed of the following stages:

(1) Data processing stage: this study uses Spearman rank correlation analysis to select two key feature parameters, WSPD and historical significant wave height. The significant wave height sequence is then decomposed into modes using SSA, extracting 10 key modal components (SIMF_{1}, SIMF_{2}, SIMF_{3}, …, SIMF_{10}). Through feature fusion, traditional statistical features are combined with time–frequency domain features, ultimately constructing a hybrid input vector consisting of 12 dimensions, which provides multidimensional spatiotemporal feature representations for the subsequent LSTM model.

(2) Data input layer: this layer includes two key parameters: the time window length N and the input feature dimension. The time window N is dynamically adjusted based on the prediction duration, corresponding to the needs of different prediction scenarios (1, 3, 6, and 12 h predictions). The input feature dimension is fixed at 12, integrating the multi-source hybrid features extracted by the data processing module.

(3) Preliminary prediction model architecture: this stage uses a single-layer LSTM network structure, where the LSTM layer is configured with 128 memory units to capture temporal sequence features. After feature extraction, dimensional mapping is performed through a fully connected layer, designed as a linear transformation from a 128-dimensional input to a 1-dimensional output. Ultimately, the high-dimensional feature vector is compressed into a single significant wave height prediction value.

(4) Residual correction module: for 12 h wave height prediction, to further improve prediction accuracy, this study introduces a residual correction module based on the preliminary prediction. Residual modeling is achieved through the construction of a dual-layer LSTM network. The module employs a cascaded architecture design: the first layer is configured with an LSTM network containing 100 memory units for coarse-grained feature extraction, while the second layer, using an LSTM network with 50 units, models fine-grained features. The residual prediction value is then generated through a fully connected layer, which transforms a 50-dimensional input into a 1-dimensional output. This residual value is used to compensate for and correct the preliminary prediction results through linear addition, effectively capturing nonlinear error distribution features that are difficult for traditional single models to represent. Finally, the preliminary prediction and residual correction results are combined to form the final prediction output, which possesses fine-grained error correction capabilities.
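To give a sense of scale for the layer sizes listed in stages (3) and (4), the sketch below counts trainable parameters using the gate structure of Equations (11)–(16), with one weight matrix and one bias vector per gate. These figures are back-of-the-envelope estimates, not the parameter counts reported by the authors. Two assumptions are made: the residual module is taken to receive a one-dimensional residual sequence, which the text does not specify, and a single bias per gate is used, whereas some frameworks (PyTorch, for example) store two bias vectors per gate and therefore report slightly larger totals.

```python
def lstm_params(input_dim, hidden_dim):
    """Parameters of one LSTM layer: four gate blocks, each with a weight
    matrix over (h_{t-1}, x_t) and one bias vector."""
    return 4 * (hidden_dim * (hidden_dim + input_dim) + hidden_dim)


def dense_params(in_dim, out_dim):
    """Parameters of a fully connected layer with bias."""
    return in_dim * out_dim + out_dim


# Preliminary model: 12 input features -> LSTM(128) -> Dense(1)
preliminary = lstm_params(12, 128) + dense_params(128, 1)

# Residual module: LSTM(100) -> LSTM(50) -> Dense(1)
# Assumption: the residual sequence is fed in as a single feature.
residual = lstm_params(1, 100) + lstm_params(100, 50) + dense_params(50, 1)

print(preliminary)             # 72321
print(residual)                # 71051
print(preliminary + residual)  # 143372
```

Under these assumptions, both modules together hold fewer than 150,000 parameters, in keeping with the lightweight design goal stated for buoy deployment.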

2.2.4. Definitions and Background Information of Comparative Models

This section briefly introduces the models used for experimental comparison: Bidirectional Long Short-Term Memory Network (BiLSTM), Convolutional Neural Network–Long Short-Term Memory Network (CNN-LSTM), Gated Recurrent Unit (GRU), Bidirectional Gated Recurrent Unit (BiGRU), Convolutional Neural Network–Gated Recurrent Unit (CNN-GRU), and Temporal Convolutional Network (TCN).

BiLSTM is an extension of LSTM that processes data in both the forward and backward directions, capturing bidirectional temporal dependencies within the input data. In significant wave height prediction, BiLSTM can capture the dynamic features of the time series more comprehensively, which particularly aids the prediction of future wave height trends. However, its bidirectional structure increases computational complexity and training time, resulting in higher computational cost and resource consumption.

CNN-LSTM combines CNN with LSTM, using CNN to extract spatial features and LSTM to model temporal dependencies. In significant wave height prediction, CNN-LSTM can handle spatial and temporal dependencies simultaneously, making it suitable for tasks involving complex spatial patterns and time-series data. While CNN-LSTM excels in spatial feature extraction, it requires more computational resources and longer training times than LSTM when dealing with pure time-series data.

GRU is a simplified version of LSTM with fewer parameters and greater computational efficiency. Although it uses fewer gating mechanisms, GRU performs similarly to LSTM in time-series modeling. In significant wave height prediction, GRU offers advantages in training speed and real-time performance owing to its lower computational demands, making it suitable for applications requiring fast responses. However, GRU's accuracy in capturing long-term dependencies is slightly inferior to that of LSTM.

BiGRU is an extension of GRU that adopts a bidirectional structure, enabling it to capture both forward and backward temporal dependencies. Like BiLSTM, BiGRU can leverage both past and future information in significant wave height forecasting. Owing to its more streamlined architecture, BiGRU incurs lower computational costs and trains faster than BiLSTM. Despite these advantages, BiGRU's modeling capability is somewhat limited when dealing with complex wave height patterns.

CNN-GRU integrates CNN with GRU, where CNN extracts spatial features and GRU models temporal dependencies. In significant wave height prediction, CNN-GRU can process data in shorter time frames and perform real-time predictions. Although it performs well in spatial feature extraction and computational efficiency, CNN-GRU does not match CNN-LSTM in handling complex temporal dependencies.

TCN is a convolutional network specifically designed for sequential data, utilizing causal convolutions to ensure that the model only relies on past and present time steps for predictions. TCN captures long-range temporal dependencies effectively through dilated convolutions and skip connections, avoiding the vanishing gradient problem commonly encountered in traditional recurrent neural networks for long sequences. Although TCN handles long-term dependencies efficiently and offers faster training speeds, its accuracy in modeling complex temporal patterns is not as high as that of LSTM or BiLSTM.
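The causal, dilated convolution that underlies TCN can be illustrated with a few lines of NumPy. This is a minimal single-channel sketch, not the TCN configuration used in the experiments; the function name and the example kernel are hypothetical. Each output value depends only on the current and earlier inputs, and the dilation sets the spacing between the taps.

```python
import numpy as np


def causal_dilated_conv1d(x, kernel, dilation=1):
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation], zero-padded on the left."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    pad = (len(kernel) - 1) * dilation
    # Left padding only, so no future sample can leak into y[t].
    padded = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x)
    for k, weight in enumerate(kernel):
        start = pad - k * dilation
        y += weight * padded[start:start + len(x)]
    return y


signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# With kernel [1, 1] and dilation 2, y[t] = x[t] + x[t - 2].
print(causal_dilated_conv1d(signal, [1.0, 1.0], dilation=2))  # [1. 2. 4. 6. 8.]
```

Stacking such layers with dilations 1, 2, 4, and so on enlarges the receptive field exponentially with depth, which is how a TCN reaches long-range dependencies without recurrence.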

2.2.5. Evaluation Indicators

This study constructs a multidimensional evaluation system using MSE, RMSE, MAE, MAPE, and R². MSE amplifies error differences through squaring, placing emphasis on larger prediction deviations. RMSE, the square root of MSE, has the same unit as the original data, so it corresponds directly to the error magnitude on the measurement scale. MAE averages the absolute errors, which prevents positive and negative errors from canceling each other out and gives a more balanced reflection of the overall prediction bias. MAPE expresses the relative error as a percentage, enabling comparison across data with different scales. R² quantifies the model's ability to explain data variability from a statistical perspective; its value typically lies in the range [0, 1], although Equation (21) can become negative when a model fits worse than the mean of the observations [41,42]. This combination balances absolute errors, relative errors, dimensional consistency, and statistical explanatory power, covering both error distribution characteristics and overall prediction accuracy, and thus forms a comprehensive performance evaluation framework.

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \quad (17)

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \quad (18)

MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \quad (19)

MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \quad (20)

R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \quad (21)

where N represents the number of significant wave height data points; y_i is the actual value of significant wave height; \hat{y}_i is the predicted value of significant wave height; and \bar{y} is the mean of the actual significant wave height values. The smaller the values of MSE, RMSE, MAE, and MAPE, and the closer R² is to 1, the better the model's prediction performance.
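Equations (17) to (21) translate directly into NumPy. The sketch below is one possible implementation; the function name and the sample values are illustrative and are not drawn from the study's data. MAPE and R² are returned in percent to match the units used in the tables, and MAPE assumes that no observed value is zero.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (%) and R2 (%) following Equations (17)-(21)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    # MAPE divides by the observations, so they must be nonzero.
    mape = np.mean(np.abs(err / y_true)) * 100.0
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = (1.0 - ss_res / ss_tot) * 100.0
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}


# Illustrative values only.
scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
for name, value in scores.items():
    print(f"{name}: {value:.5f}")
```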

3. Results

3.1. Experimental Environment and Parameter Settings

The experimental environment of this study is the Windows 11 operating system, with an NVIDIA T1000 (8 GB) GPU and 64 GB of memory. The experiment uses the TensorFlow deep learning framework, with the model in this study tuned using the hyperparameter settings shown in Table 2.

3.2. Comparative Analysis of SWH Prediction Effects

3.2.1. SSA Decomposition

When SSA is used for data decomposition, selecting an appropriate number of modes is critical. Based on evaluations across multiple experiments and performance metrics, this study adopts 10 modes for the SSA decomposition. Different mode counts were explored systematically, and the configuration was chosen by assessing the model's prediction error, goodness-of-fit statistics, and training efficiency. Figure 6 (five modes) illustrates that too few modes fail to capture both the low-frequency trend components and the mid-to-high-frequency oscillatory patterns, giving an incomplete characterization of the data's dynamic behavior and raising the risk of underfitting. Conversely, as depicted in Figure 7 (fifteen modes), exceeding 10 modes compromises the representation of the dominant fluctuation patterns: redundant modal components may be interpreted as valid signals, leading to overfitting. Selecting 10 modes balances feature integrity against noise contamination. As shown in Figure 8, this configuration retains the critical time-frequency features, including low-frequency trends and salient oscillations, while suppressing noise by excluding the higher-order noisy modes. The 10-mode SSA decomposition therefore keeps the feature information complete while avoiding the curse of dimensionality, providing the LSTM with features that balance information density and computational efficiency.
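For readers who want to experiment with the number of modes, a basic SSA decomposition can be sketched in NumPy as follows. This is a generic textbook version (embedding, singular value decomposition, diagonal averaging) in which each elementary component is treated as one mode; the window length, the function name, and the synthetic series are assumptions made for illustration and do not reproduce the authors' settings.

```python
import numpy as np


def ssa_decompose(series, window, n_modes):
    """Return the first n_modes elementary SSA components, shape (n_modes, len(series))."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    modes = np.zeros((n_modes, n))
    for m in range(n_modes):
        elementary = s[m] * np.outer(u[:, m], vt[m])
        # Diagonal averaging: average entries whose row + column index is constant.
        flipped = elementary[::-1]
        modes[m] = [flipped.diagonal(d).mean() for d in range(-window + 1, k)]
    return modes


# Synthetic series (hypothetical): linear trend plus a 12-step oscillation.
t = np.arange(60)
demo = 0.02 * t + np.sin(2 * np.pi * t / 12)
modes = ssa_decompose(demo, window=12, n_modes=10)
print(modes.shape)  # (10, 60)
```

Keeping all `window` components reconstructs the series exactly, so truncating to a smaller number of modes amounts to discarding the components with the smallest singular values.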

3.2.2. Comparison of Model Prediction Effects

To validate the prediction performance of the SSA-LSTM model, it is compared with seven other significant wave height prediction models. Considering the model’s deployability, this study also evaluates the model’s performance, based on the number of parameters and training time, in addition to prediction accuracy. A comparison of the performance of SSA-LSTM with that of the other seven models on the test set is presented in Table 3, and the experimental data are taken from multiple trials, to ensure optimal results.

As shown in Table 3, among the baseline models (that is, excluding SSA-LSTM), LSTM achieves the best or near-best MSE, RMSE, MAE, MAPE, and R² for the 1 h, 3 h, and 6 h predictions; for the 12 h prediction, TCN records the lowest errors among the baselines. In terms of model parameter count, LSTM has more parameters than BiGRU and GRU but fewer than the other baseline models. In terms of training time, only CNN-LSTM trains faster than LSTM. These results indicate that, for significant wave height prediction, LSTM combines high prediction accuracy with a relatively low parameter count and short training time, demonstrating well-balanced overall performance.

Building on the LSTM model, the modes obtained from the SSA decomposition were added as input features. Table 3 shows that these modes significantly improve the prediction accuracy of the LSTM model. For instance, in the 1 h prediction case, the model's MSE, RMSE, MAE, and MAPE decreased by 99.82%, 95.94%, 95.80%, and 95.03%, respectively, while R² increased by 2.11 percentage points. Meanwhile, the number of model parameters and the training time increased by only 7.62% and 6.80%, respectively. These results indicate that SSA decomposition of the significant wave height sequence can significantly enhance the model's prediction performance, primarily because SSA decomposition removes noise, extracts multi-scale features, and improves the LSTM model's ability to capture long-term dependencies. In the 3, 6, and 12 h predictions, the SSA-LSTM model also shows significant improvements in prediction performance over the LSTM model.
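The 1 h percentages quoted above follow from the Table 3 entries for LSTM and SSA-LSTM. The short sketch below recomputes them; the helper names are illustrative. The error metrics and the cost figures are relative changes, whereas the R² figure is the difference between the two table values.

```python
def pct_reduction(baseline, improved):
    """Relative decrease from baseline to improved, in percent."""
    return (baseline - improved) / baseline * 100.0


def pct_increase(baseline, new):
    """Relative increase from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


# 1 h rows of Table 3: (LSTM, SSA-LSTM).
table_1h = {
    "MSE": (0.00561, 0.00001),
    "RMSE": (0.07607, 0.00309),
    "MAE": (0.05022, 0.00211),
    "MAPE": (7.04013, 0.34991),
}
for metric, (lstm, ssa_lstm) in table_1h.items():
    print(metric, round(pct_reduction(lstm, ssa_lstm), 2))

print("R2 gain", round(99.99733 - 97.88904, 2))               # difference in R2 (%)
print("parameters", round(pct_increase(67_201, 72_321), 2))   # 7.62
print("training time", round(pct_increase(160.27, 171.17), 2))  # 6.8
```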

Significant wave height prediction curves for different models at various prediction durations are shown in Figure 9, Figure 10, Figure 11 and Figure 12.

In the 1 h significant wave height prediction graph, the prediction curves of all models generally exhibit similar trends. In the zoomed-in view, the blue SSA-LSTM prediction curve closely matches the black true-value curve, indicating that this model provides the best prediction performance. The prediction curves of the other models show slightly worse performance, but the overall difference is not significant.

In the 3 h significant wave height prediction graph, the prediction curves of all models exhibit greater fluctuation compared to the 1 h prediction, but the overall trend remains stable. In the zoomed-in view, the blue SSA-LSTM prediction curve closely matches the black true-value curve, indicating that this model still provides the best prediction performance. The prediction curves of the other models show increased fluctuation and perform worse than SSA-LSTM.

In the 6 h significant wave height prediction graph, the prediction curves of all models fluctuate more than in the 1 h prediction. In the zoomed-in view, the SSA-LSTM curve exhibits the best performance, with the smallest error. The prediction curves of the other models show larger fluctuations, and their prediction performance decreases significantly.

In the 12 h significant wave height prediction graph, all curves show a significant increase in fluctuation. In the zoomed-in view, only the SSA-LSTM curve remains closely synchronized with the true-value curve, while the prediction performance of the other models significantly declines, demonstrating poor results. Nevertheless, the prediction accuracy of SSA-LSTM no longer meets the practical application requirements and needs to be further improved.

To overcome the prediction accuracy bottleneck of the SSA-LSTM model for 12 h predictions, this study introduces a residual correction module into the SSA-LSTM model architecture, constructing an SSA-LSTM-R hybrid prediction model. During the training process for the SSA-LSTM model using the training and validation sets, residuals are generated between the model’s predicted values and the true values. These residuals are treated as a new dataset, which is divided into new training, validation, and test sets with the same proportions as the original sets, to train the residual correction module. After training, the residual module can correct the output of the SSA-LSTM model, further improving the significant wave height prediction accuracy. The effects of the residual correction are illustrated in Table 4.
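The residual correction workflow described above can be sketched as three steps: train a base predictor, train a second model on the base predictor's residuals, and add the predicted residual to the preliminary prediction. In the sketch below, simple least-squares models stand in for the SSA-LSTM and the dual-layer LSTM residual module, and the data are synthetic; both are assumptions made purely to keep the example small and runnable. The train/validation/test split of the residual dataset is omitted for brevity.

```python
import numpy as np


def fit_linear(features, target):
    """Least-squares linear model with intercept (stand-in for a trained network)."""
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)

    def predict(new_features):
        new_design = np.column_stack([new_features, np.ones(len(new_features))])
        return new_design @ coef

    return predict


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


# Synthetic data (hypothetical): linear in x1, nonlinear in x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
y = 2.0 * x1 + 0.8 * x2 ** 2 + 0.05 * rng.normal(size=300)

# Step 1: train the base predictor and record its residuals.
base_predict = fit_linear(x1[:, None], y)
base_out = base_predict(x1[:, None])
residuals = y - base_out

# Step 2: treat the residuals as a new target and train a corrector on them.
corr_features = (x2 ** 2)[:, None]
corr_predict = fit_linear(corr_features, residuals)

# Step 3: final prediction = preliminary prediction + predicted residual.
final_out = base_out + corr_predict(corr_features)

print(mse(y, base_out), mse(y, final_out))
```

Because the corrector is fitted to whatever structure the base predictor leaves behind, the combined output can only help if the residuals are predictable from the corrector's inputs; purely random residuals would leave the error essentially unchanged.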

As shown in Table 4, for the 12 h significant wave height prediction, SSA-LSTM-R reduced MSE, RMSE, MAE, and MAPE by 90.74%, 68.15%, 63.24%, and 62.54%, respectively, compared to SSA-LSTM, while R² increased by 9.43 percentage points, demonstrating significant improvement.

A comparison chart for SSA-LSTM-R 12 h significant wave height prediction is shown in Figure 13.

To highlight the superiority of the SSA-LSTM and SSA-LSTM-R models proposed in this study, in terms of prediction accuracy and model parameter count, a comparison is made with the LSTM model based on VMD. VMD is an efficient algorithm that uses variational optimization to decompose a signal into multiple frequency-independent modes, with good adaptability and precise frequency separation capabilities. It performs exceptionally well when dealing with complex and non-stationary signals, and is an important tool in the field of signal processing. The number of decomposition modes for the VMD algorithm is set to 10, consistent with the SSA decomposition method. A comparison of the model prediction results is shown in Table 5.

As shown in Table 5, in terms of prediction accuracy, the VMD-LSTM model was outperformed by the SSA-LSTM model for the 1 h, 3 h, and 6 h predictions and by the SSA-LSTM-R model for the 12 h prediction. The proposed models also have an advantage in model parameters and training time. For instance, in the 1 h prediction case, the SSA-LSTM model has 72,321 parameters, only 10.76% of the VMD-LSTM model's parameters, and a training time of 171.17 s, just 9.51% of that of VMD-LSTM. The main reason for these differences is that VMD-LSTM needs to perform predictions separately for each mode decomposed by VMD and then integrate all the predicted values, which requires substantial resources during training.
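The per-mode explanation can be checked against the tables. The reported VMD-LSTM parameter count for the 1 h case equals ten times the count of the single LSTM model in Table 3, which is what one would expect if a separate LSTM of that size were trained for each of the 10 VMD modes. This is an observation drawn from the table values, not a description of the authors' code.

```python
n_modes = 10                   # VMD modes, as stated in the text
lstm_params_1h = 67_201        # LSTM, 1 h, Table 3
ssa_lstm_params_1h = 72_321    # SSA-LSTM, 1 h, Table 5
vmd_lstm_params_1h = 672_010   # VMD-LSTM, 1 h, Table 5

# One LSTM per mode would give n_modes times the single-model count.
per_mode_total = n_modes * lstm_params_1h
print(per_mode_total == vmd_lstm_params_1h)  # True

# Share of VMD-LSTM cost required by SSA-LSTM, in percent.
param_share = ssa_lstm_params_1h / vmd_lstm_params_1h * 100
time_share = 171.17 / 1799.08 * 100
print(round(param_share, 2), round(time_share, 2))  # 10.76 9.51
```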

The above experimental results indicate that the proposed SSA-based LSTM model with residual correction achieves high prediction accuracy while having a low number of model parameters, a short training time, and low resource consumption. It demonstrates excellent overall performance, making it suitable for deployment on energy-limited offshore platforms such as buoys.

4. Discussion

To address the issue of balancing high prediction accuracy with a low model parameter count in current significant wave height prediction models, this study proposes an innovative SSA-based LSTM model with residual correction for significant wave height prediction. Unlike previous studies, this study uses the decomposed modes as input features for the LSTM model, achieving a good balance between model parameters and prediction accuracy. To predict 12 h significant wave height, a residual correction module is added after the initial LSTM prediction, to further improve accuracy. The experimental results show that the proposed model demonstrates superior prediction performance on the constructed significant wave height test set.

In terms of prediction accuracy, early studies using sequential learning algorithms such as MRAN and GAP-RBF achieved wave height prediction with a small number of neurons, but the highest accuracy reached only 93.63% [43]. In [18], the researchers used LSTM for 1 h significant wave height prediction, increasing the model’s accuracy to 98.04%. Furthermore, the CNN-LSTM model increased prediction accuracy to 99.52%, although the model’s parameter scale was significantly larger than that of the proposed model [44]. Compared to the aforementioned models, the proposed model not only improves prediction accuracy, but also effectively controls the model parameter count. Notably, when compared to the VMD-based LSTM model, the proposed model demonstrates superior prediction accuracy across various prediction durations, including 1, 3, 6, and 12 h.

In terms of model parameters, adding SSA increases the LSTM model's parameter count by only about 7%, and the resulting count is roughly 10% of that of the VMD-LSTM model. The training time on the same dataset is also approximately 10% of that of the VMD-LSTM model, making the proposed model suitable for deployment on energy-limited platforms. Considering prediction accuracy, model parameters, and training time together, the proposed model demonstrates the best performance, achieving accurate, fast, and efficient significant wave height prediction.

However, there is still room for improvement in the proposed model. First, for long-term significant wave height predictions, such as those conducted over 24 h, 48 h, and 72 h, the prediction accuracy of the current model does not meet the required standards. Future work will focus on improving the model’s prediction performance for long-term forecasts by refining the data and model structure. Second, due to significant differences in wave motion models across different marine areas, the model’s generalization ability is somewhat limited when applied to data from other marine areas, and its prediction accuracy is lower compared to when it is applied to data from the marine area used for training. Future studies will involve training the model with data from multiple marine areas, to enhance its generalization capability. Overall, the proposed model exhibits a significantly improved prediction accuracy with only a slight increase in model parameters, representing a basis for developing new ideas and methods for deploying high-accuracy significant wave height prediction models. Deploying this model on offshore platforms such as buoys will facilitate long-term site-specific predictions of significant wave height in marine areas, providing strong support for marine disaster warning, shipping safety, and climate change research.

Author Contributions

Conceptualization, H.L., Z.W. and C.N.; methodology, H.L., L.Z. and C.N.; software, H.L.; validation, H.L.; formal analysis, C.N. and W.S.; investigation, H.L. and L.Z.; resources, C.N., W.S. and L.Z.; data curation, S.N., C.N. and C.L.; writing—original draft, H.L., C.N. and S.N.; writing—review and editing, Z.W., C.N. and W.S.; visualization, H.L. and W.S.; supervision, C.N., C.L. and Z.W.; project administration, C.N.; funding acquisition, C.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China [2022YFC3104301].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the National Data Buoy Center at https://www.ndbc.noaa.gov/station_history.php?station=44013 (accessed on 10 March 2025); see Table 1. These data were derived from the following resources in the public domain: the National Oceanic and Atmospheric Administration (https://www.noaa.gov/, accessed on 10 March 2025) and the National Data Buoy Center (https://www.ndbc.noaa.gov/station_history.php?station=44013, accessed on 10 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SWH	Significant Wave Height
SSA	Singular Spectrum Analysis
LSTM	Long Short-Term Memory
MSE	Mean Squared Error
RMSE	Root-Mean-Squared Error
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
R²	Coefficient of Determination
VMD	Variational Mode Decomposition
ANN	Artificial Neural Network
SVM	Support Vector Machine
ANFIS	Adaptive Neuro-Fuzzy Inference System
BN	Bayesian Network
MLP	Multilayer Perceptron
SVR	Support Vector Regression
RNN	Recurrent Neural Network
GRU	Gated Recurrent Unit
BPNN	Backpropagation Neural Network
ELM	Extreme Learning Machine
ResNet	Residual Network
PCA	Principal Component Analysis
BiLSTM	Bidirectional Long Short-Term Memory
BiGRU	Bidirectional Gated Recurrent Unit
TCN	Temporal Convolutional Network
RF	Random Forest
CNN-LSTM	Convolutional Neural Network–Long Short-Term Memory
CNN-GRU	Convolutional Neural Network–Gated Recurrent Unit

References

Young, I.R.; Zieger, S.; Babanin, A.V. Global trends in wind speed and wave height. Science 2011, 332, 451–455.
Temarel, P.; Bai, W.; Bruns, A.; Derbanne, Q.; Dessi, D.; Dhavalikar, S.; Fonseca, N.; Fukasawa, T.; Gu, X.; Nestegård, A.; et al. Prediction of wave-induced loads on ships: Progress and challenges. Ocean Eng. 2016, 119, 274–308.
Pérez-Collazo, C.; Greaves, D.; Iglesias, G. A review of combined wave and offshore wind energy. Renew. Sustain. Energy Rev. 2015, 42, 141–153.
Bonar, P.A.J.; Bryden, I.G.; Borthwick, A.G.L. Social and ecological impacts of marine energy development. Renew. Sustain. Energy Rev. 2015, 47, 486–495.
López, I.; Andreu, J.; Ceballos, S.; de Alegría, I.M.; Kortabarria, I. Review of wave energy technologies and the necessary power-equipment. Renew. Sustain. Energy Rev. 2013, 27, 413–434.
Ochi, M.K.; Hubble, E.N. Six-parameter wave spectra. In Coastal Engineering 1976; Coastal Engineering Press: Tokyo, Japan, 1976; pp. 301–328.
Group, T.W. The WAM model—A third generation ocean wave prediction model. J. Phys. Oceanogr. 1988, 18, 1775–1810.
Tolman, H.L. User Manual and System Documentation of WAVEWATCH III™ Version 3.14; Technical Note; MMAB Contribution No. 276; U.S. Department of Commerce; National Oceanic and Atmospheric Administration; National Weather Service; National Centers for Environmental Prediction: Camp Springs, MD, USA, 2009.
Booij, N.; Ris, R.C.; Holthuijsen, L.H. A third-generation wave model for coastal regions: 1. Model description and validation. J. Geophys. Res. Ocean. 1999, 104, 7649–7666.
Etemad-Shahidi, A.; Mahjoobi, J. Comparison between M5′ model tree and neural networks for prediction of significant wave height in Lake Superior. Ocean Eng. 2009, 36, 1175–1181.
Mahjoobi, J.; Mosabbeb, E.A. Prediction of significant wave height using regressive support vector machines. Ocean Eng. 2009, 36, 339–347.
Malekmohamadi, I.; Bazargan-Lari, M.R.; Kerachian, R.; Nikoo, M.R.; Fallahnia, M. Evaluating the efficacy of SVMs, BNs, ANNs and ANFIS in wave height prediction. Ocean Eng. 2011, 38, 487–497.
Feng, X.; Ma, G.; Su, S.-F.; Huang, C.; Boswell, M.K.; Xue, P. A multi-layer perceptron approach for accelerated wave forecasting in Lake Michigan. Ocean Eng. 2020, 211, 107526.
Chen, S.T.; Wang, Y.W. Improving coastal ocean wave height forecasting during typhoons by using local meteorological and neighboring wave data in support vector regression models. J. Mar. Sci. Eng. 2020, 8, 149.
Caruana, R.; Niculescu-Mizil, A. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 161–168.
Sadeghifar, T.; Motlagh, M.N.; Azad, M.T.; Mahdizadeh, M.M. Coastal wave height prediction using Recurrent Neural Networks (RNNs) in the south Caspian Sea. Mar. Geod. 2017, 40, 454–465.
Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
Fan, S.; Xiao, N.; Dong, S. A novel model to predict significant wave height based on long short-term memory network. Ocean Eng. 2020, 205, 107298.
Jörges, C.; Berkenbrink, C.; Stumpe, B. Prediction and reconstruction of ocean wave heights based on bathymetric data using LSTM neural networks. Ocean Eng. 2021, 232, 109046.
Minuzzi, F.C.; Farina, L. A deep learning approach to predict significant wave height using long short-term memory. Ocean Model. 2023, 181, 102151.
Gao, S.; Huang, J.; Li, Y.; Liu, G.; Bi, F.; Bai, Z. A forecasting model for wave heights based on a long short-term memory neural network. Acta Oceanol. Sin. 2021, 40, 62–69.
Meng, Z.-F.; Chen, Z.; Khoo, B.C.; Zhang, A.-M. Long-time prediction of sea wave trains by LSTM machine learning method. Ocean Eng. 2022, 262, 112213.
Fu, Y.; Ying, F.; Huang, L.; Liu, Y. Multi-step-ahead significant wave height prediction using a hybrid model based on an innovative two-layer decomposition framework and LSTM. Renew. Energy 2023, 203, 455–472.
Ni, C.; Ma, X. An integrated long-short term memory algorithm for predicting polar westerlies wave height. Ocean Eng. 2020, 215, 107715.
VS, F.E. Forecasting significant wave height using RNN-LSTM models. In Proceedings of the 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 13–15 May 2020; pp. 1141–1146.
Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544.
Rilling, G.; Flandrin, P.; Goncalves, P. On empirical mode decomposition and its algorithms. In Proceedings of the IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing NSIP-03, Grado, Italy, 8–11 June 2003.
Hashim, R.; Roy, C.; Motamedi, S.; Shamshirband, S.; Petković, D. Selection of climatic parameters affecting wave height prediction using an enhanced Takagi-Sugeno-based fuzzy methodology. Renew. Sustain. Energy Rev. 2016, 60, 246–257.
Pang, J.; Dong, S. A novel multivariable hybrid model to improve short and long-term significant wave height prediction. Appl. Energy 2023, 351, 121813.
Sabique, L.; Annapurnaiah, K.; Nair, T.B.; Srinivas, K. Contribution of Southern Indian Ocean swells on the wave heights in the Northern Indian Ocean—A modeling study. Ocean Eng. 2012, 43, 113–120.
Schober, P.; Boer, C.; Schwarte, L.A. Correlation coefficients: Appropriate use and interpretation. Anesth. Analg. 2018, 126, 1763–1768.
Xu, W.; Hou, Y.; Hung, Y.S.; Zou, Y.X. A comparative analysis of Spearman’s rho and Kendall’s tau in normal and contaminated normal models. Signal Process. 2013, 93, 261–276.
De Winter, J.C.F.; Gosling, S.D.; Potter, J. Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data. Psychol. Methods 2016, 21, 273.
Ramsey, P.H. Critical values for Spearman’s rank order correlation. J. Educ. Stat. 1989, 14, 245–253.
Xu, J.; Mu, H.; Wang, Y.; Huang, F. Feature genes selection using supervised locally linear embedding and correlation coefficient for microarray classification. Comput. Math. Methods Med. 2018, 2018, 5490513.
Holmes, J.D.; Ginger, J.D. The gust wind speed duration in AS/NZS 1170.2. Aust. J. Struct. Eng. 2012, 13, 207–217.
Karhunen, K. Über lineare Methoden in der Wahrscheinlichkeitsrechnung. Ann. Acad. Sci. Fenn. 1947, 37, 1.
Hassani, H. Singular spectrum analysis: Methodology and comparison. J. Data Sci. 2007, 05, 396.
Yule, G.U. On the theory of correlation. J. R. Stat. Soc. 1897, 60, 812–854.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Botchkarev, A. Performance metrics (error measures) in machine learning regression, forecasting and prognostics: Properties and typology. arXiv 2018, arXiv:1809.03006.
Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623.
Savitha, R.; Al Mamun, A. Regional ocean wave height prediction using sequential learning neural networks. Ocean Eng. 2017, 129, 605–612.
Guan, X. Wave height prediction based on CNN-LSTM. In Proceedings of the 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 23–25 October 2020; IEEE: New York, NY, USA, 2020; pp. 10–17.

Figure 1. Wind rose diagram.

Figure 2. Results of Spearman’s rank correlation analysis.

Figure 3. LSTM structure.

Figure 4. SSA-LSTM model for significant wave height prediction.

Figure 5. SSA-LSTM-R model for significant wave height prediction.

Figure 6. Five-mode data decomposition.

Figure 7. Fifteen-mode data decomposition.

Figure 8. Ten-mode data decomposition.

Figure 9. One-hour significant wave height prediction and a zoomed-in view of the results.

Figure 10. Three-hour significant wave height prediction and a zoomed-in view of the results.

Figure 11. Six-hour significant wave height prediction and a zoomed-in view of the results.

Figure 12. Twelve-hour significant wave height prediction and a zoomed-in view of the results.

Figure 13. SSA-LSTM-R 12-h significant wave height prediction and a zoomed-in view of the results.

Table 1. Details of the dataset.

Indicator	Unit	Mean	Maximum	Minimum	Standard Deviation
WDIR	°	189.60	360.00	1.00	98.99
WSPD	m/s	6.32	22.80	0	3.36
GST	m/s	7.68	28.70	0.10	4.15
SWH	m	0.94	8.16	0.25	0.74
DPD	s	7.40	19.05	2.25	3.03
APD	s	4.83	11.73	2.66	1.32
MWD	°	126.01	360.00	1.00	85.83
PRES	hPa	1015.61	1043.90	972.50	8.65
ATMP	°C	10.26	29.30	−19.50	7.97
WTMP	°C	11.73	25.50	2.40	5.68

Table 2. Model hyperparameter configuration.

Epochs	Batch Size	Learning Rate	Optimizer	Loss	Units	Layers
100	36	0.001	Adam	MSE	128	3

Table 3. Comparison of prediction performance across models.

Model	Prediction Duration /h	MSE	RMSE	MAE	MAPE /%	R² /%	Parameter Count	Training Time /s
SSA-LSTM	1	0.00001	0.00309	0.00211	0.34991	99.99733	72,321	171.17
LSTM		0.00561	0.07607	0.05022	7.04013	97.88904	67,201	160.27
BiLSTM		0.00697	0.0842	0.05455	7.36843	97.60021	134,401	243.63
CNN-LSTM		0.01612	0.12594	0.08667	11.79888	94.37273	99,265	159.34
GRU		0.00638	0.07896	0.05172	7.13618	97.78301	50,817	205.73
BiGRU		0.00637	0.08243	0.05331	7.10264	97.57611	16,301	298.83
CNN-GRU		0.01466	0.12037	0.08103	10.95153	94.92639	74,945	399.11
TCN		0.00672	0.08324	0.05726	8.55907	97.51558	136,577	1062.66
SSA-LSTM	3	0.00007	0.00638	0.00472	0.77113	99.98361	72,579	419.36
LSTM		0.02257	0.14912	0.09893	13.92864	91.92131	67,459	388.31
BiLSTM		0.02637	0.16249	0.10502	15.02073	90.71068	134,915	701.28
CNN-LSTM		0.03901	0.19822	0.13309	18.51217	85.77199	99,523	333.25
GRU		0.02422	0.15305	0.10106	13.74597	91.76214	51,075	528.44
BiGRU		0.02422	0.15735	0.10276	14.82183	91.07953	16,503	834.77
CNN-GRU		0.04337	0.20778	0.13497	18.91261	84.90501	75,203	566.32
TCN		0.02499	0.15886	0.10588	15.31471	90.97502	136,707	1672.13
SSA-LSTM	6	0.00029	0.01373	0.00997	1.22876	99.93202	72,966	918.71
LSTM	6	0.06702	0.25889	0.16029	22.14553	75.76151	67,846	793.07
BiLSTM	6	0.07781	0.27802	0.17627	23.91034	72.61257	135,686	1522.28
CNN-LSTM	6	0.09638	0.30982	0.20441	29.21059	65.44921	99,910	679.61
GRU	6	0.08407	0.28961	0.19223	26.91282	70.29759	51,462	1182.32
BiGRU	6	0.07951	0.28148	0.18059	25.48862	71.51824	16,806	1471.71
CNN-GRU	6	0.11804	0.34483	0.22162	31.91234	57.94445	75,590	1367.73
TCN	6	0.08934	0.29829	0.19547	28.48735	68.05859	136,902	1783.93
SSA-LSTM	12	0.03089	0.17434	0.11047	15.01539	89.49211	73,740	1762.43
LSTM	12	0.19399	0.44031	0.28235	38.88538	31.78927	68,620	1517.26
BiLSTM	12	0.23507	0.48571	0.30402	43.26738	18.05041	137,228	3061.17
CNN-LSTM	12	0.20062	0.44817	0.29819	43.55292	29.42923	100,684	1253.46
GRU	12	0.21351	0.46173	0.31849	46.48286	24.81872	52,236	2206.91
BiGRU	12	0.18188	0.4276	0.28571	41.32891	36.84573	17,412	3192.65
CNN-GRU	12	0.18127	0.42583	0.27601	38.50116	36.08692	76,364	1681.33
TCN	12	0.16293	0.40338	0.25327	34.01271	43.43838	137,292	1854.22
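
The parameter counts reported for LSTM and SSA-LSTM in Table 3 can be cross-checked against the standard LSTM layer formula, 4 × (units × (units + inputs) + units), plus a dense output layer. The sketch below is an illustrative check and not the authors' code. It assumes a single 128-unit LSTM layer (the Units value in Table 2), one dense output per forecast hour, and input widths of 2 features for LSTM and 12 for SSA-LSTM. These input widths are assumptions chosen here because they reproduce the tabulated counts; the function names are hypothetical.

```python
def lstm_layer_params(units: int, n_inputs: int) -> int:
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (units + n_inputs) + units)


def dense_params(n_in: int, n_out: int) -> int:
    # Fully connected layer: weight matrix plus one bias per output.
    return n_in * n_out + n_out


def model_params(units: int, n_inputs: int, horizon: int) -> int:
    # Assumed architecture: one LSTM layer followed by one dense output layer.
    return lstm_layer_params(units, n_inputs) + dense_params(units, horizon)


# Parameter counts copied from Table 3, keyed by (model, prediction duration in hours).
TABLE3 = {
    ("LSTM", 1): 67201, ("LSTM", 3): 67459,
    ("LSTM", 6): 67846, ("LSTM", 12): 68620,
    ("SSA-LSTM", 1): 72321, ("SSA-LSTM", 3): 72579,
    ("SSA-LSTM", 6): 72966, ("SSA-LSTM", 12): 73740,
}

# ASSUMPTION: input feature widths inferred from the counts, not stated in the tables.
ASSUMED_INPUTS = {"LSTM": 2, "SSA-LSTM": 12}

checks = {
    (model, hours): model_params(128, ASSUMED_INPUTS[model], hours)
    for (model, hours) in TABLE3
}

for key, computed in checks.items():
    status = "match" if computed == TABLE3[key] else "mismatch"
    print(key, computed, TABLE3[key], status)
```

Under these assumptions every computed count agrees with the table, which suggests that the growth in parameters with prediction duration comes from the output layer alone.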

Table 4. Effects of residual correction.

Model	Prediction Duration /h	MSE	RMSE	MAE	MAPE /%	R² /%
SSA-LSTM	12	0.03089	0.17434	0.11047	15.01539	89.49211
SSA-LSTM-R	12	0.00286	0.05553	0.04061	5.62437	98.91838
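
The effect of residual correction in Table 4 is easier to compare with the other results when expressed as a relative change per metric. The following minimal sketch computes those changes directly from the two table rows; the dictionary and function names are hypothetical and used only for illustration.

```python
# 12 h results copied from Table 4.
ssa_lstm = {"MSE": 0.03089, "RMSE": 0.17434, "MAE": 0.11047,
            "MAPE": 15.01539, "R2": 89.49211}
ssa_lstm_r = {"MSE": 0.00286, "RMSE": 0.05553, "MAE": 0.04061,
              "MAPE": 5.62437, "R2": 98.91838}


def relative_change(before: float, after: float) -> float:
    # Signed change in percent relative to the uncorrected model.
    return (after - before) / before * 100.0


changes = {name: relative_change(ssa_lstm[name], ssa_lstm_r[name])
           for name in ssa_lstm}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

With the tabulated values, the error metrics fall by roughly 63% to 91%, and R² rises by about 10.5% in relative terms (about 9.4 percentage points).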

Table 5. Comparison of prediction performance of SSA-LSTM, SSA-LSTM-R, and VMD-LSTM.

Model	Prediction Duration /h	MSE	RMSE	MAE	MAPE /%	R² /%	Parameter Count	T—Time /s
SSA-LSTM	1	0.00001	0.00309	0.00211	0.34991	99.99733	72,321	171.17
VMD-LSTM	1	0.00172	0.04034	0.02651	3.65238	99.42166	672,010	1799.08
SSA-LSTM	3	0.00007	0.00638	0.00472	0.77113	99.98361	72,579	419.36
VMD-LSTM	3	0.00259	0.04862	0.03312	4.65024	99.15418	674,590	4817.43
SSA-LSTM	6	0.00029	0.01373	0.00997	1.22876	99.93202	72,966	918.71
VMD-LSTM	6	0.00457	0.06539	0.04886	7.25499	98.49747	678,460	8561.29
SSA-LSTM-R	12	0.00286	0.05553	0.04061	5.62437	98.91838	145,191	1793.41
VMD-LSTM	12	0.00429	0.06798	0.04672	6.87207	98.36697	686,200	16936.08
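
The efficiency comparison in Table 5 can be summarized as ratios of parameter count and T-Time between the proposed model and VMD-LSTM at each prediction duration. The sketch below is a simple arithmetic illustration using only the table values; the tuple layout and variable names are hypothetical.

```python
# (hours, proposed model, its parameters, its T-Time /s,
#  VMD-LSTM parameters, VMD-LSTM T-Time /s), copied from Table 5.
rows = [
    (1, "SSA-LSTM", 72321, 171.17, 672010, 1799.08),
    (3, "SSA-LSTM", 72579, 419.36, 674590, 4817.43),
    (6, "SSA-LSTM", 72966, 918.71, 678460, 8561.29),
    (12, "SSA-LSTM-R", 145191, 1793.41, 686200, 16936.08),
]

# Map each prediction duration to (parameter ratio, time ratio).
ratios = {
    hours: (params / vmd_params, t_time / vmd_time)
    for hours, _, params, t_time, vmd_params, vmd_time in rows
}

for hours, name, *_ in rows:
    param_ratio, time_ratio = ratios[hours]
    print(f"{hours:>2} h {name}: params {param_ratio:.3f}, time {time_ratio:.3f}")
```

On these figures the parameter ratio is about 0.108 for the 1, 3, and 6 h horizons and about 0.212 at 12 h, where the residual-correction variant is the one compared; the time ratio stays below 0.11 at every horizon.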

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ning, C.; Li, H.; Wang, Z.; Li, C.; Zeng, L.; Shao, W.; Nie, S. Significant Wave Height Prediction Using LSTM Augmented by Singular Spectrum Analysis and Residual Correction. J. Mar. Sci. Eng. 2025, 13, 1635. https://doi.org/10.3390/jmse13091635

