Article

A Transfer-Learning-Based STL–LSTM Framework for Significant Wave Height Forecasting

1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
2 China Ship Development and Design Center, Wuhan 430064, China
3 School of Maritime Economics and Management, Dalian Maritime University, Dalian 116026, China
4 Marine Engineering College, Dalian Maritime University, Dalian 116026, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2026, 14(2), 146; https://doi.org/10.3390/jmse14020146
Submission received: 11 December 2025 / Revised: 31 December 2025 / Accepted: 7 January 2026 / Published: 9 January 2026
(This article belongs to the Section Ocean Engineering)

Abstract

Significant wave height (SWH) is a key descriptor of sea state, yet providing accurate, site-specific forecasts at low computational cost remains challenging. This study proposes a transfer-learning-based framework for SWH forecasting that combines Seasonal and Trend decomposition using Loess (STL), a stacked long short-term memory (LSTM) network, and an efficient sliding-window updating scheme. First, STL is applied to decompose the SWH time series into trend, seasonal, and remainder components; the resulting sub-series are then fed into a transfer-learning architecture in which the parameters of the stacked LSTM backbone are kept fixed, and only a fully connected output layer is updated in each window. Using multi-year observations from five National Data Buoy Center (NDBC) buoys, the proposed STL-LSTM-T model is compared with an STL-LSTM configuration that is fully retrained after each STL decomposition. For example, the transfer-learning setup reduces MAE, MSE, and RMSE by up to 11.2%, 19.2%, and 14.5% at buoy 46244, respectively, while reducing the average training time per update to about one-fifth of the baseline. Parameter analyses indicate that a two-layer LSTM backbone and a moderate continuous forecast horizon (6–12 steps) provide a good balance between predictive accuracy, error accumulation, and computational cost, making STL-LSTM-T suitable for SWH forecasting on resource-constrained platforms.

1. Introduction

Significant wave height (SWH) is one of the most important parameters for characterizing sea states and plays a central role in the design, operation, and safety assessment of marine and offshore engineering systems [1]. As an integrated statistical measure of wave energy, SWH directly affects ship motion responses, offshore platform loads, coastal infrastructure overtopping risk, as well as the performance and survivability of marine energy and monitoring devices [2]. In particular, accurate short- to medium-term SWH forecasts provide critical support for operational decision-making under dynamically evolving ocean conditions, enabling operators to avoid hazardous sea states, reduce unplanned downtime, and enhance both economic efficiency and navigational safety [3]. Accordingly, the development of robust and computationally efficient SWH prediction methods has become a key research focus at the intersection of physical oceanography, marine engineering, and data-driven environmental forecasting.
Rapid SWH forecasting in operational environments using traditional physics-based models, however, remains a challenging task, even though such models can, in principle, describe the influence of external forcing on wave generation and evolution. Ocean waves arise from the nonlinear interaction of wind forcing, wave–wave energy transfer, current–wave coupling, and bathymetric modulation, which are typically represented by complex spectral or phase-resolving numerical models such as the third-generation WAM model [4], the SWAN model [5], and the WAVEWATCH III model [6]. These models require high-resolution wind and current fields, fine spatial and temporal discretisation, and iterative solution of coupled partial differential equations, resulting in substantial computational costs. Consequently, they are difficult to deploy on resource-constrained edge devices installed on buoys, offshore platforms, or coastal monitoring stations. This significantly limits their applicability to real-time, site-specific SWH prediction and distributed monitoring systems that demand both frequent updates and low power consumption.
From a time-series perspective, SWH exhibits pronounced autoregressive behaviour, in which the present sea state is strongly conditioned by its recent history and concurrent meteorological forcing. This property makes SWH particularly suitable for data-driven prediction, where statistical and machine-learning models learn mappings from past observations to future values without explicitly resolving the full wave–current–atmosphere dynamics. Early studies primarily adopted linear autoregressive schemes. For example, Soares et al. [7] employed autoregressive (AR) models to predict SWH time series at multiple sites along the Portuguese coast, showing that even simple linear structures can reproduce short-term variability with reasonable accuracy. Agrawal & Deo [8] compared artificial neural network (ANN) models with classical AR, ARMA, and ARIMA schemes using observed SWH along the Indian coastline. They demonstrated that ANNs provide higher accuracy for short-lead forecasts while performing comparably to linear models at longer horizons, thereby highlighting the added value of nonlinear function approximation in wave prediction. Studies by Deo & Naidu [9] and Etemad-Shahidi & Mahjoobi [10] further extended these ideas to different coastal environments and learning algorithms, confirming that ANNs, support vector machines (SVMs), and tree-based models can systematically improve SWH forecasts.
With the development of deep learning, recurrent neural network (RNN) [11] architectures have become important tools for sequence modelling in ocean wave forecasting. Long short-term memory (LSTM) networks [12], originally proposed to mitigate gradient-vanishing issues in standard RNNs, have been widely applied to SWH prediction to capture long-range temporal dependencies. For instance, Fan et al. [13] proposed an LSTM-based model that ingests recent wind speed and wave height as inputs and reported mean absolute errors below 0.1 m for 1 h forecasts at ten buoy stations, demonstrating the potential of deep recurrent networks for high-resolution operational forecasting. Hu et al. [14] combined XGBoost with LSTM to jointly predict SWH and peak wave height on Lake Erie and achieved mean absolute percentage errors (MAPE) of approximately 15.6–22.9%. Meng et al. [15] further showed that bidirectional gated recurrent units (BiGRUs) yield more accurate tropical-cyclone wave forecasts than several competing architectures, particularly under rapidly evolving extreme conditions. To capture richer spatio-temporal structures and exogenous information, more recent studies have moved towards hybrid deep-learning frameworks, in which convolutional neural networks (CNNs) [16], attention mechanisms [17] or transformers [18] are coupled with recurrent backbones. Jörges et al. [19] integrated bathymetry information into mixed-data CNN-LSTM networks to reconstruct and forecast spatial ocean wave fields, significantly reducing RMSE by explicitly exploiting seabed-induced spatial patterns. Gómez-Orellana et al. [20] designed multi-task evolutionary ANNs to simultaneously predict short-term SWH and wave energy flux, thereby improving predictive skill for energy-relevant quantities through parameter sharing across related tasks. Wu et al. [21] developed a RIME-CNN-BiLSTM architecture that attains substantially lower prediction errors and demonstrated its superior performance in multi-step SWH forecasting. Raj and Prakash [22] combined CNN and BiLSTM models to estimate SWH and evaluate ocean wave energy potential at Emu Park and Townsville in Australia, highlighting the applicability of deep hybrid models in wave-energy resource assessment.
Collectively, these studies demonstrate that machine-learning and deep-learning approaches can effectively exploit the autoregressive nature of SWH and auxiliary environmental information to deliver fast and accurate forecasts across a wide range of spatial and temporal scales. Nevertheless, most existing models are still trained on raw time series for individual sites or fixed configurations, with limited emphasis on model reuse, cross-site generalisation, and efficient model updating under sliding-window forecasting strategies. These issues are particularly critical for scalable deployment in realistic, data-heterogeneous marine environments.
Despite the remarkable progress of autoregressive and deep-learning-based models, they often exhibit systematic prediction lags, especially when sea states change rapidly [23,24,25]. This deficiency is closely related to the pronounced nonstationarity of SWH: the mean level, variance, and dominant periods of SWH can shift over time under the combined influence of seasonal cycles, synoptic-scale storms, and abrupt local wind events. Under such conditions, a single global mapping from past to future values, implicitly assuming quasi-stationary dynamics, tends to smooth sharp transitions and extrapolate recent trends too conservatively, so that predicted peaks and troughs are temporally delayed and attenuated relative to observations. To address this problem, a widely adopted strategy is to pre-process the data and decompose the original nonstationary series into a set of approximately stationary components prior to prediction. Methods such as empirical mode decomposition (EMD) [26] and seasonal-trend decomposition [27], together with their variants, are widely used. For example, Duan et al. [28] combined EMD with an AR model to construct an EMD-AR scheme that outperformed the standalone AR model for SWH prediction. Duan et al. [29] coupled EMD with support vector regression (SVR) to propose an EMD-SVR hybrid model and showed that it yields more accurate SWH forecasts than both a single SVR and a wavelet-decomposition-based SVR (WD-SVR). Additionally, Hao et al. [30] developed an EMD-LSTM prediction model for nonstationary waves at multiple offshore locations along the Chinese coast and demonstrated that EMD, by smoothing the nonstationary time series and mitigating phase shifts, can substantially enhance the predictive accuracy of the subsequent LSTM. Guo et al. [31] applied an EEMD-LSTM framework in which each intrinsic mode function (IMF) is predicted separately and subsequently recombined, and reported consistent improvements in RMSE and MAE across short-, medium-, and long-lead forecasts. Seasonal-Trend decomposition using Loess (STL) [27] has been introduced into wave forecasting frameworks as a computationally efficient alternative to EMD. STL decomposes a time series into additive seasonal, trend, and remainder components through a LOESS-based regression procedure, which is typically faster and more stable than iterative sifting algorithms, and is therefore particularly attractive for short-term operational forecasting. Sun et al. [32] proposed a hybrid FFT-STL-deep-learning model in which long-term trends, seasonal components, and stochastic residuals are first separated and then fed into dedicated predictors, achieving competitive accuracy for SWH forecasts. Liu et al. [33] proposed a hybrid STL-FFT-STFT-TCN-LSTM framework, in which STL is used to decompose the wave-height series and subsequent spectral and deep-learning modules are applied to the decomposed components; their analysis revealed that STL-based decomposition contributes more significantly to accuracy improvements than other time-series pre-processing techniques within the framework. Yang et al. [34] further developed an STL-CNN-PE approach that combines STL with one-dimensional CNNs and positional encoding to forecast SWH efficiently. Their results showed that STL-CNN-PE achieves prediction accuracy comparable to that of EMD-LSTM models, while offering substantial gains in computational speed and thus providing a more favorable trade-off between accuracy and efficiency.
To fully exploit the information contained in long historical records, many decomposition-based forecasting frameworks adopt a sliding-window strategy [31]. In such schemes, the time series is first decomposed within a moving window, and the resulting sub-series are then used to train a prediction model that produces one-step or multi-step forecasts up to a given horizon. As the window advances, the decomposition is repeated on the updated data segment, and the forecasting model is retrained, or at least substantially updated, on the new set of decomposed sequences. This procedure, which has also been employed in our previous work on SWH prediction, helps maintain the timeliness of the learned relationships and generally improves forecast accuracy by continuously aligning the model with the most recent sea-state conditions [31]. However, it comes at a considerable computational cost, as each update requires reapplying the decomposition algorithm and relearning separate predictors for multiple components. Despite this practical bottleneck, existing studies have predominantly emphasized accuracy improvements by increasing model complexity or the dimensionality of decomposition/pre-processing, while the computational cost incurred by repeated retraining in operational sliding-window settings has received comparatively less attention. This accuracy-first tendency can significantly impede frequent model updates and near-real-time deployment, especially under limited on-board or edge-computing resources.
At the same time, the intrinsic characteristics of ocean waves suggest that there is a high degree of redundancy across successive sliding windows. Owing to the quasi-periodic nature of wind–wave generation and the underlying deterministic wave physics, when a fixed decomposition scheme is used, the decomposed components (e.g., trend, seasonal, and residual series) computed on overlapping windows tend to exhibit highly similar statistical and dynamical patterns. This observation naturally motivates a transfer-learning perspective: instead of repeatedly training a full deep network from scratch for each window, one can first learn a generic feature-extraction backbone on long-term decomposed data and then freeze most of its parameters, fine-tuning only a small set of task-specific layers as the window slides [35,36]. In this way, the number of trainable parameters and the cost of each update are drastically reduced, allowing the model to retain the expressive power and nonlinear representation capability of deep learning while achieving fast, incremental adaptation to new data and remaining suitable for near-real-time deployment on resource-constrained edge devices. Importantly, this perspective elevates computational efficiency from an implementation detail to an explicit design objective in sliding-window SWH forecasting, complementing the dominant accuracy-driven paradigm in prior decomposition-deep-learning hybrids.
In this study, we develop a sliding-window forecasting framework for SWH that combines STL decomposition with an LSTM-based predictor and an explicit transfer-learning strategy. The proposed transfer-learning model, referred to as STL-LSTM-T, exploits the flexibility of STL to separate the original nonstationary series into trend, seasonal, and remainder components, while leveraging the strong sequence-modelling capability of LSTMs to learn nonlinear temporal dependencies within these decomposed sub-series. After an initial training phase on long-term historical data, the stacked LSTM layers are treated as a generic temporal feature-extraction backbone and kept fixed. As the prediction window slides forward, only the fully connected layers that map the LSTM hidden states to future SWH values are fine-tuned using the most recent decomposed data. This mechanism is specifically tailored to the high-overlap nature of sliding windows, enabling efficient updates without repeatedly relearning largely redundant representations from scratch. Moreover, by coupling transfer learning with a fixed, physically interpretable STL decomposition (trend/seasonal/remainder), the adaptation is confined to a stable component space, which improves the robustness and reproducibility of incremental updates across windows. This design preserves the predictive skill of a deep recurrent architecture but reduces the number of parameters updated at each step to only a small fraction of the full model, thereby dramatically shortening training time and lowering computational demand. As a result, the proposed framework maintains forecast accuracy comparable to that of fully retrained models, while making sliding-window updates far more efficient and enhancing the feasibility of near-real-time deployment on resource-constrained edge devices.

2. Model and Methodologies

2.1. STL Decomposition

Seasonal-Trend decomposition using Loess (STL) [27] is a nonparametric decomposition technique that represents an observed time series $y_t$ as the sum of three interpretable components:

$$y_t = T_t + S_t + R_t$$

where $T_t$ is the trend component, $S_t$ is the seasonal (periodic) component, and $R_t$ is the remainder capturing irregular fluctuations and high-frequency variability. Unlike classical linear seasonal models, STL does not impose a fixed functional form on either the trend or the seasonal pattern. Instead, both components are estimated by locally estimated scatterplot smoothing (LOESS), which allows the seasonal behaviour and the trend to evolve gradually over time. This flexibility makes STL particularly suitable for nonstationary environmental series such as significant wave height, where both the mean level and the dominant periodicity can change under varying meteorological and oceanographic conditions.
In the present work, the STL procedure is implemented in three main steps: (i) given a prescribed seasonal period $m$, the original series $y_t$ is reorganized by phase within each cycle of length $m$ and a LOESS smoother is applied phase-wise to estimate the seasonal component $S_t$; (ii) the seasonally adjusted series $y_t - S_t$ is then smoothed over time using LOESS to obtain the trend component $T_t$; (iii) the remainder component is finally computed as $R_t = y_t - S_t - T_t$, and the seasonal and trend estimates are refined by iterating steps (i)–(ii) until convergence. A detailed description of the STL algorithm and its robust extension can be found in Refs. [27,34].
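To make the decomposition step concrete, the following minimal Python sketch illustrates the additive STL decomposition of an hourly SWH window. It assumes the statsmodels STL implementation and substitutes a synthetic series for real buoy data; the 24 h seasonal period is an illustrative choice, not a value prescribed by the paper.

```python
# Minimal sketch of the additive STL decomposition y_t = T_t + S_t + R_t,
# assuming statsmodels and a synthetic hourly SWH series (metres).
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Hypothetical stand-in for one 4320-point (six-month) sliding window.
rng = pd.date_range("2024-01-01", periods=4320, freq="h")
swh = pd.Series(
    1.5 + 0.5 * np.sin(2 * np.pi * np.arange(4320) / 24)
    + 0.2 * np.random.randn(4320),
    index=rng,
)

# period is the prescribed seasonal cycle length m (24 h assumed here);
# robust=True enables the robust extension mentioned in Refs. [27,34].
result = STL(swh, period=24, robust=True).fit()
trend, seasonal, remainder = result.trend, result.seasonal, result.resid

# The additive identity holds by construction of the remainder.
assert np.allclose(swh, trend + seasonal + remainder)
```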

2.2. Long Short-Term Memory (LSTM) Transfer Learning Strategy

Long Short-Term Memory (LSTM) [12] networks are a specialized type of recurrent neural network (RNN) designed to capture long-range temporal dependencies while alleviating the vanishing and exploding gradient problems that affect conventional RNNs [37]. By introducing an internal memory cell and a set of gating mechanisms, LSTMs can selectively retain or discard information over extended time horizons, which makes them particularly suitable for modelling nonlinear, non-stationary geophysical series such as SWH. Figure 1 illustrates the internal structure of an LSTM unit. An LSTM unit augments the standard RNN structure with three gates, namely the forget gate, input gate, and output gate, and a cell state that acts as an explicit memory channel. At each time step $t$, given the current input $x_t$ and the previous hidden state $h_{t-1}$, the gates are computed as

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$$

where $\sigma(\cdot)$ is the logistic sigmoid function and $\tanh(\cdot)$ is the hyperbolic tangent function. The cell state $c_t$ and hidden state $h_t$ are then updated as

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$

with $\odot$ denoting element-wise multiplication. Through these gated operations, the LSTM cell adaptively controls the amount of past information preserved, the amount of new information incorporated, and the portion of the internal state exposed to the next layer or to the output.
In this study, we employ a stacked LSTM architecture in which multiple LSTM layers (one, two, or four) are serially connected to enhance the capacity for extracting multiscale temporal features from the STL components of significant wave height. For each input sequence, the decomposed sub-series are first fed into one or more LSTM layers to produce a sequence of hidden states. The final hidden state of the top LSTM layer is then passed to a fully connected (dense) neural network that performs the regression mapping from the learned temporal representation to the forecasted SWH values. The model is implemented using a standard deep-learning framework and trained with a mean squared error (MSE) loss using the Adam optimizer.
Within this architecture, the transfer learning strategy [38] is embedded by treating the stacked LSTM part as a shared temporal feature-extraction backbone and the terminal dense layers as lightweight, task-specific predictors. The backbone is first pre-trained on long historical STL-decomposed data to learn generic temporal features of the wave field. During subsequent sliding-window forecasting, the parameters of all LSTM layers are kept fixed, and only the weights of the final fully connected layer(s) are fine-tuned using the most recent data in each window. In this way, only a small fraction of the total parameters needs to be updated, which significantly reduces the computational cost and training time of continuous prediction while preserving the expressive power of the deep recurrent backbone. In what follows, the model with transfer learning enabled is referred to as STL-LSTM-T, whereas the configuration in which all parameters are updated in each window is denoted as STL-LSTM.
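As an illustration of this freeze-and-fine-tune split, the following PyTorch sketch defines a stacked-LSTM backbone with a dense output head and freezes the backbone parameters. The hidden size (30) and two-layer depth follow the configurations reported in Section 2.3; the univariate input, 10-step sequence length, and all class/variable names are illustrative assumptions, not the authors' code.

```python
# Sketch of the STL-LSTM-T architecture: frozen stacked-LSTM backbone,
# trainable dense head. Names and dimensions are illustrative.
import torch
import torch.nn as nn

class STLLSTM(nn.Module):
    def __init__(self, input_size=1, hidden_size=30, num_layers=2):
        super().__init__()
        # Stacked LSTM backbone: generic temporal feature extractor.
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_layers=num_layers, batch_first=True)
        # Lightweight task-specific head: 30 weights + 1 bias.
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])      # regress from final hidden state

model = STLLSTM()

# Transfer-learning split (STL-LSTM-T): freeze the backbone so that
# only the dense head is updated during sliding-window fine-tuning.
for p in model.lstm.parameters():
    p.requires_grad = False
```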

2.3. Sliding-Window Strategy for SWH Forecasting

In this study, a sliding-window STL-based strategy is adopted for consecutive prediction of significant wave height, as schematically illustrated in Figure 2. Rather than training a single global model on the entire record, the forecasting model is always conditioned on a finite window of the most recent data so as to track the gradual evolution of the sea state. For each current time point, a fixed-length segment of historical SWH (e.g., the previous six months, corresponding to 4320 time steps at 1 h resolution) is extracted and decomposed by STL into trend, seasonal, and remainder components. These decomposed sub-series are then used as inputs to the LSTM-based predictor with the transfer-learning strategy described in Section 2.2, and the resulting forecasts for the trend, seasonal, and remainder components are recombined to obtain the predicted SWH for the current prediction round.
Before constructing the sliding-window model, partial autocorrelation function (PACF) analysis is performed on the SWH series (and its STL components) to determine an appropriate autoregressive input length. PACF quantifies the direct correlation between the current value and its lagged values after removing intermediate-lag effects [39], and thus helps identify how many past time steps are most informative for prediction. As shown in Figure 3, the partial autocorrelation decays rapidly beyond about 10 lags, indicating that the forecast skill is dominated by short-term history and that very long input sequences provide limited additional benefit. Accordingly, the length of the LSTM input sequence is set to the ten most recent observations, which helps to reduce model complexity and avoid over-fitting while retaining the essential temporal dependence.
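A sketch of this lag-selection step, assuming one STL sub-series (or the raw SWH record) is available as a 1-D NumPy array named component (a hypothetical variable), might read:

```python
# Choose the autoregressive input length from the last lag whose PACF
# value exceeds an approximate 95% significance band.
import numpy as np
from statsmodels.tsa.stattools import pacf

partial = pacf(component, nlags=48)          # PACF values for lags 0..48
conf = 1.96 / np.sqrt(len(component))        # approximate 95% band
significant = np.where(np.abs(partial[1:]) > conf)[0] + 1
input_len = int(significant.max()) if significant.size else 1
print(f"Chosen autoregressive input length: {input_len}")
```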
The overall sliding-window forecasting procedure can be summarized as follows:
(i)
STL decomposition in a moving window. For each current time point, a segment of SWH data covering the previous 4320 time steps is extracted and decomposed by STL into trend, seasonal and remainder components.
(ii)
Continuous prediction using the transfer-learning LSTM. The decomposed sub-series within the window are fed into the pre-trained stacked LSTM backbone, and only the parameters of the fully connected layer are fine-tuned on the current window to adapt to the latest sea-state conditions. The model then performs one-step or multi-step prediction for each component, and the predicted trend, seasonal, and remainder components are summed to yield the SWH forecasts for this round.
(iii)
Forward sliding of the prediction window. After each prediction round, both the training window and the prediction horizon are shifted forward by a fixed number of time steps (e.g., 6, 12 and 24 steps), and steps (i)–(ii) are repeated until the entire target period is covered. The forecasts from all rounds are finally concatenated to form a continuous predicted SWH time series.
Through this sliding-window STL-LSTM-T strategy, the model continuously updates its predictions based on the most recent data while avoiding the use of future information. At the same time, the transfer-learning design ensures that only a small subset of parameters is adjusted at each step, thereby enabling efficient and accurate SWH forecasting.
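The control flow of steps (i)–(iii) can be sketched as follows. The helper functions stl_decompose, fine_tune_head, and predict_components are hypothetical placeholders for the operations described above, not part of any library; the sketch shows the structure of the loop, not a full implementation.

```python
# High-level sketch of the sliding-window STL-LSTM-T procedure.
import numpy as np

WINDOW = 4320      # six months of hourly data
SLIDE = 12         # continuous forecast step (6, 12 or 24 in the paper)

def sliding_window_forecast(series, model, n_rounds):
    forecasts = []
    start = 0
    for _ in range(n_rounds):
        window = series[start:start + WINDOW]
        # (i) STL decomposition of the current window
        trend, seasonal, resid = stl_decompose(window)
        # (ii) fine-tune only the dense head, then predict SLIDE steps
        fine_tune_head(model, (trend, seasonal, resid))
        t_hat, s_hat, r_hat = predict_components(
            model, (trend, seasonal, resid), steps=SLIDE)
        forecasts.append(t_hat + s_hat + r_hat)   # recombine components
        # (iii) slide both the window and the horizon forward
        start += SLIDE
    return np.concatenate(forecasts)
```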
In this study, the PyTorch framework (version 2.3.0) is used to train the model, with GPU acceleration to enhance training speed. The network adopts a stacked LSTM with a fully connected output layer; stack depths of one, two, and four layers are compared, each with a hidden size of 30. The mean squared error (MSE) is used as the loss function to measure the difference between the model's output and the actual target values. The Adam algorithm [40] is employed to train the model parameters, with a learning rate of 0.001. Each model is trained for 2000 epochs to ensure thorough training and convergence. The prediction is implemented on a GPU workstation configured with an Intel Core i9-14900K CPU and an NVIDIA GeForce RTX 3090 GPU (CUDA 12.0).
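Under these settings, the per-window fine-tuning loop behind the fine_tune_head step sketched above can be written as below, assuming model is the frozen-backbone network from Section 2.2 and X, y are tensors built from the current window (both hypothetical names).

```python
# Sketch of the reported training configuration: MSE loss and Adam at
# lr = 0.001 over only the trainable (dense-head) parameters.
import torch

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

for epoch in range(2000):            # 2000 epochs, as in the paper
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
```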

2.4. Data Sources

The observational data used in this study are obtained from the National Data Buoy Center (NDBC). Five directional wave buoys at different locations are selected, as shown in Figure 4. All five platforms provide long-term records of significant wave height (SWH), making them suitable test sites for evaluating the proposed STL-LSTM-T forecasting framework. Figure 5 presents the corresponding SWH time series over the study period; all data were downloaded from the National Data Buoy Center (https://www.ndbc.noaa.gov/ (accessed on 27 December 2025)).
The available records span from 1 January 2021 to 31 October 2025, yielding approximately five years of hourly observations for each buoy. Unless otherwise stated, model training is conducted on rolling six-month windows extracted from these continuous records, each window comprising 4320 data points sampled at 60 min intervals. As shown in Figure 5, the offshore buoy 46244 exhibits larger variance and more frequent extreme SWH values than the nearshore buoy 46253, while the two series are not strongly related, with a Pearson correlation coefficient of 0.303 between their SWH measurements.
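The cross-site dependence quoted above can be checked with a one-line Pearson correlation, assuming the two aligned hourly records are available as arrays swh_46244 and swh_46253 (hypothetical names):

```python
# Pearson correlation between the two buoy records; the paper reports
# 0.303 for buoys 46244 and 46253.
import numpy as np

r = np.corrcoef(swh_46244, swh_46253)[0, 1]
print(f"Pearson correlation: {r:.3f}")
```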

2.5. Evaluation Metrics

To quantitatively evaluate the forecasting performance of the proposed transfer-learning-based STL-LSTM model and the benchmark methods, four commonly used error metrics are employed: absolute error (AE), mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). These indices measure the discrepancy between the predicted significant wave height and the corresponding observations and are defined as
$$AE_i = \left| SWH_{predict,i} - SWH_{observed,i} \right|$$

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( SWH_{predict,i} - SWH_{observed,i} \right)^2$$

$$RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( SWH_{predict,i} - SWH_{observed,i} \right)^2 }$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| SWH_{predict,i} - SWH_{observed,i} \right|$$

where $n$ denotes the number of forecast-observation pairs, $SWH_{observed,i}$ is the observed SWH at time step $i$, and $SWH_{predict,i}$ is the corresponding model prediction. Smaller values of RMSE, MSE, and MAE indicate higher predictive accuracy and better overall performance of the forecasting model.
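A direct NumPy transcription of these definitions, assuming equal-length 1-D arrays of predictions and observations, is:

```python
# Evaluation metrics for SWH forecasts; pred and obs are 1-D arrays.
import numpy as np

def evaluate(pred, obs):
    err = pred - obs
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    return mae, mse, rmse
```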

3. Results and Discussion

To validate the effectiveness of the proposed STL-LSTM-T strategy for rapid sliding-window forecasting of significant wave height, this section performs retrospective predictions of SWH measured by two NDBC buoys (46253 and 46244) over the period from 1 November 2024 to 31 October 2025. By comparing the baseline STL-LSTM model (without transfer learning) with the transfer-learning-based STL-LSTM-T model under different forecasting configurations (e.g., continuous forecast horizon and number of stacked LSTM layers), we systematically assess the impact of the transfer-learning strategy on both predictive accuracy and computational efficiency.

3.1. Effect of the Transfer-Learning Strategy

Table 1 reports the MAE, MSE, and RMSE of SWH forecasts obtained with the STL-LSTM and STL-LSTM-T models at buoys 46253 and 46244. The transfer-learning-based STL-LSTM-T model achieves consistently higher predictive accuracy than the baseline STL-LSTM method. At buoy 46253, for instance, STL-LSTM-T yields an average reduction of approximately 9.5% in MAE, 14.6% in MSE, and 7.6% in RMSE compared with STL-LSTM, indicating a substantial improvement in prediction accuracy.
Figure 6 further visualizes this behaviour by presenting the AE between observed SWH and the corresponding predictions from STL-LSTM and STL-LSTM-T at both buoys, together with boxplots of the AE distributions. For STL-LSTM-T, outliers are noticeably less frequent and exhibit smaller magnitudes. The boxplots show that, under the transfer-learning strategy, the central tendency of the error distribution shifts downward, the interquartile range becomes narrower, and both the number and amplitude of extreme outliers are reduced. This indicates that the majority of forecasts benefit from a simultaneous reduction in error.
The results at buoy 46244 are consistent with those at buoy 46253, thereby reinforcing the effectiveness of the transfer-learning strategy. Because the SWH series at buoy 46244 exhibits stronger variability and more frequent extremes, the absolute error levels are higher than those at buoy 46253 for both models. Nevertheless, introducing transfer learning still leads to a marked improvement in model performance, with MAE, MSE, and RMSE reduced by approximately 11.2%, 19.2%, and 14.5%, respectively. Taken together, these experiments demonstrate that the transfer-learning strategy not only reduces computational cost and training time by freezing the parameters of a pre-trained LSTM backbone, but also delivers higher predictive accuracy than a fully retrained STL-LSTM model.
The better performance of STL-LSTM-T can be attributed to two main factors. First, the STL-decomposed sub-series in adjacent sliding windows exhibit highly similar statistical characteristics and temporal structures, reflecting comparable underlying dynamical patterns, as shown in Figure 7. Freezing the LSTM layers responsible for temporal feature extraction therefore allows the model to repeatedly exploit stable representations of the wave field across different windows, without re-optimising the full set of LSTM parameters at each update. Second, the number of autoregressive samples available within a single sliding window is relatively limited and is often insufficient to support full retraining of a deep recurrent architecture. Within the STL-LSTM-T framework, the LSTM backbone is first pre-trained on a long historical dataset (from 1 January 2021 to 31 October 2024) to learn representative temporal features and latent patterns of SWH. On this basis, only a small set of parameters in the terminal fully connected layer (30 weights and 1 bias) is fine-tuned in each sliding window. This drastic reduction in the number of trainable parameters simultaneously lowers the computational burden and prevents unstable updates of a large parameter set on small-sample windows, thereby mitigating the risk of underfitting. These findings suggest that the accuracy of SWH forecasting models is jointly controlled by model complexity and the adequacy of the training data, and that the proposed transfer-learning strategy offers a more favourable balance between these two factors.
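The scale of this parameter reduction can be verified directly in PyTorch. For the two-layer, hidden-size-30 network sketched in Section 2.2 (a hypothetical reconstruction, not the authors' code), only the Linear(30, 1) head, i.e., 31 parameters, remains trainable after freezing:

```python
# Count trainable vs. total parameters after freezing the LSTM backbone;
# model is the STLLSTM instance from the Section 2.2 sketch.
n_total = sum(p.numel() for p in model.parameters())
n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_train} of {n_total}")   # 31 trainable
```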
Figure 8a illustrates the influence of the number of stacked LSTM layers on SWH prediction accuracy and model runtime under the transfer-learning strategy. As the depth of the LSTM stack increases from one to two layers, both MAE and RMSE decrease, indicating an overall improvement in predictive performance. However, when the number of layers is further increased to four, the prediction errors rise again. These results suggest that a moderate increase in network depth enables the LSTM to more effectively learn the autoregressive features of the STL-decomposed SWH series, whereas excessive depth substantially raises model complexity and, given the limited sample size, makes model pre-training more difficult and ultimately degrades generalization performance. Consequently, a two-layer stacked LSTM architecture is adopted for the transfer-learning model in all subsequent experiments.
Figure 8b shows, for the two-layer STL-LSTM-T model, the proportion of the total prediction error contributed by different stages of the sliding-window forecast. It can be seen that, as the window moves forward and the continuous prediction proceeds, the relative contribution of each stage to the overall error does not exhibit a noticeable increase; the error shares remain relatively stable over time. This indicates that, within the short-term forecasting horizon considered in this study, there is no pronounced drift in the underlying physical behaviour of SWH. In combination with the transfer-learning framework, the model is therefore able, on the one hand, to alleviate underfitting caused by the limited number of samples in each window and, on the other hand, to substantially reduce the number of parameters updated during sliding-window prediction, thereby lowering computational cost and enhancing the practical applicability of the proposed approach.

3.2. Effect of the Continuous Forecast Steps

In SWH forecasting, the continuous forecast step directly determines the length of each sliding-window prediction and, indirectly, the frequency of time-series decomposition and model fine-tuning. A longer forecast step implies that more future time steps must be extrapolated within a single window, and that STL decomposition and LSTM updates are performed less frequently. As shown in Table 1, the prediction accuracy of both the baseline STL-LSTM model and the transfer-learning-based STL-LSTM-T model decreases markedly as the continuous forecast horizon increases.
Figure 9 further illustrates this effect by comparing scatter plots of predicted versus observed SWH values for forecast horizons of 6, 12, and 24 steps at buoys 46253 and 46244. As the continuous forecast step becomes longer, the scatter progressively departs from the line y = x, and the spread of the points around this line increases. For the STL-LSTM-T model at buoy 46253, the coefficient of determination R2 is approximately 0.95 for a 6-step forecast horizon and decreases to about 0.94 when the horizon is extended to 24 steps, indicating a gradual erosion of explanatory power for longer-range recursive forecasts.
Figure 10 uses violin plots to display the AE distributions for different continuous forecast steps for STL-LSTM and STL-LSTM-T. As the number of forecast steps within each sliding window increases, the error distributions become progressively wider and flatter: the frequency of large-error events rises, the tails of the distributions become heavier, and extreme deviations occur more frequently. At the same time, the STL-LSTM-T configuration generally yields smaller mean errors and more compact central distributions, although a few larger outliers remain. This suggests that the transfer-learning strategy effectively improves the overall error level but, because only the fully connected layer is updated in each window, it may be less sensitive to certain instantaneous or short-lived fluctuations in the SWH series. From a computational perspective, however, lengthening the continuous forecast horizon reduces the number of STL decompositions and LSTM updates required to cover a given prediction period, so that the decrease in total runtime is typically faster than the corresponding increase in prediction error. As shown in Figure 10, the computation time decreases rapidly as the continuous prediction step size increases. Therefore, an appropriate choice of forecast horizon can offer a favourable trade-off between accuracy and efficiency.
The behaviour described above is largely driven by the accumulation of errors during the recursive prediction process. In the present forecasting scheme, as illustrated in Figure 2, the model uses its own prediction at step k − 1 as part of the input for step k, instead of the true observation. Consequently, any biases introduced at earlier steps propagate forward and may be amplified, so that initial errors increasingly contaminate subsequent predictions, leading to pronounced error accumulation over longer continuous forecast intervals.
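The recursive scheme can be sketched as follows: at each step the model's own output replaces the oldest entry of the input window, which is precisely the feedback path through which early errors propagate. Shapes assume the univariate, batch-first model sketched in Section 2.2; the function name is illustrative.

```python
# Recursive multi-step prediction: each forecast is fed back as input
# in place of the (unavailable) observation at the next step.
import torch

def recursive_forecast(model, history, steps):
    """history: (1, seq_len, 1) tensor of the last observed values."""
    window = history.clone()
    preds = []
    with torch.no_grad():
        for _ in range(steps):
            y_hat = model(window)                      # (1, 1)
            preds.append(y_hat.item())
            # Drop the oldest value, append the prediction.
            window = torch.cat(
                [window[:, 1:, :], y_hat.view(1, 1, 1)], dim=1)
    return preds
```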
To visualize this mechanism more clearly, Figure 11 introduces an error accumulation function and applies polynomial extrapolation to the cumulative error curves corresponding to 6-step and 12-step forecast horizons. Here, the error accumulation function is defined, for each sliding-window forecast, as the sum of absolute errors from step 1 to step n. The results show that the cumulative error grows rapidly with increasing forecast horizon, with the slope of the curve becoming progressively steeper. The extrapolated curves indicate that the rate of error accumulation over the first 1–6 steps is substantially lower than that over steps 6–12, and that the 12–24 step interval exhibits an even higher accumulation rate. These findings suggest that overly long continuous forecast horizons can greatly exacerbate error propagation and accumulation, thereby undermining the reliability of SWH forecasts. In practical applications, it is therefore necessary to balance forecast length and accuracy, and to avoid choosing excessively long continuous forecast horizons.
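A sketch of this diagnostic, assuming per-step absolute errors from a 24-step window are stored in a 1-D array abs_err (hypothetical name), and using a second-order polynomial as an illustrative choice of extrapolant:

```python
# Error accumulation function (sum of absolute errors from step 1 to n)
# and a polynomial extrapolation in the spirit of Figure 11.
import numpy as np

cum_err = np.cumsum(abs_err)                  # accumulation, steps 1..n
steps = np.arange(1, len(cum_err) + 1)

# Fit on the first 12 steps and extrapolate to 24, mirroring the
# 6/12-step curves extrapolated in the paper (order 2 is an assumption).
coeffs = np.polyfit(steps[:12], cum_err[:12], deg=2)
extrapolated = np.polyval(coeffs, np.arange(1, 25))
```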

4. Conclusions

The present study proposed a transfer-learning-based sliding-window framework for significant wave height (SWH) forecasting that combines STL decomposition with a stacked LSTM predictive model (referred to as STL-LSTM-T). Using multi-year observations from five NDBC buoys, the framework was evaluated under realistic operational settings, with particular emphasis on computational efficiency and the robustness of forecast accuracy. A key novelty of this work is that it formulates sliding-window SWH forecasting as an explicit accuracy-efficiency co-optimization problem: rather than pursuing accuracy primarily through increasing model complexity, we target rapid model updating as a first-class requirement for operational deployment. The results demonstrate that STL-LSTM-T offers clear advantages over a baseline STL-LSTM model without transfer learning. Over the test period from November 2024 to October 2025, the transfer-learning configuration reduces MAE, MSE and RMSE by up to 9.5%, 14.6% and 7.6% at buoy 46253, and by 11.2%, 19.2% and 14.5% at buoy 46244, indicating a systematic improvement in predictive skill. These gains are achieved while substantially lowering the computational burden: by freezing a pre-trained LSTM backbone and fine-tuning only a small fully connected output layer (30 weights and 1 bias) within each sliding window, the average training time per update is reduced to roughly one-fifth of that required for full retraining. Mean–variance diagnostics on successive windows further show that the STL-decomposed sub-series possess broadly similar statistical characteristics, suggesting that their essential temporal features evolve only gradually. This statistical stability allows the backbone to reuse robust temporal representations across windows, enabling the STL-LSTM-T framework to exploit redundancy in the decomposed SWH series in a principled way, simultaneously maintaining high forecast accuracy and achieving large computational savings.
Parameter analyses highlight that an appropriate choice of model depth and continuous forecast horizon is crucial for balancing skill, robustness and efficiency. A two-layer LSTM backbone is found to be sufficient to capture the autoregressive structure of the STL components: increasing the depth from one to two layers improves accuracy, whereas further deepening to four layers degrades performance and increases runtime due to over-parameterisation relative to the limited data within each window. Similarly, extending the continuous forecast horizon from 6 to 12 and 24 steps leads to progressively larger errors, primarily as a result of error propagation in the recursive prediction scheme. Within the range examined, combining a two-layer backbone with moderate horizons (6–12 steps) offers a robust compromise between forecast accuracy, error accumulation and computational demand, making the STL-LSTM-T framework particularly attractive for near-real-time SWH forecasting on resource-constrained marine platforms.

Author Contributions

G.Z.: Writing—review and editing, Writing—original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Y.C.: Writing—review and editing, Supervision, Resources, Conceptualization. Y.J.: Writing—review and editing, Investigation. S.L.: Writing—review and editing. J.S.: Writing—review and editing, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The authors would like to express their appreciation to the National Data Buoy Center (NDBC) for its open-source data from Santa Monica Bay, California.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following symbols and abbreviations are used in this manuscript:
Symbols
$\sigma(\cdot)$ Logistic Sigmoid Function
$\tanh(\cdot)$ Hyperbolic Tangent Function
$c_t$ Cell State
$h_t$ Hidden State
$f_t$ Forget Gate
$i_t$ Input Gate
$o_t$ Output Gate
$\odot$ Element-wise Multiplication
$T_t$ Trend Component of STL Decomposition
$S_t$ Seasonal Component of STL Decomposition
$R_t$ Residual Component of STL Decomposition
Abbreviations
AE Absolute Error
ANN Artificial Neural Network
BiGRU Bidirectional Gated Recurrent Unit
CNN Convolutional Neural Network
EMD Empirical Mode Decomposition
EEMD Ensemble Empirical Mode Decomposition
IMF Intrinsic Mode Function
LSTM Long Short-Term Memory
LOESS Locally Estimated Scatterplot Smoothing
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MSE Mean Squared Error
NDBC National Data Buoy Center
PACF Partial Autocorrelation Function
RMSE Root Mean Squared Error
RNN Recurrent Neural Network
STL Seasonal-Trend Decomposition Using Loess
SWH Significant Wave Height
SVM Support Vector Machine
WD-SVR Wavelet-Decomposition-based Support Vector Regression

References

  1. Cornejo-Bueno, L.; Garrido-Merchán, E.C.; Hernández-Lobato, D.; Salcedo-Sanz, S.J.N. Bayesian optimization of a hybrid system for robust ocean wave features prediction. Neurocomputing 2018, 275, 818–828. [Google Scholar] [CrossRef]
  2. Dempwolff, L.-C.; Melling, G.; Windt, C.; Lojek, O.; Martin, T.; Holzwarth, I.; Bihs, H.; Goseberg, N. Loads and effects of ship-generated, drawdown waves in confined waterways-A review of current knowledge and methods. J. Coast. Hydraul. Struct. 2022, 2, 46. [Google Scholar] [CrossRef]
  3. Pennino, S.; Scamardella, A. Motions assessment using a time domain approach for a research ship in Antarctic waters. J. Mar. Sci. Eng. 2023, 11, 558. [Google Scholar] [CrossRef]
  4. Group, T.W. The WAM model—A third generation ocean wave prediction model. J. Phys. Oceanogr. 1988, 18, 1775–1810. [Google Scholar] [CrossRef]
  5. Booij, N.; Ris, R.C.; Holthuijsen, L.H. A third-generation wave model for coastal regions: 1. Model description and validation. J. Geophys. Res. Oceans 1999, 104, 7649–7666. [Google Scholar] [CrossRef]
  6. Tolman, H.L.; Balasubramaniyan, B.; Burroughs, L.D.; Chalikov, D.V.; Chao, Y.Y.; Chen, H.S.; Gerald, V.M. Development and implementation of wind-generated ocean surface wave Modelsat NCEP. Weather Forecast. 2002, 17, 311–333. [Google Scholar] [CrossRef]
  7. Soares, C.G.; Ferreira, A.; Cunha, C. Linear models of the time series of significant wave height on the Southwest Coast of Portugal. Coast. Eng. 1996, 29, 149–167. [Google Scholar] [CrossRef]
  8. Agrawal, J.; Deo, M. On-line wave prediction. Mar. Struct. 2002, 15, 57–74. [Google Scholar] [CrossRef]
  9. Deo, M.; Naidu, C.S. Real time wave forecasting using neural networks. Ocean Eng. 1998, 26, 191–203. [Google Scholar] [CrossRef]
  10. Etemad-Shahidi, A.; Mahjoobi, J. Comparison between M5′ model tree and neural networks for prediction of significant wave height in Lake Superior. Ocean Eng. 2009, 36, 1175–1181. [Google Scholar] [CrossRef]
  11. Medsker, L.R.; Jain, L. Recurrent neural networks. Des. Appl. 2001, 5, 2. [Google Scholar]
  12. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270. [Google Scholar] [CrossRef] [PubMed]
  13. Fan, S.; Xiao, N.; Dong, S. A novel model to predict significant wave height based on long short-term memory network. Ocean Eng. 2020, 205, 107298. [Google Scholar] [CrossRef]
  14. Hu, H.; van der Westhuysen, A.J.; Chu, P.; Fujisaki-Manome, A. Predicting Lake Erie wave heights and periods using XGBoost and LSTM. Ocean Model. Online 2021, 164, 101832. [Google Scholar] [CrossRef]
  15. Meng, F.; Song, T.; Xu, D.; Xie, P.; Li, Y. Forecasting tropical cyclones wave height using bidirectional gated recurrent unit. Ocean Eng. 2021, 234, 108795. [Google Scholar] [CrossRef]
  16. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377. [Google Scholar] [CrossRef]
  17. Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
  18. Cheng, X.; Wu, D.; Wang, J.; Yan, M. TransWaveNet: Cross-structural time series prediction of wave height based on a simplified transformer. Phys. Fluids 2025, 37, 107157. [Google Scholar] [CrossRef]
  19. Jörges, C.; Berkenbrink, C.; Stumpe, B. Prediction and reconstruction of ocean wave heights based on bathymetric data using LSTM neural networks. Ocean Eng. 2021, 232, 109046. [Google Scholar] [CrossRef]
  20. Gómez-Orellana, A.M.; Guijo-Rubio, D.; Gutiérrez, P.A.; Hervás-Martínez, C. Simultaneous short-term significant wave height and energy flux prediction using zonal multi-task evolutionary artificial neural networks. Renew. Energy 2022, 184, 975–989. [Google Scholar] [CrossRef]
  21. Wu, Y.; Wang, J.; Zhang, R.; Wang, X.; Yang, Y.; Zhang, T. RIME-CNN-BiLSTM: A novel optimized hybrid enhanced model for significant wave height prediction in the Gulf of Mexico. Ocean Eng. 2024, 312, 119224. [Google Scholar] [CrossRef]
  22. Raj, N.; Prakash, R. Assessment and prediction of significant wave height using hybrid CNN-BiLSTM deep learning model for sustainable wave energy in Australia. Sustain. Horiz. 2024, 11, 100098. [Google Scholar] [CrossRef]
  23. Zhang, S.; Zhao, Z.; Wu, J.; Jin, Y.; Jeng, D.-S.; Li, S.; Li, G.; Ding, D. Solving the temporal lags in local significant wave height prediction with a new VMD-LSTM model. Ocean Eng. 2024, 313, 119385. [Google Scholar] [CrossRef]
  24. Dixit, P.; Londhe, S.; Dandawate, Y. Removing prediction lag in wave height forecasting using Neuro-Wavelet modeling technique. Ocean Eng. 2015, 93, 74–83. [Google Scholar] [CrossRef]
  25. Yang, H.; Wang, H.; Gao, Y.; Liu, X.; Xu, M. A significant wave height forecast framework with end-to-end dynamic modeling and lag features length optimization. Ocean Eng. 2022, 266, 113037. [Google Scholar] [CrossRef]
  26. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.-C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. London Ser. A Math. Phys. Eng. Sci. 1998, 454, 903–995. [Google Scholar] [CrossRef]
  27. Cleveland, R.B.; Cleveland, W.S.; McRae, J.E.; Terpenning, I. STL: A seasonal-trend decomposition procedure based on loess. J. Off. Stat. 1990, 6, 3–73. [Google Scholar]
  28. Duan, W.-y.; Huang, L.-m. A hybrid EMD-AR model for nonlinear and non-stationary wave forecasting. J. Zhejiang Univ. Sci. A 2016, 17, 115–129. [Google Scholar] [CrossRef]
  29. Duan, W.Y.; Han, Y.; Huang, L.M.; Zhao, B.B.; Wang, M.H. A hybrid EMD-SVR model for the short-term prediction of significant wave height. Ocean Eng. 2016, 124, 54–73. [Google Scholar] [CrossRef]
  30. Hao, W.; Sun, X.; Wang, C.; Chen, H.; Huang, L. A hybrid EMD-LSTM model for non-stationary wave prediction in offshore China. Ocean Eng. 2022, 246, 110566. [Google Scholar] [CrossRef]
  31. Guo, Y.; Si, J.; Wang, Y.; Hanif, F.; Li, S.; Wu, M.; Xu, M.; Mi, J. Ensemble-Empirical-Mode-Decomposition (EEMD) on SWH prediction: The effect of decomposed IMFs, continuous prediction duration, and data-driven models. Ocean Eng. 2025, 324, 120755. [Google Scholar] [CrossRef]
  32. Sun, Y.; Yu, L.; Zhu, D. A Hybrid Deep Learning Model Based on FFT-STL Decomposition for Ocean Wave Height Prediction. Appl. Sci. 2025, 15, 5517. [Google Scholar] [CrossRef]
  33. Liu, H.; Zhu, Z.; Zhou, Y.; Li, C.J. STL-FFT-STFT-TCN-LSTM: An Effective Wave Height High Accuracy Prediction Model Fusing Time-Frequency Domain Features. arXiv 2025, arXiv:2509.19313. [Google Scholar]
  34. Yang, S.; Deng, Z.; Li, X.; Zheng, C.; Xi, L.; Zhuang, J.; Zhang, Z.; Zhang, Z. A novel hybrid model based on STL decomposition and one-dimensional convolutional neural networks with positional encoding for significant wave height forecast. Renew. Energy 2021, 173, 531–543. [Google Scholar] [CrossRef]
  35. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 9. [Google Scholar] [CrossRef]
  36. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2021, 109, 43–76. [Google Scholar] [CrossRef]
  37. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  38. Torrey, L.; Shavlik, J. Transfer learning. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques; IGI Global Scientific Publishing: Hershey, PA, USA, 2010; pp. 242–264. [Google Scholar]
  39. Ramsey, F.L. Characterization of the partial autocorrelation function. Ann. Stat. 1974, 2, 1296–1301. [Google Scholar] [CrossRef]
  40. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Figure 1. The basic structure of (a) an LSTM unit and (b) the LSTM and its transfer learning structure.
Figure 2. Sliding window prediction strategy.
Figure 3. Partial autocorrelation analysis for Buoy Point 46253. (a) Autocorrelation Analysis of Trend Components; (b) Autocorrelation Analysis of Seasonal Components; (c) Autocorrelation Analysis of Residual Components.
Figure 4. Locations of the five buoy stations.
Figure 5. Time Series of SWH at Five Buoy Stations.
Figure 6. Absolute errors (AE) and corresponding boxplots of SWH forecasts at buoys 46253, 46244, 44086, 51000 and 51101 with and without the transfer-learning strategy, using a two-layer LSTM architecture and a continuous forecast of 12 time steps. (a) Absolute errors of SWH forecasts at buoys 46253 without the transfer-learning strategy; (b) Absolute errors of SWH forecasts at buoys 46244 without the transfer-learning strategy; (c) Absolute errors of SWH forecasts at buoys 44086 without the transfer-learning strategy; (d) Absolute errors of SWH forecasts at buoys 51000 without the transfer-learning strategy; (e) Absolute errors of SWH forecasts at buoys 51101 without the transfer-learning strategy; (f) Absolute errors of SWH forecasts at buoys 46253 with the transfer-learning strategy; (g) Absolute errors of SWH forecasts at buoys 46244 with the transfer-learning strategy; (h) Absolute errors of SWH forecasts at buoys 44086 with the transfer-learning strategy; (i) Absolute errors of SWH forecasts at buoys 51000 with the transfer-learning strategy; (j) Absolute errors of SWH forecasts at buoys 51101 with the transfer-learning strategy; (k) Corresponding boxplots of SWH forecasts at buoys 46253; (l) Corresponding boxplots of SWH forecasts at buoys 46244; (m) Corresponding boxplots of SWH forecasts at buoys 44086; (n) Corresponding boxplots of SWH forecasts at buoys 51000; (o) Corresponding boxplots of SWH forecasts at buoys 51101.
Figure 7. Results of T-tests and F-tests for the equality of means and variances of STL-decomposed sub-series in the first five sliding windows at buoy 46253 for a 12-step forecast horizon. Most p-values exceed 0.05, indicating that differences in mean and variance across adjacent windows are not statistically significant. (a) The T-test results of the trend component; (b) The F-test results of the trend component; (c) The T-test results of the seasonal component; (d) The F-test results of the seasonal component; (e) The T-test results of the residual component; (f) The F-test results of the residual component.
Figure 7. Results of T-tests and F-tests for the equality of means and variances of STL-decomposed sub-series in the first five sliding windows at buoy 46253 for a 12-step forecast horizon. Most p-values exceed 0.05, indicating that differences in mean and variance across adjacent windows are not statistically significant. (a) The T-test results of the trend component; (b) The F-test results of the trend component; (c) The T-test results of the seasonal component; (d) The F-test results of the seasonal component; (e) The T-test results of the residual component; (f) The F-test results of the residual component.
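For readers who wish to reproduce this window-stability check, the following minimal Python sketch (not the authors' code) decomposes two adjacent sliding windows with STL and applies a t-test for equality of means and a variance-ratio F-test for equality of variances to each component. The seasonal period of 24 samples and the use of Welch's unequal-variance t-test are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from statsmodels.tsa.seasonal import STL

def component_tests(x_prev, x_next, period=24):
    """Compare STL components of two adjacent sliding windows."""
    prev = STL(np.asarray(x_prev), period=period).fit()
    nxt = STL(np.asarray(x_next), period=period).fit()
    results = {}
    for name, a, b in [("trend", prev.trend, nxt.trend),
                       ("seasonal", prev.seasonal, nxt.seasonal),
                       ("remainder", prev.resid, nxt.resid)]:
        # Welch's t-test for equality of means
        t_p = stats.ttest_ind(a, b, equal_var=False).pvalue
        # Two-sided F-test for equality of variances (variance ratio)
        ratio = np.var(a, ddof=1) / np.var(b, ddof=1)
        dfa, dfb = len(a) - 1, len(b) - 1
        f_p = 2 * min(stats.f.cdf(ratio, dfa, dfb), stats.f.sf(ratio, dfa, dfb))
        results[name] = {"t_pvalue": t_p, "f_pvalue": f_p}
    return results
```

A p-value above 0.05 for a component is consistent with the figure's conclusion that its distribution is stable across adjacent windows, which is what justifies reusing the frozen LSTM backbone.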
Figure 8. (a) Mean squared errors and computation time of the STL-LSTM-T model for different numbers of stacked LSTM layers; (b) relative prediction errors at different stages of the sliding-window forecast.
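As context for this layer-count comparison, the sketch below shows, in PyTorch, one way to realise the fixed-backbone transfer step described in the abstract: a stacked LSTM whose parameters are frozen so that only the fully connected output layer is optimised in each sliding window. Hidden size, learning rate and horizon are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    def __init__(self, n_features=1, hidden=64, n_layers=2, horizon=12):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=n_layers,
                            batch_first=True)
        self.fc = nn.Linear(hidden, horizon)  # the only layer updated per window

    def forward(self, x):                     # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])         # map last hidden state to the horizon

model = StackedLSTM(n_layers=2)
for p in model.lstm.parameters():             # freeze the LSTM backbone
    p.requires_grad = False
optimiser = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

Because only the small fully connected head is trained per window, each update costs a fraction of full retraining, which is consistent with the roughly five-fold reduction in training time reported in the abstract.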
Figure 9. Scatter plots of multi-step SWH prediction performance at buoys 46253, 46244 and 44086, with and without the transfer-learning strategy. In all cases the LSTM architecture uses two stacked layers; continuous forecast horizons of 6, 12 and 24 steps are compared. (a–f) 6-step forecasts; (g–l) 12-step forecasts; (m–r) 24-step forecasts. Within each group, panels alternate between the STL-LSTM-T and STL-LSTM models at buoys 46253, 46244 and 44086, in that order.
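The multi-step results above come from recursive (continuous) forecasting, in which each prediction is fed back as an input for the next step. A minimal sketch of this loop is given below; `model` stands for any one-step-ahead predictor with a `predict` method, a hypothetical interface used only for illustration.

```python
import numpy as np

def recursive_forecast(model, window, n_steps):
    """window: 1-D sequence of the most recent observations."""
    window = list(window)
    preds = []
    for _ in range(n_steps):
        x = np.asarray(window)[None, :, None]  # shape (1, seq_len, 1)
        y_hat = float(model.predict(x))        # one-step-ahead prediction
        preds.append(y_hat)
        window = window[1:] + [y_hat]          # slide the window, feed back
    return np.array(preds)
```

Feeding predictions back in this way is what produces the error accumulation analysed in Figures 10 and 11: any one-step error contaminates all later inputs, so longer horizons degrade faster.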
Figure 10. Distributions of absolute errors for the STL-LSTM and STL-LSTM-T models under different continuous forecast horizons at buoys 46253 and 46244. (a,b) STL-LSTM at buoys 46253 and 46244, respectively; (c,d) STL-LSTM-T at buoys 46253 and 46244, respectively.
Figure 11. Cumulative absolute error as a function of the forecast step for different continuous forecast horizons, illustrating the error-accumulation behaviour of the recursive predictions at buoys 46253 and 46244. (a,b) Sum of absolute errors at buoys 46253 and 46244, respectively; (c,d) per-step absolute errors at buoys 46253 and 46244, respectively.
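Curves of this kind can be computed directly from per-step absolute errors; a minimal NumPy sketch is shown below, assuming `preds` and `obs` are arrays of shape (n_windows, horizon) collected over the test period (an assumed data layout, not the authors' code).

```python
import numpy as np

def cumulative_abs_error(preds, obs):
    """preds, obs: arrays of shape (n_windows, horizon)."""
    ae = np.abs(preds - obs)            # per-step absolute error
    mean_ae = ae.mean(axis=0)           # average AE at each forecast step
    return mean_ae, np.cumsum(mean_ae)  # per-step AE and its running sum
```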
Table 1. Prediction errors of the transfer-learning STL-LSTM-T model and the baseline STL-LSTM model at buoys 46244, 46253, 44086, 51101 and 51000 for different numbers of LSTM layers and continuous forecast steps. Dashes denote configurations not reported for the baseline model.
| Station | LSTM Layers | Steps | STL-LSTM-T MAE | STL-LSTM-T MSE | STL-LSTM-T RMSE | STL-LSTM MAE | STL-LSTM MSE | STL-LSTM RMSE |
|---------|-------------|-------|----------------|----------------|-----------------|--------------|--------------|---------------|
| 46244 | 1 | 12 | 0.502 | 0.538 | 0.734 | – | – | – |
| 46244 | 2 | 6 | 0.410 | 0.350 | 0.592 | 0.422 | 0.378 | 0.615 |
| 46244 | 2 | 12 | 0.546 | 0.607 | 0.779 | 0.517 | 0.557 | 0.747 |
| 46244 | 2 | 24 | 0.807 | 1.239 | 1.239 | 0.659 | 0.867 | 0.931 |
| 46244 | 4 | 12 | 0.539 | 0.586 | 0.766 | – | – | – |
| 46253 | 1 | 12 | 0.159 | 0.057 | 0.238 | – | – | – |
| 46253 | 2 | 6 | 0.143 | 0.046 | 0.215 | 0.156 | 0.056 | 0.237 |
| 46253 | 2 | 12 | 0.173 | 0.072 | 0.268 | 0.192 | 0.084 | 0.290 |
| 46253 | 2 | 24 | 0.208 | 0.099 | 0.315 | 0.232 | 0.112 | 0.335 |
| 46253 | 4 | 12 | 0.191 | 0.088 | 0.296 | – | – | – |
| 44086 | 2 | 6 | 0.245 | 0.139 | 0.373 | 0.249 | 0.139 | 0.373 |
| 44086 | 2 | 12 | 0.303 | 0.199 | 0.447 | 0.316 | 0.216 | 0.465 |
| 44086 | 2 | 24 | 0.419 | 0.384 | 0.619 | 0.420 | 0.347 | 0.589 |
| 51101 | 2 | 12 | 0.421 | 0.430 | 0.655 | 0.373 | 0.320 | 0.566 |
| 51000 | 2 | 12 | 0.504 | 0.573 | 0.757 | 0.328 | 0.256 | 0.506 |
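For reference, the three metrics reported in Table 1 follow their standard definitions; a minimal NumPy version is given below. Note that RMSE is the square root of MSE, which provides a quick consistency check on the table entries.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    """Root mean squared error: the square root of MSE."""
    return np.sqrt(mse(y, y_hat))
```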