1. Introduction
Marine forecasting plays a vital role in supporting coastal engineering construction, disaster prevention and mitigation, marine ecological civilization construction, and economic development. Forecasting significant wave height (SWH) holds important scientific value for monitoring the state of waves [1,2,3,4,5,6].
Traditional numerical models for wave forecasting are based on physical dynamics, solving a wave action balance equation through discrete calculations [1]. However, these models are usually computationally expensive and complex [1,2]. In contrast, machine-learning methods focus on data association and do not rely on physical mechanisms. Compared to numerical models, they offer lower computational and time costs and achieve higher accuracy in short-term wave forecasting [
3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]. Commonly used machine-learning models for wave forecasting include artificial neural network (ANN) [
3], backpropagation (BP) [4], support vector machine (SVM) [5,6,7], long short-term memory (LSTM) [8,9,10,11,12,13,14,15,16,17], bidirectional LSTM (BiLSTM) [18,19,20], gated recurrent unit (GRU) [17,21,22,23], and bidirectional GRU (BiGRU) [24,25]. These models capture the evolution of wave height over time from historical information, or model wave evolution based on the driving effect of the wind and the influences of other environmental features (e.g., air pressure and temperature) on waves. Wave forecasting is divided into point-to-point forecasting and continuous-sequence forecasting. Most current studies focus on point-to-point forecasting, in which waves at specific future time steps are forecasted by setting the length of the forecast window. Prahlada et al. [
3] utilized a hybrid model combining wavelet analysis and an artificial neural network (WLNN) to forecast significant wave height in a time series, with lead times extending up to 48 h. The root mean square error (RMSE) for a 48 h forecast horizon near the western region of Eureka, Canada, in the North Pacific Ocean was found to be 1.076 m. Li et al. [
21] used a GRU model and introduced environmental features such as wind speed and sea temperature to predict SWH 1–3 h ahead. Wave height data were collected from six monitoring stations in the offshore waters of China. Zhou et al. [
9] used a convolutional LSTM (ConvLSTM) model to perform forecasting 3–24 h ahead using National Oceanic and Atmospheric Administration (NOAA) wave reanalysis data under normal and extreme weather conditions. The mean absolute percentage error (MAPE) of a 12 h forecast was 61% under normal conditions and 40% under extreme conditions.
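The error metrics used throughout the studies above (and in the experiments later in this paper) follow their standard definitions. A minimal illustrative implementation, not the authors' code, might look like:

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error, in the same units as SWH (metres)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean square error; penalizes large deviations more than MAE
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent; assumes no zero true values
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example: observed vs. forecasted SWH (m)
obs = [1.0, 2.0, 4.0]
fcst = [1.1, 1.8, 4.3]
print(round(mae(obs, fcst), 4))   # → 0.2
print(round(rmse(obs, fcst), 4))  # → 0.216
print(round(mape(obs, fcst), 4))  # → 9.1667
```

Because MAPE divides by the true value, it inflates for small waves, which matters later when comparing errors across wave scale levels.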
In addition to using machine-learning models for short-term forecasting, some studies [8,18,19,20,26] have found that the combination of attention mechanisms or decomposition methods with machine-learning algorithms can greatly improve SWH forecasting. In a study by Zhou et al. [
8], an integrated model combining empirical mode decomposition (EMD) and LSTM was employed for forecasting SWH in the Atlantic Ocean at 3, 6, 12, 24, 48, and 72 h horizons. Wang et al. [
19] proposed a convolutional neural network (CNN)–BiLSTM–attention model and used it to carry out SWH forecasting 1–24 h ahead under normal and typhoon conditions using WaveWatch III (WW3) reanalysis data from the East China Sea and South China Sea from 2011 to 2020. The average RMSEs for the forecasts at 3, 6, 12, and 24 h were observed to be 0.063 m, 0.105 m, 0.172 m, and 0.281 m, respectively, under normal conditions. Under extreme conditions, the corresponding RMSEs were 0.159 m, 0.257 m, 0.437 m, and 0.555 m. Notably, this model outperformed the one trained solely on WW3 reanalysis data. The results demonstrate that the incorporation of an attention mechanism improved the forecasting accuracy of the model. Celik [
26] constructed a hybrid model by integrating an adaptive neuro-fuzzy inference system (ANFIS) with singular value decomposition (SVD) for forecasting SWH in the Pacific and Atlantic Oceans at lead times ranging from 1 to 24 h.
Single-time forecasting can provide high-accuracy SWH forecasts at a specific time. There are two methods for observing continuous SWH evolution over a future period. One involves establishing multiple single-time models, which incurs considerable computational cost. The other is to build a time series forecasting model, which may sacrifice accuracy at individual points but saves computational cost and can provide an accurate forecast trend. In recent years, the attention-mechanism-based transformer model [
27] has attracted attention due to its excellent performance in time series forecasting tasks. This model was initially proposed by the Google team in 2017 for natural language processing (NLP) applications. Since then, it has been gradually optimized and is widely used in speech recognition [28], computer vision [29], time series forecasting [30,31], anomaly detection [32,33], and other fields [34]. Researchers in the marine field have noted the advantages of the transformer model and applied it to marine time series data forecasting [35,36,37,38,39]. Immas et al. [
35] used both LSTM and transformer to achieve real-time in situ forecasting of ocean currents. The two models performed similarly and provided valuable guidance for the path planning of autonomous underwater vehicles. Zhou et al. [
36] developed a 3D-geoformer model based on the transformer model to forecast El Niño 3.4 sea surface temperature (SST) anomalies 18 months in advance, achieving a Pearson correlation coefficient of 50%. Their results were comparable to those of Ham et al. [
40], who used a CNN to forecast El Niño/southern oscillation. Feng et al. [
37] used a transformer model to forecast the El Niño index, and the results were better than those using a CNN. Pokhrel et al. [
38] proposed differencing the SWHs fitted by WW3 against measured data and used a transformer model to forecast the residuals at specific time steps. Compared to WW3 predictions, the transformer-network-based residual correction for a 3 h forecast provided more accurate estimations. The results showed that combining numerical modeling with artificial intelligence algorithms yields better performance.
Currently, there are very limited applications of transformer models in continuous wave forecasting [38,41] and wave scale classification. To fill this research gap, this study uses an attention-mechanism-based transformer model to achieve sequence-to-sequence learning for SWH. The model extracts the driving effects of various features on SWH evolution from the historical information of the input features, captures the contextual information and dependencies between sequences, and achieves continuous SWH forecasting, allowing the overall trend of future wave changes to be monitored and providing technical support for wave warning and forecasting.
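The core of the model described above is the attention mechanism, which lets every output position draw on a weighted combination of all input time steps. The following pure-Python sketch shows single-head scaled dot-product attention only; the actual transformer stacks multi-head attention with positional encodings in an encoder–decoder, which this toy omits:

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of d-dimensional vectors.

    Each output is a weighted sum of the values, with weights
    softmax(q . k / sqrt(d)) measuring how relevant each input
    time step is to the current query.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # contribution of each input time step
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy self-attention: 3 time steps of 2-dimensional embeddings
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(seq, seq, seq)
```

Because the weights are recomputed per query over the whole sequence, the mechanism has no built-in bias toward recent time steps, which is one plausible reason the model is less sensitive to input length than recurrent networks.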
The remainder of this paper is structured as follows.
Section 2 describes the data used,
Section 3 introduces the methods and experimental setup employed in this study,
Section 4 analyzes the results, and
Section 5 summarizes the findings.
4. Results and Discussion
To explore the influence of different input sequence lengths on the forecasting results of sequences of the same length,
Figure 6 shows the MAE results for the three machine-learning models, each using historical features from the previous 12 h, 24 h, 36 h, 48 h, 72 h, and 96 h as inputs to forecast the SWHs of the next continuous 12 h. The transformer model results were less affected by the input sequence length: the error curve changed relatively smoothly, and the errors for different inputs were similar. In contrast, the GRU and LSTM models showed variations in forecasting performance under different input lengths. Our transformer model learned the dependencies between different positions in the input sequence through its attention mechanism and was thus less affected by the length of the input sequence.
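The input/output pairs for these experiments can be generated by a simple sliding split of the hourly series. The sketch below (the helper name `make_windows` is hypothetical, not from the paper) illustrates how histories of varying length are paired with a fixed continuous forecast window:

```python
def make_windows(series, input_len, forecast_len):
    """Split an hourly series into (input, target) pairs.

    series: list of hourly observations (e.g., SWH in metres).
    input_len: history length in hours (12, 24, 36, 48, 72, or 96
               in the experiments described here).
    forecast_len: length of the continuous forecast window (e.g., 12 h).
    """
    pairs = []
    for start in range(len(series) - input_len - forecast_len + 1):
        x = series[start : start + input_len]                       # history
        y = series[start + input_len : start + input_len + forecast_len]  # target
        pairs.append((x, y))
    return pairs

# Toy example: 10 hourly values, 4 h of history to forecast the next 2 h
data = list(range(10))
pairs = make_windows(data, input_len=4, forecast_len=2)
# first pair: x = [0, 1, 2, 3], y = [4, 5]; 5 pairs in total
```

Note that a longer `input_len` also shrinks the number of training pairs available from a fixed record, which can itself affect the comparison between input lengths.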
Table 5 presents the results of the multipoint average errors between the continuous 12 h, 24 h, 36 h, 48 h, 72 h, and 96 h forecasted SWHs and their corresponding true values. The machine-learning models outperformed the MASNUM numerical model in terms of MAE, RMSE, and MAPE within a 72 h time range. Interestingly, the MAE of the MASNUM numerical model was comparable to that of the LSTM model for predicting 96 consecutive hours. The error results of the MASNUM numerical model exhibited consistency across different prediction lengths, with the MAE ranging from 0.32 m to 0.34 m for consecutive 12–96 h and higher accuracy in wave scale classification for continuous 72–96 h. For the continuous forecasting, the transformer model performed similarly to the LSTM and GRU models. The MAE, RMSE, and MAPE between the true values and the forecasted values all showed a gradually increasing trend with increasing forecasting time. The accuracy of wave scale classification decreased gradually as the forecasting time increased. Overall, the transformer model outperformed the other two machine-learning models in terms of the average error results. For the continuous 12 h forecasting, our transformer model achieved an average MAE of 0.1394 m, an improvement of 8.4%, 14%, and 57% over GRU, LSTM, and the MASNUM numerical model, respectively, and had an average MAPE of 6.36%, an average bias of −0.0035, and a wave scale classification accuracy of 91%. In the case of continuous 96 h forecasting, our transformer model achieved an average MAE of 0.329 m, an improvement of 3.2%, 2.1%, and 2.2% over GRU, LSTM, and the MASNUM numerical model, respectively, and had an average MAPE of 15.29%, an average bias of 0.0004, and a wave scale classification accuracy of 77.47%. Therefore, the transformer model demonstrated superior accuracy in short-term wave scale classification warnings compared to the accuracies of the GRU, LSTM, and MASNUM numerical models.
Figure 7a–f present the frequency histograms of the average MAEs for continuous 12 h, 24 h, 36 h, 48 h, 72 h, and 96 h sequence forecasting. The transformer model exhibited a high frequency of small errors and a low frequency of large errors in terms of the MAEs compared to those of GRU and LSTM in the forecasting of different sequences, especially long sequences, as shown in
Figure 7e,f.
To study the performance of the three models in continuous long-sequence forecasting,
Figure 8a–c present the MAE, RMSE, and MAPE results at each time step during the continuous 96 h forecasting experiment. The MAEs, RMSEs, and MAPEs of the three models showed an increasing trend with the forecast time, but the error growth rate slowed. The fast accumulation of errors in the short term was related to the nonstationary fluctuations in the data. In the long-term forecasting, the models could capture the evolution and long-term dependencies of SWHs within a certain time range, resulting in a relatively slow error growth rate. Furthermore, the transformer model noticeably outperformed GRU and LSTM in the short term, while in the last 48 time steps, the MAEs, RMSEs, and MAPEs of all three models were very close to each other.
Figure 8d shows the accuracy of continuous 96 h wave forecasting. Within the first 24 h, the transformer model exhibited higher accuracies in terms of wave scale classification than those of GRU and LSTM. However, beyond 24 h, the performance of our transformer method deteriorated compared to that of GRU and LSTM, indicating that the transformer model tended to overestimate the maximum values and underestimate the minimum values at the “wave scale level boundaries” more frequently than GRU and LSTM.
Typhoon Aere originated from a tropical disturbance east-southeast of Palau on 27 June 2022. It traveled northward, crossed the East China Sea, and made landfall on Hokkaido, the northern island of Japan, gradually dissipating thereafter.
Figure 9 shows the fitting results of the three machine-learning models and the MASNUM numerical model for wave height changes near the Hawaiian Islands over the ten days beginning 27 June 2022.
Figure 9a–c represent the forecasting results made 1 h, 12 h, and 24 h in advance, respectively. During this time period, there were large waves due to the influence of cyclones in the northwest Pacific, and the central Pacific waves responded to the strong atmospheric disturbances, leading to an increasing trend in SWH. All three machine-learning models accurately fit the SWH evolution within the 1 h forecast horizon, but underestimation was observed at the peaks and overestimation at the troughs. As the forecast horizon extended to 24 h, the accuracy of all three machine-learning models declined, but they still outperformed the MASNUM numerical model. The three machine-learning models consistently underestimated the SWH during the rising phase and overestimated it during the falling phase. The forecasting results of the transformer model were closer to the true values than those of GRU and LSTM. It is noteworthy that the three machine-learning models underestimated the SWH overall. This discrepancy can be attributed to the machine-learning models being trained on data primarily from normal conditions, with limited inclusion of typhoon-induced wave data. Consequently, the models may fail to accurately capture all the characteristics of wave evolution in the central and eastern Pacific Ocean during typhoons in the western Pacific. As a result, the forecast of SWH under typhoon-induced conditions may be flawed. To improve the prediction accuracy under extreme conditions, it is advisable to incorporate targeted training with typhoon-induced wave data, which may yield better outcomes [
9].
According to the wave scale classification criteria, the waves near the Hawaiian Islands were mainly classified as moderate sea and rough sea. The evaluation results of various wave scale classifications forecasted using different sequences are shown in
Table 6. In each sequence forecasting experiment, the classification results for moderate sea and rough sea were the best, and their classification accuracies gradually decreased as the forecasting time increased. The classification accuracy for slight sea first decreased and then increased, while that for very rough sea first increased and then decreased.
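Wave scale classification reduces to binning the forecasted SWH and comparing labels with those of the observed SWH. The thresholds below follow the WMO/Douglas sea-state scale and are an assumption for illustration; the classification criteria actually used in this study may differ, and the function names are hypothetical:

```python
def wave_scale(swh):
    """Map SWH (metres) to a wave scale label.

    Thresholds are assumed from the WMO/Douglas sea-state scale;
    the paper's own criteria may differ.
    """
    if swh < 0.5:
        return "smooth or calm"
    if swh < 1.25:
        return "slight sea"
    if swh < 2.5:
        return "moderate sea"
    if swh < 4.0:
        return "rough sea"
    if swh < 6.0:
        return "very rough sea"
    return "high sea or above"

def scale_accuracy(obs, fcst):
    # Fraction of time steps whose forecasted label matches the observed label
    hits = sum(wave_scale(o) == wave_scale(f) for o, f in zip(obs, fcst))
    return hits / len(obs)
```

This framing makes the boundary effect discussed above concrete: a forecast of 2.45 m against an observation of 2.55 m is a small MAE but a full classification miss, so accuracy degrades fastest near scale level boundaries.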
Figure 10 shows the MAPE results for different wave scale levels. The MAPE for each wave scale level increased with the forecasting time, and the errors also depended on the data volume: the MAPE grew slowly for moderate sea and rough sea and fastest for slight sea. Considering the accidental errors caused by the uneven sample distribution, the reliability of the transformer model in forecasting SWH for the different wave scale levels was affected to some extent.
In summary, the transformer model, like GRU and LSTM, could effectively forecast SWH with higher accuracy than the MASNUM numerical model for continuous 72 h forecasting. During continuous forecasting, the transformer model's generalizability was slightly better than that of the GRU and LSTM models. For long-sequence forecasting, the transformer model could focus on key time steps and important features, resulting in better short-term performance. Regarding the scale classification and warning of wave scale levels based on the forecasted results, the transformer model performed better in short-term scale classification and warnings, achieving higher accuracy for moderate sea and rough sea, but performing poorly for slight sea and very rough sea.
5. Conclusions
Based on North Pacific Ocean buoy data, this study proposed the use of a transformer model to achieve continuous SWH forecasting using buoy wind field features, wave features, and environmental features. The transformer model weighted different parts of the input sequence, which helped the model to better identify important information and, thus, better capture the relationships between data. This research showed the following:
The transformer model extracted key information from wave data, realizing the continuous forecasting of waves and early warning of wave scale levels, with higher forecasting accuracies than those of the MASNUM numerical model and GRU and LSTM.
Unlike the GRU and LSTM models, our transformer method was less affected by the time length of the input sequence.
In the long-sequence forecasting process, the transformer model significantly outperformed the GRU and LSTM models in accurately forecasting future short-term wave height.
The wave scale levels in the sea area where the buoy was located were mainly moderate sea and rough sea, and the transformer model performed better in SWH forecasting and scale classification for these.
The transformer model considered both accuracy and continuity in forecasting, providing a reliable reference for continuous SWH forecasting and the early warning and forecasting of wave scale levels. The transformer model showed an advantage in the overall accuracy of long-sequence forecasting compared with the GRU and LSTM models, while it performed similarly to the other two models in terms of long-term wave scale classification and warnings. Due to training sample imbalance, the classification of wave scale levels with few samples lacked reliability. To address this issue, it is necessary to add “negative samples” to improve the model’s ability to fit such underrepresented classes.
The main difficulties in wave forecasting are the random nature and nonstationarity of wave data. The key to long-term sequence forecasting lies in the accuracy of long-term trend fitting. Therefore, the next phase of research will focus on handling the nonstationarity of wave data and improving the long-sequence forecasting ability of the transformer model. Data decomposition methods can be used to deconstruct nonstationary time series data into low-, medium-, and high-frequency signals, as well as trend, seasonal, and noise components, thus helping the model to fit different components of the data more accurately. The combination of these methods with the transformer model may allow the dependencies between data from multiple perspectives to be captured, providing research references for more accurate sequence forecasting.
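The decomposition idea proposed above can be sketched in its simplest form. Here a centred moving average stands in for the more sophisticated decompositions mentioned (EMD, STL, wavelets); the function is a hypothetical illustration of splitting a nonstationary series into a smooth trend and a residual that could be modelled separately:

```python
def decompose(series, window):
    """Split a series into a smooth trend and a residual component.

    A centred moving average of the given (odd) window length serves
    as the trend; edges use a shrunken window so every point gets a
    value. residual[i] + trend[i] reconstructs series[i] exactly.
    """
    n = len(series)
    half = window // 2
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual

# Toy example: a noisy rising series split into trend + residual
swh = [1.0, 1.4, 1.1, 1.8, 1.5, 2.1, 1.9]
trend, residual = decompose(swh, window=3)
```

A forecasting model could then be trained on the slowly varying trend and the high-frequency residual separately, with the final forecast obtained by summing the two component forecasts.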