Short-Term Wind Speed Prediction for Bridge Site Area Based on Wavelet Denoising OOA-Transformer

Gao, Yan; Cao, Baifu; Yu, Wenhao; Yi, Lu; Guo, Fengqi

doi:10.3390/math12121910


by Yan Gao ¹, Baifu Cao ¹, Wenhao Yu ², Lu Yi ² and Fengqi Guo ³,*

¹ School of Automation, Central South University, Changsha 410006, China
² CCCC Second Harbor Engineering Co., Ltd., No.5 Branch, Wuhan 430040, China
³ School of Civil Engineering, Central South University, Changsha 410075, China
* Author to whom correspondence should be addressed.

Mathematics 2024, 12(12), 1910; https://doi.org/10.3390/math12121910

Submission received: 17 May 2024 / Revised: 14 June 2024 / Accepted: 18 June 2024 / Published: 20 June 2024

(This article belongs to the Special Issue Artificial Intelligence and Data Science)


Abstract: Predicting wind speed in advance at bridge sites is essential for ensuring bridge construction safety under high wind conditions. This study proposes a short-term wind speed prediction model based on outlier correction, Wavelet Denoising, the Osprey Optimization Algorithm (OOA), and the Transformer model. The outliers caused by data entry and measurement errors are processed by the interquartile range (IQR) method. By comparing the performance of four different wavelets, the best-performing wavelet (Bior2.2) was selected to filter out sharp noise from the data processed by the IQR method. The OOA-Transformer model was utilized to forecast short-term wind speeds based on the filtered time series data. With OOA-Transformer, the seven hyperparameters of the Transformer model were optimized by the Osprey Optimization Algorithm to achieve better performance. Given the outstanding performance of LSTM and its variants in wind speed prediction, the OOA-Transformer model was compared with six other models using the actual wind speed data from the Xuefeng Lake Bridge dataset to validate our proposed model. The experimental results show that the mean absolute percentage error (MAPE), root mean square error (RMSE), and coefficient of determination ($R^{2}$) of the proposed method on the test set were 4.16%, 0.0152, and 0.9955, respectively, which are superior to the other six models. The prediction accuracy was found to be high enough to meet the short-term wind speed prediction needs of practical projects.

Keywords: interquartile range; wavelet denoising; osprey optimization algorithm; transformer; time-series prediction

MSC: 68T07

1. Introduction

During construction, large-span bridges have yet to form their final structural system and have relatively low structural rigidity, making them highly sensitive to wind loads. Ensuring wind resistance safety during construction is therefore particularly important. Currently, measures such as wind-resistant cables and wind-resistant bearings are commonly used during the construction period of large-span bridges [1,2,3,4,5], and aerial construction work is prohibited under severe weather conditions with strong winds of level 6 and above [6]. However, daily weather forecasts cannot account for local effects at the bridge site: they neither provide accurate wind speed information specific to the bridge location nor predict future wind speed trends there. Therefore, large-span bridges such as arch, cable-stayed, and suspension bridges still face severe risks of wind-induced disasters. To achieve the goal of “early warning and early judgment”, it is necessary to precisely predict the short-term wind speed changes in the area of cable-stayed bridges, thus providing a basis for auxiliary decision making on wind disasters, disaster prevention, and mitigation during the construction period of large-span bridges.

The wind speed data in the bridge site area have many complex characteristics, such as multiple outliers, high noise, strong non-stationarity, strong nonlinearity, and apparent multi-scale features. To address these characteristics, we propose a hybrid short-term wind speed forecasting model: Wavelet Denoising OOA-Transformer. This hybrid model includes IQR outlier correction, Wavelet Denoising (WD) [7,8], the Osprey Optimization Algorithm (OOA) [9], and the Transformer model. First, the complex terrain at bridge sites may influence wind speed data through turbulence effects, leading to numerous outliers or anomalies within the collected data. We therefore implemented an outlier correction method based on the Interquartile Range (IQR). By calculating the upper and lower quartiles and the IQR of the wind speed series, data points that exceed the resulting bounds are identified as outliers and are subsequently corrected or removed, thus purging the data series of these anomalies and enhancing the overall data quality. Furthermore, various types of noise might affect the wind speed data at the bridge site, such as noise from the measuring equipment or environmental electromagnetic interference. Wavelet Denoising (WD) is applied to the wind speed series to reduce this noise. After comparing the signal-to-noise ratio, mean square error, and waveform similarity among four commonly used wavelets, the optimal wavelet is selected. The WD process then splits the wind speed series into low-frequency approximation components and high-frequency detail components. The detail components undergo soft thresholding to eliminate high-frequency noise disturbances, followed by a reconstruction process that yields a denoised wind speed series.
Additionally, WD aids in extracting the multi-scale features within the wind speed data, thereby furnishing more comprehensive information for subsequent modeling efforts. During the development phase of the Transformer model, the Osprey Optimization Algorithm (OOA) is deployed to optimize critical parameters such as the number of attention heads, the dimensions of feed-forward layers, and the number of layers in both the Encoder and Decoder. The OOA [9] combines the fast convergence of the fish swarm algorithm and the global search ability of the eagle swarm algorithm. The process of parameter optimization balances global exploration and local exploitation, which helps find the optimal model structure of the Transformer and improves the generalization performance of the model. Ultimately, the refined Transformer model is used to predict the actual wind speed series. Compared with traditional machine learning models and shallow neural networks, the Transformer, through its self-attention mechanism and parallel computing framework, excels at capturing both the short-term and long-term dependencies, as well as the nonlinear characteristics, within the wind speed series. By constructing hidden representations of the wind speed sequence in the Encoder and generating future predictions in the Decoder, the Transformer model delivers precise multi-step wind speed forecasts. Through IQR outlier correction, WD noise reduction, OOA optimization, and Transformer modeling, the precision and robustness of short-term wind speed prediction at bridge sites are markedly enhanced, thus providing substantial support for the safety of bridge construction.

The main contributions of this paper are outlined as follows:

A novel outlier processing method using quartile analysis and a Wavelet Denoising strategy is proposed. Outliers are identified by calculating the upper and lower quartiles of the data; the parts of the wavelet coefficients that fall below a certain threshold are either reduced or set to zero. By combining outlier processing and noise reduction, the quality of the wind speed data series is effectively enhanced.
A short-term wind speed prediction model for the bridge site area based on the OOA-Transformer model is proposed. The model adopts the Transformer to capture the short-term and long-term dependencies and the nonlinear characteristics in the wind speed series. The OOA is used to optimize the key parameters of the Transformer model to improve the prediction accuracy and stability.
Experiments conducted on the actual wind speed dataset of the Xuefeng Lake Bridge demonstrate that the proposed model outperforms other models in all key evaluation metrics (MAPE, RMSE, and $R^{2}$).

The rest of this paper is organized as follows: In Section 2, the existing literature is organized and analyzed. This section provides the reader with a comprehensive overview of the progress made by researchers in wind speed prediction. In Section 3, a wind speed dataset based on the Xuefeng Lake Bridge is presented, and the experimental methods used are described. Section 4 presents the main findings of this paper. This section describes a step-by-step process and analyzes the results, emphasizing the reliability and validity of the findings. Finally, Section 5 concludes this paper with a summary of the key findings, broader implications, and limitations of our research work.

2. Related Works

The methods of wind speed prediction can be categorized into two main types: physics-based models and data-driven models [10]. Physics-based models predict wind speed by understanding and simulating the physical processes of atmospheric flow. These models typically rely on the fundamental principles of meteorology. Given that wind is a multifaceted physical phenomenon, physically grounded wind speed forecasts can be obtained directly from numerical weather prediction (NWP) models [11], and spatially correlated flow dynamics models can also be used to predict wind speed [12].

However, considering the complexity of physical model construction, forecasting accuracy, and applicability issues, data-driven models such as statistical models, machine learning, and hybrid forecasting models have emerged. Statistical models exploit the temporal correlation of wind speed time series: building on classical statistical models [13,14,15] adapted to the characteristics of the series, they use historical data to predict the trend of wind speed. Statistical models have achieved good prediction results when dealing with smooth sequences, but wind speed sequences are more volatile and have obvious nonlinear features, so it is difficult to accurately predict complex changes using only statistical models. The autoregressive integrated moving average (ARIMA) model [16] is a classic model for wind speed prediction, and Yu et al. [17] utilized an ARIMA model to predict the bounded non-periodic data in coastal bridges. Unlike statistical methods, machine learning methods can better fit the nonlinear characteristics of wind speed and can recognize prediction patterns in the presence of relationship uncertainty in historical data [18]. Therefore, researchers have conducted numerous studies on the application of machine learning methods in the field of wind speed prediction. For instance, Alexiadis et al. [19] proposed an artificial neural network model that can predict the wind speed of a turbine in the next 10 min. Shao et al. [20] built a wind speed prediction model based on gated recurrent unit (GRU) neural networks using sensor-monitored wind speed data sets. The results of Décio Alves et al. [21] highlight the efficacy of machine learning in predicting wind conditions when using high-resolution temporal data, and they also demonstrate that deep learning models outperform traditional methods in improving the accuracy of wind speed predictions.
These methods frequently depend on single neural network models, which may lead to problems such as overfitting or convergence to local optima in short-term wind speed prediction. Hybrid models, which integrate the best features of different models, have been proposed as a way to increase forecast accuracy.

Composite neural networks, optimization algorithms, and data preprocessing are all combined in the hybrid forecasting model [22,23,24]. The data preprocessing strategy proposed by Qian et al. [25], consisting of outlier correction [26] and data decomposition, can significantly enhance the predictive performance of the entire model. After outlier correction, considering the inherent nonlinearity and noise characteristics of wind speed data, applying data decomposition techniques can effectively reduce the instability of wind speed time series and eliminate redundant information. Recent studies have shown that combining data decomposition methods with advanced machine learning models can significantly improve the accuracy of wind speed predictions [27,28]. For example, Liu et al. [29] utilized wavelet decomposition to reduce the volatility of time series and LSTM to extract time series features, thus constructing a hybrid model combining Wavelet Decomposition (WD) and LSTM for forecasting. Experiments have shown that this model effectively improves prediction accuracy compared with using the LSTM model alone [30]. Chen et al. [31] established a short-term wind speed prediction model using Wavelet Packet Decomposition (WPD), ARIMA, and LSTM, where the ARIMA model is utilized for predicting low-frequency time series and the LSTM model for high-frequency time series. By combining the advantages of both, the effectiveness is significantly enhanced. Liu et al. [32] employed the Empirical Mode Decomposition (EMD) algorithm to process the original wind speed series, utilized Long Short-Term Memory (LSTM) networks to predict high-frequency subsequences, and used ARIMA models to forecast the remaining low-frequency subsequences, thus establishing a wind speed prediction model that integrates EMD with a novel Recurrent Neural Network (RNN) and ARIMA.

Furthermore, the settings of parameters may affect the model’s predictive accuracy and training efficiency [33]. Researchers have developed adaptive short-term wind speed prediction models based on optimization algorithms to address these issues. For instance, Wang et al. [34] proposed a short-term wind speed prediction model for bridge sites based on the Sparrow Search Algorithm (SSA) and Bidirectional Long Short-Term Memory Networks (BiLSTMs). Suo et al. [35] proposed a hybrid wind speed prediction model incorporating Time-Varying Filtering Empirical Mode Decomposition (TVFEMD), Partial Autocorrelation Function (PACF), Improved Chimpanzee Optimization Algorithm (IChOA), and Bidirectional Gated Recurrent Unit (BiGRU). This model uses TVFEMD to handle nonlinear sequences and capture temporal information at different time scales. It optimizes bidirectional gated recurrent networks with the Improved Chimpanzee Optimization Algorithm, significantly enhancing the model’s predictive accuracy and robustness. In summary, utilizing optimization algorithms enables the model to maintain optimal parameters [36,37,38,39,40], thereby potentially enhancing the accuracy of wind speed predictions.

Due to gradient vanishing and exploding issues in LSTM and BiLSTM models, which hinder their ability to capture long-distance dependencies, Vaswani et al. [41] proposed the Transformer model. Researchers are exploring applying the Transformer model to time series forecasting tasks. For instance, Zhang et al. [42] utilized the Transformer model to predict the power consumption of satellites, while Weng et al. [43] applied it to predict the deformation of dams. Additionally, some scholars [44,45,46,47] adopted decomposition methods to capture the global patterns of time series and used the Transformer model to capture more detailed structural features. This approach has achieved significant results in univariate time series forecasting tasks.

3. Materials and Methods

This section provides a detailed account of the dataset’s source, pre-training data preprocessing steps, and the construction and training process of the OOA-Transformer wind speed prediction model based on Wavelet Denoising. Firstly, we elaborate on the data acquisition specifics of the Xuefeng Lake Bridge, including timing, frequency, and the fundamental structure of the data. Subsequently, we delve into the pre-training data preprocessing steps in detail, encompassing the identification and handling of outliers, denoising procedures, and normalization techniques. These steps are aimed at optimizing data quality and aligning with the input requirements of the model. Finally, we introduce the integration of Wavelet Denoising technology with OOA algorithms and Transformer models, thus providing a comprehensive overview of model configuration, the training process, and strategies for effectively predicting wind speed data.

3.1. Data Sources and Pre-processing

Xuefeng Lake Bridge, located in Anhua County, Yiyang City, Hunan Province, is a key control project for the entire route of the Hunan Guanxin Expressway, which is designed as a double-tower, double-cable–surface steel–concrete composite girder cable-stayed bridge, with main structural spans of (60 + 160 + 500 + 160 + 60) meters, and the tallest No. 6 central tower is 202.4 m high. The topography of the reservoir at the bridge site is a U-shaped trough with a maximum construction water depth of 68 m. The bridge site is located above the reservoir in a mountainous area, with mountainous terrain on both sides and open water in the center. There are fewer studies on wind field prediction in similar terrain. Therefore, this paper conducted a wind speed prediction test at the Xuefeng Lake Bridge site and collected the data on the wind field by installing the ultrasonic anemometer on the construction crane for the experiment. The results of this study can provide a reference for wind field prediction experiments under similar geomorphologic conditions. In this paper, field measurement was carried out at the bridge site. Figure 1 shows the geographic location of the Xuefeng Lake Bridge. The wind measurement equipment was installed on the bridge construction crane, and the arrangement of measurement points is shown in Figure 2.

This paper presents a comprehensive wind speed dataset that was obtained from on-site measurements at the Xuefeng Lake Bridge, covering the period from 1 March 2018, at 00:00, to 1 June 2018, at 00:00. Figure 3 illustrates the actual wind speed series variation curve.

Considering that instrumental measurements often encounter issues such as instrumental failure, incorrect data entry, and measurement errors, it is crucial to apply robust methods for data cleaning. In this study, the interquartile range (IQR) rule was employed to correct outliers. The IQR is defined as the difference between the third quartile (Q3) and the first quartile (Q1), which helps to mitigate extreme values.
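As a minimal sketch of the IQR rule, assuming the conventional fences $Q1 - 1.5 \times IQR$ and $Q3 + 1.5 \times IQR$ and assuming that out-of-range points are corrected by clipping them to the nearest fence (the text does not specify the correction rule), the cleaning step could look as follows. The function names, the quartile interpolation method, and the sample values are illustrative only.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" is one common quartile convention; the paper does not say which it uses.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def correct_outliers(values, k=1.5):
    """Clip points outside the fences to the nearest fence (assumed correction rule)."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]

# Illustrative wind speeds in m/s with one spurious spike.
wind = [3.1, 3.4, 2.9, 3.0, 3.3, 3.2, 25.0, 3.1]
cleaned = correct_outliers(wind)
```

Clipping keeps the series length and sampling interval unchanged, which matters for a model that consumes fixed-length windows; dropping or interpolating the flagged points would be equally valid alternatives.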

The raw wind speed data sequence, as shown in Table 1, initially ranged from 0.0000 m/s to 25.2060 m/s, with a mean of 8.0777 m/s and a standard deviation of 5.1068 m/s. The data display a positive skewness (0.6619) and a slightly negative kurtosis (−0.3347), indicating a distribution with a long tail on the higher value side and a peak that is less pronounced than a normal distribution.

From Table 1, several changes in the dataset characteristics can be observed before and after the processing of the IQR method. Firstly, the minimum value increases, suggesting that the correction removes some very low values, thereby raising the lower boundary of the data. Secondly, the maximum value decreases, indicating that the correction process effectively adjusts some extremely high values, thus narrowing the overall range of the data. Furthermore, the average value shows a slight decrease, which implies that the center of the data distribution shifts slightly to the left, although the overall alteration is not significant. Additionally, the standard deviation decreases slightly, indicating a narrower and more concentrated data distribution. The skewness also decreases slightly, suggesting a reduction in the influence of extreme values and a shift toward a more symmetrical distribution. Finally, the kurtosis decreases slightly, reflecting a less peaked distribution. In conclusion, the IQR-based outlier processing leads to a reduction in the influence of extreme values, a narrowing in the distribution range, an increase in concentration, and an enhancement in both the symmetry and smoothness of the distribution.
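The descriptive statistics discussed above can be reproduced with a few lines of code. The sketch below assumes population (biased) moment estimators and reports excess kurtosis, so that a normal distribution scores 0; the paper does not state which estimators were used, and the function name is hypothetical.

```python
def describe(values):
    """Summary statistics based on central moments (population estimators, assumed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,      # > 0 means a longer right tail
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis; < 0 means flatter than normal
    }

stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
```

Running `describe` on the series before and after outlier correction gives the two sets of values that a comparison like Table 1 is built from.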

3.2. Methods

In this paper, the IQR method was first used to handle outliers in the data, and this was followed by the application of wavelet transform to denoise the processed data, thus retaining key information while removing noise. Subsequently, the denoised data were normalized using min-max normalization to scale the values to the [0, 1] range in order to meet the input requirements of the Transformer model. During the training of the Transformer model, the OOA was employed to optimize the hyperparameters of the model to achieve optimal performance. Finally, the trained Transformer model was used to predict new input data, and the prediction results were rescaled to the original scale through inverse normalization, ultimately outputting the predicted values.

3.2.1. Wavelet Denoising

The principle of Wavelet Denoising involves using wavelet transform to perform multi-level decomposition on wind speed series at different scales. The wavelet coefficients are then processed, typically through methods like thresholding, and the denoised wind speed series is reconstructed through the inverse wavelet transform. Let the noisy wind speed series be denoted as $f (t)$. After applying the wavelet transform, the resulting wavelet coefficients, $W_{f} (a, b)$, are given by Equation (1):

$W_{f} (a, b) = \frac{1}{\sqrt{a}} \int_{- \infty}^{+ \infty} f (t) ψ^{*} \left(\frac{t - b}{a}\right) d t,$

(1)

where a is the scale parameter, which corresponds to the level of decomposition, b is the translation parameter, and $ψ (t)$ is the mother wavelet function. The wavelet decomposition splits the wind speed series $f (t)$ into low-frequency and high-frequency components. Then, thresholding is applied to the high-frequency coefficients obtained from the wavelet decomposition to remove noise. The modified wavelet coefficients ${\hat{W}}_{f} (a, b)$ are defined as follows:

${\hat{W}}_{f} (a, b) = \begin{cases} sgn (W_{f} (a, b)) (| W_{f} (a, b) | - λ), & if | W_{f} (a, b) | \geq λ \\ 0, & if | W_{f} (a, b) | < λ, \end{cases}$

(2)

where $λ$ is a predefined threshold. If the absolute value of a coefficient is less than $λ$, then the coefficient is set to zero; otherwise, the coefficient is shrunk by the amount $λ$, meaning its absolute value is reduced by $λ$.

After thresholding, the processed wavelet coefficients ${\hat{W}}_{f} (a, b)$ are used to reconstruct the denoised wind speed series through the inverse wavelet transform:

$\hat{f} (t) = \frac{1}{C_{ψ}} \int_{0}^{+ \infty} \int_{- \infty}^{+ \infty} \frac{1}{a^{2}} {\hat{W}}_{f} (a, b) ψ \left(\frac{t - b}{a}\right) d b \, d a .$

(3)

Here, $C_{ψ}$ is the normalization constant of the wavelet function. The denoised series $\hat{f} (t)$ is reconstructed by performing a double integral over the wavelet coefficients ${\hat{W}}_{f} (a, b)$ across all translations b and scales a.

This study selects four wavelet functions (Db4, Sym4, Coif1, and Bior2.2) for Wavelet Denoising, using a four-level decomposition to perform wavelet decomposition of the wind speed series data. The soft thresholding method is applied to threshold the wavelet coefficients, and the processed wavelet coefficients are used for data reconstruction.
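The decompose, threshold, and reconstruct pipeline can be illustrated with a small self-contained sketch. To stay dependency-free it swaps in the Haar wavelet rather than the Db4, Sym4, Coif1, or Bior2.2 wavelets used in the study, and the threshold `lam` is passed in by hand because the text does not state how $λ$ was chosen. All function names are hypothetical.

```python
import math

def soft_threshold(w, lam):
    """Eq. (2): zero coefficients below lam, shrink the rest toward zero by lam."""
    if abs(w) < lam:
        return 0.0
    return math.copysign(abs(w) - lam, w)

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert haar_step exactly."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def denoise(x, levels, lam):
    """Multi-level decomposition, soft thresholding of details, reconstruction."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels in this sketch")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append([soft_threshold(w, lam) for w in d])
    for d in reversed(details):  # rebuild from the coarsest level upward
        approx = haar_inverse_step(approx, d)
    return approx

smoothed = denoise([1.0, 2.0, 3.0, 4.0], levels=2, lam=100.0)
```

With a very large threshold every detail coefficient is removed and only the overall mean survives, while `lam=0` returns the input unchanged; practical thresholds lie between these extremes. A wavelet library such as PyWavelets would normally be used for the biorthogonal wavelets compared in this study.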

Three metrics are employed to assess the effectiveness of noise reduction techniques. The Signal-to-Noise Ratio (SNR) is the most intuitive measure of denoising effectiveness. A higher SNR indicates less noise in the signal and better noise reduction. The SNR is calculated using the power of the pure signal, $P_{signal}$, and the power of the pure noise, $P_{noise}$, as follows:

$SNR (dB) = 10 \log_{10} \left(\frac{P_{signal}}{P_{noise}}\right) .$

(4)

Waveform similarity, measured by the Normalized Cross-Correlation (NCC), reflects the overall similarity of the signal waveforms before and after denoising. It does not characterize the detailed changes in waveform oscillations. Let $A_{s}$ denote the pure signal and $A_{d}$ the signal after filtering. The NCC is defined as follows:

$NCC = \frac{\sum_{n = 1}^{N} A_{s} (n) A_{d} (n)}{\sqrt{\left(\sum_{n = 1}^{N} A_{s}^{2} (n)\right) \left(\sum_{n = 1}^{N} A_{d}^{2} (n)\right)}} .$

(5)

The Mean Square Error (MSE) is a metric that reflects the discrepancy between the estimated and the true values. Let $A_{signal}$ be the pure signal and $A_{denoised}$ the denoised signal. The MSE is computed as follows:

$MSE = \frac{1}{N} \sum_{i = 1}^{N} {| A_{signal} (i) - A_{denoised} (i) |}^{2} .$

(6)
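The three metrics in Equations (4) to (6) translate directly into code. In the sketch below the noise power is estimated from the residual between the reference signal and the denoised signal, which is an assumption; the text does not say how $P_{noise}$ is obtained when no clean reference exists. The function names are illustrative.

```python
import math

def mse(signal, denoised):
    """Eq. (6): mean squared difference between reference and denoised series."""
    return sum((s - d) ** 2 for s, d in zip(signal, denoised)) / len(signal)

def ncc(signal, denoised):
    """Eq. (5): normalized cross-correlation (1.0 for identical waveforms)."""
    num = sum(s * d for s, d in zip(signal, denoised))
    den = math.sqrt(sum(s * s for s in signal) * sum(d * d for d in denoised))
    return num / den

def snr_db(signal, denoised):
    """Eq. (4), with noise power taken as the power of the residual (assumed)."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = mse(signal, denoised)
    if p_noise == 0:
        return math.inf  # no residual noise at all
    return 10 * math.log10(p_signal / p_noise)

reference = [1.0, 2.0, 3.0]
filtered = [1.0, 2.0, 4.0]
scores = (snr_db(reference, filtered), ncc(reference, filtered), mse(reference, filtered))
```

A wavelet that scores a higher SNR, an NCC closer to 1, and a lower MSE than its competitors would be preferred under these criteria.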

3.2.2. Transformer

The Google team introduced the Transformer model [41] in 2017, which is structured into two components: an Encoder and a Decoder. The Encoder features three key modules: Multi-Head Self-Attention, a Position-Wise Feed-Forward Network, and the Add and Normalize layer for normalization. The Decoder builds upon the Encoder’s existing modules by adding a cross-attention module and introducing a masking mechanism in the Multi-Head Self-Attention. The specific structure of the Transformer model is shown in Figure 4.

Positional Encoding: Since the Transformer model does not include recurrent or convolutional operations, positional encoding retains the temporal attributes associated with each data point, thus enabling the model to utilize this information to understand the sequential relationships in the series. Positional encoding is generated by alternating sine and cosine functions, with each dimension corresponding to a frequency:

${PE}_{(pos, 2 i)} = sin (\frac{pos}{10000^{2 i / d_{model}}}),$

(7)

${PE}_{(pos, 2 i + 1)} = cos (\frac{pos}{10000^{2 i / d_{model}}}),$

(8)

where pos is the position index, 2i and 2i + 1 are dimension indices, and $d_{model}$ represents the embedding dimension.
Multi-Head Attention allows the model to learn information simultaneously from different representational subspaces. This mechanism enhances the model’s focus and learning capabilities, thus enabling it to capture information at different levels of the sequence. Scaled dot-product attention is defined as follows:

$A (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V,$

(9)

where $Q$, $K$, and $V$ are the query, key, and value matrices, respectively, and $d_{k}$ is the dimension of the key vectors. In the multi-head attention mechanism, the scaled dot-product operations for different heads are processed through different linear transformations. The outputs are then merged:

$MultiHead (Q, K, V) = Concat ({head}_{1}, \dots, {head}_{h}) W^{O} .$

(10)

This design not only reduces the vector dimension of the scaled dot-product calculation, but also captures the intrinsic associations of the sequence from multiple perspectives through the parallel processing of multiple heads. In the Decoder part, the multi-head attention mechanism uses masking to hide information after the current position, thus ensuring that predictions rely solely on known sequence information.
Feed-Forward Network (FFN) provides additional nonlinear processing capabilities for each layer of the Transformer model. The feed-forward network consists of two linear transformations separated by a nonlinear activation function. The FFN formula is as follows:

$FFN (x) = max (0, x W_{1} + b_{1}) W_{2} + b_{2},$

(11)

where x represents the input to the layer; $W_{1}$ and $b_{1}$ are the weights and biases of the first linear layer; $W_{2}$ and $b_{2}$ are the weights and biases of the second linear layer; and $max (0, \cdot)$ is the ReLU activation function, which operates on each element.
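The three building blocks above, Equations (7) to (9) and (11), can be written out in plain Python to make the index conventions concrete. This is a teaching sketch on nested lists for a single attention head, without masking, learned projections, or layer normalization, and it is not the implementation used in the study. All names are illustrative.

```python
import math

def positional_encoding(seq_len, d_model):
    """Eqs. (7)-(8): sine on even dimensions, cosine on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for k in range(0, d_model, 2):  # k plays the role of 2i
            angle = pos / (10000 ** (k / d_model))
            pe[pos][k] = math.sin(angle)
            if k + 1 < d_model:
                pe[pos][k + 1] = math.cos(angle)
    return pe

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Eq. (9): softmax(Q K^T / sqrt(d_k)) V, one output row per query."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

def ffn(x, W1, b1, W2, b2):
    """Eq. (11): max(0, x W1 + b1) W2 + b2 for a single input vector x."""
    hidden = [max(0.0, sum(xi * W1[i][j] for i, xi in enumerate(x)) + b1[j])
              for j in range(len(b1))]
    return [sum(h * W2[j][k] for j, h in enumerate(hidden)) + b2[k]
            for k in range(len(b2))]

pe = positional_encoding(seq_len=2, d_model=4)
context = attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
y = ffn([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0], [1.0]], [0.5])
```

In the example, the zero query scores both keys equally, so the attention output is the plain average of the two value rows; multi-head attention in Equation (10) repeats this computation per head on projected inputs and concatenates the results.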

3.2.3. OOA-Transformer

Before constructing the model, the data are processed using the min-max normalization method. This step helps eliminate the impact of differing scales among variables and prevents complications during model training due to differences in the numerical ranges of variables.
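A minimal sketch of min-max normalization and its inverse is given below. Storing the minimum and maximum from the fitting data so that predictions can be mapped back to m/s is an illustrative design choice, and the function names are hypothetical.

```python
def minmax_fit(values):
    """Return the (min, max) pair that defines the scaling."""
    return min(values), max(values)

def minmax_scale(values, lo, hi):
    """Map values into [0, 1] using x' = (x - min) / (max - min)."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / span for v in values]

def minmax_inverse(scaled, lo, hi):
    """Undo the scaling: x = x' * (max - min) + min."""
    return [s * (hi - lo) + lo for s in scaled]

series = [2.0, 4.0, 6.0]
lo, hi = minmax_fit(series)
scaled = minmax_scale(series, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
```

Applying `minmax_inverse` with the same stored pair to the model outputs returns the predictions to the original wind speed scale.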

Manual selection of Transformer parameters can lead to errors. In this paper, we employed the Osprey Optimization Algorithm (OOA) [9] to optimize seven hyperparameters of the Transformer model, thereby enhancing the network’s predictive performance. The specific steps for optimizing the Transformer model using the OOA are as follows:

Initialization: Identify the hyperparameters of the Transformer model to be optimized, and set the size of the osprey population (number of Transformer models), the dimension of the population (set to 1 for single-variable wind speed prediction), and the maximum number of iterations. First, form an initial solution set according to Equations (12) and (13):

$θ_{i} = \{θ_{i, j}\},$

(12)

$θ_{i, j} = l b_{j} + r \times (u b_{j} - l b_{j}),$

(13)

where $θ_{i}$ is the set of parameters for the i-th Transformer model, $θ_{i, j}$ is the j-th parameter of the Transformer model, j indexes the seven hyperparameters ($j = 1, \dots, 7$), $l b_{j}$ is the lower boundary of the optimization range, $u b_{j}$ is the upper boundary, and r is a random number between 0 and 1.
Global Exploration (First Phase): For each Transformer model $θ_{i}$ , first define the fitness function $L (θ_{i})$ :

$L (θ_{i}) = \frac{1}{n} \sum_{j = 1}^{n} | y_{j} - {\hat{y}}_{j} |,$

(14)

where n is the number of training samples, $y_{j}$ is the actual label of the j-th sample, and ${\hat{y}}_{j}$ is the prediction of the model for that sample. Then, determine the set of models in the population with a smaller loss function value (better performance) $F P_{i}$ :

$F P_{i} = \{θ_{k} \mid k \in \{1, 2, \dots, N\}, L (θ_{k}) < L (θ_{i})\} \cup \{θ_{best}\},$

(15)

where $θ_{best}$ is the parameter set of the model with the smallest loss in the current population. Next, randomly select a model $θ_{S F}$ from the set $F P_{i}$ as the target for the osprey’s predation. Update $θ_{i}$ relative to $θ_{S F}$ using the osprey algorithm’s two strategies (i.e., chasing fish and avoiding strong opponents) as follows:

$θ_{i, j}^{P 1} = θ_{i, j} + r_{i, j} \times (2 I_{i, j} - 1) \times (θ_{S F_{i}, j} - θ_{i, j}),$

(16)

where $θ_{i, j}$ and $θ_{S F_{i}, j}$ represent the values of the current and target models at the j-th parameter, respectively; $r_{i, j}$ is a random number in the interval [0, 1] that sets the step size; and $I_{i, j}$ is a randomly chosen integer value of 0 or 1, where $I_{i, j} = 1$ brings the model closer to the target parameter and $I_{i, j} = 0$ moves it away, thus avoiding local optima. Boundary Checks and Adjustments: Each parameter $θ_{i, j}^{P 1}$ is then checked and adjusted for boundaries as follows:

$θ_{i, j}^{P 1} = \begin{cases} θ_{i, j}^{P 1}, & if l b_{j} \leq θ_{i, j}^{P 1} \leq u b_{j} \\ l b_{j}, & if θ_{i, j}^{P 1} < l b_{j} \\ u b_{j}, & if θ_{i, j}^{P 1} > u b_{j} . \end{cases}$

(17)

If the final $L (θ_{i}^{P 1}) < L (θ_{i})$, then replace $θ_{i}$ with $θ_{i}^{P 1}$.
Local Exploitation (Second Phase): For each iteration $t = 1, 2, \dots, T$ , and for each Transformer model $θ_{i}$ , generate a new candidate solution $θ_{i, j}^{P 2}$ using the following equation:

$θ_{i, j}^{P 2} = θ_{i, j} + \frac{l b_{j} + r \times (u b_{j} - l b_{j})}{t} .$

(18)

Then, check and adjust each parameter of $θ_{i, j}^{P 2}$ for boundary constraints as follows:

$θ_{i, j}^{P 2} = \begin{cases} θ_{i, j}^{P 2}, & if l b_{j} \leq θ_{i, j}^{P 2} \leq u b_{j} \\ l b_{j}, & if θ_{i, j}^{P 2} < l b_{j} \\ u b_{j}, & if θ_{i, j}^{P 2} > u b_{j} . \end{cases}$

(19)

If the final $L (θ_{i}^{P 2}) < L (θ_{i})$ , replace $θ_{i}$ with $θ_{i}^{P 2}$ .
Repeat: Repeat Steps 2 and 3 until the termination condition (maximum number of iterations) is met.
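The two phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the loss $L$ is replaced by a toy quadratic (`toy_loss`, a hypothetical stand-in for the validation loss of a trained Transformer), the random numbers $r$ are drawn uniformly from [0, 1] and afresh for every parameter (the text only says "within an interval"), and the names `ooa_minimize` and `clamp` are ours.

```python
import random

def clamp(value, lb, ub):
    """Boundary check of Eqs. (17) and (19)."""
    return max(lb, min(ub, value))

def ooa_minimize(loss, lb, ub, pop_size=10, max_iter=20, seed=0):
    """Toy Osprey Optimization loop following Eqs. (15)-(19).

    loss   : callable mapping a parameter list to a float (stand-in for L).
    lb, ub : per-parameter lower and upper bounds.
    """
    rng = random.Random(seed)
    dim = len(lb)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lb[j], ub[j]) for j in range(dim)] for _ in range(pop_size)]
    fit = [loss(p) for p in pop]

    for t in range(1, max_iter + 1):
        for i in range(pop_size):
            # Phase 1, Eq. (15): all members better than i, plus the current best.
            best = pop[min(range(pop_size), key=fit.__getitem__)]
            fish = [pop[k] for k in range(pop_size) if fit[k] < fit[i]] + [best]
            target = rng.choice(fish)
            cand = []
            for j in range(dim):
                r = rng.random()            # assumption: r ~ U[0, 1]
                i_flag = rng.randint(0, 1)  # 1: toward the target, 0: away from it
                step = r * (2 * i_flag - 1) * (target[j] - pop[i][j])  # Eq. (16)
                cand.append(clamp(pop[i][j] + step, lb[j], ub[j]))     # Eq. (17)
            cand_fit = loss(cand)
            if cand_fit < fit[i]:           # keep the candidate only if it improves
                pop[i], fit[i] = cand, cand_fit

            # Phase 2, Eq. (18): the step shrinks as 1/t.
            cand = []
            for j in range(dim):
                r = rng.random()
                step = (lb[j] + r * (ub[j] - lb[j])) / t
                cand.append(clamp(pop[i][j] + step, lb[j], ub[j]))     # Eq. (19)
            cand_fit = loss(cand)
            if cand_fit < fit[i]:
                pop[i], fit[i] = cand, cand_fit

    best_idx = min(range(pop_size), key=fit.__getitem__)
    return pop[best_idx], fit[best_idx]

def toy_loss(theta):
    """Hypothetical loss with its minimum at (0.3, 0.3)."""
    return sum((x - 0.3) ** 2 for x in theta)

best_theta, best_loss = ooa_minimize(toy_loss, lb=[-1.0, -1.0], ub=[1.0, 1.0])
```

Because a candidate replaces $θ_{i}$ only when its loss is lower, the best loss in the population never increases from one iteration to the next. In the real setting each call to `loss` would involve training and validating a model, so the number of evaluations dominates the cost.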

Figure 5 illustrates the parameter selection and training process of the Transformer model when it is optimized using the Osprey Algorithm.

3.2.4. The Overall Framework of Our Proposed Model

The diagram in Figure 6 illustrates a comprehensive framework for processing and predicting wind speed data. The process is divided into three main steps:

Step 1: The process begins with the Interquartile Range (IQR) method applied to the initial wind speed data to remove potential errors and outliers. This method calculates the IQR and excludes the data points lying outside $Q 1 - 1.5 \times IQR$ and $Q 3 + 1.5 \times IQR$ , thus aiming to improve data reliability. Following this preprocessing, the data are denoised using a Bior2.2 wavelet, which addresses non-stationary features in the data through wavelet decomposition, soft thresholding, and reconstruction.
Step 2: After the Wavelet Denoising, min-max normalization is employed to scale the values to a [0, 1] range. This normalization facilitates the efficient processing of data. The Transformer model’s parameters are optimized by the Osprey Optimization Algorithm, thus enhancing model performance.
Step 3: In the final step, the processed data are fed into the OOA-Transformer model to predict future wind speeds. The predictions undergo inverse normalization to revert them to their original scale, thus allowing for comparison with the actual data over different time intervals, as displayed in the diagram.
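The elementary operations in these steps can be sketched as follows. This is an illustrative sketch only: the function names and the sample values are made up, the quartile convention (`method="inclusive"`) is an assumption, and the wavelet decomposition and reconstruction themselves are omitted (they would normally come from a wavelet library); only the soft-thresholding rule applied to each coefficient is shown.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Keep only the points that lie inside the IQR fences."""
    lo, hi = iqr_bounds(values, k)
    return [v for v in values if lo <= v <= hi]

def soft_threshold(coeff, thr):
    """Soft thresholding: zero small coefficients, shrink the rest by thr."""
    if abs(coeff) <= thr:
        return 0.0
    return coeff - thr if coeff > 0 else coeff + thr

def minmax_scale(values, vmin, vmax):
    """Scale values to [0, 1] using the given minimum and maximum."""
    return [(v - vmin) / (vmax - vmin) for v in values]

def minmax_inverse(scaled, vmin, vmax):
    """Map scaled values (e.g. model predictions) back to the original unit."""
    return [s * (vmax - vmin) + vmin for s in scaled]

# Made-up wind speeds with one obvious spike.
wind = [3.1, 4.0, 3.8, 4.4, 25.0, 3.9, 4.2, 3.6]
clean = remove_outliers(wind)
vmin, vmax = min(clean), max(clean)
scaled = minmax_scale(clean, vmin, vmax)
restored = minmax_inverse(scaled, vmin, vmax)
```

The minimum and maximum used for scaling must be stored, since the same pair is needed in Step 3 to invert the normalization of the predictions.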

4. Results and Discussion

This section is divided into two parts. The first part analyzes and discusses the noise reduction results of different wavelets, and the second part analyzes and discusses the prediction results of different models on the same test data.

4.1. The Results of Wavelet Denoising

Figure 7 displays a comparative plot of the data after Wavelet Denoising versus the data processed with the IQR method for outlier removal (a random selection is shown).

Figure 7 shows that the Bior2.2 wavelet effectively smooths sudden sharp peaks, such as drastic changes in wind speed, while preserving a substantial amount of the valuable information in the signal.

Table 2 presents the values of three metrics—the Signal-to-Noise Ratio (SNR), Mean Squared Error (MSE), and Normalized Cross-Correlation (NCC)—under four different wavelets. The SNR is typically expressed in decibels (dB).

As shown in Table 2, the Bior2.2 wavelet outperformed the others in all evaluated metrics. It had the highest SNR, indicating better noise suppression while retaining useful signals; the lowest MSE, demonstrating minimal error; and the highest NCC, which shows that the denoised data maintain the best correlation with the original data. Based on these findings, the Bior2.2 wavelet was selected for denoising in this study.
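For readers who want to reproduce this kind of comparison, the three metrics can be computed as in the sketch below. The formulas are common textbook definitions and are an assumption here, since the exact expressions behind Table 2 are not restated in this section; the sample values are made up.

```python
import math

def mse(original, denoised):
    """Mean squared error between the original and denoised series."""
    return sum((x - y) ** 2 for x, y in zip(original, denoised)) / len(original)

def snr_db(original, denoised):
    """Signal-to-noise ratio in dB: signal energy over residual energy."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, denoised))
    return 10.0 * math.log10(signal_power / noise_power)

def ncc(original, denoised):
    """Normalized cross-correlation (assumed non-centered form)."""
    num = sum(x * y for x, y in zip(original, denoised))
    den = math.sqrt(sum(x * x for x in original) * sum(y * y for y in denoised))
    return num / den

# Made-up values for illustration only.
original = [8.0, 9.5, 7.2, 12.1, 8.8]
denoised = [8.2, 9.1, 7.6, 11.5, 8.9]
print(f"SNR = {snr_db(original, denoised):.2f} dB, "
      f"MSE = {mse(original, denoised):.4f}, "
      f"NCC = {ncc(original, denoised):.4f}")
```

A higher SNR and NCC together with a lower MSE indicate that the denoised series stays closer to the input, which is the criterion used to select Bior2.2.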

4.2. Compared with Other Prediction Models

LSTM and its variants have outstanding performance in wind speed prediction. As such, in order to verify the effectiveness of the proposed model in this study, the model was compared with six models: LSTM, LSTM-AT, BiLSTM-AT, CNN-BiLSTM-AT, OOA-CNN-BiLSTM-AT, and Transformer.

4.2.1. Parameter Settings

In this study, the proposed model was implemented and evaluated using Python 3.9.18, Keras 2.6.0, and TensorFlow 2.6.0 on a GeForce RTX 3090.

The parameter settings for the LSTM, LSTM-AT, BiLSTM-AT, and CNN-BiLSTM-AT models [48] regarding the time step lookback value, regularization rate dropout, and learning rate lr are presented in Table 3.

Parameter Settings in OOA-CNN-BiLSTM-AT: This model utilizes the lookback value, which represents the number of time steps of the input features and the target sequence, along with the regularization rate dropout and the learning rate as its hyperparameters. The ranges for these values are set as follows: the regularization rate ranges from 0.1 to 0.5, the lookback value ranges from 1 to 200, and the learning rate ranges from 0.0001 to 0.01. Additionally, the population size in the Osprey Optimization Algorithm is set to 10, and the maximum number of iterations is 20.

Parameter Settings for the Transformer Model without Optimization Algorithms:

window_len: This parameter defines the length of the input sequence for the model. In time series analysis (such as wind speed prediction), a window_len of 1 means using data from the previous hour to predict the wind speed at the next time point (or points);
target_len: This specifies the output sequence length from the model. A target_len of 1 indicates that the model predicts wind speed for a single future time point;
num_encoder_layers: The number of encoder layers in the Transformer model is 2. Encoder layers are responsible for processing the input sequence, and each additional layer can increase the model’s ability to capture complex relationships;
num_decoder_layers: The number of decoder layers in the Transformer model is also 2. Decoder layers are used to generate the output sequence;
d_model: The dimensionality of the output for all layers within the model is 64. For both encoder and decoder layers, d_model is the dimension size of their output vectors;
num_heads: The model uses a multi-head attention mechanism with two heads. This allows the model to learn information from different representational subspaces at the same time;
feedforward_dim: The dimension of the feedforward network in both encoder and decoder layers is 64. This represents the size of the hidden layer, which performs a separate, fully connected transformation on each position following the self-attention layer;
dropout: The dropout rate for regularization in the model is 0.1. Dropout is a common regularization technique that prevents overfitting by randomly dropping (setting to zero) a portion of the feature values;
positional_encoding: “sinusoidal”. Since the Transformer model lacks recurrent or convolutional structures, it uses positional encoding to incorporate sequence order information, and “sinusoidal” refers to using fixed sinusoidal and cosinusoidal functions for positional encoding.
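The roles of window_len and target_len can be illustrated with a small sliding-window helper. The function name `make_windows` and the sample speeds are hypothetical; the sketch only shows how a series is cut into input and target pairs for given window and target lengths.

```python
def make_windows(series, window_len=1, target_len=1):
    """Split a 1-D series into (input, target) pairs.

    Each input holds window_len consecutive values and each target holds
    the target_len values that immediately follow it.
    """
    pairs = []
    last_start = len(series) - window_len - target_len
    for start in range(last_start + 1):
        x = series[start:start + window_len]
        y = series[start + window_len:start + window_len + target_len]
        pairs.append((x, y))
    return pairs

# Made-up wind speeds for illustration.
speeds = [5.0, 5.4, 6.1, 5.8, 6.3]
pairs = make_windows(speeds, window_len=2, target_len=1)
for x, y in pairs:
    print(x, "->", y)
```

With window_len = 1 and target_len = 1, each sample simply pairs one observation with the next one.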

In this study, the parameters window_len, num_encoder_layers, num_decoder_layers, d_model, num_heads, feedforward_dim, and dropout were chosen as the hyperparameters for optimization in the OOA-Transformer model. Table 4 presents the range that was used for optimizing these seven parameters.

Additionally, the population size for the Osprey Optimization Algorithm was set to 10, with a maximum of 20 iterations.
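Since six of the seven hyperparameters are integers while the OOA update rules produce real numbers, each candidate position has to be mapped to a valid configuration before a model can be built. The sketch below shows one way to do this, using the bounds from Table 4; the rounding rule and the names `SEARCH_SPACE` and `decode` are assumptions made for illustration, not details given in the paper.

```python
# name: (min, max, is_integer); bounds copied from Table 4.
SEARCH_SPACE = {
    "window_len": (1, 50, True),
    "num_encoder_layers": (1, 10, True),
    "num_decoder_layers": (1, 10, True),
    "d_model": (16, 512, True),
    "num_heads": (1, 16, True),
    "feedforward_dim": (16, 512, True),
    "dropout": (1e-6, 0.5, False),
}

def decode(vector):
    """Map a real-valued OOA position to a hyperparameter dictionary.

    Values are clipped to their bounds; integer parameters are rounded
    to the nearest integer (an assumed rule).
    """
    config = {}
    for value, (name, (lo, hi, is_int)) in zip(vector, SEARCH_SPACE.items()):
        value = max(lo, min(hi, value))
        config[name] = int(round(value)) if is_int else float(value)
    return config

# A made-up candidate position, partly outside the bounds.
candidate = [0.4, 2.6, 11.0, 63.7, 4.2, 600.0, 0.12]
config = decode(candidate)
print(config)
```

Some Transformer implementations additionally require d_model to be divisible by num_heads; that constraint is not handled here.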

4.2.2. Evaluation Criteria

To objectively assess the wind speed prediction performance of the proposed model, three metrics were used: Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and the Coefficient of Determination ($R^{2}$). The expressions for these metrics are given as follows:

$MAPE = \frac{100\%}{n} \sum_{i = 1}^{n} \left|\frac{y_{i} - h_{i}}{h_{i}}\right|,$

(20)

$RMSE = \sqrt{\frac{\sum_{i = 1}^{n} {(y_{i} - h_{i})}^{2}}{n}},$

(21)

$R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - h_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}},$

(22)

where $y_{i}$ denotes the actual values, $\bar{y}$ is the mean of these actual values, $h_{i}$ represents the predicted values, and n is the length of the series.
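The three metrics are straightforward to compute. The sketch below uses plain Python with made-up numbers. One assumption should be noted: the MAPE here normalizes by the actual value $y_{i}$, which is the more common convention, whereas Eq. (20) as printed normalizes by $h_{i}$; swapping the denominator gives the printed form.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error in percent (normalized by the actual value)."""
    total = sum(abs((y - h) / y) for y, h in zip(actual, predicted))
    return 100.0 * total / len(actual)

def rmse(actual, predicted):
    """Root mean square error, Eq. (21)."""
    return math.sqrt(sum((y - h) ** 2 for y, h in zip(actual, predicted)) / len(actual))

def r2(actual, predicted):
    """Coefficient of determination, Eq. (22)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - h) ** 2 for y, h in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only.
actual = [4.0, 5.0, 8.0, 10.0]
predicted = [4.4, 4.5, 8.0, 11.0]
print(f"MAPE = {mape(actual, predicted):.2f}%, "
      f"RMSE = {rmse(actual, predicted):.4f}, "
      f"R2 = {r2(actual, predicted):.4f}")
```

Lower MAPE and RMSE and an $R^{2}$ closer to 1 indicate better predictions; MAPE is undefined when a value in the denominator is zero, which matters for calm periods.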

4.2.3. The Experiment Results

The prediction accuracy of the Transformer and CNN-BiLSTM-AT models depends strongly on the choice of their hyperparameters. Hyperparameter values are usually chosen from experience and often require repeated trial and error before high prediction accuracy is reached. In this paper, we use the Osprey Optimization Algorithm (OOA) to optimize the hyperparameters of the Transformer and CNN-BiLSTM-AT models. The parameters to be optimized, their ranges, and the other initial parameters of the networks are detailed in Section 4.2.1. The loss $L (θ_{i})$ serves as the fitness function used to identify the optimal parameters. Figure 8 depicts the fitness curves obtained when the Osprey Optimization Algorithm optimizes the CNN-BiLSTM-AT and Transformer models.

Figure 8 (left) displays the fitness curve for the CNN-BiLSTM-AT model, while Figure 8 (right) shows the same for the Transformer model. As shown in Figure 8, the fitness function ($L (θ_{i})$) of both models decreased steadily during optimization with the OOA, indicating a clear improvement over their pre-optimization performance. Moreover, the figure reveals that the fitness value of the Transformer model was lower than that of the CNN-BiLSTM-AT model when both were optimized with the same OOA. This lower fitness value suggests the superior performance of the Transformer model compared to the CNN-BiLSTM-AT model.

Table 5 shows the hyperparameters of the OOA-optimized CNN-BiLSTM-AT and Transformer models.

In this study, the mainstream models Transformer, OOA-CNN-BiLSTM-AT, CNN-BiLSTM-AT, BiLSTM-AT, LSTM-AT, and LSTM were compared with the proposed OOA-Transformer model; results for randomly selected portions of the test set are shown. Figure 9 shows the predictions of the different models.

Figure 9 shows that the predictions of the OOA-Transformer model proposed in this paper lie closest to the actual values. However, judging prediction results from graphs alone is subjective and prone to error. To further confirm the prediction accuracy of the models, all the prediction models were compared using three performance metrics. Table 6 reports the results of the different models on these three metrics.

Figure 10, Figure 11 and Figure 12 show the results of different models for three different metrics in the test set.

As shown in Figure 10, the OOA-Transformer model displayed the lowest MAPE value, indicating the highest prediction accuracy among all the models tested. The traditional LSTM and its variants (LSTM-AT and BiLSTM-AT, which incorporate the attention mechanism) showed higher MAPE values, suggesting lower prediction accuracy. The CNN-BiLSTM-AT model, combining a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory Network (BiLSTM) and integrating an attention mechanism, outperformed the standalone LSTM model, yet it remained inferior to the OOA-Transformer model. The Transformer model also performed well but slightly less so than the OOA-Transformer.

Figure 11 shows that the OOA-Transformer model achieves the lowest RMSE value, indicating the highest prediction accuracy among all the models compared. Additionally, the CNN-BiLSTM-AT model, which integrates several techniques, surpasses the basic LSTM model but still falls short of the Transformer model.

Figure 12 shows the $R^{2}$ values of the different predictive models. The OOA-Transformer model reaches the highest $R^{2}$ value, 0.9955, which is close to the ideal value of 1 and reflects its combination of an optimization algorithm with the Transformer architecture. In contrast, the basic LSTM model records the lowest $R^{2}$ value, reflecting its limited ability to handle complex data. The successive improvements in $R^{2}$ from LSTM-AT to BiLSTM-AT and then to CNN-BiLSTM-AT show the advantages of adding attention mechanisms and convolutional layers. Although the standard Transformer outperforms the non-optimized LSTM-based models, it falls behind the OOA-CNN-BiLSTM-AT model.

As Table 6 illustrates, the LSTM, a fundamental model for sequence prediction tasks in deep learning, delivers reasonable predictive performance (MAPE 10.36%, RMSE 0.0406, $R^{2}$ 0.9679) but ranks last among the compared models. This is primarily because the standard LSTM model processes information in one direction only, thus ignoring potential backward dependencies, which may lead to an incomplete use of information in some complex sequence prediction tasks. The introduction of the attention mechanism improved the predictive accuracy of the LSTM model, reducing the MAPE to 9.35% and the RMSE to 0.0383 and increasing the $R^{2}$ to 0.9716. The attention mechanism allows the model to focus more on significant time steps within the sequence, but its core still relies on unidirectional information processing, which is limited in handling data with bidirectional dependencies. The Bidirectional LSTM (BiLSTM), which captures both forward and backward information in the sequence, further improves the results. However, BiLSTM may overfit in certain situations, and its high computational complexity limits its application in resource-constrained environments. Adding a Convolutional Neural Network (CNN) improves the performance of the model because of its ability to capture local features. Although CNNs can effectively handle spatial dependencies in time series data, they overlook long-term dependencies.

The introduction of the Osprey Optimization Algorithm (OOA) markedly improved the accuracy of the model predictions (for CNN-BiLSTM-AT, the MAPE fell from 8.93% to 4.57%), thus overcoming the limitations of traditional optimization approaches in high-dimensional and complex fitness landscapes. The Transformer model, with its self-attention mechanism, can consider all elements in the time series simultaneously, providing a global view of temporal dependencies. Unlike sequence processing models such as LSTM, which rely on recurrent structures to pass information step by step, Transformers can directly learn dependencies between any two points in the sequence. Additionally, the multi-head attention mechanism of the Transformer allows the model to attend to multiple representational subspaces at the same time, enabling it to capture complex patterns and relationships in the data in finer detail.

The optimized Transformer model performs best, further reducing the MAPE to 4.16% and the RMSE to 0.0152 and achieving an $R^{2}$ of 0.9955. The proposed OOA-Transformer model can therefore be applied to wind speed prediction with significantly improved forecast accuracy.
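To put the size of these improvements in relative terms, the MAPE values reported in Table 6 can be compared directly. The short sketch below only rearranges the published numbers; the helper name `relative_reduction` is ours.

```python
# MAPE values (%) copied from Table 6.
TABLE_6_MAPE = {
    "LSTM": 10.36,
    "LSTM-AT": 9.35,
    "BiLSTM-AT": 9.21,
    "CNN-BiLSTM-AT": 8.93,
    "OOA-CNN-BiLSTM-AT": 4.57,
    "Transformer": 6.53,
    "OOA-Transformer": 4.16,
}

def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

proposed = TABLE_6_MAPE["OOA-Transformer"]
for name, value in TABLE_6_MAPE.items():
    if name != "OOA-Transformer":
        print(f"{name:>18}: MAPE reduced by {relative_reduction(value, proposed):.1f}%")
```

For example, relative to the non-optimized Transformer, the OOA-Transformer lowers the MAPE from 6.53% to 4.16%, a relative reduction of roughly 36%.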

5. Conclusions

In this study, a short-term wind speed prediction model for the bridge site area, based on the OOA-Transformer, was proposed. This model effectively reduces the impact of random fluctuations in wind speed data and accurately predicts wind speed levels every ten minutes for the Xuefeng Lake Bridge. The predictive results indicate that, in terms of the MAPE, RMSE, and $R^{2}$ evaluation metrics, the proposed model outperforms the six models that were compared: Transformer, OOA-CNN-BiLSTM-AT, CNN-BiLSTM-AT, BiLSTM-AT, LSTM-AT, and LSTM.

In this research, the IQR method was successfully utilized to eliminate outliers caused by human errors or machine failures; among the four wavelets compared, the Bior2.2 wavelet with soft thresholding showed the best filtering performance with SNR, MSE, and NCC values at 22.9762, 0.4535, and 0.9931, respectively, thus effectively preserving significant features in the wind speed series. The Osprey Optimization Algorithm optimized parameters for the Transformer and CNN-BiLSTM-AT networks using the fitness function value to assess parameter quality.

Overall, the proposed wind speed prediction model (OOA-Transformer) achieves high accuracy, with a MAPE of 4.16%, an RMSE of 0.0152, and an $R^{2}$ of 0.9955, thus surpassing several state-of-the-art models. These findings are significant for accurately predicting wind speeds at the bridge sites of cable-stayed bridges in similar terrain.

Our model was mainly validated using the wind speed data from the Xuefeng Lake Bridge. The Bior2.2 wavelet effectively captured the significant features of these wind speed data. However, its effectiveness needs to be further validated using different datasets to confirm its general applicability. In the future, we intend to expand our dataset by incorporating additional wind speed data, relevant meteorological variables, and seasonal factors. We will also evaluate the performance of the Bior2.2 wavelet on this extended dataset to assess its broader utility.

Author Contributions

Conceptualization, Y.G. and B.C.; methodology, B.C.; software, B.C.; validation, F.G., Y.G. and B.C.; formal analysis, Y.G.; investigation, W.Y. and L.Y.; resources, W.Y. and L.Y.; data curation, W.Y. and L.Y.; writing—original draft preparation, B.C.; writing—review and editing, B.C.; visualization, Y.G.; supervision, F.G. and Y.G.; project administration, Y.G. and F.G.; funding acquisition, F.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the construction monitoring and technological support unit of Malukou Zishui Bridge (no.: H738012038).

Data Availability Statement

The data presented in this study are available upon request from the corresponding authors.

Conflicts of Interest

Authors Lu Yi and Wenhao Yu were employed by the company CCCC Second Harbor Engineering Co., Ltd., No.5 Branch. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Liu, Y.; Lin, P.; Liang, X. Study of Wind Resistance Measures for Installing of Arches of a Long Span Steel Truss Girder Bridge with Flexible Arches on High-Speed Railway. Bridge Constr. 2018, 48, 40–44.
2. Zhang, T.; Tang, M.; Li, M.; Li, C. Study of Aerostatic Safety of Long-Span Suspension Bridge in Process of Stiffening Girder Hoisting. Bridge Constr. 2019, 49, 6.
3. Ding, Y. In-Situ Casting Technique for 49.2m Concrete Box Beam in Complicated Sea Area Using Offshore Bridge Building Machine. World Bridg. 2020, 48, 5.
4. Changqing, W.; Zhitian, Z.; Xiaobo, W. Influences of wind cables on dynamic properties and aerostatic stability of pedestrian suspension bridges. Bridge Constr. 2017, 47, 77–82.
5. Fan, W.; Zhang, M.; Chen, N. Wind-Resistant Measures for Long Cantilever Erection of Steel Truss Girder of Huanggang Changjiang River Rail-cum-Road Bridge. Bridge Constr. 2013, 43, 5.
6. JTG/T3650-2020; Technical Specification for Construction of Highway Bridge and Culvert. China First Highway Engineering Co., Ltd.: Beijing, China, 2020.
7. Tang, Q.; Shi, R.; Fan, T.; Ma, Y.; Huang, J. Prediction of financial time series based on LSTM using wavelet transform and singular spectrum analysis. Math. Probl. Eng. 2021, 2021, 9942410.
8. Zhang, X. Financial Time Series Forecasting Based on LSTM Neural Network optimized by Wavelet Denoising and Whale Optimization Algorithm. Acad. J. Comput. Inf. Sci. 2022, 5, 1–9.
9. Dehghani, M.; Trojovský, P. Osprey optimization algorithm: A new bio-inspired metaheuristic algorithm for solving engineering optimization problems. Front. Mech. Eng. 2023, 8, 1126450.
10. Li, F.; Ren, G.; Lee, J. Multi-step wind speed prediction based on turbulence intensity and hybrid deep neural networks. Energy Convers. Manag. 2019, 186, 306–322.
11. Yang, M.; Guo, Y.; Huang, Y. Wind power ultra-short-term prediction method based on NWP wind speed correction and double clustering division of transitional weather process. Energy 2023, 282, 128947.
12. Wang, H.; Han, S.; Liu, Y.; Yan, J.; Li, L. Sequence transfer correction algorithm for numerical weather prediction wind speed and its application in a wind power forecasting system. Appl. Energy 2019, 237, 1–10.
13. Zhang, Y.; Zhao, Y.; Shen, X.; Zhang, J. A comprehensive wind speed prediction system based on Monte Carlo and artificial intelligence algorithms. Appl. Energy 2022, 305, 117815.
14. Zhang, Y.; Zhao, Y.; Kong, C.; Chen, B. A new prediction method based on VMD-PRBF-ARMA-E model considering wind speed characteristic. Energy Convers. Manag. 2020, 203, 112254.
15. Yunus, K.; Thiringer, T.; Chen, P. ARIMA-based frequency-decomposed modeling of wind speed time series. IEEE Trans. Power Syst. 2015, 31, 2546–2556.
16. Box, G.E.; Pierce, D.A. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J. Am. Stat. Assoc. 1970, 65, 1509–1526.
17. Yu, E.; Wei, H.; Han, Y.; Hu, P.; Xu, G. Application of time series prediction techniques for coastal bridge engineering. Adv. Bridge Eng. 2021, 2, 6.
18. Wu, C.; Wang, J.; Chen, X.; Du, P.; Yang, W. A novel hybrid system based on multi-objective optimization for wind speed forecasting. Renew. Energy 2020, 146, 149–165.
19. Alexiadis, M.; Dokopoulos, P.; Sahsamanoglou, H.; Manousaridis, I. Short-term forecasting of wind speed and related electrical power. Sol. Energy 1998, 63, 61–68.
20. Shao, L.; Wen, S. Research on Obtaining Average Wind Speed of Roadway Based on GRU Neural Network. Gold Sci. Technol. 2021, 29, 709–718.
21. Alves, D.; Mendonça, F.; Mostafa, S.S.; Morgado-Dias, F. The Potential of Machine Learning for Wind Speed and Direction Short-Term Forecasting: A Systematic Review. Computers 2023, 12, 206.
22. Pan, H.; Tang, Y.; Wang, G. A Stock Index Futures Price Prediction Approach Based on the MULTI-GARCH-LSTM Mixed Model. Mathematics 2024, 12, 1677.
23. Ma, G.; Yue, X.; Zhu, J.; Liu, Z.; Lu, S. Deep learning network based on improved sparrow search algorithm optimization for rolling bearing fault diagnosis. Mathematics 2023, 11, 4634.
24. Noh, J.; Park, H.J.; Kim, J.S.; Hwang, S.J. Gated recurrent unit with genetic algorithm for product demand forecasting in supply chain management. Mathematics 2020, 8, 565.
25. Qian, Z.; Pei, Y.; Zareipour, H.; Chen, N. A review and discussion of decomposition-based hybrid models for wind energy forecasting applications. Appl. Energy 2019, 235, 939–953.
26. Zhu, Q.; Ye, L.; Zhao, Y.; Lang, Y.; Song, X. Methods for elimination and reconstruction of abnormal power data in wind farms. Dianli Xitong Baohu Yu Kongzhi/Power Syst. Prot. Control 2015, 43, 38–45.
27. Ma, Q.; Liu, S.; Fan, X.; Chai, C.; Wang, Y.; Yang, K. A time series prediction model of foundation pit deformation based on empirical wavelet transform and NARX network. Mathematics 2020, 8, 1535.
28. Heidari, A.A.; Akhoondzadeh, M.; Chen, H. A wavelet PM2.5 prediction system using optimized kernel extreme learning with Boruta-XGBoost feature selection. Mathematics 2022, 10, 3566.
29. Liu, B.; Zhao, S.; Yu, X.; Zhang, L.; Wang, Q. A Novel Deep Learning Approach for Wind Power Forecasting Based on WD-LSTM Model. Energies 2020, 13, 4964.
30. Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45.
31. Hongfeng, C.; He, W.; Yan, L.; Min, X. Short-Term Wind Speed Prediction by Combining Two-Step Decomposition and ARIMA-LSTM. Acta Energiae Solaris Sin. 2024, 45, 164–171.
32. Liu, M.D.; Ding, L.; Bai, Y.L. Application of hybrid model based on empirical mode decomposition, novel recurrent neural networks and the ARIMA to wind speed prediction. Energy Convers. Manag. 2021, 233, 113917.
33. Bai, Y.T.; Jia, W.; Jin, X.B.; Su, T.L.; Kong, J.L.; Shi, Z.G. Nonstationary time series prediction based on deep echo state network tuned by Bayesian optimization. Mathematics 2023, 11, 1503.
34. Shaoqin, W.; Jingshu, L.; Minghao, G. Short-Term Wind Speed Prediction in Bridge Area Based on EMD-SSA-BiLSTM Method. Comput. Integr. Manuf. Syst. 2023, 12, 1–8.
35. Suo, L.; Peng, T.; Song, S.; Zhang, C.; Wang, Y.; Fu, Y.; Nazir, M.S. Wind speed prediction by a swarm intelligence based deep learning model via signal decomposition and parameter optimization using improved chimp optimization algorithm. Energy 2023, 276, 127526.
36. Li, Y.; Sun, K.; Yao, Q.; Wang, L. A dual-optimization wind speed forecasting model based on deep learning and improved dung beetle optimization algorithm. Energy 2024, 286, 129604.
37. Zhu, A.; Zhao, Q.; Yang, T.; Zhou, L.; Zeng, B. Wind speed prediction and reconstruction based on improved grey wolf optimization algorithm and deep learning networks. Comput. Electr. Eng. 2024, 114, 109074.
38. Sun, P.; Liu, Z.; Wang, J.; Zhao, W. Interval forecasting for wind speed using a combination model based on multiobjective artificial hummingbird algorithm. Appl. Soft Comput. 2024, 150, 111090.
39. Guo, X.; Zhu, C.; Hao, J.; Kong, L.; Zhang, S. A Point-Interval Forecasting Method for Wind Speed Using Improved Wild Horse Optimization Algorithm and Ensemble Learning. Sustainability 2023, 16, 94.
40. Conte, T.; Oliveira, R. Comparative Analysis between Intelligent Machine Committees and Hybrid Deep Learning with Genetic Algorithms in Energy Sector Forecasting: A Case Study on Electricity Price and Wind Speed in the Brazilian Market. Energies 2024, 17, 829.
41. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
42. Zhang, Z.; Chang, L.; Tian, M.; Deng, L.; Chang, J.; Dong, L. Power Consumption Time Series Forecast Based on CATPCA for Optimal Transformer Satellite. Trans. Beijing Inst. Technol. 2023, 43, 744–754.
43. Minghao, W.; Xinghua, X.; Juntao, C.; Guangjun, S.; Weifei, H. Dam Deformation Prediction Research Based on LSTM and Transformer. China Rural. Water Hydropower 2024, 4, 250–257.
44. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430.
45. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
46. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2021; Volume 35, pp. 11106–11115.
47. Qu, K.; Si, G.; Shan, Z.; Kong, X.; Yang, X. Short-term forecasting for multiple wind farms based on transformer model. Energy Rep. 2022, 8, 483–490.
48. Shan, L.; Liu, Y.; Tang, M.; Yang, M.; Bai, X. CNN-BiLSTM hybrid neural networks with attention mechanism for well log prediction. J. Pet. Sci. Eng. 2021, 205, 108838.

Figure 1. Geographic location of the bridge site.

Figure 2. Location of the measurement point.

Figure 3. Wind speed curve variation graph.

Figure 4. Illustration of the Transformer model. ‘*’ denotes multiplication.

Figure 5. Diagram of the OOA-Transformer algorithm.

Figure 6. Overall framework diagram. ‘*’ denotes multiplication.

Figure 7. Comparison plot of different Wavelet Denoising.

Figure 8. Change curve of fitness value.

Figure 9. Predictive effects of the different models on the test set.

Figure 10. MAPE (%) values of our models on the test dataset.

Figure 11. RMSE values of our models on the test dataset.

Figure 12. $R^{2}$ values of our models on the test dataset.

Table 1. Wind Speed Data Sequence Statistical Information.

Dataset	Min	Max	Mean	Std	Skewness	Kurtosis
Raw	0.0000	25.2060	8.0777	5.1068	0.6619	−0.3347
IQR	0.2052	22.8112	8.0336	5.0457	0.6317	−0.4290

Table 2. Evaluation metrics for different wavelets.

Wavelet	SNR	MSE	NCC
Db4	22.2305	0.5384	0.9917
Sym4	22.1369	0.5502	0.9918
Coif1	21.7374	0.6032	0.9910
Bior2.2	22.9762	0.4535	0.9931

Table 3. Model parameter settings.

Model	Dropout	lr	Lookback
LSTM	0.2	0.001	100
LSTM-AT	0.2	0.001	100
BiLSTM-AT	0.2	0.001	100
CNN-BiLSTM-AT	0.2	0.001	100

Table 4. Model parameter settings.

Parameter	Min	Max
window_len	1	50
num_encoder_layers	1	10
num_decoder_layers	1	10
d_model	16	512
num_heads	1	16
feedforward_dim	16	512
dropout	$1 \times 10^{- 6}$	0.5

Table 5. Optimized hyperparameters for machine learning models.

Model	Parameter	Optimum Value
OOA-CNN-BiLSTM-AT	lookback	148
	dropout	0.0105
	learning rate (lr)	0.0031
OOA-Transformer	window_len	1
	num_encoder_layers	1
	num_decoder_layers	1
	d_model	16
	num_heads	1
	feedforward_dim	16
	dropout	$1 \times 10^{- 5}$

Table 6. Predictive effectiveness of the different models.

Model	MAPE (%)	RMSE	$R^{2}$
LSTM	10.36	0.0406	0.9679
LSTM-AT	9.35	0.0383	0.9716
BiLSTM-AT	9.21	0.0375	0.9727
CNN-BiLSTM-AT	8.93	0.0350	0.9762
OOA-CNN-BiLSTM-AT	4.57	0.0188	0.9930
Transformer	6.53	0.0296	0.9830
OOA-Transformer	4.16	0.0152	0.9955

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

