2.1. Data Description and Preprocessing
The wind field data used in this study were obtained from the ERA5 reanalysis dataset, produced by the European Centre for Medium-Range Weather Forecasts (ECMWF, Reading, UK). ERA5 provides hourly estimates of a large number of atmospheric, land, and oceanic climate variables and is widely recognized for its high spatial and temporal resolution. For the purpose of regional wind speed forecasting, a specific geographical area was selected, spanning from 35° N to 37° N latitude and 119° W to 121° W longitude. The data cover the period from 1 January 2025 to 30 June 2025, with an hourly temporal resolution. Given the ERA5 spatial resolution of 0.25° × 0.25°, the selected region corresponds to a grid of 9 × 9 spatial points, resulting in a total of 81 grid points, each treated as an observation station. This grid structure serves as the basis for the regional spatiotemporal analysis.
The raw data include both the $u$-component (zonal) and the $v$-component (meridional) of the wind. To obtain the scalar wind speed ($V$), the following conversion was applied:
$$V = \sqrt{u^{2} + v^{2}}$$
where $u$ and $v$ represent the horizontal wind components. In this study, a continuous subset of 1000 time steps was extracted from the dataset to verify the effectiveness of the proposed model. The final dataset is represented as a 3D tensor with dimensions ($T \times H \times W$), where $T = 1000$, $H = 9$, and $W = 9$.
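The conversion can be illustrated with a minimal sketch, assuming the two components are stored as nested lists of shape T × H × W. The function names `wind_speed` and `speed_field` and the sample values are hypothetical and are not taken from the original pipeline.

```python
import math


def wind_speed(u, v):
    """Return the scalar wind speed sqrt(u^2 + v^2) for one grid point."""
    return math.hypot(u, v)


def speed_field(u_field, v_field):
    """Apply the conversion element-wise to two T x H x W nested lists."""
    return [
        [[wind_speed(u, v) for u, v in zip(u_row, v_row)]
         for u_row, v_row in zip(u_t, v_t)]
        for u_t, v_t in zip(u_field, v_field)
    ]


# Hypothetical 1 x 2 x 2 example (values in m/s), not real ERA5 data.
u_demo = [[[3.0, 0.0], [-6.0, 1.5]]]
v_demo = [[[4.0, 2.0], [8.0, -2.0]]]
speed_demo = speed_field(u_demo, v_demo)
```

In practice the same operation would more likely be a single vectorized call on an array or tensor; the nested-list form is used here only to keep the sketch dependency-free.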
To ensure the numerical stability of the deep learning model while strictly preserving the physical characteristics and multiscale energy distribution of the wind field, the following preprocessing steps were implemented:
Fixed Global Scaling: Unlike standard Z-score standardization (which normalizes each feature dimension independently to zero mean and unit variance) or Min-Max normalization (which relies on training-set statistics and may produce “out-of-bounds” values on unseen data), this study employs a fixed global scaling strategy. Considering the physical limits of wind speed in the target region, a constant scaling factor of 50.0 was applied. The normalized wind speed $V_{\text{norm}}$ is calculated as
$$V_{\text{norm}} = \frac{V}{50.0}$$
where $V$ is the scalar wind speed in m/s.
The rationale for avoiding independent Z-score standardization lies in the core design of the proposed “decomposition-first” framework. Both PCA and VMD decompose the original wind field into multiscale sub-signals (i.e., PCs and IMFs) with explicit physical meanings and intrinsic energy hierarchies. Typically, low-frequency trend components carry large amplitudes, whereas high-frequency stochastic modes possess minimal variance. Applying standard Z-score normalization independently to each decomposed channel would artificially equalize their variances, thereby severely amplifying the high-frequency turbulence and distorting the relative energy contributions. By utilizing a fixed global scaling factor, the proposed method ensures that all input values fall within a range conducive to neural network training (typically [0, 1] for wind speeds under 50 m/s), while inherently preserving the relative amplitude ratios across all multiscale components. This preservation is crucial for the downstream LSTM network to effectively distinguish the main deterministic trends from local stochastic fluctuations.
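The argument above can be checked numerically with a small sketch. It assumes two synthetic channels, a large-amplitude slow oscillation standing in for a trend component and a small-amplitude fast oscillation standing in for a high-frequency mode. The signals and helper names are illustrative assumptions; only the scaling factor of 50.0 comes from the text.

```python
import math
import statistics

SCALE = 50.0  # fixed global scaling factor stated in the text (m/s)


def global_scale(x):
    """Divide every value by the same constant."""
    return [value / SCALE for value in x]


def zscore(x):
    """Standardize one channel independently to zero mean and unit variance."""
    mean = statistics.fmean(x)
    std = statistics.pstdev(x)
    return [(value - mean) / std for value in x]


def std_ratio(a, b):
    """Ratio of the population standard deviations of two channels."""
    return statistics.pstdev(a) / statistics.pstdev(b)


# Hypothetical channels: slow large-amplitude trend, fast small-amplitude mode.
steps = range(200)
trend = [8.0 * math.sin(2 * math.pi * k / 100) for k in steps]
noise = [0.4 * math.sin(2 * math.pi * k / 5) for k in steps]

ratio_raw = std_ratio(trend, noise)
ratio_global = std_ratio(global_scale(trend), global_scale(noise))
ratio_z = std_ratio(zscore(trend), zscore(noise))
```

Under these assumptions the amplitude ratio between the two channels is unchanged by global scaling, whereas independent Z-score standardization collapses it to 1, which is the equalization effect the framework is designed to avoid.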
Sliding Window Sequence Generation: To construct the supervised learning samples for the LSTM network, a sliding window approach was applied. Let $L_{\text{in}}$ be the input sequence length and $L_{\text{out}}$ be the prediction horizon. The dataset was transformed into pairs of inputs $\mathbf{X}_{\text{in}} \in \mathbb{R}^{B \times L_{\text{in}} \times N}$ and targets $\mathbf{Y}_{\text{out}} \in \mathbb{R}^{B \times L_{\text{out}} \times N}$, where $N = 81$ (flattened spatial features from PCA components in the methodology) and $B$ represents the batch size of the neural network input.
Input Sequence Length ($L_{\text{in}}$): Set to 72, corresponding to the historical wind information of the past 3 days (72 h).
Prediction Horizon ($L_{\text{out}}$): Set to 3, aiming to predict the wind speed for the next 3 h.
Dataset Partitioning: To ensure rigorous model evaluation and prevent data leakage, the dataset partitioning was strictly performed after the sliding window sequence generation. The initial 1000 continuous hours of raw data yielded a total of 926 supervised sequence samples. These samples were sequentially divided into three distinct subsets: a training set (648 samples, approx. 70%), a validation set (92 samples, approx. 10%), and a testing set (186 samples, approx. 20%). The training set was utilized for model training and parameter optimization (SSA-VMD). The validation set served for hyperparameter tuning and early stopping monitoring to prevent overfitting. The testing set was strictly reserved as unseen data for the final performance evaluation.
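The sample count and the split sizes reported above can be reproduced with a short sketch. It assumes a window of 72 input steps and 3 target steps with a stride of one hour, and a chronological split using integer floor division. The rounding rule is an assumption chosen because it matches the reported subset sizes, and a list of hour indices stands in for the real feature vectors.

```python
def make_windows(series, l_in=72, l_out=3):
    """Slide a window over the time axis and return (inputs, targets)."""
    n_samples = len(series) - l_in - l_out + 1
    inputs = [series[i:i + l_in] for i in range(n_samples)]
    targets = [series[i + l_in:i + l_in + l_out] for i in range(n_samples)]
    return inputs, targets


def chronological_split(n_samples):
    """Split sample indices 70/10/20 in time order, without shuffling."""
    n_train = n_samples * 7 // 10
    n_val = n_samples // 10
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_samples)
    return train, val, test


# Placeholder series: hour indices 0..999 instead of real wind features.
hours = list(range(1000))
window_inputs, window_targets = make_windows(hours)
train_idx, val_idx, test_idx = chronological_split(len(window_inputs))
```

With 1000 time steps, the number of windows is 1000 − 72 − 3 + 1 = 926, and the three index ranges contain 648, 92, and 186 samples.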
2.3. The Proposed Forecasting Framework
The overall framework of the proposed regional wind speed forecasting model is illustrated in Figure 1. The workflow is composed of three sequential phases: spatial feature extraction, adaptive temporal decomposition, and nonlinear prediction. The detailed implementation procedure is described as follows:
Phase 1: Spatial Dimensionality Reduction via PCA
Given the raw regional wind field data represented as a matrix $\mathbf{X} \in \mathbb{R}^{T \times N}$ (where $N = 81$ grid points), direct modeling would incur high computational costs and suffer from spatial redundancy.
To address this, Principal Component Analysis (PCA) is first applied to $\mathbf{X}$. By calculating the cumulative contribution rate (CCR), the top $k$ principal components (PCs) whose CCR reaches the preset threshold are retained. This step compresses the high-dimensional spatial information into a lower-dimensional subspace (in this study, ), decoupling the spatial correlations while preserving the dominant wind field characteristics.
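The CCR-based selection rule can be sketched as follows, assuming the eigenvalues of the spatial covariance matrix are already available. The eigenvalues and the threshold of 0.90 are hypothetical placeholders for illustration and are not the values used in the study.

```python
def select_num_components(eigenvalues, threshold):
    """Return the smallest k whose cumulative contribution rate reaches threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical covariance eigenvalues and CCR threshold (assumptions).
demo_eigenvalues = [6.0, 2.5, 1.0, 0.3, 0.2]
demo_threshold = 0.90
k_demo = select_num_components(demo_eigenvalues, demo_threshold)
```

For these placeholder values the first two components explain 85% of the variance and the first three explain 95%, so three components are retained.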
Phase 2: Adaptive Temporal Decomposition (SSA-VMD)
The extracted PCs still contain non-stationary fluctuations and noise. To enhance predictability, an adaptive decomposition strategy is employed for each principal component ($PC_i$, $i = 1, \dots, k$):
Parameter Optimization: The Sparrow Search Algorithm (SSA) is utilized to automatically search for the optimal combination of the mode number $K$ and penalty factor $\alpha$ for VMD. The objective function is defined as the Minimum Envelope Entropy (as described in Equation (6)) of the decomposed modes. To ensure reproducibility, the random seed for the optimization algorithm was set to 42.
Signal Decomposition: With the optimized parameters $(K, \alpha)$, VMD decomposes $PC_i$ into a set of Intrinsic Mode Functions (IMFs).
Feature Selection: To filter out high-frequency noise and redundant modes, the Pearson Correlation Coefficient (PCC) is calculated between each IMF and the original $PC_i$. IMFs whose PCC passes a threshold are retained, ensuring comprehensive feature input for the prediction model (108 input features in this study).
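The PCC-based filtering step can be sketched in a few lines. The snippet assumes that the absolute value of the PCC is compared against the threshold. The threshold of 0.1 and the three synthetic modes are hypothetical choices for illustration, since the text does not state the threshold value.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_imfs(imfs, original, threshold):
    """Return indices of IMFs whose |PCC| with the original reaches threshold."""
    return [
        index for index, imf in enumerate(imfs)
        if abs(pearson(imf, original)) >= threshold
    ]


# Hypothetical modes: slow trend, mid-frequency mode, weak fast mode.
steps = range(200)
demo_imfs = [
    [8.0 * math.sin(2 * math.pi * k / 100) for k in steps],
    [2.0 * math.sin(2 * math.pi * k / 20) for k in steps],
    [0.05 * math.sin(2 * math.pi * k / 4) for k in steps],
]
demo_signal = [sum(values) for values in zip(*demo_imfs)]
kept = select_imfs(demo_imfs, demo_signal, threshold=0.1)
```

Here the weak fast mode is almost uncorrelated with the summed signal and is discarded, while the two dominant modes are kept.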
Phase 3: Sequence Prediction and Reconstruction
In the final phase, the selected high-quality IMFs are normalized using the fixed global scaling strategy and fed into the LSTM network.
Training: The LSTM model is trained to predict the future values of these IMFs based on historical sequences.
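Reconstruction: The final regional wind speed is obtained through a two-step reconstruction process. The predicted IMFs of each PC are first summed, and the reconstructed PCs are then mapped back to the spatial grid through the inverse PCA transform:
$$\widehat{PC}_i = \sum_{j} \widehat{IMF}_{i,j}, \qquad \hat{\mathbf{X}} = \widehat{\mathbf{PC}}\,\mathbf{W}^{\top} + \boldsymbol{\mu}$$
where $\widehat{IMF}_{i,j}$ is the predicted $j$-th retained IMF of the $i$-th PC, $\widehat{\mathbf{PC}}$ collects the reconstructed PCs, $\hat{\mathbf{X}}$ is the forecasted wind field, $\mathbf{W}$ is the PCA loading matrix, and $\boldsymbol{\mu}$ is the mean vector of the original data.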
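A minimal sketch of the two-step reconstruction is given below, assuming the loading matrix is stored with one row per grid point and one column per retained PC. The helper names and the tiny example (2 PCs, 3 grid points, 2 forecast steps) are hypothetical, and the inverse of the fixed scaling is left out of the sketch.

```python
def sum_imfs(imf_forecasts):
    """Step 1: sum the forecast IMFs of one PC, time step by time step."""
    return [sum(values) for values in zip(*imf_forecasts)]


def inverse_pca(pc_forecasts, loadings, mean):
    """Step 2: map k PC series back to N grid points (PC times W^T plus mean).

    pc_forecasts: list of k series, each of length T
    loadings:     N x k nested list (row n holds the loadings of grid point n)
    mean:         list of N per-grid-point means
    """
    n_steps = len(pc_forecasts[0])
    n_pcs = len(pc_forecasts)
    return [
        [mean[n] + sum(loadings[n][i] * pc_forecasts[i][t] for i in range(n_pcs))
         for n in range(len(mean))]
        for t in range(n_steps)
    ]


# Hypothetical forecasts: PC 1 has two retained IMFs, PC 2 has one.
pc1 = sum_imfs([[1.0, 2.0], [0.5, -0.5]])
pc2 = sum_imfs([[0.2, 0.4]])
demo_loadings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
demo_mean = [5.0, 6.0, 7.0]
field_forecast = inverse_pca([pc1, pc2], demo_loadings, demo_mean)
```

The result is a list of T rows with N values each, which can be reshaped to the H × W grid for each forecast step.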
The parameters of all algorithms and neural networks used in this paper are summarized in Table 1.
All deep learning algorithms and forecasting experiments were implemented using Python (version 3.10.9, Python Software Foundation, Wilmington, DE, USA) and the PyTorch framework (version 2.5.1, Meta Platforms, Inc., Menlo Park, CA, USA). The computations were accelerated on a workstation equipped with an NVIDIA GeForce RTX 4060 Laptop GPU (NVIDIA Corporation, Santa Clara, CA, USA).
2.4. Evaluation Metrics
To strictly evaluate the prediction performance of the proposed model and benchmarks, three standard statistical metrics are adopted: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination ($R^2$).
Assuming $y_i$ represents the observed wind speed, $\hat{y}_i$ denotes the predicted value, $\bar{y}$ is the mean of observed values, and $n$ is the number of samples, the definitions are as follows:
Root Mean Square Error (RMSE): RMSE is the square root of the mean squared prediction error. It gives higher weight to large errors, making it sensitive to outliers. A lower RMSE indicates better stability.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$$
Mean Absolute Error (MAE): MAE calculates the average magnitude of errors in a set of predictions, without considering their direction. It provides a straightforward assessment of the prediction accuracy.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
Coefficient of Determination ($R^2$): $R^2$ evaluates the goodness of fit, representing the proportion of the variance in the dependent variable that is predictable from the independent variables. An $R^2$ score closer to 1 indicates a better fit (1 being a perfect fit), while a score near 0 (or negative) implies poor predictive capability.
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$
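The three metrics can be computed directly from their standard definitions, as in the following sketch. The function names and the four-point example are illustrative assumptions and do not correspond to results from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted wind speeds (m/s).
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]
demo_rmse = rmse(observed, predicted)
demo_mae = mae(observed, predicted)
demo_r2 = r2_score(observed, predicted)
```

For this toy example the squared errors sum to 1.5 and the total sum of squares is 20, giving an RMSE of about 0.612, an MAE of 0.5, and an $R^2$ of 0.925.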
Statistical Significance Testing: To rigorously determine whether the proposed framework exhibits significant improvements over the baseline models, formal statistical analyses were conducted on the prediction results. Specifically, the Wilcoxon signed-rank test and the Diebold-Mariano (DM) test were employed to evaluate the statistical significance of the forecasting errors between the proposed model and the baselines. A p-value of less than 0.05 was considered statistically significant.
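As an illustration of the second test, a simplified Diebold-Mariano statistic is sketched below. It assumes a squared-error loss, a rectangular-kernel long-run variance using autocovariances up to lag h − 1 (with h the forecast horizon), and a two-sided p-value from the standard normal approximation. These are common textbook choices and are not confirmed to be the exact configuration used in this study. The error series are synthetic, and the Wilcoxon signed-rank test is not covered by the sketch.

```python
import math
import random
from statistics import NormalDist


def diebold_mariano(errors_a, errors_b, horizon=1):
    """Return (DM statistic, two-sided p-value) under squared-error loss."""
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n

    def autocov(lag):
        # Biased sample autocovariance of the loss differential at this lag.
        return sum((d[t] - mean_d) * (d[t - lag] - mean_d)
                   for t in range(lag, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    stat = mean_d / math.sqrt(long_run_var / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Synthetic forecast errors: the baseline (a) is noisier than the proposal (b).
rng = random.Random(42)
baseline_errors = [rng.gauss(0.0, 1.5) for _ in range(300)]
proposed_errors = [rng.gauss(0.0, 1.0) for _ in range(300)]
dm_stat, dm_p = diebold_mariano(baseline_errors, proposed_errors, horizon=3)
```

A positive statistic with a p-value below 0.05 would indicate that the baseline has a significantly larger squared-error loss than the proposed model under these assumptions.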