1. Introduction
Wind speed forecasting is crucial for ensuring the reliability of renewable energy production, maintaining grid stability, and implementing effective economic operating strategies. The highly stochastic, irregular, and multiscale behavior of wind energy makes short-term forecasting particularly challenging; traditional statistical methods often fail to represent this complex structure [1,2,3]. Therefore, deep learning-based models capable of learning both nonlinear relationships and multi-dimensional meteorological interactions have rapidly gained prominence in the literature in recent years [4,5,6].
Integrating convolutional neural networks (CNNs) with long short-term memory (LSTM/BiLSTM) architectures increases accuracy by processing both temporal dependencies and local patterns of wind speed [7,8,9,10,11]. These hybrid approaches have been demonstrated to yield significant improvements in short-term forecasts compared to baseline methods, such as ARIMA and persistence, on various datasets. Significant decreases in RMSE and MAE metrics are observed in multiple studies [7,8,10,11,12,13]. The consistent performance of CNN–LSTM-based models, particularly in complex meteorological scenarios, has established these architectures as a fundamental reference in the forecasting literature [14,15].
Representing high-frequency components and low-frequency trends of wind speed within the same model often presents challenges. In this context, signal decomposition-based approaches, such as the wavelet transform, CEEMDAN, and similar methods, decompose the wind time series into its components, making each scale easier for the artificial neural network to learn and thereby improving accuracy [16,17,18,19]. CEEMDAN–CNN–BiLSTM and wavelet-based hybrid models, in particular, have yielded effective results in complex measurement environments, such as mountainous areas, coastal regions, and offshore areas.
A growing number of studies are also utilizing optimization algorithms to enhance the performance of deep learning models. The Bat algorithm [20], GWO and its derivatives [21], multi-objective swarm intelligence [4], WOA–CNN–BiLSTM [6], hybrid CNN–SLSTM [22], and advanced parametric optimization approaches [23,24] have aimed to improve model stability and convergence performance, and have often been successful. These methods have automated the hyperparameter search process, resulting in lower-error models [21,22,25,26,27,28]. As highlighted in [29], the true value of probabilistic wind forecasting models lies in their ability to reliably quantify predictive uncertainty rather than solely minimizing point-based error metrics. In line with this perspective, the proposed model prioritizes uncertainty-aware evaluation using prediction intervals and probabilistic scoring rules.
However, there are significant shortcomings in the existing literature. A large portion of the studies only produce deterministic single-valued forecasts and do not directly model forecast uncertainty. Furthermore, many studies do not thoroughly analyze how the meteorological and temporal components that affect wind speed contribute to model decision-making. Explainable artificial intelligence (XAI) applications are currently limited, primarily focusing on the relative importance of meteorological inputs [30,31]. In addition, the literature is quite limited in explaining the internal behavior of signal decomposition components (e.g., IMFs), dual-stream structures, and complex deep models.
Existing approaches still suffer from three major limitations:
The absence of uncertainty estimation restricts the operational usability of forecasts.
The use of fixed or manually tuned decomposition levels cannot adapt to the evolving spectral structure of wind dynamics.
The lack of explainable frameworks capable of revealing how meteorological and temporal factors affect predictions.
To address these limitations, this paper proposes an integrated approach incorporating adaptive VMD (Variational Mode Decomposition)-based multiscale decomposition, a dual-stream CNN–BiLSTM architecture, quantile-based uncertainty modeling, and SHAP-assisted explainability analysis. Results obtained from a four-year hourly dataset show that the proposed model achieves high accuracy in short-term wind speed prediction, with an RMSE of approximately 0.700 m/s and an MAE of approximately 0.545 m/s on the test data. The obtained performance is comparable to recent CNN–LSTM-based hybrid models reported in the literature [26,27]. In addition, the proposed model provides uncertainty quantification and interpretability. Real-world low-wind regimes are characterized by high stochasticity and asymmetric uncertainty, making them challenging for deterministic forecasting models.
Recent studies have adopted probabilistic and attention-based architectures for short-term wind forecasting. Transformer-derived models and attention-enhanced recurrent networks have shown improved sequence modeling capability and uncertainty awareness under changing conditions. However, most of these approaches either rely on fixed-scale representations or provide limited interpretability regarding multiscale signal components. In contrast, the proposed framework integrates adaptive decomposition with quantile-based learning, enabling dynamic uncertainty modeling across frequency bands while retaining interpretability through mode-aware explainability. Both point forecast accuracy and uncertainty characteristics are influenced by the adopted decomposition strategy. For instance, Sun et al. showed that different decomposition-based models lead to different probabilistic behaviors and uncertainty levels in wind speed forecasting, which highlights the critical role of decomposition selection in uncertainty-aware prediction frameworks [31]. Bayesian dynamic linear models have been employed to propagate wind speed uncertainty into downstream energy applications such as green hydrogen production, showing the practical relevance of probabilistic forecasting beyond power generation alone [32]. Meanwhile, multi-granularity and hierarchical forecasting strategies, which combine deep feature fusion with probabilistic reconciliation techniques, have been proposed to provide consistent forecasts across different temporal resolutions [33,34].
Several works have also underlined uncertainty analysis by evaluating numerical error metrics and statistical properties of forecast errors. They highlight the limitations of accuracy-only comparisons. Moreover, interpretable deep learning architectures integrating multistage decomposition, attention mechanisms, and feature-importance analysis have been introduced to increase both predictive performance and model transparency [35,36,37]. These developments underline the growing consensus that modern wind speed forecasting models should address multiscale dynamics, probabilistic uncertainty, and interpretability, which motivates the design of the proposed Adaptive VMD-based dual-stream CNN–BiLSTM framework.
Finally, the SHAP analysis investigates the impact of both meteorological and temporal inputs, as well as VMD modes, on the forecast, and thereby helps to explain the “black box” behavior of deep hybrid models. In conclusion, this study contributes an integrated approach that combines adaptive signal decomposition, probabilistic forecasting, and explainability. The main contributions of this study can be summarized as follows:
- (i) An adaptive VMD framework is employed to automatically determine the optimal number of decomposition modes based on validation performance, eliminating manual tuning and enabling scale-consistent learning under varying wind regimes.
- (ii) A dual-stream CNN–BiLSTM architecture is designed to separately process decomposed wind dynamics and meteorological drivers, allowing multiscale temporal patterns and environmental effects to be learned within a unified probabilistic framework.
- (iii) Quantile regression is integrated to model time-varying predictive uncertainty, resulting in adaptive and non-symmetric prediction intervals that reflect changing wind volatility.
- (iv) Mode-aware SHAP analysis is introduced to interpret both meteorological features and decomposed wind components, providing multi-scale explainability beyond conventional feature-importance analyses.
Due to the nonlinear relationship between wind speed and power output, even relatively small wind speed forecasting errors may lead to significant deviations in generated power, affecting reserve planning and short-term operational decisions.
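The sensitivity described above can be illustrated with the standard cubic relation between wind speed and extractable power. The following is only a first-order sketch, assuming operation below rated speed and treating the air density ρ, rotor swept area A, and power coefficient C_p as constants; these symbols are introduced here for illustration and are not taken from this study.

```latex
P = \tfrac{1}{2}\,\rho\,A\,C_p\,v^{3}
\quad\Longrightarrow\quad
\frac{\Delta P}{P} \approx 3\,\frac{\Delta v}{v}
```

Under these assumptions, a 5% error in the forecast wind speed translates into an error of roughly 15% in the estimated power.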
2. Materials and Methods
2.1. Study Area and Data Description
This study is based on four years (2021–2025) of hourly wind speed and meteorological data obtained from the Afyonkarahisar Meteorology Directorate. The dataset contains the following variables:
Average hourly wind speed (WS, m/s).
Global horizontal irradiance (GHI, W/m²).
Direct normal radiation (DNI, W/m²).
Diffuse radiation (DIF, W/m²).
Air temperature (T, °C).
Time indices: hour, day of the year (DOY), sinusoidal time conversions (hour_sin, hour_cos, doy_sin, doy_cos).
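The sinusoidal time conversions listed above can be sketched as follows. This is a minimal illustration rather than the code used in the study; the function name `cyclic_encode` and the annual period of 365.25 days are assumptions made here.

```python
import numpy as np

def cyclic_encode(values, period):
    """Map a periodic quantity onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

# Hour of day (0..23) with a 24 h period.
hours = np.arange(24)
hour_sin, hour_cos = cyclic_encode(hours, period=24)

# Day of year with an assumed period of 365.25 days.
doy = np.array([1, 91, 182, 365])
doy_sin, doy_cos = cyclic_encode(doy, period=365.25)
```

With this encoding, hour 23 and hour 0 are as close to each other as any other pair of consecutive hours, which a raw integer hour index would not reflect.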
Figure 1 illustrates that wind speeds in Afyonkarahisar are predominantly concentrated at low levels. The highest frequencies are observed in the range of 0–2 m/s. This indicates that the region typically experiences a low wind regime. The concentration below 2 m/s highlights the forecasting difficulty inherent in the dataset.
Higher velocities (4 m/s and above) are quite infrequent, confirming that the distribution is right-skewed. Such a distribution indicates both a high stochastic structure and that predicting sudden and infrequent wind peaks can be challenging for models.
Figure 2 clearly illustrates the seasonal cycle, daily periodicity, and long-term trend. This multiscale structure motivates the use of VMD.
Time jumps, sudden breaks, and negative radiation values can be detected in the raw data. The dataset contains approximately 1.1% missing values, primarily originating from intermittent sensor outages in solar radiation measurements (DNI and DIF). Missing observations were handled using time-based linear interpolation to preserve temporal continuity.
Physically impossible values were identified and corrected based on established meteorological quality-control thresholds. Wind speed values below 0 m/s were considered invalid and set to zero. Air temperature measurements outside the range of −40 °C to 60 °C were treated as outliers and removed prior to interpolation. Negative radiation values (GHI, DNI, and DIF), which have no physical meaning, were also set to zero. These thresholds are consistent with common practice in near-surface atmospheric data preprocessing. The remaining gaps in the data series are filled using Pandas’ time-based interpolation. The wind speed series is normalized with Min–Max scaling prior to VMD. In the dual-stream model step, VMD modes (wind-stream) and meteorological/temporal variables (met-stream) are rescaled to the range [0, 1] using separate MinMaxScaler objects, and the target variable, wind speed, is normalized based on its minimum and maximum values.
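The quality-control rules described above can be sketched with Pandas as follows. This is an illustrative example, not the authors' code: the column names (`WS`, `GHI`, `DNI`, `DIF`, `T`), the helper `clean_weather`, and the small synthetic frame are assumptions introduced here.

```python
import numpy as np
import pandas as pd

def clean_weather(df):
    """Apply the threshold rules and time-based interpolation to an hourly frame."""
    out = df.copy()
    # Negative wind speed and radiation have no physical meaning: set to zero.
    for col in ["WS", "GHI", "DNI", "DIF"]:
        out[col] = out[col].clip(lower=0.0)
    # Temperatures outside [-40, 60] degC are treated as missing.
    out["T"] = out["T"].where(out["T"].between(-40.0, 60.0))
    # Fill gaps by linear interpolation weighted by the DatetimeIndex.
    return out.interpolate(method="time")

# Small synthetic frame (made-up values, for demonstration only).
idx = pd.date_range("2021-01-01", periods=5, freq=pd.Timedelta(hours=1))
raw = pd.DataFrame(
    {
        "WS": [1.0, -0.2, np.nan, 2.0, 1.5],
        "GHI": [0.0, -3.0, 10.0, 20.0, 30.0],
        "DNI": [0.0, 5.0, np.nan, 15.0, 20.0],
        "DIF": [1.0, 1.0, -1.0, 1.0, 1.0],
        "T": [5.0, 6.0, 99.0, 8.0, 9.0],
    },
    index=idx,
)
clean = clean_weather(raw)
```

Clipping is applied before interpolation so that invalid negative readings do not influence the interpolated neighbors.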
In model training, 70% training, 15% validation, and 15% test split are utilized. With the sliding-window technique, the next hourly wind speed target values are produced from 24 h historical data.
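The sliding-window construction and the 70/15/15 split can be sketched as below. The helper names `make_windows` and `chronological_split` are hypothetical, and the assumption that the split is chronological (no shuffling) is made here because the data form a time series; the source does not state this explicitly.

```python
import numpy as np

def make_windows(features, target, window=24, horizon=1):
    """Build (X, y) pairs: `window` past steps predict the value `horizon` steps ahead."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(target) - window - horizon + 1
    # X[i] covers time steps i .. i + window - 1.
    X = np.stack([features[i:i + window] for i in range(n)])
    # y[i] is the target at the last window step plus the horizon.
    first = window + horizon - 1
    y = target[first:first + n]
    return X, y

def chronological_split(n, train=0.70, val=0.15):
    """Return contiguous train/validation/test slices in time order."""
    i_train = int(n * train)
    i_val = int(n * (train + val))
    return slice(0, i_train), slice(i_train, i_val), slice(i_val, n)

# Toy series: 100 hourly values with a single input feature.
series = np.arange(100.0)
X, y = make_windows(series[:, None], series)
train_idx, val_idx, test_idx = chronological_split(len(y))
```

Each sample in `X` has shape (24, number of features), and the matching entry of `y` is the value one hour after the last step of the window.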
Figure 3 supports the need for deep learning by showing that the linear correlations between the meteorological variables and wind speed are weak. The forecasting horizon in this study is one hour ahead, meaning that the model predicts wind speed at time t + 1 using information available up to time t.
2.2. Adaptive VMD-Based Multiscale Decomposition
2.2.1. Variational Mode Decomposition (VMD) Framework
VMD is a variational optimization method that decomposes a time series into band-limited modes with different center frequencies. Unlike recursive decomposition techniques (e.g., EMD, CEEMDAN), VMD extracts all modes concurrently, which reduces the mode-mixing problem and yields more stable frequency bands. A signal $f(t)$ is decomposed by VMD in the following form:
$$ f(t) = \sum_{k=1}^{K} u_k(t) \qquad (1) $$
where $u_k(t)$ denotes the $k$-th mode and $K$ is the number of modes. In the adaptive version, K is not predefined but selected using a spectral flatness–based criterion that identifies the point where additional modes no longer reveal meaningful oscillatory energy. The analytic signal of each mode is generated by the Hilbert transform; the optimization is defined as a variational problem that minimizes the bandwidth of the modes.
VMD is implemented using a custom Python (v3.10.13) implementation based on the original formulation proposed by Dragomiretskiy and Zosso. The algorithm is coded in-house to allow full control over the adaptive mode selection process and integration with the validation-based optimization procedure.
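For reference, the constrained variational problem of the original formulation by Dragomiretskiy and Zosso can be written as follows, where f(t) is the input signal, u_k(t) are the modes, ω_k are their center frequencies, δ is the Dirac distribution, and * denotes convolution. The notation is introduced here for illustration and may differ from the symbols used by the authors.

```latex
\min_{\{u_k\},\{\omega_k\}} \;
\sum_{k=1}^{K}
\left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
e^{-j \omega_k t} \right\|_2^{2}
\quad \text{subject to} \quad
\sum_{k=1}^{K} u_k(t) = f(t)
```

The term in square brackets is the analytic signal of each mode, the exponential factor shifts its spectrum to baseband, and the squared norm of the time derivative measures the resulting bandwidth.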
2.2.2. Adaptively Selecting the Optimal Number of Modes
The most critical hyperparameter of VMD is the number of modes (K). The use of constant K values in the literature often results in information loss or over-decomposition. In this study, K is determined adaptively:
For each candidate number of modes (K ∈ {3, 4, 5, 6, 7}), the wind speed series is decomposed with VMD, and a small, fully connected (feed-forward) neural network is trained with a sliding-window approach on the resulting modes.
Each lightweight proxy model is evaluated on the validation set.
The K that gives the lowest validation RMSE is selected as the optimal number of modes. According to the results obtained in this study, the optimal number of modes is K = 4.
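The selection procedure above amounts to a small search loop over candidate values of K. The sketch below shows only that loop; `decompose` and `validation_rmse` are hypothetical callables standing in for the VMD routine and the proxy-network evaluation, neither of which is specified in enough detail in the text to reproduce here.

```python
def select_num_modes(signal, candidates, decompose, validation_rmse):
    """Return the candidate K with the lowest validation RMSE and all scores.

    decompose(signal, k)   -> modes for a given number of modes k (assumed interface)
    validation_rmse(modes) -> RMSE of a proxy model on the validation set (assumed interface)
    """
    scores = {}
    for k in candidates:
        modes = decompose(signal, k)
        scores[k] = validation_rmse(modes)
    # Choose the K with the smallest validation error.
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Because only the validation set is scored inside the loop, the test set plays no role in the choice of K.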
The four modes represent the high-frequency fluctuations and low-frequency trends of wind dynamics. Although the number of modes K is determined adaptively, it still acts as a structural hyperparameter that affects the representation capacity of the model. Therefore, the validation set is utilized to select K in a data-driven manner. This ensures that the chosen decomposition yields the best generalization performance without leaking test information. The resulting modes are fed into the wind dynamics channel (Wind-Stream) input. To guide the selection of the number of modes K, a spectral flatness criterion is employed. Spectral flatness (SF) is defined as
$$ SF = \frac{\left( \prod_{i=1}^{N} P(f_i) \right)^{1/N}}{\frac{1}{N} \sum_{i=1}^{N} P(f_i)} \qquad (2) $$
where $P(f_i)$ denotes the power spectral density at frequency bin $f_i$ and $N$ is the number of frequency bins. Lower values of spectral flatness indicate more structured, narrowband components, while higher values correspond to noise-like behavior. In this study, the optimal number of VMD modes is selected by observing the stabilization of spectral flatness values as K increases, thereby balancing mode separability and over-decomposition.
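Spectral flatness, the ratio of the geometric mean to the arithmetic mean of the power spectral density, can be computed as in the following sketch. The use of a plain periodogram as the PSD estimate, the removal of the DC bin, and the small constant `eps` are assumptions made here; the source does not state how the PSD is estimated.

```python
import numpy as np

def spectral_flatness(x, eps=1e-12):
    """Geometric mean over arithmetic mean of the periodogram (in [0, 1])."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    psd = psd[1:] + eps  # drop the DC bin; eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(psd)))
    arithmetic_mean = np.mean(psd)
    return geometric_mean / arithmetic_mean

t = np.arange(1024)
tone = np.sin(2.0 * np.pi * 8.0 * t / 1024.0)          # narrowband signal
noise = np.random.default_rng(0).normal(size=1024)     # noise-like signal
sf_tone = spectral_flatness(tone)
sf_noise = spectral_flatness(noise)
```

A pure tone gives a value close to zero, while white noise gives a markedly higher value, which matches the interpretation given above.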
2.3. Dual-Stream CNN–BiLSTM Hybrid Architecture
2.3.1. Motivation for a Dual-Stream Architecture
Single-channel models (e.g., CNN–LSTM with only wind series input) are insufficient to represent complex interactions between variables. Therefore, the model consists of two parallel sub-networks:
Wind-Stream: Modes obtained with adaptive VMD.
Met-Stream: Meteorological variables and temporal attributes.
As the two streams follow different patterns, they are processed separately and then combined in the fusion layer.
Each stream employs a single 1D convolutional layer with 32 filters, a kernel size of 3, same padding, and ReLU activation to capture local temporal patterns. Pooling layers are deliberately not used in the proposed model in order to avoid the loss of short-term peak information, which is critical for accurate wind speed forecasting.
Following the convolutional layer, two stacked Bidirectional LSTM layers are employed, with 64 and 32 hidden units, respectively. The first BiLSTM layer returns sequences to preserve temporal dependencies, while the second produces a compact latent representation for each stream. The architecture can be shown as follows:
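CNN:
Layers: 1 one-dimensional convolutional layer.
Filters: 32.
Kernel size: 3 (same padding).
Activation: ReLU.
BiLSTM:
Layers: 2 stacked BiLSTM.
Units: 64 → 32.
First layer: return_sequences = True.
Second layer: return_sequences = False.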
As illustrated in Figure 4, the proposed framework consists of two parallel processing streams. The first stream extracts multiscale temporal patterns from VMD-derived wind speed modes, while the second stream processes meteorological and temporal features. The learned representations are fused via feature concatenation to generate quantile-based probabilistic forecasts.
2.3.2. Wind-Stream Subnetwork
This sub-network consists of the following layers:
One-dimensional CNN (kernel = 3, 32 filters): captures local frequency relationships between modes.
MaxPooling1D: compresses local variations.
Bidirectional LSTM (64 units): learns multiscale temporal dependencies.
Dense(32): produces high-level representations.
In both sub-networks, the 1D convolutional layers employ a kernel size of 3 with 32 filters, a stride of 1, and same padding to preserve the temporal resolution of the input sequences. ReLU activation is used to enhance nonlinearity.
The Bidirectional LSTM layers consist of 64 hidden units and apply a dropout rate of 0.2 along with a recurrent dropout of 0.2 to mitigate overfitting. In addition, dropout with a rate of 0.2 is applied after the Dense layers in both streams and in the fusion stage.
2.3.3. Met-Stream Subnetwork
The meteorological channel processes environmental parameters other than wind.
This structure allows GHI, DNI, DIF, temperature, and the sin/cos time encodings to contribute directly to model performance. As in the Wind-Stream, the 1D convolutional layer uses a kernel size of 3 with 32 filters, a stride of 1, same padding, and ReLU activation.
2.3.4. Fusion Layer and Output Quantile Head
The outputs of both streams are combined with Concatenate(), followed by two fully connected layers:
Dense(64, ReLU)
Dense(32, ReLU)
The final layer is a three-output quantile regressor:
$$ \hat{y} = \left[ \hat{y}_{0.05},\ \hat{y}_{0.5},\ \hat{y}_{0.95} \right] \qquad (3) $$
where $\hat{y}_{\tau}$ denotes the predicted wind speed at quantile level τ ∈ {0.05, 0.5, 0.95}, representing the lower bound, median, and upper bound of the predictive distribution, respectively. These outputs jointly characterize the probabilistic forecast and its associated uncertainty interval. This structure produces both the median estimate and the uncertainty band. EarlyStopping (patience = 5) and the Adam optimizer (lr = 1 × 10⁻⁴, clipnorm = 1.0) are used. The fusion of the two data streams is performed at the model level using the Concatenate layer in Keras. No dataframe-level concatenation is applied; instead, learned feature representations from each subnetwork are combined within the neural network architecture.
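The dual-stream layout described in Section 2.3 can be sketched in Keras as follows. This is an illustrative reconstruction under stated assumptions and not the authors' implementation: pooling is omitted (following the statement that pooling layers are not used), both streams use the two stacked BiLSTM layers (64 → 32 units), the input shapes assume a 24-step window with four VMD modes and eight meteorological/temporal features, and the names `stream`, `build_model`, and `pinball_loss` are introduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

QUANTILES = (0.05, 0.5, 0.95)

def pinball_loss(y_true, y_pred):
    """Quantile loss summed over the three outputs and averaged over the batch."""
    y_true = tf.reshape(tf.cast(y_true, y_pred.dtype), (-1, 1))
    q = tf.constant(QUANTILES, dtype=y_pred.dtype)
    err = y_true - y_pred
    per_quantile = tf.maximum(q * err, (q - 1.0) * err)
    return tf.reduce_mean(tf.reduce_sum(per_quantile, axis=-1))

def stream(x, name):
    """Conv1D -> BiLSTM(64) -> BiLSTM(32) -> Dense(32) for one input stream."""
    x = layers.Conv1D(32, 3, padding="same", activation="relu", name=f"{name}_conv")(x)
    x = layers.Bidirectional(
        layers.LSTM(64, return_sequences=True, dropout=0.2, recurrent_dropout=0.2),
        name=f"{name}_bilstm1")(x)
    x = layers.Bidirectional(
        layers.LSTM(32, dropout=0.2, recurrent_dropout=0.2),
        name=f"{name}_bilstm2")(x)
    x = layers.Dense(32, activation="relu", name=f"{name}_dense")(x)
    return layers.Dropout(0.2, name=f"{name}_drop")(x)

def build_model(window=24, n_modes=4, n_met=8):
    wind_in = layers.Input(shape=(window, n_modes), name="wind_stream")
    met_in = layers.Input(shape=(window, n_met), name="met_stream")
    # Model-level fusion of the learned representations.
    z = layers.Concatenate()([stream(wind_in, "wind"), stream(met_in, "met")])
    z = layers.Dense(64, activation="relu")(z)
    z = layers.Dense(32, activation="relu")(z)
    z = layers.Dropout(0.2)(z)
    out = layers.Dense(len(QUANTILES), name="quantiles")(z)
    model = Model([wind_in, met_in], out)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0),
        loss=pinball_loss)
    return model
```

Training would then pass the two input arrays as a list, for example `model.fit([X_wind, X_met], y, ...)` with an `EarlyStopping(patience=5)` callback; the array names are placeholders.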
The selection of the 0.05–0.95 quantile interval is motivated by practical relevance and consistency with established practices in probabilistic wind speed forecasting. This interval corresponds to a 90% prediction coverage, which is widely adopted in energy system operation, reserve allocation, and risk-aware decision-making, as it provides a balanced trade-off between uncertainty coverage and interval sharpness. From an operational perspective, such confidence levels are commonly used in grid balancing, uncertainty-aware dispatch, and short-term planning tasks. In addition, this interval enables a meaningful evaluation of probabilistic forecast quality using well-established metrics such as PICP, PINAW, and CRPS, facilitating comparison with recent probabilistic forecasting studies. It should be noted that the proposed framework is flexible and can be readily extended to alternative quantile levels depending on specific operational or market-driven requirements.
2.4. Quantile Regression Framework
The model is trained with quantile loss instead of classical MSE:
$$ L_{\tau}(y, \hat{y}_{\tau}) = \max\left( \tau\,(y - \hat{y}_{\tau}),\ (\tau - 1)\,(y - \hat{y}_{\tau}) \right) \qquad (4) $$
where $y$ denotes the observed wind speed, $\hat{y}_{\tau}$ represents the predicted value at quantile τ, and $L_{\tau}$ is the asymmetric quantile loss function that penalizes overestimation and underestimation differently depending on the selected quantile level. Total loss:
$$ L = \sum_{\tau \in \{0.05,\, 0.5,\, 0.95\}} L_{\tau}(y, \hat{y}_{\tau}) \qquad (5) $$
where $L$ denotes the overall training loss and $L_{\tau}$ represents the quantile regression loss defined in Equation (4). In this way, the model learns both central prediction and uncertainty together. Methodologically, this is an innovation that is not common in wind energy applications, but is critical for grid planning and risk management.
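The asymmetry of the quantile (pinball) loss can be seen in a small numerical example. The sketch below uses the standard definition of the loss; the function name and the sample values are chosen here purely for illustration.

```python
import numpy as np

def pinball(y, y_hat, tau):
    """Standard pinball loss: max(tau * e, (tau - 1) * e) with e = y - y_hat."""
    err = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return np.maximum(tau * err, (tau - 1.0) * err)

# Under-prediction by 1 m/s (observed 3.0, predicted 2.0).
under_hi = pinball(3.0, 2.0, tau=0.95)   # heavily penalized for the upper quantile
under_lo = pinball(3.0, 2.0, tau=0.05)   # lightly penalized for the lower quantile

# Over-prediction by 1 m/s (observed 2.0, predicted 3.0).
over_hi = pinball(2.0, 3.0, tau=0.95)    # lightly penalized for the upper quantile
```

For τ = 0.95, under-predicting by 1 m/s costs 0.95 while over-predicting by the same amount costs only 0.05, which pushes that output toward the upper end of the distribution; for τ = 0.5 the loss is half the absolute error.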
2.5. Model Evaluation
Model performance is evaluated using both deterministic and probabilistic criteria. Deterministic accuracy is assessed using the root mean square error (RMSE) and mean absolute error (MAE) based on the median (0.5 quantile) forecast.
To assess the quality of uncertainty estimation, several probabilistic metrics are additionally employed. The prediction interval coverage probability (PICP) and the prediction interval normalized average width (PINAW) are computed for the 0.05–0.95 quantile band to evaluate interval reliability and sharpness, respectively. Furthermore, the pinball loss is reported to quantify the accuracy of quantile predictions. An approximate continuous ranked probability score (CRPS) is also calculated based on the available quantile forecasts to provide a unified measure of probabilistic forecast quality.
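The interval metrics named above can be computed as in the following sketch. The definitions of PICP and PINAW are the usual ones; the CRPS approximation shown here (twice the pinball loss averaged over the available quantile levels) is one simple choice assumed for illustration, since the text does not specify which approximation is used. The sample arrays are made up.

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    return np.mean((y >= lower) & (y <= upper))

def pinaw(y, lower, upper):
    """Mean interval width normalized by the range of the observations."""
    return np.mean(upper - lower) / (np.max(y) - np.min(y))

def approx_crps(y, quantile_preds, taus):
    """Crude CRPS estimate: 2 x mean pinball loss over the given quantile levels."""
    losses = []
    for tau, q in zip(taus, quantile_preds):
        err = y - q
        losses.append(np.mean(np.maximum(tau * err, (tau - 1.0) * err)))
    return 2.0 * np.mean(losses)

# Made-up example: the last observation lies below its interval.
y = np.array([1.0, 2.0, 3.0, 4.0])
q05 = np.array([0.5, 1.5, 2.5, 4.5])
q50 = y.copy()
q95 = np.array([1.5, 2.5, 3.5, 5.5])

coverage = picp(y, q05, q95)
width = pinaw(y, q05, q95)
crps = approx_crps(y, [q05, q50, q95], [0.05, 0.5, 0.95])
```

A PICP close to the nominal level combined with a small PINAW indicates intervals that are both reliable and sharp.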
This combined evaluation framework enables a comprehensive assessment of both point forecasting accuracy and the reliability of uncertainty-aware predictions.
2.6. Explainability via SHAP Analysis
SHAP analysis is applied to the 0.5 quantile (median) output of the trained dual-stream CNN–BiLSTM model. Background samples are selected from the training data, SHAP values are calculated for both VMD modes and meteorological/temporal features on the test subset, and summary plots are produced. The internal decision mechanism of the model is analyzed using these SHAP values. The importance distribution for both the VMD modes and the meteorological and temporal variables (GHI, T, DNI, DIF, time encodings) is analyzed and quantified.
Findings of SHAP analysis:
Low-frequency VMD modes play a dominant role in determining the general trend in the wind.
High-frequency modes capture sudden wind surges.
Meteorological inputs such as GHI and temperature are among the most influential variables in the median forecast of the model.
Time codings (hour_sin, hour_cos) make an important contribution to daily periodicity.
This analysis goes beyond similar studies in the literature by improving the transparency of the model’s decisions. Exploratory data analysis (EDA) is performed on wind speed and meteorological variables before modeling. In this context, time series plots, histograms, the correlation matrix between variables, and selected pairwise scatter plots are examined; data cleaning steps are validated by identifying outliers, sensor errors, and physically implausible measurements.
To provide a stronger and more recent baseline for comparison, a Temporal Convolutional Network (TCN) is implemented as an additional benchmark model. TCNs have exhibited competitive performance in time-series forecasting tasks due to their ability to capture long-range temporal dependencies through dilated causal convolutions.
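How far back a stack of dilated causal convolutions can see is given by its receptive field. The sketch below computes it for a hypothetical configuration; the kernel size and dilation schedule are assumptions chosen for illustration, as the TCN hyperparameters are not reported in the text.

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block=1):
    """Number of past time steps (including the current one) seen by the last layer.

    Each causal convolution with kernel size k and dilation d extends the
    receptive field by (k - 1) * d steps.
    """
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)

# Hypothetical setup: kernel size 3 with dilations doubling at each level.
rf = tcn_receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
```

In this example four levels give a receptive field of 31 steps, enough to cover a 24-step input window, whereas four undilated layers with the same kernel would cover only 9 steps.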
In this study, the TCN model is trained under the same forecasting setup as the proposed framework, using identical data splits, input window length, and prediction horizon. The benchmark model produces quantile forecasts (0.05, 0.5, and 0.95) to provide a fair comparison with respect to uncertainty modeling. All benchmark results are evaluated using the same deterministic and probabilistic metrics described in Section 2.5.
3. Results
3.1. Descriptive Statistics and Exploratory Findings
Exploratory analysis of four years of hourly data on wind speed and meteorological variables revealed that the region exhibits distinct daily and seasonal cycles. Histogram analyses show that wind speed has a positively skewed distribution, with most measurements concentrated in the low-to-medium speed range. When the time series is examined, a strong diurnal component and significant seasonality are observed in the wind structure of the Afyonkarahisar region.
According to the correlation matrix results, the relationship between wind speed and GHI/DNI/DIF is low-to-medium, and the relationship between temperature and wind speed is more limited. This shows that although direct correlations are weak, the contribution of meteorological variables to the model will emerge through nonlinear learning.
3.2. Adaptive VMD Results
The adaptive VMD algorithm decomposed the wind speed series with the optimal number of modes, K = 4. According to the RMSE criterion, K = 4 gave the lowest validation error among the candidates K ∈ {3, 4, 5, 6, 7} and represented both low- and high-frequency components in a balanced way.
When the modes are examined:
IMF1: Noise-like mode represents high-frequency, sudden wind surges.
IMF2: Medium frequency components, short-term behavior.
IMF3: More uniform changes corresponding to the daily cycle.
IMF4: The very low-frequency trend component represents the long-term wind regime.
This decomposition provided a multiscale representation of the Wind-Stream channel of the dual-stream structure and improved model performance.
Figure 5 illustrates the frequency levels associated with each mode. The model’s multiscale learning capability is visually supported.
3.3. Model Training and Convergence
The dual-stream CNN–BiLSTM model is trained using quantile loss, and the trend of the training and validation losses shows that the model converged in a balanced manner with the early stopping (patience = 5) mechanism. The training curves did not show overfitting tendencies, and the course of the validation curve confirmed that the model is steadily optimized.
Figure 6 shows that the training process is stable and that there is no overfitting.
3.4. Deterministic Forecasting Performance
The deterministic evaluation results obtained using the median estimate on the test data are as follows:
RMSE = 0.700 m/s
MAE = 0.545 m/s
A likely reason for this behavior is that the decomposition-reconstruction cycle introduces a mild smoothing effect, which reduces high-frequency fluctuations that are essential for point prediction loss functions such as MSE and MAE.
The dual-stream architecture is designed to optimize multiscale representation instead of focusing solely on instantaneous peak tracking. As a result, it is more effective for interval prediction than for achieving strict point-wise accuracy. The performance of this model is competitive with similar CNN–LSTM, CEEMDAN-LSTM, and optimization-supported hybrid models found in the literature.
The median forecast curve tracks the general trend, diurnal periodicity, and seasonal variations in wind speed with high accuracy. The model smooths out sudden and sharp peaks due to the nature of quantile regression. It results in more stable predictions against extreme volatility.
3.5. Probabilistic Forecasting and Uncertainty Quantification
The forecast band formed by the 0.05 and 0.95 quantile outputs of the model represents the uncertainty of the wind speed and forms an asymmetric interval of time-varying width around the median forecast.
The analysis showed that:
Most actual measurements fall within the forecast band (mostly in its mid-upper region).
The band widens during periods of wind instability and narrows during calm periods.
The fact that the uncertainty changes over time suggests that the model captures volatility.
These results enable the model to be used not only for point forecasting but also for risk- and reliability-oriented energy planning.
Figure 7 shows the agreement between the median estimate (Q0.5) and the uncertainty band over time. The widening of the band at the peaks indicates that the model responds to volatility. To evaluate the probabilistic performance, the prediction interval coverage probability (PICP) is computed for the 90% quantile band.
The obtained PICP of 87.4% is close to the nominal 90% level, indicating a reasonably well-calibrated interval with slight under-coverage, which is expected in low-wind regimes where signal variability is high.
3.6. Explainability Results (SHAP Analysis)
SHAP analysis revealed in detail the contributions of both meteorological/temporal variables, as well as VMD modes, to the model.
3.6.1. Meteorological Feature Importance
According to SHAP results:
GHI is the most dominant meteorological variable for the median estimate; it strongly influences the predicted wind speed during daylight hours.
Temperature (T) makes a powerful contribution to explaining changes in wind speed, especially in summer.
DNI and DIF are effective in short-term changes and have a lower weight than GHI.
The hourly sinusoid codes (hour_sin/hour_cos) establish that the model learns the daily period well.
3.6.2. VMD Mode Contributions
Mode-based importance ranking:
IMF4 (trending mode): Carries the long-term structure with the highest contribution.
IMF3 (daily cycle): Although not as dominant as the trend, it contributes to the forecast.
IMF1 and IMF2 (high frequency): Effective in explaining sudden changes, but lower in total contribution.
These findings suggest that dual-stream architecture does indeed successfully combine two different information channels.
The SHAP analysis reveals that seasonal cyclicity, indicated by the day-of-year sine component (doy_sin), is the most significant input in predicting wind speed, as illustrated in Figure 8. This indicates that the model strongly relies on annual periodic patterns, which aligns with the climatological behavior of wind regimes in continental regions such as Afyonkarahisar. The variables shown in Figure 8 correspond exactly to the eight meteorological and temporal inputs of the model, while Figure 5 represents the four intrinsic VMD modes obtained through adaptive decomposition.
The second most important variable is GHI (Global Horizontal Irradiance), indicating that solar-driven atmospheric mixing processes have a significant impact on short-term wind fluctuations. The prominence of hour-level harmonic features (hour_cos, hour_sin) further confirms that diurnal wind cycles—typically governed by thermal gradients and local topography—play a critical role in shaping the temporal structure of wind speed.
Other variables, such as doy_cos, DIF, DNI, and ambient temperature (T), contribute moderately but still provide meaningful information to the model. Their lower SHAP magnitudes do not imply irrelevance; rather, they reflect the subordinate but complementary nature of these features within a multi-factor atmospheric system.
Overall, the explainability results show that the model’s internal decision process aligns well with the physical mechanisms governing wind formation, supporting both the architectural design (dual-stream input fusion) and the relevance of the selected meteorological variables.
Figure 9 illustrates that the trend mode has the greatest impact on the forecast decision. The SHAP analysis of the decomposed wind-speed signal reveals that all intrinsic modes extracted by the Adaptive VMD framework contribute meaningfully to the forecasting process. Among them, Mode 4 and Mode 3 exhibit the highest average SHAP values, indicating that the model relies more heavily on the lower-frequency, smoother components of the decomposed signal. These modes typically capture broader atmospheric patterns and persistent temporal structures that are strongly correlated with short-term wind-speed behavior.
In contrast, Mode 1 and Mode 2 show slightly lower contributions, consistent with their role in representing higher-frequency fluctuations and noise-like variations. While these rapid oscillations provide useful fine-scale information, their predictive value is inherently more limited due to the stochastic nature of wind dynamics.
Finally, the SHAP distribution across modes confirms that the Adaptive VMD procedure successfully decomposes the signal into physically meaningful components and that the model effectively integrates multiscale information during forecasting. Comparative performance metrics are given in Table 1.
Quantile-based models are evaluated using RMSE and MAE computed from the median (0.5 quantile) forecasts. Probabilistic skill is assessed using Pinball Loss, PICP, PINAW, and an approximate CRPS. Deterministic models do not produce probabilistic outputs; therefore, Pinball Loss, PICP, and PINAW are not applicable to them. The TCN-Quantile results are obtained under a simplified benchmark setup designed to enable comparison with a recent probabilistic baseline; therefore, absolute error magnitudes differ from those reported in the main experimental configuration. All models in this table are evaluated under the same data split (70/15/15), input window (12), and forecast horizon (1) to provide a fair comparison.
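For reference, the probabilistic metrics named above can be sketched using their standard definitions. The text does not state the exact normalization used for PINAW; division by the observed target range is assumed here. The approximate CRPS is not covered. All numeric values are toy data.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)

def picp(y_true, lower, upper):
    """Prediction interval coverage probability: share of targets inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)

def pinaw(y_true, lower, upper):
    """Mean interval width normalized by the observed target range (assumed convention)."""
    target_range = max(y_true) - min(y_true)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y_true)
    return mean_width / target_range

# Toy values in m/s, invented for illustration only.
y = [2.0, 3.0, 4.0, 5.0]
q05 = [1.5, 2.4, 3.6, 5.2]
q50 = [2.1, 2.9, 4.2, 5.6]
q95 = [2.8, 3.5, 4.9, 6.0]
coverage = picp(y, q05, q95)             # 3 of 4 targets fall inside the interval
width = pinaw(y, q05, q95)
median_loss = pinball_loss(y, q50, 0.5)  # at q = 0.5 this equals half the MAE
```

PICP and PINAW should be read together: coverage can always be raised by widening the interval, so a high PICP is informative only when PINAW stays small.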
4. Discussion
First, a single-stream CNN–LSTM model that takes only the wind speed series as input is trained as a baseline. This model reached RMSE = 0.586 m/s and MAE = 0.555 m/s on the test set, consistent with the short-term wind speed forecasting performance reported in the literature.
The deterministic version of the proposed adaptive-VMD-based dual-stream architecture (the model trained with MSE loss only) produced RMSE = 0.7 m/s and MAE = 0.55 m/s on the test data, indicating that VMD modes and meteorological features alone do not improve point prediction accuracy when used with MSE-based optimization.
Although the TCN baseline yields lower point errors in this specific setup, the proposed framework is designed to prioritize uncertainty-aware forecasting and interpretability through quantile learning and SHAP-based attribution across multiscale components. Therefore, the contribution is not limited to minimizing RMSE; it also lies in providing calibrated prediction intervals and explainable multiscale drivers, which are critical for operational decision-making.
In the first dual-stream quantile model, trained with pure quantile loss, RMSE = 0.7 m/s and MAE = 0.55 m/s are obtained from the median (0.5 quantile) estimates. Because quantile regression tends to smooth extreme values, the squared-error metric increases while the absolute error is affected only to a limited extent.
The dual-stream architecture has approximately 3.1 million trainable parameters, resulting in a moderate computational load. Although the model is more complex than conventional LSTM or CNN–LSTM models, the training time remained manageable, and the inference latency is well within the requirements for real-time forecasting.
Finally, the hybrid loss function (0.7 Quantile + 0.3 MSE) is used. This approach yielded an RMSE of 0.700 m/s on the test set and reduced the MAE to 0.545 m/s, which is lower than the MAE of the baseline CNN–LSTM model (0.555 m/s). Thus, the proposed hybrid quantile model not only produces an uncertainty band (0.05–0.95 quantile range) and explainability outputs, but also brings the point estimation performance to a level competitive with a simple CNN–LSTM-based approach. From an operational perspective, uncertainty-aware forecasts may support short-term scheduling, reserve allocation, and wind curtailment decision processes, particularly in regions with highly variable low-speed wind characteristics such as Afyonkarahisar. Although the baseline CNN–LSTM model achieved a lower RMSE (0.586 m/s versus 0.700 m/s), it provides only deterministic point forecasts without uncertainty estimation or interpretability. In contrast, the proposed hybrid quantile model delivers probabilistic forecasts with calibrated prediction intervals and explainable feature attributions, which are essential for operational decision-making in power systems.
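The weighting described above can be illustrated with a small sketch. The text gives only the weights (0.7 and 0.3); the choices made here, namely averaging the pinball loss over the 0.05, 0.5, and 0.95 quantiles and applying the MSE term to the median output, are assumptions of this example rather than details confirmed by the study. The names `pinball` and `hybrid_loss` are likewise hypothetical.

```python
def pinball(y, p, q):
    """Pinball loss of a single prediction p for target y at quantile level q."""
    diff = y - p
    return max(q * diff, (q - 1.0) * diff)

def hybrid_loss(y_true, preds_by_q, w_quantile=0.7, w_mse=0.3):
    """Weighted sum of mean pinball loss (all quantiles) and MSE on the median.

    Assumption: the MSE term is computed on the 0.5-quantile output.
    """
    n = len(y_true)
    quantile_term = sum(
        pinball(y, p, q)
        for q, preds in preds_by_q.items()
        for y, p in zip(y_true, preds)
    ) / (n * len(preds_by_q))
    mse_term = sum((y - p) ** 2 for y, p in zip(y_true, preds_by_q[0.5])) / n
    return w_quantile * quantile_term + w_mse * mse_term

# Toy targets and quantile forecasts in m/s, invented for illustration only.
y = [3.0, 4.0]
preds = {0.05: [2.0, 3.0], 0.5: [3.0, 4.5], 0.95: [4.0, 5.0]}
loss = hybrid_loss(y, preds)
```

The MSE term penalizes large median errors quadratically, which counteracts the tendency of pure quantile training to smooth extremes, while the dominant quantile term keeps the interval bounds calibrated.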
While Transformer-based and attention-driven models offer strong sequence modeling capabilities, their high computational complexity and limited transparency pose challenges for operational deployment in real-world forecasting systems. The proposed model focuses on a complementary objective by prioritizing uncertainty calibration and multiscale interpretability rather than solely maximizing point prediction accuracy.
Unlike many recent hybrid models that primarily optimize point forecasting metrics, the proposed framework prioritizes uncertainty modeling and interpretability as first-class objectives. By linking adaptive decomposition, probabilistic learning, and explainability, the model provides actionable insights for operational decision-making rather than solely improving numerical accuracy. The benefits of uncertainty-aware forecasting become more pronounced in low-wind continental regions, where small absolute errors can lead to disproportionately large relative power estimation uncertainties.
To clarify the methodological positioning of the proposed framework with respect to recent wind forecasting studies, Table 2 provides a qualitative comparison focusing on decomposition strategy, uncertainty modeling, and interpretability. It is also worth noting that the 0.05–0.95 prediction interval produced by the proposed model shows a slightly conservative coverage (PICP > 0.90), while maintaining a relatively narrow normalized width (PINAW). This behavior suggests well-calibrated uncertainty estimates that remain informative rather than excessively wide.
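The earlier remark that small absolute speed errors translate into large relative power uncertainties in low-wind regions follows from the cubic dependence of available wind power on speed (P proportional to v^3 below rated speed, assuming a constant power coefficient). The sketch below makes this concrete; the operating points of 3 m/s and 10 m/s are hypothetical, and the 0.545 m/s error is simply the reported MAE reused as an example magnitude.

```python
def relative_power_error(v, dv):
    """Relative change in P ~ v**3 when the true speed v is overestimated by dv.

    Assumes operation below rated speed with a constant power coefficient.
    """
    return ((v + dv) / v) ** 3 - 1.0

# Same absolute speed error at a low-wind and a high-wind operating point.
speed_error = 0.545  # m/s, the reported MAE used as an example magnitude
low_wind = relative_power_error(3.0, speed_error)
high_wind = relative_power_error(10.0, speed_error)
```

Under these assumptions the same speed error inflates the power estimate by roughly 65% at 3 m/s but only about 17% at 10 m/s, which is why interval forecasts carry more operational value in low-wind regimes.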
Although the proposed framework exhibits strong probabilistic forecasting performance for a low-wind inland region, the experimental evaluation is currently limited to a single geographical and climatic regime. Wind characteristics in offshore or high-wind regions exhibit different statistical properties, including higher variance, stronger turbulence intensity, and distinct spectral distributions. Nevertheless, the proposed Adaptive VMD-based dual-stream architecture is not tied to a specific wind regime: the decomposition stage adaptively extracts intrinsic oscillatory modes from the input signal, while the dual-stream learning mechanism separately captures wind dynamics and exogenous meteorological influences. The framework is therefore not restricted by design to low-wind conditions. Extending the evaluation to offshore and high-wind datasets is identified as a significant direction for future work. Transfer learning or fine-tuning strategies could be employed to adapt the model to different wind regimes and operational contexts.
5. Conclusions
This study presented a hybrid deep learning framework that integrates Adaptive Variational Mode Decomposition, a Dual-Stream CNN–BiLSTM architecture, and a quantile-based prediction layer for short-term wind speed forecasting. The proposed model is developed to address two key limitations observed in existing approaches: the insufficient representation of multiscale atmospheric variability and the lack of uncertainty-aware forecasting mechanisms.
The experimental analysis, conducted using multi-year meteorological data from the Afyonkarahisar region, demonstrated that the adaptive decomposition of wind speed signals considerably improves the model’s ability to capture high- and low-frequency dynamics. The dual-stream structure, which processes decomposed wind components and meteorological variables in parallel, successfully enhanced the temporal feature extraction capability of the network. As a result, the quantile-based estimator produced stable and interpretable prediction intervals while achieving competitive deterministic accuracy, with an RMSE of 0.700 m/s and an MAE of 0.545 m/s on the test set.
The SHAP-based explainability assessment further revealed that the model’s predictions are strongly influenced by seasonal periodicity, diurnal patterns, and solar radiation, factors that are physically consistent with regional wind formation mechanisms. This alignment between the model’s internal behavior and real-world atmospheric dynamics is a notable strength, improving confidence in the reliability of the predictions.
Overall, the proposed approach offers a comprehensive forecasting solution that delivers both accurate point predictions and well-calibrated uncertainty bounds. These features make the model suitable for operational applications such as wind farm scheduling, reserve allocation, and short-term grid planning.
Despite its strengths, the model has limitations. The forecasting accuracy is constrained by the low-wind regime characteristics of the study area, where rapid fluctuations reduce signal predictability. Additionally, the absence of pressure, humidity, and large-scale atmospheric indices limits the representational capability of the meteorological stream. Future work will explore transformer-based temporal encoders, advanced probabilistic training strategies, spatio-temporal expansion using multiple meteorological stations, physics-informed constraints, additional atmospheric variables, and lightweight architectures for deployment in resource-limited environments.