Short-Term Wind Speed Forecasting Using Leakage-Free Time-Series Modeling and Statistical Residual Evaluation

Şahin, Gökhan; Kürker, Faruk; Nur, Ahmet; Akin, Erdal

doi:10.3390/su18115623


by Gökhan Şahin ¹,²,*, Faruk Kürker ³, Ahmet Nur ⁴ and Erdal Akin ⁵,⁶,⁷,⁸,*

¹ Copernicus Institute of Sustainable Development, Utrecht University, Princetonlaan 8A, 3584 CB Utrecht, The Netherlands
² Municipality of Dronten, De Rede, 1, 8251 ER Dronten, The Netherlands
³ Department of Electrical and Electronics Engineering, Faculty of Engineering, Adiyaman University, Adiyaman 02040, Türkiye
⁴ Department of Electrical and Electronics Engineering, Faculty of Engineering and Architecture, Bitlis Eren University, Bitlis 13100, Türkiye
⁵ Department of Computer Science and Media Technology, Malmö University, 205 06 Malmö, Sweden
⁶ Sustainable Digitalisation Research Centre, Malmö University, 205 06 Malmö, Sweden
⁷ Biofilms Research Center for Biointerfaces (BRCB), Malmö University, 205 06 Malmö, Sweden
⁸ Department of Computer Engineering, Faculty of Engineering and Architecture, Bitlis Eren University, Bitlis 13100, Türkiye
* Authors to whom correspondence should be addressed.

Sustainability 2026, 18(11), 5623; https://doi.org/10.3390/su18115623

Submission received: 6 May 2026 / Revised: 24 May 2026 / Accepted: 27 May 2026 / Published: 2 June 2026

(This article belongs to the Special Issue AI and Machine Learning-Based Approaches for Enhancing Wind Farm Grid Resilience Under Extreme Events and Uncertainty)


Abstract

In this study, we developed a leakage-free time-series machine learning framework to improve the accuracy of short-term (10 min ahead) wind speed forecasting. The measurements are real operational data collected at the Bandırma/Balıkesir wind power plant in Türkiye. The framework incorporates chronological train–validation–test splitting, causal missing-data imputation, leakage-free feature engineering, and supervised lag-based modeling. This leakage-free design prevents future information from influencing model training and testing, which makes the forecasts more realistic and reliable in practice. We tested several models, including persistence, Support Vector Regression (SVR), Least-Squares Gradient Boosting (LSBoost), Random Forest (RF), Elastic Net (ELASTIC), and a stacking ensemble, and evaluated their performance using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R-Squared (R²), bias measures, and skill scores, complemented by diagnostic analyses including residual distribution, autocorrelation, regime-based evaluation, Bland–Altman plots, and Quantile–Quantile (Q-Q) plots. Our analyses showed that the Elastic Net model achieved balanced and statistically consistent performance, with a test RMSE of 0.6325 m/s, R² = 0.977, and negligible bias. Residual analysis indicated that errors were centered around zero, exhibited weak temporal dependence, and followed an approximately normal distribution in the central quantiles. Regime-based evaluation revealed that the model performed strongly in medium- and high-wind-speed conditions, while accuracy decreased under low wind speeds due to measurement uncertainty and low signal-to-noise ratios. Feature importance analysis indicated that previous wind speed was the dominant predictor, with solar irradiation and air temperature also contributing significantly.
Forecast error decomposition showed that most prediction errors arose from natural atmospheric variability, with minimal systematic bias. The Diebold–Mariano test confirmed that ELASTIC statistically outperformed conventional machine learning models such as SVR and Random Forest. The proposed framework demonstrates statistically consistent short-term forecasting behavior that may support operational wind energy management and grid balancing applications.

Keywords:

short-term wind speed forecasting; Elastic Net (ELASTIC); machine learning models; leakage-free time series validation; residual diagnostics; regime-based performance; forecast error decomposition; feature importance analysis; wind farm

1. Introduction

Growing global energy demand, together with the rapid depletion of fossil fuel resources and their negative environmental effects, has motivated the search for alternative and sustainable energy sources. Wind energy stands out among renewable sources because of its sustainability, low cost, and continuing technological advancement [1,2,3,4,5]. Nevertheless, the random, variable, and hard-to-predict nature of wind, together with the lack of cost-effective long-term storage for wind energy, makes wind-based energy supply uncertain [6,7]. Moreover, wind power generation systems face challenges related to frequency and voltage instability and to possible imbalance between demand and supply [8]. Accurate wind speed forecasting therefore helps with power generation scheduling, supports grid stability, allows for efficient use of wind energy, increases energy production efficiency, and reduces operating costs [9].

A reproducible framework for one-step-ahead wind speed forecasting was created using historical wind observation data available for Bandırma. To avoid information leakage, the framework uses a leakage-free temporal split, robust handling of the multi-row Table-Oriented ASCII Format 5 (TOA5) header, timestamp decoding, split-aware causal missing-value imputation, and cyclic time encoding. The forecasting problem consists of predicting wind speed at time t + h from the information available at time t, augmented with up to 36 lagged wind speed variables that capture short-term temporal effects. Several machine learning models, including persistence, SVR, Random Forest, Elastic Net, and stacking approaches, were compared in terms of error, goodness-of-fit, and bias measures. In addition to these general evaluation measures, model performance was assessed using time-domain analysis, residual diagnostic testing, and domain-specific evaluations. Statistical diagnostics such as the Autocorrelation Function (ACF), the Ljung–Box test, Bland–Altman analysis, and Q-Q plots were also used. Based on the findings, the Elastic Net model demonstrated balanced forecasting behavior together with statistically consistent performance on the independent validation set.

The superior performance of the Elastic Net model reflects the dominance of short-term temporal autocorrelation in wind speed forecasting and indicates the importance of stable autoregressive structure. Because the forecast horizon extends only 10 min into the future, the next values depend strongly on the past values of the time series, which limits the benefit of complex nonlinear deep learning algorithms. Although some recent papers report impressive forecast performance based on Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional Long Short-Term Memory (Bi-LSTM), and Transformer-based algorithms, these approaches typically involve high computational cost, extensive hyperparameter tuning, and larger training datasets. Under these short-term forecasting conditions, the linear regularized structure of the Elastic Net model allowed for better generalization and greater interpretability.

2. Related Works

Comprehensive reviews of wind energy forecasting [10] and of ensemble-based techniques for wind and solar power [11] document the breadth of available approaches and consistently report that ensemble and hybrid strategies outperform single models. Building on this premise, ensemble and stacking frameworks have been widely adopted: machine learning models for very short-term wind power forecasting [12], ensemble-based frameworks for day-ahead energy trading [13], bootstrap-based stacking ensembles [14], and stacking combined with signal decomposition and heuristic optimization [15] all report accuracy gains over individual predictors. While effective, these gains typically come at the cost of increased model complexity, higher computational demand, and reduced interpretability, and the reported improvements are often dataset- and horizon-dependent, which limits their generalizability.

A large and influential body of work couples signal decomposition with deep learning to handle the non-stationarity of wind series. Representative examples include Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) combined with a Transformer architecture under a customized loss function [16], an Ensemble Empirical Mode Decomposition–Long Short-Term Memory (EEMD–LSTM) framework incorporating seasonal characteristics [17], Complete Ensemble Empirical Mode Decomposition with Adaptive Noise–Empirical Wavelet Transform (CEEMDAN–EWT) decomposition with deep learning [18], quadratic variational mode decomposition coupled with multiple deep models [19], advanced data preprocessing with multi-objective optimization [20], and two-stage decomposition integrated with Temporal Fusion Transformers [21,22]. These methods can substantially reduce error on benchmark datasets; however, a recurring methodological concern is that decomposition is frequently applied to the entire series before the train/test partition. This leaks future information into the training set and can inflate reported accuracy in a way that does not hold under operational, causal conditions, raising reproducibility concerns.

Beyond decomposition, numerous studies rely directly on deep architectures: hybrid deep learning with attention mechanisms [23], stacked LSTM networks [24], Convolutional Neural Network – Long Short-Term Memory (CNN–LSTM) models for autonomous marine vehicles [25], ensemble GRU models for interval prediction [26], Bi-LSTM networks combined with multi-objective optimization and transfer learning [27], Transformer-based models [28], and a Temporal Fusion Transformer (TFT) enhanced with seasonal–trend representations [29]. An online learning-assisted self-attention model [30] is notable for explicitly targeting low computational cost in ultra-short-term forecasting. Although these architectures are expressive, they are generally data-hungry, computationally expensive, and difficult to interpret. Particularly at short horizons, where autocorrelation dominates, their advantage over simple baselines is often modest and is not always reported against a persistence reference.

Another line of research incorporates physical knowledge into the forecasting process: a principle-driven framework combining wind field image generation with physical constraints [31], dynamic ensembles integrating Numerical Weather Prediction (NWP) outputs with deep reinforcement learning and error-series modeling [32], and machine learning approaches that enhance NWP performance while accounting for topographic effects on wind fields [33]. Related efforts include a two-stage system combining error correction with nonlinear ensemble strategies [34], privacy-preserving federated deep reinforcement learning [35], statistical hybrid models that exploit complementary forecasting techniques [36], hybrid multi-step-ahead prediction from univariate data [37], and interpretability-oriented “glass box” models that preserve accuracy while improving transparency [38]. In addition, large-scale analyses based on multiple reanalysis datasets have assessed the variability and long-term consistency of surface wind speed and wind power density across different regions [39] and across the Northern Hemisphere [40]; these studies characterize wind resource behavior but are not designed for operational short-term point forecasting.

A further group focuses on uncertainty-aware forecasting. Dynamic interval-based prediction that explicitly accounts for wind power ramp events [41], probabilistic forecasting for sizing and controlling hybrid energy storage systems [42], hybrid predictive density estimation for generating prediction intervals [43], conformal prediction combined with feature importance selection [44], nonparametric stochastic differential equations for ultra-short-term forecasting [45], and an efficient probabilistic method for limited-data settings [46] all aim to quantify forecast uncertainty rather than produce a single point estimate. These approaches improve forecast reliability but add modeling and calibration complexity, and their practical value still depends on the quality of the underlying point forecast.

Taken together, the literature reveals a clear trend toward increasingly complex hybrid, decomposition-based, and deep architectures, evaluated across heterogeneous datasets and forecast horizons ranging from ultra-short-term to day-ahead. Three methodological gaps emerge from this body of work. First, many studies emphasize wind power rather than wind speed and report results without a strictly chronological, leakage-free evaluation protocol—a particular risk in decomposition-before-split pipelines. Second, strong yet simple baselines, such as the persistence model and regularized linear regression, are frequently omitted or under-reported, making it difficult to judge whether the added complexity is genuinely justified. Third, reproducibility is rarely addressed in detail, including raw-logger data handling, timestamp parsing, and causal missing-value imputation. The present study addresses these gaps by developing a reproducible, leakage-free framework for short-term wind speed forecasting that benchmarks complex models against rigorous baselines and supports the comparison with comprehensive diagnostic analyses.

3. Materials and Methods

This study aims to develop a reliable and reproducible time-series machine learning framework for short-term wind speed forecasting. The analysis is based on real measurement data obtained from the unlicensed Damla 4 and Damla 5 wind power plants in the Bandırma district of Balıkesir, Türkiye, and the nearby Renevo 40 wind measurement station. The measurement system records wind speed and direction at multiple heights, along with meteorological and system-related variables. The data are stored in TOA5 format with a 10 min sampling interval, covering a 14-month period. The framework implements leakage-free data processing and modeling steps suitable for time series: quality control of the data, split-aware causal imputation of missing values, leakage-free feature engineering, construction of lagged features, and chronological train–validation–test splitting for machine learning model training. Model performance is evaluated using error metrics, bias measures, and statistical tests, complemented by regime-based analysis to assess performance across different wind speed ranges. All models were implemented in the MATLAB R2023b environment using the Statistics and Machine Learning Toolbox.

3.1. Study Area and Data Source

This study utilizes measurements collected from two unlicensed wind power plant sites, Damla 4 and Damla 5, located in the Bandırma district of Balıkesir, Türkiye, along with data from the nearby Renevo 40 wind measurement station. The project sites were positioned around the wind measurement station, which carries the ID number 100133, within the Bandırma/Balıkesir area. Each wind power plant has a capacity of 1 MW, resulting in a combined capacity of 2 MW for the two projects. The study area is relatively flat and lacks significant topographical features, and the distance between the project sites and the measurement station is approximately 4.2 km. The general layout of the study area, including the measurement station and the project sites, is shown in Figure 1.

The Renevo 40 measurement station is installed on a 60.5 m meteorological tower, with wind speed and direction recorded at heights of 30 m and 60 m. After filtering out invalid or erroneous measurements, the dataset achieved an availability of 91.78%. The measurement campaign spanned 14 months, from 14 May 2021 to 29 July 2022. The positions of the wind turbines at the project sites were recorded using the UTM-WGS84 coordinate system (Zone 35T) and are summarized in Table 1. This setup provides high-quality, multi-height wind data suitable for short-term forecasting and model validation. The 91.78% data availability indicates that only a small portion of data was missing, ensuring reliability. Recording at two different heights allows for capturing vertical wind profiles, which can improve feature engineering for predictive models. Using precise UTM coordinates ensures accurate spatial referencing for correlating turbine performance with local wind conditions.

In this study, the analysis period was defined according to the TIMESTAMP field in the raw dataset. The original measurements were collected at 10 min intervals, spanning from 4 April 2021 at 15:40 to 3 August 2022 at 17:10. Since there can be slight discrepancies between the nominal measurement period and the actual timestamps recorded in the raw data, we used the TIMESTAMP field itself to determine the precise range of the analysis period. By relying on the TIMESTAMP field, the study ensures that all subsequent analysis and preprocessing accurately reflect the real timing of the measurements. This approach avoids potential inconsistencies that could arise from assuming a perfectly uniform sampling schedule, which is particularly important for time-series forecasting and temporal feature engineering.

3.2. Data Format and Variable Definitions

The Campbell Scientific data logger produces TOA5-format files that contain the unprocessed measurements. Each file has three parts: metadata about the measurement devices and logger programs, a multi-row header listing the measured variables and their units, and the data rows holding the numerical values for each 10 min period. Each wind variable is stored as four summary statistics, namely average (Avg), standard deviation (Std), minimum (Min), and maximum (Max), in separate columns; TIMESTAMP serves as the chronological index and RECORD as a sequential logger-assigned order number. The dataset contains 68,292 records of 26 variables, grouped by physical quantity and by three measurement levels identified as v1, v2, and v3. The variable groups, definitions, and units are detailed in Table 2. The import routine parses the multi-row header so that variable names and numerical records are read from their designated rows while the chronological order required for time-series analysis is preserved.

Two quantities were derived from the original measurements. Turbulence intensity (TI), which characterizes short-term wind variability within each 10 min period, is the ratio of the wind speed standard deviation to its mean value. For the first measurement level, the denominator is bounded below at 0.1 m/s to maintain numerical stability:

$$\mathrm{TI}_{v1} = \frac{v1_{\mathrm{Std}}}{\max\left(v1_{\mathrm{Avg}},\ 0.1\right)} \tag{1}$$

Wind power density (WPD) follows its standard definition, half the product of air density and the cube of wind speed, and quantifies the kinetic energy flux available at each measurement height. Wind direction is expressed through vector components (WVc) to maintain continuity, while meteorological and system variables (temperature, pressure, air density, relative humidity, battery voltage, and sensor temperature) enable assessment of measurement conditions and quality control. The original TOA5 variable names are retained throughout the paper for traceability.
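The two derived quantities can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names and the sample values are hypothetical, and only the 0.1 m/s floor of Equation (1) and the standard WPD definition come from the text.

```python
def turbulence_intensity(v_std, v_avg, floor=0.1):
    """TI = Std / max(Avg, floor), following Equation (1)."""
    return v_std / max(v_avg, floor)


def wind_power_density(air_density, wind_speed):
    """WPD = 0.5 * rho * v**3, in W/m^2 for rho in kg/m^3 and v in m/s."""
    return 0.5 * air_density * wind_speed ** 3


# Hypothetical 10 min records (not values from the dataset).
ti_normal = turbulence_intensity(0.8, 6.4)    # ordinary case: 0.8 / 6.4
ti_calm = turbulence_intensity(0.05, 0.02)    # near calm: denominator floored at 0.1
wpd = wind_power_density(1.225, 8.0)          # 0.5 * 1.225 * 8**3
```

The floor matters only in near-calm periods, where dividing by a mean close to zero would otherwise produce very large TI values.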

3.3. Predictive Modeling Workflow and Leakage-Free Evaluation Protocol

A machine learning workflow that prevents data leakage in time-series data was developed (Figure 2). Many forecasting studies suffer from information leakage because they split the data randomly, apply preprocessing before the chronological split, or construct features that incorporate future data, all of which yield results that look better than actual operational performance. Here, the data were split in temporal order into train, validation, and test sets. Preprocessing was performed causally and per split, using only training data and past observations. The chosen split ratios provided enough training data for model learning while reserving separate validation and test sets for hyperparameter tuning and for leakage-free performance testing, respectively.
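A chronological split can be sketched in a few lines of Python. The split ratios are not stated in this section, so the 70/15/15 fractions and the function name below are placeholders for illustration only; the row count is the 68,292 records reported for the dataset.

```python
def chronological_split(n_rows, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) index ranges in time order, with no shuffling.

    The fractions are illustrative assumptions, not the paper's values.
    """
    n_train = int(n_rows * train_frac)
    n_val = int(n_rows * val_frac)
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_rows)
    return train, val, test


train_idx, val_idx, test_idx = chronological_split(68292)
```

Because the ranges are contiguous and ordered, every validation row is later than every training row, and every test row is later than both.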

3.3.1. Supervised Indexing and Prediction Horizon

The task is a one-step-ahead forecast with a horizon of h = 1 step (10 min). The target variable y is the mean wind speed at the first measurement level, v1 Avg. To build the supervised target, y is shifted forward by h so that the feature vector at time t is paired with y(t + h), using rows 1 to N − h. This temporal alignment guarantees that only past and present information is available to the model, which prevents leakage. The persistence baseline, ŷ(t + h) = y(t), serves as the reference for checking whether a model captures temporal dynamics beyond the autocorrelation already present in the series.
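The target shift and the persistence baseline can be sketched as follows. This is an illustrative Python version with hypothetical function names and made-up wind speeds, not the authors' implementation.

```python
def make_supervised(y, h=1):
    """Pair each row t with the target y[t + h]; the last h rows have no target."""
    features_now = y[:len(y) - h]   # information available at time t
    targets = y[h:]                 # value to be predicted, y(t + h)
    return features_now, targets


def persistence_forecast(y, h=1):
    """Persistence baseline: y_hat(t + h) = y(t)."""
    return y[:len(y) - h]


speeds = [5.0, 5.4, 5.1, 4.8, 5.2]            # hypothetical v1 Avg values
x_now, target = make_supervised(speeds, h=1)
y_hat = persistence_forecast(speeds, h=1)
```

With N = 5 and h = 1, four feature/target pairs remain, which corresponds to using rows 1 to N − h.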

3.3.2. Quality Control, Causal Imputation, and Feature Engineering

Quality control was applied first: physically implausible values were replaced with NaN, including negative wind speeds, out-of-range battery voltage and pressure readings, and stuck-sensor conditions identified from a causal moving standard deviation. The train split used only past observations for causal forward-fill-based missing-value completion, while the validation and test splits used the last valid value from the preceding split to prevent future information leakage at split boundaries. The detailed quality control thresholds and imputation rules are listed in Table 3.
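The split-aware forward fill described above might look like the following Python sketch. The function name and the `seed` argument are hypothetical; the behavior for leading gaps in the training split (left as NaN) is an assumption, since the text does not say how they are handled.

```python
import math


def causal_ffill(values, seed=None):
    """Forward-fill NaNs using only earlier observations.

    `seed` is the last valid value of the preceding split (None for the
    training split). Leading NaNs with no earlier value stay NaN.
    """
    filled, last = [], seed
    for v in values:
        if not math.isnan(v):
            last = v                      # newest valid observation so far
        filled.append(last if last is not None else math.nan)
    return filled, last


nan = math.nan
train_filled, last_train = causal_ffill([nan, 4.2, nan, 4.6])
val_filled, _ = causal_ffill([nan, nan, 5.0], seed=last_train)
```

Passing the last valid training value as the seed lets the validation split start without a gap, while no value from the validation period ever flows back into training.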

Feature engineering was likewise performed separately within each split. Cyclic time features encode daily and seasonal periodicity:

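$$\mathrm{hour}_{\sin} = \sin\left(2\pi \frac{\mathrm{hour}}{24}\right), \quad \mathrm{hour}_{\cos} = \cos\left(2\pi \frac{\mathrm{hour}}{24}\right) \tag{2}$$

$$\mathrm{doy}_{\sin} = \sin\left(2\pi \frac{\mathrm{doy}}{365.25}\right), \quad \mathrm{doy}_{\cos} = \cos\left(2\pi \frac{\mathrm{doy}}{365.25}\right) \tag{3}$$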
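Equations (2) and (3) translate directly into code. The Python sketch below is illustrative: the function name and dictionary keys are hypothetical, and the use of a fractional hour (hour plus minutes) is an assumption, because the text does not say whether the hour is taken as an integer.

```python
import math
from datetime import datetime


def cyclic_time_features(ts):
    """Sine/cosine encoding of hour of day and day of year, as in Equations (2)-(3)."""
    hour = ts.hour + ts.minute / 60.0      # assumption: fractional hour of day
    doy = ts.timetuple().tm_yday           # day of year, 1..366
    hour_angle = 2 * math.pi * hour / 24
    doy_angle = 2 * math.pi * doy / 365.25
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "doy_sin": math.sin(doy_angle),
        "doy_cos": math.cos(doy_angle),
    }


midnight = cyclic_time_features(datetime(2021, 7, 1, 0, 0))
six_am = cyclic_time_features(datetime(2021, 7, 1, 6, 0))
```

The paired sine and cosine place 23:50 next to 00:00 in feature space, which a raw hour value would not do.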

The wind features comprised the wind speed average and standard deviation together with the turbulence intensity of Equation (1). To capture autoregressive behavior, lagged features of the target were constructed up to 36 steps back; the first 36 rows of each split, where lags are undefined, were removed, with the preceding split’s final values used to keep lag computation continuous. The maximum lag of 36 corresponds to 6 h under the 10 min sampling interval and was selected to capture short-term temporal persistence and intra-day wind variability while avoiding unnecessarily long lag structures that could increase redundancy and model complexity.
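Lag construction with an optional carry-over from the preceding split can be sketched as below. This is a Python illustration under stated assumptions: the function name and the `history` argument are hypothetical, and lags are counted from y(t − 1), which the text does not specify.

```python
def lag_features(series, n_lags=36, history=()):
    """Build rows [y(t-1), ..., y(t-n_lags)] for each usable time step.

    `history` holds the final values of the preceding split. Rows whose lags
    would reach before the available data are dropped. Returns the lag rows
    and the indices (into `series`) of the rows that were kept.
    """
    extended = list(history) + list(series)
    offset = len(history)
    rows, kept = [], []
    for i in range(len(series)):
        j = i + offset                    # position in the extended series
        if j < n_lags:
            continue                      # lags undefined: drop the row
        rows.append([extended[j - k] for k in range(1, n_lags + 1)])
        kept.append(i)
    return rows, kept


series = [float(v) for v in range(10)]
rows, kept = lag_features(series, n_lags=3)
rows_h, kept_h = lag_features(series, n_lags=3, history=[-3.0, -2.0, -1.0])
```

With three lags for brevity, the call without history drops the first three rows, while the call with three carried-over values keeps all ten. Only values that precede each row in time are ever read.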

3.3.3. Models and Validation-Based Selection

Prior to modeling, features were standardized by z-score normalization with parameters estimated from the training set only and applied unchanged to the validation and test sets:

$$\mu = \mathrm{mean}(X_{tr}), \quad \sigma = \mathrm{std}(X_{tr}), \quad \sigma = 0 \Rightarrow \sigma := 1 \tag{4}$$
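Train-only standardization, including the zero-variance guard of Equation (4), can be sketched as follows. The Python function names are hypothetical, and the use of the sample standard deviation (N − 1 denominator, which is also the default of MATLAB's std) is an assumption.

```python
import statistics


def fit_standardizer(train_column):
    """Estimate mean and standard deviation from the training split only."""
    mu = statistics.fmean(train_column)
    sigma = statistics.stdev(train_column)   # assumption: sample std (N - 1)
    if sigma == 0:
        sigma = 1.0                          # constant feature: avoid division by zero
    return mu, sigma


def apply_standardizer(column, mu, sigma):
    """Apply training-set parameters unchanged to any split."""
    return [(x - mu) / sigma for x in column]


mu, sigma = fit_standardizer([2.0, 4.0, 6.0])
val_scaled = apply_standardizer([8.0], mu, sigma)
const_mu, const_sigma = fit_standardizer([3.0, 3.0, 3.0])
```

Fitting the parameters on the training split and reusing them elsewhere keeps validation and test statistics out of the model inputs.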

Six methods were evaluated: a persistence baseline, support vector regression (SVR) with a Gaussian (RBF) kernel, least-squares gradient boosting (LSBoost), Random Forest (RF) based on bootstrap aggregation, Elastic Net regularized linear regression whose regularization parameter was selected by minimizing validation RMSE, and a stacking ensemble. The Elastic Net mixing parameter was fixed at α = 0.5 to provide a balanced compromise between L1 and L2 regularization, enabling simultaneous feature selection and coefficient stabilization under correlated lag-based predictors. The stacking ensemble combines base-model predictions through non-negative weights constrained to the simplex (w ≥ 0 and Σw = 1), obtained by minimizing the mean squared error. The pipeline supports two modes: an academic assessment mode, in which the weights are fitted on the training set and results are reported on the validation set, and an operational mode, in which the weights are fitted on the validation set and then used to produce the final test report. The model with the lowest validation RMSE is selected as the final model. Table 3 summarizes the model hyperparameters together with the entire pipeline configuration.
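One way to obtain simplex-constrained stacking weights is projected gradient descent, sketched below in plain Python. The text does not name the solver the authors used, so this is a swapped-in technique for illustration; the function names, iteration count, and toy predictions are all hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t                     # threshold for the largest valid j
    return [max(x - theta, 0.0) for x in v]


def fit_stacking_weights(base_preds, y, n_iter=5000):
    """Minimise the MSE of the weighted sum of base predictions over the simplex.

    base_preds: one list of predictions per base model, each as long as y.
    """
    m, n = len(base_preds), len(y)
    w = [1.0 / m] * m
    # Conservative step size: trace(P'P) bounds the largest eigenvalue of P'P.
    trace = sum(p_i * p_i for p in base_preds for p_i in p)
    step = n / (2.0 * trace)
    for _ in range(n_iter):
        resid = [sum(w[k] * base_preds[k][i] for k in range(m)) - y[i]
                 for i in range(n)]
        grad = [2.0 / n * sum(base_preds[k][i] * resid[i] for i in range(n))
                for k in range(m)]
        w = project_to_simplex([w[k] - step * grad[k] for k in range(m)])
    return w


# Toy case: the first base model reproduces the target exactly.
y_obs = [5.0, 6.0, 7.0, 8.0]
preds = [[5.0, 6.0, 7.0, 8.0], [4.0, 7.0, 6.0, 9.0]]
weights = fit_stacking_weights(preds, y_obs)
```

In the toy case the weight of the exact model approaches one. In the operational mode described above, `base_preds` and `y` would come from the validation split.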

3.4. Performance Metrics and Regime-Based Evaluation

Model performance was assessed using RMSE, MAE, R², normalized RMSE (nRMSE), mean bias error (MBE), percent bias (PBIAS), MAPE, symmetric MAPE (sMAPE), explained variance score (EVS), and the Pearson correlation coefficient (r). For MAPE, the denominator was bounded below at 0.2 m/s for numerical stability. The Skill RMSE score measures each model's improvement in RMSE relative to the persistence baseline. For regime-based evaluation, the test set was divided into three wind speed categories: low (<3 m/s), medium (3 to 8 m/s), and high (>8 m/s).
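A few of these metrics can be sketched in Python as follows. The function names are hypothetical. The skill formula, 1 − RMSE_model / RMSE_persistence, is an assumption: it is not written out in the text, but it reproduces the Skill RMSE values quoted in Section 4.1 from the reported RMSEs. Assigning exactly 8 m/s to the medium regime is also an assumption.

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def bounded_mape(y_true, y_pred, floor=0.2):
    """MAPE in percent, with the denominator bounded below at `floor` m/s."""
    terms = [abs(p - t) / max(abs(t), floor) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)


def skill_rmse(rmse_model, rmse_persistence):
    """Positive values beat persistence; negative values are worse than it."""
    return 1.0 - rmse_model / rmse_persistence


def wind_regime(speed):
    """Low (<3 m/s), medium (3 to 8 m/s), high (>8 m/s)."""
    if speed < 3.0:
        return "low"
    return "medium" if speed <= 8.0 else "high"


# RMSE values as reported in Section 4.1 for SVR, the ensemble, and persistence.
skill_svr = skill_rmse(1.417, 0.634)
skill_ens = skill_rmse(0.636, 0.634)
```

Under this definition the SVR and ensemble skills round to −1.235 and −0.003, matching the reported values.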

4. Results

This section presents the results obtained from the wind speed forecasting models and the associated statistical analyses performed on the test dataset. The predictive performance of the evaluated models is first compared using several statistical metrics, including absolute error measures, relative error indicators, goodness-of-fit statistics, and bias measures. These metrics provide a quantitative assessment of the accuracy and reliability of the forecasting models. Following the global performance comparison, additional analyses are conducted to further examine the behavior and statistical properties of the selected model. Time-domain validation is used to evaluate how well the predicted wind speed values follow the temporal dynamics of the observed data. Distribution-based analyses are also performed to assess the agreement between predicted and measured wind speeds. Residual diagnostics are then applied to investigate the statistical characteristics of the prediction errors. These analyses include the examination of residual distributions, error dispersion patterns, and temporal dependence structures. Such evaluations provide insight into the stability and consistency of the forecasting model. Furthermore, regime-based performance analysis is conducted to evaluate model accuracy under different wind speed conditions. Additional statistical tests, including normality assessment and feature importance analysis, are used to better understand the relationships between the predictor variables and the forecasting results. Finally, statistical model comparison and forecast error decomposition analyses are performed to quantify the relative predictive performance of the competing models and to identify the main sources of forecasting error.

4.1. Overall Model Performance

Table 4 summarizes the forecasting performance of all evaluated models on the test dataset for one-step-ahead (10 min) wind speed prediction. The comparison includes several statistical indicators that reflect different aspects of model performance. Absolute error metrics such as RMSE and MAE are used to quantify the magnitude of prediction errors, while relative error measures including nRMSE, MAPE, and sMAPE provide a normalized evaluation of forecasting accuracy. In addition, goodness-of-fit indicators such as the coefficient of determination (R²), explained variance score (EVS), and the Pearson correlation coefficient (r) are reported to assess how well the predicted values follow the observed wind speed variations. Bias metrics, namely mean bias error (MBE) and percentage bias (PBIAS), are also included to identify potential systematic over- or under-prediction behavior of the models. These performance metrics provide a comprehensive evaluation framework that allows a reliable comparison of the predictive capabilities of the considered forecasting approaches.

Among the evaluated models, excluding the baseline approach, the ELASTIC model achieves the lowest absolute prediction error on the test dataset. The model yields an RMSE of 0.633 m/s and an MAE of 0.399 m/s, indicating competitive forecasting accuracy under short-term operational conditions. The goodness-of-fit metrics indicate a strong predictive performance. The model explains most of the variability in wind speed, with R² = 0.977 and EVS = 0.977. In addition, the Pearson correlation coefficient has a value of r = 0.989. The normalized RMSE is also relatively low (nRMSE = 0.031), further confirming the model’s strong predictive capability. The PERSIST model, which is used as a baseline reference, produces performance values very close to those of ELASTIC (RMSE = 0.634 m/s, MAE = 0.392 m/s, R² = 0.977, EVS = 0.977). However, the Skill RMSE value close to zero indicates that this model essentially relies on the short-term temporal persistence characteristic of wind speed. For this reason, the persistence model is primarily used as a benchmark to assess the additional predictive value provided by more advanced forecasting models. The LSBOOST model, which is based on tree-based boosting, demonstrates a relatively strong overall fit (RMSE = 0.715 m/s, R² = 0.971). Nevertheless, its relative error indicators are higher than those of ELASTIC and PERSIST, particularly under low-wind-speed conditions where MAPE reaches 22.6% and sMAPE 11.6%. Similarly, the Random Forest (RF) model exhibits inconsistent performance across different wind speed levels. Although it captures general trends, its prediction errors remain relatively high (RMSE = 0.880 m/s, MAPE ≈ 52.7%), indicating reduced reliability compared to the best-performing models. The ENS ensemble model achieves performance levels that are very close to those of ELASTIC and the persistence baseline (RMSE = 0.636 m/s, R² = 0.977). 
However, the slightly negative Skill RMSE (−0.003) shows that the ensemble strategy does not improve on the persistence approach and is in fact marginally worse. This result implies that, for a very short forecasting horizon such as 10 min, the contribution of stacking weights may remain limited. It should also be noted that these ensemble results correspond to deployment conditions where stacking weights were refitted using the validation dataset. In contrast, the SVR model produces the weakest performance among all evaluated methods. It records a relatively large prediction error (RMSE = 1.417 m/s, MAE = 0.695 m/s) and an extremely high relative error (MAPE ≈ 99%). The strongly negative Skill RMSE (−1.235) further indicates that the model performs considerably worse than the baseline persistence approach. The bias metrics provide additional insight into model behavior. Both ELASTIC and PERSIST display MBE and PBIAS values close to zero, indicating that their predictions do not suffer from significant systematic overestimation or underestimation. In contrast, the SVR and RF models exhibit negative and relatively large PBIAS values, which indicates a consistent tendency to underestimate wind speed in the test dataset.

Based on these results, the ELASTIC model shows balanced forecast behavior across the evaluation criteria, including absolute error, relative error, goodness-of-fit measures, and bias indicators. Although the persistence baseline performs very similarly to ELASTIC at this very short forecast horizon, ELASTIC shows low systematic bias, stable residual behavior, and statistical consistency. Given these balanced statistical properties, the ELASTIC model was chosen for the subsequent residual analysis (Figure 3).

We created a radar chart to compare the forecasting performance across multiple metrics, including RMSE, MAE, R², nRMSE, MAPE, and sMAPE. From the chart, it is clear that the ELASTIC model performs consistently well across almost all metrics. The persistence model shows similar performance in some metrics because of the short 10 min forecast horizon, but ELASTIC keeps errors more balanced overall. SVR and RF models, on the other hand, have much higher relative errors, especially for MAPE and sMAPE. This visual comparison confirms what we observed in Table 4 and makes it easy to see that ELASTIC demonstrates more balanced forecasting behavior across multiple evaluation metrics.

We also checked the relationships between the predictor variables to see if multicollinearity could be a problem. To do this, we calculated the Variance Inflation Factor (VIF) for each environmental variable used in the models. Table 5 shows all VIF values, which are well within the acceptable range. This tells us that there are no strong linear dependencies between the predictors, so the model coefficients remain stable and the machine learning performance is not negatively affected.
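As an illustration of the multicollinearity check, the following sketch computes VIF values as the diagonal of the inverse correlation matrix, which is equivalent to 1 / (1 − R²) from regressing each predictor on the others. The variable names and sample values are hypothetical and are not taken from the study's dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each predictor = diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(columns))]


# Hypothetical predictors (illustrative values only).
temperature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
humidity = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
# Nearly a copy of `temperature`, to show what a collinear predictor does.
temperature_copy = [t + (0.1 if i % 2 == 0 else -0.1) for i, t in enumerate(temperature)]

print(vif([temperature, humidity]))
print(vif([temperature, humidity, temperature_copy]))
```

A frequently used rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity; the near-duplicate column in the second call is far above that range.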

4.2. Data Preprocessing, Quality Control, and Descriptive Statistics

Raw data were transferred from Excel to the MATLAB R2023b environment, and a systematic preprocessing step was performed before the analysis. In the first step, the TIMESTAMP field was standardized to form a regular time-series structure, while derivative time scales (hourly, daily, and monthly) were created for visualization purposes. In the second step, physical consistency checks were performed, and physically unreasonable data were removed from the dataset. It should be noted that several extreme values reported in the descriptive statistics correspond to transient sensor/logger anomalies or short-duration measurement artifacts observed in the raw monitoring system outputs. These values were intentionally retained within the descriptive statistical summaries to transparently reflect the characteristics of the raw dataset. However, during predictive modeling, the leakage-free quality control procedure identified and filtered unstable or physically implausible observations prior to model training and evaluation, thereby minimizing their influence on forecasting performance. This step was an additional layer of quality control for the analysis, following the general data filtering procedure described in the report of the measurement campaign. In the third step, the treatment of missing data was determined. To avoid artificially affecting distribution-based analyses, missing observations were not filled using linear interpolation or other imputation methods; instead, the corresponding records were excluded from the dataset in a manner that did not compromise the analyses. This approach specifically aims to preserve extreme values and the distribution characteristics.
The decision not to fill missing values applies only to exploratory data analysis (EDA) and the visualization of descriptive statistics; during predictive modeling, a split-aware causal imputation approach was additionally applied to prevent information leakage. Missing data rates were also calculated on a per-variable basis. According to the MissingFraction summary, the proportion of missing values across all variables was negligible, with only the h1 Avg variable exhibiting a very low missing fraction of approximately 1.4643 × 10⁻⁵. At this level, the missing data has no meaningful effect on statistical distributions or subsequent analysis steps. Descriptive statistic calculations were performed for all variables, providing quantitative information about the scale, unit, value range, variability, and possible range of extreme values of the dataset, although the focus is entirely descriptive in nature, without any causal relationship between the variables or any model-related interpretative results.
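The exact imputation rule used in the modeling stage is not detailed here, so the following is only a minimal sketch of what a split-aware causal imputation could look like: gaps are filled with the most recent earlier observation, and any fallback statistic is computed from the training split alone. The function name, the fallback choice, and the sample values are assumptions for illustration.

```python
def causal_fill(values, fallback):
    """Replace None with the most recent earlier observation.

    Only past values are used, so no later observation can influence
    an earlier time step. `fallback` covers gaps at the very start.
    """
    filled, last = [], fallback
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical chronological split (values in m/s; None marks a missing record).
train = [5.0, None, 6.0, 7.0]
test = [None, 8.0, None]

# Fallback statistic computed from the training split only (split-aware).
train_observed = [v for v in train if v is not None]
train_mean = sum(train_observed) / len(train_observed)

train_filled = causal_fill(train, fallback=train_mean)
# The test split continues from the last training value, never from test statistics.
test_filled = causal_fill(test, fallback=train_filled[-1])

print(train_filled)
print(test_filled)
```

By contrast, linear interpolation across a gap would use the observation after the gap, which is exactly the kind of look-ahead a leakage-free design tries to avoid.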

Descriptive statistic results reported in the current study include the following:

(1): Record information (RECORD);
(2): Wind speed statistic at the three levels;
(3): Turbulence intensity indicators (TI);
(4): Wind power density (WPD);
(5): Wind vector components (WVc);
(6): Meteorological and system-related variables;
(7): Reference extreme columns.

With this structure, the key quantities of the wind regime and the auxiliary variables describing the measurement conditions can be tracked in a single table format. The descriptive statistics reported in Table 6 cover both the measurement columns of the TOA5 data structure and the derived variables obtained through the analysis procedure. For example, the turbulence intensity (TI) variables do not appear in the raw TOA5 header, but they are included in Table 6 because they are derived from the wind speed statistics at the respective measurement levels. Table 6 therefore follows the structure of the final variables used in the analysis rather than the header of the raw file, ensuring transparent and explicit reporting of the value range, distribution spread, and extreme values of the variables. As before, the focus is entirely descriptive, with no causal or model-related interpretation implied.

In Table 6, TI variables are indicated, although turbulence intensity (TI) columns are not specified in the direct TOA5 header example. However, the final dataset for this study includes TI variables, TI v1, TI v2, and TI v3.
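The derivation of the TI and WPD columns is not spelled out in this section. The sketch below uses the conventional definitions, TI = σ_v / v̄ over each averaging interval and WPD = ½ρv³, as an assumption; the air density value and function names are illustrative and are not taken from the study.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, standard sea-level value (assumed, not from the source)


def turbulence_intensity(mean_speed: float, std_speed: float) -> float:
    """TI = standard deviation / mean of wind speed within one interval."""
    if mean_speed <= 0:
        return math.nan  # TI is undefined for calm or invalid records
    return std_speed / mean_speed


def wind_power_density(mean_speed: float, rho: float = AIR_DENSITY) -> float:
    """WPD in W/m^2 from a mean wind speed in m/s."""
    return 0.5 * rho * mean_speed ** 3


# Hypothetical 10 min record: mean 8.0 m/s, standard deviation 0.8 m/s.
print(turbulence_intensity(8.0, 0.8))
print(wind_power_density(8.0))
```

Because TI divides by the mean speed, it becomes very sensitive to small measurement errors near calm conditions, which is one reason low-wind records tend to be noisier in derived variables.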

4.3. Exploratory Data Analysis and Visualizations

Before the modeling step, a number of visualizations were created as part of a process referred to as exploratory data analysis (EDA) to systematically uncover the distributional structure, level-dependent variability, and basic multivariate relationships in the data. The visualizations should not be considered as a way to display any results or as part of a performance evaluation process, but rather as a way to reproducibly document, prior to modeling, the typical value range and extreme values of the measurements, a condensed view of the temporal dependencies, and the linear dependencies among the essential variables. To explore the level-dependent distribution of the wind speed, boxplots were created for the 10 min mean wind speed values measured at the three levels. The boxplots simultaneously present the median, interquartile range (IQR), and outliers determined according to predefined thresholds, allowing for a concise summary of the distribution characteristics. These visualizations were used to methodologically highlight the comparability of distribution structures across measurement levels and the coverage of the wind speed range within the dataset (Figure 4).

Boxplots were created for the derived turbulence intensity (TI) variables to examine the level-dependent distribution of short-term wind speed variability. This visualization allows the central tendencies, spread ranges, and outlier behavior of the turbulence intensity values to be simultaneously observed across measurement levels. In this way, the distribution characteristics of short-term wind fluctuations within the dataset are documented prior to modeling (Figure 5).

The Pearson correlation coefficient was used to describe the linear relationship between wind speed variables and meteorological parameters. In this context, pairwise correlation coefficients were calculated between selected meteorological variables and wind speed variables at different measurement levels, and the results were presented as a correlation matrix. The correlation heatmap allows the sign and relative magnitude of linear dependencies between variables to be simultaneously observed in a single visual plane, aiming to methodologically document the fundamental correlation patterns of the multivariate dataset (Figure 6).

To represent the diurnal as well as the seasonal variations in the wind speed, the measurements were grouped along the axes representing the months (1–12) and the hours (0–23). The arithmetic mean of the measurements of the wind speed corresponding to each of the combinations of the month–hour pairs was calculated, and the values were represented as a heatmap of size 12 × 24. This does not show the actual measurements directly; it shows the aggregated representation of the measurements corresponding to the same month–hour pairs. The aim is to show the diurnal and seasonal variations in the measurements over a long period of time, represented over a single plane (Figure 7).
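The aggregation behind the month–hour heatmap can be sketched as follows. The study performed this step in MATLAB; the Python version below is only an illustrative equivalent, and the record values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def month_hour_means(records):
    """Return a 12 x 24 grid of mean wind speed per (month, hour) pair.

    `records` is an iterable of (timestamp, wind_speed) tuples.
    Cells with no observations are left as None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, speed in records:
        key = (timestamp.month, timestamp.hour)
        sums[key] += speed
        counts[key] += 1

    grid = [[None] * 24 for _ in range(12)]
    for (month, hour), total in sums.items():
        grid[month - 1][hour] = total / counts[(month, hour)]
    return grid


# Hypothetical 10 min records from two different years.
records = [
    (datetime(2021, 1, 5, 14, 0), 6.0),
    (datetime(2021, 1, 20, 14, 10), 8.0),
    (datetime(2022, 1, 9, 14, 30), 10.0),
    (datetime(2021, 7, 1, 3, 0), 4.0),
]

grid = month_hour_means(records)
print(grid[0][14])  # January, 14:00-14:59
print(grid[6][3])   # July, 03:00-03:59
```

Records from both years fall into the same cell, which is why the heatmap shows a long-term average pattern and not individual measurements.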

In addition to the aggregated representation in the form of the month–hour heatmap, the scatter plot was created along the hourly axis for the chosen wind speed variable, aiming at the direct representation of the diurnal distribution of wind speed at the raw data level. In the created visualization, each point is intended to represent the individual data point according to the respective hour without any form of summarization or averaging. This is meant to allow the raw distribution to be presented as an additional EDA result below the aggregated representations (Figure 8).

Considering that the dataset covered two years (2021 and 2022), yearly correlation grids were produced to methodologically investigate whether the linear dependency between the core variables remained consistent over time. The correlation coefficients were computed individually for each year and presented in heatmap format. These annual correlation grids were used as a visual check to examine whether there were any significant structural changes in the fundamental dependency patterns between variables (Figure 9).

In this study, all data reading, calculation of descriptive statistics, correlation analyses, and visualizations produced within the framework of exploratory data analysis (EDA) were performed in the MATLAB environment. The objectives of this stage are: (i) to transparently document the numerical scales, units, and ranges of the dataset through descriptive statistics (Table 6), (ii) to methodologically reveal the distribution characteristics, temporal patterns, and linear relationships among core variables (Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9), and (iii) to define the data preprocessing and characterization steps prior to modeling in a reproducible framework.

4.4. Time-Domain and Distribution-Based Validation

We also tested the short-term prediction capability of the ELASTIC model by examining its temporal tracking and the distributional agreement between predicted and observed wind speeds in the test set. Two graphs are provided: Figure 10 compares the observed and predicted wind speed time series, while Figure 11 compares the linear consistency and the distribution of observations and predictions.

It can be seen from Figure 10 that the ELASTIC model demonstrates good temporal tracking performance when predicting low- and high-speed changes in wind speed values. Prediction is performed with a slight time lag compared to the observed wind speed values. Even during rapid changes in wind speed, the model is capable of preserving variability in wind speed signal values. The above claim can be verified by looking at Figure 11, which shows the scatter plot of observations and predictions. It can be observed that most values fall close to the 45-degree line. Thus, the obtained measures (RMSE ≈ 0.63 m/s, MAE ≈ 0.40 m/s, and R² ≈ 0.98) show that a considerable amount of variability in test data has been captured. It should be noted that dispersion occurs under low wind speed values, but the model shows stable performance during medium- and high-speed values.

Reviewing Figure 10 and Figure 11 together, we found that the ELASTIC model demonstrates consistent temporal tracking and statistically consistent prediction distributions under the evaluated forecast conditions. This finding is supported by the global metrics discussed previously and by the residual analysis. Overall, the results indicate that the ELASTIC model forecasts wind speed accurately under short-term conditions.

4.5. Residual Analysis and Statistical Consistency

We also analyzed the residuals of the ELASTIC model to better understand its error behavior. Residuals were calculated as the difference between observed and predicted values. We looked at (i) the overall distribution, (ii) residuals relative to predicted magnitudes, and (iii) temporal dependencies, shown in Figure 12, Figure 13 and Figure 14. The histogram in Figure 12 shows that most residuals are tightly clustered around zero, indicating minimal bias. We observed a sharp peak with many small errors and only a few large residuals, showing that the model consistently performs well across the majority of the test set. Overall, we confirmed that the errors are small, well distributed, and do not dominate the model’s predictions.

We checked the residuals more closely using a scatter plot and statistics. The scatter plot shows that residuals are generally centered around zero with no clear trend. At low predicted wind speeds, the spread is wider, forming a wedge shape, while at medium to high values, the spread is narrower. This indicates that the errors are mostly small and clustered near zero, though a few rapid changes may cause larger deviations. From Table 7, the mean residual is very close to zero (0.026 m/s), skewness is low (0.11), and kurtosis is near normal (3.02). These results confirm that the ELASTIC model’s predictions are statistically reliable, with small, balanced, and symmetric errors.

According to the Diebold–Mariano test results, the ELASTIC model yields a statistically significant improvement over the SVR and RF models in terms of forecast accuracy, while its difference from the persistence benchmark is statistically insignificant at the ultra-short-term forecasting horizon used in this research. This is consistent with the highly persistent temporal structure of wind speed series at 10 min intervals. Thus, the practical evaluation of forecast improvements must take into account not only p-values but also residual stability, systematic bias patterns, and forecasting consistency within regimes.

We also looked at the temporal behavior of the residuals using the autocorrelation function (ACF). Most lags show coefficients close to zero, with only small positive values at the first few lags. This indicates that the residuals do not have strong or persistent autocorrelation. However, the Ljung–Box test shows that the residuals are not fully independent for the first 10, 20, and 30 lags (p < 0.001). In short, the residuals have a weak but statistically significant temporal dependence, which matches the small peaks we see in the ACF plot.
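For readers who want to reproduce this kind of check, the sketch below computes the sample autocorrelation function and the Ljung–Box statistic Q = n(n + 2) Σ ρ_k² / (n − k). It is a minimal illustration on a hypothetical residual series, not the study's MATLAB implementation, and it stops at the Q statistic without computing the p-value.

```python
def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    rho = acf(series, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))


# Hypothetical residuals with an obvious alternating pattern.
residuals = [1.0, -1.0] * 5

print(acf(residuals, 3))
print(ljung_box_q(residuals, 1))
```

Q is compared against a chi-square distribution with `max_lag` degrees of freedom. With a large test set, even small autocorrelation coefficients can produce a large Q, which is how a visually flat ACF and a significant Ljung–Box result can coexist.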

To provide additional statistical support for the residual interpretation, several formal diagnostic tests were also applied to evaluate residual normality, stationarity, heteroscedasticity behavior, and temporal dependence characteristics, as summarized in Table 8.

Residual normality was evaluated using the Jarque–Bera test together with the histogram and Q–Q plot analyses presented in Figure 12 and Figure 13. The Jarque–Bera results indicated mild deviations from strict normality, primarily associated with tail behavior under extreme wind conditions, although the central residual distribution remained approximately symmetric and well-behaved. Residual stationarity characteristics were confirmed using the Augmented Dickey–Fuller (ADF) test, while ARCH-based diagnostics indicated the presence of heteroscedastic residual variance behavior, which is commonly observed in atmospheric and wind-related time series due to changing turbulence intensity and stochastic variability. Together with the Ljung–Box and ACF analyses, these results suggest that the residual structure remains statistically stable overall, despite weak temporal dependence and variance fluctuations under certain operating conditions.

Looking at Figure 12, Figure 13 and Figure 14 together, we can see that the residuals from the ELASTIC model on the test set are mostly centered around zero and tightly clustered. There are no large swings or dominant patterns in autocorrelation. We do notice a small but statistically significant temporal dependence, though it is minor. Overall, this confirms that the low RMSE and MAE values we observed earlier reflect not just numerical accuracy, but also a stable and consistent error structure. The residual distributions remained narrowly centered around zero with limited systematic bias, supporting the statistical consistency of the obtained forecasting behavior. Based on these findings, the ELASTIC model may be considered a statistically consistent reference approach for short-term wind speed forecasting under the evaluated conditions.

4.6. Regime-Based Performance Evaluation

We divided the test set into Low, Mid, and High wind speed ranges to see how the ELASTIC model performs across different conditions. For Low wind speeds, the model struggled the most. We observed an RMSE of 0.924 m/s and an MAE of 0.491 m/s, which are noticeably higher than the errors in the Mid and High ranges. The R² and EVS values are also very low (0.140 and 0.237, respectively), meaning the model explains only a small portion of the variance at low speeds. Skill RMSE is negative (−0.140), and relative errors are extremely high, with MAPE at 82.1% and sMAPE at 56.0%. The residuals in this range are widely spread, with some extreme negative values, suggesting that the model has difficulty capturing rapid changes or small signals when wind speeds are low. This indicates that the ELASTIC model is less reliable in low-wind conditions, likely due to higher turbulence and a low signal-to-noise ratio. It performs much better in Mid and High wind speed regimes, where errors are lower and predictions more stable.

Table 9 shows how the ELASTIC model performs across different wind speed ranges, and the results show clear improvement as wind speeds increase. In the Mid wind speed regime, the model does much better than at low speeds. The RMSE drops to 0.506 m/s and MAE to 0.352 m/s. R² and EVS are both 0.860, meaning the model explains most of the variance in this range. Relative errors are low and balanced, with MAPE ≈ 6.84% and sMAPE ≈ 6.87%. The residual boxplot shows a narrow spread with a median near zero, indicating stable and unbiased predictions. Skill RMSE is positive at 0.074, confirming the model outperforms the persistence baseline here. In the High wind speed regime, the goodness of fit improves further. R² and EVS rise to 0.928, showing most variance is captured. RMSE is 0.632 m/s and MAE is 0.412 m/s, so errors remain low even with higher variability. Residuals are slightly more spread than in the Mid range, but they remain symmetrically distributed around zero, indicating no systematic bias. The positive Skill RMSE of 0.025 shows the model still performs better than the baseline. Overall, the ELASTIC model handles Mid and High wind speeds very well, producing stable, accurate, and unbiased predictions, while Low wind speeds remain the most challenging (Figure 15).

As can be seen, the ELASTIC model shows more consistent performance under the Mid and High wind speed regimes. Based on the regime-dependent performance metrics presented in Table 9, forecasting accuracy is noticeably higher in these regimes, while based on the behavior of residuals shown in Figure 14, there is limited long-term autocorrelation and generally stable residuals. Under the Low wind speed regime, forecasting performance is noticeably worse. It is possible that the performance drop is connected with measurement uncertainty, turbulence intensity, and lower signal-to-noise ratios during calm weather. All of these factors lead to worse model performance. In general, it appears that the model is able to show more consistent short-term forecasting behavior under moderate- and high-wind-speed regimes, but other techniques might be necessary to improve performance in calm wind regimes.
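A regime-based evaluation of this kind can be sketched as below. The regime boundaries used in the study are not given in this section, so the thresholds here are hypothetical, as is the choice to assign regimes by observed (not predicted) wind speed.

```python
import math


def regime_rmse(observed, predicted, low_max, high_min):
    """RMSE per wind speed regime, with regimes assigned by observed speed."""
    squared = {"Low": [], "Mid": [], "High": []}
    for obs, pred in zip(observed, predicted):
        if obs < low_max:
            name = "Low"
        elif obs < high_min:
            name = "Mid"
        else:
            name = "High"
        squared[name].append((obs - pred) ** 2)
    return {
        name: math.sqrt(sum(values) / len(values)) if values else math.nan
        for name, values in squared.items()
    }


# Hypothetical observations and predictions (m/s); thresholds are illustrative.
observed = [1.0, 2.0, 5.0, 6.0, 11.0, 12.0]
predicted = [2.0, 1.0, 5.5, 5.5, 11.0, 12.0]

result = regime_rmse(observed, predicted, low_max=3.0, high_min=10.0)
print(result)
```

One design note: metrics such as R² are computed against the variance within each regime, so a narrow regime (for example, near-calm speeds) can show a low R² even when its absolute errors are of similar size to those in other regimes.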

4.7. Bland–Altman and Normality Analysis

We looked deeper into how well the ELASTIC model predictions match the observations using Bland–Altman analysis and a Q-Q plot, shown in Figure 16 and Figure 17. In the Bland–Altman plot (Figure 16), the differences between predictions and observations are plotted against their average. The mean difference (bias) is +0.026 m/s, showing that the model has a very low bias. Most differences fall within the 95% limits of agreement (−1.213 to 1.265 m/s), confirming good agreement between predicted and observed values. Importantly, the differences do not trend upward or downward as wind speed increases. This indicates no proportional bias and no obvious heteroscedasticity in the Bland–Altman plot, although the ARCH-based diagnostics in Section 4.5 do indicate heteroscedastic residual variance. In short, the Bland–Altman analysis reinforces that the ELASTIC model is both accurate and consistent.
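The limits of agreement follow from the mean and standard deviation of the differences as bias ± 1.96·SD. The sketch below shows the calculation on hypothetical values and then back-calculates the bias and SD implied by the reported limits. The sign convention (prediction minus observation) follows the wording above and is otherwise an assumption.

```python
import statistics


def bland_altman(observed, predicted):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diffs = [pred - obs for obs, pred in zip(observed, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical observations and predictions (m/s).
bias, lower, upper = bland_altman([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0])
print(bias, lower, upper)

# Consistency check on the limits reported in the text (m/s).
reported_lower, reported_upper = -1.213, 1.265
implied_bias = (reported_lower + reported_upper) / 2
implied_sd = (reported_upper - reported_lower) / (2 * 1.96)
print(round(implied_bias, 3), round(implied_sd, 3))
```

The implied bias matches the reported +0.026 m/s, and the implied standard deviation of about 0.632 m/s is in line with the test RMSE of 0.633 m/s, as expected when the bias is close to zero.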

We also checked the normality of residuals in two ways. First, the Q-Q plot in Figure 17 shows that most residuals in the center follow the reference line, indicating the distribution is roughly normal around the mean. At the tails, there are minor deviations, which are expected because wind speeds can change suddenly. These deviations occur only in a few cases, so they do not significantly affect the overall distribution. Second, we performed the Shapiro–Wilk normality test. Table 8 shows a test statistic of 0.987 and a p-value of 0.084, which is above the 0.05 significance level. This means we cannot reject the null hypothesis of normality, which is consistent with approximately normally distributed residuals. In short, both visual and statistical analyses indicate that the residuals are mostly normal, supporting the reliability of the ELASTIC model (Table 10).

From the Bland–Altman and Q-Q plots, it is clear that the ELASTIC model shows very low bias, strong agreement between predictions and observations, and a residual distribution that is approximately normal in the central region. These results indicate that the model is statistically consistent and operationally reliable for short-term (10 min ahead) wind speed forecasting. In short, the ELASTIC model not only delivers accurate predictions but also maintains reliability and statistical consistency, making it a robust choice for practical forecasting applications.

4.8. Feature Importance and Sensitivity Analysis

We carried out a permutation feature importance analysis to understand which input variables most influence the ELASTIC model’s wind speed predictions. Basically, we shuffled each predictor one at a time and observed how much the prediction error increased; the bigger the increase, the more important the variable. This approach helps us interpret the model without changing its structure and also highlights the physical drivers behind wind speed variability. The results of this analysis, summarized in Table 11, show which environmental variables contribute the most to accurate forecasting, giving both practical and theoretical insights into the model’s behavior.
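The shuffling procedure described above can be sketched in a few lines. The model below is a hypothetical fixed linear function standing in for the fitted ELASTIC model, and the feature names and data are illustrative only.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, y_true, n_features, seed=0):
    """Increase in RMSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = rmse(y_true, [predict(row) for row in rows])
    increases = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
        increases.append(rmse(y_true, [predict(row) for row in shuffled]) - baseline)
    return increases


def toy_model(row):
    """Hypothetical stand-in model: the third feature is deliberately ignored."""
    previous_speed, temperature, _unused = row
    return 0.9 * previous_speed + 0.1 * temperature


# Hypothetical feature rows: [previous speed, temperature, unused feature].
rows = [[float(i), float((7 * i) % 10), float((3 * i) % 5)] for i in range(50)]
targets = [toy_model(row) for row in rows]

importance = permutation_importance(toy_model, rows, targets, n_features=3)
print(importance)
```

In this toy setup, shuffling the ignored feature leaves the error unchanged, while shuffling the lagged wind speed increases it. In practice, the shuffle is usually repeated several times per feature and averaged to reduce the randomness of a single permutation.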

Looking at the ELASTIC model’s predictor importance (Figure 18), we can see that previous wind speed clearly dominates, while other environmental variables play smaller but meaningful roles. This highlights that short-term wind speed forecasting relies heavily on temporal persistence, with atmospheric conditions providing additional fine-tuning.

From the table, it is clear that previous wind speed alone explains almost 40% of the model’s performance, confirming that short-term forecasts rely heavily on recent wind trends. Solar irradiation and air temperature together contribute over 35%, highlighting the role of environmental conditions in shaping wind dynamics. While humidity, pressure, and wind direction have smaller effects, they still provide meaningful adjustments that improve the prediction. This shows that the model effectively combines temporal continuity with physical drivers of wind, making it both accurate and interpretable. In practice, short-term predictions are mainly guided by recent wind patterns, while environmental variables help fine-tune the forecasts for subtle changes.

4.9. Statistical Model Comparison Using Diebold–Mariano Test

Beyond the usual performance metrics, we also compared the forecasting models statistically using the Diebold–Mariano (DM) test. This test checks whether the difference in predictive accuracy between two models is statistically meaningful. Essentially, it examines the loss differential between the errors of the models to see if the average difference is significantly different from zero. In this study, we used the squared error as the loss function, which aligns with the RMSE metric applied throughout the analysis.

Formally, for two models i and j, the loss differential at time t is defined as follows:

d_t = L(e_{i,t}) - L(e_{j,t})

(5)

where L(⋅) is the loss function, and e_{i,t} and e_{j,t} are the forecast errors of models i and j at time t, respectively.

The DM statistic is then calculated as follows:

DM = \frac{\bar{d}}{\sqrt{\operatorname{Var}(\bar{d})}}

(6)

This provides a quantitative measure to determine whether one model is significantly more accurate than another.
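Equations (5) and (6) can be sketched as follows for the squared-error loss. This is a simplified illustration that assumes a one-step-ahead forecast, so Var(d̄) is estimated as the sample variance of d_t divided by n with no autocovariance correction, and the p-value uses the standard normal approximation. The error series are hypothetical.

```python
import math


def diebold_mariano(errors_i, errors_j):
    """DM statistic and two-sided p-value for squared-error loss.

    Negative DM means model i has the smaller average loss.
    """
    d = [a * a - b * b for a, b in zip(errors_i, errors_j)]  # loss differential d_t
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((value - d_bar) ** 2 for value in d) / (n - 1)
    dm = d_bar / math.sqrt(var_d / n)  # assumes no autocorrelation in d_t
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # two-sided, standard normal
    return dm, p_value


# Hypothetical forecast errors (m/s) of an accurate and an inaccurate model.
errors_good = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1, 0.1, -0.2]
errors_poor = [1.0, -1.2, 0.9, -1.1, 1.3, -0.8, 1.0, -1.1]

dm, p_value = diebold_mariano(errors_good, errors_poor)
print(dm, p_value)
```

When the loss differential is autocorrelated, which is common for multi-step forecasts, the variance estimate is usually replaced by a long-run variance that includes autocovariance terms.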

Looking at Figure 19, the ELASTIC model demonstrates statistically significant improvements over SVR, RF, and partially over LSBOOST, while the differences relative to ENS and the persistence baseline remain statistically insignificant. These findings indicate that ELASTIC provides competitive and statistically consistent forecasting performance under ultra-short-term forecasting conditions, although its advantage over persistence-based approaches remains limited at the evaluated forecasting horizon.

The results of the Diebold–Mariano test are summarized in Table 12. Looking at the comparisons, we see that ELASTIC significantly outperforms SVR and RF (p < 0.01), confirming that these models are much less accurate for short-term wind speed forecasting. A marginally significant difference is also observed between ELASTIC and LSBOOST (p ≈ 0.048), indicating that ELASTIC still performs slightly better, but the advantage is smaller. ELASTIC shows no significant difference compared to the ENS model or the persistence baseline. This observation is also consistent with the similar RMSE values reported in Table 4. For very short-term forecasts, it performs similarly to ensemble and baseline models. This indicates that the ELASTIC model maintains statistically consistent forecasting behavior under ultra-short-term forecasting conditions. The DM test results support the use of ELASTIC as the primary model for the subsequent residual and regime-based analyses due to its balanced statistical performance across multiple evaluation criteria.

4.10. Forecast Error Decomposition Analysis

To better understand where the prediction errors come from, we performed a forecast error decomposition. Instead of just looking at the overall error size, we split the error into three parts: systematic bias, variance differences, and random fluctuations. This helps show whether errors come from model bias or from the natural randomness of wind. Using the MSE decomposition framework, the total error can be written as follows:

MSE = Bias² + Variance + Random Error

(7)

Here, the bias term captures systematic over or underprediction, the variance term measures how much the predicted variability differs from the observed, and the random error reflects unpredictable changes in wind speed. This approach allows us to see not just how big the errors are, but why they occur, which is useful for improving model design and understanding limitations in short-term wind forecasting.
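The exact formulas behind Equation (7) are not given here. One common decomposition that matches the three terms described above is MSE = (p̄ − ō)² + (s_p − s_o)² + 2·s_p·s_o·(1 − r), where s denotes the population standard deviation and r the Pearson correlation between predictions and observations. The sketch below assumes this form, and the sample values are hypothetical.

```python
import math


def mse_decomposition(observed, predicted):
    """Split MSE into bias, variance-difference, and random (covariance) parts."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    r = cov / (sd_o * sd_p)

    mse = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n
    bias_sq = (mean_p - mean_o) ** 2
    variance_diff = (sd_p - sd_o) ** 2
    random_part = 2.0 * sd_p * sd_o * (1.0 - r)
    return mse, bias_sq, variance_diff, random_part


# Hypothetical observations and predictions (m/s).
observed = [4.0, 6.0, 8.0, 5.0, 7.0]
predicted = [4.5, 5.5, 8.5, 5.0, 6.0]

mse, bias_sq, variance_diff, random_part = mse_decomposition(observed, predicted)
shares = [part / mse for part in (bias_sq, variance_diff, random_part)]
print(mse, shares)
```

Under this form, the three parts add up to the MSE exactly, so the shares can be reported as percentages of the total error.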

Figure 20 shows that most of the prediction error comes from random atmospheric variability. Systematic bias and variance differences play only a small role in the total error. This means the ELASTIC model is not systematically over- or underpredicting, and it captures the variability of wind well. Most remaining errors are due to natural, unpredictable fluctuations in the atmosphere, which are difficult to eliminate.

According to Table 13, the ELASTIC model shows minimal systematic bias, which accounts for only 1.8% of the total forecast error. The variance difference component also remains limited (7.6%), indicating that the predicted wind speed variability is generally consistent with the observed dynamics. The majority of the remaining forecast error (90.6%) is associated with random fluctuations, which reflects the inherently variable and stochastic nature of wind behavior. These results indicate that the proposed framework delivers accurate forecasts that remain statistically consistent throughout short-duration operational testing. Such stable short-term forecasts can support operational decision-making in wind energy scheduling, reserve planning, and short-term grid balancing.

5. Discussion

In this study, we assessed the short-term (Δt = 10 min) wind speed forecasting ability of the ELASTIC model using a multi-dimensional evaluation framework rather than relying solely on traditional metrics such as RMSE or MAE. Conventional error metrics are useful, but they do not fully capture the time-dependent behavior, error structure, or regime-specific performance of a model. The time-series analysis indicated that the ELASTIC model generally tracks fluctuating wind speed behavior, including relatively rapid variations, while remaining temporally consistent with the observed measurements.

Residual analysis supports this interpretation. Errors were concentrated around zero, varied within a narrow range, and showed no strong systematic trend. The graphical analyses were complemented by formal tests (Table 8): the Jarque–Bera test for normality, the Augmented Dickey–Fuller test for stationarity, an ARCH test for heteroscedasticity, and the Ljung–Box test for autocorrelation. These tests indicated stationary residuals, mild deviations from strict normality, heteroscedastic residual variance, and weak but statistically significant temporal dependence; the autocorrelation function showed that the amplitude of this dependence is small. Taken together, the diagnostics suggest that the low RMSE and MAE values reflect a well-behaved error structure rather than a numerical artifact, although the residuals are not strictly white noise. Bland–Altman and Q-Q plots also indicated minimal bias (magnitude ≈ 0.026 m/s) and an approximately normal error distribution, with minor deviations at extreme values. This is valuable for operational use because it implies that large errors are unlikely, in contrast to some high-accuracy models that are sensitive to extreme wind conditions [47,48].

Regime-based evaluation revealed that forecasting accuracy depends strongly on wind speed (Table 9). At low wind speeds, the model shows high errors (RMSE = 0.924 m/s, MAPE ≈ 82%), likely due to measurement uncertainty, a low signal-to-noise ratio, and turbulence, consistent with previous studies highlighting the chaotic nature of near-calm conditions [49,50]. The model performs much better in the mid- and high-wind-speed regimes: RMSE = 0.506 m/s and R² = 0.860 for the mid regime, and RMSE = 0.632 m/s and R² = 0.928 for the high regime. This matters most at the high wind speeds relevant to wind energy generation. Comparison with the PERSIST model shows that a sizable proportion of the predictive power is due to temporal persistence, while the ELASTIC model adds a modest further gain in the mid and high regimes (Skill RMSE = 0.074 and 0.025, respectively) and falls below persistence in the low regime (Skill RMSE = −0.140). Persistence models serve as the basic reference in ultra-short-term wind forecasting because wind speed series exhibit very pronounced temporal autocorrelation.
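
The regime-based comparison against persistence can be illustrated with a minimal sketch. It assumes the regime edges of 3 and 8 m/s listed in Table 3 and the Skill RMSE definition given there; the function names, the handling of values lying exactly on an edge, and the toy numbers are illustrative assumptions and are not taken from the study.

```python
import math

def rmse(obs, pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def regime_of(speed, low_edge=3.0, high_edge=8.0):
    """Label an observed wind speed using the edges from Table 3.
    Assumption: a value exactly on an edge is assigned to the upper regime."""
    if speed < low_edge:
        return "Low"
    if speed < high_edge:
        return "Mid"
    return "High"

def regime_report(obs, model_pred, persist_pred):
    """Per-regime RMSE of model and persistence, plus Skill RMSE."""
    groups = {}
    for o, m, p in zip(obs, model_pred, persist_pred):
        groups.setdefault(regime_of(o), []).append((o, m, p))
    report = {}
    for name, rows in groups.items():
        o = [r[0] for r in rows]
        rmse_model = rmse(o, [r[1] for r in rows])
        rmse_persist = rmse(o, [r[2] for r in rows])
        # Skill RMSE = 1 - RMSE_model / RMSE_persistence (Table 3)
        skill = 1.0 - rmse_model / rmse_persist if rmse_persist > 0 else float("nan")
        report[name] = {"n": len(rows), "rmse_model": rmse_model,
                        "rmse_persist": rmse_persist, "skill": skill}
    return report

# Toy values for illustration only (not data from the study).
obs = [1.0, 2.0, 4.0, 6.0, 9.0, 12.0]
model = [1.5, 2.5, 4.2, 5.9, 9.1, 11.8]
persist = [2.0, 1.0, 3.0, 6.5, 9.5, 11.0]
report = regime_report(obs, model, persist)
for name in ("Low", "Mid", "High"):
    print(name, report[name])
```

Regimes are assigned from the observed value rather than the forecast, so each regime contains the same samples for every model being compared.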

The 10 min forecasting horizon was chosen for its relevance to short-term decision-making in wind energy, such as fast scheduling adjustments, reserve revisions, and near-real-time balancing. At this horizon, temporal persistence naturally forms the major component of the forecast, which is why the persistence benchmark is particularly hard to beat in the present case study. At longer horizons, the effect of temporal autocorrelation is likely to diminish, so the relative value of nonlinear and multivariate methods would be expected to increase. Only the 10 min horizon is considered here; the leakage-free methodology could be extended to longer horizons in future work.

Therefore, even relatively small gains beyond persistence can have practical significance, provided that they are accompanied by better statistical consistency, lower systematic error, greater operational stability, and diagnostically adequate residuals.

The primary value of the proposed framework lies not only in numerical RMSE reduction but also in forecasting behavior that is statistically consistent and interpretable under realistic forecasting conditions. The permutation feature importance results (Table 11) suggest that the model does more than replicate past wind speed: environmental variables also contribute to the forecast. These performance characteristics also shed light on the model complexity appropriate for short-term wind forecasting. Because the prediction horizon is 10 min, the forecasting process is dominated by temporal persistence and stable autoregressive behavior, so much of the predictive power can be captured by regularized linear relations over lagged features and environmental covariates. Although recent studies demonstrate the capability of deep learning architectures such as LSTM-, GRU-, and Transformer-based models on complex forecasting tasks, these approaches typically require more computational resources, larger amounts of training data, and substantial hyperparameter optimization. By contrast, the ELASTIC model gave stable and consistent results in this study without computationally intensive optimization and without leakage during chronological validation. No computational complexity benchmarks or real-time latency analyses were performed here, so its advantages in computational cost, training robustness, and limited hyperparameter tuning over more complex architectures are expected rather than measured. These characteristics may be advantageous where models must be updated frequently for operational forecasting. The forecasts are stable, consistent, and physically interpretable, and the error-structure analysis (residual distribution, autocorrelation, and Bland–Altman) shows low bias and a nearly normal distribution, with only minor deviations attributable to wind dynamics.
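
Part of the error-structure analysis can be sketched with standard formulas. The example below computes residual moments, the Jarque–Bera statistic with its asymptotic chi-square(2) p-value, the sample autocorrelation function, the Ljung–Box Q statistic, and Bland–Altman limits of agreement. It is an illustration on synthetic residuals, not the authors' implementation; the sign convention (prediction minus observation) and all function names are assumptions, and the Augmented Dickey–Fuller and ARCH tests are not reproduced.

```python
import math
import random

def moments(res):
    """Mean, population std, skewness, and (non-excess) kurtosis."""
    n = len(res)
    mean = sum(res) / n
    m2 = sum((r - mean) ** 2 for r in res) / n
    m3 = sum((r - mean) ** 3 for r in res) / n
    m4 = sum((r - mean) ** 4 for r in res) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2

def jarque_bera(res):
    """JB statistic and asymptotic p-value (chi-square with 2 df)."""
    n = len(res)
    _, _, skew, kurt = moments(res)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)  # survival function of chi-square(2)

def acf(res, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    n = len(res)
    mean = sum(res) / n
    c0 = sum((r - mean) ** 2 for r in res)
    return [sum((res[t] - mean) * (res[t - k] - mean) for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def ljung_box(res, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(res)
    rho = acf(res, max_lag)
    return n * (n + 2) * sum(r ** 2 / (n - k) for k, r in enumerate(rho, start=1))

def bland_altman(obs, pred):
    """Mean difference (pred - obs) and 95% limits of agreement."""
    diff = [p - o for o, p in zip(obs, pred)]
    n = len(diff)
    bias = sum(diff) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diff) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Synthetic residuals for illustration only (not the study's residuals).
rng = random.Random(0)
white = [rng.gauss(0.0, 0.6) for _ in range(500)]   # uncorrelated errors
ar1 = [0.0]
for _ in range(499):                                  # strongly autocorrelated errors
    ar1.append(0.8 * ar1[-1] + rng.gauss(0.0, 0.6))

q_white = ljung_box(white, 10)
q_ar1 = ljung_box(ar1, 10)
jb_white, p_white = jarque_bera(white)
print("Ljung-Box Q (white, AR1):", q_white, q_ar1)
print("Jarque-Bera (white):", jb_white, p_white)
```

With large test sets, even a small autocorrelation produces a significant Ljung–Box statistic, which is why the magnitude of the autocorrelation function is worth reporting alongside the p-value.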

Our results demonstrate that proper model validation requires error-structure analysis and regime-dependent evaluation. The ELASTIC model performs well at medium and high wind speeds, but its low-wind-speed performance needs improvement, for example through regime-specific calibration, hybrid modeling, or multi-step forecasting methods. Within the medium- and high-wind-speed range, the proposed framework is a candidate for operational deployment.

The present study focuses on deterministic point forecasting under a leakage-free evaluation framework. Future research should address three aspects: formal propagation of measurement uncertainty from sensor data to machine learning predictions, probabilistic uncertainty quantification, and forecasting methods that incorporate calibration knowledge. Combining prediction intervals, uncertainty-aware ensemble methods, and probabilistic forecasting frameworks could improve the operational reliability of short-term wind forecasting systems.

Our multi-faceted evaluation shows that a holistic assessment characterizes wind forecasting performance better than a single global metric, in line with previous research [47,48,49,50,51,52]. The ELASTIC framework demonstrates statistically consistent short-term wind forecasting behavior under the evaluated conditions.

6. Conclusions

This paper has presented an information-leakage-free machine learning framework for short-term (10 min ahead) wind speed forecasting using operational data from a wind farm in the Bandırma/Balıkesir region of Türkiye. The SVR, RF, LSBOOST, and ELASTIC models and a stacking ensemble (ENS) were compared against a persistence baseline under leakage-free chronological validation.

Among the considered models, the ELASTIC model demonstrated statistically consistent and competitive forecasting performance on the test set, with RMSE ≈ 0.63 m/s, MAE ≈ 0.40 m/s, and R² = 0.977. Residual diagnostics, Bland–Altman and Q-Q plots, and regime-based assessments indicate low systematic bias and statistically consistent forecasting behavior, especially in the medium- and high-wind-speed regimes. Forecast error decomposition further suggests that random error, attributed to natural atmospheric variability, is the main source of forecast error, while systematic bias plays only a minor role. The results also indicate that ultra-short-term wind forecasting is dominated by temporal persistence because of the very strong autocorrelation in the time series. Under such circumstances, regularized linear models with lagged features and environmental predictors are a reasonable choice and may deliver performance comparable to deep learning-based architectures at lower computational cost, although this comparison was not tested here.

The proposed framework has limitations. This work addresses only deterministic point forecasting at a 10 min ahead horizon, without uncertainty quantification, uncertainty propagation, or real-time computational analysis. Forecasting accuracy is also markedly lower at low wind speeds, probably owing to increased turbulence and signal noise. Potential research directions include longer forecasting horizons, regime-specific calibration, hybrid architectures, and more comprehensive leakage-free comparisons with advanced deep learning and Transformer-based architectures. Real-time forecasting performance and operation would also be a useful area for future exploration.

To summarize, the proposed information-leakage-free ELASTIC framework demonstrates statistically consistent and interpretable short-term wind forecasting behavior under the evaluated operational conditions.

Author Contributions

Conceptualization, G.Ş.; data curation, G.Ş.; software, F.K.; methodology, F.K.; writing—original draft preparation, A.N.; writing—review and editing, E.A. All authors have read and agreed to the published version of the manuscript.

Funding

The work in this paper was conducted without funding sources.

Institutional Review Board Statement

This article does not require ethical approval or permission for participation.

Informed Consent Statement

The authors conducted this research independently, and no external release or consent was required for publication.

Data Availability Statement

The data will be made available on request due to restrictions (privacy, legal or ethical reasons).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, J.-Z.; Wang, Y.; Jiang, P. The study and application of a novel hybrid forecasting model-a case study of wind speed forecasting in China. Appl. Energy 2015, 143, 472–488.
2. International Energy Agency (IEA). World Energy Outlook; IEA Publications: Paris, France, 2023.
3. GWEC. Global Wind Report; Global Wind Energy Council: Brussels, Belgium, 2023.
4. Kumar, Y.; Ringenberg, J.; Depuru, S.S.; Devabhaktuni, V.K.; Lee, J.W.; Nikolaidis, E.; Andersen, B.; Afjeh, A. Wind energy: Trends and enabling technologies. Renew. Sustain. Energy Rev. 2016, 53, 209–224.
5. McKenna, R.; v.d. Leye, P.O.; Fichtner, W. Key challenges and prospects for large wind turbines. Renew. Sustain. Energy Rev. 2016, 53, 1212–1221.
6. Duan, J.; Zuo, H.; Bai, Y.; Duan, J.; Chang, M.; Chen, B. Short-term wind speed forecasting using recurrent neural networks with error correction. Energy 2021, 217, 119397.
7. Zheng, J.; Wang, J. Short-term wind speed forecasting based on recurrent neural networks and levy crystal structure algorithm. Energy 2024, 293, 130580.
8. Ren, G.; Liu, J.; Wan, J.; Guo, Y.; Yu, D. Overview of wind power intermittency: Impacts, measurements, and mitigation solutions. Appl. Energy 2017, 204, 47–65.
9. Mollick, T.; Hashmi, G.; Sabuj, S.R. Wind speed prediction for site selection and reliable operation of wind power plants in coastal regions using machine learning algorithm variants. Sustain. Energy Res. 2024, 11, 5.
10. Simankov, V.; Buchatskiy, P.; Teploukhov, S.; Onishchenko, S.; Kazak, A.; Chetyrbok, P. Review of estimating and predicting models of the wind energy amount. Energies 2023, 16, 5926.
11. Ren, Y.; Suganthan, P.N.; Srikanth, N. Ensemble methods for wind and solar power forecasting-a state-of-the-art review. Renew. Sustain. Energy Rev. 2015, 50, 82–91.
12. Ponkumar, G.; Jayaprakash, S.; Kanagarathinam, K. Advanced machine learning techniques for accurate very-short-term wind power forecasting in wind energy systems using historical data analysis. Energies 2023, 16, 5459.
13. Suarez-Cetrulo, A.L.; Burnham-King, L.; Haughton, D.; Carbajo, R.S. Wind power forecasting using ensemble learning for day-ahead energy trading. Renew. Energy 2022, 191, 685–698.
14. Ribeiro, M.H.M.; da Silva, R.G.; Moreno, S.R.; Mariani, V.C.; Coelho, L.S. Efficient bootstrap stacking ensemble learning model applied to wind power generation forecasting. Int. J. Electr. Power Energy Syst. 2022, 136, 107712.
15. Dong, Y.; Zhang, H.; Wang, C.; Zhou, X. Wind power forecasting based on stacking ensemble model, decomposition and intelligent optimization algorithm. Neurocomputing 2021, 462, 169–184.
16. Bommidi, B.S.; Teeparthi, K.; Kosana, V. Hybrid wind speed forecasting using ICEEMDAN and transformer model with novel loss function. Energy 2023, 265, 126383.
17. Yan, Y.; Wang, X.; Ren, F.; Shao, Z.; Tian, C. Wind speed prediction using a hybrid model of EEMD and LSTM considering seasonal features. Energy Rep. 2022, 8, 8965–8980.
18. Karijadi, I.; Chou, S.-Y.; Dewabharata, A. Wind power forecasting based on hybrid CEEMDAN-EWT deep learning method. Renew. Energy 2023, 218, 119357.
19. Chen, C.; Li, S.; Wen, M.; Yu, Z. Ultra-short term wind power prediction based on quadratic variational mode decomposition and multi-model fusion of deep learning. Comput. Electr. Eng. 2024, 116, 109157.
20. Tian, Z.; Wang, J. A wind speed prediction system based on new data preprocessing strategy and improved multi-objective optimizer. Renew. Energy 2023, 215, 118932.
21. Wu, B.; Wang, L. Two-stage decomposition and temporal fusion transformers for interpretable wind speed forecasting. Energy 2024, 288, 129728.
22. Zeng, H.; Wu, B.; Fang, H.; Lin, J. Interpretable wind speed forecasting through two-stage decomposition with comprehensive relative importance analysis. Appl. Energy 2025, 392, 126015.
23. Ma, Z.; Mei, G. A hybrid attention-based deep learning approach for wind power prediction. Appl. Energy 2022, 323, 119608.
24. Shahid, F.; Zameer, A.; Iqbal, M.J. Intelligent forecast engine for short-term wind speed prediction based on stacked long short-term memory. Neural Comput. Appl. 2021, 33, 13767–13783.
25. Shen, Z.; Fan, X.; Zhang, L.; Yu, H. Wind speed prediction of unmanned sailboat based on CNN and LSTM hybrid neural network. Ocean Eng. 2022, 254, 111352.
26. Li, C.; Tang, G.; Xue, X.; Saeed, A.; Hu, X. Short-term wind speed interval prediction based on ensemble GRU model. IEEE Trans. Sustain. Energy 2020, 11, 1370–1380.
27. Liang, T.; Zhao, Q.; Lv, Q.; Sun, H. A novel wind speed prediction strategy based on Bi-LSTM, MOOFADA and transfer learning for centralized control centers. Energy 2021, 230, 120904.
28. Huang, S.; Yan, C.; Qu, Y. Deep learning model-transformer based wind power forecasting approach. Front. Energy Res. 2023, 10, 1055683.
29. Niu, Z.; Han, X.; Zhang, D.; Wu, Y.; Lan, S. Interpretable wind power forecasting combining seasonal-trend representations learning with temporal fusion transformers architecture. Energy 2024, 306, 132482.
30. Dai, X.; Liu, G.P.; Hu, W. An online-learning-enabled self-attention-based model for ultra-short-term wind power forecasting. Energy 2023, 272, 127173.
31. Liu, J.; Zang, H.; Ding, T.; Cheng, L.; Wei, Z.; Sun, G. A principle-constrained wind field image generation framework for short-term wind power forecasting. IEEE Trans. Power Syst. 2025, 40, 1790–1801.
32. Zhao, J.; Guo, Y.; Lin, Y.; Zhao, Z.; Guo, Z. A novel dynamic ensemble of numerical weather prediction for multi-step wind speed forecasting with deep reinforcement learning and error sequence modeling. Energy 2024, 302, 131787.
33. Zeng, Z.; Wu, H.; Liu, Z.; Zhao, L.; Liang, Z.; Liang, Z.; Wang, Y. Enhancing short-term wind speed prediction capability of numerical weather prediction through machine learning methods. J. Geophys. Res. Atmos. 2024, 129, e2024JD041822.
34. Zhang, L.; Dong, Y.; Wang, J. Wind speed forecasting using a two-stage forecasting system with an error correcting and nonlinear ensemble strategy. IEEE Access 2019, 7, 176000–176023.
35. Li, Y.; Wang, R.; Li, Y.; Zhang, M.; Long, C. Wind power forecasting considering data privacy protection: A federated deep reinforcement learning approach. Appl. Energy 2023, 329, 120291.
36. Ozkan, M.B.; Karagoz, P. A novel wind power forecast model: Statistical hybrid wind power forecast technique (SHWIP). IEEE Trans. Ind. Inform. 2015, 11, 375–387.
37. Wang, Y.; Zhao, X.; Li, Z.; Zhu, W.; Gui, R. A novel hybrid model for multi-step-ahead forecasting of wind speed based on univariate data feature enhancement. Energy 2024, 312, 133515.
38. Liao, W.; Fang, J.; Jensen, B.B.; Ruan, G.; Yang, Z.; Agel, F.P. Explainable modeling for wind power forecasting: A Glass-Box model with high accuracy. Int. J. Electr. Power Energy Syst. 2025, 167, 110643.
39. Helbig, N.; Mott, R.; van Herwijnen, A.; Winstral, A.; Jonas, T. Parameterizing surface wind speed over complex topography. J. Geophys. Res. Atmos. 2017, 122, 651–667.
40. Miao, H.; Dong, D.; Huang, G.; Hu, K.; Tian, Q.; Gong, Y. Evaluation of northern hemisphere surface wind speed and wind power density in multiple reanalysis datasets. Energy 2020, 200, 117382.
41. Zhu, N.; Wang, Y.; Yuan, K.; Lv, J.; Su, B.; Zhang, K. Peak interval-focused wind power forecast with dynamic ramp considerations. Int. J. Electr. Power Energy Syst. 2024, 163, 110340.
42. Wan, C.; Qian, W.; Zhao, C.; Song, Y.; Yang, G. Probabilistic forecasting based sizing and control of hybrid energy storage for wind power smoothing. IEEE Trans. Sustain. Energy 2021, 12, 1841–1852.
43. Rezaie, H.; Chung, C.Y.; Khorramdel, B. Wind power prediction interval based on predictive density estimation within a new hybrid structure. IEEE Trans. Ind. Inform. 2022, 18, 8563–8575.
44. Zuege, C.V.; Stefenon, S.F.; Yamaguchi, C.K.; Mariani, V.C.; Gonzalez, G.V.; Coelho, L.S. Wind speed forecasting approach using conformal prediction and feature importance selection. Int. J. Electr. Power Energy Syst. 2025, 168, 110700.
45. Xu, Y.; Wan, C.; Yang, G.; Ju, P. Nonparametric stochastic differential equations for ultra-short-term probabilistic forecasting of wind power generation. IEEE Trans. Power Syst. 2025, 40, 2179–2191.
46. Meng, Z.; Guo, Y. Probabilistic wind power forecasting with limited data based on efficient parameter updating rules. IEEE Trans. Power Syst. 2025, 40, 1596–1608.
47. Ma, L.; Luan, S.; Jiang, C.; Liu, H.; Zhang, Y. A review on the forecasting of wind speed and generated power. Renew. Sustain. Energy Rev. 2009, 13, 915–920.
48. Pinson, P. Wind energy: Forecasting challenges for its operational management. Stat. Sci. 2013, 28, 564–585.
49. Wharton, S.; Lundquist, J. Atmospheric stability impacts on wind turbine power curves. Wind Energy 2012, 15, 525–546.
50. Hong, T.; Pinson, P.; Fan, S. Global energy forecasting competition 2012. Int. J. Forecast. 2014, 30, 357–363.
51. Foley, A.M.; Leahy, P.G.; Marvuglia, A.; McKeogh, E.J. Current methods and advances in forecasting of wind power generation. Renew. Energy 2012, 37, 1–8.
52. Costa, A.; Crespo, A.; Navarro, J.; Lizcano, G.; Madsen, H.; Feitosa, E. A review on the young history of the wind power short-term prediction. Renew. Sustain. Energy Rev. 2008, 12, 1725–1744.

Figure 1. Damla 4 and Damla 5 unlicensed wind power plant locations and Renevo-40 wind measurement station location (Bandırma, Balıkesir, Türkiye).

Figure 2. Wind speed forecasting pipeline.

Figure 3. Radar chart comparing the performance of forecasting models across multiple evaluation metrics including RMSE, MAE, R², nRMSE, MAPE, and sMAPE.

Figure 4. Level-dependent boxplots of wind speed distributions.

Figure 5. Level-dependent boxplots of turbulence intensity (TI) distributions.

Figure 6. Correlation matrix of core variables.

Figure 7. Diurnal and seasonal variations in wind speed: heatmap generated using month–hour binning.

Figure 8. Hourly scatter distribution of wind speed observations at the raw data level.

Figure 9. Annual correlation structure of core variables (2021 and 2022).

Figure 10. Comparison of observed and ELASTIC model predicted wind speed time series in the test set (target: v1 Avg, prediction horizon: 1-step/10 min).

Figure 11. Scatter plot of observed vs. predicted wind speed in the test set.

Figure 12. Histogram of residuals for the ELASTIC model (test set).

Figure 13. Residuals vs. predictions scatter plot for the ELASTIC model.

Figure 14. Residual autocorrelation function (ACF) of the ELASTIC model (test set).

Figure 15. Residual distribution of the ELASTIC model across wind speed regimes: boxplots of residuals for Low, Mid, and High wind speed regimes.

Figure 16. Bland–Altman plot for ELASTIC model (test set).

Figure 17. Q-Q (quantile–quantile) plot of the ELASTIC model residuals, used to assess how closely the error distribution conforms to a normal distribution.

Figure 18. Permutation feature importance of predictor variables used in the ELASTIC forecasting model.

Figure 19. Diebold–Mariano test statistics comparing ELASTIC with other forecasting models.

Figure 20. Decomposition of forecast error components for the ELASTIC model.

Table 1. Turbine coordinates of the Damla 4 and Damla 5 project sites (UTM–WGS84, Zone 35T).

Field	Easting	Northing	Elevation
DAMLA 4	589871	4466974	155
DAMLA 5	590101	4466775	150

Table 2. Variable groups, definitions, and measurement units in the dataset recorded in TOA5 format.

Variable Group	Variables	Description	Unit
Time & Record	TIMESTAMP, RECORD	Timestamp and logger record number	–
Wind Speed (Levels 1–3)	v1 Avg, v1 Std, v1 Min, v1 Max; v2 Avg, v2 Std, v2 Min, v2 Max; v3 Avg, v3 Std, v3 Min, v3 Max	10 min wind speed statistics over three measurement channels	m/s
Turbulence Intensity (derived)	TI v1, TI v2, TI v3	Turbulence intensity calculated as standard deviation/mean wind speed	–
Wind Power Density	WPD v1, WPD v2, WPD v3	Wind power density at measurement levels	W/m²
Wind Direction/Vector Components	d1 c WVc(1), d1 c WVc(2); d2 c WVc(1), d2 c WVc(2)	Vector components of wind direction	°
Meteorological Variables	t1 Avg, p1 Avg, roh Avg, h1 Avg	Temperature, pressure, air density, and relative humidity	°C, hPa, kg/m³, %
System/Logger Variables	u batt avg, ptemp c avg	Battery voltage and panel/sensor temperature	V, °C
Reference Extreme Columns	d1 max ref v1 Max; d2 max ref v3 Max	Maximum wind speed indicators relative to reference directions	m/s

Table 3. Key settings and configuration parameters of the modeling pipeline for wind speed prediction.

Component	Setting
Dataset & format	Excel file: (TOA5/multi-row header). Variable names read from row 2; data start at row 5.
Target variable	v1 Avg
Target unit	m/s
Forecast horizon	1 step ahead (10 min) [horizonStep = 1; sampleMin = 10]
Chronological split	Train/validation/test split on time order (no shuffling). Fractions: valFrac = 0.15, testFrac = 0.15 (remainder train).
Random seed	rng(42)
Optional toggles	runGPR = false; runDiag = true (ACF + Ljung–Box); doStacking = true; ensembleUseSet = ‘val’ (weights fit on VAL, applied on TEST).
Quality control (QC) rules	Negative wind speed values for v1–v3 (Avg/Std/Min/Max) set to NaN. Battery voltage: U Bat Avg < 0 or >30 → NaN. Pressure: p1 Avg < 800 or >1100 → NaN. Stuck-sensor check: d1 Max ref v1 Max and d2 Max ref v3 Max set to NaN when the trailing movstd (window [11 0]) = 0.
Supervised learning setup	Predict y(t + h) from features at time t. Inputs X taken from rows 1:(N − h); target y from rows (1 + h):N.
Missing data handling (leakage-free)	Split-aware causal imputation: Train split filled using causal forward-fill based only on previous observations. Validation/Test splits used carryover of the last valid value from the preceding split to prevent future information leakage.
Time features	Hour of day and day of year (doy) computed from TIMESTAMP; cyclic encoding with sin/cos: hour sin, hour cos, doy sin, doy cos.
Derived features	v mean = mean(v1 Avg, v2 Avg, v3 Avg); v std3 = std(v1 Avg, v2 Avg, v3 Avg). TI1 = v1 Std/max(v1 Avg, 0.1).
Lag features	Target lags added for v1 Avg: lag1 … lag36 (maxLag = 36). First maxLag rows dropped after lag construction to avoid undefined lags.
Standardization	Z-score standardization fitted on train only (mu = mean(X train), sig = std(X train); zeros replaced by 1). Applied to Val/Test using train parameters.
Models evaluated	Persistence baseline; SVR (RBF/Gaussian); LSBoost; Random Forest (Bagging); Elastic Net (lasso); Stacking ensemble (simplex-constrained weights).
SVR configuration	fitrsvm with Gaussian kernel; KernelScale = ‘auto’; Standardize = false (inputs already standardized).
LSBoost configuration	templateTree(MaxNumSplits = 25); NumLearningCycles = 200; LearnRate = 0.06.
Random Forest (Bagging) configuration	fitrensemble(Method = ‘Bag’); NumLearningCycles = 200; Learner templateTree(MaxNumSplits = 25).
Elastic Net configuration	Alpha = 0.5; Lambda grid = logspace(−6, −1, 30). Lambda selected by minimizing validation RMSE.
Stacking ensemble configuration	Weights are constrained to w ≥ 0 and ∑w = 1 (simplex). Weight estimation is performed by minimizing MSE with solver order: quadprog → fmincon → lsqnonneg (normalized fallback). Fair VAL evaluation: weights are fitted on TRAIN prediction matrix and evaluated on VAL (w tr). Deployment setting (this study): ensembleUseSet = ‘val’, i.e., weights are refit on VAL (w val) and applied to TEST for the final ENS test report.
Model selection criterion	Best model chosen by minimum validation RMSE (bestName).
Regime definition (test evaluation)	Three regimes by observed wind speed (y test): Low < 3 m/s; Mid 3–8 m/s; High > 8 m/s (edges = [−∞, 3, 8, ∞]).
Metrics reported	RMSE, MAE, R², nRMSE, MBE, PBIAS, MAPE, sMAPE, EVS, Pearson r, Skill RMSE. Skill RMSE = 1 − RMSE model/RMSE persistence.
Outputs	Figures exported as PNG. Results exported to Excel with separate sheets (VAL Results Flat, TEST Results Flat, RegimePerf Flat, etc.).
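
The leakage-free steps listed in Table 3 can be sketched in a few lines. The example below illustrates a chronological split, split-aware causal forward-fill with carryover, train-only z-score standardization, cyclic time encoding, and lagged supervised rows. It is an assumption-laden illustration in Python, not the authors' pipeline (which, judging from the function names in Table 3, appears to be written in MATLAB); every function name, the order of the steps, and the toy series are hypothetical.

```python
import math

def chronological_split(rows, val_frac=0.15, test_frac=0.15):
    """Split in time order without shuffling; the remainder is the training set."""
    n = len(rows)
    n_val = int(round(n * val_frac))
    n_test = int(round(n * test_frac))
    n_train = n - n_val - n_test
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def causal_ffill(values, carry=None):
    """Fill None gaps with the most recent earlier value only.
    `carry` seeds the fill with the last valid value of the preceding split."""
    filled, last = [], carry
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled, last

def fit_zscore(train):
    """Mean and sample std from the training split only; zero std replaced by 1."""
    mu = sum(train) / len(train)
    sig = math.sqrt(sum((x - mu) ** 2 for x in train) / (len(train) - 1))
    return mu, (sig if sig > 0 else 1.0)

def cyclic_encode(value, period):
    """Sine/cosine encoding of a periodic quantity such as hour of day."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def make_supervised(series, max_lag=36, horizon=1):
    """Rows X = [y(t), y(t-1), ..., y(t-max_lag)] with target y(t+horizon).
    The first max_lag samples are skipped because their lags are undefined."""
    X, y = [], []
    for t in range(max_lag, len(series) - horizon):
        X.append([series[t - k] for k in range(max_lag + 1)])
        y.append(series[t + horizon])
    return X, y

# Toy series with two gaps (illustrative values, not measurements).
series = [float(i) for i in range(20)]
series[5] = None
series[14] = None  # first sample of the validation split

train, val, test = chronological_split(series)
train_f, last = causal_ffill(train)
val_f, last = causal_ffill(val, carry=last)    # carryover from the train split
test_f, _ = causal_ffill(test, carry=last)     # carryover from the validation split

mu, sig = fit_zscore(train_f)                  # statistics from train only
train_z = [(v - mu) / sig for v in train_f]
val_z = [(v - mu) / sig for v in val_f]        # train parameters reused

X_train, y_train = make_supervised(train_z, max_lag=3, horizon=1)
print(len(X_train), X_train[0], y_train[0])
```

The point of the carryover argument is that a gap at the start of the validation or test split is filled from the past, never from later samples in the same split.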

Table 4. Test set performance comparison of models evaluated for one-step-ahead (10 min) wind speed prediction (metrics include RMSE, MAE, R², nRMSE, MBE, PBIAS, MAPE, sMAPE, EVS, r, and Skill RMSE).

Model	PERSIST	SVR	LSBOOST	RF	ELASTIC	ENS
RMSE	0.634211978	1.417274236	0.714820271	0.8804296	0.632508	0.6363948
MAE	0.392154502	0.695322113	0.426784094	0.499174	0.3985375	0.3998349
R2	0.977257373	0.886425839	0.971108807	0.9561711	0.9773794	0.9771006
nRMSE	0.030846886	0.068933572	0.034767523	0.0428224	0.030764	0.0309531
MBE	0.00071451	−0.034600704	0.028123528	−0.1024847	−0.0261329	−0.0209566
PBIAS	0.008865278	−0.429308203	0.348942652	−1.2715794	−0.3242434	−0.2600184
MAPE	9.74494451	99.07438964	22.59784112	52.704832	14.51626	15.507619
sMAPE	9.230113007	13.69048512	11.56336618	12.113003	11.342864	11.361847
EVS	0.977257402	0.886493531	0.971153528	0.9567649	0.977418	0.9771254
r	0.988628144	0.941861782	0.985653752	0.9786862	0.9886595	0.9885263
Skill RMSE	3.33067 × 10⁻¹⁶	−1.234701148	−0.127099922	−0.388226	0.0026867	−0.0034417
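
As a quick arithmetic check, the Skill RMSE row of Table 4 can be reproduced from the RMSE row using the definition in Table 3, Skill RMSE = 1 − RMSE model/RMSE persistence. The snippet below only re-derives values already reported in the table; the variable names are illustrative.

```python
# Test-set RMSE values copied from Table 4 (m/s).
rmse = {
    "PERSIST": 0.634211978,
    "SVR": 1.417274236,
    "LSBOOST": 0.714820271,
    "RF": 0.8804296,
    "ELASTIC": 0.632508,
    "ENS": 0.6363948,
}

# Skill RMSE = 1 - RMSE_model / RMSE_persistence (definition from Table 3).
skill = {name: 1.0 - value / rmse["PERSIST"] for name, value in rmse.items()}
positive = [name for name, s in skill.items() if s > 0]

for name, s in skill.items():
    print(f"{name:8s} {s:+.7f}")
print("Models beating persistence:", positive)
```

Only ELASTIC has a positive skill, corresponding to an RMSE about 0.27% below that of persistence.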

Table 5. Variance Inflation Factor (VIF) analysis of predictor variables.

Variable	VIF
Solar Irradiation	2.31
Module Temperature	3.42
Wind Speed	1.87
Air Temperature	3.76
Pressure	2.15
Humidity	2.94

Table 6. Descriptive statistics for variables included in the dataset (Mean, Std, Median, IQR, Min, Max).

Variable	Mean	Std	Median	IQR	Min	Max
RECORD	34,099.66289	19,660.43956	34,145.5	34,003	0	68,148
TI v1	0.123931983	0.777856517	0.049231464	0.044029697	0	25
TI v2	0.094917394	0.582304704	0.048280976	0.042441058	0	26
TI v3	0.097406487	0.409962257	0.064792553	0.053014836	0	23
WPD v1	705.0527376	1326.433294	226.4690146	774.7846661	0	19,808.0145
WPD v2	653.3025997	1186.680423	210.1475391	755.6317854	0	15,937.01024
WPD v3	718.335269	1343.089885	235.4572914	774.1047491	0	18,815.39643
d1 c wvc 1	131.3066992	103.4979804	134.5	146.29	0	360
d1 c wvc 2	5.462744187	7.690445047	3.351	3.204	0	79.94
d1 max ref v1 Max	125.3396928	100.1929806	128.3	143.43	−23.52	359.6
d2 c wvc 1	104.0452648	105.439577	45.92	146.85	0	360
d2 c wvc 2	6.689402009	7.593520309	4.697	4.742	0	80
d2 max ref v3 Max	116.3866265	110.5180923	56.24	153.99	−43.36	359.6
h1 Avg	87.078623	22.87566557	99.2	17.4	0	100
p1 Avg	993.6587699	5.334357059	993.72035	6.86825	968.5566	1015.197
ptemp c avg	17.32550843	7.412961083	17.67	11.585	−4.658	37.03
roh Avg	1.17901785	0.120027046	1.192	0.046	0.848	1.538
t1 Avg	23.92216232	35.49149051	16.92	10.14	−47.07	126.5
u batt avg	13.20877951	0.480668576	12.98	0.81	11.47	14.74
v1 Avg	7.926803374	5.148389597	7.309	7.092	0	32.37
v1 Max	9.091498829	5.660462103	8.36	7.532	0	38.41
v1 Min	6.632600817	4.602912289	6.131	6.617	0	28.21
v1 Std	0.469215223	0.458866569	0.393	0.326	0	11.92
v2 Avg	7.641694913	5.120874583	7.1065	7.227	0	31.15
v2 Max	8.712990585	5.614807173	8.12	7.666	0	34.69
v2 Min	6.413065161	4.563799153	5.973	6.725	0	28.08
v2 Std	0.434604405	0.35419846	0.386	0.329	0	10.82
v3 Avg	8.075712031	5.047834968	7.406	6.896	0	31.82
v3 Max	9.408390573	5.613635055	8.63	7.509	0	39.33
v3 Min	6.298511378	4.345345975	5.645	5.97	0	26.92
v3 Std	0.564936567	0.44800655	0.492	0.396	0	13.07

Table 7. Statistical properties of ELASTIC model residuals in the test set.

Statistic	Value
Mean	0.026 m/s
Standard Deviation	0.629 m/s
Skewness	0.11
Kurtosis	3.02
Minimum	−1.21 m/s
Maximum	1.27 m/s

Table 8. Formal statistical diagnostic tests applied to ELASTIC model residuals.

Test	Null Hypothesis	p-Value	Interpretation
Jarque–Bera Test	Residuals are normally distributed	0.001	Mild deviations from strict normality observed
Augmented Dickey–Fuller (ADF) Test	Residual contains unit root	0.001	Residuals are stationary
ARCH Test	No ARCH heteroscedasticity	<0.001	Residual variance exhibits heteroscedastic behavior
Ljung–Box Test	No residual autocorrelation	<0.001	Weak but statistically significant temporal dependence detected

Table 9. The performance metrics for the proposed ELASTIC model over different wind speed regimes, i.e., Low, Mid, and High. The metrics are RMSE, MAE, R², nRMSE, MBE, PBIAS, MAPE, sMAPE, EVS, Skill RMSE, and Pearson correlation (r).

Regime	Low	Mid	High
RMSE	0.92383402	0.506428483	0.631761698
MAE	0.490997978	0.351982296	0.41190369
R²	0.139572833	0.8595361	0.927876443
nRMSE	0.308047356	0.101305958	0.050299498
MBE	−0.311462238	−0.036031207	0.053168889
PBIAS	−19.00507827	−0.657713055	0.454601324
MAPE	82.0829058	6.842254236	3.601544797
sMAPE	55.95246055	6.86902634	3.671296282
EVS	0.237372418	0.860247127	0.928387284
r	0.691485929	0.934012518	0.964484636
Skill RMSE	−0.14025221	0.073694954	0.02471293

Table 10. Shapiro–Wilk normality test results for residuals.

Test	Statistic	p-Value
Shapiro–Wilk	0.987	0.084

Table 11. Permutation feature importance of predictor variables.

Variable	Relative Importance (%)
Previous Wind Speed (v1 Avg)	38.5
Solar Irradiation	21.3
Air Temperature	14.7
Module Temperature	11.2
Humidity	8.1
Pressure	4.6
Wind Direction	1.6
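
Permutation feature importance, as reported in Table 11, can be sketched generically: each feature column is shuffled in turn and the resulting increase in RMSE is recorded. The example below is a minimal illustration with a hypothetical toy predictor; the number of repeats, the normalization to percentages, and all names are assumptions, since the exact procedure used in the study is not given here.

```python
import math
import random

def rmse(obs, pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def permutation_importance(predict, X, y, n_repeats=5, seed=42):
    """Relative importance (%) from the mean RMSE increase after shuffling
    one feature column at a time. Negative increases are clipped to zero."""
    rng = random.Random(seed)
    base = rmse(y, [predict(row) for row in X])
    raw = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in permuted]) - base)
        raw.append(max(sum(increases) / n_repeats, 0.0))
    total = sum(raw)
    return [100.0 * r / total if total > 0 else 0.0 for r in raw]

def toy_predict(row):
    """Hypothetical fitted model: strong on feature 0, weak on 1, ignores 2."""
    return 0.9 * row[0] + 0.1 * row[1]

# Synthetic inputs for illustration only.
rng = random.Random(1)
X = [[rng.uniform(0, 20), rng.uniform(0, 20), rng.uniform(0, 20)] for _ in range(200)]
y = [toy_predict(row) for row in X]

shares = permutation_importance(toy_predict, X, y)
print([round(s, 1) for s in shares])
```

For time-ordered data, the shuffle should be applied to a held-out split so that the importance reflects out-of-sample behavior.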

Table 12. Diebold–Mariano test results (p-values) comparing ELASTIC with other models.

Model Comparison	DM Statistic	p-Value	Result
ELASTIC vs. SVR	−4.82	<0.01	Statistically significant
ELASTIC vs. RF	−3.15	<0.01	Statistically significant
ELASTIC vs. LSBOOST	−1.98	0.048	Marginally significant
ELASTIC vs. ENS	−0.84	0.372	Not significant
ELASTIC vs. Persistence	−0.12	0.415	Not significant
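
For reference, a basic form of the Diebold–Mariano test can be sketched as below. The sketch assumes a squared-error loss, a one-step-ahead horizon (so no autocovariance correction is applied to the variance of the loss differential), and a two-sided p-value from the standard normal approximation. These are assumptions for illustration; the loss function and variance estimator behind Table 12 are not specified in the table.

```python
import math

def normal_two_sided_p(z):
    """Two-sided p-value of a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def diebold_mariano(errors_a, errors_b):
    """DM statistic and p-value for forecast errors of models A and B.

    A negative statistic indicates a lower squared-error loss for model A.
    """
    n = len(errors_a)
    # Loss differential under squared-error loss
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    d_mean = sum(d) / n
    d_var = sum((x - d_mean) ** 2 for x in d) / (n - 1)
    dm = d_mean / math.sqrt(d_var / n)
    return dm, normal_two_sided_p(dm)

# Toy errors (illustrative only): model A has smaller errors than model B
dm_stat, dm_p = diebold_mariano([0.1, -0.2, 0.1, -0.1], [0.5, -0.6, 0.4, -0.5])
```

Under the normal approximation, a statistic of −1.98 corresponds to a two-sided p-value of about 0.048.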

Table 13. Forecast error decomposition of the ELASTIC model (test set).

Error Component	Contribution (%)
Systematic Bias	1.8
Variance Difference	7.6
Random Error	90.6
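
One standard way to obtain shares like those in Table 13 is the Theil-type decomposition of the mean squared error into a bias term, a variance-difference term, and a covariance (random) term. The sketch below assumes that decomposition; whether the study used exactly this formulation is not stated in the table, and the function name `mse_decomposition` is hypothetical.

```python
import math

def mse_decomposition(observed, forecast):
    """Percentage shares of MSE: bias, variance difference, and random error.

    Uses MSE = (mean_f - mean_o)^2 + (s_f - s_o)^2 + 2 * (s_f * s_o - cov),
    with population (divide-by-n) standard deviations and covariance.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(forecast) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast) / n)
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, forecast)) / n
    mse = sum((f - o) ** 2 for o, f in zip(observed, forecast)) / n
    if mse == 0:
        raise ValueError("MSE is zero; the decomposition is undefined")
    bias = (mean_f - mean_o) ** 2
    variance = (std_f - std_o) ** 2
    random_part = 2.0 * (std_f * std_o - cov)
    return {
        "systematic_bias": 100.0 * bias / mse,
        "variance_difference": 100.0 * variance / mse,
        "random_error": 100.0 * random_part / mse,
    }

# Toy series (illustrative only, not data from the paper)
shares = mse_decomposition([4.0, 6.0, 8.0, 10.0], [4.5, 6.0, 8.5, 9.5])
```

The three shares sum to 100% by construction, so a large random-error share implies that little of the error is attributable to mean offset or amplitude mismatch.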

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

