1. Introduction
The steady growth of global energy demand, together with the rapid depletion of fossil fuel resources and their negative effects on the environment, has motivated the search for alternative and sustainable sources of energy. In particular, wind energy stands out among renewable energy sources because of its sustainability, low cost, and continuing technological advances [
1,
2,
3,
4,
5]. Nevertheless, the random, variable, and hard-to-predict nature of wind flow, together with the lack of cost-effective long-term energy storage, makes wind-based energy supply uncertain [
6,
7]. Moreover, wind power generation systems suffer from various challenges related to frequency and voltage instability and possible energy imbalance between demand and supply [
8]. Thus, accurate wind speed forecasting helps with power generation scheduling, ensures grid stability, allows for efficient use of wind energy, increases energy production efficiency, and reduces the operation costs [
9].
A reproducible framework for one-step-ahead wind speed forecasting was developed using historical wind observation data available for Bandırma. To avoid information leakage, the framework combines a leakage-free temporal split with robust handling of the multi-row Table-Oriented ASCII Format 5 (TOA5) header, timestamp decoding, split-aware causal missing-value imputation, and cyclic time encoding. The forecasting problem consists of predicting the wind speed at time t + h from the information available at time t, supplemented by up to 36 lagged wind speed variables that capture short-term temporal effects. Several machine learning models, including persistence, SVR, least-squares boosting, Random Forest, Elastic Net, and a stacking ensemble, were compared in terms of error, goodness-of-fit, and bias measures. In addition to these general evaluation measures, model performance was assessed using time-domain analysis, residual diagnostic testing, and domain-specific evaluations. Statistical diagnostics such as the autocorrelation function (ACF), the Ljung–Box test, Bland–Altman analysis, and Q-Q plots were also used. Based on the findings, the Elastic Net model demonstrated balanced forecasting behavior together with statistically consistent performance on the independent validation set.
The superior performance of the Elastic Net model relative to the nonlinear learners reflects the dominance of short-term temporal autocorrelation in wind speed forecasting and indicates the importance of a stable autoregressive structure. Because the forecast horizon extends only 10 min into the future, the next value depends strongly on the recent past of the time series, which limits the benefit of complex nonlinear deep learning algorithms. Although some recent studies report impressive forecast performance with Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional Long Short-Term Memory (Bi-LSTM), and Transformer-based algorithms, these approaches typically involve high computational cost, extensive hyperparameter tuning, and larger training datasets. Under these very short-term forecasting conditions, the regularized linear structure of the Elastic Net model allowed for better generalization and greater interpretability.
2. Related Works
Comprehensive reviews of wind energy forecasting [
10] and of ensemble-based techniques for wind and solar power [
11] document the breadth of available approaches and consistently report that ensemble and hybrid strategies outperform single models. Building on this premise, ensemble and stacking frameworks have been widely adopted: machine learning models for very short-term wind power forecasting [
12], ensemble-based frameworks for day-ahead energy trading [
13], bootstrap-based stacking ensembles [
14], and stacking combined with signal decomposition and heuristic optimization [
15] all report accuracy gains over individual predictors. While effective, these gains typically come at the cost of increased model complexity, higher computational demand, and reduced interpretability, and the reported improvements are often dataset- and horizon-dependent, which limits their generalizability.
A large and influential body of work couples signal decomposition with deep learning to handle the non-stationarity of wind series. Representative examples include Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) combined with a Transformer architecture under a customized loss function [
16], an Ensemble Empirical Mode Decomposition–Long Short-Term Memory (EEMD–LSTM) framework incorporating seasonal characteristics [
17], Complete Ensemble Empirical Mode Decomposition with Adaptive Noise–Empirical Wavelet Transform (CEEMDAN–EWT) decomposition with deep learning [
18], quadratic variational mode decomposition coupled with multiple deep models [
19], advanced data preprocessing with multi-objective optimization [
20], and two-stage decomposition integrated with Temporal Fusion Transformers [
21,
22]. These methods can substantially reduce error on benchmark datasets; however, a recurring methodological concern is that decomposition is frequently applied to the entire series before the train/test partition. This leaks future information into the training set and can inflate reported accuracy in a way that does not hold under operational, causal conditions, raising reproducibility concerns.
Beyond decomposition, numerous studies rely directly on deep architectures: hybrid deep learning with attention mechanisms [
23], stacked LSTM networks [
24], Convolutional Neural Network–Long Short-Term Memory (CNN–LSTM) models for autonomous marine vehicles [
25], ensemble GRU models for interval prediction [
26], Bi-LSTM networks combined with multi-objective optimization and transfer learning [
27], Transformer-based models [
28], and a Temporal Fusion Transformer (TFT) enhanced with seasonal–trend representations [
29]. An online learning-assisted self-attention model [
30] is notable for explicitly targeting low computational cost in ultra-short-term forecasting. Although these architectures are expressive, they are generally data-hungry, computationally expensive, and difficult to interpret. Particularly at short horizons, where autocorrelation dominates, their advantage over simple baselines is often modest and is not always reported against a persistence reference.
Another line of research incorporates physical knowledge into the forecasting process: a principle-driven framework combining wind field image generation with physical constraints [
31], dynamic ensembles integrating Numerical Weather Prediction (NWP) outputs with deep reinforcement learning and error-series modeling [
32], and machine learning approaches that enhance NWP performance while accounting for topographic effects on wind fields [
33]. Related efforts include a two-stage system combining error correction with nonlinear ensemble strategies [
34], privacy-preserving federated deep reinforcement learning [
35], statistical hybrid models that exploit complementary forecasting techniques [
36], hybrid multi-step-ahead prediction from univariate data [
37], and interpretability-oriented “glass box” models that preserve accuracy while improving transparency [
38]. In addition, large-scale analyses based on multiple reanalysis datasets have assessed the variability and long-term consistency of surface wind speed and wind power density across different regions [
39] and across the Northern Hemisphere [
40]; these studies characterize wind resource behavior but are not designed for operational short-term point forecasting.
A further group focuses on uncertainty-aware forecasting. Dynamic interval-based prediction that explicitly accounts for wind power ramp events [
41], probabilistic forecasting for sizing and controlling hybrid energy storage systems [
42], hybrid predictive density estimation for generating prediction intervals [
43], conformal prediction combined with feature importance selection [
44], nonparametric stochastic differential equations for ultra-short-term forecasting [
45], and an efficient probabilistic method for limited-data settings [
46] all aim to quantify forecast uncertainty rather than produce a single point estimate. These approaches improve forecast reliability but add modeling and calibration complexity, and their practical value still depends on the quality of the underlying point forecast.
Taken together, the literature reveals a clear trend toward increasingly complex hybrid, decomposition-based, and deep architectures, evaluated across heterogeneous datasets and forecast horizons ranging from ultra-short-term to day-ahead. Three methodological gaps emerge from this body of work. First, many studies emphasize wind power rather than wind speed and report results without a strictly chronological, leakage-free evaluation protocol—a particular risk in decomposition-before-split pipelines. Second, strong yet simple baselines, such as the persistence model and regularized linear regression, are frequently omitted or under-reported, making it difficult to judge whether the added complexity is genuinely justified. Third, reproducibility is rarely addressed in detail, including raw-logger data handling, timestamp parsing, and causal missing-value imputation. The present study addresses these gaps by developing a reproducible, leakage-free framework for short-term wind speed forecasting that benchmarks complex models against rigorous baselines and supports the comparison with comprehensive diagnostic analyses.
3. Materials and Methods
This study aims to develop a reliable and reproducible time-series machine learning framework for short-term wind speed forecasting. The analysis is based on real measurement data obtained from the unlicensed Damla 4 and Damla 5 wind power plants in the Bandırma district of Balıkesir, Türkiye, and the nearby Renevo 40 wind measurement station. The measurement system records wind speed and direction at multiple heights, along with meteorological and system-related variables. The data are stored in TOA5 format with a 10 min sampling interval and cover a 14-month period. The framework implements leakage-free data processing and modeling steps suitable for time series: quality control of the data, split-aware causal imputation of missing values, leakage-free feature engineering, construction of lagged features, and chronological train/validation/test splitting for model training. Model performance is evaluated using error metrics, bias measures, and statistical tests, complemented by regime-based analysis to assess performance across different wind speed ranges. All models were implemented in the MATLAB R2023b environment using the Statistics and Machine Learning Toolbox.
3.1. Study Area and Data Source
This study utilizes measurements collected from two unlicensed wind power plant sites, Damla 4 and Damla 5, located in the Bandırma district of Balıkesir, Türkiye, along with data from the nearby Renevo 40 wind measurement station. The project sites were positioned around the wind measurement station, which carries the ID number 100133, within the Bandırma/Balıkesir area. Each wind power plant has a capacity of 1 MW, resulting in a combined capacity of 2 MW for the two projects. The study area is relatively flat and lacks significant topographical features, and the distance between the project sites and the measurement station is approximately 4.2 km. The general layout of the study area, including the measurement station and the project sites, is shown in
Figure 1.
The Renevo 40 measurement station is installed on a 60.5 m meteorological tower, with wind speed and direction recorded at heights of 30 m and 60 m. After filtering out invalid or erroneous measurements, the dataset achieved an availability of 91.78%. The measurement campaign spanned 14 months, from 14 May 2021 to 29 July 2022. The positions of the wind turbines at the project sites were recorded using the UTM-WGS84 coordinate system (Zone 35T) and are summarized in
Table 1. This setup provides high-quality, multi-height wind data suitable for short-term forecasting and model validation. The 91.78% data availability indicates that only a small portion of data was missing, ensuring reliability. Recording at two different heights allows for capturing vertical wind profiles, which can improve feature engineering for predictive models. Using precise UTM coordinates ensures accurate spatial referencing for correlating turbine performance with local wind conditions.
In this study, the analysis period was defined according to the TIMESTAMP field in the raw dataset. The original measurements were collected at 10 min intervals, spanning from 4 April 2021 at 15:40 to 3 August 2022 at 17:10. Since there can be slight discrepancies between the nominal measurement period and the actual timestamps recorded in the raw data, we used the TIMESTAMP field itself to determine the precise range of the analysis period. By relying on the TIMESTAMP field, the study ensures that all subsequent analysis and preprocessing accurately reflect the real timing of the measurements. This approach avoids potential inconsistencies that could arise from assuming a perfectly uniform sampling schedule, which is particularly important for time-series forecasting and temporal feature engineering.
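The idea of reading the analysis period from the recorded timestamps, rather than from the nominal campaign dates, can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `analysis_period` and the assumed timestamp layout `YYYY-MM-DD HH:MM:SS` are hypothetical.

```python
from datetime import datetime, timedelta

def analysis_period(stamps, step=timedelta(minutes=10)):
    """Derive the analysis period from the recorded timestamps themselves.

    Returns (first, last, expected_slots, missing_slots), where
    expected_slots is the size of a perfectly regular grid between the
    first and last timestamp.
    """
    # Assumed timestamp layout: "YYYY-MM-DD HH:MM:SS" (hypothetical).
    times = sorted({datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps})
    first, last = times[0], times[-1]
    expected = (last - first) // step + 1
    return first, last, expected, expected - len(times)
```

Comparing the expected number of slots with the number of records actually present gives a quick check of how far the series departs from a uniform 10 min schedule.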
3.2. Data Format and Variable Definitions
The Campbell Scientific data logger produces TOA5 files that contain the unprocessed measurements. Each file consists of three parts: metadata describing the measurement device and logger program, a multi-row header that lists the measurement variables together with their units, and the data block holding the numerical values for each 10 min period. Each wind variable is stored as four summary statistics in separate columns: average (Avg), standard deviation (Std), minimum (Min), and maximum (Max). TIMESTAMP serves as the chronological index and RECORD as a sequential, logger-assigned record number. The dataset contains 68,292 records and 26 variables, which are grouped according to their physical meaning and, for the wind variables, according to three measurement levels identified as v1, v2, and v3. The variable groups, definitions, and units are detailed in
Table 2. The multi-row header is parsed explicitly so that variable names and numerical records are read from their designated rows, and the chronological order of the records is preserved for time-series analysis.
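A minimal sketch of multi-row header handling is given below. It assumes the usual TOA5 layout of four header rows (logger metadata, field names, units, processing type) followed by data rows. The sample content, the column names such as `v1_Avg`, and the function name `read_toa5` are hypothetical and are not taken from the study's files.

```python
import csv
import io

# Hypothetical two-record sample in the assumed four-row-header layout.
SAMPLE = '''"TOA5","Station","CR1000","1234","OS.32","prog.CR1","1111","Table10min"
"TIMESTAMP","RECORD","v1_Avg","v1_Std"
"TS","RN","m/s","m/s"
"","","Avg","Std"
"2021-04-04 15:40:00",0,5.2,0.6
"2021-04-04 15:50:00",1,"NAN",0.5
'''

def read_toa5(text):
    """Split TOA5 text into metadata, column descriptions, and records."""
    rows = list(csv.reader(io.StringIO(text)))
    meta, names, units, procs = rows[0], rows[1], rows[2], rows[3]
    columns = [{"name": n, "unit": u, "process": p}
               for n, u, p in zip(names, units, procs)]
    records = []
    for raw in rows[4:]:
        record = {}
        for col, value in zip(columns, raw):
            if col["name"] == "TIMESTAMP":
                record[col["name"]] = value          # keep as text, parse later
            else:
                try:
                    record[col["name"]] = float(value)   # "NAN" parses to nan
                except ValueError:
                    record[col["name"]] = float("nan")   # blank or malformed cell
        records.append(record)
    return meta, columns, records
```

Keeping names, units, and processing types together per column makes it harder to misalign a statistic (Avg, Std, Min, Max) with the wrong variable when the header spans several rows.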
Two quantities were derived from the original measurements. Turbulence intensity (TI), which characterizes short-term wind variability within each 10 min period, is computed as the ratio of the wind speed standard deviation to the mean wind speed. For the first measurement level, the denominator is bounded below at 0.1 m/s to maintain numerical stability.
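Written out, the turbulence intensity definition described above takes the following form. The symbols are assumptions introduced for illustration: σ for the 10 min standard deviation, v̄ for the 10 min mean, k for the measurement level, and v_min for the lower bound on the denominator. This is presumably the relation cited elsewhere in the text as Equation (1).

```latex
\mathrm{TI}_k = \frac{\sigma_{v,k}}{\max\left(\bar{v}_k,\; v_{\min}\right)},
\qquad v_{\min} = 0.1\ \mathrm{m/s}
```

The text states the lower bound for the first measurement level; whether the same bound applies to the other levels is not specified.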
Wind power density (WPD) follows its standard definition, which combines air density with the cube of wind speed to express the kinetic energy flux available at each measurement height. Wind direction is expressed through vector components (WVc) to maintain continuity across the 0°/360° boundary, while meteorological and system variables (temperature, pressure, air density, relative humidity, battery voltage, and sensor temperature) enable assessment of measurement conditions and quality control. The original TOA5 variable names are maintained throughout the paper for traceability.
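The standard definition of wind power density referred to above can be written as follows, with ρ the air density in kg/m³ and v the wind speed in m/s (symbols assumed here for illustration):

```latex
\mathrm{WPD} = \tfrac{1}{2}\,\rho\,v^{3} \qquad \left[\mathrm{W/m^{2}}\right]
```

Whether the cube is taken of the 10 min mean speed or averaged over higher-rate samples is not stated in the text, so the formula is shown in its generic form.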
3.3. Predictive Modeling Workflow and Leakage-Free Evaluation Protocol
A machine learning workflow that prevents data leakage in time-series data was developed (
Figure 2). Many forecasting studies suffer from information leakage because they split the data randomly, apply preprocessing before the chronological split, or construct features that incorporate future data; the reported results then overstate the performance achievable in operation. Here, the data were split in temporal order into training, validation, and test sets. All preprocessing was performed in a causal, split-aware manner, using only training data and past observations. The chosen split ratios provide enough training data for model learning while reserving separate validation and test sets for hyperparameter tuning and for the final, leakage-free performance assessment, respectively.
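A chronological split of the kind described above can be sketched in a few lines. The split ratios are not given in this section, so the 70/15/15 fractions below are placeholders chosen purely for illustration, as is the function name `chronological_split`.

```python
def chronological_split(n_rows, train_frac=0.70, val_frac=0.15):
    """Return index ranges for train, validation, and test in time order.

    The fractions are illustrative placeholders, not the study's ratios.
    No shuffling is performed, so every validation row is later than every
    training row and every test row is later than every validation row.
    """
    n_train = int(n_rows * train_frac)
    n_val = int(n_rows * val_frac)
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_rows)
    return train, val, test
```

Because the ranges are contiguous and ordered, any statistic fitted on `train` (scaling parameters, imputation state, model coefficients) is guaranteed to be based only on data that precede the evaluation periods.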
3.3.1. Supervised Indexing and Prediction Horizon
The task is one-step-ahead forecasting, i.e., predicting the wind speed one 10 min interval ahead. The target variable y is the mean wind speed recorded at the first measurement level (v1 Avg). The supervised target is constructed by shifting the target forward by h steps, so that the feature vector at time t is paired with y(t + h) and the usable feature rows run from 1 to N − h. This temporal alignment guarantees that only past and present information is available to the model, which prevents data leakage. The persistence baseline, ŷ(t + h) = y(t), serves as a reference for checking whether a model captures temporal dynamics beyond the autocorrelation of the series.
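The target shift and the persistence baseline can be illustrated with a short sketch. The function name `make_supervised` and the list-based data layout are assumptions made for this example.

```python
def make_supervised(features, target, h=1):
    """Pair the feature row at time t with the target at time t + h.

    features : list of feature rows, one per time step
    target   : list of target values aligned with `features`
    Returns (X, y, persistence), each with len(target) - h entries.
    """
    n = len(target)
    X = features[: n - h]          # rows 1 .. N - h
    y = target[h:]                 # y(t + h)
    persistence = target[: n - h]  # persistence forecast: y_hat(t + h) = y(t)
    return X, y, persistence
```

For example, with `target = [5.0, 5.5, 6.0, 5.8]` and `h = 1`, the targets become `[5.5, 6.0, 5.8]` and the persistence forecasts `[5.0, 5.5, 6.0]`; the last feature row has no target and is dropped.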
3.3.2. Quality Control, Causal Imputation, and Feature Engineering
Quality control was applied first: physically implausible values were replaced with NaN, including negative wind speeds, out-of-range battery voltage and pressure readings, and stuck-sensor conditions identified from a causal moving standard deviation. The train split used only past observations for causal forward-fill-based missing-value completion, while the validation and test splits used the last valid value from the preceding split to prevent future information leakage at split boundaries. The detailed quality control thresholds and imputation rules are listed in
Table 3.
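The split-aware forward fill described above can be sketched as follows. Only the negative-wind-speed rule is shown; the other quality control thresholds are not reproduced here. The function names and the `carry_in` mechanism are illustrative assumptions about how the last valid value might be passed across a split boundary.

```python
import math

NAN = float("nan")

def flag_implausible(speeds):
    """Replace physically implausible (negative) wind speeds with NaN."""
    return [NAN if v < 0.0 else v for v in speeds]

def causal_ffill(values, carry_in=NAN):
    """Forward-fill NaNs using only past values.

    carry_in is the last valid value of the preceding split, so a gap at
    the start of a validation or test split is filled from the past and
    never from the future. Returns (filled_values, last_valid_value).
    """
    last = carry_in
    filled = []
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled, last
```

A typical use would fill the training split first and hand its final valid value to the next split, for example `train_f, carry = causal_ffill(flag_implausible(train))` followed by `val_f, carry = causal_ffill(flag_implausible(val), carry_in=carry)`.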
Feature engineering was likewise performed separately within each split. Cyclic time features encode daily and seasonal periodicity:
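A common way to encode daily and seasonal periodicity is a sine and cosine pair per cycle. The exact features used in the study are not reproduced in this text, so the choice of hour of day and day of year below, the 365.25-day year length, and the feature names are all assumptions for illustration.

```python
import math
from datetime import datetime

def cyclic_time_features(ts):
    """Encode time of day and time of year as sine/cosine pairs (assumed form)."""
    hour = ts.hour + ts.minute / 60.0
    day_of_year = ts.timetuple().tm_yday
    day_angle = 2.0 * math.pi * hour / 24.0
    year_angle = 2.0 * math.pi * day_of_year / 365.25
    return {
        "hour_sin": math.sin(day_angle),
        "hour_cos": math.cos(day_angle),
        "doy_sin": math.sin(year_angle),
        "doy_cos": math.cos(year_angle),
    }
```

The pair representation keeps 23:50 and 00:00 close together in feature space, which a raw hour value would not. Because the encoding depends only on the timestamp of each row, it introduces no leakage across splits.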
The wind features comprised the wind speed average and standard deviation together with the turbulence intensity of Equation (1). Lagged target features extending back 36 steps were constructed to capture autoregressive behavior; the first 36 rows of each split, where lags are undefined, were removed, while the final values of the preceding split were used to keep lag computation continuous. The maximum lag value of 36 corresponds to approximately 6 h under the 10 min sampling interval and was selected to capture short-term temporal persistence and intra-day wind variability while avoiding unnecessarily long lag structures that could increase redundancy and model complexity.
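The lag construction can be sketched as follows. The function name `lag_features` and the `history` argument (the tail of the preceding split) are assumptions for this example; rows without a full set of past values are dropped.

```python
def lag_features(series, n_lags=36, history=()):
    """Build lagged-target rows [y(t-1), ..., y(t-n_lags)] for each t.

    history holds the final values of the preceding split (empty for the
    first split). Rows that lack n_lags past values are skipped.
    Returns (kept_indices, rows), with indices relative to `series`.
    """
    padded = list(history)[-n_lags:] + list(series)
    offset = len(padded) - len(series)
    kept, rows = [], []
    for i in range(len(series)):
        j = i + offset
        if j < n_lags:
            continue                      # lags undefined: drop this row
        rows.append(padded[j - n_lags:j][::-1])
        kept.append(i)
    return kept, rows
```

With `n_lags = 2` and no history, `lag_features([1, 2, 3, 4], 2)` keeps only indices 2 and 3; supplying `history=[8, 9]` for a later split keeps every row, because the missing past values come from the preceding split rather than from the future.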
3.3.3. Models and Validation-Based Selection
Prior to modeling, features were standardized by z-score normalization with parameters estimated from the training set only and applied unchanged to the validation and test sets:
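Training-only standardization can be sketched as below. The helper names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the text does not state which estimator was used.

```python
import statistics

def fit_standardizer(train_column):
    """Estimate mean and standard deviation from the training split only."""
    mu = statistics.fmean(train_column)
    sigma = statistics.stdev(train_column)   # sample std (assumed estimator)
    if sigma == 0.0:
        sigma = 1.0                          # guard against constant columns
    return mu, sigma

def standardize(column, mu, sigma):
    """Apply z = (x - mu) / sigma with previously fitted parameters."""
    return [(x - mu) / sigma for x in column]
```

The parameters returned by `fit_standardizer` are computed once on the training column and then passed unchanged to `standardize` for the validation and test columns, so no statistic of the evaluation periods enters the transformation.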
Six methods were evaluated: a persistence baseline, support vector regression (SVR) with a Gaussian (RBF) kernel, least-squares gradient boosting (LSBoost), Random Forest (RF) based on bootstrap aggregation, Elastic Net regularized linear regression whose regularization parameter was selected by minimizing the validation RMSE, and a stacking ensemble. The Elastic Net mixing parameter was fixed at α = 0.5 to provide a balanced compromise between L1 and L2 regularization, enabling simultaneous feature selection and coefficient stabilization under correlated lag-based predictors. The stacking ensemble combines the base-model predictions through non-negative, simplex-constrained weights (w ≥ 0 and Σw = 1) obtained by minimizing the mean squared error. Two modes are supported: an academic assessment mode, in which the weights are fitted on the training predictions and the results are reported on the validation set, and an operational mode, in which the weights are fitted on the validation data and then used to produce the final test report. The model with the lowest validation RMSE is selected as the final model.
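One way to obtain simplex-constrained stacking weights is projected gradient descent on the mean squared error. The sketch below is an illustrative substitute, not the solver used in the study (which is not specified in this section); the function names, step count, and step-size rule are assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0:
            theta = candidate          # the largest valid k determines theta
    return [max(x - theta, 0.0) for x in v]

def fit_stacking_weights(preds, y, steps=1000):
    """Minimise the MSE of the weighted blend subject to simplex constraints.

    preds : list of K prediction lists (one per base model)
    y     : list of observed values
    """
    k, n = len(preds), len(y)
    # Conservative step size: 1 / (upper bound on the gradient's Lipschitz constant).
    lr = n / (2.0 * sum(p * p for row in preds for p in row))
    w = [1.0 / k] * k
    for _ in range(steps):
        blend = [sum(w[j] * preds[j][i] for j in range(k)) for i in range(n)]
        grad = [2.0 / n * sum((blend[i] - y[i]) * preds[j][i] for i in range(n))
                for j in range(k)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(k)])
    return w
```

In the operational mode described above, `preds` and `y` would come from the validation split and the fitted weights would then be applied to the test predictions. Because the weights are non-negative and sum to one, the blend is a convex combination and cannot leave the range spanned by the base forecasts.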
Table 3 provides a summary of the model hyperparameters together with the entire pipeline configuration.
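For reference, a widely used form of the Elastic Net objective with mixing parameter α and regularization strength λ is shown below. The notation (N observations, intercept β₀, coefficient vector β) is assumed for illustration; the text does not state the exact parameterization used by the authors.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^{2}
+ \lambda\left(\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2} + \alpha\,\lVert\beta\rVert_1\right)
```

Under this parameterization, α = 0.5 weights the L1 term by λ/2 and the squared L2 term by λ/4, while λ is the quantity selected by minimizing the validation RMSE.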
3.4. Performance Metrics and Regime-Based Evaluation
Model performance was assessed with multiple metrics: RMSE, MAE, R², normalized RMSE (nRMSE), mean bias error (MBE), percent bias (PBIAS), MAPE, symmetric MAPE (sMAPE), explained variance score (EVS), and the Pearson correlation coefficient (r). For MAPE, the denominator was bounded below at 0.2 m/s for numerical stability. The Skill RMSE quantifies the improvement of each model relative to the persistence baseline. For the regime-based evaluation, the test set was divided into three wind speed categories: low (<3 m/s), medium (3 to 8 m/s), and high (>8 m/s).
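A few of these metrics can be sketched as follows. The skill definition 1 − RMSE_model / RMSE_persistence is an assumption, although it is consistent with the skill values reported alongside the RMSEs in the results. The handling of the regime boundaries at exactly 3 and 8 m/s is likewise an assumption.

```python
import math

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mape(y, yhat, floor=0.2):
    """MAPE in percent, with the denominator bounded below at `floor` m/s."""
    return 100.0 * sum(abs(a - b) / max(abs(a), floor)
                       for a, b in zip(y, yhat)) / len(y)

def skill_rmse(rmse_model, rmse_persistence):
    """Positive values beat persistence; negative values are worse (assumed form)."""
    return 1.0 - rmse_model / rmse_persistence

def wind_regime(speed):
    """Low (<3 m/s), medium (3 to 8 m/s), or high (>8 m/s)."""
    if speed < 3.0:
        return "low"
    if speed <= 8.0:
        return "medium"
    return "high"
```

The denominator floor matters mostly in calm conditions: for an observed speed of 0 m/s and a forecast of 0.1 m/s, the bounded MAPE is 50% instead of being undefined.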
4. Results
This section presents the results obtained from the wind speed forecasting models and the associated statistical analyses performed on the test dataset. The predictive performance of the evaluated models is first compared using several statistical metrics, including absolute error measures, relative error indicators, goodness-of-fit statistics, and bias measures. These metrics provide a quantitative assessment of the accuracy and reliability of the forecasting models. Following the global performance comparison, additional analyses are conducted to further examine the behavior and statistical properties of the selected model. Time-domain validation is used to evaluate how well the predicted wind speed values follow the temporal dynamics of the observed data. Distribution-based analyses are also performed to assess the agreement between predicted and measured wind speeds. Residual diagnostics are then applied to investigate the statistical characteristics of the prediction errors. These analyses include the examination of residual distributions, error dispersion patterns, and temporal dependence structures. Such evaluations provide insight into the stability and consistency of the forecasting model. Furthermore, regime-based performance analysis is conducted to evaluate model accuracy under different wind speed conditions. Additional statistical tests, including normality assessment and feature importance analysis, are used to better understand the relationships between the predictor variables and the forecasting results. Finally, statistical model comparison and forecast error decomposition analyses are performed to quantify the relative predictive performance of the competing models and to identify the main sources of forecasting error.
4.1. Overall Model Performance
Table 4 summarizes the forecasting performance of all evaluated models on the test dataset for one-step-ahead (10 min) wind speed prediction. The comparison includes several statistical indicators that reflect different aspects of model performance. Absolute error metrics such as RMSE and MAE are used to quantify the magnitude of prediction errors, while relative error measures including nRMSE, MAPE, and sMAPE provide a normalized evaluation of forecasting accuracy. In addition, goodness-of-fit indicators such as the coefficient of determination (R²), explained variance score (EVS), and the Pearson correlation coefficient (r) are reported to assess how well the predicted values follow the observed wind speed variations. Bias metrics, namely mean bias error (MBE) and percentage bias (PBIAS), are also included to identify potential systematic over- or under-prediction behavior of the models. These performance metrics provide a comprehensive evaluation framework that allows a reliable comparison of the predictive capabilities of the considered forecasting approaches.
Among the evaluated models, excluding the baseline approach, the ELASTIC model achieves the lowest absolute prediction error on the test dataset. The model yields an RMSE of 0.633 m/s and an MAE of 0.399 m/s, indicating competitive forecasting accuracy under short-term operational conditions. The goodness-of-fit metrics indicate strong predictive performance: the model explains most of the variability in wind speed (R² = 0.977, EVS = 0.977), and the Pearson correlation coefficient is r = 0.989. The normalized RMSE is also low (nRMSE = 0.031). The PERSIST model, which is used as a baseline reference, produces performance values very close to those of ELASTIC (RMSE = 0.634 m/s, MAE = 0.392 m/s, R² = 0.977, EVS = 0.977). The resulting Skill RMSE close to zero indicates that, at this horizon, forecast accuracy derives essentially from the short-term temporal persistence of wind speed. For this reason, the persistence model is primarily used as a benchmark to assess the additional predictive value provided by more advanced forecasting models. The LSBOOST model, which is based on tree-based boosting, demonstrates a relatively strong overall fit (RMSE = 0.715 m/s, R² = 0.971). Nevertheless, its relative error indicators are higher than those of ELASTIC and PERSIST, particularly under low-wind-speed conditions where MAPE reaches 22.6% and sMAPE 11.6%. Similarly, the Random Forest (RF) model exhibits inconsistent performance across different wind speed levels. Although it captures general trends, its prediction errors remain relatively high (RMSE = 0.880 m/s, MAPE ≈ 52.7%), indicating reduced reliability compared to the best-performing models. The ENS ensemble model achieves performance levels that are very close to those of ELASTIC and the persistence baseline (RMSE = 0.636 m/s, R² = 0.977).
However, the slightly negative Skill RMSE (−0.003) shows that the ensemble strategy does not improve on the persistence approach and is, in fact, marginally worse. This result implies that, for a very short forecasting horizon such as 10 min, the contribution of stacking weights may remain limited. It should also be noted that these ensemble results correspond to deployment conditions where stacking weights were refitted using the validation dataset. In contrast, the SVR model produces the weakest performance among all evaluated methods. It records a relatively large prediction error (RMSE = 1.417 m/s, MAE = 0.695 m/s) and an extremely high relative error (MAPE ≈ 99%). The strongly negative Skill RMSE (−1.235) further indicates that the model performs substantially worse than the baseline persistence approach. The bias metrics provide additional insight into model behavior. Both ELASTIC and PERSIST display MBE and PBIAS values close to zero, indicating that their predictions do not suffer from significant systematic overestimation or underestimation. In contrast, the SVR and RF models exhibit negative and relatively large PBIAS values, which indicates a consistent tendency to underestimate wind speed in the test dataset.
Based on the results presented above, the ELASTIC model shows balanced forecast behavior across the evaluation criteria, including absolute error, relative error, goodness-of-fit measures, and bias indicators. Although the persistence baseline performed very similarly to the ELASTIC model at this very short forecast horizon, the latter showed low systematic bias, stable residual behavior, and statistical consistency. Given these balanced statistical properties, the ELASTIC model was chosen for the subsequent residual analysis (
Figure 3).
We created a radar chart to compare the forecasting performance across multiple metrics, including RMSE, MAE, R², nRMSE, MAPE, and sMAPE. From the chart, it is clear that the ELASTIC model performs consistently well across almost all metrics. The persistence model shows similar performance in some metrics because of the short 10 min forecast horizon, but ELASTIC keeps errors more balanced overall. SVR and RF models, on the other hand, have much higher relative errors, especially for MAPE and sMAPE. This visual comparison confirms what we observed in
Table 4 and makes it easy to see that ELASTIC demonstrates more balanced forecasting behavior across multiple evaluation metrics.
We also checked the relationships between the predictor variables to see if multicollinearity could be a problem. To do this, we calculated the Variance Inflation Factor (VIF) for each environmental variable used in the models.
Table 5 shows all VIF values, which are well within the acceptable range. This tells us that there are no strong linear dependencies between the predictors, so the model coefficients remain stable and the machine learning performance is not negatively affected.
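For reference, the Variance Inflation Factor of predictor j is defined from the coefficient of determination obtained when that predictor is regressed on all other predictors (notation assumed here for illustration):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
```

A value of 1 indicates no linear dependence on the other predictors. Common rules of thumb treat values above 5 or 10 as a sign of problematic multicollinearity; the threshold applied in this study is not stated in the text.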
4.2. Data Preprocessing, Quality Control, and Descriptive Statistics
Raw data were transferred from Excel to the MATLAB R2023b environment, and a systematic preprocessing procedure was performed before the analysis. In the first step, the TIMESTAMP field was standardized to form a regular time-series structure, while derivative time scales (hourly, daily, and monthly) were created for visualization purposes. In the second step, physical consistency checks were performed, and physically unreasonable data were removed from the dataset. This step was an additional layer of quality control for the analysis, following the general data filtering procedure described in the report of the measurement campaign. It should be noted that several extreme values reported in the descriptive statistics correspond to transient sensor/logger anomalies or short-duration measurement artifacts observed in the raw monitoring system outputs. These values were intentionally retained within the descriptive statistical summaries to transparently reflect the characteristics of the raw dataset. However, during predictive modeling, the leakage-free quality control procedure identified and filtered unstable or physically implausible observations prior to model training and evaluation, thereby minimizing their influence on forecasting performance. In the third step, the treatment of missing data was determined. To avoid artificially affecting distribution-based analyses, missing observations were not filled using linear interpolation or other imputation methods; instead, the corresponding records were excluded from the dataset in a manner that did not compromise the analyses. This approach specifically aims to preserve extreme values and the distribution characteristics.
The decision not to fill missing values applies only to exploratory data analysis (EDA) and the visualization of descriptive statistics; during predictive modeling, a split-aware causal imputation approach was additionally applied to prevent information leakage. Missing data rates were also calculated on a per-variable basis. According to the MissingFraction summary, the proportion of missing values across all variables was negligible, with only the h1 Avg variable exhibiting a very low missing fraction of approximately 1.4643 × 10⁻⁵ (about one record out of 68,292). At this level, the missing data have no meaningful effect on statistical distributions or subsequent analysis steps. Descriptive statistics were computed for all variables, providing quantitative information about the scale, unit, value range, variability, and extremes of the dataset. These statistics are purely descriptive and imply neither causal relationships between the variables nor model-related interpretations.
Descriptive statistic results reported in the current study include the following:
- (1) Record information (RECORD);
- (2) Wind speed statistics at the three levels;
- (3) Turbulence intensity indicators (TI);
- (4) Wind power density (WPD);
- (5) Wind vector components (WVc);
- (6) Meteorological and system-related variables;
- (7) Reference extreme columns.
Using this structure, the dataset tracks the important quantities of the wind regime, as well as the auxiliary variables related to the measurement conditions, in the same table format. The descriptive statistic results shown in
Table 6 include the measurement columns of the TOA5 data structure, as well as the derived variables obtained through the analysis procedure. In particular, the turbulence intensity (TI) variables do not appear in the raw TOA5 header, although they are included in the descriptive statistic results reported in
Table 3, because they are derived from the wind speed statistics at the respective measurement levels. Consequently,
Table 6 follows the structure of the final variables used in the analysis rather than the header of the raw file, which allows the value range, distribution spread, and extreme values of each variable to be reported transparently. The final dataset for this study therefore includes the TI variables TI v1, TI v2, and TI v3.
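As an illustration of how a TI column can be derived from the wind speed statistics, the sketch below uses the standard definition of turbulence intensity, the ratio of the standard deviation to the mean wind speed within an averaging interval. The function name and the `min_mean` cut-off for near-calm intervals are assumptions for this example, not values taken from the study.

```python
def turbulence_intensity(mean_speed, std_speed, min_mean=0.5):
    """TI = sigma_v / mean_v for one averaging interval.

    Returns None when inputs are missing or the mean is below `min_mean`
    (a hypothetical cut-off), where the ratio becomes unstable.
    """
    if mean_speed is None or std_speed is None or mean_speed < min_mean:
        return None
    return std_speed / mean_speed


# Toy 10 min records as (mean, standard deviation) in m/s
records = [(8.0, 1.2), (0.2, 0.3), (5.0, 0.5)]
print([turbulence_intensity(m, s) for m, s in records])
```

The same function would be applied once per measurement level to obtain columns such as TI v1, TI v2, and TI v3.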
4.3. Exploratory Data Analysis and Visualizations
Before the modeling step, a number of visualizations were created as part of a process referred to as exploratory data analysis (EDA) to systematically uncover the distributional structure, level-dependent variability, and basic multivariate relationships in the data. The visualizations should not be considered as a way to display any results or as part of a performance evaluation process, but rather as a way to reproducibly document, prior to modeling, the typical value range and extreme values of the measurements, a condensed view of the temporal dependencies, and the linear dependencies among the essential variables. To explore the level-dependent distribution of the wind speed, boxplots were created for the 10 min mean wind speed values measured at the three levels. The boxplots simultaneously present the median, interquartile range (IQR), and outliers determined according to predefined thresholds, allowing for a concise summary of the distribution characteristics. These visualizations were used to methodologically highlight the comparability of distribution structures across measurement levels and the coverage of the wind speed range within the dataset (
Figure 4).
Boxplots were created for the derived turbulence intensity (TI) variables to examine the level-dependent distribution of short-term wind speed variability. This visualization allows the central tendencies, spread ranges, and outlier behavior of the turbulence intensity values to be simultaneously observed across measurement levels. In this way, the distribution characteristics of short-term wind fluctuations within the dataset are documented prior to modeling (
Figure 5).
The Pearson correlation coefficient was used to describe the linear relationship between wind speed variables and meteorological parameters. In this context, pairwise correlation coefficients were calculated between selected meteorological variables and wind speed variables at different measurement levels, and the results were presented as a correlation matrix. The correlation heatmap allows the sign and relative magnitude of linear dependencies between variables to be simultaneously observed in a single visual plane, aiming to methodologically document the fundamental correlation patterns of the multivariate dataset (
Figure 6).
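The pairwise Pearson coefficients behind such a correlation matrix can be sketched as follows. The study performed this step in MATLAB; this Python version is only an illustration, and the column names in the toy input are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(columns):
    """Nested dict of pairwise coefficients for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical column names and toy values
data = {
    "wind_speed": [3.1, 4.0, 5.2, 6.8, 7.5],
    "air_temp": [18.0, 17.5, 16.9, 15.2, 14.8],
}
matrix = correlation_matrix(data)
print(matrix["wind_speed"]["air_temp"])
```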
To represent the diurnal as well as the seasonal variations in the wind speed, the measurements were grouped along the axes representing the months (1–12) and the hours (0–23). The arithmetic mean of the measurements of the wind speed corresponding to each of the combinations of the month–hour pairs was calculated, and the values were represented as a heatmap of size 12 × 24. This does not show the actual measurements directly; it shows the aggregated representation of the measurements corresponding to the same month–hour pairs. The aim is to show the diurnal and seasonal variations in the measurements over a long period of time, represented over a single plane (
Figure 7).
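The month–hour aggregation can be sketched as a simple grouping step. This is an illustrative Python version with a hypothetical function name and toy records; the study's own implementation was done in MATLAB.

```python
from datetime import datetime


def month_hour_means(records):
    """Average wind speed per (month, hour) cell.

    `records` is an iterable of (timestamp, wind_speed) pairs. The result is
    a 12 x 24 grid (rows: months 1-12, columns: hours 0-23); cells without
    data are None.
    """
    sums = [[0.0] * 24 for _ in range(12)]
    counts = [[0] * 24 for _ in range(12)]
    for timestamp, speed in records:
        if speed is None:
            continue
        sums[timestamp.month - 1][timestamp.hour] += speed
        counts[timestamp.month - 1][timestamp.hour] += 1
    return [
        [sums[m][h] / counts[m][h] if counts[m][h] else None for h in range(24)]
        for m in range(12)
    ]


records = [
    (datetime(2021, 1, 5, 13, 0), 4.0),
    (datetime(2022, 1, 9, 13, 10), 6.0),  # same month-hour cell, different year
    (datetime(2021, 7, 1, 0, 0), 3.0),
]
grid = month_hour_means(records)
print(grid[0][13], grid[6][0])
```

Observations from both years fall into the same cell, which is why the heatmap shows an aggregated pattern rather than individual measurements.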
In addition to the aggregated representation in the form of the month–hour heatmap, the scatter plot was created along the hourly axis for the chosen wind speed variable, aiming at the direct representation of the diurnal distribution of wind speed at the raw data level. In the created visualization, each point is intended to represent the individual data point according to the respective hour without any form of summarization or averaging. This is meant to allow the raw distribution to be presented as an additional EDA result below the aggregated representations (
Figure 8).
Considering that the dataset covered two years (2021 and 2022), yearly correlation grids were produced to methodologically investigate whether the linear dependency between the core variables remained consistent over time. The correlation coefficients were computed individually for each year and presented in heatmap format. These annual correlation grids were used as a visual check to examine whether there were any significant structural changes in the fundamental dependency patterns between variables (
Figure 9).
In this study, all data reading, calculation of descriptive statistics, correlation analyses, and visualizations produced within the framework of exploratory data analysis (EDA) were performed in the MATLAB environment. The objectives of this stage are: (i) to transparently document the numerical scales, units, and ranges of the dataset through descriptive statistics (
Table 6), (ii) to methodologically reveal the distribution characteristics, temporal patterns, and linear relationships among core variables (
Figure 4,
Figure 5,
Figure 6,
Figure 7,
Figure 8 and
Figure 9), and (iii) to define the data preprocessing and characterization steps prior to modeling in a reproducible framework.
4.4. Time-Domain and Distribution-Based Validation
We also tested the short-term prediction capability of the ELASTIC model by examining both the temporal tracking and the distributional agreement of the predicted wind speed values in the test set. Two graphs were used: one describing the temporal performance of the predictions, and another comparing the observations and predictions directly.
Figure 10 shows the comparison of observed and predicted time series of wind speeds, while
Figure 11 compares the linear consistency and the distribution of observations and predictions.
It can be seen from
Figure 10 that the ELASTIC model demonstrates good temporal tracking performance when predicting low- and high-speed changes in wind speed values. Prediction is performed with a slight time lag compared to the observed wind speed values. Even during rapid changes in wind speed, the model is capable of preserving variability in wind speed signal values. The above claim can be verified by looking at
Figure 11, which shows the scatter plot of observations and predictions. Most values fall close to the 45-degree line. The obtained measures (RMSE ≈ 0.63 m/s, MAE ≈ 0.40 m/s, and R² ≈ 0.98) show that a considerable amount of the variability in the test data has been captured. It should be noted that dispersion is larger at low wind speeds, whereas the model shows stable performance at medium and high speeds.
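For reference, the three global measures quoted here can be computed as in the following sketch, using their standard definitions. The toy values are illustrative only and are not taken from the study's test set.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

A single large miss dominates the RMSE more than the MAE, which is one reason both are reported side by side.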
Upon reviewing
Figure 10 and
Figure 11, we found that the ELASTIC model demonstrates consistent temporal tracking and close agreement between the distributions of predictions and observations under the evaluated forecast conditions. This finding is supported by the global metrics and the residual diagnostics. Overall, the presented results indicate that the ELASTIC model forecasts wind speeds accurately under short-term conditions.
4.5. Residual Analysis and Statistical Consistency
We also analyzed the residuals of the ELASTIC model to better understand its error behavior. Residuals were calculated as the difference between observed and predicted values. We looked at (i) the overall distribution, (ii) residuals relative to predicted magnitudes, and (iii) temporal dependencies, shown in
Figure 12,
Figure 13 and
Figure 14. The histogram in
Figure 12 shows that most residuals are tightly clustered around zero, indicating minimal bias. We observed a sharp peak with many small errors and only a few large residuals, showing that the model consistently performs well across the majority of the test set. Overall, we confirmed that the errors are small, well distributed, and do not dominate the model’s predictions.
We checked the residuals more closely using a scatter plot and statistics. The scatter plot shows that residuals are generally centered around zero with no clear trend. At low predicted wind speeds, the spread is wider, forming a wedge shape, while at medium to high values, the spread is narrower. This indicates that the errors are mostly small and clustered near zero, though a few rapid changes may cause larger deviations. From
Table 7, the mean residual is very close to zero (0.026 m/s), skewness is low (0.11), and kurtosis is near normal (3.02). These results confirm that the ELASTIC model’s predictions are statistically reliable, with small, balanced, and symmetric errors.
According to the Diebold–Mariano test results, the ELASTIC model yields a statistically significant improvement over the SVR and RF models in terms of forecast accuracy, while the difference from the persistence benchmark is statistically insignificant in the context of an ultra-short-term forecasting horizon used in this research. It corresponds to the presence of a highly persistent temporal structure of wind speed series at 10 min forecasting interval lengths. Thus, the practical evaluation of forecast improvements must take into account not only p-values but also residual stability, systematic bias patterns, and forecasting consistency within regimes.
We also looked at the temporal behavior of the residuals using the autocorrelation function (ACF). Most lags show coefficients close to zero, with only small positive values at the first few lags. This indicates that the residuals do not have strong or persistent autocorrelation. However, the Ljung–Box test shows that the residuals are not fully independent for the first 10, 20, and 30 lags (p < 0.001). In short, the residuals have a weak but statistically significant temporal dependence, which matches the small peaks we see in the ACF plot.
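The two diagnostics used in this paragraph can be sketched as follows: the sample autocorrelation at lags 1..h and the Ljung–Box statistic Q = n(n + 2) Σ ρ_k² / (n − k). The function names and the toy residual series are assumptions for illustration. The p-value, not computed here, comes from comparing Q with a chi-square distribution with h degrees of freedom.

```python
def acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(x)
    rho = acf(x, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))


# Toy residuals with an obvious alternating pattern
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(acf(residuals, 2))
print(ljung_box_q(residuals, 1))
```

With a very large test set, even small autocorrelation coefficients can produce a large Q, which is consistent with a significant Ljung–Box result alongside a nearly flat ACF plot.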
To provide additional statistical support for the residual interpretation, several formal diagnostic tests were also applied to evaluate residual normality, stationarity, heteroscedasticity behavior, and temporal dependence characteristics, as summarized in
Table 8.
Residual normality was evaluated using the Jarque–Bera test together with the histogram and Q–Q plot analyses presented in
Figure 12 and
Figure 13. The Jarque–Bera results indicated mild deviations from strict normality, primarily associated with tail behavior under extreme wind conditions, although the central residual distribution remained approximately symmetric and well-behaved. Residual stationarity characteristics were confirmed using the Augmented Dickey–Fuller (ADF) test, while ARCH-based diagnostics indicated the presence of heteroscedastic residual variance behavior, which is commonly observed in atmospheric and wind-related time series due to changing turbulence intensity and stochastic variability. Together with the Ljung–Box and ACF analyses, these results suggest that the residual structure remains statistically stable overall, despite weak temporal dependence and variance fluctuations under certain operating conditions.
Looking at
Figure 12,
Figure 13 and
Figure 14 together, we can see that the residuals from the ELASTIC model on the test set are mostly centered around zero and tightly clustered. There are no large swings or dominant patterns in autocorrelation. We do notice a small but statistically significant temporal dependence, though it is minor. Overall, this confirms that the low RMSE and MAE values we observed earlier reflect not just numerical accuracy, but also a stable and consistent error structure. The residual distributions remained narrowly centered around zero with limited systematic bias, supporting the statistical consistency of the obtained forecasting behavior. Based on these findings, the ELASTIC model may be considered a statistically consistent reference approach for short-term wind speed forecasting under the evaluated conditions.
4.6. Regime-Based Performance Evaluation
We divided the test set into Low, Mid, and High wind speed ranges to see how the ELASTIC model performs across different conditions. For Low wind speeds, the model struggled the most. We observed an RMSE of 0.924 m/s and an MAE of 0.491 m/s, which are noticeably higher than the errors in the Mid and High ranges. The R² and EVS values are also very low (0.140 and 0.237, respectively), meaning the model explains only a small portion of the variance at low speeds. Skill RMSE is negative (−0.140), meaning the model does not outperform the persistence baseline in this regime, and relative errors are extremely high, with MAPE at 82.1% and sMAPE at 56.0%. The residuals in this range are widely spread, with some extreme negative values, suggesting that the model has difficulty capturing rapid changes or small signals when wind speeds are low. This indicates that the ELASTIC model is less reliable in low-wind conditions, likely due to higher turbulence and a low signal-to-noise ratio. It performs much better in Mid and High wind speed regimes, where errors are lower and predictions more stable.
We looked at
Table 9 to see how the ELASTIC model performs across different wind speed ranges, and the results show clear improvement outside the Low regime. In the Mid wind speed regime, the model does much better than at low speeds. The RMSE drops to 0.506 m/s and MAE to 0.352 m/s. R² and EVS are both 0.860, meaning the model explains most of the variance in this range. Relative errors are low and balanced, with MAPE ≈ 6.84% and sMAPE ≈ 6.87%. The residual boxplot shows a narrow spread with a median near zero, indicating stable and unbiased predictions. Skill RMSE is positive at 0.074, confirming the model outperforms the persistence baseline here. In the High wind speed regime, the explained variance improves further: R² and EVS rise to 0.928. RMSE is 0.632 m/s and MAE is 0.412 m/s, so absolute errors are slightly higher than in the Mid regime but remain low given the higher variability. Residuals are slightly more spread than in the Mid range, but they remain symmetrically distributed around zero, indicating no systematic bias. The positive Skill RMSE of 0.025 shows the model still performs better than the baseline. Overall, the ELASTIC model handles Mid and High wind speeds very well, producing stable, accurate, and unbiased predictions, while Low wind speeds remain the most challenging (
Figure 15).
As can be seen, the ELASTIC model shows more consistent performance under the Mid and High wind speed regimes. Based on the regime-dependent performance metrics presented in
Table 9, forecasting accuracy is noticeably higher in these regimes, while based on the behavior of residuals shown in
Figure 14, there is limited long-term autocorrelation and generally stable residuals. Under the Low wind speed regime, forecasting performance is noticeably worse. It is possible that the performance drop is connected with measurement uncertainty, turbulence intensity, and lower signal-to-noise ratios during calm weather. All of these factors lead to worse model performance. In general, it appears that the model is able to show more consistent short-term forecasting behavior under moderate- and high-wind-speed regimes, but other techniques might be necessary to improve performance in calm wind regimes.
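A regime-based evaluation of this kind can be sketched as below. Two points are assumptions, since the text does not state them: the regime boundaries (4 and 8 m/s here are placeholders) and the definition of Skill RMSE, taken here as 1 − RMSE_model / RMSE_persistence, so that positive values mean the model beats persistence.

```python
import math


def rmse(errors):
    """Root mean squared value of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def regime_of(speed, low_max=4.0, mid_max=8.0):
    """Assign an observed speed to a regime (hypothetical thresholds)."""
    if speed < low_max:
        return "Low"
    if speed < mid_max:
        return "Mid"
    return "High"


def regime_metrics(obs, model_pred, persist_pred, low_max=4.0, mid_max=8.0):
    """Per-regime RMSE and skill relative to persistence (assumed definition)."""
    groups = {}
    for o, m, p in zip(obs, model_pred, persist_pred):
        model_err, persist_err = groups.setdefault(
            regime_of(o, low_max, mid_max), ([], [])
        )
        model_err.append(o - m)
        persist_err.append(o - p)
    return {
        name: {"rmse": rmse(me), "skill_rmse": 1.0 - rmse(me) / rmse(pe)}
        for name, (me, pe) in groups.items()
    }


obs = [2.0, 3.0, 6.0, 10.0]
model = [2.5, 2.5, 6.0, 9.0]
persistence = [3.0, 2.0, 7.0, 12.0]
print(regime_metrics(obs, model, persistence))
```

Grouping by the observed speed, as done here, is one possible choice; the sketch also assumes the persistence RMSE is non-zero in every regime.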
4.7. Bland–Altman and Normality Analysis
We looked deeper into how well the ELASTIC model predictions match the observations using Bland–Altman analysis and a Q-Q plot, shown in
Figure 16 and
Figure 17. In the Bland–Altman plot (
Figure 16), the differences between predictions and observations are plotted against their average. The mean difference (bias) is +0.026 m/s, showing that the model has a very low bias. Most differences fall within the 95% limits of agreement (−1.213 to 1.265 m/s), confirming good agreement between predicted and observed values. Importantly, the differences do not trend upward or downward as wind speed increases, which indicates no visible proportional bias. This visual check does not rule out the heteroscedastic variance behavior detected by the ARCH-based diagnostics in Section 4.5. In short, the Bland–Altman analysis reinforces that the ELASTIC model is both accurate and consistent.
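The bias and 95% limits of agreement can be computed as in the sketch below. It assumes the conventional form bias ± 1.96 × SD with the sample standard deviation, and it takes differences as observed minus predicted to match the residual definition in Section 4.5. Both choices are assumptions, since the text does not state them.

```python
import statistics


def bland_altman(obs, pred):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diffs = [o - p for o, p in zip(obs, pred)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 1.0, 3.0, 3.0]
print(bland_altman(observed, predicted))
```

Under the same 1.96 × SD assumption, the reported limits of −1.213 and 1.265 m/s have a midpoint of 0.026 m/s and imply a standard deviation of about 0.632 m/s (half-width 1.239 divided by 1.96), which agrees with the reported bias and with the test RMSE of about 0.63 m/s.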
We also checked the normality of residuals in two ways. First, the Q-Q plot in
Figure 17 shows that most residuals in the center follow the reference line, indicating the distribution is roughly normal around the mean. At the tails, there are minor deviations, which are expected because wind speeds can change suddenly. These deviations occur only in a few cases, so they do not significantly affect the overall distribution. To confirm this statistically, we performed the Shapiro–Wilk normality test.
Table 8 shows a test statistic of 0.987 and a
p-value of 0.084, which is above the 0.05 significance level. This means we cannot reject the null hypothesis of normality, which is consistent with residuals that are approximately normally distributed. In short, both visual and statistical analyses indicate that the residuals are mostly normal, supporting the reliability of the ELASTIC model (
Table 10).
From the Bland–Altman and Q-Q plots, it is clear that the ELASTIC model shows very low bias, strong agreement between predictions and observations, and a residual distribution that is approximately normal in the central region. These results indicate that the model is statistically consistent and operationally reliable for short-term (10 min ahead) wind speed forecasting. In short, the ELASTIC model not only delivers accurate predictions but also maintains reliability and statistical consistency, making it a robust choice for practical forecasting applications.
4.8. Feature Importance and Sensitivity Analysis
We carried out a permutation feature importance analysis to understand which input variables most influence the ELASTIC model’s wind speed predictions. Basically, we shuffled each predictor one at a time and observed how much the prediction error increased; the bigger the increase, the more important the variable. This approach helps us interpret the model without changing its structure and also highlights the physical drivers behind wind speed variability. The results of this analysis, summarized in
Table 11, show which environmental variables contribute the most to accurate forecasting, giving both practical and theoretical insights into the model’s behavior.
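The shuffling procedure described above can be sketched as follows. The scoring by RMSE increase follows the description in the text, while the number of repeats, the random seed, the function names, and the toy linear predictor are assumptions for illustration.

```python
import math
import random


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean RMSE increase when each feature column is shuffled in turn.

    `predict` maps one feature row (a list) to a prediction; `X` is a list
    of rows. The fitted model itself is never modified.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model that depends only on the first feature (e.g., previous wind speed)
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
print(permutation_importance(lambda row: 2.0 * row[0], X, y))
```

For a time-series task, the shuffle should be applied to held-out data only, so that the importance reflects forecasting error rather than training fit.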
Looking at the ELASTIC model’s predictor importance, we can see that previous wind speed clearly dominates, while other environmental variables play smaller but meaningful roles. This highlights that short-term wind speed forecasting relies heavily on temporal persistence, with atmospheric conditions providing additional fine-tuning (
Figure 18).
From Table 11, it is clear that previous wind speed alone explains almost 40% of the model’s performance, confirming that short-term forecasts rely heavily on recent wind trends. Solar irradiation and air temperature together contribute over 35%, highlighting the role of environmental conditions in shaping wind dynamics. While humidity, pressure, and wind direction have smaller effects, they still provide meaningful adjustments that improve the prediction. This shows that the model combines temporal continuity with physical drivers of wind, making it both accurate and interpretable. In practice, short-term predictions are mainly guided by recent wind patterns, while environmental variables help fine-tune the forecasts for subtle changes.
4.9. Statistical Model Comparison Using Diebold–Mariano Test
Beyond the usual performance metrics, we also compared the forecasting models statistically using the Diebold–Mariano (DM) test. This test checks whether the difference in predictive accuracy between two models is statistically meaningful. Essentially, it examines the loss differential between the errors of the models to see if the average difference is significantly different from zero. In this study, we used the squared error as the loss function, which aligns with the RMSE metric applied throughout the analysis.
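Formally, for two models i and j, the loss differential at time t is defined as follows:

$$d_t = L(e_{i,t}) - L(e_{j,t}),$$

where L(⋅) is the loss function, and e_{i,t} and e_{j,t} are the forecast errors of models i and j, respectively. With the squared-error loss used here, L(e) = e².
The DM statistic is then calculated as follows:

$$\mathrm{DM} = \frac{\bar{d}}{\sqrt{\widehat{\operatorname{Var}}(\bar{d})}}, \qquad \bar{d} = \frac{1}{T}\sum_{t=1}^{T} d_t,$$

where T is the number of test observations and the denominator is the estimated standard error of the mean loss differential.
This provides a quantitative measure to determine whether one model is significantly more accurate than another.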
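A minimal sketch of this comparison is given below. It takes the loss differential as the difference of squared errors of models i and j, and it makes two simplifying assumptions not stated in the text: for a one-step-ahead horizon only the lag-0 variance of the loss differential is used, and the p-value comes from the asymptotic standard normal distribution. The function name `dm_test` and the toy error series are hypothetical.

```python
import math


def dm_test(errors_i, errors_j):
    """Diebold-Mariano statistic and two-sided p-value (squared-error loss).

    A negative statistic means model i has the lower average loss.
    """
    d = [ei * ei - ej * ej for ei, ej in zip(errors_i, errors_j)]
    n = len(d)
    d_bar = sum(d) / n
    gamma0 = sum((v - d_bar) ** 2 for v in d) / n  # lag-0 autocovariance
    if gamma0 == 0.0:
        raise ValueError("loss differential has zero variance")
    dm = d_bar / math.sqrt(gamma0 / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|dm|))
    return dm, p_value


errors_model_i = [1.0, -1.0, 2.0, -2.0]
errors_model_j = [0.5, -0.5, 1.0, -1.0]
print(dm_test(errors_model_i, errors_model_j))
```

For multi-step horizons, the variance estimate would normally include autocovariance terms up to lag h − 1, which this sketch omits.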
Looking at
Figure 19, the ELASTIC model demonstrates statistically significant improvements over SVR, RF, and partially over LSBOOST, while the differences relative to ENS and the persistence baseline remain statistically insignificant. These findings indicate that ELASTIC provides competitive and statistically consistent forecasting performance under ultra-short-term forecasting conditions, although its advantage over persistence-based approaches remains limited at the evaluated forecasting horizon.
The results of the Diebold–Mariano test are summarized in
Table 12. Looking at the comparisons, we see that ELASTIC significantly outperforms SVR and RF (
p < 0.01), confirming that these models are much less accurate for short-term wind speed forecasting. A marginally significant difference is also observed between ELASTIC and LSBOOST (
p ≈ 0.048), indicating that ELASTIC still performs slightly better, but the advantage is smaller. ELASTIC shows no significant difference compared to the ENS model or the persistence baseline. This observation is also consistent with the similar RMSE values reported in
Table 4. For very short-term forecasts, it performs similarly to ensemble and baseline models. This indicates that the ELASTIC model maintains statistically consistent forecasting behavior under ultra-short-term forecasting conditions. The DM test results support the use of ELASTIC as the primary model for the subsequent residual and regime-based analyses due to its balanced statistical performance across multiple evaluation criteria.
4.10. Forecast Error Decomposition Analysis
To better understand where the prediction errors come from, we performed a forecast error decomposition. Instead of just looking at the overall error size, we split the error into three parts: systematic bias, variance differences, and random fluctuations. This helps show whether errors come from model bias or from the natural randomness of wind. Using the MSE decomposition framework, the total error can be written as the sum of these three components:

$$\mathrm{MSE} = E_{\text{bias}} + E_{\text{variance}} + E_{\text{random}}.$$
Here, the bias term captures systematic over or underprediction, the variance term measures how much the predicted variability differs from the observed, and the random error reflects unpredictable changes in wind speed. This approach allows us to see not just how big the errors are, but why they occur, which is useful for improving model design and understanding limitations in short-term wind forecasting.
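One common decomposition that matches the three terms described here is the Theil-type identity MSE = (mean(ŷ) − mean(y))² + (σ_ŷ − σ_y)² + 2(σ_ŷ σ_y − cov(y, ŷ)), using population moments. Whether the study used exactly this form is an assumption. The sketch below computes the three parts and their shares, with a hypothetical function name and toy data.

```python
import math


def mse_decomposition(obs, pred):
    """Split the MSE into bias, variance-difference, and random parts."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    parts = {
        "bias": (mean_p - mean_o) ** 2,
        "variance": (sd_p - sd_o) ** 2,
        "random": 2.0 * (sd_p * sd_o - cov),
    }
    parts["mse"] = sum((p - o) ** 2 for o, p in zip(obs, pred)) / n
    return parts


result = mse_decomposition([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 4.0, 4.0])
shares = {k: 100.0 * result[k] / result["mse"] for k in ("bias", "variance", "random")}
print(result)
print(shares)
```

The three parts always sum to the MSE under this identity, so the percentage shares add up to 100%.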
Figure 20 shows that most of the prediction error comes from random atmospheric variability. Systematic bias and variance differences play only a small role in the total error. This means the ELASTIC model is not systematically over- or underpredicting, and it captures the variability of wind well. Most remaining errors are due to natural, unpredictable fluctuations in the atmosphere, which are difficult to eliminate.
According to
Table 13, the ELASTIC model shows minimal systematic bias, which accounts for only 1.8% of the total forecast error. The variance difference component also remains limited (7.6%), indicating that the predicted wind speed variability is generally consistent with the observed dynamics. The majority of the forecast error (90.6%) is associated with random fluctuations, which reflects the inherently variable and stochastic nature of wind behavior. These results indicate that the proposed framework delivers accurate and statistically consistent forecasts over the short-term test period. Operational decision-making for wind energy scheduling, reserve planning, and short-term grid balancing may benefit from such stable short-term forecasts.
5. Discussion
In this study, we assessed the short-term (Δt = 10 min) wind speed forecasting ability of the ELASTIC model using a multi-dimensional evaluation framework, rather than relying solely on traditional metrics like RMSE or MAE. We found that while conventional error metrics are useful, they do not fully capture the time-dependent behavior, error structure, or regime-specific performance of the model. The time-series analysis indicated that the ELASTIC model generally captures fluctuating wind speed behavior, including relatively rapid variations, while maintaining reasonable temporal consistency with the observed measurements. These observations indicate statistically consistent forecasting behavior under the evaluated forecasting conditions. Residual analysis further supports this interpretation. Additional residual diagnostics were also performed to evaluate statistical assumptions underlying the forecasting errors. Formal normality assessment using the Jarque–Bera test, residual stationarity analysis using the Augmented Dickey–Fuller test, and heteroscedasticity analysis using ARCH-type diagnostics were conducted to complement the graphical residual analyses. These tests further supported the statistical consistency of the residual structure under the proposed leakage-free forecasting framework. Errors were mostly concentrated around zero, varied in a narrow range, and did not show strong systematic trends, while only weak but statistically significant temporal dependence was detected. Although the Ljung–Box test indicated weak but significant temporal dependence, the autocorrelation function confirmed that its amplitude is minimal. This suggests that the low RMSE and MAE values are not artifacts of numerical results but are structurally backed, reinforcing the statistical reliability of ELASTIC. Bland–Altman and Q-Q plots also indicated minimal bias (~0.026 m/s) and an approximately normal error distribution, with minor deviations at extreme values. 
This is particularly valuable for operational use, as it ensures that large errors are unlikely, unlike some high accuracy models that are sensitive to extreme wind conditions [
47,
48]. Regime-based evaluation revealed that forecasting accuracy strongly depends on wind speed. At low wind speeds, the model shows high errors (RMSE = 0.924 m/s, MAPE ≈ 82%), likely due to measurement uncertainty, a low signal-to-noise ratio, and turbulence, consistent with previous studies highlighting the chaotic nature of near-calm conditions [
49,
50]. In contrast, the model performs much better in mid- and high-wind-speed regimes. For the mid regime, RMSE = 0.506 m/s and R² = 0.860; for the high regime, RMSE = 0.632 m/s and R² = 0.928. These results indicate statistically consistent forecasting behavior, which is particularly important for high wind speeds relevant to wind energy generation. In comparison with the PERSIST model, one can observe that a sizable proportion of the predictive power is due to temporal persistence, while the ELASTIC model adds additional predictive power, especially at mid and high wind speed levels. Persistence models are treated as basic references in the ultra-short-term wind forecasting area since wind speed processes possess very pronounced temporal autocorrelation structure.
The 10 min forecasting horizon has been chosen due to its applicability for short-term decision-making in wind energy, which implies fast scheduling adjustments, reserve revisions, and near-real-time balancing. In this regard, temporal persistence naturally forms the major element of ultra-short-term forecasts. This is the reason why the persistence benchmark is considered particularly robust in the present case study. With respect to longer forecasting horizons, the effects of temporal autocorrelation are likely to diminish. It means that the relative value of nonlinear and multivariate methods would increase. However, at the moment, only short-term forecast horizons are under consideration, and the leakage-free methodology developed herein might be revisited in the future for longer forecasting horizons.
Therefore, even relatively small gains beyond persistence could have practical significance, provided that these improvements were coupled with better statistical consistency, lower systematic errors, greater operational stability, and diagnostic adequacy of residuals.
The primary value of the proposed framework lies not only in numerical RMSE reduction, but also in providing a forecasting framework with statistically consistent and interpretable behavior under realistic forecasting conditions. The permutation importance results suggest that the model does more than simply replicate past wind trends, since environmental variables also contribute to the forecasts, although the Diebold–Mariano test shows that the resulting gain over persistence is not statistically significant at this horizon. In addition, the performance features described above provide insights regarding the model’s complexity and applicability to short-term wind forecasting problems. Because the prediction horizon considered in this research is 10 min, the forecasting process is considerably influenced by temporal persistence and stable autoregressive behavior. In this context, much of the predictive power in the considered forecasting task can be captured through regularized linear relations using lagged features and environmental covariates. Even though recent studies demonstrate the capabilities of deep learning architectures such as LSTM-, GRU-, and Transformer-based models for complex forecasting tasks, these approaches typically require more computational resources, larger amounts of training data, and significant hyperparameter optimization effort. In contrast, the ELASTIC model showed stable and consistent results in this study without computationally extensive optimization and without leakage during chronological validation. Computational complexity benchmarks and real-time latency analysis were not performed in this study; nevertheless, the ELASTIC algorithm is expected to offer advantages in terms of computational cost, robust training, and limited hyperparameter optimization compared to more complex architectures.
It is possible that these characteristics would provide advantages in using this algorithm in an environment requiring short-term model updates and operational forecasting. The results from the forecasting indicate stability, consistency, and physical interpretability. The analysis on the error structure (residual distribution, autocorrelation, and Bland–Altman) shows low bias and nearly normal distribution, with only minor variations caused by wind dynamics.
The results of our study demonstrate that proper model validation requires error-structure analysis and regime-dependent evaluation in addition to global metrics. The ELASTIC model performs well at medium and high wind speeds, whereas its performance at low wind speeds leaves room for improvement, for example through automated regime-specific calibration, hybrid modeling, and multi-step forecasting methods. Subject to this low-wind caveat, the proposed framework is suitable for operational deployment across the observed wind speed range.
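A regime-dependent evaluation of the kind described above can be sketched as follows. The regime boundaries of 4 and 10 m/s, the regime labels, and the function name regime_metrics are hypothetical choices made for illustration; they are not the thresholds used in this study.

```python
import numpy as np

def regime_metrics(obs, pred, edges=(4.0, 10.0), labels=("low", "medium", "high")):
    """RMSE, MAE, and bias per wind-speed regime, binned on the observed speed."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # digitize returns 0 for obs < edges[0], 1 for edges[0] <= obs < edges[1], 2 otherwise
    idx = np.digitize(obs, edges)
    out = {}
    for i, name in enumerate(labels):
        mask = idx == i
        if not mask.any():
            continue  # skip regimes with no samples
        err = pred[mask] - obs[mask]
        out[name] = {
            "n": int(mask.sum()),
            "rmse": float(np.sqrt(np.mean(err ** 2))),
            "mae": float(np.mean(np.abs(err))),
            "bias": float(np.mean(err)),
        }
    return out

# Toy values for illustration only.
obs = [2.0, 3.0, 5.0, 8.0, 12.0]
pred = [3.0, 3.0, 5.0, 9.0, 10.0]
for name, m in regime_metrics(obs, pred).items():
    print(name, m)
```

Binning on the observed rather than the predicted speed is a design choice; binning on the prediction may be preferable when the table is meant to guide operational use, where only the forecast is known in advance.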
The present study focuses primarily on deterministic point forecasting under a leakage-free evaluation framework. Future research should address three aspects: formal propagation of uncertainty from sensor measurements to machine learning predictions, probabilistic uncertainty quantification, and calibration-aware forecasting methods. The operational reliability of short-term wind forecasting systems could be enhanced by combining prediction intervals, uncertainty-aware ensemble methods, and probabilistic forecasting frameworks.
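As one possible starting point for the prediction intervals mentioned above, the sketch below wraps any point forecaster with a symmetric interval whose half-width is a quantile of absolute residuals on a held-out calibration block, in the style of split conformal prediction. This is an illustration of a future direction, not part of the present study. The function name, the synthetic data, and the 90% target coverage are assumptions, and the coverage guarantee of the method relies on exchangeability, which autocorrelated wind series satisfy only approximately.

```python
import math
import numpy as np

def conformal_interval(cal_obs, cal_pred, new_pred, coverage=0.9):
    """Symmetric interval new_pred +/- q, with q the ceil((n+1)*coverage)-th
    smallest absolute residual on the calibration block."""
    scores = np.sort(np.abs(np.asarray(cal_obs, dtype=float)
                            - np.asarray(cal_pred, dtype=float)))
    n = len(scores)
    k = math.ceil((n + 1) * coverage)
    q = scores[k - 1] if k <= n else np.inf   # too few calibration points: unbounded
    new_pred = np.asarray(new_pred, dtype=float)
    return new_pred - q, new_pred + q

# Synthetic calibration and test blocks (assumption); in practice the calibration
# block should lie chronologically between the training and test periods.
rng = np.random.default_rng(3)
cal_obs = rng.uniform(2.0, 15.0, 200)
cal_pred = cal_obs + rng.normal(0.0, 0.6, 200)
test_obs = rng.uniform(2.0, 15.0, 1000)
test_pred = test_obs + rng.normal(0.0, 0.6, 1000)

lower, upper = conformal_interval(cal_obs, cal_pred, test_pred, coverage=0.9)
covered = float(np.mean((test_obs >= lower) & (test_obs <= upper)))
print(f"empirical coverage on the test block: {covered:.3f}")
```

A constant half-width ignores the regime dependence of the errors; conditioning the residual quantile on the wind-speed regime would be a natural refinement.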
Our multi-faceted evaluation shows that a holistic assessment characterizes wind forecasting performance better than a single global metric, in line with previous research [47,48,49,50,51,52]. The ELASTIC framework demonstrates statistically consistent short-term wind forecasting behavior under the evaluated conditions.
6. Conclusions
This paper has presented an information-leakage-free, machine learning-based framework for short-term (10 min ahead) wind speed forecasting using operational data from a wind farm in the Bandırma/Balıkesir region of Türkiye. The following machine learning models were compared under leakage-free chronological validation: SVR, RF, LSBOOST, ENS, and ELASTIC.
Among the considered models, the ELASTIC model demonstrated statistically consistent and competitive forecasting performance on the test set, with RMSE ≈ 0.63 m/s, MAE ≈ 0.40 m/s, and R² = 0.977. Residual diagnostics, Bland–Altman analysis, Q-Q plots, and regime-based assessments indicate that the model shows low systematic bias and statistically consistent forecasting behavior, especially in the medium- and high-wind-speed regimes. Forecast error decomposition further suggests that natural atmospheric variability is the main source of forecast error, while systematic bias plays only a minor role. The results also indicate that ultra-short-term wind forecasting is highly sensitive to temporal persistence because of the very strong autocorrelation in the time series. Under such circumstances, regularized linear models with lagged features and environmental predictors may deliver performance comparable to deep learning-based architectures while potentially requiring lower computational complexity.
The proposed framework nevertheless has limitations. In particular, this work addresses only deterministic point forecasting at a 10 min ahead prediction horizon and does not include uncertainty quantification, uncertainty propagation, or real-time computational analysis. Additionally, forecasting accuracy is slightly lower at low wind speeds, probably owing to increased turbulence and signal noise. Potential research directions include longer forecast horizons, regime-specific calibration, hybrid architectures, and more comprehensive comparisons with advanced deep learning and Transformer-based architectures under leakage-free evaluation. Real-time forecasting performance and operation may also be a useful area for future exploration.
To summarize, the proposed information-leakage-free ELASTIC framework demonstrates statistically consistent and interpretable short-term wind forecasting behavior under the evaluated operational conditions.