This section presents and discusses the findings obtained from all algorithms under the different experimental setups. The models were trained and validated using data from the first two years, with parameters optimized accordingly, and then tested on the final year of data, 2019. Testing was conducted across multiple days and diverse weather conditions to evaluate model performance under realistic operating scenarios. In addition, the models were assessed over several forecasting horizons, including 12 h, 1 d, and 3 d ahead predictions, all at 15 min intervals.
For clarity and comparability with related studies, this section is divided into two subsections addressing wind speed and global horizontal irradiance, respectively. The dataset was divided chronologically into three yearly subsets: the first year was used for training, the second year for validation and hyperparameter selection, and the third year for testing and performance comparison.
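This chronological, non-shuffled split can be sketched as follows; the record layout and the year-to-subset mapping are illustrative assumptions, not the authors' code.

```python
from datetime import datetime

def chronological_split(records):
    """Split (timestamp, value) pairs into train/validation/test
    subsets by calendar year: 2017 -> train, 2018 -> validation,
    2019 -> test. Records outside these years are discarded."""
    split = {"train": [], "val": [], "test": []}
    year_to_subset = {2017: "train", 2018: "val", 2019: "test"}
    for ts, value in records:
        key = year_to_subset.get(ts.year)
        if key is not None:
            split[key].append((ts, value))
    return split

# Toy records at arbitrary timestamps (values are illustrative).
data = [
    (datetime(2017, 6, 1, 12, 0), 5.2),
    (datetime(2018, 6, 1, 12, 0), 4.8),
    (datetime(2019, 6, 1, 12, 0), 6.1),
]
split = chronological_split(data)
```

Keeping the years in strict temporal order (rather than shuffling) prevents information from the test year leaking into training, which matters for time-series evaluation.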
4.1. Wind Speed Forecasting
Each model was trained for 100 epochs. For the RNN-based models, the training time was approximately 6 s per epoch, whereas the CNN required about 3 s per epoch in the univariate setup. When endogenous and exogenous input variables were combined, training times remained similar, indicating that the additional inputs did not significantly increase computational cost. Overall, training times were broadly comparable across models and input configurations, with the CNN offering the best computational efficiency.
In all experiments, the training loss decreased rapidly within the first ten epochs and then stabilized at approximately 0.002. The validation loss followed a similar trend and converged to a comparable value with only minor differences. This behavior indicates that the models were neither overfitted nor underfitted, regardless of the input configuration.
Figure 10 presents the training and validation loss curves of each DL model as functions of epochs when wind speed and time-based signals are used as inputs. During the initial epochs, the validation loss is lower than the training loss, after which the training loss becomes slightly lower than the validation loss. This pattern is consistent with findings reported in the literature [48] and reflects a stable and well-fitted training process.
Figure 11 compares the day-ahead wind speed predictions generated by five DL models, RNN, GRU, LSTM, BiLSTM, and CNN, with the actual wind speed over a 24 h period. The horizontal axis represents the time of day, while the vertical axis shows wind speed in meters per second (m/s). The observed wind speed serves as a reference for evaluating the accuracy of the predicted profiles.
After training and validating the models using data from the first two years, the final year was reserved exclusively for testing. The testing procedure was conducted in two phases. The first phase evaluated model performance across different forecasting horizons to identify the strengths and limitations of each DL algorithm and input configuration. This phase focused on the last three days of 2019 and considered key forecasting horizons of 12 h ahead, 1 d ahead, and 3 d ahead, all generated at 15 min intervals.
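At a 15 min resolution, a 12 h horizon corresponds to 48 prediction steps and a 3 d horizon to 288. One common way to produce such multi-step forecasts is a recursive strategy, sketched below with a placeholder persistence predictor; the paper does not state its exact multi-step mechanism, so this is an illustrative assumption.

```python
def forecast_recursive(model_step, history, horizon_hours):
    """Generate multi-step forecasts at 15 min resolution by feeding
    each prediction back into the input window (recursive strategy).
    `model_step` is any one-step-ahead predictor."""
    steps = int(horizon_hours * 60 // 15)  # e.g. 12 h -> 48 steps
    window = list(history)
    preds = []
    for _ in range(steps):
        y_hat = model_step(window)
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # slide the window forward
    return preds

# Placeholder one-step model: persistence (repeat the last value).
persistence = lambda w: w[-1]
preds = forecast_recursive(persistence, [5.0, 5.1, 5.2], horizon_hours=12)
```

A direct (one-shot) multi-output strategy is the usual alternative; recursive forecasting is shown here only because it makes the horizon-to-step arithmetic explicit.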
Figure 12 presents a visual comparison of 1 d ahead wind speed forecasts produced by the various DL models, namely Simple RNN, GRU, LSTM, BiLSTM, and CNN with different kernel sizes (KS = 1, 2, and 4). Each subplot compares the predicted wind speed profile with the measured values during testing. The prediction horizon (iterations) is depicted on the horizontal axis, and wind speed in meters per second (m/s) on the vertical axis. All models reproduce the measured data closely, showing that the DL architectures capture the temporal dynamics and variability of wind speed. The substantial overlap between predicted and observed curves indicates high predictive accuracy. Minor deviations are observed during periods of rapid change, particularly at peak wind speeds and sharp transitions, but these differences remain limited. The strong forecasting performance is further validated by the quantitative performance metrics comparison given in
Table 5,
Table 6 and
Table 7 for three days, one day, and 12 h, respectively. These tables report the error metrics corresponding to each forecasting horizon evaluated in the first testing phase.
Figure 13 presents the percentage error of wind speed forecasts generated by the evaluated DL models across the testing iterations. For all models, the errors are centered around zero and remain within a narrow range, indicating unbiased and stable predictions without systematic overestimation or underestimation. The similar error patterns observed across architectures confirm the robustness and consistent forecasting performance of the proposed models.
Evidently, RMSE generally decreases when time-based signals are added to wind speed as input features. Similarly, MAPE presents a general decrease relative to the univariate configuration, indicating that multivariate models provide more accurate predictions relative to the observed values. Error metrics remain low across all forecasting horizons, and the corresponding correlation coefficients demonstrate strong agreement between predicted and actual values, confirming the reliability of the proposed models.
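The two headline error metrics can be computed as follows; this is a minimal sketch with illustrative toy values, not the authors' evaluation code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large deviations quadratically."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Undefined when any observed value is zero (or near zero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

# Toy wind speed observations and forecasts in m/s (illustrative).
y_true = [4.0, 5.0, 8.0]
y_pred = [4.4, 4.5, 8.0]
```

Because RMSE is scale-dependent while MAPE is relative, reporting both (plus a correlation measure) gives complementary views of forecast quality.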
A comparison of the two multivariate input configurations yields several important observations. Overall, DL models using time-based signals or meteorological variables consistently outperform the univariate baseline. The magnitude of improvement is comparable between the two multivariate approaches. However, slightly lower MAPE values are typically achieved at shorter forecasting horizons when meteorological variables are used as inputs. In contrast, for longer forecasting horizons, time-based signals tend to provide greater performance gains than meteorological inputs. These findings indicate that time-related features can be especially effective for improving long-term wind speed prediction. Across model architectures, predictive performance is similar, with the CNN showing a minor advantage in training efficiency and the LSTM achieving slightly lower error metrics.
The second testing stage evaluates model robustness using twelve representative days, one from each month, to assess seasonal variability throughout the year.
Table 8 provides an overview of the RMSE values of each of the chosen days using various models and input configurations. The results indicate that, in all months, models incorporating wind speed and time-based signals consistently achieve lower RMSE values than models using wind speed alone. Furthermore, the improvements obtained with time-based signals are more consistent than those achieved with meteorological inputs. A comparison with other studies is presented in
Table 9. This finding demonstrates that augmenting wind speed with time-based exogenous features provides a robust and accurate forecasting framework across different seasons. Both RNN and CNN models exhibit comparable predictive performance, although the CNN requires less training time.
Figure 14 illustrates the day-ahead wind speed forecasts generated by all DL models.
Although
Figure 14 illustrates a representative daily wind speed forecast for visualization, model performance was evaluated comprehensively across multiple forecasting horizons and seasonal conditions.
Table 5,
Table 6 and
Table 7 present quantitative metrics (RMSE, MAPE, and R2) for 12 h, 1 d, and 3 d ahead predictions, while
Table 8 summarizes monthly RMSE values for twelve representative days. These results confirm that the observed performance improvements are consistent across different time scales and seasonal regimes, rather than being confined to a single daily example.
4.2. GHI Forecasting
Similar to the wind speed experiments, all models were trained for 100 epochs. The training dataset covered an entire year, enabling the models to capture both seasonal and diurnal patterns in the data. Among the evaluated architectures, the CNN demonstrated faster convergence and smoother loss trajectories for both training and validation datasets. For both univariate and multivariate configurations, the CNN required approximately 2 s per epoch, whereas the RNN and BiLSTM required about 4 and 5 s per epoch, respectively. This underscores the computational efficiency of the CNN architecture relative to the other DL models.
After satisfactory convergence, testing was conducted using the 2019 dataset in two stages. In the first stage, GHI was predicted across multiple forecasting horizons using three input configurations: (a) a univariate baseline model, (b) a multivariate model incorporating time-based signals, and (c) a multivariate model using meteorological variables. This design enables direct comparison among the baseline approach, the proposed time-based configuration, and conventional multivariate meteorological models reported in the literature.
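The difference between the three input configurations reduces to which columns populate each input window. A minimal sliding-window sketch of how such supervised pairs can be built (the row layout and lookback length are assumptions for illustration):

```python
def make_windows(rows, lookback):
    """Build supervised (X, y) pairs from time-ordered feature rows.
    Each row is [target, *exogenous_features]; the input window is the
    previous `lookback` rows and the label is the next target value.
    With single-element rows this is the univariate configuration;
    appending time signals or meteorological variables to each row
    yields the multivariate configurations."""
    X, y = [], []
    for i in range(lookback, len(rows)):
        X.append(rows[i - lookback:i])
        y.append(rows[i][0])
    return X, y

# Univariate configuration: each row holds only the GHI value (toy data).
rows = [[v] for v in [0, 10, 30, 60, 90, 120]]
X, y = make_windows(rows, lookback=3)
```

The same windowing routine serves all three configurations, so any accuracy difference between them is attributable to the input features rather than the data pipeline.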
The training loss stabilized after a few epochs, converging to approximately 0.017 for univariate models and 0.015 for multivariate models with time-based signals. The validation loss closely followed the training loss, showing similar trends and plateauing slightly above it. This pattern indicates good generalization performance and suggests that the models are well suited for GHI forecasting.
Figure 15 presents the training and validation loss curves for all DL models using time-based signal inputs.
The Pearson correlation between GHI and the engineered sine and cosine time signals (time-of-day and day-of-year components) is shown in
Figure 16. The moderate correlations between GHI and the daily signals indicate significant temporal cycles related to the diurnal and seasonal variations in solar irradiance. In contrast, the annual signals exhibit only weak linear correlation with GHI, suggesting a lack of substantial direct linear relationships. However, these features were retained during the feature-selection process to enable the DL models to leverage potential nonlinear relationships, thereby complementing the meteorological and irradiance-based inputs for GHI forecasting.
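The engineered cyclical time signals and the Pearson analysis can be sketched as follows; the period lengths (1440 min per day, 365 days per year) are standard assumptions for illustration.

```python
import math

def cyclical_features(day_of_year, minute_of_day):
    """Encode time as sine/cosine pairs so boundaries wrap smoothly:
    minute 1439 sits next to minute 0, and day 365 next to day 1."""
    return {
        "sin_day":  math.sin(2 * math.pi * minute_of_day / 1440),
        "cos_day":  math.cos(2 * math.pi * minute_of_day / 1440),
        "sin_year": math.sin(2 * math.pi * day_of_year / 365),
        "cos_year": math.cos(2 * math.pi * day_of_year / 365),
    }

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

feats = cyclical_features(day_of_year=1, minute_of_day=0)
```

Note that a linear measure like Pearson's r can understate the value of these signals: GHI depends on them through a nonlinear daylight cycle, which is why the weakly correlated annual components were still kept as model inputs.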
The predicted and measured GHI time series using various DL models, including recurrent and convolutional architectures, are presented in
Figure 17. The model predictions closely follow the observed GHI patterns, capturing diurnal and seasonal variations, and exhibit high coefficients of determination greater than 98% with very similar RMSE values. Minor deviations occur primarily during rapid irradiance changes, indicating similar and stable performance across the models.
The performance of multivariate time-signal models was tested across different weather categories to evaluate the robustness of time-based features in predicting GHI. RMSE and the coefficient of determination were reported for each test case.
Figure 18 presents the training and validation loss curves of all DL models using a set of five features as multivariate inputs, selected based on Pearson correlation analysis. All models converge rapidly and exhibit stable loss behavior, with training and validation losses remaining close, indicating good generalization. CNN-based models show slightly faster convergence and more favorable loss profiles than recurrent models.
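The correlation-based selection of the five inputs might look like the following sketch; the candidate feature names and coefficients are hypothetical, not values from the paper.

```python
def select_top_k(corrs, k=5):
    """Rank candidate inputs by absolute Pearson correlation with the
    target and keep the k strongest. Sign is ignored because strongly
    negative correlations are as informative as positive ones."""
    ranked = sorted(corrs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical correlations of candidate features with GHI.
corrs = {
    "temperature": 0.62,
    "humidity": -0.55,
    "sin_day": 0.47,
    "wind_speed": 0.21,
    "cos_year": -0.12,
    "pressure": 0.08,
}
top5 = select_top_k(corrs, k=5)
```

A filter method like this is cheap and model-agnostic, but, as noted above for the annual signals, it can discard features whose relationship with the target is nonlinear.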
Figure 19 compares predicted and actual GHI during the testing period of selected multivariate inputs. Each model reflects temporal development and seasonal GHI fluctuations, achieving high coefficients of determination (above 98%) and similar RMSE values. Minor deviations occur primarily during rapid changes in irradiance.
Figure 20 presents a zoomed-in view of a typical clear-sky period to assess the accuracy of short-term predictions. The measured GHI profile is closely tracked by all models, with minimal error and R2 values exceeding 99%. These results confirm that a five-feature multivariate configuration provides sufficient information for reliable GHI forecasting across different model architectures. MAPE is not applicable due to the presence of several near-zero values in the dataset [
65].
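Why MAPE is unusable here, and the metric reported instead, can be illustrated as follows; the sample GHI profile is illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Near-zero night-time GHI values make MAPE explode: a 1 W/m^2 error on
# a 0.01 W/m^2 observation is a 10,000% relative error (and division by
# an exact zero is undefined), so R^2 and RMSE are reported instead.
ghi_true = [0.0, 50.0, 400.0, 850.0, 400.0, 50.0]   # W/m^2, toy values
ghi_pred = [2.0, 55.0, 390.0, 840.0, 410.0, 45.0]
score = r_squared(ghi_true, ghi_pred)
```

Excluding night-time samples or using a denominator floor are common workarounds for MAPE, but dropping it entirely, as done here, avoids arbitrary thresholds.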
Table 10,
Table 11 and
Table 12 report the error metrics for the different forecasting horizons.
Table 13 presents the evaluation metrics for GHI forecasting under different weather conditions, namely cloudy, partially cloudy, sunny, and clear. All models achieve lower errors and higher correlation values under sunny and clear conditions, whereas performance declines under cloudy and partially cloudy conditions due to high irradiance variability. Time-based multivariate models demonstrate consistent performance across weather classes, although meteorological inputs provide a slight advantage in cloudy conditions.
Although
Figure 20 presents a clear-sky day for visual illustration, model performance was systematically evaluated across multiple weather regimes.
Table 13 reports RMSE and R2 values for cloudy, partially cloudy, sunny, and clear conditions. As expected, forecasting errors increase under cloudy and partially cloudy conditions due to greater irradiance variability and rapid atmospheric fluctuations. Nevertheless, the proposed framework maintains stable predictive performance across all regimes, demonstrating robustness beyond favorable clear-sky scenarios. This comprehensive evaluation ensures that the reported results are not biased toward less challenging forecasting conditions.
Figure 21 presents scatter plots of predicted versus observed GHI for all DL models, with data points tightly clustered around the 1:1 line across the full irradiance range. Slight dispersion at higher GHI values reflects increased variability under peak solar intensity, while the overall clustering confirms the effectiveness of the multivariate forecasting framework across different model architectures.
The results indicate that incorporating meteorological variables provides greater benefits than time-based signals under cloudy and partially cloudy conditions. However, including time signals as exogenous variables enhances GHI forecasting accuracy in sunny and clear weather. A comparison with related studies is presented in
Table 14. Nevertheless, performance under cloudy conditions remains limited across all models and may require reconsideration of the algorithm and training data.
The proposed method can support smart city applications by enabling accurate intra-hour renewable forecasting using timestamp information alone. The framework is especially suitable for scalable deployment in cities with heterogeneous infrastructure, which helps achieve equitable, efficient, and smart energy systems to support sustainable urban development. Although the proposed models demonstrate stable performance across multiple forecasting horizons, 12 h, 1 d, and 3 d ahead, and seasonal variations within the Izmir dataset, the evaluation is confined to a single geographic location. The chronological train, validation, and test split across three consecutive years, 2017 to 2019, provides evidence of intra-site temporal robustness and interannual consistency. However, cross-site spatial generalization under different climatic regimes is not explicitly assessed in this study.
Notably, this study focuses on a controlled comparison of DL architectures and input feature configurations. Physical and hybrid forecasting baselines, including persistence models, numerical weather prediction models, clear-sky analytical models, and physics-informed machine learning frameworks, are not included in the present evaluation. The objective is to isolate the contribution of structured temporal feature engineering within data-driven architectures rather than to perform cross-paradigm benchmarking. Incorporating standardized physical baselines would offer additional operational insight and is therefore reserved for future research.