1. Introduction
The rapid expansion of photovoltaic (PV) systems across the globe underscores the pressing need for precise and efficient methods of forecasting PV power output [
1,
2]. As these systems become integral to the power grid, accurate predictions are essential not only for maintaining grid stability but also for optimizing energy distribution and consumption [
3,
4]. While significant progress has been made in the field of PV power forecasting, challenges such as limited historical data and the inherently unpredictable nature of solar irradiance make the task of creating robust forecasting models exceedingly difficult [
5]. Dynamic fluctuations in solar energy availability further complicate the development of models that can reliably predict PV power output over various time scales [
6]. Existing studies also often exhibit limitations such as reliance on a single deep learning architecture, complex preprocessing requirements, potential training instability, and insufficient capability to capture long-range temporal dependencies.
Traditional forecasting methods, including statistical models, machine learning algorithms, and physical simulations, have been instrumental in establishing a fundamental understanding of how PV systems generate power, and have provided valuable insights into the patterns and trends of PV power generation. However, their effectiveness is often curtailed by the need for substantial historical data for model training. This limitation can impede accurate prediction of PV power output, which is crucial for integrating and managing renewable energy within the existing electrical grid infrastructure [
7]. Moreover, these methods may struggle to fully capture the complex variations in solar power output, which are influenced by a myriad of factors, including weather conditions, geographical location, and temporal variations. These challenges highlight the need for more adaptive and sophisticated models that can better accommodate the complexities of solar energy production.
The advent of deep learning (DL) has indeed revolutionized the field of PV power forecasting, introducing a transformative approach that transcends the limitations of conventional methods [
8,
9]. DL models, with their sophisticated architectures, are well positioned to model the intricate nonlinear relationships between weather conditions and PV output. DL offers significant advantages in PV power forecasting, particularly in its ability to process high-dimensional data and identify complex patterns that traditional methods may overlook. It automatically extracts features and learns representations at multiple abstraction levels, which is crucial for understanding the intricacies of solar irradiance and its effects on power output [
10]. Moreover, DL’s transfer learning capability allows models to leverage knowledge from related tasks, improving accuracy in environments with limited data or frequent changes, thereby enhancing generalization and model adaptability. Furthermore, DL models, especially recurrent ones, excel at capturing temporal dependencies, which is essential for forecasting solar power output due to the diurnal and seasonal variations in solar energy.
Among these DL models, the hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model excels in handling both temporal and spatial characteristics of PV power data [
11]. The LSTM component is adept at capturing and remembering information over extended periods, allowing the model to discern patterns and trends that evolve over time and are influenced by long-term weather patterns and seasonal variations [
12,
13]. The CNN aspect, on the other hand, complements this by identifying spatial hierarchies, such as the arrangement and positioning of solar panels, which affect overall power output. By combining these two architectures, the CNN-LSTM model creates a comprehensive and nuanced understanding of PV power generation patterns [
14,
15]. In addition, the Temporal Convolutional Network (TCN) adds a higher level of sophistication to forecasting by integrating a Transformer component specifically designed to handle sequences with long-range dependencies. This ability is crucial for accurately predicting the volatile and intermittent behavior of solar irradiance. By capturing complex interdependencies across various time scales, the TCN significantly outperforms traditional models in solar power generation forecasting [
16,
17].
Recent studies have increasingly explored hybrid deep learning architectures to improve renewable energy forecasting by combining complementary modeling components. For instance, the Time2Vec–BiTCN–BiGRU framework integrates temporal convolution and recurrent structures to capture periodic patterns in photovoltaic and wind power generation [
18]. Similarly, the xLSTM–TCCNN model incorporates numerical weather prediction data with enhanced recurrent and convolutional networks to better address the intermittency of renewable energy generation [
19]. More recently, the multi-scale convolutional Kolmogorov–Arnold network (MCKAN) has been proposed to enhance multi-step wind and solar power forecasting by combining multi-scale feature extraction with attention mechanisms [
20]. These hybrid approaches demonstrate the effectiveness of integrating different deep learning paradigms for modeling the nonlinear and time-dependent characteristics of renewable energy generation.
Despite significant progress in PV power forecasting via deep learning methods, several challenges remain. Many existing studies focus primarily on improving prediction accuracy through increasingly complex hybrid architectures, while relatively less attention has been paid to understanding how meteorological variables influence PV generation and model performance across different stations. Furthermore, the interpretability of meteorology-driven PV forecasting models remains limited in many deep learning frameworks.
In this paper, a dropout-regularized CNN-LSTM with batch normalization (hereafter referred to as ConvTempNet) and a TCN integrating dilated convolutions and a Transformer (hereafter referred to as DilaTransNet) are proposed to address the complexities of photovoltaic power forecasting. ConvTempNet is employed to extract temporal patterns influenced by weather conditions, leveraging its strength in combining spatial and sequential dependencies. DilaTransNet, on the other hand, provides a robust framework for capturing long-range temporal relationships across multiple time scales, which are essential for accurately forecasting weather-dependent photovoltaic output. This dual approach seeks to outperform traditional forecasting models and enhance the predictability of solar energy output, supporting greater grid stability and optimized energy management. Unlike previous studies that focus on a single architecture (
Table 1), this study performs a comparative analysis of two distinct deep learning paradigms (Convolutional–Recurrent vs. Transformer-based) to identify their specific suitability for different weather regimes. The proposed ConvTempNet and DilaTransNet hold clear advantages over existing hybrid models by avoiding complex attention mechanisms, heavy optimization, and extra signal decomposition preprocessing. ConvTempNet uses a streamlined CNN–LSTM with batch normalization and adaptive dropout to enhance stability and efficiency. DilaTransNet leverages dilated convolutions and a Transformer to capture both local and long-range temporal dependencies without additional data processing. These designs yield higher accuracy, better robustness, and lower computational complexity compared with prior structures. Compared with existing hybrid structures summarized in
Table 1, ConvTempNet avoids overcomplicated attention modules and extra signal preprocessing such as variational mode decomposition (VMD), reducing computation cost while stabilizing training via rationally arranged dropout. DilaTransNet combines dilated convolutions with a lightweight Transformer to capture long-range dependencies without the multiple metaheuristic optimization modules used in recent hybrid frameworks. Combined with targeted meteorology-oriented feature sensitivity experiments, the two proposed designs balance prediction accuracy, training robustness, and computational efficiency, enabling better adaptability to weather-induced fluctuations in PV output than existing alternatives. Additionally, the study incorporates correlation analysis and feature exclusion experiments to quantify the contributions of meteorological and temporal variables to model performance. By integrating multi-station observations with interpretable feature analysis, the proposed framework not only evaluates forecasting accuracy but also provides insights into the meteorological factors governing photovoltaic energy production. Reliable forecasts enable more efficient integration of intermittent renewable energy resources into power grids, reduce operational uncertainties, and support the development of low-carbon and resilient energy systems. Therefore, the present study contributes to sustainable energy management and renewable energy deployment.
2. Dataset
Ilias et al. [
26] produced a dataset of two PV systems (Tartaruga and Zarco) located in Portuguese cities, comprising hourly measurements of PV output and corresponding weather records (
Table 2). The PV output originated from solar installations within a Portuguese energy community, while the weather data (
Figure 1) were sourced from both a local meteorological station (
https://www.wunderground.com/, accessed on 10 June 2026) and the Copernicus Atmosphere Data Store (
https://ads.atmosphere.copernicus.eu/, accessed on 10 June 2026). The input features selected for the forecasting models include generated energy (Produzida), year, month, day, timestamp, solar radiation, temperature, relative humidity, a one-hot encoded representation of the month of the year (enabling the model to distinguish seasonal changes and generalize across different time periods), and sine/cosine transformations of the hour of the day (allowing the model to capture periodic trends in PV power generation effectively).
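The two temporal encodings listed above can be illustrated with a minimal sketch. The function names `encode_hour` and `one_hot_month` are hypothetical and chosen only for illustration; the exact feature pipeline used in this study is not reproduced here.

```python
import math


def encode_hour(hour: int) -> tuple[float, float]:
    """Map an hour in 0..23 onto the unit circle so that 23:00 and 00:00 are neighbours."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


def one_hot_month(month: int) -> list[int]:
    """Return a 12-element indicator vector for a month given as 1..12."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return [1 if m == month else 0 for m in range(1, 13)]
```

With this encoding, the distance between the encoded 23:00 and 00:00 is small, whereas the raw integers 23 and 0 are far apart, which is why a sine/cosine pair is commonly preferred for periodic inputs.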
Following the standard protocol for time-series forecasting, the dataset was strictly partitioned into three disjoint subsets with ratios of 70%, 15%, and 15%, respectively: a training set (for model parameter optimization), a validation set (for hyperparameter tuning and overfitting monitoring during training), and an independent test set (completely held out from training and validation, used exclusively for final performance evaluation).
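A minimal sketch of a 70%/15%/15% partition is given below. It assumes the records are ordered in time and that the split is chronological (earliest records for training, latest for testing); the text states only that the subsets are disjoint, so the ordering and the helper name `chronological_split` are assumptions.

```python
def chronological_split(records, train_frac=0.70, val_frac=0.15):
    """Split time-ordered records into disjoint train/validation/test blocks.

    No shuffling is applied, so every validation and test record is later in
    time than every training record (assumed here, not stated in the text).
    """
    n = len(records)
    n_train = int(round(n * train_frac))
    n_val = int(round(n * val_frac))
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:]
    return train, val, test
```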
3. Methods
To clarify the motivation and workflow of the proposed approach,
Figure 2 illustrates the overall framework of the meteorology-driven photovoltaic energy forecasting system developed in this study. The framework integrates meteorological observations and historical PV output to construct predictive models using CNN-LSTM and TCN architectures. The design aims to capture both nonlinear meteorological influences and temporal dependencies in PV generation while maintaining model interpretability through correlation analysis and feature contribution experiments. ConvTempNet tests the efficacy of spatio-temporal feature extraction, while DilaTransNet tests the efficacy of attention mechanisms for long-term dependencies.
Compared to traditional forecasting techniques, ConvTempNet and DilaTransNet offer multiple key advantages. Firstly, their ability to generalize from limited data makes them particularly well suited to regions or systems with insufficient historical data [
27]. Secondly, their strong representation learning capability enables the models to capture complex nonlinear relationships from large datasets during the training process, thereby improving their adaptability and prediction accuracy under various meteorological conditions. Lastly, the hybrid nature of these models enables a more nuanced understanding of the multifaceted factors influencing PV power output, leading to more precise and reliable forecasts [
28,
29]. The hyperparameters used in this study are summarized in
Table 3. These parameters were determined through a combination of empirical settings and preliminary experiments to ensure stable training and reliable forecasting performance. A total of 12 candidate configurations were tested for the key hyperparameters, including the learning rate, batch size, number of convolutional filters, number of hidden units, and dropout rate. The final values were selected for the lowest validation loss and the fastest convergence. Time-series cross-validation was adopted to avoid data leakage and ensure stable and generalizable parameter selection. Sensitivity tests were conducted for input sequence lengths ranging from 5 to 20. A length of 10 was selected because it achieved the best balance between prediction accuracy, model convergence, and computational cost. Longer sequences introduced redundant temporal information and increased training instability, while shorter sequences failed to capture sufficient hourly weather–PV generation dependencies.
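To make the role of the sequence length concrete, the sketch below builds sliding input windows of 10 consecutive hourly values. The one-step-ahead target and the helper name `make_windows` are assumptions for illustration; the prediction horizon is not specified in this passage.

```python
def make_windows(series, seq_len=10):
    """Return (inputs, targets) for supervised training.

    Each input holds seq_len consecutive values; the target is the value that
    immediately follows the window (one-step-ahead, assumed here).
    """
    inputs, targets = [], []
    for start in range(len(series) - seq_len):
        inputs.append(series[start:start + seq_len])
        targets.append(series[start + seq_len])
    return inputs, targets
```

A series of N hourly records yields N − 10 training pairs under this scheme, so longer windows reduce the number of samples while adding older context to each one.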
The experiments were conducted using Python 3.10 with the PyTorch 2.3.1 deep learning framework. Model training was performed on a workstation equipped with an NVIDIA RTX 4090 GPU (24 GB VRAM) and an AMD 16-core CPU. ConvTempNet has fewer network parameters and requires around 1.2 min for full training; DilaTransNet, with stacked Transformer blocks, takes roughly 2.1 min per training round. Benefiting from lightweight hybrid designs, both models are readily scalable to longer time sequences and multi-site datasets without excessive computational burden.
Outliers were removed by the 3σ rule and missing data were linearly interpolated. All inputs were normalized to [0, 1] via min-max scaling. Hyperparameters were preliminarily screened by grid search and further optimized by the Artificial Lemming Algorithm and the Crested Porcupine Optimizer. Early stopping and layered dropout were used to stabilize training convergence and suppress overfitting: training was terminated when the validation loss did not improve for a predefined number of epochs, ensuring that the model maintained good generalization performance.
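The three preprocessing steps can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, missing values are represented as `None`, and the order of operations (mask outliers, interpolate, then scale) is an assumption.

```python
import statistics


def mask_outliers_3sigma(values):
    """Replace values farther than 3 standard deviations from the mean with None."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    std = statistics.pstdev(present)
    return [None if v is not None and abs(v - mean) > 3 * std else v for v in values]


def interpolate_linear(values):
    """Fill interior None gaps linearly between the nearest known neighbours.

    Leading or trailing gaps have no neighbour on one side and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled


def min_max_scale(values):
    """Scale gap-free values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In a forecasting setting, the minimum and maximum are usually taken from the training subset only, so that no information from the validation or test periods leaks into the scaling; whether this was done here is not stated.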
3.1. ConvTempNet
The convolutional operation $z$ in the CNN component (Figure 3) extracts spatial features from the input $x$:
$$z = W * x + b$$
where $W$ is the convolution kernel (or filter), and $b$ is the bias term. The convolution operator $*$ extracts local patterns by sliding the kernel over the input. The output of the convolution operation is passed through the ReLU activation function to introduce non-linearity:
$$a = \mathrm{ReLU}(z) = \max(0, z)$$
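The convolution-then-ReLU step can be sketched for a single one-dimensional channel as below. The function name `conv1d_relu` and the choice of no padding are assumptions for illustration. As in most deep learning frameworks, the kernel is slid over the input without being flipped, which is strictly a cross-correlation.

```python
def conv1d_relu(x, kernel, bias=0.0):
    """Slide the kernel over x (no padding, stride 1), add the bias, apply ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        z = sum(w * v for w, v in zip(kernel, x[start:start + k])) + bias
        out.append(max(0.0, z))  # ReLU clips negative responses to zero
    return out
```

A kernel such as `[-1.0, 1.0]` responds to increases between consecutive samples, while decreases are clipped to zero by the ReLU.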
The LSTM component (
Figure 3) is responsible for capturing temporal dependencies within the sequential data. It consists of the following gates controlling the flow of information.
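The candidate cell state $\tilde{c}_t$ is computed via the activation function $\tanh$, weight matrix $W_c$, previous hidden state $h_{t-1}$, current input $x_t$, and bias term $b_c$:
$$\tilde{c}_t = \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right)$$
Then, the cell state $c_t$ is updated via the forget gate $f_t$ and the input gate $i_t$:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
Subsequently, the output gate calculates the final hidden state $h_t$ through the sigmoid function $\sigma$ with weight matrix $W_o$ and bias term $b_o$:
$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(c_t)$$
After the LSTM component processes the sequential data, the output is passed to two fully connected layers (Figure 3) to produce the final predicted photovoltaic power $\hat{y}$:
$$\hat{y} = \mathrm{FC}_2\left(\mathrm{FC}_1(h_t)\right)$$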
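A single LSTM update can be written out for the scalar case (one input feature, one hidden unit) to show how the gates interact. This is a sketch of the standard LSTM cell, not the trained model; the `params` dictionary layout and the gate keys `"f"`, `"i"`, `"c"`, `"o"` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step; params maps each gate name to (w_h, w_x, b)."""
    def pre(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("f"))          # forget gate: how much of c_prev to keep
    i_t = sigmoid(pre("i"))          # input gate: how much new content to admit
    c_tilde = math.tanh(pre("c"))    # candidate cell state
    o_t = sigmoid(pre("o"))          # output gate
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate is 0, so the cell state is simply halved at each step; this makes the gating arithmetic easy to check by hand.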
3.2. DilaTransNet
As a key component of the DilaTransNet, the Temporal Block (Figure 4) is designed to extract local features from time-series data by applying two consecutive 1D convolutional layers ($W_1$ and $W_2$) to the input feature map $x$:
$$y = x + \mathrm{Dropout}\Big(\mathrm{ReLU}\big(W_2 * \mathrm{Dropout}\left(\mathrm{ReLU}(W_1 * x + b_1)\right) + b_2\big)\Big)$$
in which the dropout mechanism is incorporated to enhance generalization by randomly deactivating neurons, thereby mitigating overfitting. Each convolutional layer is followed by a ReLU activation function, introducing non-linearity to capture complex relationships in the data. The output from the block is then combined with a residual connection, where the original input is added to the convolutional output, preserving the input features while facilitating gradient flow during backpropagation. Bias terms, $b_1$ and $b_2$, are included in the convolutional operations to further refine the output. Stacking multiple Temporal Block layers in the DilaTransNet progressively extracts higher-level features, enabling robust modeling of temporal dependencies in the data.
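The structure of such a block can be sketched for a single channel as follows. Two details are assumptions borrowed from the standard TCN design rather than taken from the text: the convolutions are causal (each output depends only on current and past samples, with zeros before the start), and dropout is omitted because it acts as the identity at inference time. All function names are hypothetical.

```python
def causal_dilated_conv(x, kernel, dilation, bias=0.0):
    """y[t] = bias + sum_j kernel[j] * x[t - j * dilation], zero before the start."""
    out = []
    for t in range(len(x)):
        acc = bias
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(values):
    return [max(0.0, v) for v in values]


def temporal_block(x, k1, k2, dilation, b1=0.0, b2=0.0):
    """Two dilated convolutions, each followed by ReLU, plus a residual connection."""
    h = relu(causal_dilated_conv(x, k1, dilation, b1))
    h = relu(causal_dilated_conv(h, k2, dilation, b2))
    return [xi + hi for xi, hi in zip(x, h)]
```

With a kernel of length k and dilation d, each convolution reaches back (k − 1)·d time steps, so stacking blocks with growing dilation widens the temporal context without increasing the kernel size.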
The Transformer Layer complements the Temporal Block by focusing on long-range dependencies within the data. At its core, the self-attention mechanism (Figure 4) processes three matrices, $Q$ (Query), $K$ (Key), and $V$ (Value), which are derived from the input sequence through linear transformations:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
where $d_k$ is the dimension of the key vectors and is used for scaling. This process allows the model to weigh the importance of different time steps in the input sequence, capturing intricate temporal patterns.
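Scaled dot-product attention can be sketched with plain lists, where each row of the query, key, and value matrices corresponds to one time step. This is a single-head illustration of the standard mechanism; the learned linear projections and multi-head structure of the actual layer are omitted, and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V for row-major lists of vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one weight per time step, summing to 1
        out.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
    return out
```

Each output row is a weighted average of the value rows, so a time step can draw on any other position in the window regardless of how far apart they are.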
The feed-forward network (FFN, Figure 4) refines the features further by applying the $\mathrm{ReLU}$ activation to the input:
$$\mathrm{FFN}(x) = W_2\,\mathrm{ReLU}(W_1 x + b_1) + b_2$$
where $W_1$ and $W_2$ are weight matrices, and $b_1$ and $b_2$ are biases. The output of the Transformer Layer is fed into subsequent layers, forming a stack in the Transformer Encoder. Each layer builds upon the refined representation from the previous layer, enhancing the model’s ability to capture hierarchical relationships in time-series data. Overall, the Temporal Block and Transformer Layer modules enable the DilaTransNet to model both local and long-range dependencies effectively.
In this study, long-range dependencies denote long-span temporal correlations and periodic patterns across multiple consecutive hours within the hourly input sequence, rather than long-horizon forecasts over days or weeks. The DilaTransNet is constructed to capture such sequential patterns via dilated convolutions and a Transformer. The term therefore describes the model’s ability to capture persistent and periodic hourly-scale trends within the input window, not the length of the prediction horizon.
3.3. Evaluation Metrics
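The models’ performance was evaluated via four metrics: root mean squared error (RMSE), mean absolute error (MAE), coefficient of determination (R2), and mean absolute percentage error (MAPE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
where $n$ is the number of records, $y_i$ represents the actual value of the $i$th measurement, $\hat{y}_i$ marks the predicted value of the $i$th measurement, and $\bar{y}$ denotes the mean of the actual values.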
RMSE is the square root of the mean squared difference between predicted and observed values, and it emphasizes larger errors because of the squaring operation. MAE measures the average absolute difference, offering a straightforward interpretation of model error. The R2 metric assesses the proportion of variance in the observed data that is captured by the model, providing an indication of its explanatory power, with values closer to 1 signifying better performance. MAPE evaluates the relative prediction accuracy by calculating the percentage deviation between observed and predicted values, offering a scale-independent measure that facilitates comparison across datasets. Together, these metrics provide a comprehensive assessment of the models’ accuracy and reliability.
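The four metrics can be computed directly from paired lists of observed and predicted values, as in the sketch below. The `eps` guard in `mape` is a hypothetical safeguard against division by zero; how zero-valued observations were handled in this study's MAPE is not specified here.

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error in percent; eps is an assumed guard for |y| = 0."""
    total = sum(abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)
```

For observations `[1, 2, 3, 4]` and predictions `[1, 2, 3, 6]`, a single error of 2 gives RMSE = 1.0, MAE = 0.5, R2 = 0.2, and MAPE = 12.5%, which shows how RMSE weights the large error more heavily than MAE does.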
In addition to conventional evaluation metrics, the statistical significance of the correlation between predicted and observed photovoltaic energy outputs was assessed through the Pearson correlation test. The associated p-values were computed using a two-sided t-test.
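The test statistic behind the Pearson correlation test can be sketched as below. For a sample correlation r over n pairs, t = r·sqrt((n − 2)/(1 − r²)) follows a Student t distribution with n − 2 degrees of freedom under the null hypothesis of no correlation. The two-sided p-value requires the t-distribution CDF, which is not in the Python standard library, so it is left out of this sketch; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

Because t grows with the square root of n, even a moderate correlation becomes highly significant on datasets with thousands of hourly records, which is worth keeping in mind when interpreting small p-values.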
4. Results and Discussion
4.1. Ablation Test
Figure 5 illustrates the impact of varying dropout rates on the evaluation metrics of ConvTempNet and DilaTransNet at the two stations. Dropout, a widely recognized regularization technique, is applied to reduce overfitting by randomly deactivating neurons during training. The results reveal that while moderate dropout rates enhance model generalization, excessive dropout leads to significant performance degradation, impairing the models’ predictive capability. A dropout rate of 1.0 is included solely to illustrate the boundary case and does not represent an applicable or recommended setting for practical photovoltaic forecasting. It shows the performance limit when all neurons are disabled, clearly demonstrating the transition from effective regularization to severe underfitting, and provides a reference for analyzing how dropout intensity affects model behavior.
For the ConvTempNet, moderate dropout rates of 0.2–0.4 yield relatively stable performance, with lower RMSE (MAE) of approximately 1.5 (0.7) kWh at Tartaruga and 2.6 (1.3) kWh at Zarco, while R2 remains above 0.9 and MAPE stays below 40%. However, as dropout rates increase beyond 0.6, RMSE (MAE) rises sharply, reaching approximately 6 (5) kWh at Tartaruga and 13 (10) kWh at Zarco for a dropout rate of 1.0. This is accompanied by a significant decline in R2, which drops to −0.4, and a dramatic increase in MAPE to about 300%, reflecting severe underfitting and a loss of predictive capability. Such trends align with findings that excessive dropout rates hinder a model’s ability to capture complex temporal patterns [30]. It should be noted that these values correspond to the intermediate sensitivity experiments. The final model configuration, obtained after hyperparameter tuning with an optimal dropout rate of 0.1 (Table 3), achieves a lower RMSE of 1.16 kWh at Tartaruga.
The DilaTransNet exhibits slightly better resilience to higher dropout rates, as RMSE (MAE) remains low, around 1.5 (0.5) kWh at Tartaruga and 2.5 (1.2) kWh at Zarco, when the dropout rate is 0.2. R2 remains above 0.9, and MAPE stays below 50%. However, as dropout increases, RMSE (R2) rises (drops) significantly to the maximum (minimum) at both stations, reflecting the adverse effects of over-regularization. The observations reveal that the DilaTransNet architecture benefits from moderate dropout rates to maintain predictive accuracy, while excessive rates reduce the model’s ability to capture long-term dependencies critical for accurate forecasting [22, 31].
Notably, the differences between Tartaruga and Zarco highlight the influence of site-specific factors, such as weather conditions and PV system configurations, on model performance. For instance, while Tartaruga maintains lower RMSE under moderate dropout, Zarco exhibits higher baseline errors, emphasizing the importance of tailoring dropout rates and model parameters to specific datasets for optimal forecasting accuracy [
32].
4.2. Model Performance
Figure 6 illustrates the evolution of training and validation loss over 50 epochs for ConvTempNet and DilaTransNet at the two stations. The figure focuses on the models’ convergence behavior during training; all final performance evaluations are conducted on a strictly held-out, independent test set that was not used for training or hyperparameter tuning. Both models show a steady decline in training loss during the initial epochs, reflecting effective learning and parameter optimization. By approximately the 10th epoch, the training loss flattens, indicating minimal further improvement with additional training. The validation loss, which assesses generalization to unseen data, stabilizes within the first 10 epochs and remains consistently lower than the training loss throughout most of the training process, indicating efficient convergence and robust generalization with no sign of overfitting. At Zarco, the validation loss is slightly lower than at Tartaruga, potentially reflecting better prediction accuracy or less complex data variability. This raises the possibility that station-specific factors, such as environmental or data characteristics, influence the models’ ability to generalize across stations. Slight fluctuations in validation loss across epochs are observed in both models, possibly because of sensitivity to specific data features or minor adjustments during training. Despite these variations, the models achieve low final loss values, indicating reliable learning and strong predictive performance.
The scatter plots (
Figure 7) display the relationship between predicted and true PV energy outputs for ConvTempNet and DilaTransNet at the two stations. For ConvTempNet, the results at Tartaruga demonstrate strong predictive accuracy, with most points tightly clustered around the diagonal line. The RMSE is low at 1.19 kWh, and the R2 is 0.95, indicating that the model effectively captures the variance in PV energy outputs. At Zarco, the ConvTempNet produces slightly higher errors, with an RMSE of 2.59 kWh. Despite these deviations, the overall performance remains reliable, as reflected by the high R2 of 0.96. These patterns indicate the ConvTempNet model’s ability to generalize well across different stations [
21,
23,
33], although minor inaccuracies at Zarco may be attributed to greater variability in the data. DilaTransNet demonstrates slightly weaker performance compared to ConvTempNet. At Tartaruga and Zarco, the RMSE increases to 1.24 and 2.74 kWh, respectively, and the spread of points becomes more pronounced, highlighting greater prediction errors. The R2 values, although still relatively high, fall short of those achieved by ConvTempNet, underscoring the DilaTransNet model’s reduced ability to capture finer details in the data, particularly at Zarco. Quantitative comparisons show that ConvTempNet achieves the fastest inference speed (2.3 ms per sample) and the strongest stability under cloudy and fluctuating radiation conditions. Under variable meteorological conditions, the RMSE fluctuation range is controlled within 0.15 kWh, indicating high robustness. For practical grid applications, this efficiency and stability support real-time prediction and reliable power dispatching. Furthermore, the Pearson correlation tests indicate that the correlations between predicted and observed PV outputs are statistically significant for both stations (
p < 0.01), confirming the robustness of the forecasting results. Statistical comparison using pairwise
t-tests shows that the accuracy differences between ConvTempNet and DilaTransNet are statistically significant at the
p < 0.05 level across both stations. Although the numerical gaps in RMSE, MAE, and R2 appear small, the consistent superiority of ConvTempNet is statistically confirmed. For real-world PV forecasting, this statistically significant advantage supports more stable power scheduling and more reliable grid operation.
The relatively high full-dataset MAPE ranging from 43% to 51% is a typical statistical artifact of MAPE when facing massive zero nighttime PV output, and all zero/near-zero samples are kept intact for error calculation in this study without filtering. Minimal absolute prediction errors for zero nighttime power trigger sharp growth in percentage error and raise the overall MAPE for all competing models. For objective performance evaluation unaffected by zero-value defects of MAPE, RMSE, MAE and coefficient of determination R2 serve as core complementary metrics, which are not mathematically sensitive to near-zero true values. Superior performance on these robust indicators confirms the proposed models’ reliable prediction precision for daytime effective power, ensuring practical usability for real photovoltaic operational scheduling despite the inflated aggregate MAPE.
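The inflation effect described above can be reproduced with a small worked example. All numbers below are hypothetical and chosen only to illustrate the arithmetic; the 0.1 kWh daytime threshold is likewise an assumption.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (requires non-zero observations)."""
    total = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# One near-zero (night-like) sample and two daytime samples, in kWh (hypothetical).
y_true = [0.01, 5.0, 10.0]
y_pred = [0.05, 5.1, 9.9]

full_mape = mape(y_true, y_pred)

# Same predictions, keeping only samples at or above an assumed 0.1 kWh threshold.
daytime = [(t, p) for t, p in zip(y_true, y_pred) if t >= 0.1]
daytime_mape = mape([t for t, _ in daytime], [p for _, p in daytime])
```

The near-zero sample has an absolute error of only 0.04 kWh but a percentage error of 400%, which lifts the full-sample MAPE to roughly 134%, whereas the daytime-only MAPE is about 1.5%. The absolute errors are of similar size in all three cases, so RMSE and MAE are unaffected by this artifact.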
To validate the superiority and innovation of the proposed architectures, comparative experiments with three widely used baseline models (CNN-BiLSTM, GRU, MLP) across two PV stations were added (
Figure 8). Compared with conventional CNN, GRU, and MLP models, CNN-BiLSTM combines spatial feature extraction and bidirectional temporal dependency learning, representing a stronger benchmark for photovoltaic forecasting. All baseline models exhibit significantly worse predictive performance than the proposed models. For Tartaruga, the baseline models have higher RMSE (1.18–2.00 kWh vs. ConvTempNet < 1.2 kWh), higher MAE, and lower prediction accuracy. For Zarco, the baseline models show even more severe performance degradation, with higher errors (MAE > 1.31 kWh vs. ConvTempNet < 1.2 kWh) and more scattered points deviating from the ideal diagonal fitting line, especially under large photovoltaic output conditions. These results fully verify that the proposed task-specific architectural reconfiguration, lightweight optimization, and complementary dual-model design deliver substantial performance gains over conventional unmodified single/hybrid network structures, demonstrating the innovation and effectiveness of the proposed ConvTempNet and DilaTransNet.
4.3. Effects of Hyperparameter Optimization
To further improve model performance, two meta-heuristic optimization algorithms—the Artificial Lemming Algorithm (ALA) and the Crested Porcupine Optimizer (CPO)—were introduced to optimize the key hyperparameters of ConvTempNet and DilaTransNet. Based on these strategies, four optimized models were constructed: ALA-ConvTempNet, ALA-DilaTransNet, CPO-ConvTempNet, and CPO-DilaTransNet. Additional experiments were conducted to evaluate the effectiveness of the optimization approaches at the Tartaruga and Zarco photovoltaic stations (
Figure 9 and
Figure 10).
For the Tartaruga station, ALA optimization improves the forecasting performance of both models. The RMSE, MAE, and MAPE of ConvTempNet decrease from 1.19 to 1.16, 0.66 to 0.60, and 43.10% to 33.18%, respectively. Meanwhile, the RMSE and MAE of DilaTransNet decrease from 1.24 to 1.19 and from 0.63 to 0.56, respectively, while the coefficient of determination (R2) increases from 0.94 to 0.95. For the Zarco station, the ALA-optimized models maintain high prediction accuracy despite minor changes in error metrics. In particular, ConvTempNet preserves a high coefficient of determination (R2 ≈ 0.95), demonstrating stable predictive capability across stations.
The influence of CPO optimization shows a similar trend. At Tartaruga, CPO improves the performance of DilaTransNet, with RMSE decreasing from 1.24 to 1.18 and MAE decreasing from 0.63 to 0.60, while R2 increases from 0.94 to 0.95. ConvTempNet exhibits only minor variations after CPO optimization, indicating that the baseline configuration already provides stable predictive performance. For Zarco, both models maintain strong predictive capability after CPO optimization, with R2 remaining close to 0.95. Although slight fluctuations occur in RMSE and MAE, the optimized models demonstrate robust forecasting performance across different meteorological conditions.
Although the RMSE reductions from ALA and CPO optimization are modest (approximately 0.03–0.05 kWh), these improvements are statistically significant at the p < 0.05 level based on pairwise t-tests. In real-world PV forecasting, such small but consistent error reductions effectively improve power scheduling stability, reduce reserve capacity costs, and enhance the reliability of grid-connected operation, making them practically meaningful for engineering applications.
4.4. Input Correlation and Sensitivity
Figure 11 highlights the impact of excluding individual input variables on the evaluation metrics for ConvTempNet and DilaTransNet at the Tartaruga and Zarco stations, with ΔMetrics offering insights into the importance of each variable. Solar radiation (R) emerges as the most critical variable at both stations. Its exclusion leads to a sharp increase in RMSE and MAE, particularly at Zarco, where RMSE rises by approximately 1.1 kWh and MAE by around 0.5 kWh for ConvTempNet. This aligns with the correlation coefficients in Figure 12, where R exhibits a near-perfect positive correlation (~1.0) with PV energy output, confirming its dominant role in driving model accuracy [34].
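The exclusion procedure behind ΔMetrics can be sketched as a leave-one-variable-out loop. The sketch assumes that ΔRMSE is the RMSE obtained without a variable minus the RMSE obtained with all variables. A toy 1-nearest-neighbour regressor stands in for the trained networks, and the data, `nn_predict`, `drop_column`, and `delta_rmse` are hypothetical illustrations rather than the study's implementation.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def nn_predict(X_train, y_train, X_test):
    """Toy 1-nearest-neighbour regressor, a stand-in for a trained network."""
    preds = []
    for x in X_test:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds

def drop_column(X, j):
    """Return a copy of X without column j."""
    return [row[:j] + row[j + 1:] for row in X]

def delta_rmse(names, X_train, y_train, X_test, y_test, fit_predict):
    """RMSE change when each input is excluded (positive = variable was helping)."""
    base = rmse(y_test, fit_predict(X_train, y_train, X_test))
    deltas = {}
    for j, name in enumerate(names):
        preds = fit_predict(drop_column(X_train, j), y_train, drop_column(X_test, j))
        deltas[name] = rmse(y_test, preds) - base
    return base, deltas

# Hypothetical data: the output depends only on R; M is an uninformative input.
names = ["R", "M"]
X_train = [[0.0, 0.1], [0.5, 0.9], [1.0, 0.5]]
y_train = [0.0, 2.5, 5.0]
X_test = [[0.0, 0.9], [0.5, 0.5], [1.0, 0.1]]
y_test = [0.0, 2.5, 5.0]

base, deltas = delta_rmse(names, X_train, y_train, X_test, y_test, nn_predict)
print(f"baseline RMSE = {base:.3f}")
for name, d in deltas.items():
    print(f"excluding {name}: delta RMSE = {d:+.3f}")
```

In this toy setting, excluding the informative input raises the error, while excluding the uninformative one lowers it. With a neural network, each exclusion would normally require retraining the model on the reduced input set.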
Relative humidity (U) and temperature (T) also play notable roles; their exclusion increases RMSE and MAE for the DilaTransNet model. Figure 12 supports this observation, showing a strong negative correlation between U and PV output and a strong positive correlation between T and PV output. This indicates that the atmospheric effects captured by U and T are essential for improving predictive performance [35], albeit secondary to R. Interestingly, excluding T decreases RMSE and MAE for ConvTempNet at Zarco, by 0.12 kWh and 0.08 kWh, respectively. One possible explanation is that the information carried by T overlaps with that provided by other variables, such as R or U. With such redundancy, the model may overfit to temperature-specific noise, and removing T likely forces it to rely more heavily on variables with a more direct relationship to PV energy output, thereby improving performance. The measured reduction lends quantitative support to this interpretation and suggests that temperature provides limited unique information under the local meteorological conditions of Zarco. In practical PV modules, unabsorbed photons contribute to heat buildup and influence module efficiency and power output; this effect varies with weather conditions and has been addressed in recent studies [36], which propose efficiency recalibration through radiation control. While our model incorporates T as an input feature, it does not explicitly account for thermal effects from unabsorbed radiation. Future extensions could integrate a module heat transfer model to enhance prediction accuracy in diverse environmental conditions.
The temporal variable hour (H) has the most significant impact on DilaTransNet, particularly at Zarco. Excluding H causes RMSE to increase by nearly 10 kWh and MAE by approximately 9 kWh, reflecting the DilaTransNet model’s reliance on temporal patterns, even though Figure 12 shows only a weak positive correlation (~0.25) between H and PV output. This contrast suggests that while H may not directly drive PV energy generation, it serves as a critical contextual feature for capturing the diurnal patterns and time-dependent dynamics essential to DilaTransNet’s performance [37].
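The gap between a weak correlation coefficient and a large sensitivity can be illustrated with a small sketch. It assumes that the coefficients in Figure 12 are Pearson coefficients, which measure only linear association. The bell-shaped daily profile below is synthetic and idealized, not station data, and `pearson` is a hand-written helper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Synthetic daily profile: zero at night, peaking at noon (assumption).
hours = list(range(24))
pv = [max(0.0, math.sin(math.pi * (h - 6) / 12)) for h in hours]

# Linear association between the raw hour index and output is weak,
# because output rises in the morning and falls in the afternoon.
r_hour = pearson(hours, pv)

# A nonlinear transform of the same variable (distance from noon)
# exposes the dependence that the linear coefficient misses.
r_noon = pearson([abs(h - 12) for h in hours], pv)

print(f"corr(hour, PV) = {r_hour:.2f}")
print(f"corr(|hour - 12|, PV) = {r_noon:.2f}")
```

A model that can learn nonlinear functions of H can therefore extract far more information from it than the linear correlation coefficient suggests, which is consistent with the large ΔMetrics observed when H is excluded.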
Month (M) has minimal effects on ΔMetrics for both models, which is consistent with its near-zero correlation coefficient, implying that M does not strongly influence PV output prediction when the other inputs are present.
Combining these insights, the analysis confirms the dominant influence of R and H on model performance. ConvTempNet is more sensitive to meteorological variables like R and U, while DilaTransNet relies heavily on temporal inputs such as H. These observations underline the complementary strengths of the two architectures, as well as the importance of prioritizing key inputs to enhance prediction accuracy.
5. Conclusions
This study investigates the effectiveness of advanced deep learning architectures for PV power forecasting under varying meteorological conditions. Two proposed models, ConvTempNet and DilaTransNet, were developed to capture the complex nonlinear relationships between meteorological variables, temporal factors, and PV energy output. Their performance was systematically evaluated against three widely used baseline models to assess their predictive capabilities. The results indicate that ConvTempNet generally achieves superior forecasting accuracy due to its ability to jointly extract spatial and sequential features, while DilaTransNet demonstrates strong robustness in capturing long-range temporal dependencies through dilated convolutions and Transformer-based attention mechanisms.
In addition, heuristic optimization algorithms, including ALA and CPO, were introduced to optimize key hyperparameters of the proposed models. The experimental results demonstrate that these optimization strategies can further improve prediction accuracy and model stability, particularly for DilaTransNet at the Tartaruga station. Sensitivity analysis also highlights that solar radiation is the most influential predictor of PV energy output, while temperature, relative humidity, and temporal variables such as the hour of the day also contribute to forecasting performance by reflecting atmospheric conditions and diurnal generation patterns.
Despite the encouraging results, several limitations should be acknowledged. First, the analysis was conducted using data from two photovoltaic power plants, which may limit the generalizability of the conclusions to other climatic regions or PV system configurations. Detailed information regarding module technology, installed capacity, tilt angle, and azimuth angle was not reported in the original dataset publication and therefore could not be incorporated into the present analysis. Future research will collect multi-region PV datasets covering diverse climatic zones and various installed photovoltaic systems to further verify the general applicability of ConvTempNet and DilaTransNet. Second, although advanced optimization algorithms were introduced to improve model performance, further studies could explore additional hybrid optimization strategies and larger datasets to enhance model robustness. Nevertheless, the complementary performance characteristics of the two proposed models, as summarized in the sensitivity analysis, still offer useful guidance for short-term PV power forecasting at similar distributed PV stations under weather-driven conditions. Future work could also incorporate probabilistic forecasting frameworks, prediction intervals, or quantile-based approaches to better support operational decision-making. Although correlation and sensitivity analyses provide insights into the importance of individual predictors, advanced explainability techniques such as SHAP values and attention visualization may further improve the interpretability of deep learning forecasts. Finally, future research could extend the framework to multi-horizon forecasting and assess model robustness under extreme weather scenarios, thereby enhancing its applicability to real-world grid operation and energy management systems.
Although the two experimental sites are located in Portugal and share similar coastal climatic conditions, the structural design of the proposed ConvTempNet and DilaTransNet follows a climate-agnostic feature extraction logic. The networks take widely available meteorological and temporal covariates as input, without site-specific empirical parameter tuning tailored to local Portuguese coastal weather patterns. The standardized preprocessing pipeline and embedded regularization strategies further enhance robustness against distribution shifts arising from varied geographic and meteorological environments. While empirical cross-climate testing is currently unavailable due to limited multi-region measured data, the above design principles lay a theoretical foundation for subsequent cross-site transfer applications. The proposed lightweight and robust forecasting frameworks provide technical support for real-world photovoltaic energy management and smart grid operation. Stable short-term PV power predictions help reduce the uncertainty of renewable power output, assist grid operators in peak regulation and reserve capacity allocation, and lower grid operation costs. Furthermore, the optimized model structures and hyperparameter tuning strategies can be applied to hourly photovoltaic forecasting tasks in similar regions, facilitating efficient and safe operation of distributed PV power systems. Consequently, the study contributes to ongoing efforts toward sustainable energy development, carbon-emission reduction, and the realization of global energy transition goals.