A Multi-Step-Ahead Photovoltaic Power Forecasting Approach Using One-Dimensional Convolutional Neural Networks and Transformer

Abstract: Due to environmental concerns about the use of fossil fuels, renewable energy, especially solar energy, is increasingly sought after for its ease of installation, cost-effectiveness, and versatile capacity. However, the variability in environmental factors poses a significant challenge to photovoltaic (PV) power generation forecasting, which is crucial for maintaining power system stability and economic efficiency. In this paper, a novel multi-step-ahead PV power generation forecasting model that integrates single-step and multi-step forecasts from various time resolutions was developed. One-dimensional convolutional neural network (CNN) layers were used for single-step forecasting to capture specific temporal patterns, and a transformer model improved multi-step forecasting by leveraging the combined outputs of the CNN. This combination provides accurate and immediate forecasts as well as the ability to identify longer-term generation trends. Using the DKASC-ASA-1A and 1B datasets for empirical validation, several preprocessing methods were applied and a series of experiments were conducted to compare the performance of the model with other widely used deep learning models. The framework proved capable of accurately predicting multi-step-ahead PV power generation at multiple time resolutions.


Introduction
Interest in renewable energy generation technologies is growing as concerns about climate change and fossil fuel depletion increase [1]. Among these technologies, photovoltaic (PV) energy is widely recognized as clean, abundant, and economically beneficial [2]. Because of these advantages, PV power generation systems are becoming increasingly popular among renewable energy sources [3]. However, the main drawback of PV power generation systems is their unpredictable and intermittent nature [4]. Since PV power is highly dependent on solar irradiance, variations in the availability of solar energy can seriously affect the ability of the grid to operate consistently and economically. Accurate PV power generation forecasting that captures changes in PV generation can mitigate these problems by reducing the uncertainty in dynamic power generation [5].
The two primary approaches used in the past to develop prediction models for PV power generation have been physical and statistical methods [6,7]. The physical approaches utilize mathematical calculations to define the physical state of dynamic behavior and meteorological conditions. When the weather is stable, these approaches work well, but in dynamic weather conditions, the prediction accuracy can be compromised. Statistical approaches analyze the relationship between meteorological factors and PV power generation data and are developed through the analysis of historical parameters and prediction models [8]. The predictive accuracy of these approaches depends on the forecast horizon and input data. Regression, autoregressive, and other techniques are widely used [9,10]. Because the input parameters are continuously improved and past data are considered during training, these approaches can provide reliable performance. Their predictive accuracy is greater for linear data, but they face challenges with nonlinear complex data. Other limitations include the need for sufficient explanatory parameters for the prediction model and the stationarity requirement of the autoregressive integrated moving average (ARIMA) model.
Electronics 2024, 13, 2007 2 of 17
Recently, prediction solutions for PV power generation based on traditional machine learning, which considers external factors, have been developed [11][12][13]. Machine learning-based prediction models address the performance degradation caused by nonlinearity between input and output, a limitation seen in statistical and physical approaches. Machine learning techniques, such as support vector regression (SVR), decision tree (DT), and extreme learning machine (ELM), are commonly used to build prediction models [14][15][16]. However, unbalanced PV power generation patterns under varying weather conditions can negatively impact the effectiveness of machine learning-based models. These methods also have certain drawbacks, such as poor performance with increasing numbers of data and prediction steps, complexity due to numerous features, and a heavy reliance on feature engineering, which increases the size and complexity of the model. It has been suggested that the instability of these models may render them less suitable for decision-making, which calls into question their suitability for PV power generation forecasting [17].
An improved statistical approach based on neural networks (NNs) was proposed by Chen et al. [18] for 24 h solar power prediction. Almonacid et al. [19] proposed an artificial NN (ANN)-based method to predict PV power output one hour in advance. They employed an ANN to predict air temperature and solar irradiance, which were then fed into another ANN to predict PV power. To predict PV power, Vaz et al. [20] used ANNs, taking measurements from nearby PV systems as inputs along with meteorological data. They noted that model performance decreased, and the root mean square error (RMSE) increased to 24% when the forecasting time horizon was extended. For a day-ahead forecast of solar PV power, Yang et al. [21] integrated NNs with a pattern sequence identification approach. This method relies on developing a separate prediction model for each type of pattern sequence. However, due to its dependence on the extracted attributes, the model is optimized for day-ahead prediction. To predict PV power generation, Wang et al. [22] evaluated three deep learning (DL) techniques and provided insights for selecting the most effective network for real-world applications. They concluded that the prediction of PV power generation benefits significantly from DL techniques.
For nonlinear data, DL-based models often perform well, but the volume and complexity of the input data significantly influence prediction performance. When categorized by prediction horizon, PV power generation predictions can be divided into one-step and multi-step predictions. A one-step prediction covers only one point in the future, while a multi-step prediction covers several future points or a long sequence. Multi-step prediction does not always predict PV power generation patterns more accurately than single-step prediction, because, regardless of the technique, errors tend to grow as the number of prediction steps increases [23]. Weather forecasting is a critical component of prediction because various weather elements affect PV power output. However, relying solely on DL techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM), and gated recurrent units (GRUs) for predictions is insufficient due to the dynamic nature of PV time series data.
To address this, this research presents a multi-time-forecast horizon PV power generation forecasting approach. This approach improves PV power generation estimates by merging forecasts from multiple forecast horizons. To enhance the effectiveness of multi-step PV power generation forecasting, this multi-time resolution forecasting model unites single-step and multi-step forecasting. Multi-step forecasting is adept at identifying trends in long sequences, while single-step forecasting precisely predicts the amount of electricity generated at a specific time. Single-step forecasting, which covers the same physical time horizon at a low temporal resolution, is integrated with multi-step forecasting, which forecasts many points at a high temporal resolution, with the aim of enhancing the performance of multi-step forecasting. This approach aims to provide accurate estimates at each step in addition to revealing extended sequences of PV power generation patterns. To validate the approach, historical weather, solar radiation, and PV power generation data are collected and the models are tested using datasets relevant to the objectives of each model. Investigations include data exploration, comparisons with existing models, and further research.
The rest of this paper is organized as follows: Section 2 reviews each model in the proposed framework, Section 3 reports the results of the experiments, and Section 4 concludes the study.

Materials and Methods
This section presents a multi-time resolution PV power generation forecasting model based on a one-dimensional CNN (1D-CNN) and transformer. Using two different time resolutions over the same horizon, single-step and multi-step forecasts are generated and combined. The section starts with an overview of data acquisition and preprocessing. Following this, the single-step and multi-step forecasting models are explored, and the way the two forecasts are combined is detailed.

Data Acquisition and Preprocessing
First, the DKASC-ASA-1A and 1B datasets were collected; they are publicly available datasets and comprise detailed records from two PV power plants initiated in 2009 [24]. Dataset 1A relates to a plant with a 10.5 kW capacity, while dataset 1B relates to a plant with a 23.4 kW capacity. Both employ mono-Si technology with dual-axis tracking. The data are logged at 5 min intervals and capture a wide range of measurements, including timestamps, various weather conditions, and numerous internal operational factors. Data from 1 January 2016 to 31 December 2023 were examined. Data from the last two years were intentionally withheld from training and reserved for testing to verify that the proposed model would accurately predict recent trends and situations. Using the most recent information for validation reflects the aim of presenting models that are grounded in historical data but are also capable of predicting current and future conditions.
The DKASC-ASA-1A and 1B datasets from 1 January 2016 to 31 December 2023 were carefully analyzed, and it was found that the missing data were distributed across a number of variables, each representing a different percentage of the total data points. For Active Energy Delivered Received (AEDR), Current Phase Average (CPA), and Active Power (AP), both datasets were missing approximately 1.15%, corresponding to 9669 and 9665 data points, respectively. Wind Speed (WS) had the largest data deficit, with 77.4% of the dataset missing, or 651,586 points. Weather Temperature Celsius (WTC) and Weather Relative Humidity (WRH) were missing about 1.15% or 9675 data points each, a figure that also applies to Global Horizontal Radiation (GHR) and Diffuse Horizontal Radiation (DHR) with 9676 missing entries each. Wind Direction (WD) and Weather Daily Rainfall (WDR) each had about 1.15% of the data missing, corresponding to 9673 and 9670 data points, respectively. Notably, Radiation Global Tilted (RGT) was missing 11.24% or 94,603 data points, while Radiation Diffuse Tilted (RDT) had a deficit of about 9.27%, corresponding to 78,003 data points.
The timestamp information was converted from date and time to sine and cosine values, as shown in Algorithm 1, to preserve the cyclical nature of time [25]. This encoding allows the model to capture the periodicity inherent in days and minutes, which is critical for recognizing patterns and making accurate predictions over time.
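The sine and cosine encoding can be illustrated with a minimal sketch. The column names DateSine, DateCosine, TimeSine, and TimeCosine follow the feature names used later in this paper; the function name and the choice of a one-year cycle for the date and a one-day (1440 min) cycle for the time are assumptions made for illustration and are not taken from Algorithm 1.

```python
import numpy as np
import pandas as pd


def encode_cyclical_time(timestamps: pd.DatetimeIndex) -> pd.DataFrame:
    """Map day-of-year and minute-of-day onto the unit circle."""
    # Assumption: the date cycle spans one calendar year and the
    # time cycle spans one day (1440 minutes).
    day_of_year = timestamps.dayofyear.to_numpy(dtype=float)
    days_in_year = np.where(timestamps.is_leap_year, 366.0, 365.0)
    minute_of_day = (timestamps.hour * 60 + timestamps.minute).to_numpy(dtype=float)

    date_angle = 2.0 * np.pi * (day_of_year - 1.0) / days_in_year
    time_angle = 2.0 * np.pi * minute_of_day / 1440.0

    return pd.DataFrame(
        {
            "DateSine": np.sin(date_angle),
            "DateCosine": np.cos(date_angle),
            "TimeSine": np.sin(time_angle),
            "TimeCosine": np.cos(time_angle),
        },
        index=timestamps,
    )
```

With this representation, 23:55 and 00:00 lie next to each other on the circle, whereas a raw minute counter would place them at opposite ends of its range.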
Any erroneous records with negative values for CPA and AP were corrected during the data cleansing process by setting them to zero. This was a critical step in maintaining the integrity of the dataset, particularly for these attributes, where negative values are not possible. Generation capacity and metrics such as weather conditions, RGT, and RDT were likewise treated as inherently non-negative. The outliers were identified using the process outlined in Algorithm 2 and were initially left blank to accurately reflect their status as anomalies pending further treatment. The WDR attribute was not subjected to the outlier identification process outlined in Algorithm 2, as its values primarily reflect zero rainfall.

Algorithm 2. Outlier identification
4: Calculate the first and third quartiles and the interquartile range (IQR).
5: Define cutoffs for outliers as 1.5 × IQR below the first quartile and above the third quartile.
6: For each data point in the variable do
7:     If the data point is below the lower cutoff or above the upper cutoff, then
8:         Replace the data point with NA.
9:     If the data point is below the 10th percentile, then
10:         Replace the data point with NA.
11: EndFor
12: EndFor
13: Return the modified data frame 'total_data'.
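The recoverable steps of Algorithm 2 can be sketched in pandas as follows. This is an illustrative reading, not the authors' code: the function name, the column list argument, and the decision to compute the 10th percentile on the values before any replacement are assumptions.

```python
import numpy as np
import pandas as pd


def flag_outliers(total_data: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Replace values outside the IQR cutoffs or below the 10th percentile with NaN."""
    cleaned = total_data.copy()
    for column in columns:
        values = cleaned[column]
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        lower_cutoff = q1 - 1.5 * iqr
        upper_cutoff = q3 + 1.5 * iqr
        # Assumption: the percentile is taken from the untouched column.
        p10 = values.quantile(0.10)
        outside_cutoffs = (values < lower_cutoff) | (values > upper_cutoff)
        below_p10 = values < p10
        # mask() writes NaN wherever the condition holds.
        cleaned[column] = values.mask(outside_cutoffs | below_p10)
    return cleaned
```

The function returns a copy, so the original data frame stays available for comparison, and the blanked entries can be imputed in a later step.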
To manage data quality effectively, the variable WS, which contained a large number of missing entries, was removed. For all other variables, except RGT and RDT, linear interpolation was used to impute missing values. For the variables RGT and RDT, a missing value imputation model was constructed to impute their missing values [26]. In addition, the variable AEDR was removed from the dataset to ensure clarity and focus in the analysis. Furthermore, to prevent data leakage, the values for CPA, RGT, and RDT were replaced with data from one hour earlier. This approach ensures the integrity of the predictive modeling process. Table 1 presents the statistical analysis of each attribute from the DKASC-ASA-1A and 1B datasets after preprocessing.
In this study, the dataset was divided into a training set of six years and a test set of the last two years, representing a 75:25 split. After imputing missing values, data normalization was applied. Specifically, each variable (except the dependent variable AP) in the training set was normalized to a range between 0 and 1 using the min-max normalization technique, as described in Equation (1):
$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
where $x$ is the original value and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the variable in the training set. The normalization parameters determined from the training set were then applied to the test set to ensure consistency in data processing between the two sets. This approach allows for a valid comparison between the proposed model and alternative models and supports accurate predictions made one hour in advance.
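A minimal sketch of min-max normalization with parameters fitted on the training set only is given below. The function names and the handling of constant columns are assumptions made for illustration; the dependent variable AP is simply left out of the column list.

```python
import pandas as pd


def fit_min_max(train: pd.DataFrame, columns: list[str]) -> dict[str, tuple[float, float]]:
    """Collect the per-column minimum and maximum from the training set."""
    return {c: (float(train[c].min()), float(train[c].max())) for c in columns}


def apply_min_max(frame: pd.DataFrame, params: dict[str, tuple[float, float]]) -> pd.DataFrame:
    """Scale the listed columns with previously fitted training-set parameters."""
    scaled = frame.copy()
    for column, (lo, hi) in params.items():
        span = hi - lo
        # Assumption: a constant training column is mapped to 0.
        scaled[column] = (frame[column] - lo) / span if span > 0 else 0.0
    return scaled
```

Because the test set reuses the training parameters, test values can fall slightly outside the range from 0 to 1 when they exceed the training extremes; this is expected and avoids leaking test-set statistics into preprocessing.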

PV Power Prediction Model

One-Dimensional Convolution
One method for learning local trend features is 1D convolution. To extract features from local input patches, 1D convolution applies the same convolution operation to every patch, which promotes data efficiency and representation modularity [27]. Whereas computer vision typically relies on 2D or 3D convolution because of the spatial nature of image data, 1D convolution is well suited for sequence processing [28]. By reducing the number of parameters needed to process multivariate time series, the local awareness and weight sharing capabilities of 1D convolution can increase learning efficiency. This is because each subsequence undergoes the same input transformation. Temporal translation invariance [29] allows a pattern learned at one point in a sequence to be recognized at a different location later.
Sequence fragments within the window size can be learned by processing the input data using the convolution window of the convolution layer. Figure 1 illustrates the process of 1D convolution. The 1D input is processed through the 1D convolution layer to produce the 1D output. This approach is effective because it allows for the detection of these subsequences anywhere in the entire time series, capturing the local trend change aspects of the multivariate time series over time. The one-dimensional input time series is effectively analyzed in this way.
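The sliding-window operation can be made concrete with a short NumPy sketch. It implements a plain "valid" 1D convolution in the cross-correlation form commonly used by DL frameworks; the function name, array layout, and stride argument are assumptions for illustration and do not describe the layer configuration used in this study.

```python
import numpy as np


def conv1d(series: np.ndarray, kernels: np.ndarray, stride: int = 1) -> np.ndarray:
    """Valid 1D convolution over a multivariate time series.

    series:  array of shape (time_steps, n_features)
    kernels: array of shape (n_filters, window, n_features)
    returns: array of shape (n_windows, n_filters)
    """
    n_filters, window, _ = kernels.shape
    n_windows = (series.shape[0] - window) // stride + 1
    output = np.empty((n_windows, n_filters))
    for t in range(n_windows):
        # The same kernel weights are applied to every patch (weight sharing).
        patch = series[t * stride : t * stride + window]
        output[t] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return output
```

Because the kernel weights do not depend on the window position, a pattern that appears later in the series produces the same response, only shifted in time, which is the translation invariance described above.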

Transformer Architecture
The LSTM model's gradient vanishing and gradient explosion issues, as well as its difficulty with long-distance dependencies, can be partially mitigated by adding an attention mechanism. However, this does not fully address these problems, and issues such as low efficiency and excessive computational complexity may persist. To address these challenges more effectively, Vaswani et al. [30] presented the transformer model in 2017, which eschews the LSTM in favor of an architecture based entirely on attention. Numerous studies have demonstrated that transformer models generally outperform LSTM models in capturing long-term dependencies [31]. Additionally, they have been shown to provide better prediction accuracy than LSTM in fields such as speech recognition and machine translation [32].
The transformer model is well suited to capturing relationships within sequences of data and processes them more effectively than recurrent models. The transformer model is composed of six layers of encoders and decoders, as depicted in Figure 2. The encoder layers process input data through self-attention mechanisms that allow the model to weigh the importance of different parts of the input data. Each encoder layer comprises a feedforward layer and a multi-head attention layer. The decoder includes three sub-layers: the feedforward layer, the multi-head attention layer, and the masked multi-head attention layer. These layers in the decoder generate predictions by focusing on relevant parts of the encoded input as well as previously generated outputs. After every sublayer, a residual connection and layer normalization are applied. This helps the model maintain stability and speeds up training by mitigating the vanishing gradient problem. The multi-head attention method splits the three parameters (query, key, and value) into multiple parts, yet the total parameter count remains constant. This division into multiple "heads" allows the model to simultaneously attend to information from different representational subspaces at different locations. The attention weight is then computed by mapping each set of partitioned parameters to a distinct subspace of the high-dimensional space, enabling focused analysis on different input components. This parallel processing is key to the efficiency of the model and its ability to handle complex patterns in the data. Finally, the attention information from all subspaces is aggregated after several parallel computations.

Multi-head attention is designed to identify connections between different types of input data, encoding multiple relationships and nuances as attention is allocated differently across different subspaces. This aspect endows the transformer model with significant functionality. Multiple independent heads focus on different aspects of the data (such as global and local contexts) to extract richer and more nuanced features.
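The splitting of query, key, and value into heads and the final aggregation can be sketched with NumPy. This is a generic illustration of scaled dot-product multi-head self-attention in the sense of Vaswani et al. [30], without masking, bias terms, or dropout; the function names and the weight matrices passed as arguments are assumptions for illustration, not the configuration used in this study.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Self-attention over x of shape (seq_len, d_model).

    Returns the output (seq_len, d_model) and the attention
    weights (n_heads, seq_len, seq_len).
    """
    seq_len, d_model = x.shape
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads

    def split_heads(m):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return m.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    # Each head computes its own attention weights in its own subspace.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v
    # Aggregate the heads by concatenation followed by an output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights
```

Splitting d_model across the heads keeps the projection matrices the same size as in single-head attention, which is why the total parameter count does not grow with the number of heads.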

Multi-Resolution PV Prediction
A model with a 1 h temporal resolution for one-step PV power generation prediction was built, ensuring that only nondependent variables were used to prevent data leakage. This approach minimized the risk of incorporating future information that could invalidate the model's predictions. The model was based on a 1D-CNN and incorporated timestamps (i.e., Date_Sine, Date_Cosine, Time_Sine, and Time_Cosine), weather conditions (i.e., WTC, WRH, GHR, DHR, WD, and WDR), and historical data (i.e., CPA_{H−1}, RGT_{H−1}, and RDT_{H−1}) from both the DKASC-ASA-1A and 1B datasets. These inputs were selected based on their relevance and independence from future data. To adapt the proposed model to practical time frames, the 5 min interval data were aggregated into hourly values. Specifically, cumulative values were used for AP and CPA_{H−1}, summing 12 data points (e.g., from 10:05 a.m. to 11:00 a.m.). In contrast, the remaining variables (timestamp, weather information, RGT_{H−1}, and RDT_{H−1}) were adjusted to use the corresponding values at the hour mark (e.g., 11:00 a.m.). This approach ensured that the forecast was based solely on the information available at the time of the forecast, thereby maintaining the integrity of the forecast. The 1D-CNN model processed this information to produce a prediction output, referred to as PV_{h+1}. The 1D-CNN also acted as a feature extractor, transforming the input data into a form suitable for further processing by the transformer encoder.
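The hourly aggregation of the 5 min records can be sketched with pandas resampling. The function name and column arguments are assumptions for illustration; the right-closed, right-labelled bins are chosen so that the 11:00 row sums the 12 records from 10:05 to 11:00, as in the example given in the text.

```python
import pandas as pd


def aggregate_hourly(five_min: pd.DataFrame, sum_columns: list[str]) -> pd.DataFrame:
    """Aggregate 5 min records into hourly rows labelled by the hour mark."""
    # closed="right", label="right": the 11:00 row covers 10:05 through 11:00.
    bins = five_min.resample("1h", closed="right", label="right")
    parts = [bins[sum_columns].sum()]
    other_columns = [c for c in five_min.columns if c not in sum_columns]
    if other_columns:
        # last() picks the hour-mark record; if that record is missing it
        # falls back to the latest available value in the bin (assumption).
        parts.append(bins[other_columns].last())
    return pd.concat(parts, axis=1)[list(five_min.columns)]
```

In this sketch, cumulative quantities such as AP would be listed in sum_columns, while timestamp and weather features take the value recorded at the hour mark.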
The integration of single-step and multi-step forecasting was a novel approach that improved the accuracy of the model by combining short-term precision with long-term contextual understanding. Single-step and multi-step PV power generation forecasting were integrated to build a multi-time resolution forecasting model. This multi-time resolution approach provided forecasts at different future intervals and thus a more granular view of expected power generation. The one-step forecast used a one-hour forecast horizon, while the multi-step forecast used a 5 min time resolution over a 1 h time horizon. The 5 min resolution was particularly important for applications where short-term variability was critical. The last two hours of data were sent to the multi-step forecaster. A 1D convolutional layer was used to extract features from the data and reduce dimensionality, as the input sequence length of the encoder could be extensive. Dimensionality reduction was an essential step in managing the complexity and computational load of the model. The convolved sequence was then subjected to positional encoding to represent temporal features. Positional encoding was essential for the model to capture sequence order, a critical aspect of time series prediction. Figure 3 shows the general layout of the proposed model.
The first multi-step prediction was then performed using a transformer encoder. The transformer encoder was able to capture complex dependencies across different time steps. To merge the two predictions, a combination of the single-step prediction and a weighted division of the encoder outputs was used. The weighting process was designed to balance the contribution of each time step to the final prediction. The decoder received the weighted division values as input, and the two predictions were integrated to enhance the multi-step prediction. Integration was a form of ensemble learning that improved the robustness and accuracy of the prediction. The weighted division value was calculated using Equation (2), where EO represents the one-step prediction value at the i-th point, which is used to compute the contribution of each encoder output to the final prediction. The weighted division normalizes the influence of each encoder output, ensuring that no single step has a disproportionate influence on the multi-step prediction. By concatenating the weighted division values and the weather data and then feeding them into the decoder, a more accurate multi-step prediction was achieved. This process effectively combined the extracted features with external data to refine the final output.

Experimental Results
This section outlines the assessment measures and a comparative study of several models. The technical resources utilized in this research included a GeForce RTX 2080 Super GPU (NVIDIA, Santa Clara, CA, USA), a Core i7 processor (Intel, Santa Clara, CA, USA), Windows 10 (Microsoft, Redmond, WA, USA), Python 3.9 (Python Software Foundation, Wilmington, DE, USA), and the PyTorch DL framework (Facebook, Menlo Park, CA, USA). First, the outcomes of the proposed model are contrasted with the actual solar PV power generation data from the DKASC database. This comparison focuses on the forecasted power versus the actual power for the specified time interval. Second, the proposed model is contrasted with the most recent forecasting models, and the findings of an ablation study are presented.
The mean absolute error (MAE) and RMSE are used to evaluate each model's performance. These metrics are commonly used to evaluate forecasting models, including the baselines. The MAE (Equation (3)) is the average absolute difference between predicted and actual values. The RMSE (Equation (4)) is the square root of the average squared difference between actual and predicted values.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{3}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{4}$$
Here, $y_i$ refers to the actual observed values, $\hat{y}_i$ denotes the values predicted by the model, and $n$ is the total number of observations used in the evaluation.
Upon evaluation, the MAE and RMSE metrics were calculated using only non-zero actual values of solar PV power generation to assess the performance of the proposed model in comparison to other forecasting models. This method ensures that the performance metrics reflect the accuracy of predictions where actual generation occurred, providing a focused analysis of model effectiveness in practical scenarios.
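The restriction of both metrics to non-zero actual values can be sketched as follows. The function name and the error raised when no non-zero actual value exists are assumptions made for illustration.

```python
import numpy as np


def mae_rmse_nonzero(actual, predicted):
    """Compute MAE and RMSE over the points where actual generation is non-zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = actual != 0
    if not mask.any():
        raise ValueError("no non-zero actual values to evaluate")
    errors = actual[mask] - predicted[mask]
    mae = float(np.mean(np.abs(errors)))
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    return mae, rmse
```

Masking on the actual values excludes night-time records, where generation is zero, so that these points do not dilute the reported errors.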

Comparative Analysis of Hourly PV Power Generation Forecasting
To ensure that the proposed model can accurately predict recent trends and situations, the data were divided into two sets: the training set, which included data from 2016 to 2021, and the test set, which included data from 2022 to 2023. Some preliminary experiments were then conducted to determine whether the 1D-CNN model was the most effective for forecasting hour-ahead hourly PV power generation.
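The chronological split can be sketched in a few lines of pandas. The function name and the assumption that the data frame carries a datetime index are illustrative.

```python
import pandas as pd


def split_by_year(data: pd.DataFrame, last_train_year: int = 2021):
    """Split a datetime-indexed frame chronologically at a year boundary."""
    # Assumption: data.index is a DatetimeIndex.
    train = data[data.index.year <= last_train_year]
    test = data[data.index.year > last_train_year]
    return train, test
```

Splitting by date rather than by random sampling keeps every test record strictly later than every training record, which matches the goal of evaluating the model on the most recent period.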
Figure 4 presents a comparative analysis of the actual hourly solar PV power generation against the 1D-CNN model's predictions for the DKASC-ASA-1A and 1B datasets. The plotted period covers the summertime in the Southern Hemisphere, specifically from 25 December 2022 to 7 January 2023. This period falls within the testing phase and coincides with the austral summer in Australia, resulting in elevated solar intensity and power generation. The figure displays the prediction performance of the 1D-CNN model over multiple days, demonstrating the potential for reliable hour-ahead hourly solar PV power generation forecasting. Despite the challenges posed by the irregular weather patterns characteristic of the summer months, this comparison demonstrates the 1D-CNN model's ability to adapt to the high energy output typically observed in Australia's hot summers.
To compare the 1D-CNN model with other DL models, several NN architectures were considered, including a deep neural network (DNN), LSTM, bidirectional LSTM (Bi-LSTM), GRU, and bidirectional GRU (Bi-GRU). The hyperparameter settings of the DL models were as follows: the activation function was the leaky rectified linear unit (LeakyReLU), the number of feedforward neurons was 128, and two hidden layers were used. Adam was used as the optimization algorithm, with a learning rate of 0.0001 and a batch size of 24.
Table 2 compares the prediction performance of DL models for hour-ahead hourly PV power generation forecasting. The results indicate that the 1D-CNN model exhibits the lowest values for both MAE and RMSE in both datasets. Therefore, it can be concluded that the 1D-CNN model provides more accurate predictions than the other forecasting models. This evidence supports the claim that the 1D-CNN architecture is effective for accurate hourly solar PV power generation forecasting.

Comprehensive Ablation Study and Performance Analysis
Figure 5 presents the actual 5 min PV power generation in comparison to the predictions of the proposed forecasting model for the DKASC-ASA-1A and 1B datasets. In contrast to the hourly forecasts in Figure 4, Figure 5 shows the 5 min interval forecasts from 30 December 2022 to 2 January 2023. The proposed model produces forecasts from one step ahead (5 min) up to twelve steps ahead (one hour). This figure illustrates the prediction performance of the different forecast intervals against the actual PV power generation during the summer period in the Southern Hemisphere. The proposed model captured the distinct patterns of solar PV power generation throughout this period, maintaining a high degree of accuracy despite the challenging fluctuations due to varying solar intensities. The predictions align closely with the actual data, demonstrating the model's ability to reliably forecast both the sharp rises and declines characteristic of the summer season. By capturing the day-to-day peaks and troughs as well as the fine fluctuations at 5 min intervals, the model offers a comprehensive insight into short-term solar PV power generation.
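The twelve-step horizon described above can be framed as a supervised learning problem by pairing each input window with the twelve values that follow it. The sketch below illustrates that framing only: the helper name `make_windows` and the input length `input_len` are hypothetical, since the length of the input window is not stated here.

```python
def make_windows(series, input_len, horizon=12):
    """Pair each input window with the next `horizon` values (5 min steps)."""
    samples = []
    # The last valid start leaves room for one full input window plus horizon.
    for start in range(len(series) - input_len - horizon + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + horizon]
        samples.append((x, y))
    return samples


# Hypothetical series of 30 consecutive 5 min readings.
series = list(range(30))
samples = make_windows(series, input_len=6)
print(len(samples), samples[0][1][0], samples[0][1][-1])
```

Each target vector holds steps 1 to 12, so per-step errors can be computed by comparing the same position across all target and prediction vectors.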
To demonstrate the effectiveness of integrating both transformer and 1D-CNN models with single-step forecasting, a comprehensive ablation study was conducted, as detailed in Table 3. This table presents three distinct model configurations, which elucidate the rationale behind the chosen approach. Case 1 is a 5 min interval PV power generation forecasting model based only on the 1D-CNN. This case serves as a baseline for the predictive capabilities of a purely convolutional approach. Case 2 is a model combining the transformer and 1D-CNN architectures without single-step forecasting. This case allows the impact of omitting that feature to be examined. Case 3 is the proposed model, which integrates both the transformer and 1D-CNN architectures and enhances prediction performance by including single-step forecasting. This ablation study was essential for identifying the most effective components and configurations for accurate PV power generation forecasting.
To ascertain the potential advantages of the proposed model, a comprehensive evaluation was conducted, as detailed in Tables 4 and 5. The results in Table 4 (DKASC-ASA-1A dataset) and Table 5 (DKASC-ASA-1B dataset) show that the proposed model (Case 3) consistently outperformed the two ablated configurations (Case 1 and Case 2) in terms of both MAE and RMSE across multi-step-ahead PV power generation forecasting.
In the DKASC-ASA-1A dataset (Table 4), the proposed model achieved an average MAE of 1.243 and RMSE of 1.813, significantly outperforming Case 1 and Case 2. This trend was consistent across all forecast steps, demonstrating the proposed model's robustness in delivering accurate predictions even with varying forecast horizons.
In the DKASC-ASA-1B dataset (Table 5), the proposed model also demonstrated superior performance, with an average MAE of 0.770 and RMSE of 1.125. The performance gap widened notably at higher forecasting steps, with the proposed model providing a substantial improvement over Case 1 and Case 2.
The statistical significance of the proposed model's superiority was validated through the Friedman test, which was performed separately for each dataset and evaluation metric. The Friedman test is a nonparametric statistical test employed to identify differences in performance across a range of models or conditions [33]. In the DKASC-ASA-1A dataset, the p-value for MAE was $1.54 \times 10^{-5}$, and the p-value for RMSE was $6.14 \times 10^{-6}$. In the DKASC-ASA-1B dataset, the p-value was $8.84 \times 10^{-5}$ for both MAE and RMSE. All four p-values are far below 0.05, indicating that the differences among the three configurations are statistically significant and supporting the benefit of combining the transformer and 1D-CNN architectures with single-step prediction.
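For readers unfamiliar with the Friedman test, the following is a minimal sketch of how its statistic is obtained from per-step errors. It assumes no ties within a forecast step and uses the chi-square approximation, which for three models has two degrees of freedom and the closed-form tail probability exp(−Q/2). The function names and error values are hypothetical, and the text does not state which implementation was actually used.

```python
import math


def friedman_statistic(scores):
    """scores[i][j] is the error of model j at forecast step i (lower is better).

    Returns (Q, rank_sums). Assumes no ties within a row.
    """
    n = len(scores)      # number of blocks (forecast steps)
    k = len(scores[0])   # number of models compared
    rank_sums = [0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, rank_sums


def p_value_three_models(q):
    """Chi-square survival function with 2 degrees of freedom (valid for k = 3)."""
    return math.exp(-q / 2.0)


# Hypothetical errors for 12 steps and 3 configurations, where the third
# configuration is best and the first is worst at every step.
scores = [[1.0 + 0.1 * s, 0.8 + 0.1 * s, 0.6 + 0.1 * s] for s in range(12)]
q, rank_sums = friedman_statistic(scores)
print(rank_sums, f"{q:.2f}", f"{p_value_three_models(q):.2e}")
```

With twelve steps and three models ranked identically at every step, the statistic is Q = 24 and the approximate p-value is exp(−12) ≈ 6.14 × 10^−6. This coincides with the RMSE p-value reported for DKASC-ASA-1A, which is consistent with, though not proof of, a fully consistent ranking of the three cases across the twelve steps.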

Model Performance: Comparison with the State of the Art
A comparison was conducted between the prediction performance of the proposed model and that of recent state-of-the-art DL models, including transformer [30], CNN-LSTM [22], LSTM-CNN [34], GRU-CNN [35], and echo state network (ESN)-CNN [36]. To ensure the reliability of the results, the original papers were reviewed in detail so that the DL architectures could be replicated in their original form and adapted to the experimental environment. The hybrid DL approaches (CNN-LSTM, LSTM-CNN, GRU-CNN, and ESN-CNN) employ feature extraction at the front end and prediction at the back end. Previous studies have demonstrated that these models outperform single DL models in terms of prediction performance when used with similar datasets. Consequently, these models were selected as benchmark models to validate the effectiveness of the proposed model, given their proven capability in providing accurate predictions.
Furthermore, the same post-processed datasets and identical hyperparameters were employed in all experiments to guarantee a fair comparison. The training set (2016-2021) and the test set (2022-2023) were the same for every model. Predictions for PV power generation were made at 5 min intervals, ranging from 5 min to one hour ahead (12-step prediction). This approach ensured that each model was implemented consistently and evaluated under the same conditions, so that the comparative results are accurate and reliable. The experimental results are presented in Tables 6 and 7.
In Table 6, the proposed model achieved an average MAE of 1.243 and RMSE of 1.789, demonstrating an advanced predictive capability in terms of RMSE, where it consistently outperformed all other forecasting models, including transformer, CNN-LSTM [22], GRU-CNN [35], LSTM-CNN [34], and ESN-CNN [36]. However, in terms of MAE, the proposed model was slightly less accurate than ESN-CNN (MAE 1.241), but still showed an improvement over the remaining models. These results highlight the robustness of this approach, especially in minimizing the large errors captured by the RMSE.
In Table 7, the proposed model achieved the lowest MAE (0.770) and RMSE (1.125), significantly outperforming the transformer and other benchmark models in both metrics. The MAE was significantly lower than that of ESN-CNN (0.790), the second-best-performing model, highlighting the predictive accuracy of the proposed model. Similarly, the RMSE showed a significant improvement and was consistently lower than that of all other forecasting models.
To ascertain the statistical significance of the proposed model's performance, the Wilcoxon signed-rank test, a nonparametric statistical method, was employed [37]. This test compares two related groups to determine whether one group exhibits systematically higher values than the other. The Wilcoxon signed-rank test offers a robust means of evaluating prediction performance by analyzing the rankings of individual scores rather than relying on assumptions about the data distribution. The procedure entails ranking the absolute differences between pairs of related observations in ascending order. After the ranks are assigned, the ranks of the positive and negative differences are summed separately, giving $R^{+}$ and $R^{-}$. The test statistic $W$ is then computed as the smaller of these two sums, as expressed in Equation (5).
$$W = \min\left(R^{+}, R^{-}\right) \quad (5)$$
The p-value can then be obtained by comparing $W$ to its distribution under the null hypothesis, which assumes no significant difference between the two groups.
Table 8 presents the results of the Wilcoxon signed-rank test, which uses the 12-step MAE and RMSE values to compare the proposed model with recent state-of-the-art DL models on the DKASC-ASA-1A and DKASC-ASA-1B datasets.
Model            DKASC-ASA-1A MAE   DKASC-ASA-1A RMSE   DKASC-ASA-1B MAE   DKASC-ASA-1B RMSE
CNN-LSTM [22]    0.00049            0.00049             0.00049            0.00049
LSTM-CNN [34]    0.04255            0.03418             0.00049            0.00049
GRU-CNN [35]     0.01221            0.00049             0.00049            0.00049
ESN-CNN [36]     0.33936            0.38037             0.00049            0.00049
The results indicate that the proposed model exhibits statistically significant improvements over CNN-LSTM [22], LSTM-CNN [34], and GRU-CNN [35], with p-values below 0.05 for both metrics on both datasets. The comparisons with ESN-CNN [36] and transformer [30], however, are not significant in every case. For the transformer model, the results differ between the two datasets: on DKASC-ASA-1A, the difference was statistically significant for MAE but not for RMSE, whereas on DKASC-ASA-1B it was significant for both metrics. For the ESN-CNN model, the difference was not statistically significant on DKASC-ASA-1A (p-values above 0.05) but was significant on DKASC-ASA-1B.
In conclusion, the proposed model outperforms most of the advanced deep learning models considered in terms of predictive accuracy. Although its MAE on DKASC-ASA-1A is slightly higher than that of the ESN-CNN model, it still provides reliable multi-step forecasts for PV power generation, which helps stabilize the grid and improve energy management. However, since 1D convolution struggles to capture spatial patterns and transformer models are computationally expensive, future research could adopt the parallel pooling approach [38]. Refining this approach is expected to improve the model's capacity to capture complex weather patterns and to enhance prediction accuracy across diverse forecast horizons.

Conclusions
This study presented a multi-time-resolution solar PV power generation forecasting approach that integrates one-step and multi-step forecasting methods into a final model, providing a comprehensive solution for the planning and maintenance scheduling of solar PV power systems. Experiments conducted to evaluate the performance of the framework demonstrated that the proposed model improved predictive accuracy, achieved lower MAE and RMSE than most existing DL-based forecasting models, and more accurately identified solar PV power generation patterns.
Notwithstanding the favorable outcomes, the proposed approach has inherent limitations. A principal limitation lies in the model's reliance on the quality and completeness of the input data, which may influence performance under varying environmental conditions. Furthermore, the computational complexity involved in integrating multiple forecasting methods presents a substantial challenge for real-time application scenarios. Future efforts will concentrate on optimizing the computational efficiency of the forecasting model to enhance its applicability in real-time forecasting scenarios. A wider array of environmental factors will also be incorporated to enhance the accuracy of the predictions. Additionally, focus will be placed on improving the quality of the data through advanced preprocessing techniques and on investigating the model's adaptability to renewable energy sources other than solar PV.

Algorithm 2. Replacement of Outliers with NAs
Input: CSV file containing the dataset.
Output: Dataset with outliers in specific variables replaced with NAs.
1: Load the CSV file into a data frame 'total_data'.
2: Define the target variables for outlier replacement in 'interpolate_vars'.
3: For each variable in 'interpolate_vars' do
4:

Figure 1. Schematic of the one-dimensional (1D) convolution process from input to output layers.

Figure 2. Transformer architecture flowchart depicting encoder and decoder processing layers.

Figure 3. Integrated forecasting architecture combining single-step and multi-step predictions with 1D convolutional and transformer models.
Table 1. Statistics for each attribute of the post-processed DKASC-ASA-1A and 1B datasets.

Table 2. Mean absolute error (MAE) and root mean square error (RMSE) comparisons for hourly solar photovoltaic (PV) power generation forecasting models (unit: kWh).

Table 6. MAE and RMSE comparisons for predicting hour-ahead PV power generation using 5 min interval multi-step forecasts (steps 1 to 12) on the DKASC-ASA-1A dataset (unit: kW).

Table 7. MAE and RMSE comparisons for predicting hour-ahead PV power generation using 5 min interval multi-step forecasts (steps 1 to 12) on the DKASC-ASA-1B dataset (unit: kW).

Table 8. Wilcoxon signed-rank test results for different PV power generation forecasting models.