Assessment of Different Deep Learning Methods of Power Generation Forecasting for Solar PV System

An increase in renewable energy injected into the power system will directly cause a fluctuation in the overall voltage and frequency of the power system. Thus, renewable energy prediction accuracy becomes vital to maintaining good power dispatch efficiency and power grid operation security. This article compares the one-day-ahead PV power forecasting results of three models paired with three groups of weather data. Since the number, loss, and matching problem of weather data will all influence the training results of the model, a pre-processing data framework is proposed to solve the problem in this study. The models used are a deep learning algorithm-based artificial neural network (ANN), long short-term memory (LSTM), and gated recurrent unit (GRU). The weather data groups are Central Weather Bureau (CWB), local weather station (LWS), and hybrid data (the combination of CWB and LWS data). Compared to the other two groups, hybrid data showed a 5–8% improvement in measurements. In addition, when it comes to different weather conditions, the advantages of the LSTM model were highlighted. After further analysis, the LSTM model combined with hybrid data showed the most accurate measurements, which was proved through forecasting results for one month. Finally, the results indicate that when the amount of data is limited, using hybrid data and the five weather features is helpful for training the model. Accordingly, the proposed model shows better one-day-ahead PV forecasting.


Introduction
In recent decades, renewable energy development and carbon reduction have been energy policy targets in various countries in response to global warming. Due to the intrinsic intermittent and uncertain nature of renewable energy, previous experience shows that a high penetration percentage of renewable energy will affect the system's voltage, frequency, and load demand fluctuations. In the state of California, the solar penetration rate of residential roofs has increased rapidly, causing the load to show a duck curve, which conflicts with conventional generation efficiency and the frequency of daily start-up/shutdown operations [1,2]. In this context, developing an accurate forecast of long-term and short-term renewable generation ability is vital for efficient power dispatch and operations. This paper compares different models and weather data to predict one-day-ahead hourly PV power generation.
Accurate prediction of PV power generation is challenging due to complex interactions between many environmental conditions and uncontrollable factors. Therefore, studying weather data is crucial to predicting the impact of solar power generation [3][4][5]. Sangrody et al. [6] selected the most influential variables, including sky coverage, relative humidity, and temperature, for forecasting model training. Sensitivity tests showed that relative humidity plays a dominant role in forecasting. Sangrody et al. [7] concluded that [...] were categorized as sunny, cloudy, rainy, low light, sunny-cloudy, and rainy-sunny as the training dataset to carry out one-day-ahead solar power forecasting. The results showed that their proposed simplified LSTM model outperformed the multilayer perceptron (MLP) model. U. K. Das et al. [28] provided a comprehensive and systematic review of the direct forecasting of photovoltaic power generation. They also provided a critical analysis of recent work, including statistics and machine learning models based on historical data. Meftah et al. [29] compared the performance of LSTM and MLP for PV power forecasting in winter and summer. The results indicated that LSTM performed better for short-term forecasting. A summary of different forecasting algorithms, data inputs, and horizons in the literature is shown in Table 1.
The actual PV power generation site used in this study is in Taoyuan City, Taiwan. The number of equivalent sunshine hours is about 3 h. The weather in this city in the experimental year was relatively unstable. The objective of this study was to compare three kinds of deep learning models, namely, the artificial neural network (ANN), long short-term memory (LSTM), and gated recurrent unit (GRU), for one-day-ahead PV power generation forecasting. Furthermore, a data pre-processing framework is proposed to improve predictive accuracy and prevent the failure of one-day-ahead PV power forecasts.
The structure of this article is as follows: Section 1 presents a brief review of weather factors and modeling research for PV power forecasting. The details of the various weather data resources used in this study are introduced in Section 2. Then, Section 3 proposes the framework for data pre-processing. Section 4 introduces the application of deep learning modeling and evaluation methods. Section 5 presents the numerical PV power forecasting results, followed by the final section for the conclusion.
Appl. Sci. 2022, 12, 7529

Data Description
This section introduces the sources of the weather and solar photovoltaic parameters and clarifies the means of obtaining various data. The architecture of data collection is shown in Figure 1. The data used in this study include weather data and PV power generation data. The measured weather forecast data and historical data can be easily accessed from the Central Weather Bureau (CWB) website. These data were used for model training and prediction for one-day-ahead PV power forecasting. In addition, the PV power data used in this study were obtained from the SQL server of the energy management system (EMS), which is used as the target of deep learning.

Photovoltaic Generation Data
The actual PV module field is depicted in Figure 2. The PV module was installed on the rooftop of a fast-food restaurant in Taoyuan City, Taiwan, and the capacity is 10 kW. The number of effective sunshine hours is about 3 h in this area. The collection period of the photovoltaic generation data used in this study was from 1 August 2020 to 3 July 2021. The PV power generation data are sampled every minute in this field, and hourly PV power data were obtained as the average data for 60 min.
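The hourly averaging of one-minute samples can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `hourly_means` and the choice to drop a trailing incomplete hour are assumptions.

```python
def hourly_means(minute_values):
    """Average consecutive 60-sample windows of one-minute readings.

    Assumption: a trailing partial hour (fewer than 60 samples) is dropped.
    """
    full_hours = len(minute_values) // 60
    return [
        sum(minute_values[h * 60:(h + 1) * 60]) / 60
        for h in range(full_hours)
    ]


# Two full hours of made-up readings (kW) plus ten leftover minutes.
samples = [1.0] * 60 + [3.0] * 60 + [9.0] * 10
print(hourly_means(samples))
```

The same averaging applies to the one-minute local weather station readings.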

Meteorological Data from the CWB
The experimental database of historical meteorological data used in this study was provided by the Xinwu Weather Station of CWB (about 26 km away from the site) from August 2020 to July 2021. The observed weather database has 55 different weather parameters and can be easily accessed from the CWB website. On the other hand, the weather forecast dataset has 16 different weather features, providing the next three days of weather forecast data, and the data interval is hourly. Details can be found on the CWB website [33,34].

AccuWeather Data
From the historical PV data, it was found that on rainy days, there was a drastic change or almost no PV power generation, as shown in Figure 3. The figure illustrates hourly power generation from 2 to 4 March 2021. The photo shows the weather condition captured by the sky camera at that time for reference. As a result, rainfall data are one of the important parameters for photovoltaic power forecasting. Since there were no rainfall data in the forecast data of the CWB, the rainfall data of Yangmei District in this article were obtained from the AccuWeather Data website, and the data interval is once per hour [35].

Local Weather Station (LWS) and Pyrheliometer
The LWS data are from a micro weather station installed in the field of Yangmei District, Taiwan, which is a 6-in-1 weather station. It was installed on the platform less than 100 m from the PV panel. That equipment provides data that include temperature, relative humidity, average wind speed, wind direction, rainfall, and atmospheric pressure. The datasheet of the weather station is presented in Table 2. A pyrheliometer was installed in the field to measure and record the current solar radiation, and the datasheet is depicted in Table 3. The data are updated every minute. The hourly LWS data were obtained as the average data for 60 min.

Data Pre-Processing
The literature review showed that the acquisition of meteorological data has a critical influence on solar photovoltaic forecasting. The accuracy of the forecasting results will vary considerably and is subject to the quality of the measured parameters. These historical and forecast weather data may be lost during collection or unavailable because the website is inaccessible, and this will affect the forecasting results of photovoltaic power generation for the next day. Therefore, data pre-processing is the most important and critical factor for a one-day-ahead PV power forecasting study. The data pre-processing methods used in this study can be divided into three steps, which are data classification, data filtration, and missing data processing. The structure of data pre-processing used in this study is presented in Figure 4.

Data Classification
The main purpose of data classification is to classify weather data from different sources before performing forecast training. To study the influence of data from different sources on one-day-ahead solar photovoltaic forecasting, the data used in this study were divided into three sources: The first group is the historical and forecast data provided by the open platform of the CWB. The second group is the historical weather data from the on-site LWS equipment to carry out one-day-ahead forecasting. The third group is the combination of CWB and LWS data, which means using the same weather features and training data from the CWB and LWS, named hybrid weather data in this article.

Data Filtering
The data filtration process consists of the pre-processing of solar photovoltaic power data, historical weather data, and historical weather forecast data. The details are described as follows:
1. The filtration method of solar photovoltaic power data: When collecting solar photovoltaic power data, very little electricity consumption is needed to maintain the standby status of the inverter when no power generation occurs. In addition, solar photovoltaic power generation is too low in the early morning. These data not only affect the forecast calculation but are useless in the actual power generation forecast. Therefore, the pre-processing method of solar photovoltaic data entails replacing values less than 0.5 kW and the standby value of −0.002 W by 0. In the zoomed-in area of the collected data illustrated in Figure 5, the values are much lower than −0.002 W, so these values are replaced by 0.
2. The filtration method of weather feature data: There are 55 different weather feature parameters in CWB's historical data but only 16 features from CWB's weekly forecast data. Thus, the 16-parameter weather forecast data were compared to the 55-parameter historical data to find exact matches. The features used in the model training process must be the same as those used in the predictive model so that the calculated weights of the deep-learning model are consistent. After cross-checking with the complete data used in this research, there were only 5 parameters that could be used, as shown in Table 4. Among these 5 parameters, rainfall has a certain influence on the forecast for the next day. It was obtained from the AccuWeather website and used as one of the parameters.
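The two filtration steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the feature names in the example are hypothetical (the actual CWB column names are not given in the text), and only the 0.5 kW threshold comes from the description above.

```python
PV_THRESHOLD_KW = 0.5  # from the text: values below 0.5 kW are replaced by 0


def filter_pv_power(hourly_kw):
    """Replace standby (slightly negative) and very low readings with 0."""
    return [0.0 if value < PV_THRESHOLD_KW else value for value in hourly_kw]


def common_features(historical_names, forecast_names):
    """Keep only features present in both datasets, in historical order."""
    forecast_set = set(forecast_names)
    return [name for name in historical_names if name in forecast_set]


# Hypothetical feature names, for illustration only.
historical = ["temperature", "relative_humidity", "wind_speed", "uv_index", "pressure"]
forecast = ["temperature", "relative_humidity", "wind_speed", "uv_index", "comfort_index"]

# -0.000002 kW corresponds to the -0.002 W standby reading.
print(filter_pv_power([-0.000002, 0.3, 0.5, 4.2]))
print(common_features(historical, forecast))
```

Matching on the intersection of feature names keeps the training inputs and the prediction inputs identical, which is the consistency requirement stated in step 2.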

Missing Data Processing
In the process of data transmission and storage, data may be lost due to network disconnection, database maintenance, or some unexpected malfunction. Different methods were used to pre-process lost historical and forecast data, which are described as follows:
1. The pre-processing method of the historical data, including the data from the CWB and local weather station, involved directly deleting abnormal and missing data. The data for such days were erased when the historical data were −9999, null, or intermittently lost. The number of processed data points is shown in Figure 6. The hollow bars show the amount of data that can be used for one-day-ahead solar photovoltaic forecasting training after processing by the above method. There are 170 days of hybrid data, 308 days of local weather station data, and 181 days of CWB data. The original data are from 1 August 2020 to 20 June 2021, a total of 324 days.
2. The pre-processing method used for lost weather forecast data differs from that used for historical data. Two methods were used in this study. The first was the interpolation of data that were missing for one or two hours of the day. The second was used if more than one-fourth of the forecast data for a day were lost: the historical forecast data in the CWB's database were searched, and similar weather data were directly used as the forecast weather factor for the next day. For example, if the forecast for the next day was mostly clear from 6:00 to 18:00, the forecast data for the same weather were searched in the historical database and used as the next day's forecast data. The reason for adopting this method is that the unavailability of weather information for the next day would cause the predictive system to crash. The system could then not be applied to actual cases, and it would be meaningless to introduce the predictive module into the energy management system.
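The missing-data rules above can be sketched as follows. This is an illustrative sketch, not the authors' implementation. The function names are hypothetical, linear interpolation is an assumed choice (the text only says "interpolation"), the similar day is passed in rather than searched for, and gaps longer than two hours but not exceeding one-fourth of the day are interpolated here because the text does not specify that case.

```python
MISSING_SENTINEL = -9999  # sentinel value named in the text


def is_missing(value):
    return value is None or value == MISSING_SENTINEL


def drop_incomplete_days(days):
    """Historical data: keep only days whose 24 hourly values are all present.

    `days` maps a date string to a list of hourly values.
    """
    return {
        date: values
        for date, values in days.items()
        if len(values) == 24 and not any(is_missing(v) for v in values)
    }


def interpolate_gaps(values):
    """Forecast data: fill gaps linearly between the nearest valid neighbors."""
    known = [i for i, v in enumerate(values) if not is_missing(v)]
    if not known:
        raise ValueError("no valid values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if not is_missing(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]  # gap at the start: copy nearest value
        elif right is None:
            filled[i] = values[left]   # gap at the end: copy nearest value
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def repair_forecast(values, similar_day_values):
    """Interpolate short gaps; substitute a similar historical day otherwise."""
    missing = sum(is_missing(v) for v in values)
    if missing > len(values) / 4:
        return list(similar_day_values)
    return interpolate_gaps(values)


day = [20.0, None, 24.0] + [24.0] * 21
print(repair_forecast(day, [0.0] * 24)[:3])
```

Returning a substituted day instead of raising an error reflects the stated goal that the forecasting system must never stall for lack of next-day weather input.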

Methods and Evaluation
We note that PV power generation and weather data are related to time series. Thus, the deep learning algorithms chosen for one-day-ahead PV power forecasting were LSTM and GRU neural network models, and the traditional ANN model was added to compare the accuracy of the three models in this study. Hyperparameter tuning of the deep learning network will directly affect the model's performance and then influence the prediction results. This section introduces the method of hyperparameter tuning for each model. The accuracy evaluation methods used in this study are the most widely used metrics, namely, MAPE, RMSE, and MAE. They are described below.

Artificial Neural Network (ANN)
An artificial neural network is one of the main tools used in deep learning. It consists of an input layer, one or more hidden layers, and an output layer. The input layer comprises artificial input neurons and brings the initial data into the system for further processing by subsequent layers of artificial neurons. A hidden layer lies between the input layer and the output layer; its artificial neurons receive a set of weighted inputs and produce an output through an activation function. The output layer is the last layer of neurons and produces the outputs of the network. For the neurons in each layer, the activation function determines how strongly the weighted input is passed on, which introduces the nonlinearity the model needs to be effective. The advantage of ANN is that it can deal with complicated nonlinear problems and has excellent adaptive ability. Therefore, ANN is one of the most popular methods to realize predictions [36]. However, its performance is relatively poor when little data is available.
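The layer structure described above can be illustrated with a minimal forward pass. This is a sketch only: the ReLU activation, the single hidden layer, and all weights are assumptions chosen for illustration, not the configuration used in the study.

```python
def relu(x):
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row of W per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer (ReLU) -> linear output for regression."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, lambda v: v)[0]


# Made-up weights: two input features, two hidden neurons, one output.
prediction = forward(
    [1.0, 2.0],
    hidden_w=[[1.0, 0.5], [-1.0, 0.0]],
    hidden_b=[0.0, 0.5],
    out_w=[[0.5, 2.0]],
    out_b=[0.25],
)
print(prediction)
```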

Long Short-Term Memory (LSTM)
A recurrent neural network (RNN) is prone to the vanishing and exploding gradient problem during training, so it cannot retain information over long time spans. Long short-term memory was therefore introduced by Hochreiter and Schmidhuber [37], who proposed it as a way to overcome this problem during the training process. In 1999, Gers et al. [38] introduced the forget gate into the LSTM structure, which decides the ratio of information that needs to be kept in the memory cell. LSTM performs better when there are more data.
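The role of the gates can be illustrated with a single LSTM time step. This sketch uses the common formulation with a forget gate and, for brevity, a scalar input, hidden state, and cell state; the parameter layout in `params` is a hypothetical convention, not taken from the study.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step with scalar input, hidden state, and cell state.

    `params` maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    f = sigmoid(pre("forget"))        # share of the old cell state to keep
    i = sigmoid(pre("input"))         # share of the candidate to write
    o = sigmoid(pre("output"))        # share of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c = f * c_prev + i * g            # additive update of the memory cell
    h = o * math.tanh(c)
    return h, c


# With all-zero parameters every gate equals 0.5 and the candidate is 0.
zero = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero)
print(h, c)
```

The additive cell update `c = f * c_prev + i * g` is what lets gradients flow across many time steps when the forget gate stays close to 1.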

Gated Recurrent Unit (GRU)
The gated recurrent unit was introduced by Cho et al. [39]. GRU's design is similar to that of LSTM, but without an output gate. It addresses the vanishing gradient problem by using an update gate and a reset gate. When dealing with small datasets, GRU is often a better choice than LSTM because it has only two gates and is therefore faster to compute.
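A single GRU time step can be sketched in the same spirit. The update rule below follows the convention in which the update gate weights the previous state; some libraries swap the roles of `z` and `1 - z`. Scalar states and the `params` layout are simplifying assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, params):
    """One GRU time step with a scalar input and hidden state.

    `params` maps each name to (input weight, recurrent weight, bias).
    """
    wz, uz, bz = params["update"]
    wr, ur, br = params["reset"]
    wh, uh, bh = params["candidate"]

    z = sigmoid(wz * x + uz * h_prev + bz)                   # update gate
    r = sigmoid(wr * x + ur * h_prev + br)                   # reset gate
    h_tilde = math.tanh(wh * x + uh * (r * h_prev) + bh)     # candidate state
    return z * h_prev + (1.0 - z) * h_tilde


# With all-zero parameters the update gate is 0.5 and the candidate is 0.
zero = {name: (0.0, 0.0, 0.0) for name in ("update", "reset", "candidate")}
h_new = gru_step(x=1.0, h_prev=0.8, params=zero)
print(h_new)
```

Compared with an LSTM step, there is no separate cell state and one gate fewer, which is where the lower computational cost comes from.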

Hyperparameter Adjustment Process
Hyperparameter tuning is a way to optimize the deep learning model; the hyperparameters considered here are the number of hidden units, the batch size, and the number of epochs. The batch size is the number of training samples processed before each weight update. Increasing the batch size can accelerate training, but if the batch size is too large, the memory will be insufficient. By contrast, too small a batch size makes the weight updates noisy and may lead to underfitting. The number of hidden units determines the number of trainable parameters: the higher the number of hidden units, the more easily the model overfits. Epochs represent the number of complete passes over all of the batches. The training loss generally decreases as the number of epochs increases. Therefore, the batch size was fixed at the beginning, and three to five layers were tested in every step with the same epochs in the hyperparameter adjustment process.
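The adjustment process can be sketched as a small grid search with the batch size and epochs held fixed. Everything specific here is an assumption for illustration: the candidate hidden-unit values, the fixed batch size and epoch count, and the toy `evaluate` function, which stands in for training a model and returning its validation error.

```python
from itertools import product


def grid_search(evaluate, hidden_units, num_layers, batch_size, epochs):
    """Return (error, units, layers) for the combination with the lowest error."""
    best = None
    for units, layers in product(hidden_units, num_layers):
        error = evaluate(units=units, layers=layers,
                         batch_size=batch_size, epochs=epochs)
        if best is None or error < best[0]:
            best = (error, units, layers)
    return best


def toy_evaluate(units, layers, batch_size, epochs):
    """Hypothetical stand-in for 'train the model, return validation error'."""
    return abs(units - 64) / 64 + abs(layers - 4)


# Three to five layers are tested, as in the text; other values are made up.
best = grid_search(toy_evaluate, hidden_units=[32, 64, 128],
                   num_layers=[3, 4, 5], batch_size=32, epochs=100)
print(best)
```

In practice `evaluate` would train the ANN, LSTM, or GRU model on the training split and report an index such as MAPE on held-out data.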

Evaluation Indices
After obtaining every hourly forecast value of one-day-ahead PV power with the proposed model, it was necessary to validate the applicability of the model through certain indices. In machine learning, the most common indices used to evaluate the performance of models are MAE, RMSE, and MAPE [40].

1. Mean Absolute Error (MAE). MAE can be used to measure the error between predicted values and actual values. It depends on the scale of continuous variables. The lower the value, the higher the accuracy of the predicted model. The standard definition is:
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{X}_i - X_i \right|$$
2. Root Mean Squared Error (RMSE). RMSE can be used to measure the deviation between predicted values and actual values. The difference between RMSE and MAE is that RMSE is sensitive to outliers, which means that RMSE is easily influenced by large deviations. A smaller error indicates better performance. The standard definition is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{X}_i - X_i \right)^2}$$
3. Mean Absolute Percentage Error (MAPE). MAPE measures the accuracy as a percentage. It can be used to judge the quality of the predicted result. The standard definition is:
$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{\hat{X}_i - X_i}{X_i} \right|$$
In all three equations, $\hat{X}_i$ and $X_i$ represent the $i$th forecasted and actual values, respectively, and $N$ is the size of the test dataset ($N = 168$ for one week in this study).
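The three indices can be computed with a few lines of code using their standard textbook definitions. This is a sketch under assumptions: the function names are illustrative, and hours whose actual value is 0 are skipped in MAPE to avoid division by zero, a choice the text does not specify.

```python
import math


def mae(forecast, actual):
    """Mean absolute error, in the units of the data."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def rmse(forecast, actual):
    """Root mean squared error; large deviations weigh more than in MAE."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))


def mape(forecast, actual):
    """Mean absolute percentage error over hours with a non-zero actual value.

    Assumption: zero actual values (e.g., night hours) are excluded.
    """
    pairs = [(f, a) for f, a in zip(forecast, actual) if a != 0]
    return 100.0 * sum(abs((f - a) / a) for f, a in pairs) / len(pairs)


# Made-up hourly values in kW.
forecast = [2.0, 4.0, 6.0]
actual = [1.0, 4.0, 8.0]
print(mae(forecast, actual), rmse(forecast, actual), mape(forecast, actual))
```

For a one-week test window of hourly data, each list would hold 168 values.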

Results of Hyperparameter Adjustment
For the actual forecast from 4 February to 21 May 2021, five features (temperature, relative humidity, rainfall, average wind speed, and UV index) were employed with Adam optimizer to tune the hyperparameters. The test results of hyperparameters are shown in Tables 5-7. The tables show that the accuracy of hybrid weather data for the one-day-ahead forecast is higher than that of CWB and LWS data when applied to different models.

Forecast Performance with Different Weather Data Groups
After deep learning model training, these models were applied to execute solar photovoltaic forecasting from 27 June to 3 July 2021. This section describes the use of different deep learning methods (ANN, LSTM, and GRU) with different weather groups for solar photovoltaic forecasting. The forecasting results of the ANN model are presented in Figure 7. The solid blue line is the actual PV power generation, the green dashed line is the predicted result using LWS weather data, the red dashed line is the predicted result using hybrid weather data, and the yellow dashed line is the predicted result using CWB weather data. The lines used in the other graphs have the same meanings. It is apparent that when only CWB data are used, the prediction results of the ANN model severely deviate from the actual power generation during the peak period on 27 June and 29 June 2021. The results indicate that the ANN model does not provide good predictions, even under sunny conditions. On the other hand, when only historical LWS data are used, the results during peak power generation periods deviate quite substantially from the actual values due to poor weather (e.g., 28 June to 29 June). In contrast, the overall prediction curve using hybrid data is closer to the actual power generation trend, although the results are still not good enough under poor weather conditions.
The forecasting results of the LSTM model are shown in Figure 8. When CWB data are used for prediction, the accuracy of LSTM is better than that of ANN. This means that the learning mode with time series helps to improve the predictive accuracy. However, from the prediction results of LWS data, it is found that although the results are better than those of the ANN model (the trends on 28 and 29 June 2021 show significant improvement), there is still a big difference in the prediction results on 1 July 2021. This means that the predictive ability is still limited when only historical data are employed. Conversely, the forecast trend of the hybrid model is similar to the actual PV power generation results.
The forecasting results of the GRU model are shown in Figure 9. It is also evident from the results that when using CWB data, the GRU model can effectively improve the accuracy in comparison with ANN, especially on 27 June 2021. However, the results show that if only the LWS historical weather data are used for forecasting, the prediction results are still inaccurate, with severe deviations in trend. This outcome matches those of ANN and LSTM, e.g., on 28 June and 1 July 2021. In contrast, when the hybrid data are used, the one-day-ahead solar photovoltaic forecasting trend is better than when only using CWB or LWS data. The prediction results on 1 July 2021 show that hybrid data significantly help improve the effect of the one-day-ahead solar photovoltaic forecast.
The predictive ability of the different models can be clearly seen in the above figures. In addition, the predicted hourly generation varies considerably when using different methods with different weather data sources. The average evaluation indices of the three models with different weather data groups are shown in Table 8. Comparing the forecasting results of the different models, the predictive performance of ANN is relatively poor compared to those of LSTM and GRU. The one-week average value of MAPE can vary by up to 8-10%, and the maximum differences in MAE and RMSE are up to 1.089 and 0.46, respectively. The results imply that the LSTM and GRU models are more suitable for one-day-ahead solar photovoltaic forecasting. Subsequently, the forecasting results of the different weather groups are compared. From the one-week average result, when only LWS historical data are used for forecasting, the prediction performance of all models is comparatively poor. Finally, both the trend of the PV power forecast curve and the numerical statistical analysis show that using hybrid weather data can effectively improve the accuracy of solar photovoltaic forecasts. On the other hand, from the average prediction results over the three groups of weather data, the LSTM model performs best, indicating that this model is more suitable as a solar photovoltaic prediction model in this study, with a MAPE of 20.0%, MSE of 1.158, and RMSE of 1.004. Based on the suggestion by Lewis [41], the accuracy of the predictive ability based on MAPE is classified as highly accurate (less than 10%), good (11% to 20%), reasonable (21% to 50%), and inaccurate (more than 50%). It is shown that the data pre-processing method proposed in this study combined with the LSTM deep learning forecasting model can achieve good prediction results.
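The evaluation indices and the Lewis classification discussed above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the formulas are the standard textbook definitions, and two details are assumptions made here, namely that hours with zero actual generation are skipped when computing MAPE and that the boundary values (10%, 20%, 50%) are assigned to the categories as shown in the comments.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error (square root of the MSE)."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: hours with zero actual generation (e.g., at night) are
    skipped, because the percentage error is undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def lewis_category(mape_percent):
    """Map a MAPE value to the accuracy classes attributed to Lewis.

    Assumption: boundary values are handled as < 10, <= 20, <= 50.
    """
    if mape_percent < 10:
        return "highly accurate"
    if mape_percent <= 20:
        return "good"
    if mape_percent <= 50:
        return "reasonable"
    return "inaccurate"


if __name__ == "__main__":
    # Made-up hourly values purely for illustration (not data from the study).
    actual = [0.0, 2.0, 4.0, 5.0]
    predicted = [0.0, 1.0, 5.0, 5.0]
    score = mape(actual, predicted)
    print(mae(actual, predicted), rmse(actual, predicted), score,
          lewis_category(score))
```

Under these assumptions, a MAPE of 20.0% falls in the "good" class, which matches the reading of the LSTM result given in the text.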
From the analysis results mentioned above, the use of hybrid data can significantly improve the one-day-ahead solar photovoltaic forecast effect. Therefore, the results of one-day-ahead solar photovoltaic forecasting with the hybrid data group for one week were further analyzed in this study, as shown in Figure 10. The results of the one-week forecast statistical numerical analysis are presented in Table 9. In Figure 10, there is an obvious prediction difference among the three models on 2 and 3 July 2021. The forecast results of ANN are worse than those of LSTM and GRU. This means that when there is a big change in weather conditions, the difference between the forecast and actual power generation trends will become more pronounced.
When comparing the daily forecast numerical analysis results of the LSTM and GRU models for one week, the results of the LSTM model are better than those of the GRU model, and the average forecast results are relatively stable. The average results of LSTM for a week have a MAPE of 16%, MSE of 0.71, and RMSE of 0.83. On the other hand, the GRU model is better than LSTM on sunny days. On 27 June 2021, the MAPE, MAE, and RMSE of GRU are 8%, 0.263, and 0.513, and those of LSTM are 11%, 0.531, and 0.792, respectively. In addition, the forecast results of GRU on 2 July are also better than those of LSTM. However, the advantages of the LSTM model are highlighted when the weather pattern is poor. In the forecasting results on 1 July, the MAPE of LSTM is about 8% better than that of GRU, and its MAE and RMSE are better by about 0.3-0.4. Therefore, when selecting the prediction results for the next day, the weather forecast can be used as a reference. If the weather forecast for the next day is sunny, the results of the GRU model are more accurate. If it is cloudy, the results of LSTM are a better choice.
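The selection rule described above (GRU on sunny days, LSTM otherwise) could be sketched as follows. This is only an illustration: the function names, the weather labels, and the dictionary of per-model forecasts are hypothetical, and the choice to fall back to LSTM for any condition other than "sunny" is an assumption based on LSTM having the more stable weekly average.

```python
def select_model(weather_forecast):
    """Pick the model whose forecast to use for the next day.

    Assumption: the weather forecast is a simple text label; only "sunny"
    selects GRU, and every other label (e.g., "cloudy") selects LSTM.
    """
    if weather_forecast.strip().lower() == "sunny":
        return "GRU"
    return "LSTM"


def select_forecast(weather_forecast, forecasts):
    """Return the hourly forecast of the selected model.

    `forecasts` is a hypothetical mapping from model name to a list of
    predicted hourly PV power values.
    """
    return forecasts[select_model(weather_forecast)]


if __name__ == "__main__":
    # Made-up forecasts purely for illustration (not data from the study).
    forecasts = {"GRU": [0.0, 1.2, 3.4], "LSTM": [0.0, 1.0, 3.1]}
    print(select_model("sunny"), select_forecast("sunny", forecasts))
    print(select_model("cloudy"), select_forecast("cloudy", forecasts))
```

In practice the weather label would come from the same day-ahead forecast source that supplies the model inputs, so no extra data would be needed to apply the rule.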
According to the aforementioned results, it is found that forecasting via the LSTM model combined with the hybrid weather data yields the best prediction results. Therefore, the LSTM model was used to predict the solar photovoltaic power for June 2021, and the results are shown in Figure 11. The results show that MAPE, MAE, and RMSE are 16.984%, 1.764, and 1.283, respectively. It is shown that the combination of CWB data and LWS data together with the deep learning model is helpful for improving the accuracy of the one-day-ahead solar photovoltaic forecast. Furthermore, a comparison with other one-day-ahead solar photovoltaic forecast methods from the literature is shown in Table 10. It can be found that the application of complex deep learning algorithms combined with the weather data and data pre-processing method proposed in this study has a favorable effect on one-day-ahead solar photovoltaic forecasting.

Conclusions
The acquisition of weather feature data greatly influences the one-day-ahead solar photovoltaic prediction results. The previous literature focused on a single data source, compared different weather features, or used historical time-series data directly for one-day-ahead PV power forecasting. The objective of this study was to find the appropriate weather features and weather data group to pair with a deep learning algorithm for one-day-ahead PV power forecasting. In this work, ANN, LSTM, and GRU were combined with different weather data (LWS, CWB, and hybrid data) for one-day-ahead photovoltaic power generation forecasting, and the forecasting results were compared. The area of study covers a rooftop solar photovoltaic system in Yangmei District, Taiwan. The data pre-processing method proposed in this article can prevent the failure of forecasting results and solve the data unavailability problem. Hybrid weather data show improvements in accuracy of up to 5-8% compared to the CWB and LWS data. Considering all of the weather patterns, the LSTM model is more accurate for one-day-ahead PV power forecasting. MAPE, MAE, and RMSE are 16%, 0.71, and 0.83, respectively. The results indicate that when the amount of data is limited, using hybrid data and the five weather features is helpful for training the model. Accordingly, the proposed model shows better one-day-ahead PV forecasting.
The pre-processing data framework may remove a large number of historical data points, which may lead to insufficient data and data sample discontinuity. This causes a limitation in model training in this study. Nevertheless, the deep learning method has good generalization ability (the adaptability of machine learning algorithms to fresh samples). Therefore, keeping the valuable data will have the best effect on long-term predictions. This article shows that the validation data over the one-month study is stable. The MAE, MAPE,