Short-Term Solar Irradiance Forecasting Based on a Hybrid Deep Learning Methodology

Abstract: Accurate prediction of solar irradiance is beneficial in reducing energy waste associated with photovoltaic power plants, preventing system damage caused by the severe fluctuation of solar irradiance, and stabilizing the power output integration between different power grids. Considering the randomness and multiple dimensions of weather data, a hybrid deep learning model that combines a gated recurrent unit (GRU) neural network and an attention mechanism is proposed for forecasting the solar irradiance changes in four different seasons. In the first step, the Inception neural network and ResNet are used to extract features from the original dataset. Secondly, the extracted features are input into the recurrent neural network (RNN) for model training. Experimental results show that the proposed hybrid deep learning model accurately predicts short-term solar irradiance changes. In addition, the forecasting performance of the model is better than that of traditional deep learning models (such as long short-term memory and GRU).


Introduction
As the world's overall energy consumption increases, photovoltaic energy generation is becoming increasingly important. Non-renewable energy resources (coal, oil, natural gas, etc.) have the disadvantages of limited reserves, heavy pollution, and landscape changes [1]. Environmental protection is therefore a highly relevant factor in the process of replacing traditional energy resources with renewable ones. As a result, clean and non-polluting renewable energy resources (solar, wind, geothermal, etc.) have been attracting the attention of both scientists and engineers [2]. Moreover, for countries with a large land area, such as China, the use of solar energy as a replacement for traditional oil-based energy resources is an urgent priority. The solar radiation across the whole country is 3340-8400 MJ/m². An efficient and effective way of utilizing solar irradiance is in high demand in such countries [3,4].
Solar irradiance forecasting has been widely studied in related fields [5]. Accurate forecasting of solar irradiance provides important evidence for predicting photovoltaic energy generation at the same location, since the energy generated by solar panels is directly proportional to solar irradiance. However, solar irradiance prediction is a complex and difficult task because irradiance is highly correlated with various environmental factors, such as the sun position, temperature, wind speed, and cloud movement.
Machine learning and deep learning technologies are popular data-driven approaches to time series forecasting [6,7], including artificial neural networks [8], grey theory methods [9], and support vector regression [10]. There are two main types of research on hybrid learning forecasting methods: (1) serialized ensemble learning methods, which combine two or more prediction models, such as a clustering method with a least squares support vector machine algorithm [10] or a wavelet transform with a neural network [11]; and (2) parallel ensemble learning methods, which combine the prediction results obtained by different prediction methods, where each method is given a prediction weight based on certain optimization criteria to form a final prediction result. Many different machine learning methods have previously been proposed to predict the power output of PV systems, such as the BP Neural Network (BPNN) [11], Radial Basis Function Neural Network (RBFNN) [12], generative adversarial network (GAN) [13], semi-supervised algorithm [14], fuzzy inference method [15], and mathematical analysis methods.
In this study, a hybrid deep learning framework is proposed to forecast the solar irradiance changes in the Nevada desert in the USA. The solar irradiance is directly proportional to the energy output of photovoltaic plants in the Nevada desert. Multi-dimensional data inputs for the deep learning neural network were adopted. Additional input features include the average wind speed and peak wind speed, which make the prediction results more accurate and reliable. The experimental results also demonstrate that predictions that consider multiple factors (such as the wind speed) are significantly better than predictions based on a single factor. This research improves the prediction accuracy of solar irradiance, providing a basis for the energy generation forecasting of photovoltaic systems.

Introduction to the Comparison Model
In this paper, a series of models are used to predict the short-term solar irradiance of the same dataset. The models used are briefly described as follows.
The long short-term memory (LSTM) model [16]: the LSTM model uses two LSTM layers to extract important features of the input power generation time series and data sets, respectively. It then flattens the extracted feature vectors into one-dimensional vectors, fuses them, and outputs the prediction results through a fully connected layer.
The gated recurrent unit (GRU) model: compared with the LSTM, the GRU has a simpler structure with only two gates, a reset gate and an update gate. Its biggest advantages are that it requires fewer parameters and trains faster.

Inception Network
The usual way to improve network capability is to increase the depth and width of the network, but this also introduces side effects such as overfitting and vanishing gradients. The Inception Network addresses this problem: the depth and width of the network are increased while the amount of computation stays roughly unchanged.
The Inception Network [17] has gone through multiple versions from InceptionV1 (also known as "GoogLeNet") to InceptionV3. In InceptionV2, each 5 × 5 convolution was replaced by two stacked 3 × 3 convolutions, and Batch Normalization was introduced. In InceptionV3, the main change was the factorization of convolution kernels: a larger kernel is decomposed into smaller symmetric kernels or into asymmetric kernels (for example, an n × n convolution into a 1 × n convolution followed by an n × 1 convolution). The network structure of InceptionV3 is shown in Figure 1.
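The saving from kernel factorization can be checked with simple arithmetic. The sketch below counts convolution weights only (biases ignored); the channel count of 64 and the helper name `conv_weights` are illustrative assumptions, not values taken from this paper.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolution layer (biases ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels


channels = 64  # hypothetical channel count, identical for input and output

# One 5x5 convolution versus two stacked 3x3 convolutions (same receptive field).
single_5x5 = conv_weights(5, 5, channels, channels)
two_3x3 = 2 * conv_weights(3, 3, channels, channels)

# One 3x3 convolution versus an asymmetric 1x3 followed by 3x1 pair.
single_3x3 = conv_weights(3, 3, channels, channels)
asym_3x3 = conv_weights(1, 3, channels, channels) + conv_weights(3, 1, channels, channels)

print(f"5x5: {single_5x5} weights, two 3x3: {two_3x3} weights")
print(f"3x3: {single_3x3} weights, 1x3 + 3x1: {asym_3x3} weights")
```

Under these assumptions, the two stacked 3 × 3 layers use 18/25 of the weights of the 5 × 5 layer (28% fewer), and the asymmetric pair uses 6/9 of the weights of the 3 × 3 layer (about 33% fewer).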

ResNet Network
Once network depth can be increased, more complex features can in theory be extracted to achieve better results. In practice, however, a degradation problem occurs: as the network depth increases, the accuracy first rises, then saturates, and may even decline, which means that deeper models become harder to train. This problem can be addressed with the ResNet network structure [18].
The residual structure of ResNet lets a stack of layers learn the residual between the input and the desired output, instead of learning the input-to-output mapping directly as in a traditional network. In a residual unit, the output of the stacked convolutional layers is added element-wise to the unit's input (which requires the convolutional output and the input to have the same dimensions), and a ReLU activation is then applied. Cascading such units yields the residual network, ResNet. Experiments show that the residual structure effectively accelerates training convergence and mitigates the degradation problem. The residual structure is shown in Figure 2. This experiment combines the Inception and residual structures into a new Inception_ResNet structure, shown in Figure 3.
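The residual unit can be illustrated with a minimal NumPy sketch. For brevity it uses two dense layers in place of the convolutional layers, which is a simplification and not the structure of Figure 2; the function names and array sizes are assumptions chosen for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_unit(x, w1, w2):
    """Compute y = ReLU(F(x) + x), with F made of two stacked linear layers."""
    f = relu(x @ w1) @ w2  # residual branch F(x); output size equals input size
    return relu(f + x)     # identity shortcut added element-wise, then ReLU


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))               # 4 samples, 8 features (hypothetical)
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = np.zeros((8, 8))                     # a residual branch that outputs zero

y = residual_unit(x, w1, w2)
print(np.allclose(y, relu(x)))
```

When the residual branch outputs zero, the unit reduces to the shortcut path alone, which is one intuition for why adding such units does not have to make a deeper network harder to fit.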

GRU: Gated Recurrent Neural Network
The GRU (Gated Recurrent Unit) [19] is a variant of the recurrent neural network (RNN). Like the LSTM (Long Short-Term Memory), it was designed to address the difficulty that plain RNNs have with long-term dependencies and with gradients during backpropagation. GRU and LSTM perform almost equally well, but this paper chose GRU mainly because it is easier to train than LSTM, which improves training efficiency substantially. In addition, its model structure is simpler.
Unlike the LSTM, the GRU replaces the forget gate and the input gate with a single update gate, which controls how much of the state information from the previous moment is carried into the current state. The larger the value of the update gate, the more state information is brought in from the previous time step. In addition, a reset gate controls how much information from the previous state is written into the current candidate state: the smaller the reset gate, the less information about the previous state is written. The GRU also merges the cell state and the hidden state to simplify the model. Its forward calculation process can be expressed as:

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \tag{1}$$

$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \tag{2}$$

$$\tilde{h}_t = \tanh\left(W \cdot [r_t * h_{t-1}, x_t]\right) \tag{3}$$

$$h_t = z_t * h_{t-1} + (1 - z_t) * \tilde{h}_t \tag{4}$$

where $[\ ]$ indicates that two vectors are concatenated, $\cdot$ denotes matrix multiplication, and $*$ denotes element-wise multiplication. $z_t$ and $r_t$ represent the outputs of the update gate and the reset gate, respectively, while $W_z$, $W_r$, $W$, and $W_o$ represent the corresponding weight matrices (including the offset terms). $\sigma$ and $\tanh$ represent the sigmoid and hyperbolic tangent activation functions, and $x_t$ is the network input at time $t$. $h_t$ and $h_{t-1}$ represent the hidden layer information at the current time and the previous time, and $\tilde{h}_t$ represents the candidate state. First, the network input $x_t$ at time $t$ and the hidden state $h_{t-1}$ from the previous moment are used to calculate the outputs of the update gate and the reset gate by Formulae (1) and (2). Then, after the reset gate $r_t$ determines how much of the previous memory is retained, the candidate hidden state $\tilde{h}_t$, that is, the new information at the current time $t$, is calculated by Formula (3). Afterwards, in Formula (4), the update gate $z_t$ determines how much information from the previous moment is kept and how much of the candidate hidden state $\tilde{h}_t$ is admitted; their weighted combination gives the hidden layer information $h_t$.
Finally, the output of the GRU is passed to the next GRU unit by using Formula (5):

$$y_t = \sigma\left(W_o \cdot h_t\right) \tag{5}$$

where $\sigma$ is the sigmoid function. The internal structure of the GRU unit is shown in Figure 4.
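The forward calculation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the trained model: it assumes the convention stated in the text, in which a larger update gate keeps more of the previous state, folds the offset terms into the weight matrices, and omits the output step of Formula (5). The sizes and random weights are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, W_z, W_r, W):
    """One GRU forward step; each weight matrix acts on a concatenated vector."""
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ hx)                                    # update gate
    r_t = sigmoid(W_r @ hx)                                    # reset gate
    h_cand = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))  # candidate state
    # Larger z_t keeps more of the previous hidden state (assumed convention).
    return z_t * h_prev + (1.0 - z_t) * h_cand


rng = np.random.default_rng(0)
hidden, inputs = 3, 2  # hypothetical sizes
W_z, W_r, W = (rng.normal(scale=0.5, size=(hidden, hidden + inputs)) for _ in range(3))

h = np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):  # a toy sequence of 5 time steps
    h = gru_step(x_t, h, W_z, W_r, W)
print(h)
```

Because the new hidden state is a gate-weighted blend of the previous state and a tanh output, every component of `h` stays within [-1, 1] when the initial state is zero.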

The Attention Mechanism
The attention mechanism [5] was inspired by the human visual attention mechanism. In short, it is a series of attention distribution coefficients, i.e., a series of weight parameters. The attention mechanism extracts important information without noticeably increasing the computation time, making the model's learning process more flexible. In this paper, the attention mechanism is added after the GRU, whose output vectors are used as the input of attention. The attention weights are learned by the model itself, which improves prediction accuracy. In the calculation (Equations (6)-(8)), attention computes the attention weight β_t of each hidden state h_t, where W_h and b_h are the weight coefficients and biases. The attention vector A is obtained by multiplying the attention weights β_t with the corresponding hidden states h_1, h_2, ..., h_t, as given by Equation (8).
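As an illustration of how attention weights can be computed over GRU hidden states, the sketch below uses a common formulation: a tanh score per time step, a softmax over time to obtain the weights, and a weighted sum of the hidden states. This specific form is an assumption and may differ from the paper's Equations (6)-(8); the names `attention_pool`, `w_h`, and `b_h` and all sizes are illustrative.

```python
import numpy as np


def attention_pool(H, w_h, b_h):
    """H holds T hidden states of size d, shape (T, d).

    Returns the attention weights (shape (T,)) and the weighted sum (shape (d,)).
    """
    scores = np.tanh(H @ w_h + b_h)    # one scalar score per time step (assumed form)
    scores = scores - scores.max()     # shift for numerical stability
    beta = np.exp(scores) / np.exp(scores).sum()  # softmax over time steps
    context = beta @ H                 # weighted sum of hidden states
    return beta, context


rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))            # 6 time steps, hidden size 4 (hypothetical)
w_h = rng.normal(size=4)
b_h = 0.0

beta, context = attention_pool(H, w_h, b_h)
print(beta.sum(), context.shape)
```

The weights are non-negative and sum to one, so the attention vector is a convex combination of the hidden states; time steps with higher scores contribute more.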

Model Structure
The GRU_attention solar photovoltaic power generation prediction model is composed of a convolutional neural network (CNN) for feature extraction and a recurrent neural network (RNN) for regression prediction. Inception_ResNet is the combination of Inception and ResNet, which not only increases the width and depth of the network, but also effectively improves its expressiveness. Specifically, in Inception_ResNet, the residual structure greatly accelerates the convergence of the Inception network and allows more Inception structures and greater network depth, which ultimately gives the model higher accuracy. The basic flow of the GRU_attention model is shown in Figure 5. The overall structure of the GRU_attention model is shown in Figure 6.
After the multi-dimensional photovoltaic data is input into the model, the data processing module first stitches the data into the form of n × n; the main role of this module is to transform the data into a specific dimension. The data is then input to the Inception_ResNet network for feature extraction, and the extracted features are input to the GRU network for training and prediction. The CNN part uses two Inception structures, namely, the Inception_ResNet and InceptionV3 networks, to achieve feature extraction at different scales. The subsequent GRU part uses a two-layer structure to make predictions. Afterwards, the attention mechanism processes the output of the GRU hidden layer, where each element has a different attention weight, and the result passes through a Dropout layer that randomly discards hidden neurons. The Dropout operation can effectively improve the training convergence speed of the model and can prevent overfitting to a certain extent. Finally, a fully connected layer outputs the prediction results. The entire process is an end-to-end learning model.
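The text does not specify how the data processing module stitches the inputs into the n × n form. The sketch below shows one hypothetical way to do it: flatten a window of time steps and features, then zero-pad to the smallest square that holds all values. The function name `stitch_to_square`, the window size, and the zero padding are all assumptions for illustration.

```python
import math

import numpy as np


def stitch_to_square(window):
    """Flatten a (time_steps, features) window and zero-pad it into an n x n grid."""
    flat = np.asarray(window, dtype=float).ravel()
    if flat.size == 0:
        raise ValueError("window must contain at least one value")
    n = math.isqrt(flat.size - 1) + 1  # smallest n with n * n >= flat.size
    grid = np.zeros(n * n)
    grid[:flat.size] = flat            # remaining cells stay zero (padding)
    return grid.reshape(n, n)


# Hypothetical window: 6 time steps x 3 features
# (irradiance, average wind speed, peak wind speed).
window = np.arange(18, dtype=float).reshape(6, 3)
grid = stitch_to_square(window)
print(grid.shape)
```

A square grid of this kind can be passed to a 2D convolutional feature extractor with a single input channel; other layouts, such as one row per time step, would serve the same purpose.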

Data Preprocessing
The data used in this study was collected by the University of Nevada-Las Vegas and NREL to support solar power generation research in the United States [20]. Local solar irradiance data for the whole year 2014 was used. The measurement station is located at latitude 36.107° north and longitude 115.1425° west. The original data contains missing values, so it needs to be pre-processed: each missing value is filled with the value from the previous moment. Additional features, such as average wind speed and peak wind speed, were used to enhance the prediction performance [21,22]. For each season, one month of data was selected as the testing dataset and the remaining two months were treated as the training dataset. As a result, there are eight months of solar irradiance data for training and four months of data for testing. Finally, lagged values were used as the inputs for prediction, as shown in Equation (9). The Adam optimizer was selected with a learning rate of 0.002 to optimize the neural network weights of the entire GRU_attention model by minimizing the loss function.
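The two preprocessing steps described above, filling a missing value with the previous observation and building inputs from lagged values, can be sketched in plain Python. The number of lags, the forecast horizon, and the sample values are hypothetical; the paper's Equation (9) may define the lag structure differently.

```python
def forward_fill(values):
    """Replace each missing value (None) with the most recent observed value.

    A missing value at the very start has no predecessor and stays None.
    """
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled


def make_lagged_samples(series, n_lags, horizon=1):
    """Pair each window of n_lags past values with the value `horizon` steps ahead."""
    samples = []
    for t in range(n_lags, len(series) - horizon + 1):
        samples.append((series[t - n_lags:t], series[t + horizon - 1]))
    return samples


# Hypothetical irradiance readings in W/m^2 with two gaps.
irradiance = [500.0, None, 520.0, 515.0, None, 530.0]
filled = forward_fill(irradiance)
samples = make_lagged_samples(filled, n_lags=3)

print(filled)
for inputs, target in samples:
    print(inputs, "->", target)
```

Each sample pairs the three most recent readings with the next one; increasing `horizon` would correspond to forecasting further ahead from the same lagged inputs.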

Evaluation Criteria
Compared with the recurrent architecture, the calculation of the convolution model can be fully parallelized, which achieves high training efficiency and at the same time makes optimization easier. MAE, MAPE, and RMSE were used to evaluate the performance of the prediction model. The evaluation criteria MAE, RMSE, and MAPE are calculated by Formulae (10)-(12):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{10}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{11}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \tag{12}$$

where $\hat{y}_i$ is the result of the model prediction, $y_i$ is the actual test sample value, and $n$ is the total number of test samples. The smaller the values of MAE, RMSE, and MAPE, the better the prediction performance of the model.
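The three evaluation criteria can be computed directly from paired actual and predicted values. The sketch below uses their standard definitions, with MAPE expressed in percent; the sample values are hypothetical. MAPE is undefined when an actual value is zero, so zero-irradiance samples (for example, at night) would need to be excluded before it is applied.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted irradiance values in W/m^2.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

RMSE penalizes large errors more strongly than MAE, while MAPE is scale-free, which is why the three are usually reported together.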

Results
This experiment divides one year of data into four parts: spring, summer, autumn, and winter. The irradiance pattern changes greatly across the four seasons, which is the main difficulty of the experiment. To study performance over different time ranges, experiments were conducted at 5, 10, 20, and 30 min time intervals in each season to predict solar irradiance with three models: LSTM, GRU, and the GRU_attention model proposed in this paper. Table 1 shows the MAE, RMSE, and MAPE of the three models in the four seasons. The experimental results are shown in Figures 7-10. Prediction accuracy was generally highest at the 5 min interval for all three methods and gradually decreased at the 10, 20, and 30 min intervals. In all cases, the proposed hybrid deep learning model maintained better prediction performance than the other two methods, and its advantage became more obvious as the time interval expanded.
In summary, the prediction results show that the difference between the three models is small in the seasons with fewer fluctuations, e.g., summer and winter. Nevertheless, the proposed GRU_attention model performs better than all compared methods and adapts to the diversity of solar irradiance changes across seasons. In contrast, the direct applications of LSTM and GRU do not fit the actual solar irradiance change curve well. The results indicate that the proposed hybrid deep learning model is more stable and improves the accuracy of solar irradiance prediction.

Conclusions and Future Works
This study proposes a hybrid deep learning solar irradiance prediction model, namely, GRU_attention, based on the Keras framework, which has the advantages of strong generalization ability, fast modeling, high portability, and high prediction accuracy. This study introduces the principle and structure of the network and the overall framework, and describes the specific structure of the entire end-to-end model. In particular, the GRU_attention model uses a convolutional neural network to automatically extract data features, which overcomes the high dimensionality of the original data, improves model learning performance, and provides accurate forecasting results for solar irradiance in the Nevada desert in the USA. Deep learning has excellent forecasting capabilities, especially for large and medium-sized data sets. Combining different deep learning techniques effectively improves the accuracy of power generation forecasting results by increasing their reliability and stability. In the future, we will further expand the time interval to make medium-term to long-term solar irradiance predictions. Future research directions also include exploring the PV system damage caused by severe fluctuations of solar irradiance.