Solar Power Prediction Using Dual Stream CNN-LSTM Architecture

The integration of solar energy into power systems brings great economic and environmental benefits. However, high penetration of solar power challenges the operation and planning of existing power systems owing to the intermittency and randomness of solar power generation. Accurate prediction of power generation is therefore important for providing high-quality electric energy to end-users. In this paper, we introduce DSCLANet, a deep learning-based dual-stream network that combines a convolutional neural network (CNN) and a long short-term memory (LSTM) network, followed by a self-attention mechanism. Here, the CNN learns spatial patterns and the LSTM extracts temporal features. The output spatial and temporal feature vectors are then fused and passed through a self-attention mechanism that selects optimal features for further processing. Finally, fully connected layers produce the short-term solar power prediction. The performance of DSCLANet is evaluated on the DKASC Alice Springs solar datasets, and it reduces the error rate up to 0.0136 MSE, 0.0304 MAE, and 0.0458 RMSE compared to recent state-of-the-art methods.


Introduction
Sustainable development and global climate change are the two main issues driving interest in solar energy generation [1]. Global energy consumption increases by 2% each year, and total energy production still relies heavily on fossil fuels, such as natural gas, coal, and oil, which considerably increases anthropogenic greenhouse gas (GHG) emissions [2,3]. Furthermore, power generation from fossil fuels creates environmental risks and energy crises, such as the depletion of energy resources and an increase in environmental pollution, which is considered a major threat to lives [4][5][6]. These drawbacks force governments to explore renewable energy resources [6,7]. Solar power is considered an alternative to fossil fuels because it is clean, green, and naturally replenished. However, solar power generation, in either islanded or grid-connected operation, introduces uncertainty that causes problems for the stability of power systems, particularly when solar power is integrated into a large microgrid system [8,9]. Reliable solar power prediction is an effective way to decrease this uncertainty, which is important for the planning, management, and operation of energy systems [10]. Researchers have therefore investigated several techniques for solar power prediction. These techniques are broadly categorized into statistical (ST), artificial intelligence (AI), and hybrid methods (HM) [11]. Among ST-based methods, several algorithms have been developed, including auto-regressive [12], Bayesian [13], Kalman [14], grey [15,16], and Markov chain models [17]. Additionally, MaatAllah et al. [18] and Reikard et al. [19] developed ST-based models for renewable power prediction.
However, statistical models assume linear relationships in the data and are unable to learn complex patterns; therefore, ST-based methods are not recommended for problems requiring nonlinear predictions, such as those associated with solar power.
Due to their potential for extracting representative features and data mining, AI-based models have proven to be more successful than physical and statistical ones [20]. The AI-based methods developed in the literature for solar power generation include neural networks [21], SVR [22], the adaptive fuzzy approach [23], and ELM [24]. Unlike ST-based approaches, most of these AI-based approaches can model nonlinear relationships between input and output. Additionally, special AI-based models for power generation prediction, such as those based on CNNs and generative adversarial networks, were developed in [25], where it became evident that weather classification plays a significant role in building an accurate model. Furthermore, a number of AI-based approaches, including RNN [26], LSTM [27], CNN [28], and GRU [29], have been developed for solar power generation; details are given in a recent survey [30]. This survey also concluded that hybrid models are effective for solar power prediction, given their balance of parameter stability and accuracy and their respective pros and cons. These AI-based methods are built on shallow architectures, which require handcrafted feature engineering and have limited generalization capabilities [31]. Among AI-based methods, CNNs and RNNs achieve better performance; however, CNNs extract features in the spatial dimension [32][33][34], whereas RNNs learn in the temporal dimension, and solar power generation data include both types of features. Therefore, an approach capable of both spatial and temporal feature extraction is required for accurate solar power prediction. In light of the current literature, hybrid models achieve state-of-the-art accuracy for solar power prediction [38].
These models include CNN-RNN [42], CNN-GRU [43], CNN-LSTM [44], CNN-LSTM with autoencoder [45], convolutional LSTM (CLSTM) [46], CNN-GRU with preprocessing [45,47], and LSTM-CNN [48]. Some recent hybrid models for renewable power generation prediction are summarized in Table 1. Hybrid methods achieve improved prediction performance compared to other predictive modeling techniques. However, the current literature builds hybrid models for solar power prediction by stacking layers, and because historical solar power data have a limited number of features, it is difficult to learn both spatial and temporal features with stacked layers. Furthermore, prediction accuracy needs to be improved for reliable solar energy prediction. Therefore, in this work, we developed DSCLANet for solar power prediction, which learns spatial and temporal features in parallel from actual solar power and weather data. The first stream of the proposed network uses a CNN for spatial feature extraction, while the second stream uses an LSTM for temporal feature extraction. Finally, the outputs of these streams are concatenated, refined by a self-attention mechanism, and passed to fully connected layers for solar energy prediction. The performance of the proposed model is evaluated on benchmark datasets, where it substantially decreases the error rates compared to state-of-the-art models. The following are the main contributions of this work:

• To select the most suitable model for solar power prediction, an ablation study is conducted to evaluate the performance of several techniques, including CNN, LSTM, GRU, CNNLSTM, CNNGRU, and DSCLANet.

• Our findings from this ablation study indicate that DSCLANet gives the best prediction accuracy, which has been confirmed experimentally by various comparisons. DSCLANet processes the input via separate streams for spatial and temporal features, which are then fused and passed to the attention mechanism for feature refinement. The refined features are then forwarded to fully connected layers for the final solar power prediction.

• A number of benchmark datasets are utilized to assess the DSCLANet performance, and the results indicate a marginal reduction in error rates compared to other state-of-the-art methods.

The remainder of this article is organized as follows. Section 2 describes the internal architecture of DSCLANet, and Section 3 defines the datasets, evaluation metrics, and performance comparison of DSCLANet with the ablation study and baseline methods. Finally, Section 4 concludes the article with possible future directions.

Materials and Methods
The main framework of DSCLANet is shown in Figure 1, where the input data are processed in parallel by the CNN and LSTM architectures to extract spatiotemporal information. The outputs of these two architectures are then fused and fed to the attention stage for feature refinement and, finally, to the fully connected layers for prediction. The internal architecture of the proposed model is further described in the following subsections.

CNN-LSTM
The dual CNN-LSTM architecture integrates CNN and LSTM for solar energy prediction. The proposed model can retain irregular, complex trends and extract complex features from historical solar power generation data. The first stream extracts spatial features from the input data via a CNN, while the second stream extracts temporal features using an LSTM. The CNN is a well-known deep learning architecture consisting of four types of layers, namely convolutional, pooling, fully connected, and regression layers [49]. The convolutional layers include multiple convolution filters, which perform convolution operations between the filter weights and the connected regions of the input volume to generate feature maps [50,51]. The LSTM architecture is responsible for storing temporal information about important characteristics of solar power data. It maintains long-term memory by incorporating memory units that can update the previous hidden state [52], which makes it easier to capture temporal relationships in a long sequence. Because the two streams run in parallel, the gate units of the LSTM stream receive the input data directly. The LSTM network addresses the vanishing and exploding gradient problems that can occur when training basic RNNs. The state of each memory cell is determined by a mechanism of three gate units: the input, output, and forget gates. The mathematical formulation of an LSTM from input to output generation is given in Equations (1)-(6),
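For reference, the standard LSTM formulation can be written with the symbols used in this section as shown below. This is the textbook form and is only an assumption about what Equations (1)-(6) contain. The gate symbols $i_t$, $f_t$, and $o_t$, the concatenation $[h_{t-1}, x_t]$, the element-wise product $\odot$, and the ordering of the equations are notation introduced here for illustration.

```latex
\begin{align}
i_t &= \Phi\left(\hat{W}_i\,[h_{t-1}, x_t] + B_i\right) \tag{1}\\
f_t &= \Phi\left(\hat{W}_f\,[h_{t-1}, x_t] + B_f\right) \tag{2}\\
o_t &= \Phi\left(\hat{W}_o\,[h_{t-1}, x_t] + B_o\right) \tag{3}\\
\tilde{C}_t &= \tanh\left(\hat{W}_C\,[h_{t-1}, x_t] + B_c\right) \tag{4}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}\\
h_t &= o_t \odot \tanh\left(C_t\right) \tag{6}
\end{align}
```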
where $x_t$ is the input, $h_t$ is the hidden layer output, $\Phi$ is the sigmoid function, $C_t$ is the cell state, and $\tilde{C}_t$ is its candidate state; $\hat{W}_i$, $\hat{W}_o$, $\hat{W}_f$, and $\hat{W}_C$ are the input gate, output gate, forget gate, and memory cell weights, respectively, while $B_i$, $B_o$, $B_f$, and $B_c$ are the bias terms for the input gate, output gate, forget gate, and cell, respectively. Finally, the outputs of the CNN and LSTM streams are fused with a concatenation layer and fed to the attention layer for further processing.

Attention Mechanism
The final outputs of the two deep learning streams (CNN and LSTM) are integrated into a single feature vector, which is then fed to the self-attention (SA) mechanism to determine a representative feature vector for the final forecast. Hidden details at different timestamps have a high impact on the final results, but the CNN and LSTM streams alone are unable to forecast accurately from them. To cope with this issue, we integrate the SA architecture, which strengthens dominant details and suppresses trivial ones by adaptively weighting the hidden features. The combined feature vector of the CNN and LSTM streams is used as the input to the SA network before forecasting, and the correlation among hidden features at different timestamps is investigated along every dimension. The score of a hidden feature at the $k$th timestamp and $n$th dimension is calculated according to Equation (7), where $g_{k,n}$ indicates the $n$th dimension of the hidden state at the $k$th timestamp, $w_{k,n}$ is the weight matrix, $f_i$ is a function applied using dense layers, and $n$ and $n_i$ describe the number of timestamps and hidden feature dimensions, respectively. The proposed network also contains dense layers, which are used to forecast PV power over a certain period of time, for instance one hour ahead. The final output of the SA architecture is flattened to a feature vector $Z_i = \{z_1, z_2, z_3, \ldots, z_n\}$, where $i$ represents the output dimensions of the proposed model.
The output of the SA architecture is fed to the fully connected layers as input, where the mathematical form of these layers is given in Equation (8), in which $w^{l-1}_{ji}$ denotes the weight matrix, $x$ describes the activation function, $X^{l-1}_i$ is the input data, and $B^{l-1}_j$ represents the bias term.
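The adaptive weighting idea described in this subsection can be illustrated with a minimal sketch. It does not reproduce Equation (7): the linear per-feature score, the softmax normalization, and the names `softmax`, `attend`, `score_weights`, and `score_bias` are all assumptions made here for illustration, and the scoring parameters would be learned in practice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, score_weights, score_bias):
    """Re-weight a fused feature vector with attention coefficients.

    Assumption: each feature gets a linear score w * g + b; this stands in
    for the paper's scoring function, which is not reproduced here.
    """
    scores = [w * g + b for g, w, b in zip(features, score_weights, score_bias)]
    alphas = softmax(scores)  # attention coefficients, sum to 1
    refined = [a * g for a, g in zip(alphas, features)]  # weighted features
    return alphas, refined


# Toy fused vector standing in for the concatenated CNN and LSTM outputs.
fused = [0.2, 1.5, -0.3, 0.9]
alphas, refined = attend(fused, [1.0] * 4, [0.0] * 4)
```

With unit weights and zero bias the largest feature receives the largest coefficient, which mirrors the goal of strengthening dominant features and suppressing trivial ones.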

DSCLANet Architecture
The architecture of DSCLANet includes CNN, LSTM, attention, and fully connected layers. The optimal DSCLANet architecture is obtained by adjusting various parameters, including the number of CNN filters, the kernel size, and the LSTM cell size. Several experiments are conducted to choose the optimal parameters before finalizing the model. The two streams allow for the parallel extraction of spatiotemporal features from large datasets, which are input to both streams. The CNN stream includes three CNN layers for spatial feature extraction, while the LSTM stream includes two LSTM layers for temporal feature extraction. A concatenation layer is then applied to the output of both streams, followed by a feature attention layer and fully connected layers. The internal architecture of DSCLANet in terms of the number of parameters, filters, and kernels is given in Table 2. In the first stream, the hyper-parameters of CNN layer 1 are as follows: the number of filters is set to 32 with a kernel size of 5, padding is set to same, the stride is set to 1, and ReLU is used as the activation function. In the second CNN layer, the number of filters is set to 64 with a kernel size of 3, while the other hyper-parameters are the same as in CNN layer 1. In the third CNN layer, the number of filters is set to 128 with a kernel size of 1; its other hyper-parameters are also the same as in CNN layer 1. In the second stream, two LSTM layers are used with the same cell size of 100. These streams are then concatenated with a fusion layer, and the output is forwarded to the attention layer. The combined feature vector from both streams includes redundant information, which makes the network computationally expensive, can prevent convergence, and limits performance.
Thus, the attention layer is used to remove the redundant information, enabling the network to focus on important information while ignoring the rest, which leads to fast convergence and considerable performance. The refined feature vector is then passed to the fully connected layers for the final prediction, where three fully connected layers of sizes 64, 32, and 12 are used in DSCLANet.
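The layer configuration described above can be sketched in Keras as follows. This is a hedged illustration, not the authors' implementation. The input shape (`timesteps=24`, `n_features=5`), the use of `Flatten` after the CNN stream, the dense-softmax form of the attention layer, the ReLU activations in the hidden fully connected layers, the linear output, and the Adam optimizer are all assumptions. Only the filter counts, kernel sizes, stride, padding, LSTM cell size, and fully connected layer sizes come from the text.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_dsclanet(timesteps=24, n_features=5):
    """Dual-stream CNN-LSTM sketch; input shape is a hypothetical choice."""
    inputs = keras.Input(shape=(timesteps, n_features))

    # Stream 1: three convolutional layers (32, 64, 128 filters; kernels 5, 3, 1).
    x = layers.Conv1D(32, 5, strides=1, padding="same", activation="relu")(inputs)
    x = layers.Conv1D(64, 3, strides=1, padding="same", activation="relu")(x)
    x = layers.Conv1D(128, 1, strides=1, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)  # assumption: flatten before fusion

    # Stream 2: two LSTM layers with a cell size of 100 each.
    y = layers.LSTM(100, return_sequences=True)(inputs)
    y = layers.LSTM(100)(y)

    # Fusion of both streams by concatenation.
    fused = layers.Concatenate()([x, y])

    # Assumed attention form: softmax weights over the fused features.
    weights = layers.Dense(fused.shape[-1], activation="softmax")(fused)
    refined = layers.Multiply()([fused, weights])

    # Three fully connected layers of sizes 64, 32, and 12.
    z = layers.Dense(64, activation="relu")(refined)
    z = layers.Dense(32, activation="relu")(z)
    outputs = layers.Dense(12)(z)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")  # optimizer is an assumption
    return model


model = build_dsclanet()
```

Calling `model.summary()` lists the per-layer output shapes and parameter counts, which can be checked against Table 2 once the true input shape is known.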

Results
This section presents the evaluation metrics, datasets, and experimental results. The experiments were conducted in the Keras framework with a TensorFlow backend on a GeForce RTX 2070 graphics card.

Evaluation Metrics
The performance of DSCLANet is assessed using standard evaluation metrics: mean absolute error (MAE), mean bias error (MBE), root mean square error (RMSE), and mean square error (MSE). These metrics are commonly used in the literature to evaluate solar power prediction models. The MAE is the average absolute difference between actual and predicted values, and the MBE is the average signed difference between these values. The MSE is the average squared difference between predicted and actual values, while the RMSE is the square root of the MSE. These metrics are defined in Equations (9)-(12), where A represents the actual values and P represents the values predicted by the model.
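The four metrics can be computed as in the short sketch below, which follows the verbal definitions above. The function names are illustrative, and the sign convention for MBE (predicted minus actual) is an assumption, since the text only says "average difference".

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |A - P|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mbe(actual, predicted):
    """Mean bias error: average of (P - A); sign convention is an assumption."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error: average of (A - P)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Toy values for illustration only; these are not results from the paper.
A = [1.0, 2.0, 3.0]
P = [1.5, 2.0, 2.0]
scores = {"MAE": mae(A, P), "MBE": mbe(A, P), "MSE": mse(A, P), "RMSE": rmse(A, P)}
```

Under this convention a negative MBE means the model under-predicts on average, whereas MAE, MSE, and RMSE are always non-negative.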

Datasets
In this work, we utilized the DKASC Alice Springs (DKASC-AS) datasets to evaluate the performance of the proposed and other models. These datasets include historical weather and solar power generation data from installations with different generation capacities installed on different dates. Detailed information on the datasets, such as installation date, number of panels, and type of panel, is available on the DKASC website [53]. All the datasets are split into 70%, 20%, and 10% for training, testing, and validation, respectively. The proposed model and the other ablation study models are evaluated using two hours of historical data as input to predict power generation one hour ahead.
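The split and the input-output windowing described above can be sketched as follows. Several details are assumptions made for illustration: the split is taken chronologically with validation as the final 10%, the sampling interval is taken to be 5 minutes (so two hours is 24 steps and one hour is 12 steps), and the function names are hypothetical.

```python
def chronological_split(series, train_pct=70, test_pct=20):
    """Split a sequence into train, test, and validation parts without shuffling.

    Assumption: the remaining samples (10% by default) form the validation set.
    """
    n = len(series)
    n_train = n * train_pct // 100
    n_test = n * test_pct // 100
    train = series[:n_train]
    test = series[n_train:n_train + n_test]
    validation = series[n_train + n_test:]
    return train, test, validation


def make_windows(series, n_in=24, n_out=12):
    """Build (history, target) pairs: n_in past steps predict the next n_out steps."""
    inputs, targets = [], []
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets


# Toy series standing in for a power generation record.
series = list(range(100))
train, test, validation = chronological_split(series)
X_train, y_train = make_windows(train)
```

Windowing each split separately, as done here for the training part, keeps target values from one split out of the inputs of another.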

Performance Evaluation of Deep Learning-Based Models
To substantiate the robustness of the proposed DSCLANet, we conducted experiments on several deep learning-based models. These models include LSTM, CNN, GRU, CNNGRU, CNNLSTM, and DCNN-BRLSTM. The results attained by each model on every dataset are reported in Table 3.

Comparison with State-of-the-Art
In this section, we compare the performance of DSCLANet with other baselines, namely the wavelet packet decomposition LSTM (WPD-LSTM) [54], RCC-LSTM [55], HIMVO-SVM [56], ESN-CNN [7], CNN-LSTM [57], DenseNet [28], LSTM-CNN [48], ELM [58], graph-network [59], and SolarNet [60] models. The detailed performance of these models is given in Table 4. The DKASC Alice Springs site includes several solar power plants, and researchers have evaluated their models on data from one, two, or three sites. Therefore, in this work, we compare the average performance of DSCLANet over three sites' data, namely Trina 1A, Trina 1B, and Eco 2, with these methods. DSCLANet attains the smallest error rates across all error metrics, as shown in Table 4.

Conclusions
It is important to forecast solar power generation accurately to avoid penalties from customers, build trust in the energy markets, and schedule power generation. In mainstream deep learning and traditional learning methods, features are based on simple phenomena, and only spatial or only temporal features are taken into account to handle the nonlinearities of solar power generation series. Some studies do combine different methods for spatial and temporal feature extraction, but only via a stacked-layer mechanism. Therefore, in this work, we developed a dual-stream CNN-LSTM network for solar power prediction. The performance of DSCLANet is evaluated on real solar power datasets collected from a photovoltaic system located in Alice Springs, Australia. Before selecting the proposed model, extensive experiments are performed over different deep learning-based models. Furthermore, we compared the performance of DSCLANet with other baselines and found that the proposed model outperforms them in terms of error reduction. Alongside higher performance, DSCLANet uses two architectures, namely CNN and LSTM, for spatial and temporal feature extraction, respectively. However, combining multiple methods for spatial and temporal feature extraction increases the model complexity. Therefore, in the near future, we intend to develop a single architecture with the ability to extract both types of features. Furthermore, we also intend to investigate emerging techniques, such as probabilistic forecasting, incremental learning, active learning, and reinforcement learning, for solar power prediction.