PV Power Prediction, Using CNN-LSTM Hybrid Neural Network Model. Case of Study: Temixco-Morelos, México

Due to the intermittent nature of solar energy, accurate photovoltaic power predictions are very important for integrating solar energy into existing energy systems. The evolution of deep learning has opened the possibility of applying neural network models to time series prediction, with excellent results. In this paper, a five-layer CNN-LSTM model is proposed for photovoltaic power prediction using real data from a location in Temixco, Morelos, Mexico. In the proposed hybrid model, the convolutional layers act as a filter, extracting local features of the data; the temporal features are then extracted by the long short-term memory network. Finally, the performance of the five-layer hybrid model is compared with a single model (a single LSTM), a two-layer CNN-LSTM hybrid model and two well-known benchmarks. The results show that the five-layer hybrid neural network model predicts better than the two-layer hybrid model, the single prediction model, the Lasso regression and the Ridge regression.


Introduction
Renewable energy (RE) aims to be a real solution to the fossil fuel problem. One of the best-known renewable sources is solar photovoltaic (PV) energy, which is also growing fast. Global PV capacity is estimated to increase significantly from 593.9 GW in 2019 to 1582.9 GW in 2030, following significant capacity additions by China, India, Germany, the US and Japan [1]. Solar PV generation plays an important role in the future energy structure. According to some estimations of the Joint Research Centre (JRC), solar PV power generation capacity may reach 1.4 TW by 2024 just in Europe [2]. Nevertheless, the intermittent nature of the solar resource poses big challenges for energy integration into existing energy systems. Precise PV prediction is a good way to address this problem [3].
On the other hand, over the past eight years, off-grid systems in the form of stand-alone solar home systems (SHSs) have proved the most popular and immediate solution for increasing energy access in rural areas across the Global South [4]. In Mexico, approximately 1.2 million people live without access to the electrical grid [5]. SHSs can be a real option to empower rural communities with autonomous energy production [6]. In rural places with a good solar resource and no access to the grid, this is a viable solution.
The interconnection of existing stand-alone solar home systems can form micro grids [4]. Such a small grid is formed by prosumers (households capable of producing and consuming electrical power, Figure 1) and consumers (households that only consume power). In this new architecture of distributed PV generation, many authors propose that consumers and prosumers can even trade self-produced energy on a micro grid energy market. This involves an information layer that feeds data to an energy management system (EMS), as shown in Figure 1. The EMS acts as the middleman between the physical layer and the information layer. It needs accurate photovoltaic power (PVP) prediction models, consumption prediction models, an energy market model and other tools that are essential in a smart energy management system for increasing reliability, sustainability, efficiency and flexibility [7-10].
In general, there are three main prediction methods: statistical models, physical models and machine learning models. Physical models rely on the dynamics between solar radiation and the laws of physics [11]. Statistical models mainly depend on historical data, statistics and probability theory to forecast future time series [12,13]. Machine learning models map directly from inputs to outputs and extract complex nonlinear features very efficiently [14,15]. Among the machine learning models we can highlight artificial neural networks (ANN), and especially one type of ANN, the recurrent neural network (RNN), which is one of the most commonly used methods for forecasting time series [16,17]. RNNs have been studied in various applications such as wind speed prediction [18], energy consumption prediction [19,20] and even traffic prediction [21], achieving excellent results.
However, a common problem in RNNs trained with gradient-based learning methods and back-propagation is the vanishing gradient. It occurs when training on long data sequences: the gradient of the loss function approaches zero, making the network hard to train [22]. Long short-term memory (LSTM) networks solve this problem [23,24].
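The vanishing gradient can be illustrated numerically. The sketch below is not taken from the original experiments: it assumes a hypothetical one-unit RNN, h_t = tanh(w * h_{t-1} + x_t), with an arbitrary recurrent weight w = 0.5, and multiplies the per-step derivatives that back-propagation through time chains together.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t).

    The toy network and its weight are illustrative assumptions, not the
    model used in the paper.
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - tanh^2(.)) = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad


inputs = [0.5] * 50  # a constant toy input sequence of 50 time steps
short_grad = abs(gradient_through_time(0.5, inputs[:5]))
long_grad = abs(gradient_through_time(0.5, inputs))
print(f"5 steps: {short_grad:.3e}   50 steps: {long_grad:.3e}")
```

Because every factor has magnitude at most |w| = 0.5 here, the 50-step gradient is bounded by 0.5^50, so the earliest inputs barely influence the weight update.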
PV prediction using LSTM models has been studied by many authors [14,17,22], who reduced the prediction error compared with other traditional methods.
In recent years, many researchers have combined CNN and LSTM models to extract temporal and spatial features. In the medical field, Gill et al. proposed a CNN-LSTM model to accurately detect arrhythmias in the ECG [16]. Zhang C. Y. used deep belief networks (DBN) for wind speed prediction [21], achieving better results than traditional methods (such as SVR). Kim [25] proposed a hybrid CNN-LSTM model for electric energy consumption, achieving better results than other deep learning based methods. They found that extracting the local features first and then the temporal ones worked better than an LSTM-CNN model, with a mean square error (MSE) of 3.738. They also found that time series decomposition with deep learning models provides useful visualizations to better understand the problem of predicting and analyzing energy consumption. Wang et al. proposed a hybrid LSTM-CNN model for PV power prediction [26], with four main contributions:

• They proposed a hybrid photo voltaic power prediction deep learning model which considers the temporal and local features.
• The temporal features of the data were first extracted (using a LSTM model) and then the local features using a CNN model.
• They reduced the complexity of the model by selecting the 1D-CNN model to extract the local features, and then the data conversion link was eliminated. Therefore PV power prediction is greatly facilitated.
• They compared with other models (CNN, LSTM, CNN-LSTM), to prove the validity of the model.
Wang et al. [26] proposed a one-dimensional hybrid model for PV prediction. In this work we choose a hybrid model with a stronger multi-layer architecture, which includes a 5D-CNN model with max pooling and a 5D-LSTM model (throughout this paper, 5D denotes five stacked layers). The five-layer CNN and LSTM model will consume more computational resources for training than a one-dimensional model, but high accuracy will be achieved [27,28]. Therefore, the computational time and metrics such as the mean square error (MSE) of the proposed model will be compared with those of other deep learning models.

The Dataset
We use data from a 1.7 kW photovoltaic solar system (PVSS) and from a meteorological station (the ESOLMET station, Figure 2) located 120 m from the PVSS (Figure 3). Together these sources provide the ten variables that make up our dataset, which are shown in Figure 4. The dataset consists of 52,428 values per variable, from 01/01/2019 00:00:00:00 to 01/01/2020 00:00:00:00, with a resolution of 10 minutes. We used 80% of the data to train the model (41,942 values) and 20% to validate our predictions (10,486 values).
Photovoltaic power generated (PVPG) is the variable we want to predict; Figure 5 shows PVPG over the complete year. As explained in detail in the next section, our model takes the 10 variables as input and the proposed neural network produces a specific output (PVPG) in time.
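The 80/20 partition and the conversion of the series into supervised samples can be sketched as follows. This is a minimal illustration under stated assumptions: the split is taken to be chronological (first 80% for training), and the helper names, the window length and the position of PVPG among the columns are hypothetical, since the text does not specify them.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split a time-ordered sequence without shuffling (assumed here)."""
    n_train = int(len(rows) * train_fraction)
    return rows[:n_train], rows[n_train:]


def make_windows(rows, target_index, window, horizon=1):
    """Pair each block of `window` consecutive rows with the target value
    observed `horizon` steps after the last row of the block."""
    inputs, targets = [], []
    for start in range(len(rows) - window - horizon + 1):
        end = start + window
        inputs.append(rows[start:end])
        targets.append(rows[end + horizon - 1][target_index])
    return inputs, targets


# Sample counts reported for the dataset: 52,428 values per variable.
train_idx, val_idx = chronological_split(list(range(52428)))
print(len(train_idx), len(val_idx))

# Toy series with two columns; column 1 plays the role of PVPG.
toy = [[t, 10 * t] for t in range(6)]
X, y = make_windows(toy, target_index=1, window=3, horizon=1)
print(y)
```

With a 10-minute resolution, `horizon=1` corresponds to a 10-minute-ahead prediction and `horizon=18` to a 180-minute-ahead prediction.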

Local Feature Learning with CNN
CNNs are widely used for extracting local features, for example in images. Convolution is the main concept of a CNN. Our proposed CNN includes two parts: the convolutional layers (Equations (1) and (2)) and the pooling layer, whose main purpose is to reduce the number of parameters of the tensor by reducing its size (Figure 6), which helps to reduce computation time. In the convolutional layer, the feature graphs of the previous layer interact with the convolutional kernel; this interaction forms the output feature graph j of the convolutional layer. Each output feature graph j may combine convolutions with multiple input feature graphs.
The convolution layer is defined by Equations (1) and (2), where $c_j$ is the set of input feature graphs, $b_j^l$ is the bias, $y_j^l$ is the output of the convolution and $w_{t,j}^l$ is the feature graph of the convolution layer $l$.
$f$ is the activation function. In this work we use a rectified linear unit (ReLU), defined in Equation (3).
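The three operations described in this section (convolution, ReLU activation and pooling) can be sketched in a few lines. The example is a simplification and not the authors' implementation: it assumes a single input feature graph, a hand-picked kernel and the common definition ReLU(x) = max(0, x); in the full layer, the convolutions over all input feature graphs in $c_j$ would be summed before the bias is added.

```python
def relu(value):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, value)


def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D convolution (implemented as cross-correlation, as in most
    deep learning libraries) followed by the ReLU activation."""
    k = len(kernel)
    out = []
    for start in range(len(signal) - k + 1):
        window = signal[start:start + k]
        y = sum(w * x for w, x in zip(kernel, window)) + bias
        out.append(relu(y))
    return out


def max_pool1d(signal, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


power = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 1.0]    # toy power samples
features = conv1d(power, kernel=[-1.0, 1.0])    # responds to rising edges
pooled = max_pool1d(features, size=2)
print(features)
print(pooled)
```

Pooling halves the length of the feature sequence here, which is the size reduction that lowers the computation time of the following layers.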

Temporal Feature Learning with LSTM
The lower layers of the proposed model are the LSTM layers. These layers store time information about important characteristics of PV power generation extracted by the CNN. The LSTM preserves long-term memory by using memory units that can update the previous hidden state, which makes it possible to capture temporal relationships in a long-term sequence. This recurrent neural network was proposed by Hochreiter & Schmidhuber in 1997 [30]. Its internal memory unit and gate mechanism overcome the vanishing gradient problem that occurs when training traditional recurrent neural networks (RNN). The memory channel and the gate mechanism (which includes the forget gate, input gate, update gate and output gate) are shown in Figure 7. The equations of the LSTM model [24] use the following quantities:
$f_t$ is the output value of the forget gate, and $\sigma$ refers to the sigmoid activation function.
$i_t$ is the output value of the input gate.
$g_t$ is the output value of the update gate.
$c_t$ refers to the memory cell.
$o_t$ is the output value of the output gate.
$h_t$ is the output vector of the memory cell at time $t$ (see Figure 2). $W_{f,i,g,o}$ are the weight matrices and $b_{f,i,g,o}$ the bias vectors.
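One time step of an LSTM cell can be sketched with the gate names used above. The code follows the widely used textbook formulation, which is an assumption here because the exact equations of the paper are not reproduced in this text: each gate acts on the concatenation of the previous output $h_{t-1}$ and the input $x_t$, the update gate uses tanh, and a single hidden unit is used for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a single-unit LSTM (standard formulation, assumed).

    W[gate] holds the weights applied to [h_prev, x_t[0], x_t[1], ...];
    b[gate] is the bias of that gate. Gates: 'f', 'i', 'g', 'o'.
    """
    z = [h_prev] + list(x_t)

    def pre_activation(gate):
        return sum(w * v for w, v in zip(W[gate], z)) + b[gate]

    f_t = sigmoid(pre_activation("f"))    # forget gate
    i_t = sigmoid(pre_activation("i"))    # input gate
    g_t = math.tanh(pre_activation("g"))  # update gate (candidate values)
    o_t = sigmoid(pre_activation("o"))    # output gate
    c_t = f_t * c_prev + i_t * g_t        # memory cell
    h_t = o_t * math.tanh(c_t)            # output of the cell at time t
    return h_t, c_t


# With all weights and biases at zero, every sigmoid gate outputs 0.5 and the
# update gate outputs 0, so the cell simply keeps half of its previous memory.
W = {gate: [0.0, 0.0, 0.0] for gate in "figo"}
b = {gate: 0.0 for gate in "figo"}
h_t, c_t = lstm_step([1.0, -1.0], h_prev=0.3, c_prev=2.0, W=W, b=b)
print(h_t, c_t)
```

The additive update of `c_t` is what lets information, and gradients, persist over many time steps instead of being repeatedly squashed as in a plain RNN.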

CNN-LSTM Hybrid Model
The proposed CNN-LSTM structure, shown in Figure 8, consists of five CNN layers, five LSTM layers and a fully connected layer. The input of the neural network has ten variables, such as temperature, irradiance, global radiation and others (see the section The Dataset for more information about the variables).
First, the upper layers of the CNN-LSTM consist of the CNN. The CNN layers can receive various variables that affect the PV power generated, such as voltage, intensity and global radiation. The dataset is separated into two parts: 80% for training the model and 20% for validating the results. In the validation part we used PVPG in Temixco during 2019, as shown in Figure 5.
The CNN consists of an input layer that accepts sensor variables as inputs, an output layer that passes the extracted features to the LSTMs, and several hidden layers. Each hidden layer is a convolution layer, a ReLU layer (the activation function) or a pooling layer.
From the data presented at the input, a unique output pattern is generated. The CNN extracts the local features and the LSTM extracts the temporal features. With this structure, the neural network "learns" for every input a weight that determines a specific output. See Table 1.
All the hyper-parameters were tuned by trial and error: certain hyper-parameters are specified, and the model is then trained with 80% of the data and validated with the other 20%, for the four seasons throughout the year. The hyper-parameters are then changed and the process is repeated until they are finally optimized.
In order to demonstrate the good performance of the proposed model, its results are compared with those of other models (LSTM and 2D CNN-LSTM) and two well-known benchmarks [31]. We forecast PVPG for three different scenarios: summer, fall and winter. These results are discussed further in the last sections.
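A Keras sketch of an architecture of this kind is given below. It should be read as an illustration under stated assumptions rather than as the authors' exact network: the window length (144 steps, one day at 10-minute resolution), the number of filters, the kernel size, the number of LSTM units and the optimizer are hypothetical placeholders, because the tuned hyper-parameters are not reproduced in this text. TensorFlow is imported inside the builder so that the small shape helper can be used without it.

```python
def pooled_length(window, n_blocks, pool_size=2):
    """Sequence length left after `n_blocks` max-pooling layers
    (assuming 'same'-padded convolutions, which keep the length)."""
    length = window
    for _ in range(n_blocks):
        length //= pool_size
    return length


def build_cnn_lstm(window=144, n_features=10, n_blocks=5, filters=32, units=32):
    """Five Conv1D + max-pooling blocks, five LSTM layers and one dense layer.

    All numeric defaults are illustrative assumptions.
    """
    from tensorflow import keras
    from tensorflow.keras import layers

    if pooled_length(window, n_blocks) < 1:
        raise ValueError("window is too short for this many pooling layers")

    model = keras.Sequential()
    model.add(keras.Input(shape=(window, n_features)))
    for _ in range(n_blocks):
        # Local feature extraction: convolution + ReLU, then pooling.
        model.add(layers.Conv1D(filters, kernel_size=3, padding="same",
                                activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=2))
    for k in range(n_blocks):
        # All but the last LSTM return the full sequence for the next LSTM.
        model.add(layers.LSTM(units, return_sequences=(k < n_blocks - 1)))
    model.add(layers.Dense(1))  # fully connected output: predicted PVPG
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model


print(pooled_length(144, 5))  # time steps that reach the first LSTM layer
```

Calling `build_cnn_lstm().summary()` in an environment with TensorFlow installed lists the layers. The helper shows why the window cannot be arbitrarily short: each pooling layer halves the sequence, so five of them require at least 32 input steps.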

Performance Evaluation Metrics
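For this work, we selected three metrics to evaluate the model: MSE (Mean Square Error), RMSE (Root Mean Square Error) and MAE (Mean Absolute Error). These metrics are defined as follows [25]:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat{Y}_i\right|$$
where $Y_i$ is the real PVPG value, $\hat{Y}_i$ the predicted value and $n$ the number of $Y_i$. MSE, RMSE and MAE are used as standard statistical metrics to measure model performance. They are easily computable quantities because they are sample-dependent [32,33].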
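The three metrics can be computed directly from paired lists of measured and predicted values. The short sketch below uses only the Python standard library; the function names and the toy numbers are illustrative and are not taken from the experiments.

```python
import math


def mse(y_true, y_pred):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the data."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


measured = [0.0, 2.0, 4.0]    # toy "real" PVPG values
predicted = [1.0, 2.0, 2.0]   # toy predictions
print(mse(measured, predicted), rmse(measured, predicted), mae(measured, predicted))
```

Because the errors are squared, MSE and RMSE penalize the single large miss (2.0) more strongly than MAE does, which is why the metrics are usually reported together.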

Performance Comparison with Other Deep Learning Models
In this work we propose a structure with five CNN layers with max-pooling, five LSTM layers and a fully connected layer, also known as a dense layer. We used the Keras and TensorFlow frameworks in Python 3.6 to program the models. The computations were carried out on a personal computer with a 64-bit operating system, 8 GB of RAM and an Intel Core i7-4700MQ processor (2.4 GHz, 6 MB cache and 4 cores).
One part of the dataset was used to train the model (80%) and the other part was used to verify the predictions (20%). Figures 9-11 show the PV power predictions during summer, fall and winter, respectively. The X axis shows time with a 10-min resolution and the Y axis shows power in [W]. The blue curve is the actual measurement, the yellow curve is the prediction of the five-layer hybrid model, the orange curve corresponds to the two-layer model and the green curve is a five-layer LSTM model. The metrics that evaluate the performance of the different models are shown in Table 2. Table 3 displays the computation time for each model (the prediction time window is 10 min for this experiment). As shown in Figures 10-12, if we compare the single LSTM and the 2D CNN-LSTM models with the five-layer CNN-LSTM model, we can observe that the five-layer model predicts irregularities in PVPG more efficiently; the prediction in Figure 13 is accomplished with a strong 5D CNN-LSTM network.

Performance Comparison with Competitive Benchmarks
We compare the performance of the proposed method with competitive benchmarks using the Temixco ESOLMET dataset [34]. Table 4 summarizes the performance of the competitive benchmarks (the prediction time window is 10 min for this experiment).
Yang et al. describe benchmarking as an essential practice in solar prediction for comparing published results [35]; some authors, such as Pedro et al., have used well-known benchmarks based on linear regression: Lasso and Ridge regression [31]. We select these two models for benchmarking because of their high interpretability and their availability in statistical software and in Python.
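The difference between the two benchmarks can be seen in a one-feature case, where both have closed-form solutions. The sketch below is a simplification chosen for illustration (single feature, no intercept, hypothetical data and penalty values), not the multivariate benchmark setup of [31]. Ridge is taken to minimize Σ(y − w·x)² + λ·w², and Lasso to minimize ½·Σ(y − w·x)² + λ·|w|.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ridge_1d(x, y, lam):
    """Minimizer of sum((y - w*x)^2) + lam * w^2 for a single weight w."""
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / (sxx + lam)


def lasso_1d(x, y, lam):
    """Minimizer of 0.5 * sum((y - w*x)^2) + lam * |w| for a single weight w."""
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return soft_threshold(sxy, lam) / sxx


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]           # exact relation y = 2x
print(ridge_1d(x, y, 0.0))    # no penalty: ordinary least squares
print(ridge_1d(x, y, 14.0))   # ridge shrinks the weight but keeps it nonzero
print(lasso_1d(x, y, 28.0))   # a large lasso penalty sets the weight to zero
```

This is the property that makes both models easy to interpret: Ridge shrinks coefficients smoothly, while Lasso can remove a variable entirely.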
Furthermore, we compare the competitive benchmarks with the proposed model; the results are shown in Table 4. The forecasting horizons range from 10 min ahead to 180 min ahead. As expected, the RMSE increases linearly with the prediction horizon (Figure 13).
It can be seen that the proposed model provides good prediction performance. Moreover, at all prediction horizons in the study cases, the proposed method has the smallest RMSE, which indicates the best prediction performance among the competitive benchmarks. Most of the related work [26,36-38] uses a single-layer approach for PV power prediction. Nevertheless, the results of this work show metrics such as the MSE to be ten times lower for a multi-layer CNN-LSTM deep neural architecture. This comes at the cost of more processing time for training the model, as shown in Table 3. However, this training time serves all of the prediction horizons presented, and it can be reduced if the model is trained on a GPU or TPU.
Also, the results for different prediction horizons (from 10 min to 180 min) show the RMSE of the proposed model to be lower than that of the competitive benchmarks in all cases (see Table 4 and Figure 13).

Conclusions
We use real data from the location 18°50′24.1″ N 99°14′09.0″ W (Temixco, Mexico) to train and validate a 5D CNN-LSTM hybrid model. The dataset was composed of ten variables, such as temperature, solar radiation and others, for predicting PV power generation (PVPG). A five-layer CNN-LSTM hybrid model was compared with a single LSTM model and with a two-layer CNN-LSTM hybrid model. The error of a robust CNN-LSTM network (in this experiment, with five layers) was MSE = 0.00689, as shown in Table 2. Therefore, we can say that there is a considerable correlation between a robust neural network model and a highly accurate prediction.
We propose a CNN-LSTM model for precise prediction of PV power generation. The 5D CNN-LSTM hybrid model accurately predicts PV power generation by extracting features from the variables that affect PVPG. The proposed model is compared with other machine learning methods to demonstrate its usefulness. We used the CNN-LSTM to learn trends in the PVPG and temporal information from multivariate time series.
The CNN-LSTM model proposed in this paper predicts irregularities in PV power generation (PVPG) that could not be learned well by other existing machine learning models. In Figure 13 we see how, on a rainy day, the 5D CNN-LSTM model (in green) closely follows the irregular PVPG trend in cloudy conditions. The results of this paper show that the 5D CNN-LSTM model predicts PVPG with high accuracy and achieves the highest performance compared with the single LSTM model and the 2D CNN-LSTM hybrid model. The results also show the highest performance compared with two existing competitive benchmarks.
Nevertheless, the computational time for the five-layer hybrid model (69.1148 s) is longer than for the two-layer hybrid model (8.0362 s) or the single LSTM model (5.1394 s), that is, about 8.6 and 13.4 times longer, respectively, as shown in Table 3.
The design of a neural network prediction model should take into consideration that high accuracy comes at the cost of computational time.
On the other hand, the introduction of TPUs (Tensor Processing Units) has brought hardware advantages with faster computational processing times [39]. These experiments were conducted on an Intel Core i7 CPU, but robust hybrid neural network models like the one presented in this work should be studied on new-generation hardware such as the Coral TPU accelerator introduced by Google [40], in order to reduce processing time and increase performance.
Also, further studies should be done to implement dynamic prediction systems (generation and consumption in households) with data available from an IoT network of low-cost, low-power devices on open-source platforms, with the aim of creating tools for a smart grid and for efficient energy trading management systems.

Future Work
The proposed model will be integrated into a Raspberry Pi 3 with a Coral TPU accelerator to obtain a real-time data prediction system. The general scheme is shown in Figure 14. The ESP32 sends measurements to an IoT server (Thingsboard). This real-time data is also available to the Raspberry Pi to calculate new weights and update the hybrid model. The Coral TPU will give the Raspberry Pi the capability to run machine learning models. This kind of low-cost system could be used by multiple prosumers in a micro-grid environment. Accurate PV power and electric energy consumption predictions are essential for energy management systems in energy transaction architectures [8,41-43], where energy becomes a medium of exchange. A low-power prediction processing unit such as a Raspberry Pi can increase the performance of SHSs in rural areas by introducing an energy trading prediction mechanism, but more studies should be done in this area.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: