Short-Term Forecasting of Photovoltaic Solar Power Production Using Variational Auto-Encoder Driven Deep Learning Approach

The accurate modeling and forecasting of the power output of photovoltaic (PV) systems are critical to efficiently managing their integration in smart grids, delivery, and storage. This paper aims to provide efficient short-term forecasting of solar power production using a Variational AutoEncoder (VAE) model. Adopting the VAE-driven deep learning model is expected to improve forecasting accuracy because of its suitable performance in time-series modeling and flexible nonlinear approximation. Both single- and multi-step-ahead forecasts are investigated in this work. Data from two grid-connected plants (a 243 kW parking lot canopy array in the US and a 9 MW PV system in Algeria) are employed to show the investigated deep learning models' performance. Specifically, the forecasting outputs of the proposed VAE-based forecasting method have been compared with seven deep learning methods, namely recurrent neural network (RNN), long short-term memory (LSTM), bidirectional LSTM, convolutional LSTM network, gated recurrent units, stacked autoencoder, and restricted Boltzmann machine, and two commonly used machine learning methods, namely logistic regression and support vector regression. The results demonstrate the satisfying performance of deep learning techniques in forecasting solar power and indicate that the VAE consistently performed better than the other methods. The results also confirm the superior performance of deep learning models compared to the two baseline machine learning models.


Introduction
The accurate modeling and forecasting of solar power output in photovoltaic (PV) systems are essential to improve their management and enable their integration in smart grids [1,2]. The output power of a PV system is highly correlated with solar irradiation and weather conditions, which explains the intermittent nature of PV power generation. In particular, the fluctuating and intermittent character of temperature and solar irradiance can impact solar power production [3]. In practice, a decrease of more than 20% in power output can be recorded in PV plants [4]. Hence, PV systems connected to the public power grid can affect the stability and the expected operation of the power plant [5]. Reliable real-time solar power forecasting facilitates the integration of PV systems into the power grid. Power forecasting has also become an indispensable component of smart grids for efficiently managing power generation, storage, delivery, and the energy market [6,7].
Long- and short-term forecasting methods are valuable tools for efficient power grid operations [8,9]. The success of integrating PV systems in smart grids depends largely on the accuracy of the implemented forecasting methods. Numerous models have been developed to enhance the accuracy of solar power forecasting, including autoregressive integrated moving average (ARIMA) and Holt-Winters methods. In Reference [10], a short-term PV power forecasting approach based on the Holt-Winters algorithm (also called the triple exponential smoothing method) has been introduced. This model is simple to construct and convenient to use. In Reference [11], different time series models, including moving average models, exponential smoothing, double exponential smoothing (DES), and triple exponential smoothing (TES), have been applied for short-term solar power forecasting. In Reference [12], a coupled strategy integrating the discrete wavelet transform (DWT), a random vector functional link (RVFL) neural network, and seasonal ARIMA (SARIMA) has been proposed for short-term forecasting of solar PV power. This study showed that the use of the DWT negatively affects the accuracy of solar PV power forecasting under a clear sky, whereas it improves the quality of the forecast model in cloudy and rainy weather. In addition, the coupled model showed superior forecasting performance in comparison to the individual models (i.e., SARIMA or RVFL). However, switching between two forecast models is not an easy task, particularly for real-time forecasting. In Reference [13], a hybrid model merging seasonal decomposition and least-squares support vector regression was developed for forecasting monthly solar power output. Improved results have been obtained with this hybrid model compared to those obtained with ARIMA, SARIMA, and a generalized regression neural network.
In recent years, shallow machine learning (ML) methods, which are non-parametric and therefore more flexible, have been widely exploited to improve solar PV forecasting. These models can capture complicated relationships between process variables and do not need an explicit model formulation to be specified, as parametric approaches generally require. In Reference [14], a hybrid approach combining support vector regression (SVR) and an improved adaptive genetic algorithm (IAGA) is developed for hourly electricity demand forecasting. It has been shown that this hybrid approach outperformed traditional feed-forward neural networks, the extreme learning machine (ELM) model, and the SVR model. In Reference [15], an approach for forecasting PV and wind-generated power using a higher-order multivariate Markov chain is proposed. This approach considers the time-adaptive stochastic correlation between the wind and PV output power to achieve 15-min-ahead forecasting. The observation interval of the last measured samples is included to follow the pattern of PV/wind power fluctuations. In Reference [16], a univariate method is developed for multiple-step-ahead solar power forecasting by integrating a data re-sampling approach with machine learning procedures. Specifically, machine learning algorithms including Neural Networks (NNs), Support Vector Regression (SVR), Random Forest (RF), and Multiple Linear Regression (MLR) are applied to re-sampled time series for computing multiple-step-ahead predictions. However, this approach is designed only for univariate time series data. In Reference [17], a forecasting strategy combining the gradient boosting trees algorithm with feature engineering techniques is proposed to uncover information from a grid of numerical weather predictions (NWP) using both solar and wind data. Results indicate that appropriate feature extraction from the raw NWP could improve the forecasting.
In Reference [18], a modified ensemble approach based on an adaptive residual compensation (ARC) algorithm is introduced for solar power forecasting. In Reference [19], an analog method for day-ahead regional photovoltaic power forecasting is introduced based on meteorological data, solar time, and the earth declination angle. This method exhibited better day-ahead regional power forecasting compared to the persistence model, the System Advisor Model, and the SVM model.
Over the last few years, deep learning has emerged as a promising research area both in academia and industry [20][21][22][23][24]. Deep learning technology has enabled advances in different areas, such as computer vision [25], natural language processing [26], speech recognition [27], renewable energy forecasting [4,28], anomaly detection [29][30][31], and reinforcement learning [32]. Owing to its data-driven approaches, deep learning has brought a paradigm shift in the way relevant information in time series data is extracted and analyzed. By concatenating multiple layers into the neural network structure, deep learning-driven methods enable flexible and efficient modeling of implicit interactions between process variables and automatic extraction of relevant information from a voluminous dataset with limited human instruction. Various deep learning techniques have been employed in the literature for improving solar power forecasting. For instance, in Reference [33], Recurrent Neural Networks (RNNs) are adopted for PV power forecasting. However, a simple RNN is not suited to learning long-term evolution because of the vanishing and exploding gradient problems. To bypass this limitation, several variants of the RNN have been developed, including Long Short-Term Memory (LSTM) networks and gated recurrent unit (GRU) networks. Essentially, compared to a simple RNN model, LSTM and GRU models possess a superior capacity for modeling time-dependent data over a longer time span. In Reference [4], the LSTM model, which is a powerful tool for modeling time-dependent data, is applied to forecast solar power time series data. In Reference [34], a GRU network, which is a simplified variant of the LSTM model, has been applied to forecast short-term PV power. In Reference [35], an LSTM recurrent neural network (LSTM-RNN) is first applied for independent day-ahead PV power forecasting. Then, the forecasting results are refined using a modification approach that takes into consideration the correlation of diverse PV power patterns. Results showed that the forecasting quality is improved by considering the time correlation modification. In Reference [36], a forecasting framework based on the LSTM model is introduced for residential load forecasting to address volatility problems, such as the variability of residents' activities and individual residential loads. Results show that the forecasting accuracy could be enhanced by incorporating appliance measurements in the training data. In Reference [37], a hybrid forecasting approach is introduced by combining a convolutional neural network (CNN) and a salp swarm algorithm (SSA) for PV power output forecasting. After classifying the PV power data and associated weather information into five weather classes (rainy, heavy cloudy, cloudy, light cloudy, and sunny), the CNN is applied to predict the next day's weather type. To this end, five CNN models are constructed and SSA is applied to optimize each model. However, using several CNN models makes this hybrid approach unsuitable for real-time forecasting. In Reference [38], a method combining a deep convolutional neural network and the wavelet transform technique is proposed for deterministic PV power forecasting. Then, the PV power uncertainty is quantified using quantile regression. Results demonstrated that the deterministic model possesses reasonable forecasting stability and robustness. Overall, deep learning models can efficiently learn nonlinear features and pertinent information in time-series data, a capacity that can be exploited in a wide range of applications.
This study offers a threefold contribution. Firstly, to the best of our knowledge, this is the first study introducing the variational autoencoder (VAE) and Restricted Boltzmann Machine (RBM) methods to forecast PV power. Secondly, this study provides a comparison of the forecasting outputs of eight deep learning models, namely simple RNN, LSTM, ConvLSTM, Bidirectional LSTM (BiLSTM), GRU, stacked autoencoders (SAE), VAE, and RBM, which inherently take into account temporal dependencies and nonlinear characteristics. The eight deep learning methods and two commonly used machine learning methods, namely logistic regression (LR) and support vector regression (SVR), were applied to forecast PV power time-series data. Finally, to guide short- and long-term operational strategies for PV systems, both single- and multi-step-ahead forecasting are examined and compared in this paper. Data sets from two grid-connected plants are adopted to assess the outputs of the deep learning-driven forecasting methods. Section 2 introduces the eight deep learning methods used. Section 3 describes the deep learning-based PV power forecasting strategy. Section 4 assesses the forecasting methods and compares their performance using two actual datasets. Finally, Section 5 concludes this study and sheds light on potential future research lines.

Table 1. Columns: Model, Description, Key Points (+ strength, - limitation).

RNN
Description:
• RNNs are able to include historical information in the forecasting process via their recurrent structure and memory units.
• Simple RNNs do not have gates [39].
• RNNs are trained entirely in a supervised way.
Key Points:
+ Modeling time dependencies.
- Simple RNNs fail to capture the long-term evolution due to the vanishing and exploding gradient problems [33].

LSTM
Description:
• LSTM consists of three gates regulating the information flow, called the input, forget, and output gates [40].
• The gate mechanism is used to store and memorize historical data features.
• GRU uses two gates, while LSTM is based on three gates.
Key Points:
+ LSTM learns long-term dependencies more easily than the conventional RNN.
- Its training takes relatively longer than that of other RNN algorithms.
- The architecture of a typical LSTM is very complex.

GRU
Description:
• The major difference between GRU and LSTM is that only one unit is used to control both the forgetting factor and the decision to update the state unit [41].
• GRU contains only two gates, the update and the reset gates.
• The GRU has been widely used in time series, data sequences (e.g., speech and text processing), temporal feature extraction, prediction, and forecasting.
Key Points:
+ The attractive features of the GRU model are its shorter training time and smaller number of parameters compared to the LSTM [41].
- GRU models have problems such as slow convergence rate and low learning efficiency, resulting in overly long training time and even under-fitting.

BiLSTM
Description:
• Compared to the LSTM model, which passes the input data through the network in one direction from past to future (forward), the BiLSTM also processes the input in the backward direction, from the future to the past [42].
• This architecture improves the learning of complex temporal dependencies through double processing.
Key Points:
+ Modeling time dependencies.
+ Improved accuracy in state reconstruction is achieved by BiLSTM, which merges the desirable features of both bidirectional RNN and LSTM [42].

ConvLSTM
Description:
• The ConvLSTM is a special variant of the traditional LSTM, in which the fully-connected layer operators are replaced with convolutional operators [43].
• LSTM with recurrent connections to deal with data sequences.
• The convolutional layer can deal with 2D inputs like a sequence of images.
Key Points:
+ The ConvLSTM can process 2D input through convolutional transformations to learn the spatial features and then feed the LSTM module.
+ It has been used in modeling time dependencies, feature extraction, and spatiotemporal modeling.
- Complex architecture.

SAE
Description:
• Autoencoders are neural networks that aim to create a compact representation of a given input x, such as images or any other type of data [44].
• The network learns how to compress the input features while keeping the most important information by minimizing the reconstruction error between the compressed input and the original input x [44].
• Autoencoders are usually stacked to build a deep stacked autoencoder.
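The gate counts listed in Table 1 can be made concrete by writing out one GRU update. The block below is the commonly used textbook formulation, given only as an illustration: it is not necessarily the exact variant used in this study, the weight matrices W, U and biases b are notation introduced here, and some references swap the roles of z_t and 1 - z_t in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)}\\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h\,(r_t \odot h_{t-1}) + b_h\right) && \text{(candidate state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```

The single update gate z_t decides both how much of the previous state is forgotten and how much of the candidate state is written, which is the "one unit" property noted in the GRU row.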

Variational Autoencoders Model
VAEs are an essential class of generative techniques that efficiently extract information from data in an unsupervised manner [20,47]. One desirable characteristic of VAEs is their ability to reduce input dimensionality, enabling them to compress high-dimensional data into a compact representation. Moreover, they are very effective for approximating complex data distributions using stochastic gradient descent [47]. VAEs have two major advantages over conventional autoencoders: first, they address the overfitting problem of conventional autoencoders by using a regularization mechanism in the training phase; second, they have proved effective in handling various kinds of complex data in different applications, including handwritten digits and urban network modeling [48]. Here, the VAE is adopted for solar PV production forecasting. Figure 1 shows a schematic diagram of the construction of a VAE. Basically, the VAE, as a variant of autoencoders, contains two neural networks, an encoder and a decoder. The encoder's mission is to encode a given observed set, X, into a latent space, Z, as a distribution, $q(z|x)$. The dimension of the latent (also termed hidden) space is lower than that of the observed set; indeed, the encoder is built to compress the observed set into this reduced-dimensional space efficiently. Then, a sample is generated via $z \sim q(z|x)$ using the learned probability distribution. The key purpose of the decoder, $p(x|z)$, in turn, is to generate the observation x from the input z. It should be emphasized that reconstructing the data with the decoder results in some reconstruction error, which is calculated and backpropagated through the network. This error is minimized in the training phase of the VAE model by minimizing the deviation between the observed set and the encoded-decoded set.
To summarize, the VAE encoder is defined by an approximate posterior $q_\theta(z|x)$, and the decoder by a likelihood $p_\phi(x|z)$, where θ and φ refer to the parameters of the encoder and decoder, respectively. Here, a neural network is constructed for learning θ and φ. Essentially, the VAE encoder's role is to learn the latent variable z from the gathered sensor data, and the decoder employs the learned latent variable z to recover the input data. The deviation between the reconstructed data and the input data should be as close to zero as possible. Notably, the latent variable z learned by the encoder is used for feature extraction from the input data. Usually, the dimension of the encoder output is smaller than that of the original data, which leads to dimensionality reduction of the input data. Note that the encoder is trained by training the entire VAE, comprising both the encoder and the decoder.
It is worth pointing out that the loss function has an essential effect on feature extraction when training the VAE. Assume that $X_t = [x_{1t}, x_{2t}, \ldots, x_{Nt}]$ is the vector of input data points of the VAE at time point t, and $\hat{X}_t$ is the data reconstructed by the VAE model. The parameters are learned by maximizing the marginal likelihood, which can be expressed as [49]:
$$\log p_\phi(x) = D_{KL}\left[q_\theta(z|x)\,\|\,p_\phi(z|x)\right] + \mathcal{L}(\theta, \phi; x), \quad (1)$$
where $D_{KL}[\cdot]$ denotes the Kullback-Leibler divergence, and $\mathcal{L}(\theta, \phi; x)$ is the variational lower bound on the marginal log-likelihood with respect to the parameters of the encoder and decoder (i.e., θ and φ). Since the divergence term is non-negative, maximizing the likelihood amounts to maximizing $\mathcal{L}(\theta, \phi; x)$. Hence, the loss to be minimized can be expressed as
$$L(\theta, \phi) = -\mathcal{L}(\theta, \phi; x) = -\mathbb{E}_{z \sim q_\theta(z|x)}\left[\log p_\phi(x|z)\right] + D_{KL}\left[q_\theta(z|x)\,\|\,p_\phi(z)\right]. \quad (2)$$
The VAE's loss function is composed of two parts: the reconstruction loss and a regularizer. The reconstruction loss drives an efficient encoding-decoding procedure. The regularizer, in contrast, regularizes the latent space by keeping the distributions returned by the encoder as close as feasible to a predefined distribution (e.g., a standard normal distribution). Figure 2 schematically summarizes the procedure for computing the loss function.
The reconstruction term in (2) reinforces the decoder's capacity to learn data reconstruction. Higher values of the reconstruction loss mean that the reconstruction is poor, while lower values mean that the model is converging. The regularizer is the Kullback-Leibler (KL) divergence between the distribution of the encoder, $q_\theta(z|x)$, and the prior of the latent variable, $p_\phi(z)$; the KL divergence measures the discrepancy between two probability distributions. Gradient descent is used to minimize the loss function with respect to the parameters of the encoder and decoder in the training phase. Overall, minimizing the loss function yields a regular latent space, z, and adequate sampling of new observations using $z \sim p_\phi(z)$ [50]. Assuming that $p_\phi(z) = \mathcal{N}(z; 0, I)$, we can write $q_\theta(z|x)$ in the following form:
$$q_\theta(z|x) = \mathcal{N}\left(z; \mu, \sigma^2 I\right).$$
The mean and standard deviation of the approximate posterior are denoted by µ and σ, respectively; a dedicated layer outputs each of them. Moreover, the latent variable z is constructed using a deterministic function $g_\theta$ of the encoder outputs and an auxiliary noise variable $\varepsilon \sim p(\varepsilon)$, more specifically $\varepsilon \sim \mathcal{N}(0, I)$.
The reconstruction error term can be expressed in the following form:
$$-\mathbb{E}_{z \sim q_\theta(z|x)}\left[\log p_\phi(x|z)\right] \approx -\frac{1}{M}\sum_{m=1}^{M} \log p_\phi\left(x \,|\, z^{(m)}\right), \qquad z^{(m)} = g_\theta\left(x, \varepsilon^{(m)}\right) = \mu + \sigma \odot \varepsilon^{(m)}, \quad \varepsilon^{(m)} \sim \mathcal{N}(0, I),$$
where $\odot$ denotes the element-wise product and M is the number of Monte Carlo samples. Overall, the parameters of the encoder and decoder are obtained by minimizing the loss function, $L(\theta, \phi)$, using the training observations. The VAE is trained using the procedure tabulated in Algorithm 1.
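The two parts of the loss described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the training code used in this study: it assumes a standard normal prior, a diagonal Gaussian posterior, and a unit-variance Gaussian decoder (so the reconstruction term reduces to a squared error up to a constant). All function names and numeric values are hypothetical.

```python
import math
import random


def reparameterize(mu, sigma, rng):
    """Sample z = mu + sigma * eps element-wise, with eps ~ N(0, I)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL[N(mu, diag(sigma^2)) || N(0, I)]; requires sigma > 0."""
    return 0.5 * sum(m * m + s * s - 1.0 - 2.0 * math.log(s)
                     for m, s in zip(mu, sigma))


def reconstruction_loss(x, x_hat):
    """-log p(x|z) up to a constant, ASSUMING a unit-variance Gaussian decoder."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, sigma):
    """Single-sample loss: reconstruction term plus KL regularizer."""
    return reconstruction_loss(x, x_hat) + kl_to_standard_normal(mu, sigma)


rng = random.Random(0)                    # fixed seed for repeatability
mu, sigma = [0.2, -0.1], [0.9, 1.1]       # hypothetical encoder outputs
z = reparameterize(mu, sigma, rng)        # latent sample fed to the decoder
x = [0.5, 0.6, 0.7]                       # hypothetical normalized input
x_hat = [0.45, 0.62, 0.71]                # hypothetical decoder output
print(len(z), round(vae_loss(x, x_hat, mu, sigma), 4))
```

The KL term vanishes exactly when the posterior equals the prior (µ = 0, σ = 1), which is the sense in which it pulls the encoder output toward a prefixed distribution.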

Deep Learning-Based PV Power Forecasting
The input data consists of the PV power output, which varies between 0 and the rated output power. When handling such large-valued data with an RNN model, gradient explosion can occur and degrade the performance of the RNN. Furthermore, the learning effectiveness of the RNN is reduced. To remedy this issue, the input data is normalized to the interval [0, 1] via min-max normalization and then used for constructing the deep learning models. The normalization of the original measurements, y, is defined as:
$$\tilde{y} = \frac{y - y_{\min}}{y_{\max} - y_{\min}},$$
where $y_{\min}$ and $y_{\max}$ refer to the minimum and maximum of the raw PV power data, respectively. After obtaining the forecasting outputs, we apply the reverse operation so that the forecasted data match the scale of the original PV power time-series data:
$$y = \tilde{y}\,(y_{\max} - y_{\min}) + y_{\min}.$$
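The normalization and its reverse operation can be sketched as follows. This is a minimal illustration with hypothetical function names and sample values; taking the minimum and maximum from the training portion only is a common practice assumed here, not something stated in the text.

```python
def minmax_fit(values):
    """Return (y_min, y_max) of the raw data; reject constant series."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for a constant series")
    return lo, hi


def minmax_scale(values, lo, hi):
    """Map raw power values to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def minmax_inverse(scaled, lo, hi):
    """Map scaled forecasts back to the original power units."""
    return [s * (hi - lo) + lo for s in scaled]


power_kw = [0.0, 60.75, 121.5, 243.0]      # hypothetical readings (kW)
lo, hi = minmax_fit(power_kw)
scaled = minmax_scale(power_kw, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
print(scaled, restored)
```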
As discussed above, the generated PV power shows a high level of variability and volatility because of its high correlation with the weather conditions. Hence, to mitigate the influence of uncertainty on the accuracy of PV power forecasting, this work presents a deep learning framework to forecast PV power output time series. Essentially, deep learning models are an efficient tool for learning relevant features and handling nonlinearity in complex datasets. In this study, eight deep learning models are investigated and compared for one-step and multiple-step-ahead forecasting of solar PV power. The overall structure of the proposed forecasting procedure is depicted in Figure 3. As shown in Figure 3, solar PV power forecasting is accomplished in two phases: training and testing. The original PV power data is split into a training subset and a testing subset. First, the raw data is normalized before building the deep learning models. The Adam optimizer is used to select the parameter values of each model by minimizing the loss function on the training data. Once the models are constructed, they are used for PV power output forecasting. The quality of the models is quantified using several statistical indexes, including the coefficient of determination (R2), explained variance (EV), mean absolute error (MAE), root mean square error (RMSE), and normalized RMSE (NRMSE). Essentially, the deep learning-driven forecasting methods learn the temporal correlation hidden in the PV power output data and are expected to uncover and capture the sequential features in the PV power time series. The main objective of this study is to investigate the capability of the learning models, namely RNN, LSTM, BiLSTM, ConvLSTM, GRU, RBM, SAE, and VAE, for one-step and multiple-step-ahead solar PV power forecasting.

Training Procedure
The eight models investigated in this study can be categorized into two classes: autoencoders and recurrent neural networks. The autoencoder class includes RBM, VAE, and SAE, while the RNN-based class contains RNN, LSTM, GRU, BiLSTM, and ConvLSTM. The datasets used for training and testing are normalized first, and further data preprocessing is needed for the autoencoder models. For instance, data reshaping is needed to transform the univariate PV power time-series data into a two-dimensional matrix to be used as input for the autoencoders (SAE, VAE, and RBM). The main difference between the two classes in the training phase is the learning scheme: the RNNs are trained entirely in a supervised way, while the autoencoders are first pre-trained in an unsupervised manner and then fine-tuned with supervised learning. Specifically, RNN models are trained in a supervised way by using a subset of the training data as the input sequence $X = (x_1, \ldots, x_l)$ and the next value $Y = x_{l+1}$ as the output variable. The sequence length l, called the lag, is a crucial parameter in the data preparation phase. The mapping from each sequence to its next value is constructed using a sliding-window algorithm. The value of l is determined using the grid search approach [51]. Here, l is set to 6, which is the lowest value that maximizes the overall performance of the proposed approach.
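The sliding-window data preparation described above can be sketched as follows. This is a minimal illustration with a hypothetical function name; the lag of 6 is taken from the text, while the toy series is made up.

```python
def make_windows(series, lag=6):
    """Build (input window, next value) pairs with a sliding window.

    Each input holds `lag` consecutive samples and the target is the
    sample that immediately follows the window.
    """
    if lag < 1 or len(series) <= lag:
        raise ValueError("series must contain more than `lag` samples")
    count = len(series) - lag
    inputs = [list(series[i:i + lag]) for i in range(count)]
    targets = [series[i + lag] for i in range(count)]
    return inputs, targets


series = [float(v) for v in range(10)]     # hypothetical normalized samples
X, y = make_windows(series, lag=6)
print(len(X), X[0], y[0])
```

The list of windows is also a two-dimensional (samples by lag) matrix, which is one plausible reading of the reshaping step mentioned for the autoencoder inputs.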
RNN-based models are trained to learn the mapping function from the input to the output. After that, the trained models are used to forecast new data that complete the sequence. On the other hand, greedy layer-wise unsupervised pre-training followed by fine-tuning was applied to the RBM, VAE, and SAE. It should be noted that autoencoder-based PV power forecasting is carried out as a dimensionality reduction task; that is, these models have no built-in mechanism to discover time dependencies or to model time-series data. Hinton [44] shows that greedy layer-wise unsupervised learning of each layer, followed by fine-tuning, improves the feature extraction and learning process of neural networks dedicated to prediction problems or to dimensionality reduction, such as autoencoders. The VAE-driven forecasting procedure, including the pretreatment step, is illustrated in Figure 4.

Measurements of Effectiveness
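The deep learning-driven forecasting methods are evaluated using the following metrics: R2, RMSE, MAE, EV, and NRMSE.
$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2},$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2},$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|,$$
$$\mathrm{EV} = 1 - \frac{\mathrm{Var}\left(y - \hat{y}\right)}{\mathrm{Var}(y)},$$
$$\mathrm{NRMSE} = \left(1 - \frac{\sqrt{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}}{\sqrt{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2}}\right) \times 100\%,$$
where $y_t$ are the actual values, $\hat{y}_t$ are the corresponding estimated values, $\bar{y}$ is the mean of the measured power data points, and n is the number of measurements. Unlike the RMSE, which depends on the range of the measured values, the NRMSE does not rely on that range. The NRMSE indicates how well the forecasted model response matches the measurement data: a value of 100% denotes perfect forecasting, and lower values indicate poorer forecasting performance. Lower RMSE and MAE values, and EV and R2 values closer to 1, indicate accurate forecasting.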
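The five indexes can be computed with a few lines of plain Python. This is a sketch under stated assumptions: the NRMSE is taken in the fit-percentage form in which 100% denotes a perfect forecast, as described in the text, and EV uses population variances; the function name and sample values are hypothetical.

```python
import math


def forecast_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, EV and NRMSE (%) for two equal-length series."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [a - b for a, b in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)               # sum of squared errors
    sst = sum((a - mean_y) ** 2 for a in y_true)      # total sum of squares
    if sst == 0.0:
        raise ValueError("metrics are undefined for a constant series")
    mean_res = sum(residuals) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    return {
        "R2": 1.0 - sse / sst,
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "EV": 1.0 - var_res / (sst / n),
        "NRMSE": 100.0 * (1.0 - math.sqrt(sse) / math.sqrt(sst)),
    }


measured = [1.0, 2.0, 3.0, 4.0]            # hypothetical measurements
biased = [2.0, 3.0, 4.0, 5.0]              # forecast with a constant offset
scores = forecast_metrics(measured, biased)
print(scores)
```

A forecast with a constant offset, as in the usage lines, keeps EV at 1 while lowering R2, which is why the two indexes are reported side by side.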

Data Description
In this study, solar PV power data from two PV systems are adopted to verify the performance of the eight deep learning-driven forecasting methods.

Forecasting Results
Accurate short-term forecasting of PV power output gives pertinent information for maintaining the desired power grid production, delivery, and storage [7,53]. This section assesses the eight models (i.e., RNN, GRU, LSTM, BiLSTM, ConvLSTM, RBM, SAE, and VAE) and compares their forecasting performance using PV power output collected from two different PV systems. To this end, we first build each model to capture the maximum variance in the training data and then use it to forecast the future trend of PV power output. The training data in Data Set 1 consists of one-minute power data collected from 1 January 2017 to 29 June 2017. The training data in Data Set 2 was collected from 1 January 2018 to 19 October 2018. The hyper-parameters of the deep learning methods built on the training datasets are tabulated in Table 2. For all models, we used cross-entropy as the loss function and RMSprop as the optimizer in training. The principal feature of the PV power output is its intermittency. This unpredictable fluctuation in solar PV power can lead to many challenges, including power generation control and storage management. It is therefore crucial to appropriately forecast PV power output to guarantee reliable operation and economic integration in the power grid. In the first case study, the trained models are evaluated using the testing solar PV power output from 30 June to 6 July 2017 collected from the parking lot canopy array. The forecasts of the eight deep learning models on the test measurements are displayed in Figure 7. These results illustrate the good performance of deep learning models for PV power forecasting.
Also, to show clearly the agreement between the measured and the forecasted outputs from the investigated deep learning models, scatter plots are presented in Figure 8. Figure 8 shows that the forecasts from the RBM and SAE models are moderately correlated with the actual PV power output. The forecasts from ConvLSTM are relatively weakly correlated with the measured power data. On the other hand, the forecasts from the RNN-based models and the VAE model are strongly correlated with the measured PV power. To quantitatively evaluate the forecasting accuracy of the eight considered models on the testing data, five statistical indexes are computed and listed in Table 3. We also compared the forecasting results of the eight deep learning models with two baseline machine learning models, LR and SVR (Table 3). For this application, ConvLSTM performs poorly in terms of forecasting accuracy compared to the other models: it cannot track the variability of PV power well and explains less of the variance in the data (EV = 0.832). Moderate forecasting performance is obtained using RBM and SAE, which explain 0.929 and 0.932 of the total variance in the testing PV power data, respectively. The results also show that the VAE model provides accurate forecasts in comparison to the other models, achieving lower RMSE and MAE, higher NRMSE (%), and the highest R2 and EV values, which are close to 1, meaning that most of the variance in the data is captured by the VAE model. Specifically, the VAE model achieved the highest R2 of 0.992 and the lowest RMSE (6.891) and MAE (5.595). We highlight that this is the first time that the VAE model is used for solar PV power output forecasting. This application showed that the VAE method has superior performance for PV power forecasting.
Also, RNN and its extended variants LSTM, BiLSTM, and GRU achieve performance comparable to that of the VAE in terms of the statistical indexes (R2, RMSE, MAE, EV, and NRMSE). Table 3 indicates that the deep learning models exhibited improved forecasting performance compared to the baseline methods (i.e., LR and SVR). Next, the effectiveness of the eight methods is tested on power output data collected from the 9 MWp PV plant in Algeria (Data Set 2). In this experiment, the trained models are evaluated using the testing solar PV power output collected from 20 October to 31 December 2018. The measured test set together with the model forecasts are charted in Figure 9. Similar conclusions hold for this dataset. One major reason is that RNN-based models have a strong capability to describe time-dependent data and can better model the complicated relationship between historical and future PV power output data than other methods. The RNN-based models and the VAE model again show superior forecasting performance for PV power output, as shown by the scatter plots in Figure 9. The ConvLSTM model shows poor forecasting performance (Figure 9).
Then, the statistical indicators are computed on the testing datasets to compare the forecasting performance of the eight models and the baseline machine learning models, LR and SVR (Table 4). It is worth noting that the RNN-based models (i.e., RNN, LSTM, BiLSTM, ConvLSTM, and GRU) and the VAE model show improved forecasting performance compared to the RBM and SAE. Results in Figure 9 and Table 4 indicate that using RNN-based models and the VAE method leads to improved forecasting performance. Furthermore, the error analysis highlights that the forecasting accuracy obtained by these models can satisfy practical needs and can be useful for PV power management. It should be noted that the VAE model is trained in an unsupervised manner in order to forecast solar PV power. This means that the forecast is based only on the information from past data. In contrast, the RNN-based models are trained in a supervised way, using a subset of the training data as the input sequence $(x_1, \ldots, x_l)$ and the next value $x_{l+1}$ as the output variable, so that they learn the mapping function from the input to the output. After that, the trained models are used to forecast new data. Even though the VAE model is trained in an unsupervised way, it provides forecasting performance comparable to that of the supervised RNN-based models. Accordingly, the VAE-based forecasting approach is a flexible and powerful tool for real-time PV power forecasting. Overall, the NRMSE (%) quantifies the agreement between the measured and forecasted PV power output time-series data, where a larger value indicates better prediction performance. The NRMSE (%) values obtained with the eight considered deep learning methods on the testing datasets from the two PV systems are displayed in Figure 10. The first dataset has a one-minute resolution and the second dataset has a fifteen-minute resolution.
The VAE model achieves better PV power forecasting performance than the RBM, SAE, and RNN-based models. Furthermore, the results show that the VAE model is efficient in capturing the linear and nonlinear features in PV power data with different time resolutions.
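To make the "larger is better" reading of the NRMSE (%) concrete, the following minimal sketch computes a fit-style normalized error. The definition is not restated in this section, so the formula 100 * (1 - ||y - yhat|| / ||y - mean(y)||) is an assumption chosen because it is consistent with a score that increases with forecast quality; the function name `nrmse_percent` and the toy values are illustrative only.

```python
import numpy as np


def nrmse_percent(measured, forecast):
    """Fit-style NRMSE in percent (ASSUMED definition, not taken from the paper).

    Returns 100 for a perfect forecast and 0 for a forecast that is no
    better than the mean of the measured series.
    """
    y = np.asarray(measured, dtype=float)
    yhat = np.asarray(forecast, dtype=float)
    if y.shape != yhat.shape:
        raise ValueError("measured and forecast must have the same shape")
    denom = np.linalg.norm(y - y.mean())
    if denom == 0.0:
        raise ValueError("measured series is constant; score is undefined")
    return 100.0 * (1.0 - np.linalg.norm(y - yhat) / denom)


# Toy values (not measured PV data).
measured = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
score_perfect = nrmse_percent(measured, measured.copy())
score_mean_only = nrmse_percent(measured, np.full_like(measured, measured.mean()))
```

Under this assumed definition, a perfect forecast scores 100% and a constant forecast equal to the series mean scores 0%.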

Multi-Step Ahead PV Power Forecasts
Precise multi-step forecasts are essential to managing the operation of PV systems appropriately. Now, we assess the capability of the eight methods for multi-step-ahead forecasting of PV power output using Data Set 1 (a 243 kW parking lot canopy array in the US) and Data Set 2 (a 9 MW PV system in Algeria). Given the past measurements $x = [x_1, x_2, \ldots, x_l]$, the single-, two-, and $n$-step-ahead forecasts are $x_{l+1}$, $x_{l+2}$, and $x_{l+n}$, respectively. The results of the 5-, 10-, and 15-step-ahead forecasting of PV power based on the testing data of the parking lot canopy array and the Adrar PV system are tabulated in Table 5. For both datasets, all models except BiLSTM and ConvLSTM produce consistently reasonable five-, ten-, and fifteen-step-ahead forecasts.
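The mapping from past measurements to an n-step-ahead target can be sketched as a sliding-window construction. This is an illustrative sketch only: the function name `make_windows`, the parameters `lookback` and `horizon`, and the direct (one target per window) strategy are assumptions, since the section does not state how the training pairs were built.

```python
import numpy as np


def make_windows(series, lookback, horizon=1):
    """Build (input window, target) pairs for horizon-step-ahead forecasting.

    Each input row holds `lookback` consecutive values [x_1, ..., x_l];
    the matching target is the value `horizon` steps after the last input,
    i.e. x_{l+horizon}. (Hypothetical helper, not the authors' code.)
    """
    x = np.asarray(series, dtype=float)
    n_samples = len(x) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this lookback and horizon")
    inputs = np.stack([x[i:i + lookback] for i in range(n_samples)])
    first_target = lookback + horizon - 1
    targets = x[first_target:first_target + n_samples]
    return inputs, targets


# Toy series 0, 1, ..., 9 standing in for PV power measurements.
toy = np.arange(10)
X1, y1 = make_windows(toy, lookback=3, horizon=1)  # single-step-ahead
X5, y5 = make_windows(toy, lookback=3, horizon=5)  # five-step-ahead
```

With the toy series, the first window [0, 1, 2] is paired with 3 for the single-step case and with 7 for the five-step case, which matches the $x_{l+1}$ and $x_{l+n}$ notation above.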
For instance, the VAE model achieved $R^2$ values of 0.902, 0.873, and 0.856 for five-, ten-, and fifteen-step-ahead forecasting on Data Set 1, and $R^2$ values of 0.951, 0.877, and 0.818 on Data Set 2. The RNN, GRU, RBM, SAE, and VAE models performed about equally in terms of $R^2$, MAPE, and RMSE in all cases.
For Data Set 1, the five-step-ahead $R^2$ of all models except ConvLSTM is around 0.90 (Table 5). For five-step-ahead forecasting on Data Set 2, almost all models provide relatively good accuracy, with $R^2$ around 0.94. For the ten-step-ahead forecast, the accuracy of all models starts to decrease, with $R^2$ values around 0.86. In the fifteen-step-ahead forecasting, LSTM, BiLSTM, and ConvLSTM achieved poor forecasting performance, while the other models still provide acceptable forecasting accuracy. The SAE model slightly outperforms the other models, with higher $R^2$ and lower forecasting errors. The overall forecasting performance of the RNN, GRU, RBM, SAE, and VAE models was satisfying, and they maintain reasonable forecasting performance as the number of steps increases. The error for the one-minute dataset is large compared to that for the 15 min dataset. One dataset has a 15 min time resolution and was recorded for one year, while the other has a one-minute time resolution and was recorded for three years. Moreover, for both datasets, 90% of the data were used for training and 10% for testing. The one-minute data are very dynamic, which explains the larger error.
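The 90%/10% partition mentioned above can be illustrated with a chronological split, in which the test set is the most recent part of the record. This is a sketch under an assumption: the section does not say whether the split was chronological, although that is the usual choice for time series because it prevents future samples from leaking into training. The name `chronological_split` is hypothetical.

```python
import numpy as np


def chronological_split(series, train_fraction=0.9):
    """Split a time series into an early training part and a late test part.

    No shuffling is applied, so every test sample is later in time than
    every training sample. (ASSUMED procedure, for illustration only.)
    """
    x = np.asarray(series)
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(round(len(x) * train_fraction))
    return x[:cut], x[cut:]


# Toy record of 100 time-ordered samples.
record = np.arange(100)
train_part, test_part = chronological_split(record, train_fraction=0.9)
```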
It is challenging to tell which models are absolutely superior on the basis of the $R^2$, MAPE, and RMSE values. The results of this study show that RNN, GRU, and VAE perform slightly better on average than the other models in most cases for one- and multi-step-ahead forecasting. The obtained results demonstrate that both RNNs with supervised learning and the VAE with unsupervised learning can perform one-step and multi-step forecasting accurately. Overall, the VAE deep learning model gives an effective way to model and forecast PV power output, and it has emerged as a serious competitor to the RNN-driven models (i.e., RNN, GRU, and LSTM).
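For reference, the three scores used in this comparison can be computed as in the sketch below, which uses the textbook definitions of $R^2$, MAPE, and RMSE. Two details are assumptions rather than facts from the paper: the function name `forecast_scores`, and the choice to skip samples whose measured value is (near) zero when computing MAPE, since PV output is zero at night and the ratio is undefined there.

```python
import numpy as np


def forecast_scores(measured, forecast, eps=1e-8):
    """Return R2, MAPE (%), and RMSE for one measured/forecast pair of series."""
    y = np.asarray(measured, dtype=float)
    yhat = np.asarray(forecast, dtype=float)
    err = y - yhat

    rmse = float(np.sqrt(np.mean(err ** 2)))

    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # ASSUMPTION: skip (near-)zero measurements, where the percentage error is undefined.
    mask = np.abs(y) > eps
    mape = float(100.0 * np.mean(np.abs(err[mask] / y[mask]))) if mask.any() else float("nan")

    return {"R2": r2, "MAPE": mape, "RMSE": rmse}


# Toy values (not results from the paper).
scores = forecast_scores([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Because the three scores weight errors differently (squared, relative, and variance-normalized), models that are close in one score can rank differently in another, which is in line with the difficulty of naming a single best model noted above.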

Conclusions
PV power output possesses high volatility and intermittency because of its great dependency on environmental factors. Hence, a reliable forecast of solar PV power output is indispensable for efficient operation of energy management systems. This paper compares eight deep learning-driven methods for solar PV power output modeling and forecasting. The considered models fall into two categories: supervised deep learning methods, including RNN, LSTM, BiLSTM, GRU, and ConvLSTM, and unsupervised methods, including SAE, VAE, and RBM. We also compared the performance of the deep learning methods with two baseline machine learning models (i.e., LR and SVR). It is worth highlighting that this study introduced the VAE and RBM methods to forecast PV power. To manage the PV system efficiently, both single- and multi-step-ahead forecasts are considered. The forecasting accuracy of the ten models has been evaluated using two real-world datasets collected from two different PV systems. Results show the dominance of the VAE-based forecasting method, which is attributed to its ability to learn higher-level features that permit good forecasting accuracy.
To further enhance the forecasting performance, in future work we plan to consider multivariate forecasting by incorporating weather data. Also, these deep learning models can be applied and compared using data from other renewable energy systems, for example to forecast the power generated by wind turbines. Further, it will be interesting to conduct comparative studies to investigate the impact of data from different PV technologies, such as monocrystalline and polycrystalline.