2-D Convolutional Deep Neural Network for the Multivariate Prediction of Photovoltaic Time Series

Here, we propose a new deep learning scheme to solve the energy time series prediction problem. The model implementation is based on the use of Long Short-Term Memory networks and Convolutional Neural Networks. These techniques are combined in such a fashion that interdependencies among several different time series can be exploited and used for forecasting purposes by filtering and joining their samples. The resulting learning scheme can be summarized as a superposition of network layers, resulting in a stacked deep neural architecture. We proved the accuracy and robustness of the proposed approach by testing it on real-world energy problems.


Introduction
The application of novel machine learning techniques to the prediction problem is a cornerstone for improving the penetration of Distributed Energy Resources (DERs) in modern flexible smart energy systems and grids [1,2]. Forecasting is becoming a fundamental building block of any energy management approach [3,4], including generation scheduling, planning and management [5], utility side management [6] and demand side management [7]. Goals such as resource conservation, cost minimization, widespread access to energy, market and production planning, and demand management can be achieved only by endowing expert systems with advanced and improved forecasting capabilities. The development of future cooperative Virtual Power Plants (VPPs) [8] and the integration of Local Energy Markets (LEMs) in energy communities [9] will rely on good and reliable forecasts, particularly concerning the output of Renewable Energy Sources (RESs). Due to their intermittent and fluctuating nature, RESs' power output forecasting can be considered as one of the most challenging and up-to-date prediction problems in the energy framework [10,11].
For practical reasons, forecasting in the energy context is usually carried out using univariate approaches, especially regarding time series retrieved from single independent physical quantities [12][13][14][15]. While this is a valuable approach, it does not consider how other data can be incorporated into the prediction scheme, with the drawback of tailoring the method to the specific time series being predicted. The most widely used models for this purpose are statistical regression [14,16] or computational intelligence techniques [12,17,18], based on different mathematical backgrounds. Fuzzy predictors are used in [19] to predict RES generation and load data in a microgrid framework, with the advantage of being able to incorporate the uncertainty of future predictions, but, at the same time, they are stuck with those prediction errors for actual energy management. Hybrid architectures combining CNN and LSTM layers have also been proposed, such as the one in [41]. There, the authors first use a convolutional network to extract the interrelationship of different variables at the same time step (as well as to regularize the dimension of the input structure); then, a dual-attention mechanism combined with LSTM is used. While there are some similarities in the combination of the CNN and LSTM in the deep structure, the main difference compared to our approach is that those authors apply the CNN to an input frame constructed as a combination of samples taken from different sequences at the same instant in time. Instead, we use the CNN not only to extract the spatial correlations among different sequences, but also to take into account the correlation in time among those sequences, similarly to a multivariate intrinsic embedding mechanism. Indeed, we construct the input frame in a different fashion, by feeding the network with samples taken from different sequences at several time steps, as described in Section 2.
In a similar fashion, some works use a multivariate scheme based on LSTM for vertical applications in specific sectors: cybersecurity [42], time series classification [43], industrial anomaly detection [44], point process analysis [45]. Following the approach proposed in this work, the novel scheme can be viewed as a superposition of neural layers, forming a stacked deep network, in which each layer feeds its output to the input of the subsequent layer. Increasing the depth of the network in this way has been positively correlated with prediction accuracy [46].
The paper is organized as follows: in Section 2, the novel 2-D convolutional embedding process is explained; in Section 3, the architecture of the forecasting system is described; experiments are reported in Section 4; and conclusions are drawn in Section 5.

2-D Convolutional Embedding
It is widely known that LSTM networks are a staple technique for prediction purposes. In this work, our aim is to enhance the LSTM structure to achieve a deep NN model in which feature extraction (i.e., data representation) can be obtained in a robust and automatic fashion, considering a higher abstraction level [47].
In most use cases, the known past samples of the time series subject to forecasting are input into the LSTM model, yielding a univariate prediction method. In the present article, the aim is to extend this approach to a multivariate framework: the predictor's input will be built using several time series. We will do this by employing a bidimensional CNN layer, which has the role of filtering all the known samples of the available time series to obtain an expanded set of generalized features. These features are then flattened and used as a multivariate (multivalued) time series to be processed by the actual LSTM.
Let S_1[n], with n > 0, be the scalar time series to be predicted. Let S_m[n], m = 2 . . . M, be M − 1 further scalar time series correlated with S_1[n], i.e., the time series S_m[n] contain additional information that can be used for forecasting S_1[n]. Our aim is to predict the sample S_1[n + k], where k represents the prediction distance, considering that all samples up to and including time step n are known. The input frames x[n] are sent into the CNN layer as if they were a sequence of images, where the single frame is given by

x[n] = [ S_1[n]  S_1[n − 1]  · · ·  S_1[n − D + 1]
         S_2[n]  S_2[n − 1]  · · ·  S_2[n − D + 1]
           ⋮         ⋮                   ⋮
         S_M[n]  S_M[n − 1]  · · ·  S_M[n − D + 1] ],

which is a one-channel M × D frame, where M represents the number of correlated time series and D the number of past samples (for each time series), both used to forecast S_1[n + k]. This operation represents the main novelty of the proposed approach. In fact, the input frame structure allows the network to efficiently recognize the temporal correlation among the considered sequences. Furthermore, in the training phase, the back-propagation learning algorithm is able to infer different temporal correlation structures through the use of several convolutional filters of size H. The result is stored in data structures called feature maps, as illustrated in Figure 1. Usually, in the context of time series, forecasting is carried out using an embedding procedure as a preprocessing step, in order to select the meaningful past samples of the time series through statistical or heuristic techniques [48]. In a univariate context, the estimated output is obtained as follows:

Ŝ_1[n + k] = f( S_1[n], S_1[n − T], S_1[n − 2T], . . . , S_1[n − (D − 1)T] ),

where T is called 'time lag' and D 'embedding dimension' [49]. The aforementioned inference of the data structures in the input matrix x can be considered an 'extended version' of the classic embedding, because the proposed approach uses different time series and autonomously selects the meaningful past samples.
The sample S_1[n + k] is estimated by the deep neural network through a general non-linear, dynamical (recurrent) model f_m(·) as

Ŝ_1[n + k] = f_m( x[n] ),

where the two embedding approaches coincide when M = 1 and T = 1.
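As an illustration, the frame construction described in this section can be sketched in a few lines of NumPy (a hypothetical helper for exposition only; the paper's implementation is in Matlab):

```python
import numpy as np

def build_frame(series, n, D):
    """Build the one-channel M x D input frame x[n].

    series: array of shape (M, N) holding the M aligned time series S_m.
    n:      current time index (0-based); all samples up to n are known.
    D:      embedding dimension (number of past samples per series).
    Row m of the frame holds S_m[n], S_m[n-1], ..., S_m[n-D+1].
    """
    M = series.shape[0]
    frame = np.empty((M, D))
    for d in range(D):
        frame[:, d] = series[:, n - d]
    return frame

# Toy example: M = 2 series, D = 3 past samples, frame at n = 4.
S = np.arange(20).reshape(2, 10)   # S[0] = 0..9, S[1] = 10..19
x = build_frame(S, n=4, D=3)
# x[0] = [4, 3, 2] and x[1] = [14, 13, 12]
```

A sequence of such frames (one per time step) is what the CNN layer receives, as if it were a stream of one-channel images.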

Proposed Deep Neural Architecture
In this work, the backbone of an LSTM network [50] is used to predict time series data. Such data can have long-term dependencies that make it difficult to locate relevant information with respect to the current time step. LSTM networks solve this problem by managing the amount of information that passes through a single cell; for this reason, LSTM networks are used as an integral part of the predictive scheme, which is represented in Figure 2.
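The data flow through the convolutional stage can be illustrated with a plain NumPy sketch: F filters slide over the M × D frame, producing F feature maps that are flattened into the vector fed to the LSTM. Here F = 5 matches the experimental setting reported later, while M = 10 and H = 3 are assumptions for illustration only:

```python
import numpy as np

def conv2d_valid(frame, kernel):
    """'Valid' 2-D cross-correlation of one frame with one filter."""
    M, D = frame.shape
    h, w = kernel.shape
    out = np.empty((M - h + 1, D - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i+h, j:j+w] * kernel)
    return out

rng = np.random.default_rng(0)
M, D, F, H = 10, 24, 5, 3                # M and H hypothetical; D, F from the experiments
frame = rng.standard_normal((M, D))      # one input frame x[n]
filters = rng.standard_normal((F, H, H)) # F convolutional filters of size H

maps = [conv2d_valid(frame, k) for k in filters]      # F feature maps
features = np.concatenate([m.ravel() for m in maps])  # flattened vector for the LSTM
# Each map is (M-H+1) x (D-H+1) = 8 x 22, so features has 5*8*22 = 880 entries
```

In the actual architecture, this filtering and flattening is performed at every time step, so the LSTM receives a sequence of such feature vectors.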

Experimental Results
We evaluate the performance of the proposed forecasting approach considering a real photovoltaic power production plant, identified geographically by the coordinates 39°54′38.2″ N, 105°14′04.9″ W, elevation 1855 m. The photovoltaic plant is named 'NWTC' and is located in Denver, CO, USA. Irradiance data, together with those relating to other meteorological factors, are retrieved through the Measurement and Instrumentation Data Center (MIDC) database. The output power to be predicted, which is denoted as S_1 and expressed in kW, is calculated from irradiance using a balance-of-system efficiency of 0.9 and by applying an inverter curve with MPPT.
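The exact inverter curve used for this conversion is not detailed here; a minimal illustrative sketch of the idea, assuming a hypothetical rated DC power and a simple clipping-style inverter limit (both made-up values), could read:

```python
def pv_power_kw(irradiance_wm2, rated_dc_kw=100.0, bos=0.9, inv_limit_kw=90.0):
    """Rough PV output estimate (illustrative only, not the paper's model).

    irradiance_wm2: plane-of-array irradiance in W/m^2.
    rated_dc_kw:    hypothetical DC rating at 1000 W/m^2 (STC).
    bos:            balance-of-system efficiency (0.9, as in the text).
    inv_limit_kw:   hypothetical inverter clipping level.
    """
    dc_kw = irradiance_wm2 / 1000.0 * rated_dc_kw * bos
    return min(dc_kw, inv_limit_kw)   # simple clipping inverter curve

# 800 W/m^2 -> 0.8 * 100 * 0.9 = 72 kW (below the clipping level)
```

Any real conversion would additionally depend on module temperature and the actual MPPT inverter characteristic.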
In addition to the sequence to be predicted, we also consider time series relative to temperature, wind speed and wind direction, turbulence, humidity and pressure levels. Some of them are taken both at ground level (2 m above ground level) and at a higher altitude (80 m above ground level). The used sequences and their attributes are summarized in Table 1. All time series are collected at the same plant and sampled every hour (i.e., 24 samples a day). Each time series refers to the years 2017 and 2018 and is normalized between −1 and 1, for the sake of regularization, before applying the learning procedures. The extremes of normalization are given by the physical functioning of the plants (also reported in Table 1). To assess the validity of the proposed approach, we have carried out experiments with classic and widely used benchmark architectures. In particular, we used the standard LSTM as the baseline model. Given that the goal of this work is to provide a reliable and sound solution to the multivariate prediction problem, and knowing the vast superiority of the LSTM approach in time series forecasting with respect to other algorithms (e.g., shallow predictors), it is sufficient to prove that our algorithm outperforms a multivariate implementation of the LSTM. The methods considered for the tests are:

• V-LSTM: the elementary multivariate implementation of the classic LSTM model, in which the multivariate sequences are fed directly to the LSTM layer, with a vector input of dimension equal to the number of considered sequences. When the number of adopted sequences is reduced to 1 (first row of the numerical results reported in the tables), this case coincides with the elementary univariate LSTM;
• 2D-CNN: the proposed method, where the convolutional layer is stacked before the LSTM predictor, as previously described.
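The normalization to [−1, 1] is the usual affine map between the physical extremes of each quantity and the target interval; a small sketch (the temperature range below is a hypothetical example, the actual extremes being those of Table 1):

```python
def normalize(x, lo, hi):
    """Map a physical value from [lo, hi] to [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def denormalize(y, lo, hi):
    """Inverse mapping, used to read predictions back in physical units."""
    return (y + 1.0) * (hi - lo) / 2.0 + lo

# e.g., a temperature series with a hypothetical range of [-20, 40] degrees C:
# normalize(10, -20, 40) maps the midpoint to 0.0
```

Because the extremes come from the physical functioning of the plant rather than from the training data, the same mapping remains valid at test time.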
Similarly, for the 2D-CNN, the univariate case is considered as a benchmarking level to evaluate the multivariate performance.
A training set of 3 months (i.e., 90 consecutive days) is used for the experiments and it is associated with test sets whose lengths are 3 days (i.e., k = 72) and 7 days (i.e., k = 168) after the last available sample of the training set, respectively (the training set contains the known samples that are used to forecast the future ones). In all of the experiments, we set the embedding dimension to D = 24, the number of filters in the CNN layer to F = 5 and the number of hidden units in the LSTM layer to N = 50. These values were determined by using a grid search procedure on the training data in order to avoid overfitting. We used the ADAM algorithm [51] to train the deep neural network; the initial learning rate was set to 0.01 with a 50% reduction every 30 iterations, the gradient decay factor was set to 0.9 and the maximum number of iterations to 300. All of the experiments were performed using Matlab® R2019a on a machine equipped with an Intel® Core™ i7-3770K 64-bit CPU at 3.50 GHz and 32 GB of RAM.
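The learning-rate schedule described above is a piecewise-constant step decay, which can be written as:

```python
def adam_lr(iteration, lr0=0.01, drop=0.5, every=30):
    """Step-decay schedule from the text: start at 0.01 and
    halve every 30 iterations (training runs for at most 300)."""
    return lr0 * drop ** (iteration // every)

# iterations 0-29 -> 0.01, 30-59 -> 0.005, 60-89 -> 0.0025, ...
```

Over the maximum of 300 iterations, the rate thus decays from 0.01 down to 0.01 · 0.5⁹ in the final block of iterations.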
For the sake of illustration, we considered several different situations to incorporate all seasonal effects in the analysis, relative to eight months of the year 2018: each test is composed of a single day, either the 15th day of the month (for March, June, September and December) or the 30th (for January, April, July and October), and by the 2 or 6 successive days, resulting in the 3-day and 7-day test sets introduced earlier. In real-world applications, seasonally dependent NN algorithms ordinarily must be retrained when environmental changes affect the performance. In our solution, since training times are much smaller than the sampling intervals of the time series, the network can be retrained at every iteration. For this reason, we account for the difference in seasonal performance by testing various times of the year, retraining the network each time, to ensure the stability and consistency of the results. In order to evaluate the possible improvements brought about by the proposed multivariate forecasting approach, we tested different combinations of input data, giving an exhaustive study of how the relationships among data influence the forecasting procedure. Every network is trained on the same dataset over 10 different runs. For each run, a different (random) initialization of the network parameters is performed. The performance is then evaluated considering the average RMSE over the 10 training sessions. The general optimization procedure is summarized in Algorithm 1.
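The evaluation metric is straightforward to state precisely: the RMSE of each run's predictions on the test set, averaged over the 10 independently initialized training sessions. A minimal sketch:

```python
import math

def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def average_rmse(runs, target):
    """Average RMSE over several training runs (10 in the paper),
    each run providing one prediction sequence on the same test set."""
    return sum(rmse(r, target) for r in runs) / len(runs)

# Two toy runs against a zero target:
# rmse([1, 1], [0, 0]) = 1.0 and rmse([2, 2], [0, 0]) = 2.0, so the average is 1.5
```

Averaging over runs reduces the influence of the random parameter initialization on the reported figures.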
The numerical results are reported in Tables 2-9, one per considered month of 2018. This was done to explicitly evaluate the seasonality effect in different periods with different irradiation/meteorological conditions, along with the comparison between the 3-day and 7-day test sets. Additionally, as stated previously, by looking at the first row of each table, it is possible to gain an insight into how well the univariate approach is performing; this makes the benchmarking more valuable and puts all the numerical results in the right perspective. We tested many combinations of input data, to better understand the information carried by each sequence in relation to the others. In particular, we tested the output power by itself and coupled with each and every other sequence. Then, we also tested the performance when constructing the input data frame using the sequences grouped by physical meaning (all the time series relative to temperature, all relative to wind, all relative to the same altitude). Specifically, we reported the RMSE for these combinations of more than two sequences incorporated in the input data frame. A benchmarking test has also been carried out using all the considered sequences.
In the January results reported in Table 2, the RMSE is quite stable and there are no large variations in terms of performance. The univariate benchmark is satisfactory and accurate, while the multivariate approach and the proposed 2D-CNN model are better or on par. These considerations can be linked to the stability of the January set in terms of irradiation, given that the irradiation itself is quite low in winter. For the mid-March results, which can be seen in Table 3, similar considerations can be drawn relative to the differences between the V-LSTM and 2D-CNN approaches; additionally, the accuracy of the multivariate solution is considerably better than the univariate one. The validation set is obtained by extracting samples from the training data: the sequence of the latest 3 or 7 days is used as the target and the preceding samples as inputs, so that, in both test cases, the training data end one day before (see Tables 2-9).
In the April tests shown in Table 4, since the irradiation starts to grow with the increasing number of sun hours in spring in the Northern Hemisphere, the accuracy starts to degrade a bit, still retaining the balance in terms of RMSE that was highlighted in the previous comments. This consistency is expected and proves that our solution is robust. The June results illustrated in Table 5 are very interesting, since they prove the vast superiority of the proposed approach with respect to both the univariate and multivariate LSTM adoption. In fact, when every time series is considered in a fully multivariate solution, the RMSE of 2D-CNN is far below almost all the results of the V-LSTM. The representative behavior of 2D-CNN in June is confirmed by the figures shown for both the 3-day and 7-day tests: for 3-day tests, the univariate case is reported in Figure 3 (Table 5, row 1, column 2), the best case in Figure 4 (row 5, column 2), and the multivariate case using all data in Figure 5 (row 18, column 2); accordingly, for 7-day tests, the univariate case is shown in Figure 6 (row 1, column 4), the best one in Figure 7 (row 2, column 4), and the fully multivariate case in Figure 8 (row 18, column 4). Once again, we remark that some predictions can be influenced by the meteorological variations in the training set, which can cause a worsening of the prediction for some consecutive hours (e.g., the hours of the first day in Figure 5). Furthermore, it has to be noted that the 7-day test's accuracy is worse than the 3-day one, while remaining quite acceptable and stable, given the different meteorological implications of predicting longer sequences.
The results for the July tests, which are reported in Table 6, show a similar behavior, given the small difference in terms of irradiation and physical phenomena variations with respect to June. Additionally, the increasing stability in weather conditions has a prominent effect that can be seen in the similarity of the RMSE values in all cases. The results for the September tests in Table 7 are definitely on par with the aforementioned cases. Here, the difference between the V-LSTM and 2D-CNN is evident, especially for the 3-day test set. The same analysis can be drawn for the October results shown in Table 8. The only noticeable difference is the generally higher error due to the worsening of weather conditions, resulting in a higher variability in the associated physical quantities. Lastly, some interesting remarks can be drawn from the analysis of the December results reported in Table 9. In fact, along a similar line to the analysis of the June test set, the December case proves the high stability and accuracy of our proposed method. The V-LSTM model and both univariate methods are always worse than the proposed 2D-CNN method applied to a multivariate dataset. Furthermore, we show, for the December tests, the graphical results of 2D-CNN for both the 3-day and 7-day tests. For 3-day tests, the univariate case is reported in Figure 9, while the best case coincides, in this case, with the multivariate one using all data in Figure 10; for 7-day tests, the univariate case is shown in Figure 11, and the best one is associated with the fully multivariate case in Figure 12. These figures are in accordance with the numerical results and they highlight that the variability is scarce in all cases. While the error is still low, due to the lowest irradiance in the winter season, the accuracy of the multivariate 2D-CNN is consistently better.
Figure 9. Predicted (red) and real (blue) value of output power in mid-December 2018, by using the 2D-CNN approach with the univariate input on the 3-day test set (see Table 9, row 1, column 2). Figure 10. Predicted (red) and real (blue) value of output power in mid-December 2018, by using the 2D-CNN approach with the combination of all the time series, which is also associated with the best performance on the 3-day test set (see Table 9, row 18, column 2). Figure 11. Predicted (red) and real (blue) value of output power in mid-December 2018, by using the 2D-CNN approach with the univariate input on the 7-day test set (see Table 9, row 1, column 4).

Figure 12. Predicted (red) and real (blue) value of output power in mid-December 2018, by using the 2D-CNN approach with the combination of all the time series, which is also associated with the best performance on the 7-day test set (see Table 9, row 18, column 4).
Summarizing the results with a general outlook, the most significant improvement with respect to the univariate approach is achieved when considering the time series relative to wind and those relative to a higher altitude. Overall, this is in accordance with [27], because irradiance is strictly linked with cloud movement, which is predictable using wind information at high altitudes. In fact, in months such as December, when the variation in weather is more frequent, the prediction performance is evidently improved by the multivariate approach. This improvement is less evident in more stable months, such as June.
As expected, the results of the 7-day case are slightly worse than those of the 3-day case. While this is obviously due to the longer forecasting horizon, the difference is not large; the proposed method can be regarded as stable and consistent across the tested methodologies. It is also evident that the proposed convolutional structure gives advantages over the pure LSTM, despite the low dimensionality of the frame processed by the convolutional filter. In fact, all the numerical results show the same trend, despite the heavy changes across the seasons.
An additional remark can be drawn from the visual analysis of the results: there is a considerable difference in the absolute values of the analyzed time series. This reflects the different general dynamics of the underlying physical system, which is quite complicated. Consequently, the complex multivariate analysis, along with the higher variability, requires changes in the training phase, so that the different dynamics are reflected in the network tuning. We therefore suggest retraining the network when such drastic changes occur in actual operational applications.

Conclusions
In this work, we studied a new technique for multivariate forecasting of energy time series. Our contribution relies on a bidimensional convolutional layer to obtain an enriched embedding representation, feeding an LSTM model used to take advantage of long-term dependencies among the different physical time series. We carried out several tests on real-world data to assess the performance of the proposed model, comparing it with state-of-the-art LSTM techniques. It is possible to conclude that this novel system is a viable and robust solution for prediction applications, with the main advantage being its efficient exploitation of different sources of physical data. Future works might focus on architectures employing multi-stacked CNNs and on applying the same concepts to distributed forecasting models.