A New Architecture Based on IoT and Machine Learning Paradigms in Photovoltaic Systems to Nowcast Output Energy

The classic models used to predict the behavior of photovoltaic systems, which are based on the physical process of the solar cell, are limited to defining the analytical equation to obtain its electrical parameter. In this paper, we evaluate several machine learning models to nowcast the behavior and energy production of a photovoltaic (PV) system in conjunction with ambient data provided by IoT environmental devices. We have evaluated the estimation of output power generation by human-crafted features with multiple temporal windows and deep learning approaches to obtain comparative results regarding the analytical models of PV systems in terms of error metrics and learning time. The ambient data and ground truth of energy production have been collected in a photovoltaic system with IoT capabilities developed within the Opera Digital Platform under the UniVer Project, which has been deployed for 20 years in the Campus of the University of Jaén (Spain). Machine learning models offer improved results compared with the state-of-the-art analytical model, with significant differences in learning time and performance. The use of multiple temporal windows is shown as a suitable tool for modeling temporal features to improve performance.


Introduction
Photovoltaic (PV) power generation has proven to be a successful technology with a remarkable level of maturity: more than 500 GW of solar PV power had been installed worldwide by the end of 2018, in some cases running for several years, with a forecast of 1 TW of total power by 2022, most of it in large PV plants. The management of the operation and maintenance (O&M) of these systems is a relevant research field for the solar PV industry [1,2].
Data represent a key asset in this PV management area, since they enable us to model the standard behavior of the system and to monitor its performance compared with the expected output determined by the model. This monitoring, when applied promptly and comprehensively, taking account of all the factors that may impact performance, enables early damage and fault detection, which then allows operation and maintenance actions to maximize the up-time and efficiency of PV plants.
Traditionally, approximate analytical expressions based on the physical laws and the electrical parameters of the solar cells, together with the engineering data of the devices that make up the PV system, have been used to build standard performance models. Leveraging the latest software advances in machine learning, a different approach can be taken by using regressors to build models, which learn from data on the actual behavior of the system during a relevant period of time and use the time series prediction to monitor performance. Machine learning approaches bring the advantage of modeling independently of the deployment and configuration parameters of the PV system, which are strongly affected by location and environmental conditions. This work presents an important extension of the proposal in [3], where two deep learning models showed better performance in forecasting energy generation than standard analytical models [4,5]. The main contribution of this work is evaluating in further detail the capabilities of data-driven models for nowcasting the energy generation of photovoltaic systems from ambient sensor information. To this end, two main data-driven approaches are evaluated: (i) human-crafted features which are computed by means of multiple temporal windows and (ii) deep learning models with automatic feature extraction and learning. Several configurations of segmentation and aggregation by means of temporal windows have been proposed, showing an improvement in terms of performance and learning time. An important advance is thus made in this knowledge area through the use of machine learning techniques to make predictions about PV system energy production in order to check its status. In addition, an IoT module which collects photovoltaic data in real time within the Opera platform is described. The module has collected the evaluation data over 24 weeks, which are openly available to the scientific community.
The remainder of the paper is organized as follows: in Section 2, we review the works related to our proposal; Section 3 describes the supporting infrastructure and IoT module for collecting real-time data within the Opera Digital Platform; Section 4 presents the methodology to develop data-driven nowcasting of PV system output power; Section 5 presents the results obtained on the dataset collected by the Opera Digital Platform. Finally, conclusions and ongoing works are discussed in Section 6.

Related Works
PV systems are now considered a well-established technology for energy generation and have reached a significant maturity level. However, the technology is relatively recent (most of the systems have been running for no more than 20 years [1,6]), which means that there is not much experience in Operations and Maintenance (O&M). Most of the tasks and tools regarding O&M make little use of new information technologies such as big data, deep learning, business intelligence, etc. [7]. Up to now, the most common way to estimate the behavior of PV systems has been the use of classic models based on the physical process of the solar cell to define the analytical equation to obtain its electrical parameter [8]. There are many of these models with very different approaches, difficulty levels and results [4,9-11]. The main objective of these tools is to nowcast the electrical energy generated by the cells and also by the PV system. Among all of these classic models, we have selected the Araujo model plus constant FF [4] to compare and evaluate the performance of PV systems with the performance estimated by our proposed machine learning model. The Fill Factor (FF) is a noteworthy solar cell figure relating the maximum power delivered to the maximum current and maximum voltage of the cell; its upper limit is 1. Araujo is a standard PV model that combines sufficient accuracy with a very simple formulation [4,8]; additionally, it needs only a few variables to be measured: current and voltage of the cell, irradiation and ambient temperature [5].
Nevertheless, to obtain output energy using any of these classic models it is necessary to know a large number of parameters and specifications of the PV generator in question: technical specs, topology of the generator, location, etc. One of the main advantages of our machine learning-based proposal is the ability to nowcast independently of all of these parameters and specs, hence enabling easier and more efficient PV deployment and customization.
Recently, several works regarding the use of new technologies to monitor and nowcast PV system behavior have been presented. However, none of them have been used in, or have produced, a usable O&M management system [7,12-14]. A previous work related to an O&M analytics platform was presented in [15]. The use of new information technologies in O&M management in the renewables sector has, up to now, been restricted to a few large and expensive platforms developed by companies for use in utility-scale generator power plants [16,17].
Regarding the collection of operating data to monitor PV systems, it is traditionally carried out with wired sensor data acquisition systems, which are sometimes expensive, allow little flexibility and have limited cloud connectivity. Recently, several works on the new concept of using IoT connectivity in monitoring the behavior of PV systems have been presented [12,14,18]. Incorporating these sensors in a comprehensive O&M management tool has allowed us to develop a highly versatile and easy-operation data collection system with wireless sensors, which offers great advantages as regards ease of use, cost efficiency and standardization of data capture [13,19,20].
Several proposals based on the IoT paradigm in photovoltaic systems have been presented in the relevant literature. In [21], a literature review of IoT energy platforms aimed at end users is presented, where platform selection, new energy platform construction and, finally, platform comparison are considered. In [22], the design and implementation of an IoT-based solar monitoring system for city-wide, large-scale, and distributed solar facilities in smart cities was presented. In [23], a solar tracking system enabling increased efficiency of photovoltaic systems was proposed. The proposed system executes a tracking algorithm in the Firebase web service and allows the exchange of data with said service through a NodeMCU development board, which has an integrated Wi-Fi module. Finally, in [24], the use of IoT and machine learning paradigms for next-generation solar power plant monitoring systems was analyzed and discussed.
Regarding the use of IoT and machine learning paradigms for analyzing sensor data streams, there are techniques that have proven to be successful in other contexts. For example, single and multiple windows have been evaluated to segment and fuse temporal information from sensor data streams [25,26]; the window sizes can be imbalanced [27,28] to aggregate data from shorter to longer terms, enriching the features of sensor streams.
On the other hand, the use of Deep Learning in time series has become a prolific research field [29], mainly through the use of Long Short-Term Memory (LSTM) [30], which is a type of recurrent neural network that includes a memory and is designed to learn from sequence data, such as sequences of observations over time. LSTM, which is most widely used in natural language processing and speech recognition, can model temporal dependence between observations [31] and is suitable for prediction from sensor data [32]. LSTM has obtained encouraging results in several fields, such as activity recognition [28] or estimating building energy consumption [33]. Moreover, modeling spatial features in time series by means of Convolutional Neural Networks (CNNs) [31] has achieved promising results in speech recognition [34] or gas classification [35], together with LSTM models [36].

IoT Module for Real-Time Data Collection in the Opera Digital Platform
In this section, we describe the IoT module for collecting the photovoltaic data in the Opera Digital Platform, which have been collected to nowcast output energy generation in the photovoltaic system.
The Opera Project is a digital platform developed by an interdisciplinary team, covering the areas of ICTs, PV and Electronic Technology, and has been designed to provide O&M management services for renewable energy installations [15]. This digital platform has been developed with the knowledge and the working data of the UniVer Project. This project (see Figure 1) is a standard, medium-sized, grid-connected PV system that has been running for the last 20 years in the Campus of the University of Jaén [37]. The PV modules are made of 60 multicrystalline Si solar cells with 18.34% efficiency and a 156.75 × 156.75 mm² surface. The PV generator is composed of 220 of these modules with a topology of 20 (serial) × 11 (parallel) and a total power of 59.4 kW at Standard Test Conditions (STC; that is, 1000 W/m² of normal irradiance onto cells, cell temperature of 25 °C and AM1.5 solar spectrum). The Opera Platform is now also managing the O&M of this PV system. The main objective of the IoT-based PV system O&M optimization module, besides reducing costs, is to monitor the generated energy. Energy $E_T$ is the end product of every electric generator and is computed as the integral of instantaneous power $P$ over a period of time $T$: $E_T = \int_T P \, dt$. Electric power output is the instantaneous variable to be measured by this data collection system and also targeted by the models to nowcast the behavior of PV systems, such as the one developed in this paper. This output mainly relies upon the input: solar irradiance $G$, whose magnitude is defined as the surface density of power incident on a surface, measured in watts per square meter (W/m²). The temperature and the specs of the PV generator (PVG) are the other inputs for this data collection system monitoring the performance of the PVG.
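As an illustration of the energy definition above, the following minimal sketch approximates $E_T = \int_T P \, dt$ from sampled power values using the trapezoidal rule. The function name, the sample values and the sampling interval are hypothetical and are not taken from the Opera dataset.

```python
def energy_kwh(timestamps_s, power_kw):
    """Approximate E_T = integral of P dt with the trapezoidal rule.

    timestamps_s: sample times in seconds, in increasing order
    power_kw:     instantaneous power samples in kW
    Returns the energy in kWh.
    """
    if len(timestamps_s) != len(power_kw):
        raise ValueError("timestamps and power samples must have equal length")
    energy_kws = 0.0
    for k in range(1, len(timestamps_s)):
        dt = timestamps_s[k] - timestamps_s[k - 1]
        # Area of the trapezoid between two consecutive samples
        energy_kws += 0.5 * (power_kw[k] + power_kw[k - 1]) * dt
    return energy_kws / 3600.0  # kW*s -> kWh


# Hypothetical example: constant 30 kW for one hour, sampled every 10 minutes
times = [0, 600, 1200, 1800, 2400, 3000, 3600]
power = [30.0] * len(times)
print(energy_kwh(times, power))  # 30.0 kWh
```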
Monitoring of the PVG must be done following the European Standard IEC 61724 [38]. In line with this, the variables that have been measured are shown in Table 1. From these measured data and with the nominal specs of the PVG at STC, we compute derived parameters and metrics regarding losses in energy performance, which are useful to evaluate the behavior of the PV system and very helpful for fault diagnosis and descriptive operation analysis, such as: (i) global irradiation on the PVG surface, (ii) net energy from the PVG in a period of time, (iii) performance ratio and (iv) yields and losses. All of them are well defined analytically and conceptually in [38] and their function, meaning and usefulness are also described in [39-41]. In this work, we have focused on the nowcasting of output power generation, which is straightforwardly related to the analytical metrics on the behavior of the PV system.
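To make the derived metrics more concrete, the sketch below computes the performance ratio from the usual IEC 61724 yield definitions: final yield $Y_f = E / P_0$, reference yield $Y_r = H / G_{STC}$ and $PR = Y_f / Y_r$. Only the 59.4 kW nominal power and the 1000 W/m² STC irradiance come from the text; the function name and the daily energy and irradiation values are hypothetical.

```python
G_STC_KW_M2 = 1.0   # STC irradiance (1000 W/m^2) expressed in kW/m^2
P0_KW = 59.4        # nominal PVG power at STC, as stated in the text


def performance_ratio(energy_kwh, irradiation_kwh_m2, p0_kw=P0_KW):
    """Performance ratio PR = Y_f / Y_r.

    energy_kwh:         net energy delivered by the PVG in the period
    irradiation_kwh_m2: global irradiation on the PVG surface in the period
    """
    if irradiation_kwh_m2 <= 0:
        raise ValueError("irradiation must be positive")
    final_yield = energy_kwh / p0_kw                     # kWh/kW, i.e. hours
    reference_yield = irradiation_kwh_m2 / G_STC_KW_M2   # equivalent sun hours
    return final_yield / reference_yield


# Hypothetical day: 237.6 kWh produced under 5.0 kWh/m^2 of irradiation
pr = performance_ratio(237.6, 5.0)
print(round(pr, 3))
```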
In order to collect environmental and energy generation information from the Opera Digital Platform in real time, we have developed and deployed a genuine integration of ambient and power supply sensors. This is composed of a set of sensors based on IoT technology connections and controlled by a microprocessor which uploads the data over a wireless network. These sensors measure the working data of the PVG and the environmental variables shown in Table 1, needed to monitor and nowcast PVG operation in accordance with standard [42].

Table 1. Variables measured by the data collection system.

Parameter | Symbol | Unit
Irradiance on PV surface | $G_I$ | W/m²

The central unit of the IoT module is an Arduino. It is a standard board device that includes, in addition to a microprocessor, an input data conditioner, a communication network interface and other display interfaces. The IoT module is responsible for collecting the photovoltaic data and sending the information to the cloud by means of an internet connection (wired, Wi-Fi or modem). The module is powered by a standard power supply or by a solar panel plus battery.
The ambient sensors connected to the Arduino board detect: (i) solar irradiance, (ii) module temperature and (iii) ambient temperature. The irradiance sensor is a calibrated Si solar cell (calibration certificate from CIEMAT, the Spanish Research Centre in Energy, Environment and Tech.), with an analog output of 0-5 V corresponding to an irradiance range from 0 to 1250 W/m². The ambient and cell temperature sensors are 4-wire Pt100 probes, also with an analog output of 0-5 V, corresponding to a temperature range of −20 to 130 °C. These two sensors, plus the corresponding interface circuitry, are included in a commercial unit made by Atersa S.L. (www.atersa.com), as shown in Figure 2. The ambient sensors are placed close to the panels and are powered by their own solar mini-module. The ambient sensors send the measured data to the Arduino microprocessor using the Zigbee protocol, which enables direct wireless communication between the devices and the Arduino board [43] in open areas, which are inherent to the deployment of photovoltaic systems. We included the Zigbee connection since experimental comparisons with other popular wireless technologies, such as Wi-Fi and Bluetooth, show that it is more energy efficient [44]. The PVG data measured by the IoT module are the instantaneous values of output voltage and current, which enable the computing of output power by multiplying output voltage by output current. This is possible since the data are instantaneous values; in this case, the output of the PVG is DC, so this way of obtaining power is also valid for mean values over a period of time. Alternatively, an output power sensor can be installed, such as a power meter or a grid analyzer, to obtain some redundancy in the measured data and, with the second device, some additional secondary electrical output parameters.
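The sensor ranges described above can be turned into physical units with a simple scaling. The following sketch assumes that the 0-5 V outputs map linearly onto the stated ranges (the text gives the ranges but not the transfer function); all function names are hypothetical.

```python
def linear_map(volts, v_min, v_max, out_min, out_max):
    """Map an analog voltage linearly onto a physical range (assumed linear)."""
    if not v_min <= volts <= v_max:
        raise ValueError("voltage outside the sensor output range")
    return out_min + (volts - v_min) * (out_max - out_min) / (v_max - v_min)


def irradiance_w_m2(volts):
    # 0-5 V corresponds to 0-1250 W/m^2 (ranges from the text)
    return linear_map(volts, 0.0, 5.0, 0.0, 1250.0)


def temperature_c(volts):
    # 0-5 V corresponds to -20 to 130 degrees C (ranges from the text)
    return linear_map(volts, 0.0, 5.0, -20.0, 130.0)


def output_power_w(voltage_v, current_a):
    # Instantaneous DC power as the product of output voltage and current
    return voltage_v * current_a


print(irradiance_w_m2(4.0))          # 1000.0
print(temperature_c(1.5))            # 25.0
print(output_power_w(500.0, 100.0))  # 50000.0
```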
Finally, in Figure 3 we show the voltage and current sensors, along with the microprocessor unit used to measure operation data of the UniVer Project PV generator. Figure 4 shows a schematic diagram of the data collection architecture.

Machine Learning Approaches to Nowcast Power Generation
In this section, we describe the methodology used for processing, segmenting and modeling the sensor data from the Opera PV System in order to nowcast output power generation from the ambient sensor information in real time.
As stated previously, several models are evaluated in this work. They are mainly grouped into: (i) human-crafted features and multiple temporal windows and (ii) deep learning for automatic feature extraction and learning. In the following sections, we detail: (first) basic segmentation with temporal sliding windows for sensor streams in a data-driven model; (second) modeling for human-crafted features and multiple temporal windows; and (third) deep learning approaches to nowcast output power generation of the Opera PV System.

Data-Driven Model to Nowcast Power Generation
Following a formal definition, a sensor $s$ collects data in real time in the form of pairs $\bar{s}_i = \{s_i, t_i\}$, where $s_i$ represents a given measurement and $t_i$ its time-stamp. Thus, the data stream of the sensor source $s$ is defined by $S_s = \{\bar{s}_0, \ldots, \bar{s}_i\}$ and the value at a given timestamp $t_i$ by $S_s(t_i) = s_i$. In this work, irradiance on PV surface $G_I$, ambient temperature $T_{am}$, PVG output current $I_A$, PVG output voltage $V_A$ and PVG output power generation $P_A$ provide five data streams which describe the behavior and energy production of the PV system.
Next, temporal sliding windows, which are defined by the window size $W_w$ of a time interval [45], segment the samples of a sensor stream $S_s$ and aggregate the values $s_i$ by a given aggregation function $T_t(S_s, W_w, t^*)$, whose value defines a given feature $T_t$ of the sensor stream $S_s$ at the current time $t^*$. In Figure 5, we illustrate the segmentation and aggregation by temporal sliding windows with some visual examples of data streams.
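The segmentation and aggregation step can be sketched as follows. A stream is represented as a list of (value, timestamp) pairs, and the window is assumed to cover the interval $(t^* - W_w, t^*]$; this interval convention, the function names and the sample values are assumptions made for illustration.

```python
from statistics import mean


def window_samples(stream, window_s, t_star):
    """Values s_i of a stream whose timestamps lie in (t* - W, t*]."""
    return [s for (s, t) in stream if t_star - window_s < t <= t_star]


def aggregate(stream, window_s, t_star, func=mean):
    """Aggregation T_t(S_s, W_w, t*): apply func to the windowed values."""
    values = window_samples(stream, window_s, t_star)
    return func(values) if values else None


# Hypothetical irradiance stream: (value in W/m^2, timestamp in seconds)
g_stream = [(100.0, 0), (200.0, 60), (300.0, 120), (400.0, 180)]

print(aggregate(g_stream, 120, 180))            # mean of 300.0 and 400.0
print(aggregate(g_stream, 120, 180, func=max))  # 400.0
```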

Human-Crafted Features and Multiple Temporal Windows for Efficient Nowcasting of Output Power Generation
In this section, we describe human-crafted features based on multiple sliding temporal windows, where an expert defines the aggregation functions that process the sensor streams into a feature vector, which is then used to train a data-driven regressor.
Among the broad spectrum of models, we focus on efficient regressors, which enable both learning and evaluation on micro boards in real time under fog computing environments [27]. To this end, we evaluate a human-crafted feature approach [46], where the aggregation functions and multiple windows of different sizes are defined by experts. In concrete terms, we include the following configuration of models:

• Aggregation functions $T_t$ based on statistical metrics, such as maximum, minimum, average and standard deviation, which have been demonstrated to be relevant features in describing sensor streams [47].

• Segmentation and fusion of temporal information from sensor streams with: (i) single window, (ii) multiple windows [25], and (iii) incremental windows [27] to aggregate data from shorter to longer terms, enriching the features of sensor streams. Window size is also defined by human criteria.

• Regression with efficient regressors, with low learning time and training requirements, such as linear regression, k-nearest neighbors (kNN), support vector machines (SVM) and random forest (RF).
Therefore, starting from a set of input sensors $S = \{S_1, \ldots, S_s, \ldots, S_{|S|}\}$, a set of window sizes $W = \{W_1, \ldots, W_w, \ldots, W_{|W|}\}$ and a set of aggregation functions $T = \{T_1, \ldots, T_t, \ldots, T_{|T|}\}$, we define a total number of features $|S| \times |W| \times |T|$ which describe the sensor streams $S$ for each point of time $t^*$ [47]. Since our model is based on a data-driven supervised approach, the features which describe the sensor streams are associated, for each point of time $t^*$, with the value of a target sensor to nowcast $S^*$ (not included in the input sensors, $S \cap S^* = \emptyset$).
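A minimal sketch of how the $|S| \times |W| \times |T|$ feature vector could be assembled for one time $t^*$ is given below. The stream names, window sizes, sample values and the fallback value for empty windows are hypothetical; the population standard deviation is used here as one possible choice for the standard deviation aggregation.

```python
from statistics import mean, pstdev

# Aggregation functions T: maximum, minimum, average, standard deviation
AGGREGATIONS = {"max": max, "min": min, "mean": mean, "std": pstdev}


def feature_vector(streams, window_sizes_s, t_star):
    """Build the |S| x |W| x |T| features describing the streams at t*.

    streams: dict mapping a sensor name to a list of (value, timestamp) pairs
    """
    features = []
    for name in sorted(streams):                 # every input sensor S_s
        for w in window_sizes_s:                 # every window size W_w
            values = [s for (s, t) in streams[name] if t_star - w < t <= t_star]
            for agg in AGGREGATIONS.values():    # every aggregation T_t
                # Assumption: an empty window yields 0.0
                features.append(agg(values) if values else 0.0)
    return features


# Hypothetical input streams (the target P_A is deliberately not included)
streams = {
    "G_I": [(100.0, 0), (200.0, 60), (300.0, 120)],
    "T_am": [(20.0, 0), (21.0, 60), (22.0, 120)],
}
windows = [60, 120]  # incremental windows, from shorter to longer terms

fv = feature_vector(streams, windows, t_star=120)
print(len(fv))  # 2 sensors x 2 windows x 4 aggregations = 16
```

Each such vector, paired with the target value $S^*(t^*)$, forms one training sample for regressors such as kNN or random forest.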

Deep Learning Modeling to Nowcast Output Power Generation
In this section, we describe DL models to nowcast output power generation in a PV device. Contrary to the previous proposal, DL does not require human-crafted features; data pre-processing is applied only to compute a homogeneous sequence of data from raw sensor sources with different collection rates. Here, a minimal signal segmentation is defined by sliding temporal windows of short-term window size, which is related to a minimal temporal granularity $\Delta$. The raw data are averaged (aggregation function $\mu$) for each short-term temporal window within the segment.
We thus obtain a sequence of data for each sensor source $S_s$, whose sequence size is the same for all sources. These sequences are related to the value of the target sensor to nowcast $S^*$ for each current time $t^*$ under a sliding window approach.
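The pre-processing described above can be sketched as follows: raw samples are averaged in consecutive windows of length $\Delta$ ending at $t^*$, so that streams with different collection rates yield sequences of the same size. The function name, the handling of empty windows (returned as None) and the sample values are assumptions made for illustration.

```python
from statistics import mean


def build_sequence(stream, t_star, delta_s, n_steps):
    """Average raw samples in n_steps consecutive windows of length delta.

    The windows cover (t* - n_steps * delta, t*] from oldest to newest.
    stream: list of (value, timestamp) pairs
    """
    sequence = []
    for k in range(n_steps, 0, -1):
        start = t_star - k * delta_s
        end = start + delta_s
        values = [s for (s, t) in stream if start < t <= end]
        sequence.append(mean(values) if values else None)
    return sequence


# Hypothetical streams with different collection rates (30 s and 60 s)
fast = [(1.0, 30), (3.0, 60), (5.0, 90), (7.0, 120), (9.0, 150), (11.0, 180)]
slow = [(20.0, 60), (21.0, 120), (22.0, 180)]

seq_fast = build_sequence(fast, t_star=180, delta_s=60, n_steps=3)
seq_slow = build_sequence(slow, t_star=180, delta_s=60, n_steps=3)
print(seq_fast)  # [2.0, 6.0, 10.0]
print(seq_slow)  # [20.0, 21.0, 22.0]
```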
Once the input and output data from the DL model are defined, in this work, we propose two architectures of DL neural networks to nowcast the output power generation of the PV device, which have been shown as suitable configurations to sequence time series in sliding window approaches [48].

• 2LSTM. Two layers of LSTM, which have been previously identified as a suitable configuration to nowcast energy load [49].

• 3CNN+2LSTM. Three layers of CNN are first integrated as spatial feature extractors. Next, two layers of LSTM model the temporal dependencies from the CNN output. The combination of CNN-LSTM hybrid networks has been selected because it has provided encouraging results in modeling output power generation [50].
In Table 2, we include the parameters and layers for each proposed model.

Evaluation
In this section, we present the evaluation of our proposal. First we shall present the experimental setup, then the results obtained and, finally, we will discuss our proposal based on the results presented.

Experimental Setup
In this section, we describe the experimental setup and results of a case study developed in the University of Jaén (Spain), where the Opera Project and PV device were deployed. The IoT module which collected the photovoltaic data in real time within the Opera platform was running from the 9th of June to the 23rd of November 2019, generating data collection over 168 days. The location of the IoT module in the campus of the University of Jaén was (latitude: 37.787253, longitude: −3.776258).
In the experimental setup, five sensors, which were installed in the PV device, collected the following measures: irradiance, ambient temperature, module temperature, output current and output voltage, as described in Section 3. The output power generation to be estimated by the machine learning model was obtained using output current and output voltage according to the following equation: $P = V \cdot I$.
Both data and learning models are openly available to the scientific community at this GitHub repository: https://github.com/galmonacid/opera/. Below, we detail the configuration and results in nowcasting output power generation by several machine learning models.

• Human-crafted features and multiple temporal windows. We evaluate the nowcasting performance of the following models with human-crafted features and multiple temporal windows and times with the configurations shown below:
- Linear regression, with intercept = True.

In order to nowcast output power generation from the ambient data collected in the PVS, we compared the predicted values with the ground truth in the tests using 30-fold cross validation. We note that the ambient data from photovoltaic sources were normalized using the min-max method prior to learning.
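The normalization and validation scheme mentioned above can be sketched with the standard library alone. How the folds were actually drawn (contiguous or shuffled) is not stated in the text, so contiguous folds are assumed here; the function names and the sample count are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] with the min-max method."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant signal: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Distribute the remainder so fold sizes differ by at most one sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


print(min_max_normalize([0.0, 625.0, 1250.0]))  # [0.0, 0.5, 1.0]

folds = list(k_fold_indices(n_samples=100, k=30))
print(len(folds))  # 30
```

Each sample appears in exactly one test fold, so the predictions over all folds cover the full timeline of the data.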

Results
In this section, we describe the obtained results from the standard analytical method and the machine learning approaches described in the work.
Output power generation was collected by the IoT module, representing the ground truth for evaluation purposes. The estimated output power generation for each model was based on data from ambient sensors. The predictions were compared with the ground truth over the full timeline of the tests using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and the coefficient of determination (R²). With the 30-fold cross validation configuration, we also computed learning time and evaluation time to assess the resource consumption of the models.
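For reference, the three error metrics can be computed as in the sketch below, using their standard definitions: $MAE = \frac{1}{n}\sum_i |y_i - \hat{y}_i|$, $RMSE = \sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$ and $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. The sample values are hypothetical and do not correspond to any result in the tables.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and nowcast output power values (kW)
measured = [10.0, 20.0, 30.0, 40.0]
nowcast = [12.0, 18.0, 33.0, 39.0]

print(mae(measured, nowcast))             # 2.0
print(round(rmse(measured, nowcast), 4))  # 2.1213
print(round(r2(measured, nowcast), 3))    # 0.964
```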
First, we evaluated the Araujo model, which provides the baseline performance of the standard analytical method. The results from this baseline model are shown in Table 3. Second, we evaluated one of the data-driven approaches analyzed in this work: regressor models which nowcast energy generation by means of human-crafted features computed from sensor streams. The results are shown in Table 4 in terms of RMSE, MAE and R² metrics. Furthermore, in order to evaluate the computational energy consumption, we have included a comparison of learning and evaluation time for the models based on human-crafted features in Table 5.
Third, we evaluated the data-driven approach based on deep learning. To compare the results with Araujo and the models based on human-crafted features, we provide the performance of the DL models in terms of RMSE, MAE and R² metrics in Table 6 and the learning and evaluation time in Table 7.
Finally, as a summary of the results of the different models, in Table 8 we include a comparison between the different approaches: Araujo, the best-performing DL model (3CNN+2LSTM) and the best regressor among human-crafted feature approaches (random forest 90 min).
In order to provide a visual representation of the nowcasting of energy generation, in Figure 6 we show a 2-day sample test comparing measured output power generation with the regressor models.

Discussion
In this work, we describe an IoT module for collecting ambient sensor information and output energy production from the photovoltaic system deployed under the Opera Project. In order to evaluate the standard behavior of the system and to monitor its performance, we have focused on nowcasting output energy generation from the ambient sensor devices. To this end, two different approaches for machine learning models have been proposed: (i) human-crafted features and multiple temporal windows and (ii) deep learning for automatic feature extraction and learning.
Both approaches present encouraging performance in nowcasting output energy generation in the photovoltaic system based on data collected from ambient sensors; however, we highlight the model based on human-crafted features and multiple temporal windows for its lower learning time and best results. Specifically, we note: (i) the use of multiple imbalanced temporal windows increases nowcasting performance, (ii) random forest is the best regressor and (iii) kNN provides an excellent balance between learning time and results. Moreover, the use of kNN should be highly recommended for nowcasting energy generation in photovoltaic systems using fog-based approaches, where mini boards could perform the data learning in a short time using low computational resources and computational energy consumption.
In the case of DL approaches, the use of CNN+LSTM provides improved nowcasting performance when compared with the Araujo analytical model. This is due to the automatic feature extraction generated by the CNN, which summarizes the key patterns to nowcast output power generation, providing a remarkable improvement compared with using LSTM alone. The performance of the DL model with 10-min segmentation increases compared with 5-min segmentation because the shorter segmentation doubles the number of input variables in the sequence of samples, and the higher complexity of the data reduces nowcasting performance. However, the human-crafted features model with imbalanced temporal windows has surpassed the performance of both the DL approaches and the Araujo analytical model, coming out as the leading model according to the results presented in this work.

Conclusions and Ongoing Works
In this work, an IoT module and data-driven models to nowcast output energy generation integrated in the Opera Digital Platform project have been described. The IoT module is based on Arduino and low-cost sensors which collect ambient and energy data sources in a photovoltaic system. The IoT module has collected the data presented in this work over 24 weeks.
Two approaches based on machine learning have been evaluated: (i) human-crafted features with multiple temporal windows, and (ii) deep learning models. CNN+LSTM, kNN and random forest provide better performance compared with the standard analytical model Araujo. In the case of CNN+LSTM, the advantage of DL is the lack of human intervention in feature definition. The performance of kNN is remarkable, with notably low learning time and providing fog integration capabilities in micro boards. Finally, random forest with incremental temporal windows had the highest performance in terms of error metrics.
A potential advance in this line of work would consist of an in-depth analysis of the diagnosis, typology and failure patterns in PV systems to predict these events by means of machine learning models.

Funding: This contribution has been supported by the Cátedra ELAND for Renewable Energies of the University of Jaén and by the Spanish government by means of the project RTI2018-098979-A-I00 and the Action 1 (2019-2020) no. EI_TIC01 of the University of Jaén.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: