Evaluation of Transfer Learning and Fine-Tuning to Nowcast Energy Generation of Photovoltaic Systems in Different Climates

Abstract: Recent machine learning models are able to nowcast power generation, outperforming the formulation-based standards. In this work, the capabilities of deep learning to predict energy generation over three different areas and deployments in the world are discussed. To this end, transfer learning from deep learning models to nowcast output power generation in photovoltaic systems is analyzed. First, data from three photovoltaic systems in different regions of Spain, Italy and India are unified under a common segmentation stage. Next, pretrained and non-pretrained models are evaluated in the same and different regions to analyze the transfer of knowledge between different deployments and areas. The use of pretrained models provides encouraging results, which can be further improved by subsequent fine-tuning on local data, providing more accurate models.


Introduction
Solar photovoltaics (PV) continued to experience remarkable growth in 2020 and the first half of 2021, despite the overall negative impact of the COVID-19 pandemic and several price increases affecting the processes and materials involved in the production of PV modules. Thus, cumulative solar capacity surpassed three quarters of a terawatt (TW) globally in 2020. Furthermore, during this year, solar was again the power generation technology with the largest net installed capacity.
However, the most likely scenario is that we shall enter the solar terawatt age in 2022. Finally, the Global Market Outlook for solar power 2021-2025 anticipates that total solar capacity will reach almost 2 TW by the end of 2025 [1].
Nowadays, PV technology has become a solid choice for electricity generation, not only because of the advantages provided by green electricity, but also because in many cases it produces electricity more cheaply than conventional sources. A record low generation cost of 1.32 US cents per kWh was reached in the second Portuguese auction in 2020 [1].
All these features, among others, are leading to a very significant growth in the global solar market, particularly in an aspect that is especially relevant to this work: distributed power generation. Data engineering and deep-learning are key techniques to find open and low-cost solutions, as powerful tools that are able to manage the smart grids that enable distributed generation [2,3].
Predictive maintenance and efficient operation of PV plants are necessary [4]. To this end, as a first step, accurate estimation of the available energy is required, especially in the coming hours [5]. Thus, having a reliable tool available to nowcast the energy generated by PV installations is the first essential step to achieving smart grids that allow distributed generation.
This work draws on the following data and methods:
• Data collected in Jaen (Spain) using the IoT module to gather ambient sensor information and output power generated from the photovoltaic system deployed under the Opera Project [15].
• Data from the SolarTech Lab at the Polytechnic University of Milan, proposed to compare the prediction in a different context.
• Data from the Kaggle database from the India region, to compare the prediction in a different weather region of the world (Dataset on PV system in India. Retrieved 1 March 2022, from https://www.kaggle.com/anikannal/solar-power-generationdata).
• A homogeneous segmentation of data, proposed to aggregate solar irradiance and temperature of sensor data streams from different contexts.
• Two deep learning models based on CNN and LSTM, proposed to nowcast the generated power.
• Energy estimation transfer capabilities between Spain and Italy, evaluated with different configurations for learning, comparing training from scratch and pretrained models.
In the remainder of this article, we provide further detailed descriptions of the proposed approach. Section 1.1 reviews related works and the state of the art. Section 2 presents the proposed methodology for homogenizing source streams and translating the knowledge from different PV system contexts. Section 3 introduces the evaluation of the methodology analyzed in three real-world datasets. Finally, in Section 4, conclusions, ongoing and future works are discussed.
The completion of this work has resulted in three major contributions to the field: (a) a configuration of a deep learning model which provides good performance when reused across different contexts with no reconfiguration, (b) a methodology to follow in order to reuse deep learning models across different domains, and (c) a baseline of error and time metrics for pretrained models.

Related Works
Over the last twenty years, PV systems have been integrated as efficient sources of energy generation and this technology has reached a significant maturity level in several regions, such as America [17,18], Australia [19], Asia [20] and Europe [21]. The role of Machine Learning has enabled a relevant number of applications in PV installations [22].
Moreover, the great growth in the contribution of energy from renewable sources, in general, to the electricity grid has led to new challenges for distribution networks, such as distributed generation and smart grids. In particular, as already mentioned, PV energy, with a 39% share of all new installations, had the greatest growth in new installed capacity over the past year [1]. Data engineering and artificial intelligence are therefore essential to manage these resources and meet these challenges. Solar energy is contributing greatly to achieving distributed generation [23][24][25], but this will only be possible with new management tools in the electricity grid, in other words, smart grids [26][27][28]. Furthermore, new applications geared towards individual customers are being developed, enhancing trends such as self-consumption and off-grid generation that take account of new advances in electricity storage [29][30][31][32][33][34][35]. In addition, the considerable increase in the number of customers that not only consume but also generate energy (prosumers) highlights the need for powerful but simple tools to enable these individual customers to operate, control and maintain their installations [36][37][38].
New approaches are exploring Internet of Things architectures [15] together with new information technologies such as big data or business intelligence [39]. Prior to data-driven models, the common method to estimate the energy generated by the cells and by the PV system was based on the physical process of the solar cell, which was used to define the analytical equation for its electrical parameters [40]. These models followed different approaches with notable results [16,41]. However, machine learning models developed from supervised data under data-driven approaches have surpassed the performance of previous models in nowcasting output energy generation [42][43][44].
First, deep learning (DL) has become a prolific research field for time series [9]. Long Short-Term Memory (LSTM) [10] is a type of recurrent neural network that includes a memory and is designed to learn from sequence data, such as sequences of observations over time. LSTM is suitable for predictions from sensor data [45], obtaining encouraging results in several fields, such as activity recognition [46] or estimating building energy consumption [47]. In PV forecasting, LSTM has shown encouraging results in the state of the art in terms of performance [48]. Additionally, Convolutional Neural Networks (CNNs) have been combined with LSTM models to extract spatial features in time series, achieving promising results [49], particularly in nowcasting the output energy generation of PV systems [42]. The DL models for nowcasting PV output have provided highly accurate learning that includes recognition of the threshold of solar irradiance which overcomes the non-zero energy generation in different weather circumstances.
Second, in a general sense, transfer learning is a methodology used to translate learning models from previous patterns recognized in a given domain to another domain. In this broad definition, homogeneous transfer learning refers to predictive modelling where the domain and input feature space is the same [50]. On the other hand, heterogeneous transfer learning aims to align the input data between different source and target domains which are represented in different feature spaces [50]. This work falls in the middle of these two approaches: it is homogeneous to the extent that the input feature space of signals to estimate output energy generation is the same; however, the collection rate and representation of data between PV systems differ. To address this, a common segmentation method to process the sensor stream is proposed.
Third, the outcomes of transfer learning are broad. For example, instance-based approaches separate samples from the target and source domains in order to use them in the target domain to improve training performance [51]. Parameter-based approaches are focused on transferring knowledge by means of sharing structures from a latent space from the source domain [52]. This kind of transfer learning is strongly related to DL approaches, since the weights which configure the learned kernels of the network layers hypothetically describe patterns in common with the target domain. So, instead of using weights initialized to zero or at random, the learned network is used as a starting point for learning, proceeding with fine-tuned learning on the target domain [53]. These approaches have provided encouraging results in reducing learning time and increasing performance, above all in computer vision problems [54].
Finally, in time-series forecasting, the use of DL and transfer learning [55] has been applied successfully in different scenarios, such as indoor temperature forecasting [56]. In the context of PV systems, transfer learning capabilities have recently been evaluated [57] with promising results. LSTM has been proposed to reduce prediction errors with pretrained initialization of weights in a case study limited to the same region (China), with deployment of the same plant and using the same data segmentation on different days for evaluation. In other areas of photovoltaics, machine learning has been proposed for: (i) fault detection in PV systems [58], (ii) the grid connection of PV architectures [59], or (iii) the detection of module defects [60], for example, using aerial images [61] or infrared images [62].
Based on the previous works described, in this work we aim to analyze the capabilities of transfer learning to nowcast energy generation so as to reach general prediction models which estimate the performance of PV systems. Our goal is to transfer knowledge from different PV plants, in different areas and with different collection rates of sensors in two dataset domains.

Methodology
In this section, we describe the proposed methodology for evaluating the capabilities of transfer learning for the purpose of nowcasting the energy generation of photovoltaic systems in three different datasets, from the Univer Project at the University of Jaen, from SolarTech Lab at the Polytechnic University of Milan and from the India region at the Kaggle database. Our approach is based on the following key points: (i) evaluating two different datasets from different regions of the world using a segmentation method to homogenize the input from both domains, (ii) proposing CNN and LSTM models to nowcast the energy generation of the PV systems when learning from scratch versus pretrained networks with fine-tuned learning from other domains.

Data Collection of PV Systems from Heterogeneous Sources
The data for this work has been collected from two different sources. The first one is the Univer project of the University of Jaen. This dataset has been generated capturing the signals below with a frequency of 30 s, from June to November 2019. The data collected from Univer are:
• Time recordings expressed in Central European Time (CET).
• Global irradiance in plane of array (W/m²) measured by an IoT device installed on the PV system.
• Ambient temperature (°C) measured by an IoT device installed on the PV system.
• Module temperature (°C) measured by an IoT device installed on the PV system.
• Output Power (W) from the PV module, measured at the inverter of the system.
A sample row of the Univer data file is shown in Figure 1.
The second dataset includes PV power production measured in the SolarTech Lab, Polytechnic University of Milan, Italy. The dataset spans a whole year, from January to December 2017, with a time resolution of 1 min ("NaN" is reported when a value is missing in the original measurement recording). A sample row of the SolarTech data file is shown in Figure 2.
The datasets from both systems present some commonalities that enable us to carry out the experiment by using the same signals in the deep learning model. However, to make sure we can extract significant conclusions from the results, the datasets need to be homogeneous. The following actions have been taken to achieve this: aggregating the data to a 10 min average and defining a segmentation as described in Section 2.2.
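The aggregation step can be illustrated with a minimal sketch. It assumes each stream is a list of (timestamp in seconds, value) pairs and that missing values appear as None or NaN; the function name `aggregate_mean` and the sample streams are hypothetical and are not taken from the original implementation.

```python
from collections import defaultdict


def aggregate_mean(samples, bin_seconds=600):
    """Average (timestamp_seconds, value) samples into fixed-width time bins.

    Returns a list of (bin_start_seconds, mean_value) sorted by time.
    Missing values (None or NaN) are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, value in samples:
        if value is None or value != value:  # value != value is True only for NaN
            continue
        bin_start = int(t // bin_seconds) * bin_seconds
        sums[bin_start] += value
        counts[bin_start] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]


# Hypothetical streams covering the same 20 minutes at two collection rates.
univer_like = [(t, 100.0) for t in range(0, 1200, 30)]     # one sample every 30 s
solartech_like = [(t, 100.0) for t in range(0, 1200, 60)]  # one sample every 60 s

agg_a = aggregate_mean(univer_like)
agg_b = aggregate_mean(solartech_like)
```

After aggregation, both streams have the same number of points per hour regardless of their original collection rate, which is what allows the same model input to be used for both systems.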

Segmentation to Nowcast Power Generation
Following a formal definition, a sensor $s$ collects data in real time in the form of pairs $\bar{s}_i = \{s_i, t_i\}$, where $s_i$ represents a given measurement and $t_i$ its timestamp. Thus, the data stream of the sensor source $s$ is defined by $S_s = \{\bar{s}_0, \ldots, \bar{s}_i\}$, and a given value at a timestamp $t_i$ by $S_s(t_i) = s_i$. In this work, irradiance on the PV surface $G_I$, ambient temperature $T_{am}$ and PV output power generation $P_A$ provide three data streams which describe the behavior and energy production of the PV system.
In order to homogenize the data collected by the different sensors, we define several symmetric temporal sliding windows. They are defined by the window size of a time interval, segment the samples of a given sensor stream $S_s$ and aggregate the values $s_i$ by means of an aggregation function $T_t(S_s, W_w, t^*)$. So, the aggregation of the sensor data $s_i$ over a short time interval $W_w = [W_w^-, W_w^+]$ represents the relevant value at a given point of time $t^*$. Using the aggregation of data, the signal segmentation is defined by several sliding temporal windows of short size, which are defined by the temporal granularity $\Delta$. The data in each temporal window are aggregated by the aggregation function. Specifically, the data within each short-term temporal window of the segment are averaged ($\mu$). The average provides a robust aggregation function to homogenize heterogeneous raw sensor sources when they are provided at different collection rates.
So, we obtain a sequence of data for each sensor source, whose size is the same for all sources $S_s$. We note that the different sensor streams $S_1, \ldots, S_{|S|}$ are: (i) aligned at the same point of time $t^*$, (ii) segmented in homogeneous sliding temporal windows $\{\ldots, [\Delta, 2\Delta]\}$, and (iii) represented by a feature vector of unique size which aggregates previous values for each sensor stream $S_1(t^*)$.
In Figure 3, we describe a visual representation of the segmentation of a sensor stream defined by sliding temporal windows to homogenize different frequencies of data collection from heterogeneous sources. In this work, the segmentation of input and output signals from the PV system is developed for Ambient Temperature and Global Irradiance as the input signals and Output Power as the output. This process of aggregating and segmenting the data is key in the transfer learning approach, since the sample rates are generally different between the domains. In the two contexts of this work, the input data is configured as follows:
• Univer Dataset. The signals are aggregated from the original 30 s sample to a 10 min average and then segmented in a 90 min sliding window.
• SolarTech Dataset. The signals are aggregated from the original 1 min sample to a 10 min average and then segmented in a 90 min sliding window.
At the end of this stage, the heterogeneous data from both domains have a homogeneous representation of input and output signals.
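As an illustration of the segmentation stage, the sketch below builds sliding windows over series that have already been aggregated to a common granularity. It assumes that a 90 min window over 10 min averages corresponds to nine consecutive steps and that the target is the output power at the last step of the window; both the function name `segment` and this pairing of inputs and target are assumptions made for the example, not details confirmed by the text.

```python
def segment(irradiance, temperature, power, window=9):
    """Build (features, target) pairs from aligned, already-aggregated series.

    Each feature block is a list of `window` (irradiance, temperature) steps;
    the target is the output power at the final step of that window.
    """
    if not (len(irradiance) == len(temperature) == len(power)):
        raise ValueError("series must be aligned and of equal length")
    pairs = []
    for end in range(window, len(power) + 1):
        steps = list(zip(irradiance[end - window:end], temperature[end - window:end]))
        pairs.append((steps, power[end - 1]))
    return pairs


# Hypothetical aggregated series: 12 steps of 10 min each (2 hours).
g = [float(i) for i in range(12)]   # irradiance
t = [20.0] * 12                     # ambient temperature
p = [2.0 * x for x in g]            # output power

pairs = segment(g, t, p)
```

With the same nine-step window, a stream sampled every 15 min would span 135 min, so the feature shape stays identical across domains even when the time span differs.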

Transferring Knowledge by Weight Initialization and Fine-Tuned Learning between Domains
As we detailed previously, the aim of this work is to evaluate the capabilities of transfer learning for the purpose of nowcasting the energy generation of PV systems between domains. This leads to the following questions:
• How accurate are the learning models which are trained in their own domain (climate) in estimating the energy generation of PV systems?
• What is the difference in performance when nowcasting energy generation in an unseen domain with a model which has been trained in another plant, climate or deployment?
• Is the structure of DL models (CNN and LSTM) effective for transferring and optimizing the weights in estimating the energy generation of PV systems from one climate to another?
In order to address these questions, after providing the unified representation of data from the previous section, we have evaluated three learning combinations for models, consisting of:
• (A) Bare learning, where learning and evaluation are performed in isolation within each domain. This standard learning provides baseline performance data from the same domain.
• (B) External learning between domains, where the learning occurs in one given domain but the evaluation is carried out in a different one. The performance under this more demanding configuration describes the transferability of DL structures when dealing with different unknown domains.
• (C) Fine-tuned learning in the target domain from a network pretrained in another climate. Under this learning configuration, performance can improve or deteriorate with regard to the baseline (bare learning). If there is improvement, the patterns and kernels computed in DL layers provide general information on energy generation which can be optimized with local data to generate a more robust model.
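The three configurations above can be sketched with a deliberately simplified stand-in. The example below swaps the CNN+LSTM network for a one-variable linear model trained by full-batch gradient descent, and uses synthetic source and target domains; all names, coefficients and epoch counts are hypothetical and chosen only to show how the scenarios differ in where the weights come from and where the evaluation happens.

```python
def mse(data, w, b):
    """Mean squared error of the linear model y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, w=0.0, b=0.0, epochs=5, lr=0.1):
    """Full-batch gradient descent on MSE, starting from the given weights."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Synthetic domains with similar but not identical input-output relations.
xs = [i / 10 for i in range(11)]
source = [(x, 2.0 * x + 0.10) for x in xs]   # stands in for the source plant
target = [(x, 2.2 * x + 0.05) for x in xs]   # stands in for the target plant

# (A) Bare learning: train and evaluate on the target domain, from scratch.
loss_bare_short = mse(target, *train(target, epochs=5))
loss_bare_full = mse(target, *train(target, epochs=2000))

# (B) External learning: train on the source, evaluate on the target.
w_src, b_src = train(source, epochs=2000)
loss_external = mse(target, w_src, b_src)

# (C) Fine-tuned learning: start from the source weights, train briefly on the target.
loss_fine_tuned = mse(target, *train(target, w=w_src, b=b_src, epochs=5))
```

In this toy setting the pretrained weights start close to the target solution, so a few epochs of fine-tuning reach a lower error than the same number of epochs from scratch. Whether the same holds for a deep network and real PV data is exactly what the experiments evaluate.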
In Figure 4, we detail the configuration of data and model training in the bare, external and fine-tuned learning scenarios. In Table 1, we describe the configuration of the learning model. Three layers of CNN are firstly integrated as spatial feature extractors. Next, two layers of LSTM model the temporal dependencies from the CNN. This combination of CNN-LSTM hybrid networks has been selected as it has provided encouraging results in modeling output power generation as described in [15].

Results
In this section, we describe the results from the implementation of transfer learning capabilities of DL models for the purpose of nowcasting and fine-tuning the energy generation of PV systems in different contexts.
The transfer learning capabilities were studied first in the SolarTech Lab context following the domain combinations defined in Section 2.3. The CNN+LSTM model, also described in Section 2.3, was first trained and validated on the SolarTech dataset in a bare learning scenario. Then, it was trained on the Univer dataset and validated on the SolarTech dataset, to evaluate the performance of the external learning scenario. Finally, the model trained on the Univer dataset was optimised by training again on the SolarTech dataset and later validated on SolarTech. The results of these three scenarios were compared using the NMSE (Normalised Mean Square Error) and NMAE (Normalised Mean Absolute Error) error metrics. These results are shown in Table 2.
Afterwards, the same model and scenarios were tested in the Univer context. The bare learning model was implemented with training and validation on the Univer dataset. For the external learning scenario, the model was trained on SolarTech and validated on Univer. Finally, for the fine-tuned scenario, the model previously trained on SolarTech was optimised and validated on the Univer dataset. Table 3 shows the results of the experiment in this context using the same error metrics as in the previous context (NMSE and NMAE).
To give a sense of the performance of the model in the different scenarios when nowcasting the output power generation for each of the systems, the denormalized RMSE (Root Mean Square Error) and MAE (Mean Absolute Error) error metrics are shown in Table 4, together with the maximum value of the Output Power seen in the datasets for each domain. Along with the error metrics described so far in this section, time metrics were also captured for all the combinations to assess the performance of the model under transfer learning from a time complexity perspective.
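The error metrics named above can be computed as in the following sketch. The text does not state how the normalisation is done, so the example assumes that errors are scaled by the maximum output power of the domain (`p_max`); the function name `error_metrics` and the sample values are hypothetical.

```python
import math


def error_metrics(y_true, y_pred, p_max):
    """Return normalised (NMSE, NMAE) and denormalized (RMSE, MAE) errors.

    Assumption: normalisation divides power values by p_max, so NMSE is the
    MSE divided by p_max squared and NMAE is the MAE divided by p_max.
    """
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {
        "NMSE": mse / p_max ** 2,
        "NMAE": mae / p_max,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
    }


# Hypothetical measured and nowcast output power, in watts.
measured = [0.0, 500.0, 1000.0]
nowcast = [10.0, 490.0, 1020.0]
metrics = error_metrics(measured, nowcast, p_max=1000.0)
```

Reporting RMSE and MAE in watts next to the maximum output power makes the normalised figures easier to interpret for a specific installation.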
First, in order to evaluate the improvement in the learning rate of a fine-tuned model, we used the loss function during the training and validation of the model to compare the number of epochs needed to achieve the same performance in different scenarios. The comparison was done for the bare learning and fine-tuned scenarios on SolarTech, setting two thresholds of different orders of magnitude for the loss function, NMSE < 1% and NMSE < 0.1%, and comparing the learning time in number of epochs (number of passes of the model over the data) for the two scenarios. Table 5 presents the results of this experiment. In order to provide a visual representation of the differences in learning rates, Figure 5 shows the improvement of the error metric NMSE over the number of epochs, during training and validation of the model, for both bare learning and fine-tuned learning combinations in the SolarTech context. Also from a time complexity perspective, the decrease in the learning time of the model was measured, comparing the total training time in seconds for the bare learning and fine-tuned learning scenarios in the SolarTech context. Table 6 shows the results of this comparison.
Finally, to broaden the understanding of the transfer learning capability of the deep learning model, and in order to assess to what extent the knowledge can be transferred to more disparate contexts, a third dataset was also evaluated as part of this work, which presented very different conditions to the two homogeneous datasets used for the previous experiments:
• It is a dataset generated by a PV plant in India, a different climate region from the two previous PV systems (Spain and Italy).
• The dataset spans 31 days across May and June, as opposed to the six-month length of the Univer and SolarTech datasets.
• The data collection for the PV input and output signals has a frequency of 15 min, so the segmentation uses a 135 min window.
The results of the experiment in this third context are presented in Table 7, which shows the normalised error metrics, and Table 8, which shows the denormalized RMSE (Root Mean Square Error) and MAE (Mean Absolute Error) error metrics together with the maximum value of the Output Power.

Discussion
In this work we describe the reuse of deep learning models through transfer learning. In order to evaluate the nowcasting capabilities of a neural network model in unseen domains, we have tested the model on a combination of datasets from two different power plants (Univer PV System in the University of Jaen, Spain, and SolarTech Lab PV system in the Polytechnic University of Milan).
It is relevant to highlight that the deep learning model was reused across domains without having to change any configuration or parameters of the model itself specific for each of the domains. This can be considered one of the main advantages of this approach compared to the traditional monitoring models, which typically rely on the configuration of the model with the physical parameters of the PV system.
The results of the experiment show that the CNN+LSTM model performs well in the external learning scenario, with the ability to nowcast the output power generation of the SolarTech Lab system with a small margin of error after being trained on the Univer dataset. This margin of error was only 0.25 percentage points larger than the baseline error calculated with the model trained on the SolarTech system. The best results were obtained when applying the fine-tuned learning. The model was pretrained on the Univer dataset and then optimized on local data, resulting in nowcasting the output power generation of the SolarTech system with the smallest margin of error of all the combinations included in the experiment. This margin of error was 0.03 percentage points smaller than the baseline error.
In the Univer context, we obtained a larger margin of error for the model in the external learning scenario (model trained on the SolarTech dataset and validated on Univer), with a 5.66 percentage point increase in normalised mean square error. In the fine-tuned learning scenario, the margin of error was smaller than the margin calculated in the bare learning scenario, with a normalised mean square error 0.05 percentage points smaller than the baseline error.
We also observed an improvement in the learning rate and learning time in this fine-tuned learning scenario. It required only one epoch with the pretrained model but 7 epochs with the bare model to achieve the same performance of NMSE < 1%. To reach the smaller error threshold of NMSE < 0.1%, the fine-tuned model needed only 7 epochs compared to the 39 in the bare learning scenario. In terms of total learning time, it took only 40.43 s to train the model in the fine-tuned learning scenario compared to 81.82 s in the bare learning scenario.
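The epoch and time comparisons above follow from two simple computations, sketched below. The loss curves are hypothetical placeholders, not the curves from the experiment; only the two training times (81.82 s and 40.43 s) are taken from the text, and the helper name `epochs_to_threshold` is an assumption.

```python
def epochs_to_threshold(loss_per_epoch, threshold):
    """Return the 1-based epoch at which the loss first drops below threshold.

    Returns None if the threshold is never reached.
    """
    for epoch, loss in enumerate(loss_per_epoch, start=1):
        if loss < threshold:
            return epoch
    return None


# Hypothetical per-epoch loss curves for two training runs.
bare_curve = [0.20, 0.08, 0.03, 0.009, 0.004]
fine_tuned_curve = [0.008, 0.003, 0.0009]

bare_epochs = epochs_to_threshold(bare_curve, 0.01)
fine_tuned_epochs = epochs_to_threshold(fine_tuned_curve, 0.01)

# Training times reported in the text, in seconds.
bare_time_s, fine_tuned_time_s = 81.82, 40.43
time_reduction = 1 - fine_tuned_time_s / bare_time_s   # fraction of time saved
speedup = bare_time_s / fine_tuned_time_s              # ratio of training times
```

With the reported times, the fine-tuned scenario takes roughly half the training time of the bare scenario.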

Limitations of the Work
The results described in this work have shown an improvement in the use of transfer learning in nowcasting output energy generation of PV systems. However, some limitations, which could be faced in ongoing works, are noted:

• Transfer learning between extremely different climates could worsen performance. Learning from the Mediterranean (Spain) and optimizing with India has provided an improvement in performance, but more data and deeper evaluation are required to assess the impact of regions.
• Aggregating and segmenting the raw data at 10 min granularity is key to homogenizing the data. Datasets with a higher collection rate could not be evaluated.
• The sensing of ambient conditions in the PV plants of the three datasets has been carried out with high-quality architectures and devices, so the results and patterns seem stable between them. Differences in precision in data collected by low-resolution devices may be hard to compensate for through transfer learning.

Conclusions and Future Works
In this work we have evaluated the transfer learning of a deep learning model, built with convolutional and long short-term memory layers, for the purpose of nowcasting photovoltaic power generation in two different domains: the Univer PV system in the University of Jaen and the SolarTech Lab in the Polytechnic University of Milan. The results highlight the strength of this approach when working with heterogeneous sources. Using the bare learning scenario from each of the domains as the baseline, a larger error margin is measured in the external learning scenario, i.e., training the model on one domain and validating the model on the other one. However, the model showed better performance in the fine-tuned learning scenario than the baseline, obtaining a smaller error margin when a pretrained model from a different context is used to run the training and validation in the local context. Finally, an improvement is also observed when measuring the learning time of the fine-tuned learning model compared to the bare learning model.
In ongoing works, we will create a new architecture to include transfer learning in real-time systems on demand [63]. To this end, we will include a clustering method to estimate seasonality for long time periods and select clusters of patterns, where the nowcasting models are uploaded based on the weather context for each day.
Modern information and communication technologies, such as deep learning, are playing a key role in policy design, decision-making, implementation and final productive services, i.e., policy aimed at boosting self-consumption, smart cities, cybersecurity, etc. This may be another task for future work.
Acknowledgments: This contribution has been supported by the Cátedra ELAND for Renewable Energies of the University of Jaén, by the Spanish government by means of the project RTI2018-098979-A-I00. This work has been partially funded by "La Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital", under the project "Development of an architecture based on machine learning and data mining techniques for the prediction of indicators in the diagnosis and intervention of autism spectrum disorder. AICO/2020/117".

Conflicts of Interest:
The authors declare no conflict of interest.
