Estimation of Soiling Losses from an Experimental Photovoltaic Plant Using Artificial Intelligence Techniques

Generating energy from fossil fuels has multiple disadvantages, and renewable energies are presented as an alternative. Among them is photovoltaic solar energy, which requires installations capable of producing energy optimally. The characteristics of these installations depend on their location and on the local meteorological variables, one of the relevant factors being soiling. Soiling generates energy losses and diminishes the plant's performance. These losses are difficult to estimate, and the amount of deposited soiling is difficult to measure without very expensive devices such as high-performance particle counters. In this work, these losses have been estimated with artificial intelligence techniques, using meteorological variables commonly measured in a plant of these characteristics. The study consists of two tests, depending on whether or not the short-circuit current (Isc) is included as an input, obtaining a maximum normalized root mean square error (nRMSE) lower than 7%, a correlation coefficient (R) higher than 0.9, and a practically zero normalized mean bias error (nMBE).


Introduction
Of all the renewable energy sources, solar photovoltaic (PV) energy is considered one of the best options for generating clean energy [1,2]. This type of technology is free of the polluting emissions that cause the greenhouse effect [3]. Considering worldwide energy demand and the possibility of supplying it with solar photovoltaic technology, one can say that it is feasible, since about four million exajoules (1 EJ = 10^18 J) of solar energy are received each year, of which 5 × 10^4 EJ can be used (approximately 1.25%). Furthermore, much of the landmass is capable of harnessing it. Consequently, with the appropriate facilities and form of production, this objective could be achieved, even if, at the moment, its contribution to worldwide supply is very low [4].
In Spain, the southern zone is one of the areas receiving the most solar radiation. Nonetheless, the country's energy policy has not encouraged the development and application of solar technology. The economic and social situation experienced in previous years has meant that the security needed to promote this type of technology has been absent [5]. However, this is changing, and not only in Spain: solar energy generated by photovoltaic plants is being promoted more and more as an energy source, with the number of plants increasing globally [6]. These installations greatly improve issues related to the economy and the environment [7].
Large-scale photovoltaic plants are often sited in desert locations to maximize potential energy production [8]. Once the location has been determined, many factors influence the plant's production: the type of technology used, the orientation of the panels, and the tilt angles are some of the variables that need to be considered during the pre-installation work. Numerous studies show the appropriate configurations depending on the location and/or weather conditions. In this regard, it is worth mentioning the direct relationship between atmospheric variables and plant optimization: irradiance, ambient temperature, relative humidity, and pressure, among others, must be taken into account when determining the best locations within the same region of interest. Studies on this subject have shown that such variables are related to the place and circumstances where the plant is located [9,10], as well as to the possible presence of soiling, which is common in deserts and in other areas that lie in the path of intense dust-carrying air currents [11,12]. In general, dust is the most studied pollutant in analyses of the performance of solar photovoltaic systems, and dust soiling has been studied ever since photovoltaic solar systems began to be built. Specifically, the photovoltaic panels become covered with a layer of deposited dirt [13], which reduces the performance and, consequently, the electrical production of each panel [14,15]. Soiling therefore becomes a fundamental issue for optimization [16] and, thus, for photovoltaic plant production. The accumulated dirt makes it more difficult for radiation to reach the panels, thus reducing performance [17,18]. These phenomena occur most in desert climates, and the particles can travel far from their place of origin [19].
As for the cycles of these currents, in the east of the European continent they occur at the end of winter, and in the west from June to September [20,21]. However, other phenomena can also affect the production of a solar photovoltaic system. The location of the system is fundamental, since it may be in the vicinity of an industry whose atmospheric emissions can deposit on the panels, as is the case with mining in Chile [22], where the contaminants emitted into the atmosphere are usually deposited on the surface of the photovoltaic panels of adjacent solar plants. Pollen is another material transported through the air and, in rural areas, it can settle on nearby fields of photovoltaic panels. In addition, animal droppings are another cause of fouling and another reason for scheduled maintenance.
The studies looking at this topic have presented multiple conclusions, each related to the conditions at the place and time that the study was carried out [16]. They have examined the importance of the soiling particle size as well as the optimal inclination angle of the photovoltaic panels. However, the most important issue for this work is the relationship between the short-circuit current and soiling accumulation [23,24], a very important indicator to consider if the losses caused by fouling are to be predicted [23]. By comparing two photovoltaic panels, with and without maintenance, it can be seen that the short-circuit current is greater when maintenance is carried out on the panel. The more radiation reaches the panels, the greater the difference between the two currents [23]. It has been shown that soiling has a very significant effect on the short-circuit current, decreasing its value considerably. This effect can be seen in the I-V curves by comparing the curve of a maintained panel with that of an unmaintained one. The current is the most affected quantity because it has a linear relationship with the radiation, so it decreases when less radiation is received, while the voltage is hardly affected. Soiling therefore significantly affects the performance of the panels, increasing the cost of the plant and preventing the expected profits from being achieved. Photovoltaic plants are exposed to episodes of soiling, contamination, or meteorological phenomena that reduce performance. Knowing how the accumulation of soiling affects the installation is fundamental to establishing correct maintenance [25,26]. For correct cleaning, both the moment at which it is done and the way the task is carried out are fundamental. If manual cleaning is chosen, it should be borne in mind that such cleaning can damage the panels and can leave small particles behind.
The use of detergents presents multiple disadvantages for the environment and for the panel itself, in view of the danger of corrosion. All of this leads toward a more automated process that avoids situations of this type [17]. It is therefore necessary to have information on the state of soiling of a plant in order to determine the ideal time to clean the photovoltaic panels, thus saving natural and economic resources. For a maintenance plan to be effective, it is essential to know how soiling affects the installation and the right time to perform the maintenance [27].
Given the importance of soiling on performance, this study has estimated the losses due to dust soiling in a photovoltaic plant, using meteorological variables as well as variables specific to the installation: the short-circuit current (Isc), the module temperature (Tpanel), global irradiation (Iglo), relative humidity (RH), ambient temperature (Tamb), atmospheric pressure (P), and solar altitude (α). These were used as inputs to artificial neural networks. The soiling losses were modeled for an experimental pilot plant installed at the Solar Energy Research Center (CIESOL), mainly by comparing measurements from two panels, one receiving maintenance and the other not. Artificial intelligence techniques, specifically artificial neural networks, were then employed to estimate soiling losses from parameters commonly measured in any photovoltaic plant.

Photovoltaic System
This study was carried out in an experimental photovoltaic plant located in the Solar Energy Research Center (CIESOL) at the University of Almería (Spain), where numerous studies are being conducted with a view to integrating solar energy into different areas. The site is located in southeast Spain, with a Mediterranean climate (36.8° N, 2.4° W, at sea level). The photovoltaic system began operating on 24 July 2019. Figure 1 shows a picture of the photovoltaic plant. The image shows the photovoltaic panels along with the other components and sensors with which the installation is equipped. The plant has four south-oriented photovoltaic panels (ATERSA model A222-P), two on the north side of the roof and two on the south side. For this work, only the two south-side panels were used. Only two panels were compared because the dust response was similar in nearby locations. One limitation is that the panels were analyzed at a single inclination (22°), chosen to coincide with the inclination of a 10 kWp photovoltaic plant located on a higher level. Sensitivity analyses of the variations between panels showed that the Isc is one of the variables most affected by dust; therefore, two panels were enough to quantify how the Isc value varied over time when one panel had frequent maintenance and the other had none at all.
As mentioned, one of these panels was periodically cleaned while the other was not. The maintained panel was cleaned daily with water and a soft cloth and then dried with paper. This allowed the short-circuit current measurements obtained from each panel to be compared.
Two Pt100 temperature sensors were positioned on each panel to provide the panel temperature. One was located on the top rear part of the photovoltaic module, whilst the other was located on the bottom rear part. When working with these data, the average of both sensors was calculated to give a representative value. This was done exclusively on the unmaintained panel, since the temperature sensor located in the southern area of the maintained panel provided erroneous data (a sensor defect) and, therefore, was not used. On the back of the panels, there were also shunts (Shunt 15 A/150 mV KL.0.5 KAYNOS), from which the short-circuit current was obtained as a voltage measurement (in mV). Located to the right of the panels on the metal support structure were the calibrated cells that provide global irradiance data on the array plane (at the same inclination as the panels). One of the cells was kept in an optimally clean condition, while the other was not maintained at all. Finally, the installation had a sensor that measured both the ambient temperature and the relative humidity, as well as a barometer that measured the atmospheric pressure. Table 1 shows the information concerning the sensors and their measuring ranges, thus presenting a metrological analysis of the sensors.

Data Processing
Thanks to the different sensors in the plant, its behavior was monitored, with values recorded every minute. The sensors were connected to a data acquisition system (a datalogger), which in turn was connected to a computer server that stored the records. The data used were gathered from 24 July 2019 to 14 February 2020, a period in which the southern panels had a 22-degree inclination. The main objective was to create a solid database providing enough valid measurements to model the target variable accurately. To do this, the night-time data were first eliminated using the solar altitude angle (the angle between the horizontal plane and the line from the observation point to the sun), whose value is zero at sunrise and sunset. In meteorological studies, this parameter is of great importance and so it is stored along with the other variables. After eliminating these night-time values, any erroneous values caused by power failures, data acquisition errors, sensor damage, etc. were filtered out.
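The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the `Record` fields, the zero-degree altitude threshold, and the rule that a reading is erroneous when it is missing, NaN, or non-positive are all assumptions made for the example, since the text does not state the exact validity criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """One per-minute record (hypothetical field names)."""
    timestamp: str                # e.g. an ISO-8601 string
    solar_altitude_deg: float     # angle of the sun above the horizon
    isc_clean: Optional[float]    # Isc of the maintained panel (A)
    isc_soiled: Optional[float]   # Isc of the unmaintained panel (A)


def is_valid(rec: Record, min_altitude_deg: float = 0.0) -> bool:
    """Keep daytime records whose current readings look usable."""
    # Night-time: the sun is at or below the horizon.
    if rec.solar_altitude_deg <= min_altitude_deg:
        return False
    for value in (rec.isc_clean, rec.isc_soiled):
        # Assumed error rule: missing, NaN (value != value), or non-positive.
        if value is None or value != value or value <= 0.0:
            return False
    return True


def filter_records(records: List[Record]) -> List[Record]:
    """Return only the records that pass the daytime and validity checks."""
    return [rec for rec in records if is_valid(rec)]
```

In practice the validity rule would be extended to every sensor channel used as a model input, with thresholds taken from the sensor ranges.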

Measurement Calibration for Standardization
After filtering, the next phase consisted of normalizing the irradiance measurements on the array plane and the short-circuit current. This was necessary because the panels, despite being equal, did not give exactly the same measurements. If both measurements had been identical, this step would not have been necessary. To correct this mismatch, both the calibrated cell and the short-circuit current from the unmaintained panel were multiplied by correction factors. These factors were calculated from reference measurements taken at midday on the first optimal day of plant operation under standard conditions. With these data, the loss originating from soiling accumulation on the unmaintained photovoltaic panel was known minute by minute: since the two panels were now calibrated, the remaining difference between their values would be due only to soiling. In this work, the influence of soiling on the short-circuit current was very significant. To quantify the instantaneous losses as a percentage, the mathematical expression given as Equation (1) was used [28].
Once the standardization had been carried out, both panels should always have given the same measurements unless one of them was affected by an issue that did not occur in the other, such as whether or not it was maintained. If standardization was guaranteed, the only possible difference was that caused by soiling; therefore, this was the value used for modeling. The soiling factor that could be extrapolated to a real plant would be that of the panel without maintenance, as panels can go years without receiving such maintenance.
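As an illustration of the calibration and loss calculation, the sketch below assumes a commonly used ratio form, loss (%) = 100 × (1 − k × Isc_soiled / Isc_clean), where k is the correction factor obtained on the reference day. This form is an assumption made for the example and is not confirmed to be Equation (1); the function names and the sample values are hypothetical.

```python
def correction_factor(ref_clean: float, ref_soiled: float) -> float:
    """Factor that maps the unmaintained panel's reference reading
    onto the maintained panel's reference reading (assumed ratio form)."""
    if ref_soiled <= 0.0:
        raise ValueError("reference reading must be positive")
    return ref_clean / ref_soiled


def soiling_loss_percent(isc_clean: float, isc_soiled: float, factor: float) -> float:
    """Instantaneous soiling loss in percent (assumed form, see lead-in)."""
    if isc_clean <= 0.0:
        raise ValueError("clean-panel current must be positive")
    return 100.0 * (1.0 - factor * isc_soiled / isc_clean)


# Hypothetical reference-day readings (both panels clean), in amperes.
k = correction_factor(ref_clean=8.00, ref_soiled=7.92)

# Hypothetical later readings, after dust has accumulated on one panel.
loss = soiling_loss_percent(isc_clean=6.0, isc_soiled=5.7, factor=k)
```

By construction, the loss is zero on the reference day, so any later nonzero value is attributed to soiling, which mirrors the reasoning in the text.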

Correlation between the Variables and Soiling Losses
In this study, it was fundamental to understand the relationships between the variables in order to study their dependencies. For this purpose, the Pearson correlation coefficient was used, which is common in studies modeling atmospheric and/or meteorological variables. As a simple, summary way of anticipating the result of a model, this coefficient indicates the parameters that most influence the output variable [29]. The coefficient ranges from −1 to 1, with the correlation being positive when it is greater than zero and negative when it is less. Values of −1 and 1 indicate a perfect linear correlation, while a value of zero indicates the absence of a linear relationship (a nonlinear relationship may still exist).
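For reference, the Pearson coefficient can be computed directly from its definition. The sketch below is a plain-Python illustration with a hypothetical function name; the toy data at the end show a case where the coefficient is zero although the two variables are clearly related, which is why a value near zero only rules out a linear relationship.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("samples must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Toy data: y depends on x (y = x squared), yet the linear correlation is zero.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [v * v for v in xs]
r_parabola = pearson_r(xs, ys)
```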
In this section, the correlation between the meteorological and installation-specific variables and the losses caused by soiling was studied. Because the measurements were standardized and calibrated, any differences between the panels (with or without maintenance) were the result of a soiling layer (or film) on the unmaintained panel; this is the loss that occurs due to soiling on the photovoltaic panel. Figure 2 shows the scatter plots of each variable against the instantaneous soiling losses, while Table 2 shows the correlations between the instantaneous soiling losses and the different variables.
No single variable stood out as having a strong correlation with soiling losses, which reflects how complex they are to model. The variables with the highest correlation were ambient temperature, pressure, and panel temperature, but they indicated rather moderate relationships. The rest of the values were very close to zero, so there would be no linear correlation with soiling losses in any of these cases. Relative humidity and pressure had negative values, unlike the others, which were positively correlated. The graphs support these values, as only the ambient temperature shows some correlation with the losses. Therefore, the relationship between the input variables and the output variable of a future model cannot be assumed to be linear.

Model Development
Among the common tools for estimating a variable, artificial neural networks (ANN) occupy an important place. They are part of machine learning and have a remarkable capacity for learning patterns, making it possible to estimate an output variable autonomously. As for the network structure, first there is the input layer, where the records for each of the input variables are introduced. Then there is the hidden layer, where the interconnections between the variables are made, allowing the variable to be estimated in the output layer. The creation of the networks comprises two stages: training and validation. In the training stage, the input variables and the real output variable are introduced so that the network learns to estimate these values from the available inputs. To check whether it is effective and capable of predicting the correct values, different input data are introduced in the validation phase, for which the network has to calculate the output values [30].
In this study, the soiling losses of a photovoltaic panel were modeled using two alternatives: the first did not include the short-circuit current from the unmaintained panel as a neural network input (model 1), whereas the second did (model 2), the aim being to determine the best solution and thus provide a more accurate model.
With this information, the final objective was to construct an optimal artificial neural network with the fewest input variables and the minimum execution time. The 'Neural Net Fitting' toolbox from MATLAB [31] and the Levenberg-Marquardt algorithm were used to create the ANN functions. To model the recorded data correctly, 80% of the records were assigned to the training phase and 20% to the validation phase; that is, of the 101,633 records in total, 81,306 were assigned to training and 20,327 to validation. MATLAB is somewhat inflexible in how it handles the training and validation patterns: of the 80% of the data used for training, it requires at least 5% for testing and 5% for validation (both internal to the training). That minimum configuration was used here, so 90% of the training data was used for training proper and the rest for the internal test and validation. It should be noted that the remaining 20% of the data was always kept independent, so that the models were validated on data unrelated to the training data set.
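The record counts above can be checked with a short sketch. The seeded random shuffle is an assumption made for the example, since the text does not say how records were assigned to each set, and both function names are hypothetical. The inner 90/5/5 counts are derived here from the stated percentages and are not figures reported in the study.

```python
import random
from typing import List, Tuple


def split_indices(n_total: int, holdout_fraction: float = 0.20,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle record indices and reserve a fraction as an independent set."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # assumed: random assignment
    n_holdout = round(n_total * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]


def inner_split_sizes(n_pool: int, val_fraction: float = 0.05,
                      test_fraction: float = 0.05) -> Tuple[int, int, int]:
    """Sizes of the training, internal validation, and internal test subsets."""
    n_val = int(n_pool * val_fraction)
    n_test = int(n_pool * test_fraction)
    return n_pool - n_val - n_test, n_val, n_test


training_pool, independent_validation = split_indices(101_633)
n_train, n_val, n_test = inner_split_sizes(len(training_pool))
```

With 101,633 records this yields 81,306 indices in the training pool and 20,327 in the independent validation set, matching the counts given in the text.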
The hidden layer contained a series of neurons, each with one output and several inputs. The outputs of all the neurons in one layer are the inputs of each of the neurons in the next layer, so each neuron calculates its output on the basis of all these inputs. The function that relates the output of a neuron to its inputs is defined by the weight that the neuron assigns to each input. These weights are determined during training.
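The computation performed by such a network can be sketched as a forward pass through one hidden layer. The tanh activation in the hidden layer and the linear output neuron are assumptions made for the example, as the text does not state which activation functions were used; the weights shown are arbitrary placeholders, not trained values.

```python
import math
from typing import List, Sequence


def forward(inputs: Sequence[float],
            hidden_weights: Sequence[Sequence[float]],
            hidden_biases: Sequence[float],
            output_weights: Sequence[float],
            output_bias: float) -> float:
    """Estimate one output value from one input record."""
    hidden_outputs: List[float] = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        # Each hidden neuron sees every input, weighted individually.
        activation = sum(w * x for w, x in zip(weights, inputs)) + bias
        hidden_outputs.append(math.tanh(activation))  # assumed activation
    # Linear output neuron: weighted sum of the hidden-layer outputs.
    return sum(w * h for w, h in zip(output_weights, hidden_outputs)) + output_bias


# Arbitrary placeholder weights for a network with 2 inputs and 2 hidden neurons.
estimate = forward(
    inputs=[0.5, -1.0],
    hidden_weights=[[1.0, 0.0], [0.0, 1.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[2.0, 1.0],
    output_bias=1.0,
)
```

Training consists of adjusting the weights and biases so that the estimates match the measured losses; that step is not shown here.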
To achieve this, the modeling process began by creating a network with the maximum number of inputs, using all the available variables. Subsequently, each variable was removed, one by one, to observe the errors made by the network in the training, test, and validation phases, eliminating only the variable whose removal gave the lowest error. This operation was repeated until an optimal set was found to model the variable in question. The statistical indicators help to make these decisions. Once the best network was selected, it had to be adjusted according to the number of hidden neurons; this serves as an optimization task for the resulting network. With more neurons in the hidden layer, the interconnections between the variables increase, although this does not always make the network better, as the training time is usually quite long. Therefore, it is possible that better results are obtained with a network with fewer interconnections. The selection criteria were an MSE as close to zero as possible and an R as close to one as possible, with as few iterations as possible.
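The variable-selection procedure described above resembles a backward elimination loop, which can be sketched as follows. The `evaluate` callback stands in for training a network on a given set of inputs and returning its error; it is hypothetical, as is the stopping rule (stop when every further removal makes the error worse). The toy error function only exercises the loop and does not reproduce the study's results.

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(variables: Sequence[str],
                         evaluate: Callable[[Tuple[str, ...]], float],
                         min_inputs: int = 1) -> Tuple[List[str], float]:
    """Repeatedly drop the variable whose removal gives the lowest error."""
    current = list(variables)
    current_error = evaluate(tuple(current))
    while len(current) > min_inputs:
        # Try removing each remaining variable in turn.
        candidates = []
        for name in current:
            subset = tuple(v for v in current if v != name)
            candidates.append((evaluate(subset), subset))
        best_error, best_subset = min(candidates, key=lambda c: c[0])
        if best_error > current_error:
            break  # assumed stopping rule: every removal makes things worse
        current, current_error = list(best_subset), best_error
    return current, current_error


def toy_error(subset: Tuple[str, ...]) -> float:
    """Made-up error: missing informative inputs cost a lot, noisy ones a little."""
    informative = {"Tpanel": 0.5, "Iglo": 0.4, "Tamb": 0.3, "alpha": 0.2}
    noisy = {"P", "RH"}
    missing = sum(cost for name, cost in informative.items() if name not in subset)
    return missing + 0.01 * sum(1 for name in subset if name in noisy)


selected, error = backward_elimination(
    ["Tpanel", "Iglo", "RH", "Tamb", "P", "alpha"], toy_error)
```

Because each call to `evaluate` implies training a network, the number of trainings grows with the number of candidate inputs, which is one reason to keep the initial variable list short.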

ANN Model 1
Firstly, the soiling losses were modeled in the absence of the short-circuit current of the unmaintained panel. Figure 3 shows the structure of the initial network. The first table presents the results of the 16 tests carried out, divided into three blocks according to the number of variables eliminated as possible inputs. The first block includes networks 1 to 7, the second networks 8 to 12, and the last networks 13 to 16. The idea was to identify the dispensable variables and eliminate them. In the first block, the network that offered the best result was network 3, and therefore the pressure was eliminated as an input variable. This choice was made because its mean square errors were close to 3% and its regression coefficients were very close to or higher than 0.72. In the second block, network 10 was selected as the most suitable since it had the highest regression values in the three training phases. Therefore, in the third block, pressure and relative humidity were excluded as inputs.
In the third block, networks 13 and 16 showed very similar results, with slightly better results when the panel temperature was eliminated as an input. Nevertheless, modeling without this temperature or without the solar altitude was rejected, because it decreased R and increased the RMS error as well as the number of iterations, so network 10 was chosen over network 16. In conclusion, the network used to model the losses without the short-circuit current included as inputs the panel temperature, the global irradiance, the ambient temperature, and the solar altitude.
Once the input variables had been chosen, the number of neurons in the hidden layer was varied. Tests were made with 10 to 50 neurons, as shown in Table 4 (each row represents a new network with 10 more neurons), and the last network was the one selected. Although it took longer to train, the R values in the training process rose from the order of 0.71 to above 0.77, so the network fitted the data more closely. Figure 4 shows the final network with the selected input variables.

ANN Model 2
Secondly, soiling losses were modeled by adding the short-circuit current of the unmaintained panel as an input. Figure 5 shows the structure of the initial network. In this scenario, the procedure was the same as in the previous one, with one more possible input. To find the variables, 19 tests were made, shown in Table 4 and divided into three blocks. The first block comprised networks 1 to 8, the second networks 9 to 14, and the third networks 15 to 19. In the first block, the pressure was again dispensed with, since better results were obtained in its absence. In the second block, in addition to the pressure, the ambient temperature was eliminated, another of the stable variables, as the resulting network presented values higher than 0.9 in several of the training phases. Network 16 was the most suitable in the third block, presenting the smallest mean square errors and the regression coefficients closest to 1 while eliminating solar altitude as an input variable. Comparing the best network in each block, the most suitable was network 11: its results were better than those of network 16, and although there was no noticeable difference with respect to network 4, it required fewer iterations.
Once the input variables had been selected, networks with 10 to 40 neurons were tested, finally opting for 20 (Table 6). Working with 30 neurons resulted in a greater error, and the network with 40 neurons took longer without a notable improvement. The selected network provided MSE values close to 1% and regression coefficients higher than 0.9. Figure 6 shows the final network structure. Finally, the variables chosen for Model 2 were the panel temperature, irradiance, relative humidity, solar altitude, and the short-circuit current.

Results
This section presents the final results for the models from the validation phase, for which 20% of the total data was used. This percentage was chosen because the vast majority of the data must be devoted to training, while a significant part must still be reserved for validation. For validation purposes, the real values were those obtained from the sensors and measurements, whereas the estimates were those produced by the neural network models. To determine the success of the models, it was necessary to establish statistical indicators that demonstrate their validity. For this, the MBE, nMBE, RMSE, and nRMSE were used, in addition to the "r" correlation coefficient.
Before studying the validation phase, Figure 7 shows the calculated losses (not estimated) from July to February. This graph shows the actual loss data as a whole; that is, all the data from both the training phase and the validation phase are plotted. In this way, it was possible to study the trend of soiling accumulation and the periods in which the panels suffered the greatest losses. The maximum values were in the range of 12% to 15%, and the average was 2.3448%.
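The statistical indicators named in this section can be sketched in a few lines. The block below is an illustration under stated assumptions, not the authors' implementation: the function name `validation_metrics` is hypothetical, and the normalizing constant for nMBE and nRMSE defaults to the range of the real values, which is an assumption because this section does not state which normalization was used.

```python
import math


def validation_metrics(real, estimated, norm=None):
    """Return MBE, nMBE, RMSE, nRMSE and Pearson r for paired samples.

    norm: constant used to normalize MBE and RMSE into percentages.
          Defaults to max(real) - min(real), an ASSUMED convention.
    """
    if len(real) != len(estimated) or len(real) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(real)
    errors = [e - r for r, e in zip(real, estimated)]

    mbe = sum(errors) / n                              # mean bias error
    rmse = math.sqrt(sum(d * d for d in errors) / n)   # root mean square error

    if norm is None:
        norm = max(real) - min(real)
    if norm == 0:
        raise ValueError("normalizing constant must be non-zero")

    # Pearson correlation coefficient between real and estimated values.
    mean_r = sum(real) / n
    mean_e = sum(estimated) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(real, estimated))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_r == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant series")
    r_coef = cov / math.sqrt(var_r * var_e)

    return {
        "MBE": mbe,
        "nMBE": 100.0 * mbe / norm,
        "RMSE": rmse,
        "nRMSE": 100.0 * rmse / norm,
        "r": r_coef,
    }
```

An MBE close to zero indicates no systematic over- or underestimation, while the RMSE captures the typical size of individual errors regardless of sign. Passing a different `norm` (for example, the mean of the real values) changes only the normalized indicators.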

Model 1 Results
This section presents the results obtained after modeling the soiling losses with the ANN, without including the short-circuit current of the unmaintained panel as an input variable. Figure 8 compares the real and estimated losses; the data are in chronological order, so it is also possible to see how soiling accumulated from July to February.
Figure 8 shows how the soiling accumulated with upward trends as well as sudden drops, the latter probably due to rain on both panels. To be clear, if both panels were totally clean, the losses caused by soiling would be 0. Despite the ups and downs and the large amplitude, a very pronounced upward trend can be seen in the middle of the graph (possibly caused by episodes of Saharan soiling), after which the values suddenly fell back down, possibly due to rainfall. After that, a stable trend was maintained, consistent with a soiling-free atmosphere. In this way, it was possible to identify the periods during which more soiling accumulated, as well as the period in which it was the highest. Consequently, the modeling proved satisfactory, even in the absence of the short-circuit current. Figure 9 shows a scatter plot where the real losses are represented against the estimated losses.
Although the correlation is not perfect, the two represented variables followed a positive linear trend. The point cloud largely follows this trend, even though numerous points move away from it, indicating a slight overestimation in those cases. The cloud is denser at the beginning, where most of the values are concentrated. This means that most losses were found in the range from 0% to 6%, approximately, with the point cloud thinning out at higher values. Table 7 presents the parameters obtained from the network validation. The MBE and nMBE values, being close to zero, indicate that the model did not systematically overestimate the losses. Likewise, the RMSE showed a 1.60% error in losses due to soiling, while the normalized error (nRMSE) was below 12% for all the cases analyzed. This normalized parameter was still within fairly acceptable limits, since an estimate is considered good when the nRMSE lies between 10% and 20%. The result of the "r" correlation coefficient was very satisfactory, indicating a strong relationship between the real and estimated losses.

Model 2 Results
In the search for an optimal result when modeling these losses, the short-circuit current was incorporated into the neural network as a possible input variable. Figure 10 compares the actual soiling losses of a photovoltaic panel with those estimated by the ANN. The graph shows how the network estimated the losses more accurately when using the short-circuit current as an extra input. The fit was now finer, and the soiling accumulation trends were followed more closely. The soiling accumulation over time was still visible, with continuous rises and falls, as well as two large peaks in the middle of the graph. Therefore, during this period, the soiling accumulated to a greater extent than in previous scenarios, probably as a result of suspended soiling that had finally precipitated onto the panels. Figure 11 shows the scatter plot comparing the real losses with the losses estimated by the ANN.
This was a positive correlation because it was upward and linear, with most points following the same trend. The point cloud is denser in the first part and disperses at the higher values, showing that the most common loss values lie between 0% and 6%. The important achievement of the model is that the point cloud is very homogeneous, with a clear linear trend (although some specific cases depart from the fit line). Table 8 presents the parameters obtained in the network validation. The table shows that the model neither underestimated nor overestimated the losses. One can also see a very significant improvement in the model error: the nRMSE, with a value of less than 7%, is about 5 percentage points lower than that of the previous model. Therefore, one can conclude that the best model was the one using the short-circuit current of the unmaintained panel to estimate the losses caused by soiling. The "r" correlation coefficient shows a high positive dependence between the two variables, which was excellent.
Comparing with other models in the literature, Laarabi et al. (2019) [32] used Iglo, wind speed and direction, Tamb, RH, and rainfall to create a neural network and study the effect of soiling in Morocco. The resulting network had a total of 35 hidden layers, with an RMSE close to 0.5%, a Mean Absolute Percentage Error (MAPE) greater than 9%, and a highest r value of 0.96. In this sense, the model presented in this work needs two fewer variables (equivalent to a wind sensor) and fewer hidden neurons, making its execution more efficient while producing quite similar results.

Discussion
The need to optimize photovoltaic plants makes soiling accumulation an important variable. It has been shown to have a significant effect on performance and to cause numerous losses; for this reason, it should not go undetected. Knowing these losses provides the opportunity to propose efficient maintenance without exhausting water or unnecessary resources since the final objective is to obtain the cleanest energy possible.
Performing maintenance on one panel but not another is a good way of comparing and studying any possible differences. This is what was done using an experimental photovoltaic plant located at CIESOL. Several sensors and various meteorological variables were measured in order to study their effect on soiling. In this work, only two panels were used to characterize dust losses. The panels had been positioned at 22°, oriented to the South.
Artificial neural networks were used to estimate the losses caused by soiling contamination on the photovoltaic panels. The correlation of all these variables was studied, and it was concluded that none of the variables had a strong correlation with the losses; thus, there was no linearity between the input and output variables. To construct the model, 80% of the data were used, while the remaining 20% were used for model validation.
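The 80%/20% division of the data can be sketched as follows. This is only an illustration: the text does not say whether the split was random or chronological, so the random selection with a fixed seed is an assumption, and the function name `split_train_validation` is hypothetical.

```python
import random


def split_train_validation(samples, train_fraction=0.8, seed=0):
    """Split samples into training and validation subsets.

    The samples are assigned at random (ASSUMED; the paper does not state
    the split strategy), and each subset keeps its chronological order.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    indices = list(range(len(samples)))
    # A dedicated generator keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in sorted(indices[:cut])]
    validation = [samples[i] for i in sorted(indices[cut:])]
    return train, validation


# Usage with placeholder sample identifiers.
train, validation = split_train_validation(list(range(100)))
print(len(train), len(validation))  # prints 80 20
```

Sorting the selected indices preserves the time order inside each subset, which is convenient when the validation results are later plotted chronologically.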
The modeling was performed using artificial neural networks, studying which variables were the best inputs and determining the number of neurons that gave the best results. In this context, two different scenarios were defined. In the first scenario, soiling losses were estimated using the module temperature, the irradiance on the array plane, the ambient temperature, and the solar altitude. The results showed that the nMBE was close to 0%, whereas the nRMSE was less than 12%. The "r" correlation coefficient was 0.77, which showed a strong correlation between the real and estimated losses.
In the second scenario, the same soiling loss was modeled based on the module temperature, the irradiance at the array plane, the relative humidity, the short circuit current, and the solar altitude. The results show a very significant improvement compared to the model without Isc, obtaining an nMBE of 0% and an nRMSE value of approximately 6.80%. The "r" correlation coefficient had a value higher than 0.90, showing how successful the model is.
Both networks share most of their input variables; Model 2 replaces the ambient temperature with the relative humidity and adds the short-circuit current. Therefore, the variables needed to estimate these losses are the panel temperature, the irradiance in the plane of the array, and the solar altitude, together with the short-circuit current in Model 2. The other variables are more stable and provide less information, and are therefore more dispensable.
If two more modern panels were used, the power, and surely the Isc as well, would likely differ, but the aim is to obtain the ratio between the clean and the dirty panel. Regardless of the absolute values, this ratio expresses the difference between the two panels and would be similar whatever the technology, unless the panels used an antisoiling coating; in that case, however, the price of such PV panels is not comparable with that of conventional PV panels.
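The ratio argument above can be illustrated with a short sketch. It assumes that the soiling loss is expressed as one minus the ratio of the short-circuit current of the unmaintained panel to that of the cleaned panel, a common convention that this section does not spell out. The function name `soiling_loss_percent` and the current values are hypothetical.

```python
def soiling_loss_percent(isc_soiled, isc_clean):
    """Soiling loss in percent from the Isc ratio of two panels.

    ASSUMED definition: loss = 100 * (1 - isc_soiled / isc_clean),
    with both currents measured under the same conditions.
    """
    if isc_clean <= 0:
        raise ValueError("isc_clean must be positive")
    return 100.0 * (1.0 - isc_soiled / isc_clean)


# Two hypothetical panel technologies with different absolute currents
# but the same soiled-to-clean ratio give the same loss.
older_panel = soiling_loss_percent(isc_soiled=8.5, isc_clean=10.0)
newer_panel = soiling_loss_percent(isc_soiled=10.2, isc_clean=12.0)
print(round(older_panel, 2), round(newer_panel, 2))
```

Because only the ratio enters the calculation, the absolute Isc of the panel technology cancels out, which is the point made in the paragraph above.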
Consequently, the novel technology presented in this work demonstrates that a satisfactory model has been developed to estimate the losses caused by soiling, a factor that is important and relevant in these systems.
As a suggestion for possible future work, a study could be carried out to relate solar altitude, or time of day, to soiling losses. Future studies are also expected to consider the degradation of the panels, as well as the possible extrapolation of the method to other plants and geographies. It is also highly desirable to find the correlation between the instantaneous losses and the power losses affecting an industrial plant. Furthermore, a novel approach would be to conduct a socio-economic study on the effect of soiling and the importance of cleaning photovoltaic solar systems in order to optimize their economic performance and make them more profitable.