Genetic Algorithm Methodology for the Estimation of Generated Power and Harmonic Content in Photovoltaic Generation

Abstract: Renewable generation sources such as photovoltaic plants are weather dependent, and their behavior is hard to predict. This work proposes a methodology for obtaining a parameterized model that estimates the power generated by a photovoltaic generation system. The proposed methodology uses a genetic algorithm to obtain the mathematical model that best fits the behavior of the generated power throughout the day. Additionally, using the same methodology, a mathematical model is developed for harmonic distortion estimation, which allows one to predict both the produced power and its quality. Experimentation is performed using real signals from a photovoltaic system. Eight days from different seasons of the year are selected, considering different irradiance conditions, to assess the performance of the methodology under different environmental and electrical conditions. The proposed methodology is compared with an artificial neural network, and the results show improved performance when the genetic algorithm methodology is used.


Introduction
The smart grid concept involves the inclusion of renewable sources of electric generation and the use of devices for control and communication, leading to a more efficient and reliable electric supply for final users [1]. Among all renewable energies, solar photovoltaic (PV) generation has shown the highest growth, reaching a worldwide installed capacity of 402 GW in 2017 [2]. Nevertheless, PV generation has some challenges and disadvantages; for instance, the generated power is hard to predict because it depends strongly on environmental factors such as the incident solar irradiance and the temperature [3,4]. Additionally, PV generation has been associated with harmonic contamination [5,6], because power inverters are needed to convert the DC signals delivered by the PV panels into AC signals that can be used by the final users. Even small variations in the background conditions may lead to severe harmonic contamination, which makes the continuous detection of these variations essential [7]. From the smart grid point of view, these are major issues: the variability makes the generation process unreliable, and the harmonic content compromises the quality of the power supply. Accordingly, several methodologies have been reported for the proper measurement and identification of harmonics. These methodologies use techniques such as the discrete Fourier transform (DFT) [8], the chirp-z transform [9], phasor measurement units [10], Kalman filter-based techniques [11], and the discrete wavelet transform [12]. Although these methodologies offer a solution, they do not provide information regarding the source of the harmonic currents. Moreover, they do not provide a model for describing or estimating the behavior of the harmonic content under different operating conditions.
In terms of PV generation, several works have focused on predicting the power delivered by a PV system [13]. These methodologies aim to develop models that describe the production of the PV system using deterministic or probabilistic techniques [14,15]. Recently, however, irradiance and power forecasting methods based on artificial intelligence have gained popularity, with support vector machines [16], support vector regression [17], and artificial neural networks (ANN) [18–20] being the most common techniques used for this purpose. These methodologies can predict the behavior of a variable on a daily or hourly basis. Nevertheless, these works only estimate the generated power and do not deliver information regarding the quality of the generation. Additionally, the use of other metaheuristic methodologies (sets of procedures that, instead of following a formal mathematical model, apply a series of empirical rules to the search for an optimal value), such as genetic algorithms (GA), has not been studied for this purpose. This technique can be used to obtain a mathematical model that describes the behavior of electrical variables in PV generation. Furthermore, GA could help to obtain a parameterized model, in a multiparameter optimization scheme, that describes the behavior of any variable in the PV process even when this behavior is nonlinear. Moreover, the electrical performance cannot be described using only one parameter, and a GA allows several parameters to be obtained in the same run.
Some works have applied metaheuristic techniques to improve the performance of PV generation. For instance, maximum power point tracking (MPPT) is a problem in which classical techniques have difficulties when partial shading conditions (PSC) give the power-voltage curve of the PV system a nonmonotonic shape with several local maxima [21–23]. Therefore, solutions based on particle swarm optimization (PSO), ant colony optimization (ACO), and simulated annealing (SA) have been proposed. However, their limitations include the initial parameters required for the optimization, the amount of data required from the PV array for a correct search, and validation through controlled simulations that do not consider real variations of external factors. In other works, GA and PSO are used to define the best topology of electrical devices for optimizing commercial building microgrids [24]; yet, these solutions need to be adapted to each particular microgrid and its specific user requirements. Finally, metaheuristic techniques such as the crow search algorithm (CSA) [25] and the whale optimization algorithm (WOA) [26] have been used to identify the model parameters of PV generation systems for simulation and design purposes. Nevertheless, in both cases, the estimated mathematical model is validated only through simulations. Therefore, these modern optimization techniques might be helpful if applied to the power quality (PQ) analysis of PV generation systems in the predictive modeling field. Thus, there is a need for methodologies that help to optimize models for predicting the generation process, including its quality.
This work proposes the use of GA to parameterize mathematical models that estimate the power delivered by a PV system and the level of distortion in the voltage signals associated with harmonic contamination. Four parameters are proposed to fit the mathematical models: sun irradiance, cell temperature, DC voltage, and DC current. The proposed methodology was applied to signals from a real PV generation plant. Measurements were taken over a year, and a representative sample of 8 days was selected for analysis. These days were chosen as representative cases of the four seasons of the year, with different weather conditions. The results show that sun irradiance, cell temperature, DC voltage, and DC current can describe the behavior of the power delivered by the PV inverter, as well as the behavior of the total harmonic distortion (THD) in PV generation. Moreover, the resulting models were compared with an ANN, one of the most common techniques in forecasting tasks, and the estimations obtained with the GA were more accurate than those delivered by the ANN.

Theoretical Background
This section introduces the theoretical background for the implementation of the proposed methodology.

Active Power Definition
The IEEE Std 1459-2010 [27] defines the active power ($P$) as the average value of the instantaneous power $p(t)$ over a time interval $\tau$, as shown in (1):

$$P = \frac{1}{\tau}\int_{\tau} p(t)\,dt \tag{1}$$

where $p(t) = v(t)\,i(t)$ is the instantaneous power. Equation (1) can be discretized, resulting in (2):

$$P = \frac{1}{N}\sum_{k=1}^{N} v(k)\,i(k) \tag{2}$$

where $N$ is the number of samples comprising the time interval $\tau$, and $v(k)$ and $i(k)$ are the $k$-th samples of the voltage and current signals, respectively.
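As an illustration of (2), the following Python sketch computes the active power from equally sampled voltage and current records. It assumes NumPy and synthetic 50 Hz sinusoids sampled at 8000 SPS (the rate of the acquisition system described in the experimental setup); the function name `active_power` is hypothetical and not part of the original methodology.

```python
import numpy as np

def active_power(v, i):
    """Discrete active power of Eq. (2): mean of the instantaneous power v(k)*i(k)."""
    v = np.asarray(v, dtype=float)
    i = np.asarray(i, dtype=float)
    if v.shape != i.shape:
        raise ValueError("v and i must contain the same number of samples")
    return float(np.mean(v * i))

# Synthetic example: 230 V rms, 10 A rms, current lagging by 30 degrees,
# 10 full cycles of 50 Hz sampled at 8000 SPS (N = 1600 samples).
fs, f1 = 8000, 50
t = np.arange(1600) / fs
v = 230 * np.sqrt(2) * np.sin(2 * np.pi * f1 * t)
i = 10 * np.sqrt(2) * np.sin(2 * np.pi * f1 * t - np.pi / 6)
p = active_power(v, i)  # close to 230 * 10 * cos(30 deg), about 1991.86 W
```

Averaging over an integer number of cycles, as in this example, cancels the double-frequency oscillation of the instantaneous power; with a non-integer window the result would carry a small ripple.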

Total Harmonic Distortion (THD)
According to IEEE Std 519-2014 [28], harmonics are sinusoidal components in voltage signals whose frequencies are integer multiples of the fundamental frequency. Harmonic components cause waveform distortion, and the level of distortion due to harmonics is quantified with the THD index, whose mathematical expression is presented in (3):

$$\mathrm{THD} = \frac{\sqrt{\sum_{h=2}^{h_{\max}} P_h^{2}}}{P_1} \times 100\% \tag{3}$$

where $P_1$ is the magnitude of the fundamental component, $P_h$ is the magnitude of the harmonic component of order $h$, and $h_{\max}$ is the highest harmonic order considered (the 50th in IEEE Std 519-2014). The magnitudes of the harmonic components are obtained using the Fourier transform. The THD must be calculated over a 200 ms time window, i.e., 12 cycles for 60 Hz power systems or 10 cycles for 50 Hz power systems.
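A minimal Python sketch of (3) over a 200 ms window is shown below. It assumes NumPy, a rectangular window that contains an exact integer number of fundamental cycles (so each harmonic falls on a single FFT bin), and a synthetic 50 Hz voltage with 3% third and 4% fifth harmonics; the function name `thd_percent` is hypothetical. Real measurements with frequency deviations would need additional care to avoid spectral leakage.

```python
import numpy as np

def thd_percent(x, fs, f1, h_max=50):
    """THD (%) of a window covering an integer number of cycles of f1.

    Uses the magnitudes of the FFT bins at the harmonic frequencies,
    following the form of Eq. (3) with harmonics up to h_max.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    mag = np.abs(np.fft.rfft(x))
    df = fs / n                               # bin spacing: 5 Hz for a 200 ms window
    k1 = int(round(f1 / df))                  # bin index of the fundamental
    h_top = min(h_max, (n // 2 - 1) // k1)    # stay below the Nyquist bin
    harmonics = mag[[h * k1 for h in range(2, h_top + 1)]]
    return 100.0 * np.sqrt(np.sum(harmonics ** 2)) / mag[k1]

fs, f1 = 8000, 50                             # 50 Hz grid, 8000 SPS
t = np.arange(round(0.2 * fs)) / fs           # 200 ms window = 10 cycles at 50 Hz
A = 230 * np.sqrt(2)
v = (A * np.sin(2 * np.pi * f1 * t)
     + 0.03 * A * np.sin(2 * np.pi * 3 * f1 * t)
     + 0.04 * A * np.sin(2 * np.pi * 5 * f1 * t))
thd = thd_percent(v, fs, f1)                  # sqrt(3^2 + 4^2) = 5 %
```

Because the FFT scaling factor is the same for the fundamental and the harmonic bins, it cancels in the ratio, so no explicit amplitude normalization is needed.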

Evolutionary-Based Algorithm
Genetic algorithms (GA) are used in this work because they can be easily adapted to this particular problem and have some advantages over other evolutionary-based algorithms. For instance, GA keep a population of potential solutions, whereas other techniques work with a single candidate solution. Another benefit is their conceptual simplicity and ease of implementation [29]. GA are based on Darwin's theory of natural selection (survival of the fittest). The following five features, applied in an iterative process, make the GA a functional optimization search: (i) design variable coding; (ii) objective function and fitness value; (iii) selection mechanism and genetic operators; (iv) crossover; and (v) mutation [30]. The terminology comes from natural genetics: a group of chromosomes (or individuals) makes up the population to be evolved (converged), and each individual is composed of genes, which represent the design variables to be optimized (or searched for).
According to [29], GA are generally implemented in the following steps:
Step 1: Definition of the general parameters according to the problem to be solved, and generation of a random initial population (potential solutions).
Step 2: Evaluation of the population by substituting each potential solution into an objective function that calculates its fitness value, a quantitative measure of how good each individual is.
Step 3: Elitist selection of the best individuals according to their fitness value, for reproduction with the genetic operators: crossover and mutation.
Step 4: Evaluation of the termination criteria for the iterative process; if they are satisfied, go to Step 8, where the best solutions are presented; otherwise, go to Step 5.
Step 5: Generation of a new population by applying the crossover operation to the individuals selected in Step 3. This drives the evolution (convergence) of the possible solutions.
Step 6: Introduction of population diversity, and escape from local optima, by applying the mutation operation according to a mutation probability chosen so that valuable information is not lost.
Step 7: Replacement of the initial population with the new population obtained in Steps 5 and 6, and return to Step 2.
Step 8: Return of the best solutions obtained.
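The eight steps can be sketched as a short Python loop. This is a generic illustration only, assuming a one-dimensional minimization problem, elitist (truncation) selection of the 10 best individuals, blend crossover between two random parents, and random-reset mutation; these operator choices and the function name `run_ga` are illustrative assumptions, not necessarily those of [29] or of this work.

```python
import numpy as np

def run_ga(objective, bounds, pop_size=50, n_elite=10, p_mut=0.2,
           max_gen=200, seed=0):
    """Generic GA minimizing `objective` over a 1-D interval (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    n_children = pop_size - n_elite
    # Step 1: parameters are the arguments; random initial population
    pop = rng.uniform(low, high, size=pop_size)
    for gen in range(max_gen + 1):
        # Step 2: fitness of every individual (lower objective = fitter)
        fitness = np.array([objective(x) for x in pop])
        order = np.argsort(fitness)
        # Step 3: elitist selection of the best individuals as parents
        parents = pop[order[:n_elite]]
        # Step 4: termination criterion (maximum number of generations)
        if gen == max_gen:
            break
        # Step 5: crossover, blending two randomly chosen parents
        a = rng.choice(parents, size=n_children)
        b = rng.choice(parents, size=n_children)
        alpha = rng.uniform(0.0, 1.0, size=n_children)
        children = alpha * a + (1.0 - alpha) * b
        # Step 6: mutation, resetting a child at random with probability p_mut
        mutate = rng.random(n_children) < p_mut
        children[mutate] = rng.uniform(low, high, size=int(mutate.sum()))
        # Step 7: the new population replaces the old one; back to Step 2
        pop = np.concatenate([parents, children])
    # Step 8: best solution found
    return float(pop[order[0]]), float(fitness[order[0]])

best_x, best_f = run_ga(lambda x: (x - 3.0) ** 2, bounds=(-10.0, 10.0))
```

Keeping the elite parents in the next population means the best objective value never gets worse from one generation to the next, while mutation keeps injecting new points into the search interval.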

Parameterization Dataset and Experimentation Dataset
In this work, two different datasets are used: one for parameterization and one for experimentation. The parameterization dataset covers one year, from which two days per month are selected, for a total of 24 days. The two days of each month are selected as follows: one day with almost no irradiance variations associated with cloud presence, and a second day with cloud presence that generates unexpected variations in the irradiance profile. This dataset is used only to estimate the parameters of each model (active power and THD). Once the parameters of the model are estimated, a different dataset is used for experimentation: 8 days taken throughout the year, none of which belongs to the parameterization dataset. Two representative days per season of the year are selected to assess the model under different environmental conditions: for each season, one day with only a few clouds (or none, if possible) and one day with many abrupt irradiance variations due to cloud presence. In this way, the performance of the methodology can be assessed under different scenarios.

Genetic Algorithm for the Parameter Estimation
A general block diagram of the proposed methodology is presented in Figure 1. First, a GA scheme is used to forecast the active power of a PV inverter. This scheme requires the following inputs: sun irradiance, cell temperature, DC voltage, DC current, an active power signal, and the non-parameterized mathematical model, which serves as the objective function. These descriptive parameters are selected because PV panels deliver energy at DC levels that depend on environmental factors. The objective function for this task is presented in (4). This mathematical model is selected because the relationship between solar irradiance and active power has been observed to be approximately proportional [31,32].

$$P_i = w_1 x_{1,i} + w_2 x_{2,i} + w_3 x_{3,i} + w_4 x_{4,i} \tag{4}$$

where $P_i$ is the $i$-th value of the estimated active power; $x_{1,i}$ is the $i$-th value of the sun irradiance; $x_{2,i}$ is the $i$-th value of the cell temperature; $x_{3,i}$ is the $i$-th value of the DC voltage signal; $x_{4,i}$ is the $i$-th value of the DC current signal; and $w_1$, $w_2$, $w_3$, and $w_4$ are constant weights determined by the GA.
Each weight in (4) specifies the contribution of the corresponding variable to the description of the response. The relationship between the descriptive parameters and the result is not linear in all cases; however, in this work, a model that linearizes this relationship is proposed in order to allow for a simple solution to the problem. This simplification introduces an estimation error, and the GA must find the weights that minimize this error in order to obtain accurate results. Once all the inputs have been defined, the next step is the initialization of the GA. For this particular case, the parameters to be determined by the GA are $w_1$, $w_2$, $w_3$, and $w_4$. A random initial population of 50 individuals is generated for each weight, which helps to distribute the candidates over the design space. This is followed by the evaluation of the population, which consists of substituting the value of each individual into the objective function and computing the error with respect to the measured active power signal.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 13

Subsequently, a selection process is carried out. The error obtained for each individual is used to rank the individuals from the best fitting (the individual with the smallest error) to the worst fitting (the individual with the largest error). At this point, the stop criterion is evaluated; if it has been reached, the best fitting individual is taken as the solution. Otherwise, a new population must be generated. In this work, the stop criterion is a maximum of 500 epochs, because it was experimentally observed that this number is sufficient for the model parameters to converge. To generate the new population, the individual with the lowest error is preserved (elitism), which guarantees that the best solution found so far is never lost. The rest of the individuals in the new population are obtained through two genetic operations: crossover and mutation. The crossover operation consists of averaging the best fitting individual with each of the remaining individuals, one at a time, as shown in (5):

$$y_{i,\mathrm{new}} = \frac{y_{i,\mathrm{old}} + y_{1}}{2}, \qquad i = 2, 3, \ldots, 50 \tag{5}$$

where $y_{i,\mathrm{old}}$ is the $i$-th individual of the old population; $y_1$ is the best fitting individual; and $y_{i,\mathrm{new}}$ is the $i$-th individual of the new population.

The mutation operation is a random substitution of a particular individual of the new population: the individual is replaced if a random value lies below the mutation probability (0.2 in this work). This maintains diversity in the population without losing valuable genetic information. Once the new population is obtained, the process is repeated from the evaluation step until the stop criterion is reached. When the stop criterion is reached, the GA delivers the estimated parameterized model for active power forecasting. In the case of the THD prediction, the aforementioned process is carried out with only two differences. The first is related to the objective function. For the THD model, Equation (6) is used, where $H_i$ is the $i$-th value of the estimated THD; $x_{1,i}$, $x_{2,i}$, $x_{3,i}$, and $x_{4,i}$ are the $i$-th values of the sun irradiance, cell temperature, DC voltage, and DC current signals, respectively; and $u_1, u_2, \ldots, u_{14}$ are the constant weights determined by the GA.
The objective function for THD forecasting is selected in this way because Equation (3) involves quadratic terms; therefore, such terms should be considered in the estimation model. Moreover, DC electrical quantities are not conventionally related to the THD. However, in this case, the DC voltage and current signals represent an important part of the internal behavior of the PV system, which is why their use in the estimation of THD is proposed, specifically for the PV generation process. The second difference is that, to obtain the THD prediction, the GA is trained using THD test signals (the measured THD) instead of the active power signal. With these two modifications (objective function and test signal), the THD prediction can be performed. The methodology is run a total of 24 times, once for each day of the parameterization dataset, and the final weights for the two models are the average of these 24 results. Finally, the proposed methodology is used to forecast the active power and the THD of eight days different from those used to estimate the parameterized model. These days come from the four seasons of the year and present different weather conditions. In this way, it is possible to evaluate the variability associated with the specific climatic changes of each season, as well as the variability that results from the lack of sunlight when there are clouds in the sky.
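The GA described above (50 individuals, 500 epochs, elitism, the averaging crossover of (5), and random substitution with probability 0.2) can be sketched in Python as follows, together with the averaging of the per-day results. Several details are assumptions of this sketch: the error measure (RMSE), the search bounds, the synthetic normalized data, and the helper names `fit_weights_ga`, `quadratic_features`, and `average_weights`, which are all hypothetical. Because Equation (6) itself is not reproduced in this text, the 14-term THD structure is also assumed: 14 is exactly the number of linear (4), squared (4), and pairwise-product (6) terms of the four inputs, which is consistent with the quadratic terms mentioned above, but the ordering of $u_1, \ldots, u_{14}$ used here is hypothetical.

```python
import numpy as np
from itertools import combinations

def fit_weights_ga(F, target, pop_size=50, epochs=500, p_mut=0.2,
                   bounds=(-5.0, 5.0), seed=0):
    """Elitist GA for the weights of a model target_i ~ sum_j w_j * F[i, j].

    Population size, epochs, crossover (5) and mutation probability follow the
    text; the RMSE error measure and the search bounds are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_w = F.shape[1]
    low, high = bounds
    pop = rng.uniform(low, high, size=(pop_size, n_w))  # one weight vector per row

    def rmse(p):
        # Error of every individual with respect to the measured target signal
        return np.sqrt(np.mean((F @ p.T - target[:, None]) ** 2, axis=0))

    for _ in range(epochs):
        pop = pop[np.argsort(rmse(pop))]       # selection: best individual first
        best = pop[0].copy()                   # elitism: always preserved
        children = (pop[1:] + best) / 2.0      # crossover, Eq. (5)
        mutate = rng.random(pop_size - 1) < p_mut
        children[mutate] = rng.uniform(low, high, size=(int(mutate.sum()), n_w))
        pop = np.vstack([best, children])
    errors = rmse(pop)
    k = int(np.argmin(errors))
    return pop[k], float(errors[k])

def quadratic_features(X):
    """Assumed 14 terms of Eq. (6): 4 linear, 4 squared, 6 pairwise products."""
    X = np.asarray(X, dtype=float)
    cols = range(X.shape[1])
    linear = [X[:, j] for j in cols]
    squares = [X[:, j] ** 2 for j in cols]
    cross = [X[:, j] * X[:, k] for j, k in combinations(cols, 2)]
    return np.column_stack(linear + squares + cross)

def average_weights(per_day_weights):
    """Final model weights: element-wise mean over the per-day GA results."""
    return np.mean(np.asarray(per_day_weights, dtype=float), axis=0)

# Synthetic, normalized data standing in for one day of (G, T, V_dc, I_dc) samples
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 4))
P = X @ np.array([2.0, -1.0, 0.5, 3.0])     # linear target, as in Eq. (4)
w_day, err = fit_weights_ga(X, P)            # 4 weights for the active power model
F = quadratic_features(X)                    # 14 columns for the THD model
```

For the THD model, the same search would run on `quadratic_features(X)` with the measured THD as target, and `average_weights` would combine the 24 per-day weight vectors. With real measurements, the inputs (W/m², °C, V, A) have very different scales, so normalizing them or widening the search bounds would likely be necessary.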

Piecewise Approach of the Proposed Methodology
Since a PV inverter has been shown to behave anomalously when operating in regions far from its rated values [33], the described methodology is applied to the signals in a piecewise manner. The design variables and the objective function are divided into four sections over a day. Considering that the maximum expected irradiance is around 1000 W/m², a threshold of 20% of this value (200 W/m²) is set to define two regions: one at the beginning and another at the end of the day. These two sections, named S1 and S4, are the regions where the irradiance is below the threshold, i.e., they correspond to the low-power operation of the PV inverter. The remaining data are divided into halves to obtain two other sections, S2 and S3. In this way, four sections (S1, S2, S3, and S4) are obtained, and a different set of weights is estimated for each one.
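A minimal Python sketch of the sectioning is given below, assuming one day of irradiance samples in W/m². It interprets S1 as the samples before the irradiance first reaches the threshold and S4 as the samples after it last reaches it; cloud-induced dips below the threshold in the middle of the day are kept in S2 or S3. The function name `split_sections` and these interpretation details are assumptions, not taken from the original implementation.

```python
import numpy as np

def split_sections(irradiance, g_max=1000.0, fraction=0.2):
    """Split one day of irradiance samples into index arrays S1, S2, S3, S4."""
    g = np.asarray(irradiance, dtype=float)
    n = g.size
    threshold = fraction * g_max                 # 200 W/m^2 with the default values
    above = np.flatnonzero(g >= threshold)
    if above.size == 0:                          # threshold never reached: all low power
        empty = np.array([], dtype=int)
        return np.arange(n), empty, empty, empty
    first, last = above[0], above[-1]
    mid = (first + last + 1) // 2                # central region split into halves
    return (np.arange(0, first),                 # S1: morning, below threshold
            np.arange(first, mid),               # S2: first half of the central region
            np.arange(mid, last + 1),            # S3: second half of the central region
            np.arange(last + 1, n))              # S4: evening, below threshold

g = [0, 50, 150, 250, 600, 900, 700, 300, 120, 10]   # toy irradiance profile
s1, s2, s3, s4 = split_sections(g)
```

For the toy profile, S1 holds the first three samples, S2 and S3 split the five samples at or above 200 W/m², and S4 holds the last two.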

Experimental Setup
Experimentation was performed with real signals from a PV generation plant located in central Spain, at a latitude of 39°36'N and a longitude of 2°05'W. Measurements were performed on a 100 kW installation that uses an Ingecon Sun 100 solar inverter. The DC side of the PV inverter is connected to a set of polycrystalline silicon PV panels with a peak power of 125 kW. The PV panels are south-facing with a tilt angle β = 45°. The global irradiance reaching the PV panels was measured using a calibrated PV reference cell with the same orientation and tilt angle as the PV panels. The data from both sides of the PV inverter were acquired and stored using a proprietary data acquisition system (DAS) based on an FPGA (field-programmable gate array). The DAS can acquire seven simultaneous channels at 8000 samples per second (SPS) with 16-bit resolution. On the DC side, the DAS was designed to measure voltages up to 1000 V and to work with any current clamp that delivers a ±4 V output. Only two channels are required on this side: one for the voltage and one for the current. The current was measured with an LEM HOP 500-SB/SP1 clamp. On the AC side, six channels of the DAS are required to measure the voltage and current of the three phases. Since the PV inverter operates in a low-voltage grid, the DAS on this side was designed to measure voltages up to 400 Vrms and to use any current clamp with a ±2 V output. The current was taken from a measurement transformer, so YHDC SCT-013-010 sensors were used with the DAS. All voltage signals were acquired through wires connected directly from the PV inverter to the DAS. The voltage and current waveforms were stored over an extended period on a standard 128 GB microSD card.
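As a back-of-the-envelope check of the storage requirements, the following Python sketch estimates the raw data volume per day. It assumes that each 16-bit sample is stored as 2 bytes with no file-format, file-system, or compression overhead, continuous 24 h recording, and a 128 GB card counted in decimal gigabytes; the AC and DC sides are treated separately because the number of DAS units and cards is not stated. These figures are estimates derived from the stated sampling parameters, not values reported for the installation.

```python
SECONDS_PER_DAY = 86_400

def bytes_per_day(channels, fs=8000, bits=16):
    """Raw bytes written per day by `channels` channels sampled at fs with `bits` resolution."""
    return channels * fs * (bits // 8) * SECONDS_PER_DAY

card_bytes = 128e9                 # 128 GB card, decimal gigabytes (assumption)
ac_day = bytes_per_day(6)          # three voltages + three currents: about 8.3 GB/day
dc_day = bytes_per_day(2)          # one voltage + one current: about 2.8 GB/day
ac_days_per_card = card_bytes / ac_day   # about 15.4 days of continuous AC recording
dc_days_per_card = card_bytes / dc_day   # about 46.3 days of continuous DC recording
```

Recording only during daylight hours, as is natural for PV generation, would extend these durations proportionally.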

Results and Discussion
This section presents the results of applying the proposed methodology for the model parameterization and the experimentation for the active power and THD estimation.

Parameterization Results
First, the proposed methodology was applied to parameterize the active power model and the THD model, using the piecewise approach described above. Table 1 summarizes the weights $w_1$, $w_2$, $w_3$, and $w_4$ delivered by the GA for the four sections of the active power model, using the 24 days of the parameterization dataset. The weight values in Table 1 differ considerably from one section to another; thus, using the whole signal instead of sectioning it would likely compromise the accuracy of the results. Additionally, the 14 weights obtained for the four sections of the THD model are presented in Table 2; these weights were also obtained with the parameterization dataset. As in the previous case, the values vary significantly from one section of the THD signal to another, which supports the use of the piecewise approach to increase the reliability of the methodology. As mentioned above, the behavior of the PV system depends on the operating conditions [33], i.e., the behavior of the system is nonlinear. This nonlinearity is the most likely reason for the variation of the weights across the sections of the estimated models.

Experimentation Results
To verify that the proposed methodology delivers consistent results for days with similar conditions, it was first tested on two sunny days and two cloudy days of the same month. Four days were randomly selected from August, because in this month it is easy to find days that match these characteristics (sunny and cloudy). The weights presented in Table 1 were used to estimate the active power delivered on these four days, and the results are presented in Figure 2.

Figure 2 shows that on the sunny days the estimated and the real active power are very similar (Figure 2a,b), with estimation errors of 0.1% for 13 August and 0.37% for 14 August. The difference between both errors is less than 0.3%, which indicates that the methodology delivers consistent results. On the cloudy days (Figure 2c,d), the weather causes a high variability in the generated power. However, the methodology follows most of the variations reasonably well, with errors of 0.39% for 15 August and 0.56% for 17 August. This time, the difference between both errors is even lower than in the previous case (<0.2%). These results support the similarity and uniformity of the estimates for days with similar characteristics.
As mentioned above, 8 days were used for experimentation with the proposed methodology in the piecewise approach. These days are presented in Figure 3 with the corresponding sections of the day (S1 to S4). The irradiance values are normalized to better show the average trend of each season of the year. The highest values are reached in spring, but they are sudden and last only a few minutes. The lowest values occur in winter, mainly because the solar elevation is lowest in this season. There is also great variability among the days because of the different environmental conditions. The first season presented in Figure 3 is summer, characterized by abundant sun and a few cloudy periods. The first summer day is completely sunny (with no irradiance variations), whereas the second presents some unexpected variations due to clouds from 12:00 to 15:00 (Figure 3a). The second season is autumn: the first day presents a few variations associated with cloud presence, whereas the second presents a storm during the second half of the day, resulting in many abrupt irradiance variations (Figure 3b). The third season is winter: the first day (Figure 3c) presents a profile very similar to that of the first summer day, although the number of sunny hours is lower in winter, as is the highest irradiance level reached. The second winter day shows a very irregular pattern because clouds are present during the entire day. Finally, the fourth season is spring. The first day presents a normal pattern for most of the day, but around 17:00 it becomes cloudy and some irradiance variations appear (Figure 3d). The second spring day presents an erratic behavior since there are clouds all day.
The weight values presented in Table 1 were substituted into (4), together with the irradiance, cell temperature, DC voltage, and DC current of each section.
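Substituting the per-section weights into (4) amounts to evaluating a different linear model on the samples of each section. The following Python sketch assumes the samples are already split into index arrays for S1 to S4; the function name `piecewise_predict` and the toy weights are hypothetical and are not the values of Table 1.

```python
import numpy as np

def piecewise_predict(X, sections, weights):
    """Evaluate Eq. (4) section by section with that section's weight vector.

    X: (n, 4) samples of irradiance, cell temperature, DC voltage, DC current.
    sections: four index arrays (S1..S4); weights: four weight vectors.
    """
    X = np.asarray(X, dtype=float)
    P = np.full(X.shape[0], np.nan)      # samples outside every section stay NaN
    for idx, w in zip(sections, weights):
        P[idx] = X[idx] @ np.asarray(w, dtype=float)
    return P

# Toy example with hypothetical weights (not the values of Table 1)
X = np.ones((6, 4))
sections = [np.array([0, 1]), np.array([2, 3]), np.array([4]), np.array([5])]
weights = [np.full(4, k) for k in (1.0, 2.0, 3.0, 4.0)]
P = piecewise_predict(X, sections, weights)   # [4, 4, 8, 8, 12, 16]
```

Filling with NaN makes any sample that was not assigned to a section easy to detect instead of silently predicting zero power.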
Then, the active power for the 8 days of analysis was estimated. To show the effectiveness of the proposed methodology, this estimation is compared with that of an ANN, one of the most common techniques for this purpose. The ANN was trained using the same 24 days that were used to estimate the parameterized model with the GA. Figure 4 shows the active power estimated with both techniques. On all 8 days, the value estimated by the GA (red line) remains very close to the real value (blue line). Even on days with many unexpected variations, such as 13 November (Figure 4d), 10 February (Figure 4f), and 21 March (Figure 4h), the obtained model follows every variation reasonably well. On days without abrupt changes, the difference between the estimated and the real values is almost imperceptible. The ANN also produces good active power forecasts (yellow line), accurately following the behavior of the real signal on most days. However, for 9 January (Figure 4e), the ANN estimation presents a noticeable deviation from the real signal.
By contrast, the estimation performed by the GA does not present this deviation, making it a more reliable technique.
On the other hand, the weights from Table 2 were substituted into (6) to obtain the mathematical model for THD forecasting. Once again, the estimation performed by the GA is compared with that performed by an ANN. The results of using these models to estimate the THD throughout the day are depicted in Figure 5. On days with almost no cloud presence (Figure 5a,c,e,g), the values estimated by the GA (red line) present only a few deviations from the real ones (blue line). On days with heavy cloud presence, the deviation between the estimated and the real values is more noticeable, i.e., the estimation error increases. In contrast, the ANN used for THD prediction does not make a good estimation even on days when the sky is clear (yellow line). This suggests that the ANN has difficulties with estimations in which the relationship among the parameters is nonlinear.
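To make the substitution step concrete, the sketch below evaluates a piecewise model sample by sample, switching to the weight set of the section (S1 to S4) that each timestamp falls in. It is an illustration under stated assumptions, not the authors' implementation: the actual forms of (4) and (6) are not reproduced in this section, so `hypothetical_model` (a weighted sum of irradiance, cell temperature, DC voltage, DC current, and a voltage-current product) and the section boundaries in `SECTION_BOUNDS_H` are placeholders, and any weights passed in are not the values of Table 1 or Table 2.

```python
from typing import List, Sequence

# HYPOTHETICAL section boundaries (hour of day) for S1..S4; the boundaries
# actually used in the paper are not given in this section.
SECTION_BOUNDS_H = (6.0, 10.0, 14.0, 18.0, 21.0)


def section_index(hour: float) -> int:
    """Return 0..3 for sections S1..S4; raise if the hour lies outside them."""
    for k in range(4):
        if SECTION_BOUNDS_H[k] <= hour < SECTION_BOUNDS_H[k + 1]:
            return k
    raise ValueError(f"hour {hour} is outside the modeled sections")


def hypothetical_model(w: Sequence[float], g: float, t: float, v: float, i: float) -> float:
    """Placeholder model form (NOT the paper's eq. (4) or (6)):
    w0 + w1*G + w2*T + w3*V + w4*I + w5*V*I."""
    return w[0] + w[1] * g + w[2] * t + w[3] * v + w[4] * i + w[5] * v * i


def estimate_day(
    weights_per_section: Sequence[Sequence[float]],
    hours: Sequence[float],
    irradiance: Sequence[float],
    cell_temp: Sequence[float],
    v_dc: Sequence[float],
    i_dc: Sequence[float],
) -> List[float]:
    """Evaluate the piecewise model, picking the weight set of each sample's section."""
    out = []
    for h, g, t, v, i in zip(hours, irradiance, cell_temp, v_dc, i_dc):
        w = weights_per_section[section_index(h)]
        out.append(hypothetical_model(w, g, t, v, i))
    return out
```

With the hypothetical weights `[0, 0, 0, 0, 0, 1]` in every section, the placeholder model reduces to the DC power V·I, so `estimate_day([[0, 0, 0, 0, 0, 1.0]] * 4, [7.0, 12.0], [100.0, 800.0], [20.0, 45.0], [300.0, 320.0], [5.0, 8.0])` returns `[1500.0, 2560.0]`. Substituting the real section weights and model terms would follow the same loop.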
The error values for both techniques (GA and ANN) are summarized in Table 3. For the active power estimation, the error obtained with the GA always remains below 1%. As expected, the ANN presents its highest error on 9 January, but 10 February and 13 November also present errors above 1%. For the THD estimation, the error obtained with the GA never exceeds 2%. The results obtained with the ANN, in contrast, are highly variable, with errors ranging from 3.2% to 33.2%, which shows that the ANN is not as reliable as the GA for this particular estimation. The results of the proposed methodology are meaningful because they show that a proper combination of solar irradiance, cell temperature, DC voltage, and DC current can reasonably estimate the active power of a whole production day in PV systems. More importantly, the proposed approach can also describe the THD associated with the generation process. Having a priori knowledge of the quality of a generation process is important from the smart grid point of view in order to guarantee a reliable and robust supply.
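The errors in Table 3 are reported as percentages, but this section does not restate which error measure was used. As a hedged sketch, two common percentage measures for comparing an estimated daily curve against the measured one are shown below: the mean absolute percentage error (MAPE) and the root-mean-square error normalized by the peak of the measured signal (NRMSE). Either could be computed per day for the GA and ANN estimates; whether one of them matches Table 3 is an assumption.

```python
import math
from typing import Sequence


def _check(real: Sequence[float], est: Sequence[float]) -> None:
    """Both series must be non-empty and sampled at the same instants."""
    if len(real) != len(est) or len(real) == 0:
        raise ValueError("series must be non-empty and of equal length")


def mape_percent(real: Sequence[float], est: Sequence[float], eps: float = 1e-9) -> float:
    """Mean absolute percentage error (%).

    Samples whose reference value is (near) zero, e.g. night-time PV power,
    are skipped because the relative error is undefined there.
    """
    _check(real, est)
    terms = [abs((r - e) / r) for r, e in zip(real, est) if abs(r) > eps]
    if not terms:
        raise ValueError("no samples with a non-zero reference value")
    return 100.0 * sum(terms) / len(terms)


def nrmse_percent(real: Sequence[float], est: Sequence[float]) -> float:
    """Root-mean-square error normalized by the peak magnitude of the reference (%)."""
    _check(real, est)
    rmse = math.sqrt(sum((r - e) ** 2 for r, e in zip(real, est)) / len(real))
    peak = max(abs(r) for r in real)
    if peak == 0:
        raise ValueError("reference signal is identically zero")
    return 100.0 * rmse / peak
```

For example, `mape_percent([100, 200, 400], [110, 190, 400])` gives approximately 5.0. The choice matters for PV data: MAPE has to skip dawn and night samples where the measured power is near zero, whereas NRMSE uses every sample but depends on the normalization chosen.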
On the other hand, it has been demonstrated that even slight variations in the operating conditions of the PV system may affect the harmonic distortion in the network [7]. This situation can be verified by comparing Figures 4 and 5; for instance, Figure 4b …
From Table 3, it is observed that, in the case of the THD, both methodologies present the highest estimation error in March. However, the month with the second-highest error is November for the GA approach and January for the ANN approach. In this regard, there are other factors related to the existing THD (such as the network impedance) that are not considered in this particular model. Changes in these unmodeled factors may compromise the accuracy of any methodology; thus, as previously mentioned, the estimation error does not necessarily follow the same tendency in both methodologies. In other words, the accuracy of the estimation depends on the robustness of the methodology to variations in the unmodeled parameters. In this case, the GA methodology appears to be more robust to these kinds of variations.
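Since the discussion above hinges on the THD of the generated signal, the standard definition may help: THD = sqrt(X_2² + X_3² + … + X_H²) / X_1 × 100%, where X_1 is the magnitude of the fundamental and X_h that of the h-th harmonic. The sketch below estimates it from a sampled waveform with NumPy's FFT. How the THD values in this work were measured is not described in this section, so the sampling rate, the fundamental frequency `f0`, the harmonic limit `max_harmonic`, and the requirement that the record span an integer number of fundamental cycles (which avoids spectral leakage without windowing) are all illustrative assumptions.

```python
import numpy as np


def thd_percent(x: np.ndarray, fs: float, f0: float, max_harmonic: int = 40) -> float:
    """THD (%) of a sampled waveform, computed from its DFT.

    Assumes the record spans an integer number of fundamental cycles, so each
    harmonic h*f0 falls exactly on a DFT bin (no windowing or leakage handling).
    """
    n = len(x)
    spectrum = np.abs(np.fft.rfft(x)) / n  # bin magnitudes; the scale cancels in the ratio
    bin_width = fs / n
    k1 = int(round(f0 / bin_width))  # bin index of the fundamental
    # Bins of harmonics 2..max_harmonic that lie below the Nyquist frequency.
    harmonic_bins = [h * k1 for h in range(2, max_harmonic + 1) if h * k1 < len(spectrum)]
    fundamental = spectrum[k1]
    return float(100.0 * np.sqrt(np.sum(spectrum[harmonic_bins] ** 2)) / fundamental)
```

As a check, a 60 Hz test signal with a unit fundamental plus 10% third and 5% fifth harmonics, sampled at 7680 Hz over 10 cycles, yields about 11.18%, i.e. sqrt(0.1² + 0.05²) × 100%. Real inverter records would normally need windowing or resampling to a synchronous grid before this bin-based approach is accurate.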

Conclusions
The contribution of the present work is a novel methodology that combines environmental factors and PV generation variables to define two models for forecasting the generated power and its harmonic content in a PV generation plant. An optimization technique based on the GA is used for parameterizing the models, and it proves to be effective even when the model has nonlinear terms. These models show how environmental factors affect the amount of generated power and its quality. Furthermore, the proposed methodology can estimate all the parameters of the models in a single trial. Since any variation in the background conditions of the PV system affects the harmonic contamination of the power grid, it is important to develop methodologies that can deal with this situation, and the proposed methodology proves to be a useful tool for this purpose. In comparison with other methodologies, such as the ANN, the proposed approach improves the results, since the GA can handle problems with complex features such as nonlinearity, nonconvexity, multioptimization, and a wide design search space, among others.
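The conclusions rest on using a GA to obtain all model weights in a single run. A minimal real-coded GA of that kind is sketched below: it minimizes the mean squared error between a model and measured data using tournament selection, arithmetic crossover, Gaussian mutation with a shrinking step, and elitism. The operators, population size, bounds, fitness function, and the model `hypothetical_model` (a placeholder with nonlinear terms in irradiance `g` and cell temperature `t`, not the paper's (4) or (6)) are all assumptions; the authors' GA configuration is not given in this section. In the piecewise approach, one such fit would be run per section.

```python
import numpy as np


def hypothetical_model(w, g, t):
    """Placeholder model with nonlinear terms (NOT the paper's eq. (4) or (6))."""
    return w[0] * g + w[1] * g**2 + w[2] * g * t


def mse(w, g, t, y):
    """Fitness to minimize: mean squared error between model output and data."""
    return float(np.mean((hypothetical_model(w, g, t) - y) ** 2))


def fit_ga(g, t, y, n_params=3, bounds=(-2.0, 2.0), pop_size=60,
           generations=300, mutation_sigma=0.2, elite=2, seed=0):
    """Real-coded GA returning (best weights, best MSE)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, n_params))
    for gen in range(generations):
        fitness = np.array([mse(ind, g, t, y) for ind in pop])
        pop = pop[np.argsort(fitness)]  # sort ascending: index 0 is the fittest
        next_pop = [pop[k].copy() for k in range(elite)]  # elitism: carry the best over unchanged
        sigma = mutation_sigma * (1.0 - gen / generations)  # shrink mutation over time
        while len(next_pop) < pop_size:
            # Tournament selection (size 3): pop is sorted, so the lowest index wins.
            p1 = pop[rng.integers(0, pop_size, size=3).min()]
            p2 = pop[rng.integers(0, pop_size, size=3).min()]
            alpha = rng.uniform(0.0, 1.0, size=n_params)
            child = alpha * p1 + (1.0 - alpha) * p2  # arithmetic crossover, per gene
            child = child + rng.normal(0.0, sigma, size=n_params)  # Gaussian mutation
            next_pop.append(np.clip(child, lo, hi))  # keep genes inside the bounds
        pop = np.array(next_pop)
    fitness = np.array([mse(ind, g, t, y) for ind in pop])
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])
```

Fitting noise-free synthetic data generated with weights `[0.8, -0.3, 0.5]` should drive the returned MSE close to zero. Because the GA is stochastic and some weight combinations are strongly correlated in this placeholder model, the recovered weights are only approximately equal to the generating ones; the fixed `seed` makes runs repeatable.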