An ANFIS-Based Modeling Comparison Study for Photovoltaic Power at Different Geographical Places in Mexico

Abstract: In this manuscript, distinct approaches were used to obtain the best electrical power estimation from photovoltaic systems located at different selected places in Mexico. Multiple Linear Regression (MLR) and Gradient Descent Optimization (GDO) were applied as statistical methods and compared against an Adaptive Neuro-Fuzzy Inference System (ANFIS) as an intelligent technique. The data gathered involved solar radiation, outside temperature, wind speed, daylight hour and photovoltaic power, collected from on-site real-time measurements at Mexico City and Hermosillo City, Sonora State. According to our results, all three methods achieved satisfactory performance, since low values were obtained for the convergence error. The GDO improved the MLR results, reducing the overall error percentage from 7.2% to 6.9% for Sonora and from 2.0% to 1.9% for Mexico City; nonetheless, ANFIS outperformed both statistical methods, achieving an error percentage of 5.8% for Sonora and 1.6% for Mexico City. The results demonstrate that intelligent systems improve on statistical techniques by achieving a lower mean absolute error.


Introduction
Considerable research has been developed internationally in the field of photovoltaic systems and power generation [1][2][3]. Mexico is a country that receives abundant solar energy, with the northwest region having the highest annual incidence of solar radiation, reaching radiation indexes between 5.6 and 6.2 kWh/m^2 per day; nevertheless, its advances in the photovoltaic field remain scarce, although there are many possibilities for research on this topic [4].
There are many scientific reports on statistical methods to estimate power generation [5][6][7][8]. Multiple Linear Regression (MLR) is applicable to virtually any study area, since it approximates the relation among variables [9].
A satisfactory photovoltaic power estimation involving meteorological variables was carried out in [10] in which a Gradient Descent Optimization (GDO) minimizes the error value between the real and estimated variables after many iterations; even so, it is mentioned that other techniques may obtain better results despite their lower or greater complexity.
On the other hand, intelligent techniques have acquired a worldwide reputation as simple methods to represent and replicate the behavior of a process with a not-so-understood performance. These techniques have potential to model, precisely, linear and non-linear processes, the latter being their strongest application. Some studies have used intelligent techniques to analyze photovoltaic power behavior, e.g., in [11] an artificial neural network (ANN) was used to obtain a model to forecast the photovoltaic energy; however, solar radiation was the only meteorological variable analyzed. In [12], different ANN techniques were used to perform a comparative study of systems predicting photovoltaic thermal energy data; nevertheless, due to solar radiation measurement devices being expensive and requiring periodic maintenance, some results employed global solar radiation (GSR) estimations through ANNs. In [13], it was established that a photovoltaic system is affected by many meteorological conditions; a predictive model was proposed considering outside temperature, solar radiation and direction and speed of wind. In [14], an ANN-Adaptive Neuro-Fuzzy Inference System (ANFIS)-based forecast model for predicting the photovoltaic generation and wind energy generation is presented and considers the susceptibilities to which renewable energies are exposed due to nature's vagaries. In [15], it is proved that an adaptive neuro-fuzzy inference system technique provides a reliable tool to estimate temperature from photovoltaic systems. In [16], the solar still productivity prediction aims to be improved by using an ANFIS due to its simple maintenance and ready affordability.
The importance of achieving a satisfactory model design with a minimum error between estimated and real values is crucial in precision studies or management tasks in which certain differences among them may result in economic problems or loss of information.
As can be seen, diverse statistical and intelligent methods have been applied to estimate photovoltaic power generation. However, this topic is still an open field that requires better estimation methods, in particular those related to intelligent systems.
Since it is relevant to develop new estimation strategies in photovoltaic systems, the aim of this work is to compare ANFIS methodology with MLR and GDO as statistical approaches by comparing electrical power estimation data from photovoltaic systems [10]. It was observed that all three methods achieve a satisfactory estimation performance, but ANFIS had a better estimation capacity.

Multiple Linear Regression
To create a statistically representative model of the power generated by solar energy, the concept of multiple linear regression was applied. The relation that exists among all the input variables and the output is represented by a "linear regression". Depending on the number of input variables, the regression can be simple or multiple [6,17,18,19]. The purpose of multiple linear regression is to find an estimation of the real output through an equation involving all the gathered data, as shown in Equation (1).
In Equation (1), y is the estimated output, x_k is the kth input variable, β_k is the characteristic coefficient of every variable and ε is the error between the model and the real data.
To obtain Equation (1), a matrix "X" and a vector "y" are generated. The number of columns for "X" is k + 1, with the condition that the first column is filled with ones. The dimension of the columns and the vector "y" depends on the total quantity of data "n". Consequently, Equation (2) is applied to find the respective coefficient values.
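As a point of reference, the textbook forms of the multiple linear regression model and of its least-squares coefficient estimate, written to match the symbol definitions above, are sketched below. The symbol ε for the error term and the hat on β are notational assumptions of this sketch.

```latex
\begin{align}
  y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \tag{1} \\
  \hat{\beta} &= \left( X^{\top} X \right)^{-1} X^{\top} y \tag{2}
\end{align}
```

Here X has n rows and k + 1 columns, its first column being filled with ones so that β_0 acts as the constant term.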
The data collected from Hermosillo, Sonora (located in the northwest region of Mexico) were obtained with the support of the University of Sonora (UNISON), while the data from Mexico City were issued by the Centro de Investigación y de Estudios Avanzados (CINVESTAV) campus Zacatenco. Solar radiation, outside temperature, wind speed and daylight hour (time) served as the input meteorological variables, while the electric power was the output for both of the PV systems. Each data sample was registered every 5 min. According to the above, the matrix "X" and the vector "y" from Equation (2) are shown in Equation (3).
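The construction of the matrix "X" and the solution for the coefficients can be illustrated with a minimal Python sketch. It assumes NumPy and synthetic, noise-free data; the function names `fit_mlr` and `predict_mlr`, the coefficient values and the sample count are hypothetical and are not taken from the measured data sets. The call `numpy.linalg.lstsq` solves the same least-squares problem as the closed-form expression, but without forming an explicit matrix inverse.

```python
import numpy as np

def fit_mlr(inputs, power):
    """Return least-squares coefficients [b0, b1, ..., bk] for y = b0 + sum(b_j * x_j)."""
    n_samples = inputs.shape[0]
    # Matrix X: first column filled with ones, then one column per input variable.
    X = np.column_stack([np.ones(n_samples), inputs])
    beta, *_ = np.linalg.lstsq(X, power, rcond=None)
    return beta

def predict_mlr(beta, inputs):
    """Evaluate the fitted linear model on new input rows."""
    return beta[0] + inputs @ beta[1:]

# Synthetic example (hypothetical values): four inputs standing in for
# radiation, temperature, wind speed and daylight hour, 288 samples
# (one day at a 5 min time step).
rng = np.random.default_rng(0)
inputs = rng.uniform(0.0, 1.0, size=(288, 4))
true_beta = np.array([0.1, 50.0, 2.0, 0.1, 0.05])
power = true_beta[0] + inputs @ true_beta[1:]

beta = fit_mlr(inputs, power)
estimated = predict_mlr(beta, inputs)
```

Because the synthetic output is an exact linear combination of the inputs, the fitted coefficients reproduce `true_beta`; with measured data a residual error remains.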

Gradient Descent Optimization
The GDO has been used to estimate different behaviors in several studies [20,21]. As seen in [10], this method seeks the optimum β_k coefficients, now symbolized by θ. To achieve the minimum possible error value, the cost function in Equation (4) is used.
Here, θ = β as mentioned in Equation (1), x^(i) represents the ith row of matrix "X" and y^(i) is the value of the ith row of vector "y", both described in Equation (2). It is important to notice that, in this case, the number of columns in matrix "X" is equal to the number of variables involved, omitting the column full of ones. The gradient descent, denoted by Equation (5), aims to converge to the minimum of the cost function by following its partial derivatives. The speed of convergence is set by the step size α.
Equation (6) represents the substitution of Equation (4) into Equation (5) and is known as the gradient descent implementation of the linear regression method, which has to be repeated n times until convergence is reached.
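For orientation, the usual textbook forms of the cost function, the gradient descent update and their combination for linear regression are sketched below. The 1/(2n) scaling, the coefficient index j and the use of n for the number of samples are assumptions of this sketch.

```latex
\begin{align}
  J(\theta) &= \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^{2} \tag{4} \\
  \theta_j &:= \theta_j - \alpha \, \frac{\partial J(\theta)}{\partial \theta_j} \tag{5} \\
  \theta_j &:= \theta_j - \frac{\alpha}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} \tag{6}
\end{align}
```

Differentiating (4) with respect to θ_j cancels the factor 2 and leaves the sum in (6), which is why the 1/(2n) scaling is a common convention.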
Once all the coefficients have been computed, they are substituted into Equation (7), where h_θ(x) is the estimated output, x_k is the kth input variable, θ_k is the characteristic coefficient for every variable and ε is the error between the model and the real data.
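The iterative procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: it assumes NumPy, synthetic noise-free data, a half mean squared error cost, a hypothetical step size of 0.5, and it keeps a column of ones so that a constant term is fitted. The names `cost` and `gradient_descent` are illustrative.

```python
import numpy as np

def cost(theta, X, y):
    """Half mean squared error between the linear model X @ theta and y."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def gradient_descent(X, y, alpha=0.5, iterations=1000):
    """Batch gradient descent for linear regression, starting from theta = 0."""
    n_samples, n_coeffs = X.shape
    theta = np.zeros(n_coeffs)
    for _ in range(iterations):
        # Partial derivatives of the cost with respect to every coefficient.
        gradient = X.T @ (X @ theta - y) / n_samples
        # Simultaneous update of all coefficients, scaled by the step size alpha.
        theta = theta - alpha * gradient
    return theta

# Synthetic data (hypothetical values, not the measured data sets).
rng = np.random.default_rng(1)
features = rng.uniform(0.0, 1.0, size=(288, 4))
X = np.column_stack([np.ones(len(features)), features])
true_theta = np.array([0.1, 50.0, 2.0, 0.1, 0.05])
y = X @ true_theta

theta = gradient_descent(X, y)
initial_cost = cost(np.zeros(X.shape[1]), X, y)
final_cost = cost(theta, X, y)
```

The step size matters in practice: a value that is too large makes the iteration diverge, while a value that is too small requires many more repetitions. Inputs with very different magnitudes are usually rescaled first for the same reason.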

Intelligent Technique
An intelligent technique consists of a dynamic learning process in order to generate an output as close as possible to the required one. Among the best intelligent systems are the artificial neural network, fuzzy systems and the adaptive neuro-fuzzy inference system [22,23]. For this contribution, an ANFIS is considered combining the strengths of a neural network and fuzzy systems, obtaining a better performance.
An artificial neural network (ANN) imitates the processing of a human brain and it has a great number of processing units (neurons or nodes) working in parallel. These nodes can be circular or squared in order to represent different adaptive capabilities. A circular node employs fixed operations that cannot be altered at any time (sum or product of inputs), unlike a squared node, which can be modified by the user (activation functions).
All the neurons are highly connected among each other through links (synapse) with weights. The network has a layer for all the inputs, a layer for one or more outputs and one or more hidden layers between the input and the output.
A neural network requires previous assumptions so it can learn from examples by adjusting the connection weights. The learning may be supervised if the right output is specified (being the case for this work) or non-supervised if it has to explore the relations between the patterns and learn to categorize the inputs [11,24].
A fuzzy system is able to apply conditional sentences as a human brain would. The objective of every system that uses fuzzy logic is to describe the degrees of the output sentences (given by a series of rules) according to the input ones. The "if-then" fuzzy rules are sentences with the form "if an event A happens then an event B will occur", where both events are known as the labels of fuzzy sets from their corresponding membership functions (MF). The strength of this system is its ability to capture imprecise modes of reasoning, for example: "If the pressure is high, then the volume is small". Analogous to the neural network, each variable has a specific type of membership function according to its behavior, which can be triangular, trapezoidal, gaussian, etc., as well as a specific number of them [25]. Figure 1 shows three trapezoidal membership functions for the variable called "pressure". Each of them represents a range of values labeled as low (blue), medium (red) and high (yellow). This allows all the possible data to be covered and classified as necessary.
ANFIS is the union of both methods mentioned above, the neural network and fuzzy systems, in which the advantages of both cooperate in order to achieve easier and faster estimations [26].
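The trapezoidal membership functions described for the variable "pressure" can be sketched as follows. This is an illustrative Python example assuming NumPy; the breakpoints of the three sets and the 0 to 100 pressure range are hypothetical, since the text gives no numeric values for Figure 1.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between.

    Requires a < b <= c < d.
    """
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a)    # left slope, reaches 1 at x = b
    falling = (d - x) / (d - c)   # right slope, reaches 0 at x = d
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

# Hypothetical breakpoints for a pressure scale from 0 to 100.
PRESSURE_SETS = {
    "low": (-10.0, 0.0, 20.0, 40.0),
    "medium": (20.0, 40.0, 60.0, 80.0),
    "high": (60.0, 80.0, 100.0, 110.0),
}

def fuzzify(pressure):
    """Return the membership degree of one pressure value in every labeled set."""
    return {label: float(trapezoid(pressure, *params))
            for label, params in PRESSURE_SETS.items()}

grades = fuzzify(30.0)
```

With these breakpoints a pressure of 30 belongs partly to "low" and partly to "medium", which is the kind of graded classification that a crisp threshold cannot express.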
The ANFIS consists of if-then rules and input-output pairs. Moreover, for the training, the learning algorithms of neural networks are used. In order to simplify the explanations, a basic fuzzy inference system consists of two inputs (x and y) and one output (z) [27]. Figure 2 shows that, if each input consists of three membership functions (represented as squared nodes, because they can be modified by the user), then the input layer would have six neurons (three for each input) and, in turn, the output layer would just have one neuron. As for the hidden layers, the first one of them will be formed by each of the if-then fuzzy rules mentioned before; by this, an ANFIS is created.
The variables used to train the ANFIS are radiation, outside temperature, wind speed, daylight hour and electrical power, as mentioned in Section 2.1.1. Figure 3 presents the MF of input variables for the intelligent system. The behavior of each variable helps to identify the type of MF to be declared. Solar radiation, wind speed and daylight hour, due to their rapid change with respect to time, have 4, 3 and 3 triangular MF, respectively. As for temperature, which presents a slow variation with time, it has 3 gaussian MF. According to the above, the structure of the ANFIS is presented in Figure 4.
The method applied to train the ANFIS is called hybrid training. This process consists of two steps; the first one computes the result of the next linked node according to the nodes behind, i.e., based on Figure 2 each circular node depends on two rectangular nodes (MF of variables x and y). Once all the nodes have been calculated and the output is obtained, the second step must identify the error in each node and minimize it during the training iterations in order to improve the estimation result. The first and second steps are based on least squares and gradient descent, respectively.
This ANFIS model was applied for both locations by using the MATLAB® software in order to prove its effectiveness against the statistical methods, regardless of the place where the system is installed.
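To make the two-input structure and the least-squares step of hybrid training more concrete, the following Python sketch builds a small first-order Sugeno-type system with three membership functions per input (nine rules) and fits only the rule output parameters by least squares. It is a simplified illustration under several assumptions and not the MATLAB model used in this work: gaussian MF with fixed, hypothetical centers and width are used for both inputs, the product is taken as the rule operator, the gradient descent step on the MF parameters is omitted, and the training target is a synthetic function.

```python
import numpy as np

def gaussian_mf(values, centers, sigma):
    """Membership degree of each value in each gaussian set, shape (n, n_sets)."""
    return np.exp(-0.5 * ((values[:, None] - centers[None, :]) / sigma) ** 2)

def normalized_firing(x, y, centers_x, centers_y, sigma):
    """Normalized firing strength of every rule (one rule per MF pair), shape (n, 9)."""
    mu_x = gaussian_mf(x, centers_x, sigma)
    mu_y = gaussian_mf(y, centers_y, sigma)
    strength = (mu_x[:, :, None] * mu_y[:, None, :]).reshape(len(x), -1)
    return strength / strength.sum(axis=1, keepdims=True)

def design_matrix(x, y, w_norm):
    """Each rule contributes w_norm * [x, y, 1]; result has shape (n, 3 * n_rules)."""
    regressors = np.stack([x, y, np.ones_like(x)], axis=1)
    return (w_norm[:, :, None] * regressors[:, None, :]).reshape(len(x), -1)

def fit_consequents(x, y, z, centers_x, centers_y, sigma):
    """Least-squares estimate of the linear output parameters of all rules."""
    A = design_matrix(x, y, normalized_firing(x, y, centers_x, centers_y, sigma))
    params, *_ = np.linalg.lstsq(A, z, rcond=None)
    return params

def predict(x, y, params, centers_x, centers_y, sigma):
    """System output: weighted sum of the linear rule outputs."""
    A = design_matrix(x, y, normalized_firing(x, y, centers_x, centers_y, sigma))
    return A @ params

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic non-linear target on [0, 1] x [0, 1] (hypothetical, for illustration only).
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 400)
y = rng.uniform(0.0, 1.0, 400)
z = np.sin(3.0 * x) * y

centers = np.array([0.0, 0.5, 1.0])   # three MF per input, fixed in this sketch
sigma = 0.25
w_norm = normalized_firing(x, y, centers, centers, sigma)
params = fit_consequents(x, y, z, centers, centers, sigma)
rmse_fuzzy = rmse(predict(x, y, params, centers, centers, sigma), z)

# Plain linear regression on the same data, for comparison.
X_lin = np.stack([x, y, np.ones_like(x)], axis=1)
beta, *_ = np.linalg.lstsq(X_lin, z, rcond=None)
rmse_linear = rmse(X_lin @ beta, z)
```

Since the normalized firing strengths sum to one, setting all rules to the same linear output recovers plain linear regression, so the training error of this structure cannot exceed that of the linear fit on the same data. A full hybrid scheme would alternate this least-squares step with gradient descent updates of the MF parameters.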

Results
Seven different cases or sets of data were computed to analyze which estimation technique delivers the better performance for the Hermosillo site (HS) and the Mexico City site (MCS). Each of the first six cases corresponds to data collected during one whole month, so that together they cover six months of collected data, while the seventh case stands for the accumulation of all data. According to Section 2.1.1, the time step of each estimation, regardless of the method, was five minutes. This time step was chosen to generate a greater amount of data, allowing a better estimation result by the ANFIS. Figures 5 and 6 present the physical systems for each location, HS and MCS, respectively.
Table 1 shows the resulting equations for the monthly and overall electrical power estimation by MLR and GDO with 1000 repetitions to minimize Equation (4), considering the data gathered from HS as mentioned in Sections 1 and 2, and by using the MATLAB® software. Figures 7 and 8 present the resulting estimation for each method over a randomly chosen period (24th November to 24th December), to achieve a reliable comparison and a better appreciation of the behavior. The computational load for the overall case using MLR and GDO resulted in total durations of approximately 30 s and 10 min, respectively.
Table 2 displays the monthly and overall trained parameters applying the ANFIS technique with the structure specified in Section 2.2 by using the MATLAB® software, with a hybrid mode and 1000 iterations. Finally, Figure 9 shows the resulting estimation within the same time range as for MLR and GDO. The computational load for the overall case using this method resulted in a total training duration of approximately 1.5 h. Each case analyzed covers a different range of time (a different month) and its respective ANFIS is trained according to the data gathered within that period.
Given that meteorological inputs vary in time, so do the ANFIS coefficients for every case; nevertheless, even if they are changing, the variation range is not too wide, as can be seen from solar radiation, outside temperature and daylight hour. As for wind speed, its variation is due to a smaller relationship with the output than with the other inputs, as well as to the different behaviors in each month.

Mexico City Site
Analogous to the HS case, the same procedure was applied to the data registered from the Mexico City site (MCS) mentioned in Sections 1 and 2, achieving approximately the same time duration to find the beta and theta parameters. For the MLR and GDO methods, Table 3 presents the monthly and overall estimation equations. Figures 10 and 11 depict the resulting MLR and GDO estimation in the period 24th November to 24th December (same as HS) for a better appreciation of the convergence of the method.

Table 3. Monthly and overall estimation equations for MCS (GDO rows).
Month   Equation
Jul     y = 0.0927 + 53.4273 x_1 + 2.2298 x_2 + 0.1005 x_3 + 0.0491 x_4
Aug     y = 0.1083 + 56.0178 x_1 + 2.5973 x_2 + 0.1441 x_3 + 0.0580 x_4
Sep     y = 0.1159 + 59.0818 x_1 + 2.7233 x_2 + 0.1431 x_3 + 0.0625 x_4
Oct     y = 0.1035 + 57.3927 x_1 + 2.2950 x_2 + 0.1899 x_3 + 0.0560 x_4
Nov     y = 0.0908 + 57.2318 x_1 + 1.9788 x_2 + 0.1036 x_3 + 0.0463 x_4
Dec     y = 0.0913 + 53.3033 x_1 + 1.8920 x_2 + 0.0441 x_3 + 0.0477 x_4
Total   y = 0.1005 + 55.9815 x_1 + 2.2910 x_2 + 0.1290 x_3 + 0.0533 x_4

The same ANFIS structure with 1000 epoch hybrid training for the HS case was employed to obtain the estimation model of the photovoltaic system with the data collected from MCS. Table 4 shows the monthly and overall trained MF parameters. Figure 12 displays the comparison between the resulting estimation against real data with the same time range as for MLR and GDO. The time duration to train the neuro-fuzzy system was approximately the same as in the HS case.

Error Analysis
In order to demonstrate the impact of the intelligent technique over statistical approaches, an error analysis was applied for every result obtained in Sections 3.1 and 3.2 for HS and MCS. Considering [10], Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) were used to achieve a solid comparison between MLR, GDO and ANFIS. MAE is one of the most widely used error metrics in estimation studies to measure and analyze precision and is described by Equation (8), and MAPE computes the percentage error value as in Equation (9).
In Equations (8) and (9), "s" is the sample taken into consideration, "N" is the total amount of data samples gathered, "P_m" is the measured photovoltaic electrical power and "P_e" is the estimated value. Table 5 presents the monthly and overall MAE and MAPE for each applied technique in HS. From the analysis of these results, an improvement of GDO over MLR can be observed, since it achieves a lower error value; nonetheless, the ANFIS outperforms both MLR and GDO, obtaining lower error values for every time range. By increasing "N" in Equations (8) and (9) from the first to the last data sample, Figure 13 displays graphically the overall MAE and MAPE behavior of each implemented technique, showing a clearly better performance by ANFIS against the statistical methods.
Analogous to Figure 13, Figure 14 displays graphically the overall MAE and MAPE behavior of each implemented technique for MCS, showing as well an improved performance by ANFIS compared to the statistical methods.
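The two error metrics, and the running evaluation obtained by increasing "N" from the first to the last sample, can be sketched in Python as follows. The sketch assumes NumPy and the standard definitions, MAE = (1/N) Σ |P_m(s) − P_e(s)| and MAPE = (100/N) Σ |P_m(s) − P_e(s)| / |P_m(s)|; skipping samples whose measured power is zero in the MAPE is an assumption made here to avoid division by zero (e.g., at night) and may differ from the treatment used for the reported values. The sample numbers are hypothetical.

```python
import numpy as np

def mean_absolute_error(measured, estimated):
    """MAE in the same unit as the power values (e.g., W)."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.mean(np.abs(measured - estimated)))

def mean_absolute_percentage_error(measured, estimated):
    """MAPE in percent; samples with zero measured power are skipped (assumption)."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    nonzero = measured != 0
    relative = np.abs((measured[nonzero] - estimated[nonzero]) / measured[nonzero])
    return float(100.0 * np.mean(relative))

def running_mae(measured, estimated):
    """MAE computed over the first N samples, for N = 1 ... total number of samples."""
    errors = np.abs(np.asarray(measured, dtype=float) - np.asarray(estimated, dtype=float))
    return np.cumsum(errors) / np.arange(1, len(errors) + 1)

# Hypothetical measured and estimated power values in W.
p_measured = [100.0, 200.0, 400.0]
p_estimated = [110.0, 180.0, 400.0]

mae = mean_absolute_error(p_measured, p_estimated)
mape = mean_absolute_percentage_error(p_measured, p_estimated)
mae_curve = running_mae(p_measured, p_estimated)
```

MAE keeps the unit of the measured power, so values from sites of different capacity are not directly comparable, whereas MAPE is relative to the measured value at each sample.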

Conclusions
An ANFIS was used to estimate the photovoltaic electrical power generated by solar energy at two different locations, Hermosillo and Mexico City. The intelligent technique demonstrated a better performance than the MLR and GDO statistical methods. The reported results showed that all three methods achieved a satisfactory estimation performance when compared against real measured values; nonetheless, the ANFIS clearly improved on the other methods employed. For HS, the overall MAE values for GDO and MLR were 200.1 W and 209.3 W, and the overall MAPE values were 6.9% and 7.2%, respectively. For MCS, the overall MAE values for GDO and MLR were 1130.4 W and 1142.2 W, and the overall MAPE values were 1.9% and 2.0%, respectively. Consequently, GDO produced better results than MLR overall and in almost every monthly case; however, ANFIS outperformed both MLR and GDO in every case, achieving overall MAE and MAPE values of 169.2 W and 5.8% for HS and 924.7 W and 1.6% for MCS.
As outlined in Section 3, even when the ANFIS computational time is considerably greater than the one involved in statistical methods, the neuro-fuzzy result displays a better performance. It should be mentioned that, although the intelligent method took approximately 90 min to be trained, it is quite an acceptable amount of time and can even be considered fast in terms of neuro-fuzzy models with months of data.
Over the course of a year, the times of dusk and dawn shift by minutes, resulting in different times to end and begin each day; as outlined in Section 3.1, for every transition day, dusk and dawn times may vary due to external variables, e.g., cloudiness. These issues generate discrepancies between the estimated and the real data, as seen in the results section, regardless of the applied method; nonetheless, these may be minimized by gathering a greater amount of data to cover the range of changing time values.
The ANFIS proved to have a better performance than the conventional statistical methods, demonstrating that this kind of intelligent system is a potential tool to be considered for power estimation in Mexico.