Meteorological Variables’ Influence on Electric Power Generation for Photovoltaic Systems Located at Different Geographical Zones in Mexico

Abstract: In this study, the relation between different meteorological variables and the electrical power from photovoltaic systems located at selected places in Mexico is presented. The data were collected from on-site real-time measurements in Mexico City and the State of Sonora. A statistical estimation by the gradient descent method demonstrated that solar radiation, outdoor temperature, wind speed, and daylight hour influence the electric power generation when the estimate is compared with the real power of each photovoltaic system. According to our results, 97.63% of the estimation results matched the real data for Sonora and 99.66% matched for Mexico City, achieving overall errors of less than 7% and 2%, respectively. The results showed an acceptable performance, since a satisfactory estimation error was achieved for the estimation of photovoltaic power with a high determination coefficient $R^2$.


Introduction
Renewable energies represent a potential alternative in the transition towards a low-carbon society, where photovoltaic sources play a key role; however, consumers are also investors, and a project is implemented only if the economic conditions are satisfied [1].
Solar technologies are characterized by the way they capture, convert, and distribute sunlight, such as photovoltaic (PV) systems and their corresponding requirements for an energy storage arrangement [2,3]. Consequently, these technologies feed power to the electric grid by using solar panels as generators [4][5][6]. In addition, concentrating solar power (CSP) plants use mirrors to focus the energy from the sun to drive traditional steam turbines or engines that create electricity, while solar heating and cooling (SHC) systems collect the thermal energy from the sun and use this heat to provide hot water, space heating, and cooling for residential, commercial, and industrial applications. These technologies displace the need to use electricity or natural gas [7][8][9].
A photovoltaic system is composed of several components: the solar panel and the inverter for grid-connected systems and, additionally, energy storage for stand-alone systems. The fabrication of the solar panel involves diverse stages. The first step is to define the type of panel; the best-known types are monocrystalline and polycrystalline, in which the solar cells are manufactured using wafers made of silicon [10]. Nowadays, PV systems are widely used to generate electricity due to their accessible cost [5,6,11,12]. Some studies have aimed at reducing this cost even more; e.g., in Reference [13], a design optimization model for residential PV systems in South Korea was proposed, where the objective function to be minimized consisted of three costs: the monthly electric bill, the PV-related construction cost, and the PV-related maintenance cost.
Solar radiation (photons) is responsible for the photovoltaic effect; nonetheless, some weather factors affect the amount of energy generated even under optimum radiation. On a cloudy day, clouds shade the solar panel while it captures photons, so fewer of these particles reach it and less electric power is produced. It is well known that clouds consist of water vapor concentrated in the air; in other words, humidity and temperature work together, and such a relation is an example of how meteorological variables impact the generation of electric power [14,15].
Solar energy, alongside wind energy, is currently the most resourceful renewable source worldwide [16,17]. Harnessing it, unlike many other sources currently in use, does not harm the environment, and its resources exceed, by far, those of every other source [12,18,19,20]. Some countries such as Germany, Italy, Spain, the United States of America, and China are ahead in solar energy research; meanwhile, Mexico, despite being a country rich in solar radiation, with a great territorial extension, and having some solar energy studies, still lacks the necessary research compared to the countries leading solar technology [4,5,6,11,12,21].
Figure 1 depicts the behavior of the horizontal solar radiation across the world. If Mexico is compared with Europe, it can be seen that the only European country with a notable radiation incidence is Spain, achieving a maximum value between 4.8 and 5.4 kWh/m² [22]. Mexico exceeds Spain both in the extent of territory receiving radiation and in radiation intensity, with an average between 5.6 and 6.2 kWh/m², as can be seen in Figure 2, exceeding even China, which mostly shows values of 4.6 kWh/m².
Table 1 shows the data compilation from 2010 describing the global-horizontal solar radiation for some locations in Mexico. The total irradiation provided by solar energy in the year 2018 was summarized in Reference [25].
Both reports agree that the Northwest of Mexico reached the maximum values, the states of Sonora, Baja California, Coahuila, and Chihuahua being the main producers/receivers.
Recently, diverse reports have studied the relations between some meteorological variables and the electric power generated in a photovoltaic system. In Reference [26], an analysis of the climatic factors and solar data from the Andes site was performed; nonetheless, all the data gathered for the study were averaged every 10 min and the power measurements were estimated. In Reference [27], the effect of diverse meteorological variables (outdoor temperature, air pressure, humidity, wind speed, and solar radiation) on the generated energy is mentioned, although a representative model of the plant was not obtained and all the data were experimentally generated. In Reference [28], a statistical method was applied to forecast the energy generated by a solar plant; however, a database of 30 plants from different locations is necessary, real power measurements from the site are not available, and the computational load is heavy. In Reference [29], an artificial neural network (ANN) is used to obtain a model to forecast the photovoltaic energy; however, the solar radiation was the only meteorological variable analyzed.
Higher Education Institutions (HEI) currently play a major role in the generation of human capital and the associated impact on societal development; HEIs are ideal locations to focus the resources in terms of the deployment and experimentation of decarbonization technologies to demonstrate the best practice for a further replication within wider society [30].
Careful planning is required to manage the future electricity demand placed on PV systems, which is expected to increase in Mexico [31,32]. Therefore, it is vital to understand the influence of meteorological variables on energy consumption, since a better understanding can contribute to a more useful strategy for meeting the country's energy efficiency goal.
According to the above and considering References [1,13,30], the aim of this work is to present a statistical analysis based on the gradient descent method that is easy to implement and has a low computational load to estimate the electric power generation from the meteorological data such as the solar radiation, outdoor temperature, wind speed, and daylight time collected from PV systems located in Mexico City and Sonora [33]. This is important because most of the current photovoltaic system deployments do not monitor these factors or employ them in adaptation and prediction tasks [34][35][36].
The proposed methodology achieves a satisfactory estimation of the PV power with a high determination coefficient and a fair error percentage value.

Statistical Method
The amount of data collected and the correlation among them are quite important to accomplish an estimation. If the total data is scarce or the correlation between any input and the output is low, the estimation will not be satisfactory.
Correlation analysis is one of the most widely used and reported statistical methods in scientific and medical research; its visual representation is known as a dispersion graphic (scatter plot). It is used to confirm or reject the existence of a relation between two different variables based on the Pearson correlation coefficient described by Equation (1).
$$R = \frac{E(ab)}{\sqrt{\sigma_a^2\,\sigma_b^2}} \tag{1}$$
where $E(ab)$ is the cross-correlation between $a$ and $b$, and $\sigma_a^2 = E(a^2)$ and $\sigma_b^2 = E(b^2)$ are the variances of the variables $a$ and $b$, respectively [38].
If "a" and "b" are the two variables under consideration, then a dispersion graphic shows the location of each ordered pair in a coordinate system. If most points appear to be close to a straight line, the correlation is known as linear. If most points appear to be close to a curve, the correlation is known as nonlinear. On the other hand, if there is no clear pattern among the ordered pairs, then there is no relation between the two variables [39]. In order to represent the curve (curve or line regression) that best fits the behavior of the ordered pairs, Equation (2) is used.
The correlation coefficient R has both a magnitude and a direction, which is either positive (from 0 to 1) or negative (from −1 to 0). The closer R gets to ±1, the stronger the correlation. The strength of the correlation does not depend on the direction or the sign: a correlation of 0.57 is as strong as one of −0.57.
In other words, the greater the absolute value of R, the greater the correlation. The determination coefficient $R^2$ is defined as the percentage of the variation of the dependent variable that can be explained by variations of the independent variable. For example, a determination coefficient $R^2 = 0.23$ means that 23% of the variation of the dependent variable is attributable to changes of the independent variable. Therefore, if a correlation factor of R = 0.20 were found between two variables, the determination coefficient would be $R^2 = 0.04$, so that only 4% of the variation of the dependent variable is explained by the variations of the independent variable [40].
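The relation between R and $R^2$ described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pearson_r` and the sample values are hypothetical, and both variables are centered on their means before the expectations are evaluated, which is an assumption made here because the variance equals $E(a^2)$ only for zero-mean variables.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Center both variables (assumption: zero-mean form of the expectations).
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    cross = sum(x * y for x, y in zip(da, db))   # proportional to E(ab)
    var_a = sum(x * x for x in da)               # proportional to E(a^2)
    var_b = sum(y * y for y in db)               # proportional to E(b^2)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cross / math.sqrt(var_a * var_b)


# Illustrative values only (not measurements from the study).
radiation = [0.0, 150.0, 420.0, 680.0, 910.0, 760.0, 300.0]   # W/m^2
power = [0.0, 310.0, 980.0, 1650.0, 2240.0, 1830.0, 700.0]    # W

r = pearson_r(radiation, power)
r_squared = r ** 2   # determination coefficient
```

Squaring R discards the sign, which is why correlations of 0.57 and −0.57 give the same determination coefficient.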
In order to create a representative statistical model of the power generated by solar energy, gradient descent optimization (GDO) was considered because of its ability to minimize the model error by the LSR of a linear regression model, which is often used in estimation studies, and because it is not as complex to implement as an intelligent technique [29,41,42].
This optimization method has multiple applications. In Reference [43], this method was used to optimize the geometry of the strut-and-tie truss to minimize the difference in the share of resisting actions with respect to the prediction of the multi-action shear model. In Reference [44], a model based on gradient descent is proposed that integrates several important parameters for ranking channels in order to select the best communication channel from a radio spectrum for transmission, enhanced by cognitive radio as an intelligent wireless solution.
The GDO is based on linear regression, which models the relation between all the input variables and the output variable. Depending on the number of input variables, the regression can be simple or multiple [45][46][47]. To organize all the gathered data, every input variable is considered as a column in a matrix called "X" and the output parameter is considered as a vector "y", as seen in Equation (3).
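As an illustration of this arrangement, the data layout can be written as follows. This is a sketch consistent with the description above rather than a reproduction of the published equation; the symbols $m$ (number of samples) and $K$ (number of input variables) are notation assumed here.

```latex
X =
\begin{bmatrix}
x_1^{(1)} & x_2^{(1)} & \cdots & x_K^{(1)} \\
x_1^{(2)} & x_2^{(2)} & \cdots & x_K^{(2)} \\
\vdots    & \vdots    & \ddots & \vdots    \\
x_1^{(m)} & x_2^{(m)} & \cdots & x_K^{(m)}
\end{bmatrix},
\qquad
y =
\begin{bmatrix}
y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)}
\end{bmatrix}
```

Each column of $X$ holds one input variable and each row holds one sample, so row $i$ of $X$ is paired with entry $i$ of $y$.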
The purpose of GDO is to find an estimation of the real output through an equation involving all the collected data, as shown in Equation (4).
$$h_\theta(x) = \theta_0 + \sum_{k=1}^{K} \theta_k x_k + \varepsilon \tag{4}$$
where $h_\theta(x)$ is the estimated output, $x_k$ is the $k$th of the $K$ input variables, $\theta_k$ is the characteristic coefficient of every variable, and $\varepsilon$ is the error between the model and the real data [48]. As can be seen from Equation (4), $\theta_0$ does not weight any input; however, it does locate the resulting estimation graphic on the Y-axis. If $\theta_0$ is greater than expected, the plot will have a wide gap against the real data at the lower values; conversely, if $\theta_0$ is smaller than necessary, the gap between the real and the estimated model will be large in the upper zone. Both cases therefore increase the error.
In order to achieve the minimum error value, and instead of a linear regression which uses the least squares method, the GDO finds the right $\theta_0$ by recursive partial derivatives of the cost function described by Equation (5), where $m$ is the total number of rows in the matrix, $x^{(i)}$ represents the $i$th row of "X", and $y^{(i)}$ is the value of the $i$th row of "y". The gradient descent, denoted by Equation (6), aims to converge to the minimum of the cost function through its partial derivative. The speed of the convergence is given by $\alpha$.
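For reference, a common textbook form of the cost function and of the gradient descent update that matches the definitions above is sketched below. The $1/(2m)$ normalization, the index $K$ for the number of inputs, and the convention $x_0^{(i)} = 1$ for the intercept term are assumptions made here, so the published equations may differ in these details.

```latex
% Cost function (assumed mean-squared-error form)
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2

% Gradient descent update with learning rate \alpha
\theta_k := \theta_k - \alpha \, \frac{\partial J(\theta)}{\partial \theta_k}

% Update after evaluating the partial derivative, with x_0^{(i)} = 1
\theta_k := \theta_k - \frac{\alpha}{m} \sum_{i=1}^{m}
    \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_k^{(i)},
\qquad k = 0, 1, \dots, K
```

A larger $\alpha$ speeds up convergence, but too large a value makes the iteration diverge, so in practice it is tuned to the scale of the input data.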
Equation (7) shows the substitution of Equation (5) into Equation (6), which is repeated n times until convergence is reached.
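The iteration can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB routine used in the study: it assumes the usual mean-squared-error cost with a $1/(2m)$ factor, and the function name `gradient_descent`, the learning rate, the iteration count, and the synthetic data are all hypothetical.

```python
def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Fit h(x) = theta_0 + sum_k theta_k * x_k by batch gradient descent.

    X is a list of m rows (one list of inputs per sample); y holds m outputs.
    """
    m = len(X)
    n_inputs = len(X[0])
    theta = [0.0] * (n_inputs + 1)     # theta[0] is the intercept
    for _ in range(iterations):
        # Residuals h(x_i) - y_i for the current coefficients.
        errors = [
            theta[0] + sum(t * x for t, x in zip(theta[1:], row)) - target
            for row, target in zip(X, y)
        ]
        # Partial derivatives of the assumed cost J = (1/2m) * sum(errors^2).
        grad = [sum(errors) / m]
        for k in range(n_inputs):
            grad.append(sum(e * row[k] for e, row in zip(errors, X)) / m)
        # Simultaneous update of every coefficient.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Synthetic, noise-free data generated from y = 2 + 3*x1 - x2 (illustrative only).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2], [0.2, 0.9]]
y = [2.0 + 3.0 * x1 - x2 for x1, x2 in X]

theta = gradient_descent(X, y)   # approaches [2.0, 3.0, -1.0]
```

The inputs here already lie between 0 and 1. With raw meteorological data on very different scales (for example W/m² against m/s), rescaling each column first generally helps the same learning rate work for every coefficient.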

Solar Data
The data from the State of Sonora, specifically the city of Hermosillo, were obtained with the support of the University of Sonora (UNISON), while the data from Mexico City were provided by the Centro de Investigación y de Estudios Avanzados (CINVESTAV), Zacatenco campus. For both PV systems, the solar radiation (global-horizontal), outside temperature, wind speed, and daylight hour (time) are the input meteorological variables and the electric power is the output. Since the two solar plants are at different locations and do not have the same arrangement, their output power magnitudes differ: the Hermosillo site has a maximum electric power of around 2500 W, while the Mexico City site reaches values around 45 kW.
The matrix "X" and the vector "y" from Equation (3) are formed from these data, with the four meteorological variables as the columns of "X" and the electric power as "y". By using the same variables from both locations, the aim is to achieve an acceptable statistical estimation of the electric power even if those variables are gathered from distant geographic areas.
In Tables 2 and 3, the monthly and total results of the characteristic equations for each fitted curve and their determination coefficients are described, respectively.
Equation (7) is applied in order to calculate the values of $\theta_k$ for Hermosillo by using the MATLAB® software. The optimal values found for $\theta$ are listed in Table 4 and substituted into Equation (4), thereby obtaining the mathematical representation given by Equation (9). Figure 8 is obtained by feeding six months of gathered data into Equation (9). It describes the behavior of the statistical estimation (red) against the real data (green) obtained by the PV system in Hermosillo. For a better appreciation of these results, Figure 9 shows a close-up from 24 November to 24 December 2018. The estimate practically coincides with the rise and fall times of each day and satisfactorily reaches the maximum and minimum values.
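Substituting a set of coefficients into the linear model can be sketched as follows. The function name `estimate_power` and every numeric value are hypothetical and chosen only for illustration; they are not the coefficients reported in Table 4.

```python
def estimate_power(theta, radiation, temperature, wind_speed, hour):
    """Evaluate the linear model of Equation (4) for one sample."""
    inputs = (radiation, temperature, wind_speed, hour)
    if len(theta) != len(inputs) + 1:
        raise ValueError("theta must hold an intercept plus one coefficient per input")
    return theta[0] + sum(t * x for t, x in zip(theta[1:], inputs))


# Hypothetical coefficients, NOT the values reported in Table 4.
theta_example = [-50.0, 2.5, 1.2, -3.0, 0.8]

# One hypothetical sample: 800 W/m^2, 30 degC, 2 m/s wind, 13:00 h.
p_est = estimate_power(theta_example, 800.0, 30.0, 2.0, 13.0)
```

Evaluating the function over every row of the gathered data would yield an estimated power curve to compare against the measured one.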

[Table fragment: monthly results by input variable (solar radiation, temperature, ...), beginning with August; the table body is not available.]

Data from Mexico City
The geographical zone of the Mexico City Site (MCS) is presented in Figure 10. The data cover a period of 6 months. Figures 11-14 show the dispersion graphic and fitted curve for each input variable against electric power. In Tables 5 and 6, the monthly and total results of the characteristic equations for each fitted curve and their determination coefficients are described, respectively.

[Table fragment: monthly results by input variable (solar radiation, temperature, wind speed, ...), beginning with July; the table body is not available.]
The determination coefficient values for Mexico City are found by applying the same procedure used in the case of Hermosillo, as shown in Table 6. Analogous to the case of Hermosillo, the coefficients $\theta$ for the Mexico City data and the resulting mathematical representation are displayed in Table 7 and Equation (10), respectively. Figure 15 shows a six-month estimation result (red) against the real data behavior (green) from Mexico City. For a better appreciation, Figure 16 shows a close-up of Figure 15 covering the same period as in the case of Hermosillo. The estimation result matches the rise and fall times of the real power.

Error Analysis On-Site
As seen in Section 1 and according to Reference [51], the output power of a PV array greatly depends, among other parameters, on solar radiation; however, this variable has an intermittent nature and suffers from rapid fluctuations. In References [41,51], this is taken into account and some parameters besides the solar radiation, such as clear sky or weather data, are added; nonetheless, in those cases the solar radiation is either the estimated output or an estimated input, contrary to this paper, where this variable is measured on-site.
Each coefficient $\theta_k$ from Equations (9) and (10) represents the characteristic magnitude of its respective input variable $x_k$. Their values show the influence of each variable on the mathematical model approximation, establishing a relationship between the meteorological variables involved (which are responsible for the solar fluctuations) and the power output.
Comparing Figures 9 and 16, an adequate behavior is observed between both results, achieving a valid statistical estimation. Regardless of the amount of stored data, similar outcomes were obtained from both locations.
This statement is supported quantitatively by the correlation and error analyses.

Correlation Analysis
According to Section 2.1, the smaller the error between the estimation and the real data, the greater the determination coefficient will be. Figures 17 and 18 show the dispersion graphic of the estimation against the real electric power for Hermosillo and Mexico City, respectively. Moreover, Table 8 displays the monthly and total fitted-curve characteristic equations and determination coefficients for both cases.
Both statistical models estimate the real electric power favorably for each tested day, as can be observed through the values in Table 8. The estimation lies very close to the real data for each location, which shows that the implementation of a gradient descent optimization achieves a satisfactory result.

Error Analysis
The smaller the difference between the estimated and the real value, the better the estimation. Two error metrics were applied, where $P_m$ is the measured power, $P_e$ is the estimated power, $s$ is the sample under consideration, and $N$ is the total number of samples. The first one is the Mean Absolute Error (MAE), defined by Equation (11), which is used as a standard statistical metric to measure model performance in meteorology, air quality, and climate research studies [42,51,52]. The second one is the percentage form of the MAE, known as the Mean Absolute Percentage Error (MAPE), defined by Equation (12).
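In their conventional form, and using the symbols defined above, the two metrics can be written as shown below. This is the standard textbook definition, given as a sketch; the exact typesetting of the published equations is an assumption.

```latex
% Mean Absolute Error (conventional definition)
\mathrm{MAE} = \frac{1}{N} \sum_{s=1}^{N} \left| P_m(s) - P_e(s) \right|

% Mean Absolute Percentage Error (conventional definition)
\mathrm{MAPE} = \frac{100\%}{N} \sum_{s=1}^{N}
    \left| \frac{P_m(s) - P_e(s)}{P_m(s)} \right|
```

The MAE carries the units of the power signal, whereas the MAPE is dimensionless and is undefined whenever a measured sample is zero.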
Given that the results shown in Figures 9 and 16 resemble a sinusoidal behavior, the ratio $(P_m - P_e)/P_m$ does not represent the real error between both signals. Considering Figure 19a, if the measured value is at a high level, then the ratio between $P_m - P_e$ and $P_m$ will be low; however, if the measured value is at a low level, then the same ratio will be high or close to 1, as seen in Figure 19b. For this reason, the error is instead normalized by the full range of the measured signal (max − min) in order to obtain a reliable value. Equation (13) represents the modified MAPE.
With respect to the data from HS and MCS, Table 9 shows the values of the errors defined in Equations (11) and (13). As reported in Table 9, the overall MAPE is 6.9036% for HS, while for MCS it is 1.9291%. These values indicate the estimation robustness of the statistical model.
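The effect described above can be reproduced with a short Python sketch. The function names and the sample profile are hypothetical, and the modified MAPE is assumed here to be the mean absolute error divided by the range (max − min) of the measured signal, expressed in percent.

```python
def mae(measured, estimated):
    """Mean Absolute Error, in the units of the power signal."""
    return sum(abs(pm - pe) for pm, pe in zip(measured, estimated)) / len(measured)


def mape(measured, estimated):
    """Conventional MAPE in percent; undefined when a measured value is zero."""
    return 100.0 * sum(
        abs((pm - pe) / pm) for pm, pe in zip(measured, estimated)
    ) / len(measured)


def range_mape(measured, estimated):
    """MAPE in percent, normalized by the full range (max - min) of the measured signal."""
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("measured signal has zero range")
    return 100.0 * mae(measured, estimated) / span


# Illustrative daily profile in watts (not data from the study).
measured = [10.0, 500.0, 1500.0, 2400.0, 1400.0, 450.0, 20.0]
estimated = [40.0, 470.0, 1530.0, 2370.0, 1430.0, 480.0, 50.0]

error_w = mae(measured, estimated)
error_conventional = mape(measured, estimated)
error_modified = range_mape(measured, estimated)
```

Every sample in this profile is off by the same 30 W, yet the conventional MAPE is dominated by the two low-power samples at the start and end of the day, while the range-normalized value stays small.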

Conclusions
A gradient descent method was used to calculate, through a characteristic equation, the relationship between several variables which can be easily computed using quantitative data.
The results showed that solar radiation and daylight time are relevant in estimating the electric power, with solar radiation having the most influence on the photovoltaic power generation; nonetheless, temperature, wind speed, and daylight hour also affect this process, and their inclusion in the analysis is fundamental to achieve a proper electric power estimation.
According to Figures 9 and 16, an acceptable approximation between the statistical simulation results and the real-time power generation has been achieved. This is supported both by Table 8, where 97.63% of the estimation results match the real data for HS and 99.66% match for MCS, and by Table 9, with overall error values no greater than 7% and 2% for HS and MCS, respectively. A relationship between the determination coefficient values and the error results is observable: the percentage error is lower when $R^2$ is higher.
Several causes could perturb the correlation, resulting in a wider dispersion, with the weather conditions being the most important for this study. Contrary to Mexico City, Hermosillo has abrupt changes in climate throughout the year, reaching temperatures around 50 °C and 0 °C in summer and winter, respectively, as well as sudden weather changes from a sunny day to a cloudy or even a rainy day within hours.
However, even though the results show a satisfactory behavior, a statistical estimation will, by definition, always have an error value. Nevertheless, these error values can be reduced by increasing the gathered input data or by alternative methods such as intelligent systems, which have proven to be an efficient methodology [53][54][55][56]. These alternatives will be discussed elsewhere.