Estimation of Electricity Generation by an Electro-Technical Complex with Photoelectric Panels Using Statistical Methods

This paper presents a computational tool for estimating the energy generated by low-power photovoltaic systems under the specific conditions of the study region. The characteristic energy equation is obtained by considering the main climatological factors that affect these systems in terms of the symmetry or skewness of the random distribution of the generated energy. The paper also aims to determine the correlation between meteorological variables and the energy generated by 5-kW solar systems in the specific climatic conditions of the Republic of Cuba, and it presents the influence of each climate factor on the distribution symmetry of the generated energy. Studying symmetry in statistical models is important because it allows us to establish the degree of symmetry (or skewness) of the probability distribution of a random variable without having to represent it graphically. Statistical skewness reports the degree to which observations are distributed evenly and proportionally above and below the center (highest) point of the distribution. When the distribution is balanced in this way, it is called symmetric.


Disadvantages and Advantages of Renewable Energy Sources
The world's population growth, as well as the development of industry and production technologies, is accompanied by a significant increase in power consumption. To meet the needs of the population, power-generating enterprises are forced to consume an increasing number of fossil organic resources [1,2] since energy generation is usually provided through the combustion of hydrocarbons (oil, gas, coal). However, over time, reserves of this type of raw material are depleted, green fields are in increasingly complex mining, geological, and climatic conditions [3,4], and projects for the implementation of hydrocarbon production require the construction of several infrastructures: industrial facilities for preparation, drilling, production, and transportation of oil, gas, and coal [5]. This results not only in significant investments but also in a negative impact on the environment, due to construction and installation works, road embankments, trenching for pipelines, and emissions arising from machine operation, resulting in soil disturbance, pollution, littering, destruction of the soil cover, changes and destruction of animal habitats, and occurrence of the greenhouse effect [6].
Therefore, the issue of transitioning from traditional energy production to alternative methods of generating electricity becomes more and more relevant [7]. Methods of generating electric energy based on the use of renewable resources have the following advantages: non-depletion, availability, no need for complex related infrastructure, as well as reduction or complete elimination of carbon dioxide emissions [8]. However, despite all
1. Increased funding from the state budgets of the United States, Japan, Germany, Italy, and India [14];
2. Introduction of a system of "Green Certificates", which operates in the European Union and the United States, ensuring the implementation of the mechanism for granting quotas for the generation/acquisition of energy from renewable sources;
3. Tax credits and benefits for renewable energy producers; grants and tenders for the development of new projects and expansion of existing production facilities.

The Feasibility of Using Solar Energy
This paper discusses aspects of the use of solar power in electro-technical complexes. The choice of solar power as a source is supported by the fact that the sun emits about 1 kW/m² on the Earth's surface per day, and within seven days, the energy entering the planet exceeds the energy of all global reserves of fossil organic resources. According to some estimates, the economic potential of solar power is 20 billion tons of standard fuel, and this figure is two times greater than the production of all hydrocarbons per year [15]. In addition, the raw-material base for producing photovoltaic panels has significant resources: the amount of silicon, from which most solar cells are currently made, is 100,000 times greater than the reserves of uranium used when generating electricity in nuclear power plants.
Based on the above, it should be concluded that it is advisable to use solar energy [16] and convert it into electrical energy [17] using solar panels (photovoltaic panels).

Factors Affecting the Efficiency of a Photovoltaic Panel
A photovoltaic panel is a direct-current generator whose principle of operation is based on a physical property of semiconductors: photons of light knock electrons out of the outer orbit of the semiconductor atoms, creating enough free electrons to generate an electric current. When the circuit is closed, an electric current flows [18]. To obtain the required power, individual solar cells are combined in panels, where they are connected in parallel or in series to obtain the required current and voltage parameters. Since the electricity produced is directly proportional to the area of the panels, photovoltaic panels occupy a large amount of space.
The efficiency of converting solar energy into electrical energy depends primarily on the intensity of sunlight and the angle of incidence of the rays. The efficiency of the panel depends on its location (latitude), climatic characteristics, time of year, and time of day. Since the surface of the panel has reflective properties, not all of the sun's rays are captured by the module. However, it should be noted that since the panel has the ability to convert not only direct solar radiation, but also scattered, into electrical energy, the photoelectric module can also capture the radiation reflected from neighboring surfaces. According to the current-voltage characteristic of solar modules, the no-load voltage OSV depends inversely on the operating temperature of the module, so the output power decreases when the module is heated.
In addition, power is also lost when a current passes through the volumetric resistance of the semiconductor [19], thereby heating the module, which leads to a decrease in its energy efficiency [20][21][22]. The number of failures of PV power plants during operation also affects the amount of electricity produced. The number of faults is small for about ten years of operation but then rises rapidly [23]. The factors described above are the main reasons that reduce the efficiency of photovoltaic panels. Therefore, a silicon solar cell theoretically has an efficiency of about 20%, but in practice it is lower [24].
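The temperature effect described above can be illustrated with a minimal sketch. It assumes a common linear derating model that is not taken from this paper: the cell temperature is estimated from a nominal operating cell temperature (NOCT), and the output power falls linearly as the cell heats up. The rated power (240 W), the temperature coefficient (−0.4% per °C), and NOCT = 45 °C are illustrative assumptions, not measured parameters of the panels studied here.

```python
def cell_temperature(t_ambient_c, irradiance_w_m2, noct_c=45.0):
    """Estimate the cell temperature (deg C) with the NOCT approximation.

    Assumed model: NOCT is defined at 800 W/m^2 and 20 deg C ambient.
    """
    return t_ambient_c + (noct_c - 20.0) / 800.0 * irradiance_w_m2


def module_power(irradiance_w_m2, t_ambient_c, p_stc_w=240.0,
                 gamma_per_c=-0.004, noct_c=45.0):
    """Estimate module output power (W) with a linear temperature derating.

    p_stc_w and gamma_per_c are hypothetical values chosen for illustration.
    """
    t_cell = cell_temperature(t_ambient_c, irradiance_w_m2, noct_c)
    scale = irradiance_w_m2 / 1000.0          # relative to 1000 W/m^2
    derating = 1.0 + gamma_per_c * (t_cell - 25.0)
    return p_stc_w * scale * derating


if __name__ == "__main__":
    # Same irradiance, rising ambient temperature: power should fall.
    for t_amb in (20.0, 30.0, 40.0):
        print(t_amb, round(module_power(800.0, t_amb), 1))
```

Under these assumptions, the same irradiance yields less power on a hotter day, which is the qualitative behavior the text attributes to module heating.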
Currently, increasing the efficiency of sunlight-to-electricity conversion is a highly relevant task [25]. Two methods for increasing the electricity generated [26] by a photovoltaic installation are widely known: improving the structure of the photovoltaic panel to increase its performance [27] and increasing the amount of solar radiation captured by the panel [28]. The first method is directly related to the development of new technological solutions for creating materials and combining various semiconductor materials that can capture different parts of the spectrum. For example, in [29], the efficiency of a photovoltaic module is increased by creating multilayer panels, the so-called heterostructures. The paper [30] describes the use of thin films for two-sided silicon solar cells. The second method includes technical solutions such as solar tracking systems and solar radiation concentrators or, more generally, the component composition of the equipment included in a solar power plant. This paper aims to determine the correlation between climatological factors and the energy generated by systems that use solar power, because climatological factors can both increase and decrease generation efficiency. The research focuses on the impact of various climatological factors on electricity production, taking into account the geographic location of the system under study. Studying statistical models is important because it allows us to establish the degree of symmetry (or skewness) of the probability distribution of a random variable without having to represent it graphically.
By analyzing statistical data and identifying the factors with the greatest impact on the energy produced by a solar power plant, it will be possible at the design stage to determine the most efficient geographical location of the power plant [31] or its component composition. The obtained dependencies will make it possible to increase the productivity of direct-conversion solar power plants.
This paper presents an analysis of an electro-technical complex with a low-power solar power plant (5 kW) connected to the electrical network of the Santiago de Cuba Province, the Republic of Cuba. To estimate the energy generated by a five-kilowatt solar system, studies with different climatic conditions should be conducted. This approach will make it possible to determine the dependence of climatic factors that affect electricity generation in photovoltaic systems.
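Because the degree of symmetry (or skewness) of the generated-energy distribution is central to this study, a short numerical illustration may help. The sketch below computes the moment coefficient of skewness, one standard definition; the paper does not state which estimator it uses, so this choice is an assumption, and the daily energy values are invented for illustration.

```python
def sample_skewness(values):
    """Return the moment coefficient of skewness g1 = m3 / m2**1.5.

    A value near zero indicates a symmetric sample, a positive value a
    longer right tail, and a negative value a longer left tail.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Hypothetical daily energy values in kWh (not data from this study).
    symmetric_days = [18.0, 20.0, 22.0, 24.0, 26.0]
    cloudy_outliers = [24.0, 25.0, 25.0, 26.0, 9.0]
    print(sample_skewness(symmetric_days))
    print(sample_skewness(cloudy_outliers))
```

A single number of this kind describes the asymmetry of the distribution without a histogram or other graphical representation.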

Modeling an Electro-Technical Complex with a Photovoltaic System
During previous studies, a model of a five-kilowatt photovoltaic system was developed and implemented in Matlab, as shown in Figure 1 [32]. This model allows us to study the main electrical variables, such as the energy generated by the system in certain climatic conditions. The model includes the response surface equations to estimate the energy generated. Thus, it is possible to compare the generated energy calculated with the mathematical model of the complex against the energy given by the response surface equations obtained using statistical models, as well as to check their efficiency when estimating the energy generated. The system under study is located in the territory of Santiago de Cuba (latitude 20.0208° N and longitude 75.8267° W). The panels used in this photovoltaic energy system are from the NUMEN SOLAR brand, model DSM-240-C, which are interconnected, allowing the generation of electricity. Table 1 shows the technical parameters of the panels at an irradiance of 1000 W/m² and an ambient temperature of 25 °C. Using a weather station installed within the study region, the following data were obtained (2020):

• Horizontal global radiation;
The measurement results were entered into the developed computer model. The mathematics needed for the photovoltaic generator simulation, introduced into the solar generator unit, included the response surface equations, found via the Minitab Statistical Software, and the data measured by the weather station. Next, the energy values obtained by both models were compared. This approach made it possible to check the efficiency of the statistical model.
In the automatic calculations of the developed program, the solar radiation on the inclined plane is calculated according to the tilt of the solar generator.

Correlation of Meteorological Variables
To determine the relationship between different variables, a correlation study was carried out. For this purpose, Pearson's correlation coefficient (P) was calculated for each of the selected variables (horizontal global radiation, wind speed, ambient temperature, relative humidity, and atmospheric pressure) via the Matlab (version R2018a) and Minitab Statistical Software (version 18.0) packages. Based on the calculations, a correlation between the studied variables was considered significant when the p-value was less than 0.05.
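The correlation study described above can be sketched in a few lines. The example below is only an illustration in Python with NumPy, not the Matlab or Minitab procedure used in this work; the helper names and the synthetic weather data are hypothetical and do not reproduce the station measurements.

```python
import numpy as np


def correlation_matrix(columns):
    """Return (names, R), where R holds Pearson coefficients between columns.

    columns: dict mapping a variable name to a 1-D sequence of equal length.
    """
    names = list(columns)
    data = np.vstack([np.asarray(columns[name], dtype=float) for name in names])
    return names, np.corrcoef(data)   # each row of `data` is one variable


def synthetic_weather(n=200, seed=0):
    """Generate illustrative (not measured) weather and energy samples."""
    rng = np.random.default_rng(seed)
    radiation = rng.uniform(100.0, 1000.0, n)                        # W/m^2
    temperature = 22.0 + 0.01 * radiation + rng.normal(0.0, 1.0, n)  # deg C
    humidity = 95.0 - 2.0 * (temperature - 22.0) + rng.normal(0.0, 2.0, n)  # %
    pressure = rng.normal(1013.0, 2.0, n)                            # hPa
    energy = 0.004 * radiation + 0.05 * temperature + rng.normal(0.0, 0.3, n)
    return {
        "radiation": radiation,
        "temperature": temperature,
        "humidity": humidity,
        "pressure": pressure,
        "energy": energy,
    }


if __name__ == "__main__":
    names, r = correlation_matrix(synthetic_weather())
    for name, row in zip(names, r):
        print(f"{name:12s}", np.round(row, 2))
```

For the significance check, `scipy.stats.pearsonr` returns the coefficient together with a two-sided p-value that can be compared against the 0.05 threshold.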
The results obtained via both software packages are presented in Tables 2 and 3. The first table presents the values for the correlation between the meteorological variables and electrical energy, coming from a low-power solar system, calculated via the Matlab and Minitab statistical software packages, and the second table presents the values for the correlation between the meteorological variables of both software packages.
Based on the results presented in Table 2, it is possible to conclude as follows:
1. The results of the correlation between the variables, found via both software packages, are the same.
2. Four meteorological variables correlate with the electrical energy generated by the solar power plant: solar radiation, ambient temperature, and relative humidity show the greater correlation, and wind speed correlates to a lesser extent.
3. Solar radiation and ambient temperature correlate directly (positively) with energy, so energy increases as they increase.
4. Relative humidity correlates inversely (negatively) with energy, so energy decreases as it increases.
5. Atmospheric pressure has a very low correlation with the electrical energy produced by the solar power plant.
Table 3 presents the differences in the calculations of the correlation coefficients obtained via the Matlab and Minitab software packages. As seen from the table, the results for both software packages are the same.
Based on the results presented in Table 3, it is possible to conclude as follows:
1. There is a high and inverse correlation (K = −0.91) between relative humidity and ambient temperature.
2. There is a high and direct correlation (K = 0.74) between solar radiation and ambient temperature.
3. There is a high and inverse correlation (K = −0.73) between solar radiation and relative humidity.
4. There is a moderate and direct correlation (K = 0.407) between ambient temperature and wind speed.
5. Other correlations, marked in red, are low or zero.
6. Atmospheric pressure has a low or zero correlation with other meteorological variables.
Figure 2 shows a graphical representation of the relationship between the meteorological variables obtained from a meteorological station located within the study region in the Santiago de Cuba Province, the Republic of Cuba.
The behavior of the measured variables confirms the results in Table 3: as solar radiation increases in the time interval from 9:00 to 13:00, the ambient temperature also increases proportionally and, to a lesser extent, so does the wind speed. Relative humidity shows the opposite behavior, decreasing, as previously obtained (Table 3).
The results of the correlation calculations confirm the efficiency of the mathematical model since they coincide with the measurements by the weather station.

Calculation of the Principal Components
In statistics, principal component analysis (PCA) is a method used to describe a data set in terms of new uncorrelated variables ("components") [33]. The components are ordered by the amount of the original variance they explain; therefore, this method is effective for reducing the dimensionality of a data set [34].
Technically, PCA searches for the projection in which the data are best represented in the least-squares sense [35]. It converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
In application, the principal component method is used to reduce the number of initial variables taken into account in the analysis [36].
Here the method is applied to the five meteorological variables whose relationships must be studied (solar radiation, ambient temperature, relative humidity, wind speed, and atmospheric pressure). It is necessary to determine which variables have the greatest impact on the electricity generated by the complex. The method identifies the most influential variables, and it is these variables that must be taken into account in the calculations. Ultimately, this makes it possible to reduce the number of variables in the equation, thereby reducing estimation errors.
There are two main modes of PCA use:
1. A method based on the correlation matrix, used when the data are not dimensionally homogeneous or the measured random variables differ in order of magnitude [37].
2. A method based on the covariance matrix, used when the data are dimensionally homogeneous and have similar mean values [38].
For the purposes of this study, the first PCA method was used since the variable data are heterogeneous.
The method starts with a correlation matrix. First, the values of each of the m random variables $F_j$ are considered. For each of n individuals, the values of these variables were taken, and the data set was written in matrix form [39]:

$$\left(F_j^{\beta}\right), \quad j = 1, \dots, m, \quad \beta = 1, \dots, n,$$

where each set

$$\left\{F_j^{\beta} \mid \beta = 1, \dots, n\right\}$$

can be considered a random sample for the variable $F_j$. From the m × n data, corresponding to the m random variables, one can construct a sample correlation matrix, which is defined as follows:

$$R = [r_{ij}] \in \mathbb{R}^{m \times m}, \qquad r_{ij} = \frac{\operatorname{cov}(F_i, F_j)}{\sqrt{\operatorname{var}(F_i)\,\operatorname{var}(F_j)}}.$$

Since the correlation matrix is symmetric, it is diagonalizable, and its eigenvalues $\lambda_i$ satisfy

$$\sum_{i=1}^{m} \lambda_i = m.$$

Due to this property, these eigenvalues are called the weights of each of the m principal components. The factors identified mathematically are represented by the basis of eigenvectors of the matrix R. Each of the variables can be expressed as a linear combination of the eigenvectors, or principal components [40].
Using the PCA method, the AP coefficient (%), which gives the percentage of the total variance explained by each principal component of the dependent variable being studied, was calculated via Matlab software.
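The correlation-matrix form of PCA described above can be sketched as follows. This is an illustrative Python/NumPy version, not the Matlab routine used in this study; the function name and the synthetic input are hypothetical. It returns the eigenvalues (weights), the eigenvectors, and the percentage of total variance explained by each component.

```python
import numpy as np


def pca_from_correlation(data):
    """Run PCA on the sample correlation matrix.

    data: array of shape (n_samples, m_variables).
    Returns (eigenvalues, eigenvectors, explained_pct), sorted so that the
    component explaining the most variance comes first.
    """
    x = np.asarray(data, dtype=float)
    r = np.corrcoef(x, rowvar=False)            # m x m correlation matrix
    eigenvalues, eigenvectors = np.linalg.eigh(r)   # symmetric matrix solver
    order = np.argsort(eigenvalues)[::-1]       # descending order
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    explained_pct = 100.0 * eigenvalues / eigenvalues.sum()
    return eigenvalues, eigenvectors, explained_pct


if __name__ == "__main__":
    # Synthetic example: two strongly related columns and one independent one.
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 1))
    data = np.hstack([
        base + 0.2 * rng.normal(size=(200, 1)),
        base + 0.2 * rng.normal(size=(200, 1)),
        rng.normal(size=(200, 1)),
    ])
    values, vectors, pct = pca_from_correlation(data)
    print(np.round(values, 3), np.round(pct, 1))
```

Because the trace of a correlation matrix equals the number of variables, the eigenvalues returned here sum to m, which gives a quick sanity check on the computation.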
According to the results in Table 4, obtained from the calculation of the principal components, it can be concluded that the meteorological variables with the highest correlation with the energy delivered by photovoltaic systems are solar radiation, ambient temperature, and wind speed. Of these, solar radiation and ambient temperature have the highest correlation with the energy delivered.
On the other hand, relative humidity greatly affected the behavior of ambient temperature (see Table 2), so both ambient temperature and solar radiation are the main meteorological variables that are taken into account when estimating the energy delivered by photovoltaic systems.

Response Surface Method
The concept of a response surface includes a dependent variable Y, called a response variable, and several independent or controlled variables. If all these variables are assumed to be measurable, the response surface can be expressed as:

$$Y = f(X_1, X_2, \dots, X_k).$$

To obtain the response surface equation, several special experimental designs have been developed that aim to approximate this equation using the smallest possible number of experiments.
In a two-dimensional problem, the simplest surface is the plane defined by the equation [41]:

$$Y = B_0 + B_1 X_1 + B_2 X_2,$$

where the $X$ are the values of the independent variables, and the $B$ are the coefficients calculated for the model. The observed response is taken to be equal to one, and the estimates for B should be determined by the least-squares method, which minimizes the sum of squared errors. This equation is called a first-degree equation since the exponent of each independent variable is equal to one. If there is any reason to believe that the surface is not flat, then the most suitable model may be a second-degree equation with two unknowns [42]:

$$Y = B_0 + B_1 X_1 + B_2 X_2 + B_{11} X_1^2 + B_{22} X_2^2 + B_{12} X_1 X_2.$$

In order to effectively assess the model parameters, it is necessary to apply an appropriate experimental design to collect the required data [43]. Some of the key features of such a design are as follows:
1. Provides a reasonable distribution of data points and, therefore, information.
2. Does not require a large number of experiments.
3. Allows one to study the model adequacy.
4. Provides accurate estimates of the model coefficients.
5. Allows one to conduct experiments in blocks.
6. Does not require too many levels of independent variables.
As the surface becomes more complex, a larger number of coefficients must be estimated, and the number of experimental points will inevitably increase [44].
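The least-squares estimation of a second-degree response surface in two independent variables can be sketched as below. This is an illustrative Python/NumPy version under stated assumptions, not the Minitab procedure used in this study: the function names are hypothetical, and the coefficient order (constant, linear, squared, and interaction terms) is a convention chosen for this example.

```python
import numpy as np


def design_matrix(x1, x2):
    """Build the columns [1, x1, x2, x1^2, x2^2, x1*x2] for each observation."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])


def fit_response_surface(x1, x2, y):
    """Estimate the six coefficients by minimizing the sum of squared errors."""
    a = design_matrix(x1, x2)
    coeffs, *_ = np.linalg.lstsq(a, np.asarray(y, dtype=float), rcond=None)
    return coeffs


def predict(coeffs, x1, x2):
    """Evaluate the fitted surface at the given points."""
    return design_matrix(x1, x2) @ np.asarray(coeffs, dtype=float)


if __name__ == "__main__":
    # Synthetic check on a 5 x 5 grid with coded factor levels in [-1, 1].
    grid = np.linspace(-1.0, 1.0, 5)
    x1, x2 = (v.ravel() for v in np.meshgrid(grid, grid))
    true_coeffs = np.array([1.0, 2.0, -3.0, 0.5, 0.25, -1.5])
    y = design_matrix(x1, x2) @ true_coeffs
    print(np.round(fit_response_surface(x1, x2, y), 6))
```

The second-degree model has six coefficients, so at least six distinct experimental points are required; this is the growth in experimental effort noted above. Coding the factors to a common range such as [−1, 1] also keeps the least-squares problem well conditioned.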
Based on the results obtained, the response surface equation was found for two factors, namely, for two independent variables and one dependent variable, using the Minitab 18 software. The simulation results show that only two independent variables, ambient temperature and solar radiation, are needed to obtain the response equation, since more than 85% of the response variable (in this case, energy) can be explained by these two variables alone.
Based on the calculations made, it is possible to conclude as follows:
1. If all five meteorological variables studied (wind speed, relative humidity, ambient temperature, solar radiation, atmospheric pressure) are taken into account to determine the change in the energy generated by a solar power plant, the model used explains only 58% of the response.
2. If four meteorological variables (wind speed, relative humidity, ambient temperature, solar radiation) are taken into account, the model used explains only 67% of the response.
3. If three meteorological variables (wind speed, ambient temperature, solar radiation) are taken into account, the model used explains 85% of the response.
4. If only two meteorological variables (ambient temperature, solar radiation) are taken into account, the model used explains 87% of the response.
Therefore, based on the results obtained, it can be concluded that, in order to estimate the energy generated by the complex in the specific climatic and geographical conditions of the region, the two most influential variables are ambient temperature and solar radiation (which explain 87% of the dependent variable).
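A comparison of models built on different subsets of variables can be sketched as follows. The source does not state which statistic its percentages refer to, so the example below is only an assumption-laden illustration: it fits ordinary least-squares models in Python with NumPy and reports both the coefficient of determination and its adjusted form, which penalizes additional predictors. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def fit_r_squared(predictors, y):
    """Fit y = b0 + sum(b_i * x_i) by least squares.

    predictors: array of shape (n_samples, p).
    Returns (r2, adjusted_r2).
    """
    x = np.asarray(predictors, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = x.shape
    a = np.column_stack([np.ones(n), x])        # add the intercept column
    coeffs, *_ = np.linalg.lstsq(a, y, rcond=None)
    residuals = y - a @ coeffs
    ss_res = float(residuals @ residuals)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adjusted


if __name__ == "__main__":
    # Synthetic data: the response depends on two columns; the third is noise.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(31, 3))
    y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 1] + rng.normal(scale=0.5, size=31)
    print("two variables:  ", fit_r_squared(x[:, :2], y))
    print("three variables:", fit_r_squared(x, y))
```

In such a comparison, the adjusted value is the more informative of the two when deciding whether an extra meteorological variable is worth keeping, because the unadjusted value cannot decrease when a predictor is added to a least-squares model.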
Where EC is the average energy per month, calculated using Matlab software, EE is the calculated average energy per month according to the response surface equations, D is the difference between the average energy obtained by both methods in October.

Results and Discussion
The results obtained give rise to the following conclusion: of the two response surface equations, the one that relates energy to ambient temperature and solar radiation gives a better estimate than the one that relates energy to ambient temperature and relative humidity.
The results from Table 4 show that equation two (solar radiation and ambient temperature) gives the best estimate of the energy produced by a five-kilowatt system.
In Figure 3, the average energy generated per day by an electro-technical complex with a five-kilowatt photovoltaic plant, calculated according to the equation (Table 1), is shown as red columns. The average energy produced per day by the same complex, calculated using the mathematical model (Figure 1), is shown as blue columns.
The results obtained give rise to the following conclusions:
1. The energy values estimated by the response surface equation in Table 5 (red columns) approximate with high accuracy the energy obtained using the mathematical model of a five-kilowatt photovoltaic system, simulated via the Matlab/Simulink software (blue columns).
2. Low average values of energy generated per day, as shown in Figure 3 (area 1), are the result of low average values of solar radiation and ambient temperature, associated with high average values of relative humidity and a decrease in average wind speed during the day.
3. According to Figure 3 (area 2), the average energy produced on days 9, 10, 11, and 14 is slightly higher than that produced on days 12, 13, and 15, even though solar radiation (by about 100 W/m²) and wind speed are lower than on days 12, 13, and 15. On days 9, 10, 11, and 14, higher values of relative humidity, which lower the ambient temperature, are observed. The ambient temperature is directly related to the operating temperature of the solar panel.
4. The highest average values of energy generated per day, as shown in Figure 3 (area 3), are the result of high average values of solar radiation and ambient temperature, associated with low average values of relative humidity and an increase in average wind speed during the day.
5. When estimating the energy of a system, it is sufficient to take into account only two meteorological variables (ambient temperature and solar radiation), since together they explain more than 85% of the system's response and introduce a significant skewness in the random variable distribution of the generated energy (Figure 3).
6. Figure 3 shows that at high wind speeds, photovoltaic systems receive additional cooling; it is known that the efficiency of the photovoltaic module decreases as the operating temperature increases. This circumstance also introduces a certain skewness of energy.
7. There is a direct correlation between ambient temperature and solar radiation and an inverse correlation of relative humidity with both solar radiation and ambient temperature.
8. Atmospheric pressure has little or almost no impact on the energy performance of photovoltaic systems in the conditions under consideration, which indicates that it causes no random change in energy; however, there is a theoretical possibility that atmospheric pressure affects the energy performance when the geographical location of the system changes.
Figure 3. Energy produced per day by a 5-kW photovoltaic system (October 2020) with respect to meteorological variables (ambient temperature, solar radiation, relative humidity, wind speed).
A number of articles are devoted to estimating electricity generation by solar panels depending on various factors, including climatic conditions, for example, works [45][46][47][48]. In these studies, only solar radiation is taken into account; other climatic factors are not represented. Also, these works provide neither a detailed description of the calculation models nor a statistical analysis of the influence of climatic factors on the energy produced. Article [49] describes a model for determining the tilt angle of the solar energy system with respect to the horizon (the ground) by estimating the monthly mean daily global solar radiation on tilted surfaces facing directly towards the equator, based on monthly average daily global solar radiation data produced from typical meteorological year (TMY) data. The disadvantage of the proposed model is the lack of correlation with other climatic factors, which can also change randomly and affect solar radiation. The same assessment can be given to the model of the photovoltaic system presented in article [50].
Furthermore, the article [51] presents a model and a self-learning system for dust estimation of photovoltaic panels based on data on solar radiation, ambient temperature, and output power generated by solar panels, as well as the amount of dust in these conditions. This approach has many advantages, but data on wind speed and humidity, which have a great influence on dust formation, are not used when constructing the model.
Clear-sky models also exist that take into account the influence of clouds on solar radiation but do not take into account other climatic factors.
In solar applications, the most common CSI models provide broadband irradiance predictions based on a number of simplifications and/or empirical components compared to the rigorous radiative transfer models used in atmospheric sciences. Thus, these common CSI models have to undergo continuous quality assurance evaluations to delineate the range of validity of such simplifications. Traditionally, these evaluations have consisted of direct comparisons against high-quality ground observations [52].
Taking into account the results of these studies and the limited experience of existing systems and models for estimation of electricity generation by solar panels depending on climatic conditions, it should be concluded that the proposed methods are appropriate for use in specific geographical conditions.

Conclusions
The computational tool proposed in this paper is designed to estimate the energy produced by low-power photovoltaic systems based on the specific conditions of the study region. This approach will make it possible to determine the relationship between climatic factors that affect energy production in photovoltaic systems operating in any region. This approach allows us to evaluate the most favorable geographical location of photovoltaic panels, which contributes to increasing the efficiency of converting solar energy into electricity. The approach to assessing the significance of climate parameters described in this paper will also allow us to determine the component composition of a solar power plant from the point of view of automation and its algorithm of operation. The energy estimation results, derived using the response surface equations, and obtained during this study for the specific climatic conditions of the Republic of Cuba, correspond to one month of the study (October 2020), which is a small sample. For an adequate estimation of energy, annual meteorological data and the energy generated are needed.
According to the results obtained during this study, in which the energy generated by an electro-technical complex with a five-kilowatt photovoltaic plant was estimated by statistical methods, statistical skewness reports the degree to which observations are distributed evenly and proportionally above and below the center (highest) point of the distribution. When the distribution is balanced in this way, it is called symmetric. Thus, based on the presented studies, three climatic factors have the greatest influence on, and contribute to the skewness of, the random variable distribution of the generated energy: ambient temperature, solar radiation, and wind speed.
Acknowledgments: We acknowledge support by the Solar energy research center in Santiago de Cuba and the department of General Electrical Engineering of the Saint-Petersburg Mining University.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this article: