Prediction of the Electricity Generation of a 60-kW Photovoltaic System with Intelligent Models ANFIS and Optimized ANFIS-PSO

The development and continuous improvement of accurate models for predicting the electricity generated by photovoltaic systems provide valuable planning tools for designers, producers, and self-consumers. In this research, two models were developed. The first is an adaptive neuro-fuzzy inference system (ANFIS), an intelligent hybrid model that combines the self-learning ability of neural networks with the linguistic expressiveness of fuzzy-logic inference. The second is an ANFIS model optimized with the particle swarm optimization algorithm (ANFIS-PSO). Both models have a prediction window of about eight months. The models were developed in Matlab® and trained with four input variables (solar radiation, module temperature, ambient temperature, and wind speed), with the electrical power generated by a photovoltaic (PV) system as the output variable. The models' predictions were compared with experimental data from the system and evaluated with rigorous statistical metrics, yielding RMSE = 1.79 kW, RMSPE = 3.075, MAE = 0.864 kW, and MAPE = 1.47% for ANFIS, and RMSE = 0.754 kW, RMSPE = 1.29, MAE = 0.325 kW, and MAPE = 0.556% for ANFIS-PSO. Both models show good predictive capacity; however, integrating PSO into the hybrid model improves the prediction of the photovoltaic system's behavior and therefore provides a better planning tool.


Introduction
Electrical energy generation from hydrocarbons accounts for about 38% of global CO₂ emissions [1]. Life cycle analyses of photovoltaic plants show that their CO₂ emission rates are significantly lower than those of systems using fossil fuels; photovoltaic systems therefore have significant potential to mitigate global warming [2,3]. The virtuous circle of technological development, design innovation, and joint action by nations has been decisive in reducing the cost of generating electricity with photovoltaic technology [4]. The price of large-scale solar PV generation fell by 85% between 2010 and 2020, from USD 0.381 per kWh to USD 0.057 per kWh [5]. This decline has driven an increase in installed photovoltaic (PV) capacity, with an annual growth rate of 38.3% since 2009. In 2020, solar PV achieved the largest capacity increase ever recorded in a single year, an annual increase of 21.5% over 2019 [6]. By 2020, producing electrical energy through PV systems [...] energy models based on standard recurrent neural networks and long short-term memory networks. They also examined gated recurrent units, developing their fundamentals and comparing them over prediction horizons of up to one hour.
Patel et al. [29] carried out a systematic study of solar irradiation and solar PV generation forecasting, including a detailed analysis of ANN models (such as those based on the backpropagation algorithm, multilayer feedforward networks, and linear regression) as well as hybrid models based on fuzzy logic. After analyzing their behavior with various input parameters and layer structures, they concluded that hybrid models combining ANN and fuzzy logic have better predictive capacity. Dokmen et al. [30] developed three intelligent models based on the wavelet artificial neural network (WANN), the wavelet support vector machine (WSVM), and ANFIS to estimate solar energy at two stations in Iraq. The results were compared with experimental data and evaluated with statistical criteria; all three models gave satisfactory predictions, with WANN performing slightly better at both sites.
Perveen et al. [31] developed an ANFIS model to predict short-term power generation for smart grid applications using variables from a composite climate zone. The results were compared with other predictive models, such as ANN, support vector machine, and fuzzy logic, with the ANFIS model obtaining the best results. Sujil et al. [32] proposed an ANFIS model for forecasting the output power of a photovoltaic system and a wind system as part of a power management system in South China. The ANFIS model was built with three different partitioning techniques, compared with the backpropagation algorithm, and evaluated with statistical methods; ANFIS with fuzzy c-means clustering gave the best results in both cases. Viswavandya et al. [33] developed two predictive models, based on fuzzy logic and on ANFIS, respectively, that used historical meteorological data to predict short-term solar irradiation; comparing the predictions with historical on-site radiation data, they reported satisfactory results. Haji and Genc [34] proposed maximum power point tracking (MPPT) controllers for an off-grid photovoltaic system based on ANFIS, perturb and observe (P&O), and a fuzzy logic controller (FLC), concluding that the ANFIS controller shows better tracking efficiency, lower power ripple, and robustness to changes in system voltage and current.
In recent years, a large body of research has addressed computational methods of combinatorial optimization [18,35], in which optimization algorithms are used to improve the capacity of predictive models. Combinatorial optimization [36] improves the ability of these models to address complex and non-linear problems [37], and such coupling has been successfully applied in various fields of science [38][39][40][41]. Slowik and Kwasnicka [42] extensively studied evolutionary algorithms such as genetic algorithms, genetic programming, differential evolution, evolution strategies, and evolutionary programming, covering their nature, properties, and selective application in different areas of engineering. Khosravi et al. [43] developed an ANFIS model optimized with a hybrid of the genetic algorithm and the teaching-learning-based optimization algorithm (ANFIS-GATLBO) to evaluate central tower solar systems with thermal storage, using the general design parameters for each geographical region; compared with artificial neural network and FIS models, ANFIS-GATLBO gave the most accurate results.
Ndiaye [44] implemented an intelligent predictive model, ANFIS, and an optimized model, ANFIS-GA, to predict the power generated by a photovoltaic system feeding the national electricity grid in Senegal. ANFIS-GA proved more efficient, with a mean square error of 2.027 W compared with 4.142 W for ANFIS. Lara-Cerecedo et al. [45] developed ANFIS and ANFIS optimized with a genetic algorithm and compared their efficiency in predicting the electrical energy generated by a photovoltaic system in northwestern Mexico. Using ambient temperature and solar radiation as variables and evaluating the results with statistical methods, they concluded that the optimized ANFIS-GA model performed better, with a MAPE of 4.56% compared with 6.98% for ANFIS. Slowik and Kwasnicka [46] presented a study of swarm intelligence algorithms inspired by natural collective intelligence, describing the mathematics of their operation and their application in various industries and economic sectors. Khosravi et al. [47] developed two predictive models, ANFIS and ANN, each optimized separately with GA and with PSO, to simulate the energy and thermal behavior of a Stirling solar collector, using different meteorological and geometric variables and collector designs for all models. ANFIS optimized with PSO gave the best results.
Wu et al. [48] presented a hybrid model based on the Elman neural network (ELM) and the adaptive network-based fuzzy inference system (ANFIS), optimized with the parasitism-predation algorithm, to improve the short-term prediction of energy demand using consumption statistics from November 2020. Evaluating the results with statistical methods, they found a substantial improvement in the daily predictive capacity of the optimized hybrid model compared with the non-optimized one, with the RMSE, MAE, and MAPE reduced by 48.4%, 46.0%, and 47.4%, respectively. Ghenai et al. [49] developed an ANFIS model to predict very-short-term energy consumption (0.5, 1, and 4 h horizons) at the University of Sharjah campus, Sharjah, United Arab Emirates. The results were very accurate at the 30-min horizon, and accuracy decreased as the prediction horizon lengthened; the authors concluded that a large amount of collected data is required for training if the prediction horizon is to be extended without larger inaccuracies. Eya et al. [50] presented an improved ANFIS model to estimate the weekly and monthly load consumption over six months of the electrical system of the University of Nigeria, Nsukka. For its design and training, they collected historical environmental and consumption data from 2014 to 2019. The predictions were evaluated statistically, yielding MAPE values below 5%. Yang et al. [51] presented a novel short-term model for predicting electricity demand in New South Wales, Australia. It combines three models based on the back propagation (BP) neural network and the adaptive network-based fuzzy inference system (ANFIS) to exploit the strengths of the individual models in handling the linear, non-linear, and seasonal components of the data.
A differential evolution (DE) algorithm was added to improve its precision. Comparing the results with half-hourly experimental electricity demand data, they found that the combined model outperformed the three individual models.
Ashari [52] developed an ANFIS-PSO model to find the best configuration of an MPPT controller. Its performance was compared with the perturb and observe (P&O) and incremental conductance (Inc) algorithms: ANFIS-PSO outperformed both, reaching an efficiency of 98.36% under standard test conditions. Adedeji et al. [53] developed a PSO-optimized ANFIS model for short-term wind turbine power forecasting in the Cape, South Africa. The study addressed the importance of data binning in predictive models using grid partitioning (GP), subtractive clustering (SC), and fuzzy c-means (FCM) techniques, with SC yielding the best statistical measures.
The literature survey showed numerous studies on intelligent predictive models and their usefulness across broad areas of engineering and science. It also showed interest in implementing new optimization techniques that improve the robustness and accuracy of the models' predictions. We observed that few studies have explored how these models can be complemented with such techniques, and even fewer have rigorously compared the results of different models. The literature analysis also revealed no investigations on predicting the energy production of photovoltaic systems with adaptive neuro-fuzzy systems coupled to a particle swarm optimization algorithm that predict not only electrical power but also electric energy generation, which is essential for the economic analysis of photovoltaic plants. Therefore, the main objective of this research was the development of two models, ANFIS and the optimized ANFIS-PSO model, trained with experimentally captured data and able to make predictions over a window of almost eight months. Each model processed and learned from around 225,400 records per variable and produced very accurate predictions. In addition, the algorithms showed great flexibility in delivering predictions for arbitrary periods and predictive robustness under volatile weather conditions. Both models were compared using statistical indicators widely used in the literature. In the test segment, the ANFIS system obtained RMSE: 1.79 kW, RMSPE: 3.075, MAE: 0.864 kW, and MAPE: 1.47%, compared with RMSE: 0.754 kW, RMSPE: 1.29, MAE: 0.325 kW, and MAPE: 0.556% for ANFIS-PSO. These results show the predictive superiority of the optimized model.
Energies 2023, 16, 6050 5 of 26

Experimental Setup and Data Processing
The present research is based on the study of a photovoltaic system located at the Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN) in Mexico City, Mexico (latitude 19°30′48′′ N, longitude 99°07′57′′ W). The photovoltaic array consists of 240 monocrystalline silicon modules from Solartec (model S60MC, Guanajuato, Mexico), mounted on an aluminum structure oriented at an azimuth of 30° east of geographic south and tilted at 20°. Each module has a nominal power of 250 W and a cell efficiency of up to 15%, giving a total capacity of 60 kW. Figure 1 shows the photovoltaic system. The array is subdivided into five sections. Each section comprises 48 modules, arranged as 4 parallel strings of 12 series-connected modules, with a nominal power of about 11.9 kW. Each section is served by a Fronius IG Plus V11.4.2 DELTA inverter (Pettenbach, Austria) with a capacity of 11.4 kW and a maximum efficiency of 96.2%. The system has an EKO MS-602 thermopile pyranometer (Tokyo, Japan) installed at 20° from the horizontal, with the following technical characteristics: spectral error ±0.2%; temperature response (−20 °C to 50 °C) ±2%; wavelength range 285 to 3000 nm; irradiance range 0 to 2000 W/m²; operating temperature −40 to 80 °C; and calibration traceability/uncertainty ISO 17025/WRR/<0.7% (k = 1.96). An anemometer installed on-site measures wind speed. Figure 2 shows the supporting instruments of the PV system.
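The module counts and ratings above can be cross-checked with simple nameplate arithmetic. The following sketch is illustrative only: the constant names are ours, and every number is taken from the description rather than from measurements. It confirms 48 modules per section, 240 modules in total, and a 60 kW array, and gives 12.0 kW of module nameplate per section, close to the roughly 11.9 kW quoted.

```python
# Nameplate arithmetic for the PV array described in the text.
# Constant names are illustrative; the values come from the description.
MODULE_POWER_W = 250        # nominal power per module (W)
N_MODULES = 240             # total number of modules in the array
MODULES_IN_SERIES = 12      # modules per series string
STRINGS_IN_PARALLEL = 4     # parallel strings per section
N_SECTIONS = 5              # sections in the array


def array_summary():
    """Return module counts and nameplate powers (kW) implied by the layout."""
    modules_per_section = MODULES_IN_SERIES * STRINGS_IN_PARALLEL
    return {
        "modules_per_section": modules_per_section,
        "modules_from_sections": modules_per_section * N_SECTIONS,
        "section_nameplate_kw": modules_per_section * MODULE_POWER_W / 1000,
        "array_nameplate_kw": N_MODULES * MODULE_POWER_W / 1000,
    }


if __name__ == "__main__":
    for key, value in array_summary().items():
        print(f"{key}: {value}")
```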
Lastly, each data logger records and integrates the system's weather and power variables every 5 min, 24 h a day. The recorded variables are wind speed (m/s), module temperature (°C), ambient temperature (°C), solar radiation (kWh/m²), and system output power (kW), which were used as the variables of the intelligent models. The design of the intelligent algorithm, the record analysis, the training, and the predictive testing were performed on a laptop with an Intel Core i3-1005G1 CPU and 8 GB of RAM.
As a first step, given the large number of records collected, the information was processed (collection, extraction, storage, structuring, and analysis). The data were extracted from historical data files and organized according to the time sequence of each stored input and output variable. Second, to determine the level of influence of each input variable on the electrical generation of the system, a correlation analysis of each input variable against the output variable was performed. Correlation analysis describes the association between quantitative and categorical variables, that is, the extent to which the associated variables change in tandem [54]. About 225,441 records per variable, covering approximately 26 months, were used. The Pearson correlation coefficient (Equation (1)) and Spearman's correlation coefficient (Equation (2)), both explained in [55], were applied. The results of the correlation analysis are shown in Table 1.
$$r_p = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \, \sum d_y^2}} \tag{1}$$

$$r_s = 1 - \frac{6\sum_{i=1}^{N} d_i^2}{N\left(N^2 - 1\right)} \tag{2}$$

where $r_p$ is the Pearson correlation coefficient, $\sum d_x d_y$ is the sum of the cross-products of the deviations $d_x = X - \bar{X}$ and $d_y = Y - \bar{Y}$, $\sum d_x^2$ and $\sum d_y^2$ are the sums of squares of $X$ and $Y$, respectively, $r_s$ is the Spearman correlation coefficient, $d_i = X_i - Y_i$ is the difference between the ranks $X_i$ and $Y_i$ of the corresponding observations, and $N$ is the number of observations.
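Equations (1) and (2) translate directly into a few lines of NumPy. The sketch below is a minimal illustration, not the authors' Matlab code; the function names are ours. The rank-difference form of Equation (2) is exact only when there are no tied values. Real PV records contain many ties (for example, zero irradiance at night), and for such data the Pearson correlation of average ranks should be used instead, as `scipy.stats.spearmanr` does.

```python
import numpy as np


def pearson(x, y):
    """Equation (1): sum of cross-products of deviations divided by the
    square root of the product of the two sums of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()          # deviations d_x and d_y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


def ranks(values):
    """Ranks 1..N. Assumes no ties, the case in which Equation (2) is exact."""
    values = np.asarray(values, dtype=float)
    r = np.empty(values.size)
    r[np.argsort(values)] = np.arange(1, values.size + 1)
    return r


def spearman(x, y):
    """Equation (2): r_s = 1 - 6 * sum(d_i^2) / (N (N^2 - 1)), d_i = rank difference."""
    d = ranks(x) - ranks(y)
    n = d.size
    return float(1 - 6 * np.sum(d**2) / (n * (n**2 - 1)))


if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    # A monotone but non-linear relation: Spearman is 1, Pearson is below 1.
    print(pearson(x, x**3), spearman(x, x**3))
```

The example illustrates why both coefficients were computed: Pearson measures linear association, whereas Spearman captures any monotone relation between an input variable and the generated power.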
The results of the models were evaluated with rigorous statistical metrics whose mathematical formulations were adapted from the literature [56,57]. The metrics used to assess the efficiency of the models were the root mean square error (RMSE), the root mean square percentage error (RMSPE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE), given by Equations (3)-(6), respectively. RMSE (Equation (3)) is a quadratic scoring rule that measures the average magnitude of the errors:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(d_i - y_i\right)^2} \tag{3}$$

where $d_i$ are the predicted values, $y_i$ are the observed values, and $n$ is the number of observations.
RMSPE (Equation (4)) is the square root of the mean of the squared percentage errors. It has the same properties as RMSE, except that the results are expressed as percentages:

$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\frac{y_t - \hat{y}_t}{y_t}\right)^2} \times 100 \tag{4}$$

where $n$ is the number of samples, $y_t$ is the actual value, and $\hat{y}_t$ is the estimate. The loss function of this measure is the squared error.
MAE (Equation (5)) provides a generic performance measure for the model, bounded below by zero. It estimates the average magnitude of the absolute differences between the true and predicted values:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{Y}_i - Y_i\right| \tag{5}$$

where $\hat{Y}_i$ is the estimated value, $Y_i$ is the observed value, and $N$ is the number of samples. The closer MAE is to zero, the more accurate the model.
MAPE (Equation (6)) expresses the accuracy as a percentage of the true value:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right| \tag{6}$$

where $n$ is the number of fitted points, $A_t$ is the actual value, $F_t$ is the forecast value, and $\sum$ denotes summation (the absolute value is summed for each point in the forecast period).
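The four error metrics of Equations (3)-(6) can be sketched in NumPy as follows. This is a minimal illustration with function names of our choosing. The text does not state how records with zero actual output (for example, at night) were handled in the percentage metrics, which divide by the actual value. As a labeled assumption, this sketch simply excludes such records from RMSPE and MAPE.

```python
import numpy as np


def _as_arrays(y_true, y_pred):
    return np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)


def rmse(y_true, y_pred):
    """Equation (3): root mean square error, in the units of the data (e.g., kW)."""
    y, f = _as_arrays(y_true, y_pred)
    return float(np.sqrt(np.mean((f - y) ** 2)))


def mae(y_true, y_pred):
    """Equation (5): mean absolute error, in the units of the data."""
    y, f = _as_arrays(y_true, y_pred)
    return float(np.mean(np.abs(f - y)))


def _nonzero(y_true, y_pred):
    # Assumption: records whose actual value is zero are excluded, because the
    # percentage errors of Equations (4) and (6) divide by the actual value.
    y, f = _as_arrays(y_true, y_pred)
    keep = y != 0
    return y[keep], f[keep]


def rmspe(y_true, y_pred):
    """Equation (4): root mean square percentage error, in percent."""
    y, f = _nonzero(y_true, y_pred)
    return float(100 * np.sqrt(np.mean(((y - f) / y) ** 2)))


def mape(y_true, y_pred):
    """Equation (6): mean absolute percentage error, in percent."""
    y, f = _nonzero(y_true, y_pred)
    return float(100 * np.mean(np.abs((y - f) / y)))


if __name__ == "__main__":
    actual = [10.0, 20.0, 40.0]      # illustrative values, not measured data
    forecast = [11.0, 18.0, 40.0]
    for name, metric in [("RMSE", rmse), ("RMSPE", rmspe), ("MAE", mae), ("MAPE", mape)]:
        print(name, round(metric(actual, forecast), 4))
```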

ANFIS Model
An ANFIS is a multilayer feedforward network formed by neurons (nodes) joined by connections. The hybrid model was built in five layers, whose formulation has been widely studied and is adapted from [32,58-60]. In the forward pass, ANFIS uses the least squares method to obtain the consequent parameters $p_i$, $q_i$, $r_i$ of Equation (10) in the defuzzification layer. In the backward pass, it uses the backpropagation algorithm to minimize the error by gradient descent, thereby adjusting the premise parameters $a_i$, $b_i$, $c_i$ of Equation (7) in the fuzzification layer.
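The forward-pass least-squares step relies on one property: once the normalized firing strengths are fixed, the ANFIS output is linear in the consequent parameters. The sketch below is a minimal NumPy illustration under that assumption, not the Matlab toolbox routine. It builds the corresponding design matrix and solves it with `numpy.linalg.lstsq`. The function names are hypothetical, and the demo uses random stand-in firing strengths in place of ones computed from membership functions.

```python
import numpy as np


def design_matrix(X, wbar):
    """Rows [wbar_1*x, wbar_1, ..., wbar_R*x, wbar_R]: with the normalized
    firing strengths fixed, the output is linear in the consequents."""
    n = X.shape[0]
    Xa = np.hstack([X, np.ones((n, 1))])                 # inputs plus a bias column
    return (wbar[:, :, None] * Xa[:, None, :]).reshape(n, -1)


def fit_consequents(X, wbar, y):
    """Least-squares estimate of the consequent parameters.
    Returns an (R, d + 1) array whose row i holds (p_i, q_i, ..., r_i)."""
    A = design_matrix(X, wbar)
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta.reshape(wbar.shape[1], X.shape[1] + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(200, 2))                 # two illustrative inputs
    w = rng.uniform(0.1, 1.0, size=(200, 3))             # stand-in firing strengths, 3 rules
    wbar = w / w.sum(axis=1, keepdims=True)              # normalization, as in layer 3
    true_theta = np.array([[1.0, -2.0, 0.5], [0.0, 3.0, -1.0], [2.0, 2.0, 2.0]])
    y = design_matrix(X, wbar) @ true_theta.ravel()      # noiseless synthetic target
    print(np.round(fit_consequents(X, wbar, y), 6))      # recovers true_theta
```

In the hybrid rule, this solve alternates with a gradient-descent update of the premise parameters, which changes the firing strengths for the next forward pass.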
Layer 1. In the first layer, fuzzification is carried out. Each node $i$ in this layer is an adaptive (square) node that passes its input through a membership function, producing the degree to which the input belongs to a linguistic label:

$$O^1_i = \mu_{A_i}(x) = \frac{1}{1 + \left|\dfrac{x - c_i}{a_i}\right|^{2b_i}} \tag{7}$$
where $O^1_i$ denotes the node output, $\mu_{A_i}$ denotes the membership function, $x$ is the node input, and $a_i$, $b_i$, and $c_i$ are the premise parameters that alter the shape of the membership function.
Layer 2. This layer is called the rules layer because it computes the firing strength (weight) of each rule generated from the fuzzification layer, as the product of the incoming membership degrees (written here for two generic inputs $x$ and $y$; with four inputs, the product extends over four membership degrees):

$$O^2_i = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y) \tag{8}$$
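Layer 3. The normalized firing strength of each rule is calculated as the ratio of its firing strength to the sum of the firing strengths of all rules:

$$O^3_i = \bar{w}_i = \frac{w_i}{\sum_j w_j} \tag{9}$$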
where $O^3_i$ is the output of the normalization layer and $\bar{w}_i$ is the normalized firing strength.
Layer 4, called the defuzzification layer, computes the output of each rule weighted by the normalized firing strength from the previous layer:

$$O^4_i = \bar{w}_i f_i = \bar{w}_i\left(p_i x + q_i y + r_i\right) \tag{10}$$
where $\bar{w}_i$ is the output of layer 3 and $p_i$, $q_i$, and $r_i$ are the consequent parameters.
Layer 5. This layer computes the overall output as the sum of all incoming signals:

$$O^5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \tag{11}$$

The output is a crisp, continuous value rather than a fuzzy set. Figure 3 shows an ANFIS structure with four inputs and one output; its five layers are divided into a premise part and a consequent part.
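The five layers can be traced in a compact forward pass. The sketch below assumes a first-order Sugeno ANFIS with generalized bell membership functions, the product operator for rule firing, and grid partitioning, in which every combination of one fuzzy set per input forms a rule. Each rule's linear consequent uses all inputs, generalizing the two-input form of Equation (10). It is an illustration, not the Matlab implementation used in the study, and the function names and example parameters are hypothetical.

```python
import itertools

import numpy as np


def bell_mf(x, a, b, c):
    """Equation (7): generalized bell membership function."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))


def anfis_forward(X, premise, consequent):
    """Forward pass of a grid-partitioned first-order Sugeno ANFIS.

    X          : (n, d) inputs.
    premise    : list of d arrays; premise[j] has shape (m_j, 3), rows (a, b, c).
    consequent : (R, d + 1) array, R = m_1 * ... * m_d; row k holds the
                 linear coefficients of rule k followed by its constant term.
    Returns the crisp output (n,) and the normalized firing strengths (n, R).
    """
    n, d = X.shape
    # Layer 1: membership degree of input j in each of its fuzzy sets, (n, m_j)
    mu = [bell_mf(X[:, [j]], *premise[j].T) for j in range(d)]
    # Layer 2: firing strength = product of one membership degree per input
    rules = itertools.product(*[range(m.shape[1]) for m in mu])
    w = np.stack([np.prod([mu[j][:, k] for j, k in enumerate(rule)], axis=0)
                  for rule in rules], axis=1)
    # Layer 3: normalized firing strengths
    wbar = w / w.sum(axis=1, keepdims=True)
    # Layer 4: rule outputs f_k = p_k . x + r_k, weighted by wbar_k
    f = X @ consequent[:, :d].T + consequent[:, d]
    # Layer 5: sum of all incoming signals
    return np.sum(wbar * f, axis=1), wbar


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(5, 4))        # four hypothetical scaled inputs
    # Two bell MFs per input (a = 0.5, b = 2, centers 0 and 1): 2**4 = 16 rules
    premise = [np.array([[0.5, 2.0, 0.0], [0.5, 2.0, 1.0]]) for _ in range(4)]
    consequent = rng.normal(size=(16, 5))
    y, wbar = anfis_forward(X, premise, consequent)
    print(y.shape, wbar.shape)                # (5,) (5, 16)
```

With four inputs and two fuzzy sets each, grid partitioning already yields 16 rules. This growth in the number of rules with the number of inputs is one reason the parameter search discussed next becomes expensive.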

ANFIS Optimized with Swarm Intelligence Algorithms
Evolutionary computing is an area of artificial intelligence that investigates nature-inspired optimization algorithms [61]. These algorithms search for solutions by trial and error using stochastic optimization, and they provide tools for solving complex engineering problems involving randomness and non-linear dynamics [62].
Particle swarm optimization (PSO) is a branch of computational intelligence [63] that can be used as a global optimization technique: through heuristic search, it seeks global minima or maxima [64]. It is based on the swarm intelligence observed in nature, for example in schools of fish or flocks of birds, where the movement of each member results from combining individual decisions with the behavior of the rest of the group, perceived as collective intelligence [46]. In PSO, each particle moves through the search space according to a mathematical formulation, with a velocity that is dynamically adjusted based on its own experience and that of the other particles. Each particle is a point in the search space that uses its own experience to find its best position, while the social knowledge of the group determines the best position of the entire swarm, that is, the global minimum or maximum. The PSO process is adapted from, and explained in detail in, [46,65-67].
The algorithm initializes the swarm $P(t)$, placing each particle at a random position $\vec{x}_i(t)$ in the search space, and then evaluates the performance $F$ of each particle at its current position. The velocity and position of each particle are given by Equations (12) and (13):

$$V(t+1) = V(t) + c_1 r_1\left(P_{best} - X(t)\right) + c_2 r_2\left(G_{best} - X(t)\right) \tag{12}$$

$$X(t+1) = X(t) + V(t+1) \tag{13}$$
where $V(t)$ and $X(t)$ are the velocity and position of the particle, $P_{best}$ is the individual best value and $G_{best}$ is the global best value up to the most recent iteration, $r_1$ and $r_2$ are random numbers in the range 0-1, $t$ is the iteration index, and $c_1$ and $c_2$ are the acceleration coefficients of the particle. Next, the performance of each particle is compared with its own best performance so far: if $F(x_i(t)) < F(\hat{x}_i(t))$, then $\hat{x}_i(t) = x_i(t)$. The performance of each particle is then compared with that of the best global particle: if $F(x_i(t)) < F(g(t))$, then $g(t) = x_i(t)$. In addition, the velocity vector of each particle is updated with Equation (14):

$$v_i(t+1) = w\,v_i(t) + c_1 r_1\left(\hat{x}_i(t) - x_i(t)\right) + c_2 r_2\left(g(t) - x_i(t)\right) \tag{14}$$
where $v_i(t+1)$ is the velocity of particle $i$ at time $t+1$; $v_i(t)$ is its velocity at time $t$; $w$ is the inertia coefficient, which damps or amplifies the particle's previous velocity; $c_1$ is the cognitive coefficient; $r_1$ is a vector of random values between 0 and 1 with the same length as the velocity vector; $\hat{x}_i(t)$ is the best position particle $i$ has occupied so far; $x_i(t)$ is the position of particle $i$ at time $t$; $c_2$ is the social coefficient; $r_2$ is a second vector of random values between 0 and 1 with the same length as the velocity vector; and $g(t)$ is the best position found by the entire swarm up to time $t$, that is, the global best. Finally, each particle moves to its new position according to Equation (15):

$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{15}$$

and the iteration is repeated until convergence. Subsequently, the particle with the worst local value is selected and replaced with a new one with a better value. In this way, PSO helps optimize both the antecedent (premise) and the consequent parameters of ANFIS, improving the model's predictive capacity.
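The update rules of Equations (14) and (15), together with the personal-best and global-best comparisons, can be sketched as a minimal PSO minimizer. This is an illustration under stated assumptions, not the authors' implementation. Positions are clipped to the search bounds (our choice of bound handling). The objective is minimized, as RMSE is. The default coefficients are common textbook values rather than those of Table 2. The worst-particle replacement step mentioned in the text is omitted for brevity.

```python
import numpy as np


def pso(objective, lower, upper, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with basic PSO (Equations (14)-(15))."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # random initial swarm
    v = np.zeros_like(x)
    pbest = x.copy()                                         # personal bests x_hat_i
    pbest_cost = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_cost)].copy()                  # global best g
    g_cost = pbest_cost.min()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Equation (14): inertia + cognitive + social terms
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        # Equation (15), with positions clipped to the bounds (assumption)
        x = np.clip(x + v, lower, upper)
        cost = np.array([objective(p) for p in x])
        improved = cost < pbest_cost                         # update personal bests
        pbest[improved] = x[improved]
        pbest_cost[improved] = cost[improved]
        if pbest_cost.min() < g_cost:                        # update global best
            g_cost = pbest_cost.min()
            g = pbest[np.argmin(pbest_cost)].copy()
    return g, float(g_cost)


if __name__ == "__main__":
    sphere = lambda p: float(np.sum(p ** 2))                 # simple test function
    best, cost = pso(sphere, [-5.0] * 3, [5.0] * 3)
    print(best, cost)
```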
The ANFIS-PSO model learns the relationship between the input and output data to obtain the optimal configuration of the membership functions. The algorithm uses the least squares method and the backpropagation algorithm, which applies gradient descent to calculate partial derivatives and thus minimizes the error by adjusting the weights to fit the training data [68,69]. Integrating PSO aims to improve the predictive capacity of ANFIS so that more accurate predictions can be achieved.
In the ANFIS model, the PSO algorithm is mainly used to optimize the parameters of two layers. In the antecedent layer (layer 1), the parameters of the fuzzy membership functions are adjusted. In the consequent layer (layer 4), the weights of the consequent functions are adjusted to obtain a more accurate output. In PSO, particles representing different combinations of parameters move through the search space, updating their velocity and position according to the best individual and global positions, in search of the optimal combination of parameters. The PSO process begins by randomly generating a set of particles, each representing a set of ANFIS parameters (in this case, the weights and the centers of the membership functions). An iterative process then begins in which the fitness of each particle is evaluated with a performance metric, in this case the RMSE. Based on the PSO equations, each particle's best individual position (the position at which it has obtained its best fitness so far) and the best global position (the position at which any particle has obtained the best fitness) are updated. The velocity and position of each particle are then updated, with the new position computed from the current velocity. The process is repeated until the maximum number of iterations is reached; the best global position then provides the final ANFIS parameters. Optimizing the ANFIS layer parameters can improve the accuracy and overall performance of ANFIS. It also indirectly affects the rule layer (layer 2), whose firing strengths depend on the membership functions.
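To let PSO search over ANFIS parameters, each particle must encode a complete ANFIS and be scored by the RMSE. The sketch below shows one hypothetical layout. Each particle is a flat vector holding the bell-function parameters (a, b, c) for every input and fuzzy set, followed by the linear consequents of every grid-partitioned rule. The fitness is the RMSE of the resulting ANFIS on the training data. The layout, the function names, and the guard that keeps the widths a away from zero are assumptions for illustration; in practice, the search bounds should also keep a away from zero.

```python
import itertools

import numpy as np


def particle_size(d, m):
    """Length of a particle for d inputs with m bell MFs each (grid partition)."""
    return d * m * 3 + m ** d * (d + 1)


def decode(particle, d, m):
    """Split a flat particle into premise (d, m, 3), rows (a, b, c),
    and consequent (m**d, d + 1), rows (coefficients..., constant)."""
    particle = np.asarray(particle, dtype=float)
    n_premise = d * m * 3
    premise = particle[:n_premise].reshape(d, m, 3)
    consequent = particle[n_premise:].reshape(m ** d, d + 1)
    return premise, consequent


def anfis_predict(X, premise, consequent):
    """Grid-partitioned first-order Sugeno ANFIS with generalized bell MFs."""
    d, m = premise.shape[:2]
    a = np.abs(premise[..., 0]) + 1e-12       # assumption: keep widths positive
    b, c = premise[..., 1], premise[..., 2]
    mu = 1.0 / (1.0 + np.abs((X[:, :, None] - c) / a) ** (2 * b))   # (n, d, m)
    rules = itertools.product(range(m), repeat=d)
    w = np.stack([np.prod(mu[:, np.arange(d), list(r)], axis=1) for r in rules], axis=1)
    wbar = w / w.sum(axis=1, keepdims=True)
    f = X @ consequent[:, :d].T + consequent[:, d]
    return np.sum(wbar * f, axis=1)


def fitness(particle, X, y, m):
    """Cost minimized by PSO: RMSE of the ANFIS encoded by the particle."""
    premise, consequent = decode(particle, X.shape[1], m)
    y_hat = anfis_predict(X, premise, consequent)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))
```

A PSO minimizer would then be called on `lambda p: fitness(p, X_train, y_train, m)` over bounds of length `particle_size(d, m)`. With four inputs and two fuzzy sets each, that is 4·2·3 + 16·5 = 104 dimensions, which illustrates why the swarm size and iteration count matter for the computational cost.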
During the optimization process, the parameters of the membership functions and the weights are modified: PSO initializes the parameters randomly in the first step and then adjusts each parameter iteratively until the objective is reached. Table 2 shows the PSO parameters used in the optimization process. These values can be adjusted by trial and error; here, 1000 iterations were used to obtain the best cost, measured with RMSE. The objective of PSO is to help ANFIS optimize the antecedent and consequent parameters. There is no consensus on a single, unified criterion for designing the parameters, nor on an ideal initial configuration of the PSO algorithm that works optimally for all problems. The configuration is chosen based on the mathematics of the algorithm and on trial-and-error tuning: several optimization runs are performed, and the one that gives the most accurate results is selected while keeping an adequate balance between accuracy and computational effort. Consider, for example, the swarm size (number of particles): a larger swarm may allow better exploration of the search space but also increases the computational cost. The inertia coefficient controls the influence of a particle's previous velocity on its current motion: a high value favors initial exploration of the search space, whereas a low value favors exploitation of the best solutions. The personal acceleration coefficient controls the tendency of a particle to move towards its personal best position. The global acceleration coefficient determines the tendency of a particle to move towards the global best position found by the swarm. Finally, the damping ratio of the inertia coefficient gradually reduces that coefficient at each iteration as the optimization progresses.
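The damping ratio described above is commonly applied geometrically: after each iteration, the inertia coefficient is multiplied by the damping ratio. The result is large steps early in the run (exploration) and progressively smaller ones later (exploitation). The sketch below shows such a schedule. Apart from the 1000 iterations stated in the text, all settings are hypothetical placeholders, not the values of Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSOSettings:
    """Hypothetical PSO settings for illustration; only max_iter = 1000 comes
    from the text. The values actually used are those listed in Table 2."""
    swarm_size: int = 50      # number of particles (hypothetical)
    max_iter: int = 1000      # iterations used in this study
    w: float = 1.0            # initial inertia coefficient (hypothetical)
    w_damp: float = 0.99      # inertia damping ratio (hypothetical)
    c1: float = 1.5           # personal acceleration coefficient (hypothetical)
    c2: float = 2.0           # global acceleration coefficient (hypothetical)


def inertia_schedule(s):
    """Inertia coefficient used at iteration k: w_k = w * w_damp**k."""
    return [s.w * s.w_damp ** k for k in range(s.max_iter)]


if __name__ == "__main__":
    sched = inertia_schedule(PSOSettings())
    print(sched[0], round(sched[100], 4), round(sched[-1], 6))
```

With a damping ratio just below 1, the inertia falls to roughly a third of its initial value after 100 iterations and becomes negligible near the end of a 1000-iteration run. This is why the damping ratio and the iteration budget are usually tuned together.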
In any case, the optimal parameter values may vary depending on the problem and the type of data. The PSO parameters could even be selected and configured by another algorithm that searches iteratively for the optimal configuration for a specific case. Figure 4 shows the flowchart of the PSO integration in ANFIS.

The ANFIS hybrid model and the evolved ANFIS-PSO model were coded and designed in Matlab ® software v2017b. Each model integrated four input variables and a single output variable. A total of 225,442 records per variable were imported into the algorithm. For training and testing, the full dataset must be split into a training subset and a testing subset. Experimental studies have shown that machine learning models that use 70-80% of the data for training and 20-30% for testing produce the best results [70][71][72][73]. The test set is used to determine the predictive accuracy of the models. The aim is for the model to learn from the training data with low bias and low training error. It must also generalize to new data with low variance and low error in the test segment, that is, it must avoid both overfitting and underfitting. This is known as the bias-variance trade-off [74,75].
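The split reported in the Results section is chronological. Training covers 15 October 2020 to 18 April 2022, and testing covers 19 April to 12 December 2022. A time-ordered split without shuffling can be sketched as follows, assuming the records are already sorted by timestamp. The function name `chronological_split` is hypothetical. The example uses the training and testing counts given in the Results section (157,809 + 67,632).

```python
def chronological_split(records, train_fraction=0.7):
    """Split time-ordered records into training and testing subsets without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    n_train = round(len(records) * train_fraction)
    # Earlier records train the model; later records test it, so no future data leaks into training.
    return records[:n_train], records[n_train:]


n_total = 157_809 + 67_632                      # records per variable (Results section)
train, test = chronological_split(list(range(n_total)))
print(len(train), len(test))                    # 157809 67632
```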
Poor generalization can be caused by overtraining. An overtrained model only memorizes the behavior of the training data and cannot give correct outputs for new data. In this case, the difference between the training and testing errors is too high. Conversely, the model may be undertrained. In this case, it cannot reach an acceptably low error in the training segment because it has not learned the training mapping well enough, which results in poor prediction accuracy [76].
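The two failure modes above can be expressed as a simple rule applied to the training and testing errors. This is a minimal sketch. The thresholds `max_train_error` and `max_gap_ratio` are hypothetical, problem-specific values, not criteria given in the article, and `diagnose_fit` is a hypothetical name.

```python
def diagnose_fit(train_error, test_error, max_train_error, max_gap_ratio=1.5):
    """Classify a trained model as undertrained, overtrained, or acceptable.

    max_train_error and max_gap_ratio are hypothetical thresholds chosen per problem.
    """
    if train_error > max_train_error:
        return "undertrained"   # cannot reach a low error even on the training data
    if test_error > max_gap_ratio * train_error:
        return "overtrained"    # large gap between training and testing errors
    return "acceptable"


# Hypothetical RMSE values (kW), for illustration only.
print(diagnose_fit(3.0, 3.2, max_train_error=1.0))   # undertrained
print(diagnose_fit(0.5, 2.0, max_train_error=1.0))   # overtrained
print(diagnose_fit(0.7, 0.8, max_train_error=1.0))   # acceptable
```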
Empirical training tests were carried out on the evolved ANFIS-PSO model with three data partitions. Each partition was evaluated with the MAPE, and the split yielding the lowest percentage error was selected (Table 3). Based on these results, 70% of the total data was used for the training stage and the remaining 30% for the predictive (testing) stage.
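The selection step above can be sketched as computing the test MAPE for each candidate partition and keeping the lowest. This is a sketch under stated assumptions. Photovoltaic power is zero at night, where MAPE is undefined. The article does not say how such records were handled, so skipping zero actual values is an assumption. The numbers below are invented toy values, not the values of Table 3. For brevity they reuse one set of actual values for every split. `mape` and `select_partition` are hypothetical names.

```python
def mape(actual, predicted):
    """Mean absolute percentage error (%), skipping records whose actual value is zero."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def select_partition(test_mape_by_fraction):
    """Return the training fraction whose test-set MAPE is lowest."""
    return min(test_mape_by_fraction, key=test_mape_by_fraction.get)


# Toy numbers invented for illustration only (not the values of Table 3).
actual = [0.0, 10.0, 20.0, 40.0]          # kW; the zero mimics a night-time record
predictions = {
    0.6: [0.0, 11.0, 19.0, 42.0],
    0.7: [0.0, 10.5, 20.0, 40.0],
    0.8: [0.0, 9.0, 21.0, 40.0],
}
scores = {frac: mape(actual, pred) for frac, pred in predictions.items()}
print(scores, select_partition(scores))
```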
Notably, over 1000 iterations, training took approximately 625 min for the ANFIS model and about 460 min for the ANFIS-PSO model. The best cost, measured as the RMSE, was 1.79 for ANFIS and 0.747 for ANFIS-PSO (Table 4). This indicates better convergence of the PSO-optimized model than of the non-optimized ANFIS model. ANFIS required 35.9% more training time than ANFIS-PSO, which corresponds to a 26.4% reduction for ANFIS-PSO relative to ANFIS. Its RMSE was also 140% higher, which corresponds to a 58.3% reduction for ANFIS-PSO.
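The percentages above depend on which model serves as the reference. The snippet below reproduces both readings from the reported figures (625 min vs. 460 min; RMSE 1.79 vs. 0.747). The helper names are hypothetical. The "35.9%" and "140%" values correspond to differences expressed relative to ANFIS-PSO. Relative to ANFIS, the same differences are reductions of 26.4% and 58.3%.

```python
def pct_reduction(baseline, improved):
    """Reduction achieved by `improved`, as a percentage of `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def pct_excess(baseline, improved):
    """How much larger `baseline` is than `improved`, as a percentage of `improved`."""
    return 100.0 * (baseline - improved) / improved


time_anfis, time_pso = 625.0, 460.0   # training time over 1000 iterations (min)
rmse_anfis, rmse_pso = 1.79, 0.747    # best training cost (RMSE)

for label, base, new in [("time", time_anfis, time_pso), ("RMSE", rmse_anfis, rmse_pso)]:
    print(label, round(pct_excess(base, new), 1), round(pct_reduction(base, new), 1))
# time 35.9 26.4
# RMSE 139.6 58.3
```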

Results
As previously indicated, the models were trained (training stage) with 70% of all the recorded data (15 October 2020 to 18 April 2022). This is an accumulated total of 157,809 records per variable collected by the system instrumentation, as described in the Materials and Methods section. The remaining 30% (predictive stage) was used for the prediction tests (19 April 2022 to 12 December 2022), a cumulative total of 67,632 records. Table 4 presents the statistical errors between the experimentally recorded PV system power and the estimates of both models in the predictive stage. The optimized ANFIS-PSO model shows a considerable reduction in all evaluated errors, with a lower average error.
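The four metrics used to compare the models can be computed from paired measured and predicted power records. This sketch uses common textbook definitions, which are an assumption. In particular, the article reports RMSPE without a unit, so the percentage form used here may differ from the authors' definition. Records whose actual value is zero are skipped in the percentage metrics, also an assumption. `error_metrics` is a hypothetical name, and the example values are toy numbers.

```python
import math


def error_metrics(actual, predicted):
    """Return RMSE, RMSPE (%), MAE and MAPE (%) for paired power records (kW)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Percentage metrics skip zero actual values (e.g. night-time records): an assumption.
    rel = [(a - p) / a for a, p in zip(actual, predicted) if a != 0]
    rmspe = 100.0 * math.sqrt(sum(r * r for r in rel) / len(rel))
    mape = 100.0 * sum(abs(r) for r in rel) / len(rel)
    return {"RMSE": rmse, "RMSPE": rmspe, "MAE": mae, "MAPE": mape}


# Toy values (kW), for illustration only.
print(error_metrics([10.0, 20.0, 40.0], [11.0, 18.0, 40.0]))
```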
The results are first presented by comparing each model individually with the experimental records. The two models are then compared with each other against the experimental data. In Figure 5, the predictions of the ANFIS hybrid model are plotted against the experimental power records collected from the system from 19 April to 12 December 2022. Figure 5a plots the entire test dataset. Although this view is difficult to analyze visually in detail, it was considered necessary to show the wide predictive range covered by the models. October, the month with the most satisfactory MAPE, was then selected (Figure 5b) to show the predictive behavior of the model in more detail. Figure 5c shows the week from 21 to 28 October of that month, where the predictive approximation of ANFIS can be observed visually in greater detail.

Under the same criteria, Figure 6 shows a sequence of graphs of the ANFIS-PSO results against the actual power records of the photovoltaic system. It covers the same test period, from 19 April to 12 December 2022. As before, the presentation goes from the global to the specific. The period is subdivided into sections, which are selected and plotted according to the lowest MAPE registered by the model. Figure 6a shows the entire predictive period of the model. In Figure 6b, June was selected as the month with the lowest MAPE recorded for ANFIS-PSO. Finally, Figure 6c shows the week from 8 to 14 June, where the model's tracking of the bell-shaped (Gaussian-like) daily power curves can be observed in greater detail.

Subsequently, both models were compared with the experimental results. For each model, the week of the month with the lowest MAPE was plotted. Figure 7a shows the period 15-21 October, corresponding to the ANFIS model. A non-arbitrary day within the same week was then selected and plotted: Figure 7b shows the predictions of both models for 16 October.
Energies 2023, 16, x FOR PEER REVIEW 15 of
Next, Figure 8a compares both models from 8 to 14 June, the week in which ANFIS-PSO presented its lowest measured MAPE. Finally, Figure 8b shows the predictive tracking on 10 June in greater visual detail.
To show the behavior of the MAPE errors of both models in detail, Table 5 reports the monthly MAPE of the predictions of both models (from 19 April to 12 December, a total of 237 days). Figure 9 displays these results by mapping the daily dynamics during three periods: (a) 19 April-12 December; (b) 1-30 June, the month with the lowest MAPE for the ANFIS-PSO model; and (c) 1-30 October, the month with the lowest MAPE for the ANFIS model.

Figure 10 presents the behavior in even greater detail. It maps the MAPE errors for the week with the lowest error of the ANFIS model and for the week with the lowest MAPE of the ANFIS-PSO model (see Figures 6a, 6b, 7a, and 7b).

Figures 11a and 12a show a general analysis of the errors of ANFIS and ANFIS-PSO, respectively, relative to the experimental results over the entire test period. Figures 11b and 12b show the corresponding histograms with a fitted distribution for the ANFIS and ANFIS-PSO models, respectively. Figure 13 shows the correlation plots of each model against the experimentally measured output power of the system for the entire test dataset (30%). A clear linear relationship between the system power records and the values estimated by both models can be observed. Even so, the optimized model fits the trend line more closely, which visually indicates the superiority of ANFIS-PSO.
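The correlation plots can be summarized numerically by the Pearson correlation coefficient and a least-squares trend line between measured power (x) and predicted power (y). This is a minimal sketch. The article does not state which statistics, if any, were computed for Figure 13, so these are assumptions. The function names are hypothetical, and the inputs below are toy values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between measured (x) and predicted (y) power."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def trend_line(x, y):
    """Least-squares slope and intercept of y on x; a perfect model gives (1, 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Toy values (kW), for illustration only.
print(pearson_r([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
print(trend_line([0.0, 10.0, 20.0, 30.0], [1.0, 11.0, 21.0, 31.0]))
```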
Using the model outputs, the electrical energy predicted by each model was calculated by numerical integration with the trapezoidal rule. It was compared with the actual generation of the system over the four weeks of June and the four weeks of October, as shown in Table 6.
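The trapezoidal integration described above turns a series of power samples (kW) into energy (kWh). A minimal sketch follows, with timestamps in hours. The sampling interval of the system records is not restated here, so the sample times are left to the caller. The percentage-difference helper shows one way such energy comparisons could be expressed; its sign convention is an assumption. Both function names are hypothetical.

```python
def energy_kwh(times_h, power_kw):
    """Integrate power (kW) over time (h) with the trapezoidal rule; returns kWh."""
    if len(times_h) != len(power_kw):
        raise ValueError("times and power samples must have the same length")
    return sum(
        0.5 * (power_kw[i] + power_kw[i + 1]) * (times_h[i + 1] - times_h[i])
        for i in range(len(times_h) - 1)
    )


def pct_difference(predicted, measured):
    """Signed difference of predicted vs. measured energy, as a percentage of measured."""
    return 100.0 * (predicted - measured) / measured


# Toy example: constant 60 kW for one hour, and a triangular 2-hour profile.
print(energy_kwh([0.0, 1.0], [60.0, 60.0]))             # 60.0
print(energy_kwh([0.0, 1.0, 2.0], [0.0, 10.0, 0.0]))    # 10.0
```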
Subsequently, for a more detailed follow-up, the electrical energy generated by the PV system was compared with that estimated by both models. The comparison covered the seven days of the second week of June and of the third week of October, which correspond to the best MAPE values obtained by ANFIS-PSO and ANFIS, respectively. The results are shown in Table 7.

The results of the models were also compared with two other predictive models of PV electricity generation reported in the literature. Table 8 shows their general characteristics and results based on the RMSE, MAE, and MAPE statistical metrics. The models from the other studies were designed and trained for different photovoltaic systems, with different generation capacities and geographical conditions. Therefore, only the percentage metric (MAPE) can reasonably be compared quantitatively. ANFIS-PSO shows the best agreement.

Finally, the behavior of the ANFIS-PSO model was analyzed when trained with two, three, or four input variables. In addition to the present case, the model was trained and tested in four different cases. The results were evaluated with the statistical metrics and are presented in Table 9. The present case with four variables (Case 5) has the lowest MAPE, followed very closely by the cases with two variables (Case 2) and three variables (Case 3). Notably, the cases without solar radiation (Cases 1 and 4) present higher deviations. Therefore, solar radiation is the primary variable to be considered.
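An input-variable study like the one in Table 9 can be organized by looping over subsets of the candidate inputs and keeping the one with the lowest test MAPE. This is a minimal sketch. The article evaluated only selected cases (Cases 1-5), whereas this code enumerates every 2-, 3-, and 4-variable subset. The variable identifiers, `input_subsets`, `run_ablation`, and the `train_and_score` callable are hypothetical. The dummy scorer is a stand-in and does not reproduce the article's results.

```python
from itertools import combinations

# Hypothetical identifiers for the four measured inputs.
INPUTS = ("solar_radiation", "module_temperature", "ambient_temperature", "wind_speed")


def input_subsets(variables=INPUTS, sizes=(2, 3, 4)):
    """All input combinations with the given numbers of variables."""
    return [combo for k in sizes for combo in combinations(variables, k)]


def run_ablation(train_and_score, subsets):
    """Score every subset with a hypothetical train_and_score(subset) -> test MAPE (%)."""
    results = {subset: train_and_score(subset) for subset in subsets}
    best = min(results, key=results.get)
    return best, results


# Dummy scorer standing in for ANFIS-PSO training (illustration only).
def dummy_score(subset):
    return 10.0 - len(subset) - (5.0 if "solar_radiation" in subset else 0.0)


best, results = run_ablation(dummy_score, input_subsets())
print(len(results), best)
```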

Conclusions
In this research, two intelligent models were designed to predict photovoltaic energy generation by training on historical experimental data of the system: the hybrid ANFIS model and the novel ANFIS-PSO model, whose parameters were adjusted through swarm-intelligence optimization. The models were trained and tested with 225,441 records per variable collected from the experimental photovoltaic system over almost 26 months. The long-term predictive capacity of both models was evaluated. In addition, monthly and weekly comparisons of the estimates were made for the periods in which the models showed the greatest precision.
The evolved ANFIS-PSO model performed better in predicting power generation, both in training time and in error. The ANFIS model obtained RMSE = 1.79 kW, RMSPE = 3.075, MAE = 0.864 kW, and MAPE = 1.47%, whereas ANFIS-PSO obtained RMSE = 0.754 kW, RMSPE = 1.29, MAE = 0.325 kW, and MAPE = 0.556%. In the analysis of the predicted electricity generation, the estimates of both models were very close to the measured values. Over the four weeks, both models under- or overestimated the actual records by small margins. The comparison of two different months indicated that both models could estimate the actual electricity generation with high precision, with small percentage differences for each model. In the more specific daily analysis over a 7-day period, both models predicted very well, but the ANFIS-PSO model showed a better predictive capacity for daily electricity generation in almost all cases. Therefore, the models could be fed with a typical meteorological year (TMY) to predict the long-term electrical energy production of new photovoltaic plants and help assess their economic viability.
In analyzing the factors affecting the electric power prediction of a PV system, solar radiation emerges as the predominant variable because of its very high correlation with the output variable. When solar radiation is excluded from training, the errors are significantly higher than when it is included. Moreover, the case with all four available variables was the most accurate. This leads to the conclusion that two things are essential to refine the model's predictive capacity as much as possible. The model must be trained with the variables most correlated with the output, and historical data must be collected over an adequate period.
Finally, we suggest topics for future research. It would be valuable to evaluate the models on other photovoltaic systems under different environmental conditions. Such studies could even include variables describing physical conditions that affect performance, such as dust accumulated on the panels. This requires a sufficiently long data record that can be included as an additional input variable. It would also be interesting to study other optimization algorithms, such as the ant colony algorithm and the artificial bee colony algorithm. These could be compared to identify the conditions in which one may be more suitable than another. All these variants could help improve the predictive model.

Data Availability Statement: All data, models, and/or code that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments:
The authors would like to thank the University of Sonora and CINVESTAV for the facilities for studying the photovoltaic system and the support from CONAHCYT, Mexico, through a graduate scholarship with number 480505.

Conflicts of Interest:
The authors declare no conflict of interest.