Machine Learning Based Approaches for Modeling the Output Power of Photovoltaic Array in Real Outdoor Conditions

Abstract: Accurate modeling of photovoltaic (PV) systems is important for investigating their long-term performance, especially for predicting output power; single- and double-diode models are the configurations mainly applied for this purpose. However, using a single configuration to model a PV panel limits the accuracy of the predicted performance. This paper proposes a new hybrid approach based on classification algorithms in the machine learning framework that combines the single- and double-diode models according to the climatic conditions in order to predict the PV output power with higher accuracy. Classification trees, k-nearest neighbors, discriminant analysis, Naïve Bayes, support vector machines (SVMs), and classification ensembles are investigated to estimate the PV power under different conditions of the Mediterranean climate. The examined classification algorithms indicate that the double-diode model is more relevant for low and medium levels of solar irradiance and temperature. Accuracy between 86% and 87.5% demonstrates the high potential of classification techniques for PV power prediction. The normalized mean absolute error, up to 1.5%, is lower than the errors obtained from both the single-diode and double-diode equivalent-circuit models, with a reduction of up to 0.15%. The proposed hybrid approach using machine learning (ML) algorithms could be a key solution for photovoltaic and industrial software to predict performance more accurately.


Introduction
Due to the sharp increase in petroleum prices and the policies imposed on industrial countries to reduce CO2 levels, the use of renewable sources to produce energy has become a necessity. Accordingly, different solutions are used to cover energy needs while respecting clean and eco-friendly requirements [1]. Generally, wind, hydro, and solar sources of energy are appropriate solutions that ensure green electricity for diverse industrial and domestic applications. In particular, photovoltaic systems display a good balance between investment cost and performance [2].
Modeling is a substantial step in analyzing the electrical performance of the photovoltaic (PV) cell/module/array. Equivalent-circuit models are mainly implemented to predict the long-term potential of the photovoltaic device. Single-diode models (SDMs) and double-diode models (DDMs) are the most used configurations [3]. The single-diode model is less complicated than the double-diode configuration because of the limited number of parameters needed: the SDM requires five while the DDM requires seven [4]. To achieve a high

PV Cell Modeling
To evaluate the electrical behavior of the PV device (cell), many equivalent-circuit models have been proposed. Two configurations are widely used in the literature: the single-diode and the double-diode models.

Single-diode Model
This model is developed by adding a series and a shunt resistance to the ideal model to represent the losses of the module [16]. By applying Kirchhoff's laws to the scheme in Figure 1, the output current of the PV panel is given by Equation (1) [16]:

$$I = I_{ph} - I_{os}\left[\exp\left(\frac{V + I R_s}{A}\right) - 1\right] - \frac{V + I R_s}{R_{sh}} \tag{1}$$

where $I$ and $V$ are the output current and voltage, $I_{ph}$ is the light-generated current, $I_{os}$ is the diode saturation current, and $R_s$ and $R_{sh}$ are respectively the series and the shunt resistance. The value of $A$ depends on the thermal voltage $V_t = kT/q$ and is expressed as follows:

$$A = \gamma N_{cell} V_t \tag{2}$$

where $\gamma$ is the ideality factor of the diode and $N_{cell}$ is the number of cells that compose the PV module.

Double-Diode Model
As another configuration to model the PV cell, the double-diode model improves on the accuracy of the single-diode model [17]. Recently, the DDM has become a primary solution for several authors due to its improvements, especially at low irradiance [4,18]. This model considers two diodes in parallel with the current source, as shown in Figure 2, and the output current is given by Equation (3):

$$I = I_{ph} - I_{os1}\left[\exp\left(\frac{V + I R_s}{A_1}\right) - 1\right] - I_{os2}\left[\exp\left(\frac{V + I R_s}{A_2}\right) - 1\right] - \frac{V + I R_s}{R_{sh}} \tag{3}$$

where $A_1$ and $A_2$ depend respectively on the ideality factor of each diode ($\gamma_1$ and $\gamma_2$) and on the cell temperature, and $I_{os1}$ and $I_{os2}$ are the saturation currents of the two diodes.

The second step of PV cell modeling is to evaluate the unknown parameters of Equations (1) and (3). The number of parameters to determine depends on the equivalent-circuit model used: five parameters for the single-diode model ($I_{ph}$, $I_{os}$, $R_s$, $R_{sh}$, $\gamma$) and seven parameters for the double-diode model ($I_{ph}$, $I_{os1}$, $I_{os2}$, $R_s$, $R_{sh}$, $\gamma_1$, $\gamma_2$) [5].
To estimate these unknown parameters, many techniques have been proposed. Whether numerical, analytical, or metaheuristic [19][20][21], most of these approaches use experimental I-V data or datasheet information to simplify the estimation of the parameters. For this reason, the photo-generated current is generally calculated using Equation (4) [22], where $I_{sc}$ is the short-circuit current, $K_i$ is the temperature coefficient of $I_{sc}$, and $\lambda$ and $T$ are respectively the solar irradiance level and the cell temperature.
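A commonly used form of such a photocurrent expression, written with the symbols defined above, can be sketched as follows. It is an assumption and not necessarily the exact Equation (4) of [22]: the reference irradiance $\lambda_{ref}$ and reference temperature $T_{ref}$ are introduced here for illustration and are taken as the standard test conditions.

```latex
I_{ph} = \left[ I_{sc} + K_i \,(T - T_{ref}) \right] \frac{\lambda}{\lambda_{ref}},
\qquad \lambda_{ref} = 1000~\mathrm{W/m^2}, \quad T_{ref} = 25~^{\circ}\mathrm{C}
```

In this form the photocurrent scales linearly with irradiance and shifts linearly with cell temperature around the datasheet value of $I_{sc}$.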

Maximum Power Point
The maximum power point (MPP) represents the optimal operating point of the PV module regardless of climate variations [23]. Mathematically, at this point the derivative of the PV power with respect to the PV voltage vanishes:

$$\frac{dP}{dV} = \frac{d(VI)}{dV} = I + V\frac{dI}{dV} = 0 \tag{5}$$

In order to compute the maximum power point for both adopted equivalent-circuit models, the current expressions in Equations (1) and (3) are evaluated under the condition of Equation (5). The optimal currents of the single-diode and double-diode models are then expressed by Equations (6) and (7), respectively, where $V_{MPP}$ and $I_{MPP}$ are the PV output voltage and current at the maximum power point. The $V_{MPP}$ and $I_{MPP}$ coordinates are then used to estimate the output power $P_{MPP} = V_{MPP} \cdot I_{MPP}$ for both adopted equivalent-circuit models.
In order to obtain the maximum power from a PV plant, a maximum power point tracking (MPPT) algorithm controls and adjusts the operating voltage to reach the maximum output power. MPP losses occur when the MPPT is not able to find the MPP rapidly. Typical MPP loss values are lower than 0.5%. Furthermore, the operating voltage of the PV array depends on the DC cable length, cross-section, and temperature, which can lead to current and power losses, and the connection of modules in series can cause a mismatch between the I-V characteristics of the modules (mismatch losses). In the present work, MPPT and DC losses are not considered when modeling the PV array.
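As an illustration of the MPP computation, the sketch below solves the implicit single-diode equation for the current by bisection and then scans the voltage axis for the largest product V·I. This is a numerical stand-in for the analytical condition dP/dV = 0, not the authors' MATLAB implementation, and every parameter value (i_ph, i_os, r_s, r_sh, the ideality factor 1.3, the 60-cell count, v_max) is a hypothetical choice made only for the example.

```python
import math

def sdm_current(v, i_ph, i_os, r_s, r_sh, a, tol=1e-9):
    """Solve I = I_ph - I_os*(exp((V + I*R_s)/A) - 1) - (V + I*R_s)/R_sh for I."""
    def residual(i):
        vd = v + i * r_s
        return i_ph - i_os * (math.exp(vd / a) - 1.0) - vd / r_sh - i

    # The residual decreases strictly with i, so a sign change brackets the root.
    hi = i_ph                     # residual(hi) < 0 for v >= 0
    lo = 0.0
    while residual(lo) < 0.0:     # widen the bracket when the current is negative
        lo -= i_ph
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def find_mpp(params, v_max, steps=2000):
    """Grid search over [0, v_max] for the voltage that maximizes P = V * I."""
    best_v, best_i, best_p = 0.0, 0.0, 0.0
    for n in range(steps + 1):
        v = v_max * n / steps
        i = sdm_current(v, **params)
        if v * i > best_p:
            best_v, best_i, best_p = v, i, v * i
    return best_v, best_i, best_p

# Hypothetical module parameters (illustrative only, not taken from the paper).
params = dict(i_ph=9.2, i_os=5e-8, r_s=0.3, r_sh=300.0, a=1.3 * 60 * 0.025693)
v_mpp, i_mpp, p_mpp = find_mpp(params, v_max=40.0)
i_sc = sdm_current(0.0, **params)
```

The same scan applies to the double-diode model if the residual is extended with the second diode term of Equation (3).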

Adopted Machine Learning Algorithm
Supervised machine learning (ML) techniques are able to predict the response for a given measurement set of the predictor variables on the basis of a model built on a known observation set, referred to as the training dataset.
Classification algorithms are a type of supervised ML technique in which an algorithm learns to separate the data into specific "classes" in order to predict categorical responses [24]. The most popular classification algorithms include [25]: classification trees, k-nearest neighbors, discriminant analysis, Naïve Bayes, support vector machines, and classification ensembles (boosted trees, AdaBoost).

Classification Trees
The classification tree technique, also known as a decision tree, is one of the common approaches applied in data mining to predict the class response for a given observation by using specific predictor variables [26]. Classification tree learning maps the observations onto a tree structure to model the target value [27]. In the tree, each node represents a feature and each path (branch) corresponds to a value of that feature. The decision tree learner selects, at each node, the feature value that best splits the dataset according to some criterion. The terminal nodes are marked according to the classes into which the instances are to be classified. During testing, the value of the instance is compared with the value labeled at each branch. If the value of the instance matches the value at the branch, the classification continues along the path until it reaches a terminal node, as shown in Figure 3 [28]. The terminal leaf nodes are shown as orange and yellow squares according to the classes. In each tree, the path followed by the instance is shown in blue. In Figure 3a-c the trees predict the yellow class, whereas in Figure 3d the instance falls in the orange class; the classifier therefore assigns it to the yellow class by a 3-to-1 majority vote.

k-Nearest Neighbors
The k-nearest neighbors algorithm (kNN) is an ML method applied for classification in which an instance represents a point in a d-dimensional space and each dimension corresponds to one of the d features. Instances with similar properties are therefore close to each other in the d-dimensional space [29]. In order to predict the class of a new instance, the kNN algorithm finds the k nearest training instances by computing the distance between the new instance and each of them. The predicted class is determined by the instances at minimum distance. The Euclidean distance is usually applied as the distance metric [30,31]. The k-nearest neighbor classification algorithm is listed in Appendix A. Figure 4 shows the kNN classification concept. The green instance is assigned to the blue class for k = 1. For k = 3 it is classified as the blue class by a 2-to-1 majority, and for k = 5 it is assigned to the orange class by 3 to 2. Therefore, the k-nearest-neighbors classifier assigns to a test sample the majority class of its k nearest training samples.
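The majority vote described above can be sketched in a few lines. This is a minimal illustration with the Euclidean distance, not the algorithm listing of Appendix A; the point coordinates are hypothetical and were chosen so that the outcome mirrors the k = 1, 3, 5 cases described for Figure 4.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Return the majority class among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points; the query plays the role of the green instance.
train_points = [(0.1, 0.0), (0.2, 0.0), (0.3, 0.0), (0.4, 0.0), (0.5, 0.0), (1.0, 1.0)]
train_labels = ["blue", "orange", "blue", "orange", "orange", "blue"]
query = (0.0, 0.0)

class_k1 = knn_predict(train_points, train_labels, query, k=1)
class_k3 = knn_predict(train_points, train_labels, query, k=3)
class_k5 = knn_predict(train_points, train_labels, query, k=5)
```

With predictors on very different scales, such as irradiance in W/m² and temperature in °C, the features would normally be standardized before computing distances.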

Discriminant Analysis
Discriminant analysis (DA) finds a predictive equation based on independent variables to classify the instances into classes [32]. It is closely related to regression analysis, with the roles of the dependent and independent variables exchanged. The mathematical formulation is presented in Appendix A. It can be considered a dimensionality reduction technique that maps the sample space onto a smaller dimension while retaining as much information as possible. Discriminant analysis can be divided into two categories according to the boundary between the classes: linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) [33]. LDA transforms the data along the coordinate axes, reducing the two-dimensional space to a one-dimensional space with a linear boundary. QDA can be considered an extension of LDA that separates two or more classes by a quadratic surface.

Naïve Bayes
The Naïve Bayes classification algorithm is one of the most popular statistical learning methods. It is based on the Bayes theorem on conditional probability and predicts the most probable class.
Given an instance $d$ and its probability of occurrence $P(d)$, the Bayes theorem states:

$$P(c_i \mid d) = \frac{P(d \mid c_i)\,P(c_i)}{P(d)} \tag{8}$$

where $P(c_i \mid d)$ is the probability of the instance $d$ being in the class $c_i$; $P(d \mid c_i)$ is the probability of observing $d$ in a domain where $c_i$ holds; and $P(c_i)$ is the prior probability of $c_i$. The Naïve Bayes classifier computes the probability of each instance for all classes in $c$ and selects the class $c_i$ with the highest probability (Figure A1). Generally, the features are assumed to have a Gaussian probability distribution. When the features do not follow a Gaussian distribution, the kernel density method [34] is applied to estimate the probability distribution. More details are provided in Appendix A.
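A minimal sketch of this decision rule with Gaussian feature likelihoods is given below. It uses the simple Gaussian variant rather than the kernel density variant, and the training samples (irradiance in W/m², temperature in °C) and their "SDM"/"DDM" labels are hypothetical values invented for the example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate the prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for x, c in zip(samples, labels):
        grouped[c].append(x)
    model = {}
    for c, rows in grouped.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9  # guard against zero variance
            for col, m in zip(columns, means)
        ]
        model[c] = (len(rows) / len(samples), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Pick the class maximizing log P(c) + sum of log P(x_j | c); P(d) is common to all classes."""
    def log_posterior(c):
        prior, means, variances = model[c]
        log_likelihood = sum(
            -0.5 * math.log(2.0 * math.pi * var) - (v - m) ** 2 / (2.0 * var)
            for v, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)

# Hypothetical training data: (irradiance, temperature) -> best-performing model.
samples = [(200, 12), (350, 15), (500, 22), (650, 28),
           (850, 41), (900, 45), (950, 43), (1000, 48)]
labels = ["DDM", "DDM", "DDM", "DDM", "SDM", "SDM", "SDM", "SDM"]

model = fit_gaussian_nb(samples, labels)
low_case = predict_gaussian_nb(model, (300, 14))
high_case = predict_gaussian_nb(model, (920, 44))
```

Working with log-probabilities avoids numerical underflow when several small likelihoods are multiplied.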

Support Vector Machines (SVM)
Support vector machines are supervised learning models able to analyze data and learn a classifier [35]. An SVM finds the optimal separating hyperplane as a decision surface to separate the data in different classes. First, the SVM method transforms predictors to high-dimensional feature space and successively solves a quadratic optimization problem to find an optimal hyperplane in order to classify the transformed features into classes [36].
Figure 5 [37] shows the optimal separating hyperplane in two dimensions. The yellow plane divides the support vectors into two classes (red squares and blue dots).

Classification Ensembles
A classification ensemble combines different machine learning models to improve performance by decreasing variance (bagging), decreasing bias (boosting), or improving predictions (stacking) [38]. One of the most popular ensemble learning algorithms is adaptive boosting (AdaBoost), which uses the boosting method to convert weak learners into strong learners [39]. Given a dataset of N data points, the AdaBoost algorithm first initializes the weights for each data point. Then it fits weak classifiers to the dataset and selects the one with the lowest weighted classification error. At each iteration, it computes the weight of the selected weak classifier and updates the weights of the data points. The final classifier can be expressed as:

$$F(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \theta_m f_m(x)\right) \tag{9}$$

where $f_m$ represents the m-th weak classifier and $\theta_m$ is the corresponding weight. Therefore, the final (strong) classifier $F(x)$ is given by a weighted sum of M weak classifiers, as Figure 6 shows.

Electronics 2020, 9, 315
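The boosting loop can be sketched with one-feature threshold classifiers (decision stumps) as weak learners. This is a textbook-style AdaBoost for labels in {-1, +1}, offered as an illustration under that assumption rather than as the boosted-trees implementation used in the study; the one-dimensional data set is hypothetical and was chosen because no single stump can classify it correctly.

```python
import math

def stump_predict(stump, x):
    """A stump returns +polarity above the threshold and -polarity otherwise."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def best_stump(samples, labels, weights):
    """Exhaustively pick the stump with the lowest weighted classification error."""
    best, best_err = None, float("inf")
    for feature in range(len(samples[0])):
        for threshold in sorted({x[feature] for x in samples}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for x, y, w in zip(samples, labels, weights)
                          if stump_predict(stump, x) != y)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost_fit(samples, labels, rounds):
    n = len(samples)
    weights = [1.0 / n] * n                       # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(samples, labels, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # keep the logarithm finite
        theta = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((theta, stump))
        # Misclassified points gain weight, correctly classified points lose weight.
        weights = [w * math.exp(-theta * y * stump_predict(stump, x))
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the weighted sum of weak classifiers, as in Equation (9)."""
    score = sum(theta * stump_predict(stump, x) for theta, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical 1-D data: the +1 class lies on both sides of the -1 class.
samples = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
labels = [1, 1, -1, -1, 1, 1]

_, single_stump_error = best_stump(samples, labels, [1.0 / 6] * 6)
ensemble = adaboost_fit(samples, labels, rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in samples]
```

In this toy case the best single stump still misclassifies a third of the points, while the weighted vote of three stumps separates both classes.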

Methodology
This section presents the adopted methodology to identify the most suitable model between the SDM and the DDM under different levels of solar irradiance and temperature by using the ML classification algorithms.
The data collected from the supervisory control and data acquisition (SCADA) system of a 113.85 kWp grid-connected PV plant located in southeast Italy (latitude 40°37'55" N, longitude 17°56'9" E) are adopted to carry out the investigation. The PV system includes 414 polycrystalline silicon PV modules with a nominal power of 275 W. The modules are connected in 23 strings of 18 modules, oriented south, and inclined at a tilt angle of 30°. Data on solar irradiation, ambient temperature, and DC power are collected according to the International Standard IEC 61724. Hourly mean values of the solar irradiance on the array, the ambient temperature, and the DC output power from 1 October 2017 to 20 September 2018 (8760 samples) are considered in the present study. Figure 7 shows the hourly solar irradiance incident on the plane of the array and the output power over one year. The hourly output power increases linearly with the solar irradiance on the tilted plane, with a strong correlation (R² = 0.9897).
The methods of Ishaque et al. [4] and Chaibi et al. [16] are implemented respectively to extract the parameters for SDM and DDM by MATLAB code. For each measurement of irradiance and ambient temperature, the PV output voltage and current at the maximum power point are computed using Equations (6) and (7) for SDM and DDM, respectively.
The global MPP coordinates are estimated considering that the PV plant consists of 23 strings of 18 modules of 275 W each ($N_s = 23$, $N_m = 18$). Thus, the whole PV output power is computed as $N_s \cdot N_m \cdot P_{MPP}$, where $P_{MPP}$ is computed for each pair of hourly monitored irradiance and ambient temperature values by using the PV output voltage and current at the maximum power point according to Equations (6) and (7) for the SDM and DDM, respectively.
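The scaling from module to plant level is simple arithmetic and can be checked directly. The function name array_power is introduced here for illustration; the string count, modules per string, and nominal module power are the values stated above.

```python
N_STRINGS = 23             # N_s, number of strings
MODULES_PER_STRING = 18    # N_m, modules connected in each string
MODULE_NOMINAL_W = 275.0   # nominal power of one module in watts

def array_power(p_mpp_module_w):
    """Plant-level power as N_s * N_m * P_MPP, neglecting MPPT and DC losses."""
    return N_STRINGS * MODULES_PER_STRING * p_mpp_module_w

total_modules = N_STRINGS * MODULES_PER_STRING
nominal_plant_w = array_power(MODULE_NOMINAL_W)
```

At nominal module power this reproduces the 414 modules and the 113.85 kWp rating of the plant.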
The obtained power output is compared to the actual data by the normalized mean bias error (NMBE) of Equation (10), where $P_{model}$ can be $P_{SDM}$ or $P_{DDM}$, i.e., the power calculated from the SDM or the DDM. Six classification algorithms, listed in Table 1, are chosen to identify which of the SDM and DDM provides the best performance for a given solar irradiance and temperature. Therefore, each classification algorithm is based on two predictors: irradiance and temperature. Figure 8 depicts the adopted approach to classify the equivalent-circuit models for a given solar irradiance and temperature and to provide the PV output power with the highest accuracy.
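The comparison step can be sketched as follows. Two assumptions are made and should be read as such: the bias is normalized by the total measured power (the normalization of Equation (10) is not reproduced here), and the class label of each sample is taken to be the model with the smaller absolute error, which is one plausible way to build the target classes for the classifiers. All power values are hypothetical.

```python
def nmbe_percent(p_model, p_measured):
    """Signed bias in percent; positive values mean the model overestimates.
    Assumed normalization: sum of the measured power."""
    bias = sum(m - a for m, a in zip(p_model, p_measured))
    return 100.0 * bias / sum(p_measured)

def label_best_model(p_sdm, p_ddm, p_measured):
    """Assumed labeling rule: the model closer to the measurement wins."""
    return ["SDM" if abs(s - a) <= abs(d - a) else "DDM"
            for s, d, a in zip(p_sdm, p_ddm, p_measured)]

# Hypothetical hourly powers in kW.
p_measured = [50.0, 80.0, 100.0]
p_sdm = [52.0, 83.0, 101.0]
p_ddm = [51.0, 81.0, 97.0]

nmbe_sdm = nmbe_percent(p_sdm, p_measured)
nmbe_ddm = nmbe_percent(p_ddm, p_measured)
classes = label_best_model(p_sdm, p_ddm, p_measured)
```

The resulting labels, paired with irradiance and temperature, form the kind of two-predictor training set that a classifier can learn from.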
In order to assess the performance of the ML classification algorithms and of the proposed approach, we introduce the accuracy index, the confusion matrix, the receiver operating characteristic (ROC) curve, and the normalized mean absolute error (NMAE).
The accuracy index of the classification algorithms can be evaluated as:

$$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left[K(x_i) = c_i\right] \tag{11}$$

where $K(x_i)$ is the class predicted by the classifier for the i-th sample, $c_i$ is its expected class, and $N$ is the number of samples. In other terms, it counts the cases in which the predicted class matches the expected class. If the classes in the dataset are not equally distributed, the classifier may not be accurate. In order to overcome this limitation, the cross-validation (CV) method is applied. It divides the dataset into k equal partitions (k folds), generating k testing sets and using the remaining data as the training set each time. The classifier is then evaluated over k iterations. In the present study, k is set to 5.
A further tool to present the classification algorithm performance is the confusion matrix, also known as the error matrix. The matrix includes the predicted true/false values and the actual true/false values, as shown in Figure 9. The prediction is correct for true positives (TP) and true negatives (TN), and it fails for false negatives (FN) and false positives (FP).
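The fold construction used in k-fold cross-validation can be sketched as below. The helper k_fold_indices is a hypothetical name, the folds are contiguous and unshuffled for simplicity, and the sample count of 5880 is used only as an illustrative dataset size.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size.
    Returns a list of (train_indices, test_indices) pairs, one per fold."""
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(5880, k=5)
test_sizes = [len(test) for _, test in folds]
```

Each sample appears in exactly one testing set, and the reported accuracy is the average over the k evaluations. In practice the indices are usually shuffled first so that every fold covers all seasons.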

Table 1. Adopted classification algorithms and their types.
Classification algorithm            Type
Classification trees                Coarse tree
k-nearest neighbors                 Cubic kNN
Discriminant analysis               Quadratic discriminant
Naïve Bayes                         Kernel
Support vector machine (SVM)        Gaussian
Classification ensembles            Boosted trees
Furthermore, it is possible to define indexes such as the true positive rate (TPR) and the true negative rate (TNR), given by:

$$TPR = \frac{TP}{TP + FN}, \qquad TNR = \frac{TN}{TN + FP}$$

where the TPR represents the proportion of TRUE observations that are correctly predicted as TRUE and the TNR is the proportion of FALSE observations that are correctly predicted as FALSE.
Therefore, the overall accuracy is given by:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

The ROC curve shows the true positive rate versus the false positive rate for different thresholds of the classifier output. It can be used to find the threshold that maximizes the classification accuracy.
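These indexes follow directly from the four counts of the confusion matrix, as the sketch below shows. The choice of "DDM" as the positive class and the ten actual/predicted labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FN, FP, TN with respect to the chosen positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn

# Hypothetical labels: 6 samples are truly DDM and 4 are truly SDM.
actual = ["DDM"] * 6 + ["SDM"] * 4
predicted = ["DDM"] * 5 + ["SDM"] + ["SDM"] * 3 + ["DDM"]

tp, fn, fp, tn = confusion_counts(actual, predicted, positive="DDM")
tpr = tp / (tp + fn)                         # share of DDM samples recognized as DDM
tnr = tn / (tn + fp)                         # share of SDM samples recognized as SDM
accuracy = (tp + tn) / (tp + tn + fp + fn)
```

Reporting TPR and TNR next to the accuracy makes it visible when one class is recognized much better than the other.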
In order to evaluate the performance of the implemented classification algorithms in terms of PV output power, the predicted and experimental data are used to compute the percentage value of the normalized mean absolute error (NMAE) for each case, where N is the number of samples used for the testing step.
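A sketch of this evaluation is given below, together with the hybrid prediction that takes, for each sample, the power of the model selected by the classifier. The normalization by the plant nominal power (113.85 kW) is an assumption made for the example, since other normalizations are also in use, and the power values and predicted classes are hypothetical.

```python
def nmae_percent(p_predicted, p_measured, p_norm):
    """Mean absolute error in percent of p_norm (assumed: plant nominal power)."""
    n = len(p_measured)
    abs_error = sum(abs(p - a) for p, a in zip(p_predicted, p_measured))
    return 100.0 * abs_error / (n * p_norm)

def hybrid_power(classes, p_sdm, p_ddm):
    """Use the SDM or the DDM power depending on the predicted class."""
    return [s if c == "SDM" else d for c, s, d in zip(classes, p_sdm, p_ddm)]

# Hypothetical hourly powers in kW and classifier outputs.
p_measured = [50.0, 80.0, 100.0]
p_sdm = [52.0, 83.0, 101.0]
p_ddm = [51.0, 81.0, 97.0]
predicted_classes = ["DDM", "DDM", "SDM"]

p_hybrid = hybrid_power(predicted_classes, p_sdm, p_ddm)
nmae_hybrid = nmae_percent(p_hybrid, p_measured, p_norm=113.85)
nmae_sdm = nmae_percent(p_sdm, p_measured, p_norm=113.85)
nmae_ddm = nmae_percent(p_ddm, p_measured, p_norm=113.85)
```

When the classifier picks the better model for each sample, the hybrid error cannot exceed that of either model used alone, which is the idea behind the proposed approach.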

Results and Discussion
In order to assess the performance of both the SDM and the DDM under different levels of irradiance and temperature, we define the low class as irradiance below 400 W/m², the medium class as irradiance between 400 W/m² and 800 W/m², and the high class as irradiance above 800 W/m². For temperature, the low class is below 20 °C, the medium class is between 20 °C and 40 °C, and the high class is above 40 °C. Table 2 reports the normalized mean bias error of the PV output power, as defined by Equation (10), using the SDM and the DDM for the low, medium, and high classes of solar irradiance and temperature. At low and medium solar irradiance, the DDM is more accurate, with a low bias error corresponding to an overestimation of the output power that does not exceed 1.92%, compared to the SDM (overestimation up to 3.04%). For high irradiance levels, the SDM shows a positive error, which means an overestimation of the PV power output, unlike the DDM, which shows a negative error (underestimation). In terms of temperature, both models behave similarly for low and medium temperatures, but the SDM shows a higher error than the DDM for medium temperature only. For the high temperature level, the SDM shows a positive error (overestimation), unlike the DDM, which shows a negative error (underestimation). Therefore, the equivalent-circuit models perform differently under various levels of irradiance and temperature, demonstrating a strong influence of climatic conditions on the accuracy of the SDM and the DDM.
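The class boundaries above translate into a small binning helper. The function names are hypothetical, and assigning the boundary values themselves (400 and 800 W/m², 20 and 40 °C) to the medium class is an assumption, since the text does not say which side they belong to.

```python
def irradiance_class(g_w_m2):
    """Low below 400 W/m2, medium from 400 to 800 W/m2, high above 800 W/m2."""
    if g_w_m2 < 400.0:
        return "low"
    if g_w_m2 <= 800.0:
        return "medium"
    return "high"

def temperature_class(t_celsius):
    """Low below 20 C, medium from 20 to 40 C, high above 40 C."""
    if t_celsius < 20.0:
        return "low"
    if t_celsius <= 40.0:
        return "medium"
    return "high"

# Hypothetical hourly samples: (irradiance in W/m2, temperature in C).
hourly = [(150.0, 12.0), (620.0, 27.5), (930.0, 44.0)]
binned = [(irradiance_class(g), temperature_class(t)) for g, t in hourly]
```

Grouping the hourly errors by these labels gives the per-class bias values of the kind reported in Table 2.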
In the next step, the six classification algorithms are implemented using 5880 samples (about 70% of the whole-year data) for training and the remaining 2880 samples (about 30%) for validation. In particular, the months of February, May, August, and November were chosen to test the models.
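The sample counts can be reproduced with a month-based split. The sketch assumes a continuous hourly time axis of 8760 samples starting on 1 October 2017 at 00:00; the actual timestamps of the SCADA records are not given in the text, and split_by_month is a hypothetical helper.

```python
from datetime import datetime, timedelta

TEST_MONTHS = {2, 5, 8, 11}   # February, May, August, November

def split_by_month(timestamps):
    """Return (train_indices, test_indices) according to the month of each sample."""
    train = [i for i, ts in enumerate(timestamps) if ts.month not in TEST_MONTHS]
    test = [i for i, ts in enumerate(timestamps) if ts.month in TEST_MONTHS]
    return train, test

# Assumed time axis: one sample per hour over a full non-leap year.
start = datetime(2017, 10, 1)
timestamps = [start + timedelta(hours=h) for h in range(8760)]
train_idx, test_idx = split_by_month(timestamps)
```

The four test months cover 120 days, i.e., 2880 hourly samples, which leaves 5880 samples for training. Taking one month from each season keeps the test set representative of the whole year.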
In order to validate the above classification of the equivalent-circuit models according to climatic variations, the predicted powers of the SDM and the DDM are plotted in Figure 10. The PV power values are classified using the different algorithms over a wide range of solar irradiance and temperature. As seen in this figure, most predictions are correct, and the power clearly increases linearly with irradiance and temperature.

Performance of the Classification Algorithms during the Training
The performance of the classification algorithms was investigated by using the classification confusion matrix and receiver operating characteristic (ROC) curve.
In the confusion matrix, the rows correspond to the predicted class (output class) and the columns correspond to the true class (target class). The entry cm(i,j) is the number (or percentage) of samples whose target is the i-th class and that are classified as j. The matrix also reports, for each predicted class, the percentages of examples that are correctly and incorrectly classified. These metrics are often called the precision (or positive predictive value) and the false discovery rate, respectively.
In Figure 11, the cells show the percentage of correct classifications by the trained classifiers. For the coarse tree, 91% of samples are correctly classified as the DDM and 75% of samples are correctly classified as the SDM. Cubic KNN, Gaussian SVM, and boosted trees show the same true positive rate (TPR) trend as the coarse tree. Quadratic discriminant and kernel Naïve Bayes present a lower TPR (87%) for the DDM, which means that 13% of these samples are incorrectly predicted as the SDM. For the SDM class, their TPR is higher than that of the coarse tree, cubic KNN, Gaussian SVM, and boosted trees (84% for quadratic discriminant and 82% for kernel Naïve Bayes). Therefore, the last two models fail for 16% and 18% of the SDM samples, respectively.
The optimal operating points on the ROC curve for each classification model are plotted in Figure 12. The ROC curve for Naïve Bayes is generally lower than the other ROC curves, which indicates worse in-sample performance than the other classifiers. Comparing the area under the curve (AUC) of all classifiers, the classification tree and SVM have the highest AUC, whereas Naïve Bayes and quadratic discriminant have the lowest. Therefore, the classification tree and SVM present high performance for the considered sample data.
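To make the AUC comparison concrete, the sketch below traces a ROC curve by sweeping the decision threshold over classifier scores and integrates it with the trapezoidal rule. The scores and labels are toy values chosen for illustration, and the function names are hypothetical. This is one common way to compute the AUC, not necessarily the procedure used to produce Figure 12.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points; labels are 1 (positive class) or 0 (negative class).

    Assumes both classes are present, otherwise the rates are undefined.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy scores: a higher score means the classifier favors the positive class.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(f"AUC = {auc(points):.3f}")
```

An AUC of 1 corresponds to a classifier that ranks every positive sample above every negative one, while 0.5 corresponds to random ranking, which is why a higher AUC indicates better performance.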
Table 3 summarizes the performance of the classification algorithms during the training in terms of TPR, AUC, and accuracy as defined by Equation (11). In Table 3, it is clear that the SVM classifier presents the highest accuracy with a value of 87.5%. However, the Naïve Bayes provides the lowest accuracy with a mean value of 86%.
To assess the performance of the classification algorithms in terms of predicted PV power output, a test dataset covering the months of February, May, August, and November was chosen, for a total of 2880 samples. Figure 13 shows the linear regression of the actual power (targets) against the predicted power (outputs). The high R values demonstrate that the ML classification algorithms are well suited to predicting the output power with the hybrid SDM and DDM modeling.
The NMAE of the SDM and the DDM on the testing dataset of 2880 samples is 1.634% and 1.523%, respectively, as Table 4 shows. The same table shows that the average NMAE of the classification algorithms is 1.48%. Therefore, the ML classification algorithms can improve the accuracy of PV modeling relative to the traditional SDM and DDM across the different levels of solar irradiance and temperature. The potential error reduction is estimated between 0.04% and 0.15%.
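The hybrid evaluation can be sketched as follows: for each sample, the power predicted by the model selected by the classifier is retained, and the NMAE of the resulting series is compared with that of each model alone. The NMAE is assumed here to be the mean absolute error normalized by a reference power `p_ref` (for example, the rated power) and expressed in percent; this normalization, the function names, and all numerical values are illustrative assumptions, not results from Table 4.

```python
def nmae(predicted, measured, p_ref):
    """Mean absolute error normalized by a reference power, in percent (assumed definition)."""
    n = len(measured)
    return 100.0 * sum(abs(p - m) for p, m in zip(predicted, measured)) / (n * p_ref)


def hybrid_power(p_sdm, p_ddm, chosen):
    """Pick, sample by sample, the prediction of the model selected by the classifier."""
    return [s if c == "SDM" else d for s, d, c in zip(p_sdm, p_ddm, chosen)]


# Toy values in kW (illustrative only).
measured = [10.0, 50.0, 90.0]
p_sdm = [14.0, 53.0, 91.0]
p_ddm = [11.0, 51.0, 95.0]
chosen = ["DDM", "DDM", "SDM"]   # classifier output for each sample
p_ref = 100.0                    # hypothetical reference power

p_hybrid = hybrid_power(p_sdm, p_ddm, chosen)
for name, series in (("SDM", p_sdm), ("DDM", p_ddm), ("hybrid", p_hybrid)):
    print(f"NMAE {name}: {nmae(series, measured, p_ref):.3f}%")
```

In this toy case the classifier always selects the better model, so the hybrid NMAE is lower than both single-model values; with an imperfect classifier the gain shrinks accordingly.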

Conclusions
Predicting the performance of PV modules has become an important task for anticipating the long-term functioning of PV systems. In the literature, PV modeling techniques adopt equivalent-circuit models whose performance is influenced by climatic conditions. This paper presents a classification method for the single-diode and double-diode equivalent-circuit models under real operating conditions of irradiance and temperature.
A hybrid approach based on ML classification algorithms is proposed to combine the SDM and the DDM according to their respective accuracy. Six classification algorithms, namely classification trees, k-nearest neighbors, discriminant analysis, Naïve Bayes, support vector machines, and classification ensembles, were implemented to identify which of the SDM and the DDM estimates the output power of a PV array with higher accuracy for a given solar irradiance and temperature. The algorithms were fitted using hourly measurements of solar irradiance on the plane of the array and ambient temperature over one year from a Poly-Si 113.85 kWp grid-connected PV plant located in southeast Italy, characterized by the Mediterranean climate. The high accuracy demonstrates the potential of the six classification algorithms for PV power prediction. During the training process, the support vector machines classifier presents the highest TPR of 91% for the DDM and an accuracy of 87.5%, whereas Naïve Bayes provides the lowest values of TPR (87%) and accuracy (86%). In the validation phase, the performance assessment in terms of NMAE demonstrates that the hybrid approach using ML classifiers yields lower errors than using only the SDM or the DDM, with an error reduction of up to 0.15%. The lowest NMAE, 1.469%, is achieved by the k-nearest neighbors algorithm.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A
The appendix contains the analytical expressions of the classification methods.

Appendix A.1 k-Nearest Neighbors
Let $m_i$ and $x$ be a training and a test sample, respectively, and let $c$ and $\hat{c}$ be the true class of a training sample and the predicted class for a test sample. The Euclidean distance between a test sample and the training samples is:
$$d(m_i, x) = \lVert m_i - x \rVert_2 \tag{A1}$$
In the kNN classification for $k = 1$, the predicted class $\hat{c}$ of the test sample $x$ is set equal to the true class $c$ of its nearest neighbor, where $m_i$ is the nearest neighbor to $x$ if the distance is:
$$d(m_i, x) = \min_{j} d(m_j, x) \tag{A2}$$
For k-nearest neighbors, the predicted class of the test sample $x$ is set equal to the most frequent true class among the $k$ nearest training samples [31].
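The rule above can be sketched in a few lines: compute the Euclidean distance from the test sample to every training sample, keep the k closest, and take a majority vote. The training points, given as (irradiance, temperature) pairs, and their labels are hypothetical, and no feature scaling is applied, which a practical implementation would normally add.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_c, x, k=1):
    """Predict the class of x as the most frequent class among its k nearest neighbors."""
    nearest = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], x))[:k]
    votes = Counter(train_c[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: (irradiance in W/m^2, temperature in degC) -> better-performing model.
train_x = [(200, 10), (300, 15), (800, 30), (900, 35), (850, 28)]
train_c = ["DDM", "DDM", "SDM", "SDM", "SDM"]

print(knn_predict(train_x, train_c, (250, 12), k=1))
print(knn_predict(train_x, train_c, (820, 29), k=3))
```

With two classes, an odd k avoids ties in the vote.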

Appendix A.2 Discriminant Analysis
Given $p$ variables, $K$ classes, and $N_k$ observations in the $k$-th class, the total, within-class, and among-class matrices $S_t$, $S_w$, and $S_a$ are defined as follows [32]:
$$S_t = \sum_{k=1}^{K}\sum_{i=1}^{N_k}(X_{ki}-M)(X_{ki}-M)^T \tag{A3}$$
$$S_w = \sum_{k=1}^{K}\sum_{i=1}^{N_k}(X_{ki}-M_k)(X_{ki}-M_k)^T \tag{A4}$$
$$S_a = S_t - S_w \tag{A5}$$
where $X_{ki}$ represents the $i$-th observation in the $k$-th class, $M$ is the vector of mean values over all classes, and $M_k$ is the vector of means of the observations in the $k$-th class. The discriminant function is defined as the weighted average of the independent variables. The weights are obtained as the eigenvectors $V$ of the associated eigenvalue problem; the elements of the eigenvectors are the canonical coefficients. The correlations between the independent variables and the canonical variates are computed from the elements $V_j$ of $V$ and the elements $W_j$ of $W$. The within-group covariance matrix, $W$, is given by:
$$W = \frac{1}{N-K}\,S_w, \qquad N = \sum_{k=1}^{K} N_k \tag{A8}$$

Appendix A.3 Naïve Bayes
Given an instance $d$ and its occurring probability $P(d)$, the Bayes theorem says:
$$P(c_i \mid d) = \frac{P(d \mid c_i)\,P(c_i)}{P(d)} \tag{A9}$$
where $P(c_i \mid d)$ is the probability of the instance $d$ being in the class $c_i$; $P(d \mid c_i)$ is the probability of observing $d$ in a domain where $c_i$ holds; and $P(c_i)$ is the prior probability of $c_i$.
Assuming that the features $d_n$ of an instance are independently distributed and that all classes occur with the same probability, $P(c_i) = P(c_j)$, $P(d \mid c_i)$ can be simplified as follows:
$$P(d \mid c_i) = \prod_{n} P(d_n \mid c_i) \tag{A10}$$
where each $P(d_n \mid c_i)$ and $P(c_i)$ can be estimated by statistical analysis of the features and classes of the training dataset. The Naïve Bayes classifier computes the probability of each instance for all classes in $C$ and selects the class $c_i$ with the highest probability $P(c_i \mid d)$, denoted $c_{MAP}$ and called the "maximum a posteriori (MAP)" class:
$$c_{MAP} = \arg\max_{c_i \in C} P(c_i)\prod_{n} P(d_n \mid c_i) \tag{A11}$$
Generally, the features are assumed to have a Gaussian probability distribution as follows:
$$P(d_n \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_c}\exp\left(-\frac{(d_n-\mu_c)^2}{2\sigma_c^2}\right) \tag{A12}$$
where the mean $\mu_c$ is the average of all values of the feature found in the dataset $D$ and $\sigma_c$ is the corresponding standard deviation. When the features do not follow a Gaussian distribution, the kernel density method [34] is applied to estimate the probability distribution, as follows:
$$P(d_n \mid c) = \frac{1}{m}\sum_{j=1}^{m} g(d_n;\,\mu_j,\,\sigma_c) \tag{A13}$$
where $g$ is the Gaussian density of Equation (A12) and $\mu_j$ is the $j$-th element of the dataset $D_m \subset D$ given by $m$ samples. The kernel method performs $m$ evaluations of the Gaussian probability, unlike the probability of Equation (A12), which is evaluated only once. In the present study, the kernel-based Naïve Bayes method is used to estimate the probability distribution.
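A minimal sketch of the kernel-based Naïve Bayes rule follows: the likelihood of each feature is estimated as the average of Gaussian kernels centered on the training values of that feature, the per-feature likelihoods are multiplied under the independence assumption, and the class with the largest product is returned (equal priors, as assumed above). The bandwidth values, the function names, and the training points are hypothetical choices made for illustration only.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def kernel_likelihood(x, samples, bandwidth):
    """Kernel density estimate: average of Gaussians centered on each training value."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)


def naive_bayes_map(d, train, bandwidth):
    """Return the MAP class for instance d, assuming equal class priors.

    train maps each class to a list of feature tuples; bandwidth holds one value per feature.
    """
    best_class, best_score = None, -1.0
    for c, rows in train.items():
        score = 1.0
        for n, value in enumerate(d):
            score *= kernel_likelihood(value, [r[n] for r in rows], bandwidth[n])
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy training data: (irradiance in W/m^2, temperature in degC) per class.
train = {
    "DDM": [(200, 10), (300, 15), (250, 12)],
    "SDM": [(800, 30), (900, 35), (850, 28)],
}
bandwidth = (100.0, 5.0)  # hypothetical kernel widths, one per feature

print(naive_bayes_map((280, 14), train, bandwidth))
print(naive_bayes_map((870, 31), train, bandwidth))
```

For many features or very small likelihoods, summing log-likelihoods instead of multiplying probabilities avoids numerical underflow.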

Appendix A.4 Support Vector Machines
Given a training set of $N$ data points, $D_N = \{(x_k, y_k)\}_{k=1}^{N}$, where $x_k \in \mathbb{R}^d$ is the $k$-th input data and $y_k \in \mathbb{R}$ is the $k$-th output data, the support vector method constructs a classifier as follows [35]:
$$y(x) = \operatorname{sign}\left[\sum_{k=1}^{N} \alpha_k\, y_k\, \Psi(x, x_k) + b\right] \tag{A14}$$
where $\alpha_k \in \mathbb{R}$ are positive constants and $b \in \mathbb{R}$ is a constant. The term $\Psi(x, x_k)$ can be a linear, polynomial, or exponential function. The classifier is constructed by means of a Lagrangian in which $\alpha_k \in \mathbb{R}$ are the Lagrange multipliers. The optimality conditions of this Lagrangian lead to a linear system in matrix notation, Equation (A19). Applying Mercer's theorem:
$$\Omega_{kj} = y_k\, y_j\, \varphi^T(x_k)\,\varphi(x_j) = K(x_k, x_j), \qquad k, j = 1, \ldots, N \tag{A20}$$
where $K(x_k, x_j)$ is the kernel matrix. Hence, the classifier in Equation (A14) is found by solving the linear set of Equations (A19) and (A20) instead of a quadratic programming problem.
In the present study, the radial basis function (RBF) kernel is used and defined as:
$$K(x, x_k) = \exp\left(-\frac{\lVert x - x_k \rVert_2^2}{\sigma^2}\right) \tag{A21}$$
where $\sigma$ is a tuning parameter.
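As a small illustration of the RBF kernel, the sketch below evaluates the kernel between pairs of samples and assembles the kernel matrix. The scaling of the exponent by $\sigma^2$ is an assumption (some formulations use $2\sigma^2$), and the function names and sample points are hypothetical.

```python
import math


def rbf_kernel(x, z, sigma):
    """RBF kernel exp(-||x - z||^2 / sigma^2); the sigma^2 scaling is an assumed convention."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / sigma ** 2)


def kernel_matrix(samples, sigma):
    """Symmetric matrix of pairwise kernel values between all samples."""
    return [[rbf_kernel(a, b, sigma) for b in samples] for a in samples]


# Toy, already-scaled feature vectors (illustrative only).
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = kernel_matrix(samples, sigma=1.0)
for row in K:
    print(["%.4f" % value for value in row])
```

A small $\sigma$ makes the kernel decay quickly, so only very close samples influence each other, whereas a large $\sigma$ yields a smoother decision boundary; this is why $\sigma$ has to be tuned.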