Determination of Transformers’ Insulating Paper State Based on Classiﬁcation Techniques

Abstract: The continuity of transformer operation is essential for utilities to maintain continuity of power flow in their networks and to achieve the desired revenue. Most failures in a transformer are due to degradation of its insulating system, which consists of insulating oil and paper. The degree of polymerization (DP) is a key indicator of the insulating paper state. Most research in the literature has computed the DP as a function of furan compounds, especially 2-furfuraldehyde (2-FAL). In this research, a prediction model was constructed based on some of the most common periodic tests conducted on transformer insulating oil, which were used as predictors of the insulating paper state. The tests evaluated carbon monoxide (CO), carbon dioxide (CO2), breakdown voltage (VBD), interfacial tension (IF), acidity (ACY), moisture (M), oil color (OC), and 2-furfuraldehyde (2-FAL). The DP, used as the key indicator of the paper state, was categorized into five classes labeled 1, 2, 3, 4, and 5 to express the insulating paper normal aging rate, accelerating aging rate, excessive aging danger zone, high risk of failure, and end of expected life, respectively. Classification techniques were applied to the collected data samples to construct a prediction model for the insulating paper state, and the results revealed that the fine tree was the best classifier of the data samples, with a 96.2% prediction accuracy.


Introduction
The power transformer is one of the most important assets in a power system. Malfunctions often occur in the transformer insulating system due to electrical and thermal stresses, and electrical utilities then incur tremendous losses when the transformer must be switched off to repair its faults [1][2][3][4]. The health index (HI) is an indicator of the transformer condition. It is computed from the electrical, physical, and chemical parameters of the oil and insulating paper, and it is related to the ability of the transformer to withstand failure [3]. To detect and analyze transformer faults, a number of tests must be performed on insulating oil samples to interpret the source of the faults. The test categories are electrical, physical, and chemical. These tests include those for dissolved gases, especially carbon monoxide (CO) and carbon dioxide (CO2), in parts per million (ppm); the breakdown voltage (VBD), which measures the dielectric strength of the oil in kV/cm; the interfacial tension (IF) in millinewtons per meter (mN/m); the oil acidity (ACY) in milligrams of potassium hydroxide per gram (mg KOH/g); the moisture content (M) in ppm, which dramatically reduces the strength of the insulating system; the oil color (OC), which reflects the decomposition of the oil; and the dissolved furan compounds, especially 2-furfuraldehyde (2-FAL), in ppm. The influence of these test parameters on the degree of polymerization (DP), which is the key indicator of insulating paper condition, has been investigated in several publications [5][6][7][8].
The degradation of insulating paper can be evaluated by a DP measurement, for which an actual insulating paper sample is taken from a transformer and its DP is measured in accordance with International Electrotechnical Commission standard IEC 60450-2007 [9]; however, this is impractical due to the difficulty of switching a transformer out of service. Several DP estimation approaches based on measurable oil parameters, particularly the 2-FAL concentration, have therefore been proposed in the literature.

Experimental Work
In this section, the important tests conducted on insulating oil to investigate the state of the insulating paper of a power transformer are explained. All oil tests have been explained in detail in the literature. The most important tests and their procedures are outlined here, such as the dissolved gas analysis (DGA) test, which is used to determine the concentrations of dissolved gases (especially CO and CO2, which are generated as the insulating paper degrades), and the determination of the furan compounds dissolved in the oil, especially 2-FAL.
The chromatographic analysis of dissolved gases in transformer oil is considered to be the most common method for detecting very small amounts of dissolved gases in insulating oils. This analysis indicates the condition of the transformer and the onset of any faults, so that the transformer can be preserved, the growth of faults avoided, and repairs made before a complete breakdown. The accuracy of the results of a chromatographic analysis depends on the method of drawing the oil sample from the transformer, extracting the dissolved gases from it, and adjusting the analyzer device. The recommendation is to perform this analysis at the beginning of transformer operation so that the test results can serve as a reference. A chromatographic analysis is defined as a physical separation method in which the components to be separated are distributed between two phases: one is the stationary phase, and the other is the mobile phase, which passes through the stationary phase [22]. The procedure for DGA in insulating oil using gas chromatography (GC) is explained in American Society for Testing and Materials standard ASTM D3612-2 [23]. A GC system consists of a carrier gas source, a pressure regulator, flow meters, a single injection port, chromatography columns, detectors, and recording integrators or recorders. The GC can control and measure the temperatures of the adsorption column, the inlet port, and the detector up to 65.5 °C.
Here, oil samples were prepared and placed in special glass containers (vials) by a sampling device and then placed into an autosampler unit, which analyzed the samples one by one after they were inserted into the oven at a temperature of 80 °C. The dissolved gases were extracted from the oil sample inside the vials by raising the temperature with continuous agitation of the sample, which was left in the oven for 30 min. The extracted gases were then injected into the gas chromatograph (GC), where they were analyzed and the curves for each sample were drawn separately [23]. Figure 1 depicts the GC analysis results, including the chart that shows the retention time of each gas and its concentration in ppm. Each dissolved gas appears as a peak in the curve at its retention time in minutes. The table in the screenshot also gives each gas as a percentage of the total gas concentration.
To measure the furanic compounds, high-performance liquid chromatography (HPLC) was used. The procedures to measure the furanic compounds complied with ASTM D5837 [24]. The preparation of the samples was as follows.
i. We weighed out 0.1 g of each of the five furan compounds.
ii. These were dissolved in toluene and volumetrically diluted to 100 mL.
iii. These were thoroughly mixed so that all five furan compounds were completely dissolved.
iv. Then, 1 mL of the stock solution was volumetrically diluted to 1 L using new uninhibited electrical insulating oil of the original mineral type.
v. This solution of the furan compounds in oil had a concentration of about 1 mg/L (1 ppm), i.e., 1000 µg/L (1000 ppb).
vi. We volumetrically prepared 1, 0.5, and 0.25 ppm standards from the 1 mg/L standard solution, as required (the dilution arithmetic is sketched after this list).
vii. The standard solution was stored in a clean, dark plastic bottle, not in a glass bottle.
viii. Each of the five furan compounds, the toluene, the new inhibited transformer oil, the acetonitrile, the hexane, and the water was of HPLC grade, and a vacuum manifold, Si cartridge liners, and 0.5 mm filters were used.
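As a quick check on the dilution arithmetic in steps i-vi, the short sketch below reproduces the stated concentrations. It is illustrative only; the variable names and the 100 mL final volume assumed for the calibration standards in the last loop are ours, not part of ASTM D5837.

```python
# Quick check of the furan standard-solution dilution arithmetic (steps i-vi above).
mass_furan_g = 0.1        # step i: 0.1 g of each furan compound
toluene_volume_L = 0.100  # step ii: diluted to 100 mL with toluene
stock_mg_per_L = mass_furan_g * 1000 / toluene_volume_L        # 1000 mg/L stock

aliquot_L = 0.001         # step iv: 1 mL of the stock solution
oil_volume_L = 1.0        # diluted to 1 L with new uninhibited oil
working_mg_per_L = stock_mg_per_L * aliquot_L / oil_volume_L   # 1 mg/L, i.e., ~1 ppm

print(f"Stock solution:   {stock_mg_per_L:.0f} mg/L in toluene")
print(f"Working solution: {working_mg_per_L:.2f} mg/L (~1 ppm = 1000 ppb) in oil")

# Step vi: calibration standards prepared from the 1 mg/L solution, assuming a
# hypothetical 100 mL final volume for each standard.
for target_ppm in (1.0, 0.5, 0.25):
    aliquot_mL = target_ppm / working_mg_per_L * 100
    print(f"{target_ppm} ppm standard: {aliquot_mL:.0f} mL of the 1 mg/L solution "
          f"diluted to 100 mL")
```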
The solid-phase extraction (SPE) procedure was as follows.
i. We inserted the SPE column into the vacuum manifold and passed 5 mL of hexane through each SPE column under vacuum, without letting the column dry.
ii. We mixed 10 mL of the oil sample (specimen) with 10 mL of hexane and passed it through the SPE column at a rate no faster than 3 mL/min.
iii. We passed 20 mL of hexane through the SPE column to rinse out residual oil and dried the column under vacuum for 5 min before discarding all eluates.
iv. We eluted the retained compounds from the SPE column using 10 mL of an acetonitrile/water (20:80) mixture with the same composition as the HPLC mobile phase.
v. We collected the first 10 mL of eluate from the SPE column.
vi. If the eluate was cloudy, we filtered it with 0.5 mm filters.
vii. We placed the eluate in a 2 mL vial, put it in the autosampler, and then ran the HPLC to analyze the sample.
The test samples were collected from the Saudi Electricity Company. All tests were conducted in a chemical laboratory by experts using modern test devices. A total of 131 test data samples were collected, including the test parameters CO, CO2, VBD, IF, ACY, M, OC, and 2-FAL, as well as the DP. The state of the insulating paper was categorized based on the value of the DP, as shown in Table 1. The distribution of the 131 dataset samples is illustrated in Table 2: 86 data samples fell into category 1 (normal aging rate), 15 into category 2 (accelerating aging rate), 7 into category 3 (excessive aging danger zone), 3 into category 4 (high risk of failure), and 20 into category 5 (end of expected life). Table 3 lists the test results of some samples in this study.
Table 2. Distribution and number of the data samples that were used in this study.
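For illustration, a minimal sketch of how such a dataset could be organized for the classification work that follows is given below. The file name and column headers are hypothetical; only the eight predictors, the DP-based class label, and the class counts of Table 2 come from the text.

```python
import pandas as pd

# Illustrative sketch only: "transformer_oil_tests.csv" and its column names are
# hypothetical. The eight oil-test predictors and the DP-based class label (1-5)
# follow Tables 1-3 in the text.
data = pd.read_csv("transformer_oil_tests.csv")   # 131 rows in this study

predictors = ["CO", "CO2", "VBD", "IF", "ACY", "M", "OC", "2FAL"]
X = data[predictors]        # eight oil-test results per sample
y = data["paper_class"]     # classes 1-5 derived from the DP value (Table 1)

# Class counts reported in Table 2: 86, 15, 7, 3, and 20 for classes 1-5.
print(y.value_counts().sort_index())
```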

Classification Techniques
The classification used here was supervised machine learning, in which a constructed classification model learns to classify new observations from a set of training samples. Several classification algorithms are available in the Classification Learner app in MATLAB. The classification process depends on the feature data and their corresponding responses. The classifiers available in the Classification Learner app are decision trees (DTs), discriminant analysis (DA), support vector machines (SVMs), logistic regression (LR), k-nearest neighbors (KNN), naïve Bayes (NB), and ensemble classifiers (ENC). A constructed model can be exported to the workspace to test new observations and to compute the model accuracy on testing samples.

Decision Trees (DTs)
Decision tree classifiers predict the response to new observations after being trained on data samples containing the feature variables and the corresponding responses. The detection process in a decision tree follows the decisions from the beginning node (root) down to a leaf node, branching according to whether each logic statement is true or false. Figure 2 depicts the construction of a decision tree, which consists of the root node (R), internal nodes (A), and leaf nodes (B), all connected via branches. The root node contains a logic statement that determines the flow of the decisions, and each internal node defines a new decision path based on a new logic statement before reaching a leaf node, which expresses the final decision or the predicted response as a numeric class [25,26].
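A minimal sketch of a decision tree on the oil-test data is shown below (X and y are from the data-preparation sketch above). The paper's best model was MATLAB's fine tree; scikit-learn is used here purely as an illustrative stand-in, and the split ratio and random seed are arbitrary choices.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hold out 20% of the 131 samples for a quick sanity check (illustrative only).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"Hold-out accuracy: {tree.score(X_test, y_test):.3f}")

# The printed rules make the root -> internal node -> leaf structure of Figure 2
# explicit: each line is a logic statement on one predictor, each leaf a class.
print(export_text(tree, feature_names=list(X.columns)))
```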

Discriminant Analysis (DA) or Fisher Discriminant
DA is a method that assumes each class generates data according to a different Gaussian distribution. In this method, the fitting functions estimate the Gaussian distribution parameters for each class. It is considered to be a descriptive and predictive classifier that seeks the factors separating two groups based on a set of features for which the group membership is known [27,28]. In regression techniques, the output is a real value, whereas in discriminant analysis the output refers to the class label. Moreover, the discriminant analysis can be linear, trying to find the line that separates the different categories, or it can use different curve configurations, as shown in Figure 3.
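A minimal sketch of discriminant analysis on the same hold-out split used in the decision-tree sketch is shown below; scikit-learn's LDA is used only as an illustration of the Gaussian-per-class idea described above.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Linear discriminant analysis: each class is modeled by a Gaussian with a shared
# covariance matrix, giving linear (straight-line/plane) decision boundaries.
# A quadratic discriminant, with class-specific covariances, would give the curved
# boundaries mentioned above. X_train/X_test/y_train/y_test come from the
# decision-tree sketch.
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
print(f"LDA hold-out accuracy: {lda.score(X_test, y_test):.3f}")
```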

Support Vector Machine (SVM)
An SVM is a type of machine learning model that separates two classes of data points by a hyperplane that maximizes the distance to the points of each class; therefore, new observations can be accurately classified. Points outside the hyperplane margin belong to different classes. The number of input features determines the dimensions of the hyperplane: the hyperplane is a line when separating two input features and a plane when separating three features; when the number of input features grows beyond three, the separation becomes difficult to visualize. Support vectors are the points close to the hyperplane; they have a great effect on the position and orientation of the hyperplane and can be used to maximize the margin of the classifier. Support vectors are the main points needed to set up an SVM model. Figure 4 depicts the margin built by the support vectors, where a large margin indicates the ability of the SVM model to provide good classification results for new observations [29,30].
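A minimal SVM sketch on the same hold-out split follows; scikit-learn is again only a stand-in for the MATLAB classifier, and the kernel and C value are assumptions.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# C trades off a wide margin against misclassified training points; a linear kernel
# corresponds to the separating hyperplane described above, while other kernels allow
# curved boundaries. Features are scaled first because SVM margins are distance-based.
# X_train/X_test/y_train/y_test come from the decision-tree sketch.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)
print(f"SVM hold-out accuracy: {svm.score(X_test, y_test):.3f}")

# Number of support vectors per class - the points on or inside the margin that fix
# the position and orientation of the hyperplane.
print(svm.named_steps["svc"].n_support_)
```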

Ensemble Trees (ENT)
Ensemble methods are used to take advantage of decision trees and to reduce overfitting, although they can sometimes become very complex. Decision trees are used for classification and regression, but their most common use is classification. They are non-parametric, make no assumptions about how the input data are distributed, and depend only on the input data for the predictors and the response. They are suitable when there are a lot of data but little information surrounding those data. The results of a decision tree can be interpreted and thus used to construct models that make inferences about data, not just predictions. Ensemble methods are quick due to their simplicity and are useful for an initial understanding of the data. With decision trees, data processing is easy: there is no need to scale the data, since each split at a node involves only one feature at a time, and there is no need to perform coding operations for the splits, features, and categories. The two most common ensemble techniques are bagging and boosting of trees. Boosting models outperform bagging models if the hyperparameters are precisely tuned, but they consume more time in the classification process. Bagging models prevent overfitting even though they may not give better bias. Additionally, boosting generates models with few errors but is prone to overfitting because a single model that modifies itself is created, whereas bagging creates multiple models that are parallel to each other [31][32][33]. Figure 5 illustrates the difference between bagging and boosting: bagging generates parallel models for the classification process to prevent overfitting, and boosting generates a single model that is improved iteratively to reduce classification errors.
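The contrast between the two ensemble strategies can be sketched as follows; both scikit-learn classifiers use decision trees as their default base learners, and the number of estimators is an assumption.

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier

# Bagging trains many trees in parallel on bootstrap samples of the data, which
# suppresses overfitting; boosting trains trees sequentially, each one focusing on
# the errors of the previous ones. X_train/X_test/y_train/y_test are the hold-out
# split from the decision-tree sketch.
bagged = BaggingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
boosted = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print(f"Bagged trees hold-out accuracy:  {bagged.score(X_test, y_test):.3f}")
print(f"Boosted trees hold-out accuracy: {boosted.score(X_test, y_test):.3f}")
```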

Classification Results and Discussion
First, the data were arranged in columns containing the test results for the eight test measures of CO, CO2, VBD, IF, ACY, M, OC, and 2-FAL, and the response (DP) was encoded as classes that refer to the state of the insulating paper. Because the eight test inputs have different units, standardizing all input data was crucial.
Data standardization is used to convert data to a common format that is easy to process and analyze. Data collected from different sources may lead to problems if they are not standardized, resulting in difficulties later, especially for dashboards, visualization, and decision-making. Therefore, for a good training process, the data in this study were standardized. The standard score, shown in Equation (1), was used to standardize the data:

X_st = (X - m)/s, (1)

where X_st is the standardized value, X is the actual value, m is the mean of each input column, and s is the standard deviation of each input column. The classification techniques were applied to the prepared standardized data, and the results of each technique, based on 10-fold cross-validation, are illustrated in Table 4. The cross-validation statistical method was used as an optional tool in the classification learner to estimate the ability and robustness of the machine learning models. It divided the data samples into partitions used for the training and testing of the constructed model. The partitioning was performed randomly, splitting the data samples into k equal-size subsamples; a single subsample was retained as validation data for testing the model, and the remaining k-1 subsamples were used for training. This process was repeated k times so that each subsample was used once for validation and the others for training. The results of testing each subsample were averaged to give the accuracy of the model, so each model's merit was judged after utilizing all data samples for both training and validation [34].
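A minimal sketch of this evaluation protocol, z-score standardization (Equation (1)) followed by stratified 10-fold cross-validation of several classifier families, is shown below. The paper used the MATLAB Classification Learner; the scikit-learn models and hyperparameters here are assumptions used only to illustrate the procedure, with X and y from the data-preparation sketch.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Bagged trees": BaggingClassifier(n_estimators=100, random_state=0),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

for name, model in models.items():
    # Standardization is fitted inside each training fold to avoid information leakage.
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=cv)
    # Note: with only 3 samples in class 4, some folds will not contain that class,
    # and scikit-learn will warn about the least populated class.
    print(f"{name:13s} mean 10-fold CV accuracy = {scores.mean():.3f}")
```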
For the classification learner to solve the classification problem, stratified k-fold cross-validation was used so that the folds randomly contained the same proportion of each categorized class. Figure 6 depicts the classification accuracy of the decision tree classifier, i.e., 96.2%. The scatter plot shows the distribution of all 131 observations and indicates the correctly and incorrectly detected classes. The accuracy results were based on the eight predictors of the data samples (CO, CO2, VBD, IF, ACY, M, OC, and 2-FAL) and the corresponding paper-state class (based on the DP magnitude) as the response. The classification accuracy was obtained by averaging the results of the 10-fold cross-validation.
Figure 7 illustrates the confusion matrix of the decision tree classifier, which provided the best classification accuracy among all classifiers trained in the classification learner tool. Figure 7a shows the number of correct classifications in each class; for class 1, 86/86 samples were correctly classified, and for class 3, 6/7 samples were correctly classified. Figure 7b illustrates the classification accuracy of each class: the predicted classification accuracy of class 1 was 100% (86/86 correct samples); for class 2, 14/15 samples were correctly classified (93% accuracy) and 1/15 was misclassified (7%). For class 3, the positive predictive accuracy was 60% (6 correct out of the 10 samples predicted as class 3) and the false predictive rate was 40% (4 of the 10 samples predicted as class 3 were wrong). Figure 8 shows the receiver operating characteristic (ROC) curve for class 1 as the positive class, which shows the performance of the classifier at the classification threshold. The area under the curve (AUC) reflects the performance across all possible classification thresholds; since all class 1 samples were correctly predicted, the AUC was 1.
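Per-class results of the kind shown in Figures 7 and 8 can be tabulated from cross-validated predictions, as in the sketch below (X and y as in the data-preparation sketch; the scikit-learn decision tree again stands in for the MATLAB fine tree, so the numbers it produces are illustrative, not the paper's results).

```python
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
pipe = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))

# Confusion matrix from cross-validated predictions: rows are true classes 1-5,
# columns are predicted classes (the analog of Figure 7a).
y_pred = cross_val_predict(pipe, X, y, cv=cv)
print(confusion_matrix(y, y_pred))

# One-vs-rest ROC AUC for class 1 (the "normal aging rate" class of Figure 8),
# computed from cross-validated class probabilities.
y_proba = cross_val_predict(pipe, X, y, cv=cv, method="predict_proba")
auc_class1 = roc_auc_score((y == 1).astype(int), y_proba[:, 0])
print(f"Class 1 one-vs-rest AUC: {auc_class1:.2f}")
```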
Figure 9 illustrates the confusion matrix of the ensemble bagged tree classifier, whose predictive accuracy of 93.1% made it the second-best classifier in this study. Figure 9a depicts the number of correctly classified samples for each class; e.g., the number of correctly predicted samples for class 3 was 3/7, a 43% classification accuracy (Figure 9b), while 4/7 samples were wrongly predicted, a 57% misclassification rate (Figure 9b).

Conclusions
The results of this study are very valuable for determining the state of the insulating paper in a power transformer. The state of such paper is based on the magnitude of the DP, which correlates with several important tests. Most research in the literature has related the DP and the insulating paper state to the 2-FAL concentration, but this correlation can give a wrong indication of the insulating paper state because the paper adsorbs 2-FAL, which reduces the 2-FAL concentration in the oil and may prevent an accurate assessment of the transformer insulation. Therefore, using several periodic oil tests to indicate the state of the insulating paper is better than using 2-FAL as a single indicator. We applied several classification techniques to the collected data samples, using 10-fold cross-validation, to build a classification model that identifies the insulating paper class as categorized in Table 1. The results revealed that the decision tree was the best classifier for the collected data samples, with a 96.2% classification accuracy. Other classification techniques, such as the ensemble bagged tree (93.1%), also provided reasonable accuracy. Deriving the DP and the insulating paper state from several electrical, physical, and chemical tests with classification techniques provides more reliable results than a technique that utilizes only one parameter such as 2-FAL.
Author Contributions: Conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing-original draft preparation, writing-review and editing, visualization, supervision, project administration, funding acquisition: S.S.M.G. The author has read and agreed to the published version of the manuscript.
Acknowledgments: The author would like to acknowledge the financial support received from Taif University Researchers Supporting Project Number (TURSP-2020/34), Taif University, Taif, Saudi Arabia. The author also appreciates the Saudi Electricity Company for supplying the data for this research.

Conflicts of Interest:
The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
