Application of Statistical Distribution Models to Predict Health Index for Condition-Based Management of Transformers

In this study, a statistical distribution model (SDM) is used to predict the health index (HI) of transformers from condition parameters data obtained by dissolved gas analysis (DGA), oil quality analysis (OQA), and furanic compound analysis (FCA). First, the individual condition parameters data were grouped by transformer age from year 1 to 15. Next, the individual condition parameters data for each age were fitted using probability plots to find the representative distribution models. The distribution parameters were calculated at a 95% confidence level and extrapolated from year 16 to 25 through the representative fitting models. The individual condition parameters data within this period were then calculated from the estimated distribution parameters through the inverse cumulative distribution function (ICDF) of the selected distribution models. The predicted HI was then determined using the conventional scoring method. A Chi-square hypothesis test shows that the predicted HI for the transformer data is close to the calculated HI. The average absolute percentage error in the validation region is 2.17%, and the HI predicted by SDM yields 97.83% accuracy for the transformer data.


Introduction
Power transformers are among the most expensive and critical units in electrical distribution systems. Improper operational intervention can disrupt power delivery and result in substantial repair or replacement costs for utilities. In recent years, utilities have shifted toward predictive maintenance, driven by advances in data-driven maintenance programs. Transformer condition-based management (CBM) databases have received significant attention and have been used to develop health index (HI) models that optimize investment and ensure reliable operation.
HI is an established method that combines multiple in-service condition measurements into a single, objective, computable index describing the overall health of a transformer asset. This technique helps assess the long-term degradation of a transformer population from condition parameters data, which may not be detectable through scheduled maintenance or individual diagnostic techniques [1]. HI can be used for asset management and for prioritizing investment in either capital or maintenance schemes [1]. The HI concept is based on scoring, rating, and ranking methods that draw on fundamental theory, technical guidelines, and expert judgment. HI is widely employed for transformer asset management by organizations such as Kinectrics Inc., DNV GL, Hydro-Québec, Terna, Electricity Generating Authority of Thailand, and Tenaga Nasional Berhad [2][3][4][5][6].
Common data-driven approaches, such as statistical and artificial intelligence (AI) models, have been widely used to predict the condition deterioration of high voltage assets [7][8][9][10]. Predicting transformer HI is one such application, and it can substantially support the financial planning of utilities' asset maintenance. However, few studies have modelled the future health degradation of transformers in terms of HI.
Most studies focus on predicting individual condition parameters [11][12][13][14].
Statistical approaches based on the Markov Model (MM) [15,16] and the Hidden Markov Model (HMM) [17] have been used to predict the condition states of transformer populations. The work in [15] uses the transition probabilities of transformer condition states derived from HI over a specific year interval. The study in [16] implements a similar approach, except that the condition states are derived from condition parameters data. The study in [17] uses HMM to predict transformer condition states in a different way from MM [15,16]: hidden-state transition and emission probabilities derived from condition parameters data are computed to predict the HI of transformers. Neither MM nor HMM relies heavily on historical condition parameters data, so the uncertainty caused by the lack of long-term data records can be minimized. These methods predict either the final HI or the individual condition parameters in terms of probabilities, which are later converted into HI and condition parameters data values. The accuracy of the HI obtained with these models has been reported to be satisfactory.
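The MM idea summarized above can be illustrated with a minimal sketch: the condition-state probabilities of a population are propagated one year at a time by multiplying with a transition matrix, and are then converted into an expected HI. The three states, the transition probabilities, and the per-state HI values below are hypothetical and are not taken from [15] or [16].

```python
# Minimal Markov-chain sketch of condition-state prediction.
# States: 0 = good, 1 = fair, 2 = poor (hypothetical three-state model).
# TRANSITION[i][j] = probability of moving from state i to state j in one year.
# These numbers are invented for illustration only.
TRANSITION = [
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],  # "poor" treated as absorbing (no maintenance action)
]

# Hypothetical HI value (0-100) representing each state.
STATE_HI = [90.0, 60.0, 30.0]


def step(dist, matrix):
    """One-year update of the state distribution: new_j = sum_i dist_i * P_ij."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, matrix, years):
    """Apply the one-year transition `years` times."""
    for _ in range(years):
        dist = step(dist, matrix)
    return dist


def expected_hi(dist, state_hi=STATE_HI):
    """Convert state probabilities into an expected HI value."""
    return sum(p * h for p, h in zip(dist, state_hi))


if __name__ == "__main__":
    start = [1.0, 0.0, 0.0]  # whole population starts in "good" condition
    for year in (5, 10, 25):
        dist = propagate(start, TRANSITION, year)
        print(year, [round(p, 3) for p in dist], round(expected_hi(dist), 1))
```

Because the hypothetical matrix only allows moves toward worse states, the expected HI decreases monotonically with age; a real model would estimate the matrix from observed state changes.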
On the other hand, fuzzy logic [18], general regression neural network (GRNN) [19], neural-fuzzy (NF) [20], random forest [21], support vector machine (SVM) [22], principal component analysis (PCA), and analytical hierarchy process (AHP) [23] are among the AI models that have been applied in previous HI studies. These models require extensive data to achieve good accuracy in predicting transformer condition.
The main motivation of this study is to introduce a simplified method, based on a statistical distribution model (SDM), for predicting the HI of a transformer population, using individual condition parameters data as the key input for determining the HI. SDM is chosen because of its simplicity and its adaptability to data of any sample size [24]. In addition, it can identify which of the independent variables (13 condition parameters) affect the predicted HI, so abnormal trends in the individual condition parameters data can be investigated further to improve the interpretation of the transformer's overall HI. First, the representative distribution model is identified for each condition parameter of the transformers. Next, SDM is applied to the condition parameters data to determine the predicted HI of the transformer population. Finally, hypothesis testing with the Chi-square statistic is used to determine the goodness of fit, together with the absolute error percentage between the predicted and computed HI.

Transformer Health Index Estimation Model
SDM was employed to predict the HI of the transformer population, given the limited historical condition parameters data. Figure 1 shows the overall framework for estimating future transformer HI from individual condition parameters data. First, the condition parameters data from the transformer population were grouped by age from year 1 to 15. Next, the individual condition parameters data for each age were fitted on probability plots to determine the representative distribution. The distribution parameters from year 1 to 15 were then calculated, plotted, and extrapolated to obtain the distribution parameters for the next 10 years (year 16 to 25). The individual condition parameters data were then calculated from the extrapolated distribution parameters using the inverse cumulative distribution function (ICDF) of the identified distribution model. The predicted HI was then determined using the conventional scoring approach. Finally, the predicted HI was compared with the computed HI using the Chi-square test and the absolute error percentage.
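The first step of the framework, grouping measurements by transformer age and separating the years whose data are fitted (1 to 15) from the years used to check the extrapolation (16 to 25), can be sketched as below. The record layout, function names, and sample values are hypothetical.

```python
from collections import defaultdict

FIT_YEARS = range(1, 16)          # ages whose data are fitted (years 1-15)
VALIDATION_YEARS = range(16, 26)  # ages used only to check the extrapolation


def group_by_age(records):
    """Group (age, value) pairs into {age: [values]} for per-age fitting."""
    bands = defaultdict(list)
    for age, value in records:
        bands[age].append(value)
    return dict(bands)


def split_by_age(bands):
    """Split grouped data into fitting and validation dictionaries by age."""
    fit = {a: v for a, v in bands.items() if a in FIT_YEARS}
    validation = {a: v for a, v in bands.items() if a in VALIDATION_YEARS}
    return fit, validation


# Hypothetical (age in years, acidity) measurements, for illustration only.
records = [(1, 0.021), (1, 0.030), (2, 0.026), (15, 0.079), (16, 0.083), (25, 0.110)]
fit, validation = split_by_age(group_by_age(records))
```

Each list in `fit` would then be passed to a per-age distribution fit, one condition parameter at a time.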
Appl. Sci. 2021, 11, x FOR PEER REVIEW 3 of 21
The HI of the transformer population was modelled using a statistical approach, in which the prediction model was calibrated on the condition parameters data to estimate the predicted HI. The first group of input condition parameters came from dissolved gas analysis (DGA): hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon monoxide (CO), and carbon dioxide (CO2). The oil quality analysis (OQA) parameters were dielectric breakdown voltage, interfacial tension, color, acidity, and water content. The furanic compound analysis (FCA), consisting of 2-furfuraldehyde (2-FAL), was also used to account for the in-service ageing of the solid insulation.
It is important to note that this study does not consider abnormal data arising from unusual operating environments, internal electrical faults, or electromagnetic interference during service.
In this study, two-parameter Weibull and normal distribution models were considered. The Weibull distribution has been widely used in industrial applications, especially in reliability engineering. It is commonly used in reliability assessment, for example to derive reliability indices such as the mean time to failure (MTTF) [25], equipment failure rates [26][27][28], remaining useful life [29,30], spare parts replacement, and maintenance/replacement strategies [31,32]. It is a versatile distribution whose two parameters allow it to describe the characteristics of other types of distributions. The probability density function (PDF) of the Weibull distribution, evaluated on the condition parameters data, is expressed in Equation (1) [33].
where t is the lifetime of the transformer, β is the shape parameter, and α is the scale parameter. α determines the spread of the data, while the dimensionless β determines the shape of the distribution. Different values of β change the behavior of the distribution; for 3 ≤ β ≤ 4, the Weibull distribution has a shape similar to that of the normal distribution. To plot the Weibull probability, the condition parameters data were converted into probabilities using the CDF in Equation (2) [34].
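For reference, the standard two-parameter forms of the Weibull PDF and CDF with scale α and shape β, assumed here to correspond to Equations (1) and (2), are given below together with the linearized CDF that underlies a Weibull probability plot.

$$
f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1}\exp\!\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right], \qquad t \ge 0
$$

$$
F(t) = 1 - \exp\!\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right]
\quad\Longrightarrow\quad
\ln\!\left[-\ln\bigl(1 - F(t)\bigr)\right] = \beta \ln t - \beta \ln \alpha
$$

Because the transformed CDF is linear in ln t with slope β, Weibull-distributed data fall on a straight line when plotted against a logarithmic x-axis.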
The normal (Gaussian) distribution is widely applied to describe product lifetime data [35], and its PDF is expressed by Equation (3) [36].
in which µ is the mean of the distribution and σ is the standard deviation. Both parameters have the same unit as the variable t. The normal CDF is expressed by Equation (4) [37].
where Φ(·) denotes the CDF of the standard normal distribution (µ = 0 and σ = 1), so that F(t) = Φ((t − µ)/σ).
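For reference, the standard forms of the normal PDF and CDF, assumed here to correspond to Equations (3) and (4), are:

$$
f(t) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]
$$

$$
F(t) = \Phi\!\left(\frac{t-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{t-\mu}{\sigma\sqrt{2}}\right)\right]
$$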

Distribution Parameters Estimation
Several methods can be used to estimate the parameters of the Weibull and normal distributions, such as ordinary least squares (OLS), weighted least squares (WLS), maximum likelihood estimation (MLE), and the method of moments (MOM) [33]. The OLS and WLS methods are commonly used because of their simplicity [37], and their parameters can be estimated by solving simultaneous equations. MLE and MOM are common in engineering analyses, but both are computationally demanding [37]. Moreover, MLE normally requires a reasonably large sample, since its estimators become unbiased with minimum variance as the sample size increases [38,39]. In this paper, the MLE method was used to estimate the population parameters of each distribution, because MLE is an analytical maximization procedure applicable to all forms of data [34]. MLE estimators are also approximately normally distributed with known sample variances, which can be used to construct confidence bounds and likelihood-based tests of model and parameter hypotheses.
Suppose that x1, x2, ..., xn are independent and identically distributed Weibull random variables with the probability density function f(x) given in Equation (1), where the parameters are unknown. The MLE method was employed to estimate the parameters α and β. The likelihood function of x1, x2, ..., xn can be constructed from the PDF in Equation (1) and is expressed in Equation (5) [40]. Equation (6) is obtained by taking the natural logarithm of Equation (5).
Hence, the MLEs (α̂, β̂) of (α, β) can be obtained from Equations (9) and (10), and Equations (11) and (12) can be solved numerically for β̂ and α̂. Similarly, for the normal distribution, the parameters µ and σ were estimated using MLE. The likelihood function of x1, x2, ..., xn can be constructed from the PDF in Equation (3) and is expressed in Equation (13).
From Equation (13), the log-likelihood can be expressed as Equation (14).
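The maximum likelihood step can be sketched with the Python standard library alone. For the Weibull case, the sketch uses the standard profile-likelihood equation for β̂ (solved by bisection, since its left-hand side increases monotonically in β) and the closed-form α̂; for the normal case, the MLEs are the sample mean and the standard deviation with divisor n. This is a generic illustration under those textbook results, not the authors' code, and the sample values are synthetic.

```python
import math
import random


def weibull_mle(x, tol=1e-12, max_iter=200):
    """Two-parameter Weibull MLE for positive data; returns (alpha, beta).

    beta solves the profile-likelihood equation
        sum(x^b ln x) / sum(x^b) - 1/b - mean(ln x) = 0,
    whose left side increases monotonically in b, so bisection is safe.
    alpha then follows in closed form: alpha = (mean(x^beta))^(1/beta).
    """
    if len(x) < 2 or min(x) <= 0:
        raise ValueError("need at least two positive observations")
    logs = [math.log(v) for v in x]
    n = len(logs)
    m = max(logs)
    mean_log = sum(logs) / n
    if m == min(logs):
        raise ValueError("all observations identical; beta is unbounded")

    def weights(b):
        # x^b rescaled by exp(-b*m) to avoid overflow; ratios are unchanged
        return [math.exp(b * (lv - m)) for lv in logs]

    def g(b):
        w = weights(b)
        return sum(wi * lv for wi, lv in zip(w, logs)) / sum(w) - 1.0 / b - mean_log

    lo, hi = 1e-3, 1.0
    while g(hi) < 0:  # expand the bracket until the sign changes
        lo, hi = hi, hi * 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    beta = 0.5 * (lo + hi)
    alpha = math.exp(m) * (sum(weights(beta)) / n) ** (1.0 / beta)
    return alpha, beta


def normal_mle(x):
    """Normal MLE: sample mean and the standard deviation with divisor n."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return mu, sigma


def normal_loglik(x, mu, sigma):
    """Log-likelihood of i.i.d. normal data (the quantity maximized above)."""
    n = len(x)
    ss = sum((v - mu) ** 2 for v in x)
    return -n * math.log(sigma * math.sqrt(2.0 * math.pi)) - ss / (2.0 * sigma ** 2)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic Weibull sample (scale 0.05, shape 2.5); not the paper's data.
    sample = [rng.weibullvariate(0.05, 2.5) for _ in range(5000)]
    print(weibull_mle(sample))
    print(normal_mle([62.0, 58.5, 70.1, 65.3, 60.2]))
```

Bisection is used instead of Newton's method because the monotone profile equation guarantees a bracketed root, which keeps the sketch robust for the small per-age samples typical of CBM data.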

Condition Data Estimation
The condition parameters data represented by the Weibull and normal distributions were computed at the 50th percentile (median) of the sample data. Using the ICDF with the respective α and β, the data were evaluated at probability values p. The inputs p, α, and β can be vectors, matrices, or multidimensional arrays of the same size [41], and a scalar input is expanded to a constant array of the same size as the other inputs. The ICDF of the Weibull distribution model at p = 0.5 is expressed in Equation (19) [42].
Similarly, the condition parameters data represented by the normal distribution at p = 0.5 were computed using the ICDF, as shown in Equations (20) and (21).
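A minimal sketch of the ICDF evaluation is shown below, assuming the standard Weibull inverse t = α(−ln(1 − p))^(1/β) and using `statistics.NormalDist.inv_cdf` for the normal case. The helper `broadcast_icdf` is a hypothetical name that mimics the scalar-expansion behaviour described above for list inputs.

```python
import math
from statistics import NormalDist


def weibull_icdf(p, alpha, beta):
    """Inverse Weibull CDF: t = alpha * (-ln(1 - p)) ** (1 / beta)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # log1p(-p) computes ln(1 - p) accurately for small p
    return alpha * (-math.log1p(-p)) ** (1.0 / beta)


def normal_icdf(p, mu, sigma):
    """Inverse normal CDF via the standard library's NormalDist."""
    return NormalDist(mu, sigma).inv_cdf(p)


def broadcast_icdf(icdf, p, param_a, param_b):
    """Evaluate an ICDF element-wise; any scalar argument is expanded to the
    length of the list arguments, which must all share one length."""
    args = [p, param_a, param_b]
    n = max(len(a) if isinstance(a, list) else 1 for a in args)
    expanded = [a if isinstance(a, list) else [a] * n for a in args]
    if any(len(a) != n for a in expanded):
        raise ValueError("list arguments must have the same length")
    return [icdf(pi, ai, bi) for pi, ai, bi in zip(*expanded)]
```

At p = 0.5 the Weibull ICDF reduces to the median α(ln 2)^(1/β), and the normal ICDF returns µ itself; for example, `normal_icdf(0.5, 35.005, 18.434)` returns 35.005, the extrapolated year-25 mean of dielectric breakdown voltage reported in Table 3.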

Health Index Model Based on Scoring Algorithm
HI was computed using a scoring algorithm. This conventional method applies weighting and ranking to a list of condition parameters data, which are then converted to scores over a predefined grade range. The scores are later aggregated into a single quantitative value. This method was employed because of its flexibility with the available data and because it is the method most commonly used by utilities [1][2][3][4][5][6][43].
The condition parameters data were retrieved from the CBM database and from observations of on-site physical conditions. The scoring and weighting algorithm is defined based on technical guidelines, the historical database, and fundamental theory. Expert judgment and failure-rate records are typically used to set the appropriate weights [1]. The procedures for determining the scores and weights for the different input parameters can be found in [1]. The final HI used in this study was computed according to Equation (23), which was adopted and modified from Equation (22) [1,43].
The updated HI formula omits the contribution factors for the transformer (60%) and the tap changer (40%) derived from CIGRÉ WG 12-05 [44], because this study considers only three groups of transformer condition parameters: dissolved gases, oil quality, and furfural.
In Equation (23), K is the coefficient assigned to each factor and HIF is the score of each factor. Here, K_DGA = 10, K_OQA = 8, and K_FCA = 6, and HIF is the rating (A, B, C, D, E) converted to a factor between 4 and 0 [1]. Finally, the HIs were categorized as per Table 11 in [1] into discrete categories from "very good" to "very poor", which correspond to transformer condition and interpretation.
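The scoring step can be sketched as follows, assuming Equation (23) takes the normalized weighted-sum form HI = Σ K_j·HIF_j / Σ 4K_j × 100% from [1] without the 60/40 split. The coefficients and the A to E conversion come from the text; the HI band limits in `HI_BANDS` are assumptions for illustration and should be checked against Table 11 in [1].

```python
# Rating-to-factor conversion: A..E -> 4..0, as stated in the text.
RATING_TO_HIF = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}

# Weighting coefficients K given in the text.
K = {"DGA": 10, "OQA": 8, "FCA": 6}

# Assumed HI bands as (lower bound, label); check against Table 11 in [1].
HI_BANDS = [(85.0, "very good"), (70.0, "good"), (50.0, "fair"),
            (30.0, "poor"), (0.0, "very poor")]


def health_index(ratings, weights=K):
    """HI (%) = sum(K_j * HIF_j) / sum(4 * K_j) * 100 (assumed form of Eq. (23))."""
    num = sum(weights[f] * RATING_TO_HIF[r] for f, r in ratings.items())
    den = sum(4 * weights[f] for f in ratings)  # maximum attainable score
    return 100.0 * num / den


def hi_category(hi):
    """Map an HI value to a condition label using the assumed bands."""
    for lower, label in HI_BANDS:
        if hi >= lower:
            return label
    raise ValueError("HI must be non-negative")
```

For example, ratings of A for DGA, B for OQA, and C for FCA give (40 + 24 + 12) / 96 × 100 ≈ 79.2, which the assumed bands label "good".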

Implementation of Statistical Distribution Models to Transformer CBM Data
The condition parameters data used in the study came from 1322 oil samples (datasets) containing 17,186 measurements of 13 condition parameters, extracted from 373 distribution transformers. The measurements were divided into training (9425) and validation (7761) sets. The transformers are rated 33/11 kV (step-down) and 30 MVA. The age band of the data is from 1 to 25 years. Table 1 tabulates the dataset distribution for each year. The condition parameters data were sorted, scaled logarithmically, and plotted on the x-axis. The y-axis represents either Weibull or normal distribution quantiles, converted into probability values using the CDFs in Equations (2) and (4), respectively. Figures 2 and 3 show the probability plots obtained for dielectric breakdown voltage and acidity from year 1 to 15, fitted by the normal and Weibull distributions, respectively. Based on Figure 2a,b, the dielectric breakdown voltage data for years 1 to 2 and 4 to 6 lie close to the normal distribution fitting line. For year 3, the dielectric breakdown voltage data deviate noticeably from the fitting line at probabilities above 95% and between 10% and 40%. Significant deviations are also observed for years 7 to 15, particularly at probabilities below 10% and above 90%, as shown in Figure 2c-e. Figure 3a-e shows the Weibull distribution fittings for the acidity data, in which apparent deviations can be observed at the lower and upper tails. The Weibull distribution does not represent the acidity data for year 1 well because of the large data variation within this period, as shown in Figure 3a. There are slight deviations of the acidity data from the Weibull fitting at probabilities below 10% for years 2 to 3. The acidity data for years 4 to 6 are represented quite well by the Weibull distribution, as shown in Figure 3b.
The Weibull fitting patterns for years 7 to 9 and 10 to 12 are similar, with apparent deviations of the acidity data at probabilities between 10% and 40%, as shown in Figure 3c.
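The Weibull probability plot described above can be sketched as below: sorted data are placed on a logarithmic x-axis, plotting positions are transformed to ln(−ln(1 − F)), and a straight line is fitted so that its slope estimates β and its R² indicates how well the Weibull model fits. Bernard's median-rank approximation for F and the function names are assumptions; the source does not state which plotting positions were used, and its parameters were estimated by MLE rather than from the plot.

```python
import math


def weibull_plot_points(data):
    """(ln x, ln(-ln(1 - F))) pairs for a Weibull probability plot.

    Plotting positions use Bernard's median-rank approximation
    F_i = (i - 0.3) / (n + 0.4); this choice is an assumption.
    """
    xs = sorted(data)
    n = len(xs)
    return [
        (math.log(x), math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))))
        for i, x in enumerate(xs, start=1)
    ]


def fit_line(points):
    """Ordinary least-squares line y = a + b x; returns (a, b, R^2)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((p[1] - (a + b * p[0])) ** 2 for p in points)
    ss_tot = sum((p[1] - my) ** 2 for p in points)
    return a, b, 1.0 - ss_res / ss_tot


def weibull_from_plot(data):
    """Graphical estimates: slope = beta, intercept = -beta * ln(alpha)."""
    a, b, r2 = fit_line(weibull_plot_points(data))
    return math.exp(-a / b), b, r2
```

Points that bend away from the fitted line at low or high F correspond to the tail deviations discussed for Figures 2 and 3.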

Distribution Parameters and Condition Data Estimations
Next, the parameters of the two-parameter Weibull and normal distributions were computed using Equations (5)-(18). Table 2 tabulates an example of the Weibull and normal distribution parameter fittings for dielectric breakdown voltage and acidity. For the dielectric breakdown voltage data, the mean µ shows an apparent linear decreasing trend as transformer age increases, while the standard deviation σ decreases only slightly. For the acidity data, α initially fluctuates between 0.022 and 0.0323 for years 1 to 4. It then increases significantly from year 4 to year 10 and stabilizes between 0.0712 and 0.085 after year 10. The β for year 1 is relatively low compared with the other years because of the poor Weibull fitting, as shown in Figure 3a. Nonetheless, β shows a decreasing pattern as transformer age increases. Next, the distribution parameters from year 1 to 15 were fitted and extrapolated to years 16 to 25 through curve fitting based on the WLS method, as shown in Figures 4 and 5. It is difficult to obtain a high R² for all of the fittings because of the large variation in the distribution parameters; this limitation is accepted in this study in order to obtain a representative model for the transformer population. It should be noted that, owing to the aggregating nature of the scoring and weighting HI technique used in this study, the HI is less sensitive to variations in the individual condition parameters data, since some of the values have only a small effect on the overall model.
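The WLS curve fitting and extrapolation step described above can be sketched as follows, assuming an exponential trend y = a·exp(b·t) fitted to the yearly distribution parameters by weighted linear regression on ln y. The default weights w = y² are an assumption (a common choice that offsets the distortion the log transform introduces into the residuals), and the function names and synthetic series are hypothetical.

```python
import math


def fit_exponential_wls(t, y, weights=None):
    """Fit y = a * exp(b * t) by weighted least squares on ln(y).

    Default weights w_i = y_i**2 (assumption). Returns (a, b).
    """
    if any(v <= 0 for v in y):
        raise ValueError("exponential model needs positive y")
    w = [v * v for v in y] if weights is None else list(weights)
    ly = [math.log(v) for v in y]
    sw = sum(w)
    mt = sum(wi * ti for wi, ti in zip(w, t)) / sw   # weighted mean of t
    ml = sum(wi * li for wi, li in zip(w, ly)) / sw  # weighted mean of ln y
    stt = sum(wi * (ti - mt) ** 2 for wi, ti in zip(w, t))
    stl = sum(wi * (ti - mt) * (li - ml) for wi, ti, li in zip(w, t, ly))
    b = stl / stt
    a = math.exp(ml - b * mt)
    return a, b


def r_squared(y, y_hat):
    """Coefficient of determination on the original (untransformed) scale."""
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def extrapolate(a, b, years):
    """Evaluate the fitted trend at future ages, e.g. years 16-25."""
    return {yr: a * math.exp(b * yr) for yr in years}
```

A usage pattern would be to fit, for example, the yearly µ of dielectric breakdown voltage for years 1 to 15, compare `r_squared` against other candidate trend models, and pass `range(16, 26)` to `extrapolate` to obtain the parameters used in the ICDF step.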
The exponential-based model was chosen for the curve fitting process, since it provided the highest R² among the candidate models. For the dielectric breakdown voltage data, the fitted µ and σ decrease exponentially as transformer age increases. Based on the extrapolation, µ and σ at year 25 are 35.005 and 18.434, respectively, as shown in Table 3. For the acidity data, the fitted α increases exponentially as transformer age increases, whereas the fitted β shows a slight decreasing trend. Table 3 presents the estimated distribution parameters for dielectric breakdown voltage and acidity from year 16 to 25. The individual condition parameters data for the next 10 years were then computed from the estimated distribution parameters through the ICDF, as in Equations (19) and (20), for validation. Figure 6 presents the predicted and computed individual condition parameters data over the transformer age band. Based on Table 2 in [16], the predicted dielectric breakdown voltage is close to the computed dielectric breakdown voltage and stays in "very good" condition for 25 years, as shown in Figure 6a. Most of the predicted water content values agree reasonably with the computed water content throughout the transformer age period, as shown in Figure 6b.
An apparent deviation is found between the predicted and computed water content for years 8-10 and year 25. Both the predicted and computed water content remain in "very good" condition for 25 years. The predicted interfacial tension deviates clearly from the computed interfacial tension, as seen in Figure 6c. The predicted interfacial tension is in "very good" condition throughout the first seven years, in "good" condition from year 8 to 15, and in "fair" condition after year 15. Meanwhile, the computed interfacial tension is in "very good" condition during the first four years and fluctuates among "very good", "good", and "fair" conditions between years 5 and 9. It enters "good" condition after year 9, transits to "fair" condition between years 17 and 21, and afterwards fluctuates between "very good" and "good" conditions. The predicted color is close to the computed color throughout the first 23 years, as shown in Figure 6d, and deviates from it after year 23. The predicted color is in "very good" condition throughout the first eight years, transits to "good" condition from year 9 to 11, enters "fair" condition between years 12 and 15, and ends up in "poor" condition after year 15. Meanwhile, the computed color is in "very good" condition during the first seven years, transits to "good" condition from year 8 to 10, and enters "fair" condition in year 11. Between years 12 and 13, the computed color returns to "good" condition; it enters "fair" condition between years 14 and 16 and later ends up in "poor" condition. There are deviations between the predicted and computed acidity in years 7-12, 16-18, and 22-24, as shown in Figure 6e. The predicted acidity is in "very good" condition during the first 15 years and in "good" condition after year 15. The computed acidity is in "very good" condition during the first six years.
Between years 7 and 9, it fluctuates between "very good" and "good" conditions, and after year 9 it remains in "good" condition. The predicted 2-FAL remains close to the computed 2-FAL during the first 15 years, as shown in Figure 6f. Most of the predicted 2-FAL values are lower than the computed values after year 10. Both the predicted and computed 2-FAL are in "very good" condition during the first five years. Between years 6 and 15, the predicted 2-FAL is in "good" condition, and it ends up in "fair" condition after year 15. The computed 2-FAL is in "good" condition between years 8 and 13 and enters "fair" condition after year 13. It is in "poor" condition between years 18 and 19, returns to "good" condition between years 20 and 22, and remains in "fair" condition after year 22. Table 4 summarizes the representative distribution models for each condition parameter in the oil quality and furanic compound analyses. The dielectric breakdown voltage, color, and 2-FAL can be represented by the normal distribution, whereas interfacial tension, acidity, and water content are better represented by the Weibull distribution. Color has the highest R² (0.9044) and interfacial tension the lowest (0.3602). The exponential-based model was chosen for curve fitting the dielectric breakdown voltage, water content, and interfacial tension, whereas color, acidity, and 2-FAL were fitted by the power-based model. These models were chosen because they gave the highest R² among the candidate models and best reproduced the general trends of the oil quality and furanic compound parameters data. Most of the predicted dissolved gases deviate from the computed dissolved gases, as shown in Figure 7. Based on Table 1 in [16], the predicted H2 deviates from the computed H2 during the first two years and between years 4-7 and 17-21, as shown in Figure 7a.
Both the predicted and computed H2 remain in "very good" condition for 25 years. The predicted CH4 still follows the decreasing trend of the computed CH4 despite the deviation, as seen in Figure 7b, and both remain in "very good" condition for 25 years. A few of the predicted CO values agree reasonably with the computed CO between years 4 and 23, as shown in Figure 7c, while deviations occur in years 1-3 and 24-25. The predicted CO remains in "very good" condition during the first seven years and then transits to "good" condition. The computed CO remains in "very good" condition during the first six years, is in "good" condition between years 7 and 23, and returns to "very good" condition after year 23. Most of the predicted CO2 values deviate from the computed CO2, as shown in Figure 7d. The predicted CO2 is in "very good" condition during the first two years, in "good" condition between years 3 and 7, and in "fair" condition after year 7. The computed CO2 is in "very good" condition during the first three years and in "good" condition from year 4 to 6; it enters "fair" condition after year 6, returns to "good" condition between years 21 and 23, and later transits to "very good" condition. The predicted C2H4 is close to the computed C2H4 during the first 24 years, as shown in Figure 7e, and deviates from it at year 25; both remain in "very good" condition for 25 years. An apparent deviation is observed between the predicted and computed C2H6, as shown in Figure 7f, although both are in "very good" condition for 25 years. Similarly, the predicted C2H2 deviates clearly from the computed C2H2, as shown in Figure 7g. The predicted C2H2 is in "good" condition during the first 10 years.
After year 8, the predicted C2H2 remains in "fair" condition until year 25. On the other hand, the computed C2H2 is in "good" condition during the first eight years and in "fair" condition from year 10 to 19; after year 19, it remains in "good" condition and later transits to "fair" condition after year 23. Table 5 summarizes the representative distributions for each dissolved gas parameter. The majority of the dissolved gas parameters fit the Weibull distribution, except for C2H2, CO, and CO2, which fit the normal distribution. C2H6 has the highest R² (0.7155) and CO2 the lowest (0.2375). The exponential-based model was chosen for curve fitting all dissolved gas parameters except C2H4, which was fitted by the power-based model. The chosen distributions for the dissolved gas parameters are justified in the same way as for the oil quality and furanic compound parameters. Figure 8 shows the HI predicted from the condition parameters in Figures 6 and 7 over a period of 25 years. Most of the predicted HI values are close to the computed HI. Based on Figure 9, there are relatively small deviations in the predicted HI at years 10, 19, and 24, and the HI at year 17 shows the largest deviation. The goodness of fit between the computed and predicted HI was further tested using the Chi-square statistic in Equation (24), where n is the number of years (transformer age), C_n is the computed HI at year n, P_n is the predicted HI at year n, and X² is the Chi-square statistic with n − 1 degrees of freedom. The significance level α was set to 0.05, so the rejection region lies beyond the critical value of 13.85. The X² of HI is 12.94, which falls outside the rejection region at α = 0.05.
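A minimal sketch of the goodness-of-fit check is shown below, assuming Equation (24) takes the usual Pearson form X² = Σ (C_n − P_n)² / P_n with the predicted HI treated as the expected value. The critical value is passed in rather than computed, to keep the example within the Python standard library. For reference, assuming n = 25 years (24 degrees of freedom), standard χ² tables give about 36.42 as the upper-tail 5% critical value and about 13.85 as the lower-tail 5% quantile; the reported X² = 12.94 lies below both.

```python
def chi_square_statistic(computed, predicted):
    """X^2 = sum((C_n - P_n)^2 / P_n) over all years (assumed form of Eq. (24))."""
    if len(computed) != len(predicted):
        raise ValueError("series must have equal length")
    if any(p <= 0 for p in predicted):
        raise ValueError("predicted HI must be positive")
    return sum((c - p) ** 2 / p for c, p in zip(computed, predicted))


def within_acceptance(statistic, critical_value):
    """True when the statistic does not exceed the critical value, i.e. the
    hypothesis that predicted and computed HI agree is not rejected."""
    return statistic <= critical_value
```

In use, the two 25-element HI series would be passed to `chi_square_statistic` and the result compared with the tabulated critical value for n − 1 degrees of freedom.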
The average percentage error between the predicted and computed HI was calculated using Equation (25).
where C_n is the computed HI, P_n is the predicted HI, and n is the age of the transformer. Figure 9 presents the absolute error percentages between the computed and predicted HIs obtained with SDM over 25 years. The overall average absolute error percentage is 0.65% in the training region and 2.17% in the validation region, so the HI predicted using SDM for the transformers in the validation region yields 97.83% accuracy. The application of SDM to predict the HI of a transformer population is therefore a promising approach for asset management in utilities: even with limited historical condition parameters data, SDM is able to predict the transformers' HI. These findings can be further validated if direct HI data can be acquired from utilities in the future. Because it is a data-driven model, the application can be extended to other fleets or units regardless of rating or size. In addition, future work could examine whether the SDM-based HI model can represent transformer condition after oil change or regeneration.
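The error summary can be sketched as below, assuming Equation (25) averages |C_n − P_n| / C_n × 100 over the years of each region and that accuracy is 100% minus the validation-region error, which matches the reported 100 − 2.17 = 97.83. The function names and the split at year 15 follow the framework described earlier; the sample numbers in any usage are hypothetical.

```python
def absolute_percentage_errors(computed, predicted):
    """|C_n - P_n| / C_n * 100 for each year (assumed form of Eq. (25))."""
    return [abs(c - p) / c * 100.0 for c, p in zip(computed, predicted)]


def mean_absolute_percentage_error(computed, predicted):
    """Average of the yearly absolute percentage errors."""
    errors = absolute_percentage_errors(computed, predicted)
    return sum(errors) / len(errors)


def region_summary(years, computed, predicted, split_year=15):
    """Average error for the training (<= split_year) and validation years,
    plus accuracy = 100 - validation error. Both regions must be non-empty."""
    train = [(c, p) for y, c, p in zip(years, computed, predicted) if y <= split_year]
    valid = [(c, p) for y, c, p in zip(years, computed, predicted) if y > split_year]
    if not train or not valid:
        raise ValueError("both regions need at least one year")
    train_err = mean_absolute_percentage_error(*zip(*train))
    valid_err = mean_absolute_percentage_error(*zip(*valid))
    return train_err, valid_err, 100.0 - valid_err
```

Dividing by the computed HI makes the computed value the reference, so the percentage reads as the error relative to the HI obtained from measured data.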

Conclusions
In summary, the dielectric breakdown voltage, color, 2-FAL, CO, CO2, and C2H2 under study can be represented by the normal distribution, while the Weibull distribution is suitable for representing interfacial tension (IFT), acidity, water content, H2, CH4, C2H6, and C2H4. SDM can be used to estimate the HI of transformers from individual condition parameters data, although the prediction accuracy depends on the availability of data at various ages. The trends of the predicted HI are predominantly close to those of the computed HI. The Chi-square hypothesis test shows that the X² value of the HI data is 12.94, which falls outside the rejection region at the 0.05 significance level. The overall average percentages of absolute error in the training and validation regions are 0.65% and 2.17%, respectively, and the HI predicted by SDM yields an accuracy of about 97.83%.