Improved Fuzzy C-Means Clustering for Transformer Fault Diagnosis Using Dissolved Gas Analysis Data

: Dissolved gas analysis (DGA) of the oil allows transformer fault diagnosis and status monitoring. Fuzzy c-means (FCM) clustering is an effective pattern recognition method, but exhibits poor clustering accuracy for dissolved gas data and usually fails to subsequently correctly classify transformer faults. The existing feasible approach involves combination of the FCM clustering algorithm with other intelligent algorithms, such as neural networks and support vector machines. This method enables good classiﬁcation; however, the algorithm complexity is greatly increased. In this paper, the FCM clustering algorithm itself is improved and clustering analysis of DGA data is realized. First, the non-monotonicity of the traditional clustering membership function with respect to the sample distance and its several local extrema are discussed, which mainly explain the poor classiﬁcation accuracy of DGA data clustering. Then, an exponential form of the membership function is proposed to obtain monotony with respect to distance, thereby improving the dissolved gas data clustering. Likewise, a similarity function to determine the degree of membership is derived. Test results for large datasets show that the improved clustering algorithm can be successfully applied for DGA-data-based transformer fault


Introduction
In power systems, gas chromatography is widely used to detect mineral-oil-immersed transformer insulation defects, allowing timely discovery of different latent faults in preventive tests of electric equipment [1][2][3][4][5][6].Specifically, dissolved gas analysis (DGA) is the most common method of transformer fault diagnosis.Since the role of dissolved gases in determining the behavior of mineral insulating oils was first discussed in 1933 [1], various methods of dissolved gas interpretation have been proposed, including the Doernenburg ratio method, the International Electrotechnical Commission (IEC) ratio method, the key gas method, Duval triangles, Duval pentagons, and the Mansour pentagon [2,3].These conventional methods have gained worldwide acceptance among electrical utility companies as the main fault diagnosis methods for transformers.However, because of the objective uncertainty regarding the cause-and-effect relationship of transformer faults themselves, as well as the uncertainty of the subjective judgment boundary of the test data, it is difficult for the above conventional methods to meet the requirements for engineering applications [4,5].
With technological development, considerable progress has been made with regard to intelligent approaches towards dissolved gas data interpretation, with application of theories such as expert systems [6][7][8], artificial neural networks [9][10][11], fuzzy theory [12][13][14][15], rough set theory [16], grey system theory [17,18], and support vector machines [19,20].Other intelligent diagnosis tools [21] such as swarm intelligence algorithms, wavelet analysis, Bayesian networks, and the evidential reasoning approach have also been introduced to DGA interpretation research.These intelligent methods partially compensate for the deficiencies of the conventional methods and provide new techniques for transformer fault diagnosis.In particular, fuzzy clustering algorithms are integral components of artificial intelligence, being unsupervised classification algorithms with a wide range of applications in areas such as data mining and pattern recognition [22].The primitive principle behind these algorithms is that "birds of a feather flock together."That is, similarities exist between the same fault datasets inside the transformer, and DGA data processing with unsupervised fuzzy clustering algorithms is promising.However, these algorithms have relatively low fault diagnosis accuracy when processing DGA data, and the outcomes are sensitive to the initial values.Hence, different initial values often yield different results and unsatisfactory evaluations.Since the development of the fuzzy c-means (FCM) algorithm in 1973 [23,24], numerous FCM variants have been proposed, including the possibilistic c-means algorithm [25], the possibilistic FCM clustering algorithm [26,27], the kernel clustering algorithm [28], and the generalized entropy-based clustering algorithm [29].All these FCM-based algorithms use the norm d to describe the distance between the sample and cluster center, and the reciprocal distance to characterize sample membership of a class.However, different clustering results are obtained for different iteration starting points on the same dataset.Further, test sample datasets for different faults are always divided into the same subclass.These defects limit the application of fuzzy clustering algorithms.
Based on the transformer fault characteristics of dissolved gases, this article proposes an improved FCM clustering algorithm to analyze and investigate transformer fault diagnosis using dissolved gas data.The improved algorithm has reduced sensitivity to initial values, allowing application of this algorithm for successful DGA data processing.Further, the proposed formulation can be straightforwardly applied in the context of the other FCM-based clustering techniques mentioned above.The remainder of this paper is organized as follows: the DGA data clustering process is discussed in Section 2. Section 3 describes the non-monotonicity of the traditional clustering membership function.In addition, an exponential form of the membership function is proposed.Then, Section 4 reports a detailed verification of the effectiveness of the proposed algorithm.

FCM Clustering of DGA Data
The internal structure of an oil-filled power transformer is highly complex, and has temporally and spatially varying electrical and thermal field distributions.Under normal operation, the insulating oil and solid insulating materials deteriorate at different rates with increased operation time under catalysis of copper, iron, and other materials, as well as the actions of the electric field, thermal field, moisture, and oxygen.Besides some non-gaseous deteriorating products, small amounts of hydrogen (H 2 ), low-molecular-weight hydrocarbons, and carbon oxides are produced over time.Then, inceptive faults in oil-filled equipment produce increasing amounts of gases, the accumulation of which gradually becomes apparent.In fact, the gases accumulate in the oil and are continuously dissolved by convection and diffusion until they saturate and form bubbles.The processes of gas production and dissolution in insulating oil are also complex.Nevertheless, some gases allow determination of internal faults in oil-filled equipment, including H 2 , methane (CH 4 ), ethane (C 2 H 6 ), ethylene (C 2 H 4 ), acetylene (C 2 H 2 ), carbon monoxide (CO), and carbon dioxide (CO 2 ).The concentrations of these gases are closely related to the type and severity of the fault.Hence, DGA data can be processed using the unsupervised FCM clustering algorithm, which classifies DGA data samples by distance and aims to discriminate different types of faults by increasing their separability.
Fuzzy clustering algorithms treat clustering as a constrained optimization problem and determine the fuzzy partition of the dataset as the solution.Fuzzy clustering optimally classifies DGA data samples through an iterative objective function that can be the square sum of the weighted error within the class, i.e.,: is the Euclidean distance between DGA data sample x i and cluster center v j ; and m is a weight index.The clustering criterion is the minimum value of J(U,V), and the constraint of this extreme value is The abovementioned optimization problem can be solved using the method of Lagrange multipliers, and the iterative formula of the FCM clustering algorithm is obtained as The FCM clustering algorithm can process DGA data, where the components of sample x k are the concentrations of various dissolved gases (such as those mentioned above) in the oil.The basic algorithm consists of the following steps: (1) Normalize and preprocess the DGA data.

Local Extremum Problem of (Conventional) Membership Function
The following equations hold: where p ij is the similarity between sample x i and cluster center v j , and u ij represents the degree of membership of the sample to the cluster.Use of Equation ( 3) to describe the similarity between the sample and cluster center yields extremely high values as the sample approaches the cluster center, i.e., as d ij approaches zero.Likewise, p ij decreases rapidly as the samples separate from the cluster center, as shown in Figure 1a.Moreover, the membership function u ij in Equation ( 4) has many local extrema, as shown in Figure 1b.In Figure 1, where pij is the similarity between sample xi and cluster center vj, and uij represents the degree of membership of the sample to the cluster.Use of Equation (3) to describe the similarity between the sample and cluster center yields extremely high values as the sample approaches the cluster center, i.e., as dij approaches zero.Likewise, pij decreases rapidly as the samples separate from the cluster center, as shown in Figure 1a.Moreover, the membership function uij in Equation ( 4) has many local extrema, as shown in Figure 1b.In Figure 1,  Consider the red line in Figure 1b, which represents the membership function for the cluster center at 0.6.The function does not monotonically decrease as the data samples become more distant from the center.In fact, two minima appear at 0.3 and 0.9, affecting the function monotonicity.The same behavior can be seen for the blue and green lines in Figure 1b (for cluster centers at 0.3 and 0.9, respectively).Hence, for some data samples xm and xn, as well as cluster center vp, the situation that dmp > dnp with ump > unp can arise.Furthermore, as the FCM clustering optimization essentially corresponds to the hill-climbing technique for local searching, the existence of local extrema can yield different clusters depending on the initial values of the search process.
In the abovementioned problem, the one-dimensional case of FCM clustering is analyzed.Therefore, it is reasonable to believe that the non-monotonicity of the membership function is more severe for high-dimensional data.Likewise, as transformer oil chromatography usually considers five-dimensional datasets containing H2, CH4, C2H6, C2H4, and C2H2 content measurements, the local extrema and non-monotonicity of the membership function can cause low accuracy for transformer fault diagnosis when FCM clustering is used for DGA data processing.

Exponential Membership Function
Based on the local extremum problem, novel similarity and membership functions are proposed in this paper, with the former being expressed as: and the latter as: Consider the red line in Figure 1b, which represents the membership function for the cluster center at 0.6.The function does not monotonically decrease as the data samples become more distant from the center.In fact, two minima appear at 0.3 and 0.9, affecting the function monotonicity.The same behavior can be seen for the blue and green lines in Figure 1b (for cluster centers at 0.3 and 0.9, respectively).Hence, for some data samples x m and x n , as well as cluster center v p , the situation that d mp > d np with u mp > u np can arise.Furthermore, as the FCM clustering optimization essentially corresponds to the hill-climbing technique for local searching, the existence of local extrema can yield different clusters depending on the initial values of the search process.
In the abovementioned problem, the one-dimensional case of FCM clustering is analyzed.Therefore, it is reasonable to believe that the non-monotonicity of the membership function is more severe for high-dimensional data.Likewise, as transformer oil chromatography usually considers five-dimensional datasets containing H 2 , CH 4 , C 2 H 6 , C 2 H 4 , and C 2 H 2 content measurements, the local extrema and non-monotonicity of the membership function can cause low accuracy for transformer fault diagnosis when FCM clustering is used for DGA data processing.

Exponential Membership Function
Based on the local extremum problem, novel similarity and membership functions are proposed in this paper, with the former being expressed as: and the latter as: Here, p ij represents the similarity between sample x i and cluster center v j , u ij represents the degree of membership of the sample to the cluster, and γ is a factor that adjusts the sensitivity of the similarity function.The proposed similarity and membership functions are illustrated in Figure 2a,b, respectively.
Here, pij represents the similarity between sample xi and cluster center vj, uij represents the degree of membership of the sample to the cluster, and γ is a factor that adjusts the sensitivity of the similarity function.The proposed similarity and membership functions are illustrated in Figure 2a,b, respectively.Comparing Figure 2 to Figure 1, it is apparent that the improved similarity function (Figure 2a) is smooth, unlike the traditional function (Figure 1a) which exhibits asymptotic points.Likewise, the improved membership function (Figure 2b) does not exhibit local minima, and the degree of membership monotonously decreases with distance between the samples and any cluster center.In fact, the improved membership function smoothly divides the cluster space into three classes, as shown in Figure 2b.
Figures 3 and 4 show the two-dimensional data membership obtained with the transitional and improved FCM, respectively.Similar to Figures 1 and 2  Comparing Figure 2 to Figure 1, it is apparent that the improved similarity function (Figure 2a) is smooth, unlike the traditional function (Figure 1a) which exhibits asymptotic points.Likewise, the improved membership function (Figure 2b) does not exhibit local minima, and the degree of membership monotonously decreases with distance between the samples and any cluster center.In fact, the improved membership function smoothly divides the cluster space into three classes, as shown in Figure 2b.
Figures 3 and 4 show the two-dimensional data membership obtained with the transitional and improved FCM, respectively.Similar to Figures 1 and 2 Here, pij represents the similarity between sample xi and cluster center vj, uij represents the degree of membership of the sample to the cluster, and γ is a factor that adjusts the sensitivity of the similarity function.The proposed similarity and membership functions are illustrated in Figure 2a,b, respectively.Comparing Figure 2 to Figure 1, it is apparent that the improved similarity function (Figure 2a) is smooth, unlike the traditional function (Figure 1a) which exhibits asymptotic points.Likewise, the improved membership function (Figure 2b) does not exhibit local minima, and the degree of membership monotonously decreases with distance between the samples and any cluster center.In fact, the improved membership function smoothly divides the cluster space into three classes, as shown in Figure 2b.
Figures 3 and 4 show the two-dimensional data membership obtained with the transitional and improved FCM, respectively.Similar to Figures 1 and 2, the membership function shown in Figure 3b has many local minima, and the spatial structure of the division is complex.The newly proposed membership function in Figure 4b

Sub-Similarity Function
In FCM clustering, the distance between data samples and cluster centers is measured by the vector norm . Hence, information on dimensions with small values can be uncovered using that for dimensions showing relatively high values.In addition, DGA usually considers the Euclidean distance: Given the large differences in the H2, CH4, C2H6, C2H4, and C2H2 contents, the contribution of the gases with small concentrations to the distance calculation is negligible, resulting in loss of DGA information.For instance, the H2 content is considerably larger than that of the other gases; thus, the sample membership of each class is mainly determined by the H2 content, with the concentrations of gases such as C2H6 and C2H4 being disregarded.Furthermore, different gases generally exhibit different features in DGA.For instance, simulations and several field tests [4] have shown that C2H2, H2, and C2H4 allow characterizing discharge, partial discharge, and overheating failures, respectively.However, the contribution of the low-concentration gases is disregarded in Equation ( 7); thus, proper fault characterization is hindered.Therefore, the following method to calculate the degree of membership is proposed in this paper.First, for DGA data sample xi and cluster center vj, the sub-similarity pijk is calculated for each kth gas, where: Then, the similarity: is calculated.Finally, the degree of membership of sample xi to the cluster with center vj is calculated from: By introducing the sub-similarity function, pijk, the contribution of each gas can be suitably represented for different faults, and addition and subtraction of different gas concentrations can be avoided.Therefore, the membership function reflects the physical characteristics of the different gases more clearly.

Sub-Similarity Function
In FCM clustering, the distance between data samples and cluster centers is measured by the vector norm x i − v j 2 .Hence, information on dimensions with small values can be uncovered using that for dimensions showing relatively high values.In addition, DGA usually considers the Euclidean distance: where ∆(ψ) represents the gas numerical difference x i (ψ) − x j (ψ) between the DGA data samples x i and x j .Given the large differences in the H  [4] have shown that C 2 H 2 , H 2 , and C 2 H 4 allow characterizing discharge, partial discharge, and overheating failures, respectively.However, the contribution of the low-concentration gases is disregarded in Equation (7); thus, proper fault characterization is hindered.Therefore, the following method to calculate the degree of membership is proposed in this paper.First, for DGA data sample x i and cluster center v j , the sub-similarity p ijk is calculated for each kth gas, where: Then, the similarity: is calculated.Finally, the degree of membership of sample x i to the cluster with center v j is calculated from: By introducing the sub-similarity function, p ijk , the contribution of each gas can be suitably represented for different faults, and addition and subtraction of different gas concentrations can be avoided.Therefore, the membership function reflects the physical characteristics of the different gases more clearly.

Flow of Improved FCM Clustering Algorithm
The improved FCM clustering algorithm is expressed as follows: (1) Initialize dataset X.
(3) Calculate cluster center V(L + 1) and membership matrix U(L + 1) iteratively using Equations (2) and (10), respectively.(4) When the iteration accuracy is within the predefined threshold, stop the iterative process and retrieve the clustering results.

Clustering of University of California, Irvine (UCI) Iris Dataset
The UCI Iris Dataset is a widely used database in the pattern recognition literature [30][31][32][33][34], and is used to validate the improved clustering algorithm in this paper.The dataset contains three classes of 50 instances each, where each class refers to a type of iris plant; namely, setosa, versicolor, and virginica.Each iris plant has four attributes: its sepal length, sepal width, petal length, and petal width in centimeters.The dataset is collected in the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml)[35].The conventional and improved FCM algorithms were tested 100 times with clustering of 100 sets of initial point pairs randomly selected from the 150 iris data elements.The convergence criterion was max(∆U, ∆V) < 10 −3 and the fuzzy exponent m = 2.The clustering results are listed in Table 1.  1.This application shows that the improved clustering algorithm successfully achieves classification and exhibits superior performance to the traditional technique.

Verification Process for DGA Data
Based on data available from different sources [2][3][4], a large amount of transformer DGA data was compiled to validate the method proposed in this article.Note that the faults in large-scale oil-filled power transformers are diverse and involve different levels of complexity.Therefore, the IEC has divided these faults into six categories: low-, middle-, and high-temperature failures, and partial, spark, and arc discharges.These six faults were contained in the considered DGA data, and each data element x i had five attributes; namely, the H 2 , CH 4 , C 2 H 6 , C 2 H 4 , and C 2 H 2 content (unit: µL/L).
Ten data for each of the six faults were selected from the collected DGA data and the dataset X = {x 1 , • • • , x 60 } was obtained.Further, six samples were randomly selected from the dataset as initial clustering values.A total of 10,000 sets of initial data points for clustering were selected.In addition, the clustering convergence was set to 10 −3 , where the convergence of each cluster was defined by the difference in the center of each cluster for two consecutive iterations.
In this study, the performance of the FCM clustering algorithm was verified before and after the proposed improvement for DGA data processing.The verification process for both algorithms is summarized as follows: (1) Initialize DGA dataset X by calculating the proportion of gases related to the various faults.The H 2 content in the DGA is usually large.During the initialization process, the percentage of H 2 to H 2 and hydrocarbons, and the percentage of each hydrocarbon to the total hydrocarbons, are calculated.(2) Select six DGA data samples to be classified as the starting points for cluster analysis.
(3) Use the (original or improved) FCM clustering algorithm to classify the DGA data, and divide the dataset into six classes.(4) Determine the fault type that each class represents using statistics, as follows.Determine the number of samples belonging to each fault type in classified subset X.The fault type associated with the largest number of samples defines the subset class.For instance, subset X represents arc discharge if the subset contains more samples classified as this fault type.( 5) Validate the classification with the condition that no subset should be assigned to two fault types, and no fault type should be represented by two different subsets; i.e., the subsets and fault types must form a bijective mapping for the classification process to be considered valid.( 6) Statistically analyze the algorithm accuracy.

Verification Results
Overall, the verification test showed that the traditional FCM clustering algorithm did not achieve suitable clustering.Specifically, the clustering process either yielded overlapping clusters or assigned two fault types to the same subset.Despite invalid clustering of some initial values, however, the proposed FCM clustering algorithm could effectively classify the DGA data for most initial values and discriminate between the six types of transformer fault.In the following, the detailed quantitative results for each algorithm are presented.

Improved FCM Clustering Algorithm
Six patterns (labeled A-F) were obtained from the 10,000 sets of initial data points for clustering using the improved FCM clustering algorithm.One of the six patterns (pattern A) returned a valid result according to the verification process described in Section 4.2.The cluster centers of the gases according to the six fault types for the six patterns are listed in Table 2.Note that the H 2 value is expressed as the percentage of H 2 relative to the total gas concentration, whereas the other gas values are expressed as the percentage of their concentrations relative to the total hydrocarbon concentration.Of the six obtained patterns, only pattern A provided suitable DGA data classification into six classes corresponding to the fault types.In contrast, patterns B to F yielded classification of different subsets to repeated classes, or allocation of different classes to the same subset (denoted "invalid classification" according to the rule in Section 4.2).
The abovementioned classification patterns were obtained from the randomly selected 10,000 sets of initial data points for clustering.From Table 3, 8359 of the sets returned the valid classification pattern A, whereas the remaining 1641 sets returned the other patterns (B to F).The classification accuracy of pattern A reached 93.3%.
Fuzzy clustering algorithms treat clustering as a constrained optimization problem and determine the fuzzy partition of the dataset as the solution.Although the improved algorithm proposed in this paper eliminates the local extrema of the membership function, local extrema may still exist in the objective function, J(U,V).The iterative solution method in the fuzzy clustering algorithm is essentially a hill-climbing method, and the iterative process may terminate at local extrema.Therefore, from the mathematical perspective, the invalid patterns are the results of local extrema for J(U,V).In addition, Table 3 indicates that the patterns obtained from the proposed FCM clustering algorithm have fast convergence (the average number of iterations is 9.33) and suitable clustering performance.In fact, the separation coefficients are all above 0.8, and the fuzzy entropies remain below 0.3.The FCM clustering algorithm aims to retrieve the minimum value of the objective function through the iterative process.However, Table 3 reveals that the objective function value of the valid pattern A does not correspond to the minimum value among the patterns.In fact, the parameters of patterns B and E are superior to those of pattern A; i.e., they have higher separation coefficients and smaller fuzzy entropy values, and pattern E has a smaller objective function value.Hence, pattern A reflects the local extrema of the objective function rather than the global optimal solution.

Traditional FCM Clustering Algorithm
From the 60 DGA datasets, 10,000 sets of initial data points were also randomly selected for clustering using the traditional algorithm.Six clustering patterns were also obtained with this algorithm, all of which presented overlapping, thus failing to identify the transformer fault types.In fact, the clustering repeatedly returned overlapping centers or mapped two fault types to the same dataset.The patterns with their classes and cluster centers are listed in Table 4.For brevity, only the two patterns with the highest occurrence frequency are presented (see Table 5).From Table 4, the traditional FCM clustering algorithm always returned overlapping classes; hence, there was no valid pattern.Specifically, the cluster centers of the spark discharge and arc discharge faults were almost identical for pattern 1, as well as those for low-temperature overheating and "invalid classification."Likewise, the cluster centers for partial discharge, arc discharge, and "invalid classification" were almost identical for pattern 2. The patterns not listed in Table 4 also yielded overlapping; thus, the traditional FCM clustering algorithm cannot correctly identify transformer fault types from DGA data.
Table 5 lists the clustering performance results for the traditional FCM clustering algorithm.The classification rate is 0%.Further, compared with the proposed algorithm, the separation coefficient is small for all patterns, whereas the average fuzzy entropy is high.Moreover, the convergence rate is slow, taking an average of 53.8 iterations.

Fault Diagnosis
The fault diagnosis steps for DGA dataset x with FCM are as follows: (1) Normalize x by calculating the gas proportion.Calculate the percentage of H 2 to H 2 and hydrocarbons, and the percentage of each hydrocarbon to the total hydrocarbons.(2) Using Equations ( 8)- (10), calculate the degree of membership between x and cluster centers v j , presented as Pattern A in Table 2. (3) The degree of membership is the extent to which the dataset belongs to each fault, and the fault type with the highest membership is the diagnosed fault.
For the 60 DGA datasets used in this paper, the improved FCM correctly identified 55 faults, with an accuracy rate of 91.7%.When those data were analyzed with the widely used IEC ratio method, the accuracy rate was 78.3%.As the traditional FCM failed to obtain valid cluster centers, the fault diagnosis accuracy rate was as low as 45%.In general, the overall accuracy of the improved FCM method is as effective as that of IEC ratio method.Moreover, the improved method retains all fault information and is free of the shortcomings of the IEC ratio method with respect to the coding boundary error and code absence.
To further verify the applicability of the improved FCM, a total of 28 field DGA datasets with actual fault details were adopted, as detailed in Table 6.These data were from the book-Typical Cases: Application of Grid Equipment Status Detection Technology (2011-2013) [36], published by the Operation and Maintenance Department of the Chinese State Grid Corporation, which details actual faults found after transformer disassembly.The degrees of membership of the 28 DGA datasets to each cluster center v j of Pattern A in Table 2 are listed in Table 7. Twenty-one of the twenty-eight DGA datasets had diagnostic results that matched the actual faults in Table 6.Among the other seven datasets, the DGA datasets numbered 6 and 9 were falsely diagnosed as arc discharge, when their actual faults were high-temperature overheating.However, the diagnosis results for the remaining five DGA datasets were close to the true states of the transformers.Although the thermal faults were divided into low-, mid-, and high-temperature overheating with divisions of 300 • C and 700 • C, the differences between them are not physically clear, and neither is the difference between spark and arc discharge (also known as low-and high-energy discharge, respectively).Therefore, fuzzy clustering can be applied to interpret DGA data, rather than simply being employed for a single fault.The fault of the DGA dataset numbered 28 was 44.95% spark discharge and 45.79% arc discharge.In view of the relevance of the spark and arc discharge, the diagnosis result for dataset 28 is valid.This conclusion also holds for datasets 26, 25, 15, and 3 (highlighted in Table 7), with their diagnosis results also being valid.Therefore, the diagnosis results are highly consistent with the actual faults.
The IEC ratio method codes for these DGA datasets are also listed in Table 7.According to the diagnostic rules, 10 datasets (numbered 6, 8, 9, 13, 15-18, 26, and 28) were misdiagnosed.The accuracy rate was relatively low and the fault severity could not be obtained.

Dissolved Gas Analysis
Detection of certain gases generated in an oil-filled transformer in service is frequently the first available indication of a malfunction that may eventually lead to failure if not corrected.Arcing, partial discharge, low-energy sparking, severe overloading, pump motor failure, and overheating of the insulation are some possible mechanisms.Occurrence of these conditions singly, or as several simultaneous events, can result in decomposition of the insulating materials and formation of some gases.Many techniques for detection and measurement of these gases have been established.However, analysis of these gases and interpretation of their significance is a complex subject involving many factors, such as the transformer type, fault location, oil circulation type and rate, and variables associated with the sampling and measuring procedures.Therefore, the following principal points require special attention: (1) Gas sources The gases dissolved in the oil can originate from different sources, and in some cases the gases are not generated by transformer faults.When a transformer is overhauled, the oil exposed to the air can absorb CO 2 with content as high as 300 µL/L [2].If the insulating oil has not been completely degassed after the fault is repaired, the residual gases may remain in the oil.In fact, these gases do not generally affect normal transformer operation.Therefore, these cases should be considered when DGA is used to determine transformer faults.
(2) Gas measurement Accurate measurement of the dissolved gas content in the oil is a very important DGA analysis step, and itself involves many steps, such as oil sampling from the transformer, degassing from the oil samples, and gas content measurement.The errors of each step must be reduced.Oil sampling from a transformer should be performed in a fully sealed state and the sample should be analyzed as soon as possible.To prevent gas escape, the sample must be sealed and shielded from light.Severe vibration should also be avoided during transportation.Degassing is the main contributor to gas content error; therefore, the degassing result repeatability must be ensured.According to the provisions of IEC 60599-1999 [2], the gas content results for all measurements of the same gas sample should be within ±1.5% of the average value.When the composition and content are measured with a gas chromatograph, the column resolution must meet the quantitative analysis requirements; i.e., the instrument must have stability at baseline and sufficient sensitivity.According to IEC 60599-1999, the minimum detectable gas component concentrations are as follows: C 2 H 2 ≤ 0.1 µL/L, H 2 ≤ 5 µL/L, CO ≤ 25 µL/L, and CO 2 ≤ 25 µL/L.

(3) Warning values
In normal operation, the insulating oil and organic insulating materials inside the oil-filled transformers generate gases under the actions of heat and electricity.Therefore, warning values are set in practice [3], including gas content and gas growth-rate warning values, which are 150 µL/L and 6 mL/d, respectively, for the total hydrocarbon; 1 µL/L and 0.1 mL/d, respectively, for C 2 H 2 ; and 150 µL/L and 5 mL/d, respectively, for H 2 .Fault diagnostics with improved FCM clustering are enabled only when the gas properties exceed the warning values and the gas content continues to increase, indicating that the transformer has a fault.

(4) Comprehensive diagnosis
As the transformer structure is complex, DGA should be combined with other tests to obtain a comprehensive diagnosis of whether the transformer is faulty according to the DGA result.These tests usually include measurement of the winding DC resistance, no-load testing, insulation testing, partial discharge testing, and trace moisture measurement [4].The structure, operation, maintenance, etc., of the equipment are also factors to be considered.Different measures are implemented depending on the comprehensive fault diagnosis, such as test period shortening, monitoring strengthening, load limiting, scheduling of an internal inspection in the near future, or immediate operation termination.

Improved FCM Performance
In this study, an exponential form of the membership function was proposed that exhibits monotony with respect to distance and, thus, improves the FCM clustering of DGA data.The improved FCM exhibits highly superior performance compared to the traditional technique: (1) The improved FCM reduces the algorithm sensitivity.In the experiment performed in this study, the traditional FCM algorithm was not suitable for valid transformer fault classification using DGA data, as it could not return a valid pattern containing the six fault types.Different clustering results were obtained for different iteration starting points of the same dataset.In contrast, the improved algorithm could effectively identify the fault types, and a valid pattern for classification appeared for 83.59% of the sets of initial data points for clustering.Thus, the improved algorithm dramatically outperforms the traditional algorithm.(2) The improved FCM yielded six types of DGA fault.As expected, data samples corresponding to the same fault data were categorized into the same subclass, and different fault data samples were classified into different subclasses.(3) A total of 28 field DGA datasets with actual fault details were considered to verify the applicability of the improved FCM.The diagnosis results of the improved FCM were highly consistent with the actual faults.The fault diagnosis accuracy rate was considerably higher than that of the IEC ratio method.(4) The proposed algorithm corresponds to hill-climbing iterative optimization, and convergence to local minima was found; this is an inherent defect of the iterative optimization algorithm.(5) Disregarding the pattern validity, when converging with the same accuracy (i.e., 10 −3 ), the proposed algorithm required only 9.33 iterations on average compared to the 53.8 required by the traditional algorithm; the separation coefficients of the proposed and traditional algorithms were 0.87 and 0.38, respectively; and the average fuzzy entropies of the proposed and traditional algorithms were 0.23 and 1.26, respectively.Consequently, the proposed algorithm considerably improves the convergence speed of FCM clustering and outperforms the traditional algorithm.

Conclusions
FCM clustering is an effective pattern recognition method, but the traditional algorithm is not suitable for transformer fault detection using DGA data.The traditional FCM is prone to misclassification, and the clustering analysis effect may not meet the desired requirements.The iterative solution of the optimization problem essentially corresponds to the "mountain climbing" method of local searching.In that approach, however, the solution can easily fall into the local extrema owing to the initial value sensitivity.In addition, different clustering results are obtained for different iteration starting points of the same dataset, which seriously affects the clustering effect.This article explored the characterization of similarity between samples performed by traditional FCM clustering algorithm using the inverse Euclidean distance, the membership function of which has several extrema.In addition, the degree of membership is not monotonic with respect to distance; therefore, the traditional algorithm does not provide suitable classification of transformer fault types.To overcome this problem, a novel exponential similarity function and membership function were proposed for FCM clustering.Tests on a large DGA dataset showed that the proposed algorithm can effectively identify transformer faults and classify DGA data.Moreover, the proposed algorithm has excellent convergence and clustering performance, which are essential factors for its practical application.In addition, it is worth noting that the objective function of the fuzzy clustering function is nonlinear, and although the proposed technique eliminates the local extremum of the membership function, local extrema may still exist in the objective function.Therefore, the iterative process may terminate at the local extrema, as observed for the invalid patterns discussed in this article.If severely isolated classes exist in the dataset, this defect can be very serious.Therefore, good knowledge of the dataset is necessary before clustering.
1.5], and m = 2. Overall, the membership function of the original FCM clustering algorithm has several local extrema and the cluster space is irregular.
m = 2. Overall, the membership function of the original FCM clustering algorithm has several local extrema and the cluster space is irregular.

Figure 2 .
Figure 2. Proposed FCM clustering algorithm for one-dimensional data: (a) similarity and (b) membership functions.

Figure 2 .
Figure 2. Proposed FCM clustering algorithm for one-dimensional data: (a) similarity and (b) membership functions.
, the membership function shown in Figure 3b has many local minima, and the spatial structure of the division is complex.The newly proposed membership function in Figure 4b has no local extremum, which clearly reflects the membership of the three classes on the two-dimensional plane.The improved membership function has superior mathematical properties, which is beneficial to the clustering algorithm classification.Energies 2018, 11, x FOR PEER REVIEW 5 of 17

Figure 2 .
Figure 2. Proposed FCM clustering algorithm for one-dimensional data: (a) similarity and (b) membership functions.
data samples xi and xj.

Table 1 .
Iris cluster results obtained with conventional and improved FCM. improved FCM algorithm successfully divided the iris plant data into three classes with a high accuracy rate.The conventional FCM misclassified 16 data elements, numbered 51, 53, 78, 102, 107, 114, 120, 122, 124, 127, 128, 134, 139, 143, 147, and 150.After improvement, eight of these data elements (51, 53, 102, 114, 134, 143, 147, and 150) were correctly clustered, and the accuracy rate increased to 94.67%.The improved clustering algorithm achieved superior performance, a larger partition coefficient, and smaller separation entropy, as apparent from Table * The values indicate the minimum/average/maximum number of iterations required to determine the clusters.The

Table 2 .
Cluster centers obtained from dissolved gas analysis (DGA) data using proposed FCM clustering algorithm.

Table 3 .
Performance of proposed FCM clustering algorithm.The values indicate the minimum/average/maximum number of iterations required to determine the clusters. *

Table 4 .
Cluster centers obtained from DGA data using traditional FCM clustering algorithm *.
* The highlighted classes sharing the same gray tone exhibit overlapping.

Table 5 .
Performance of traditional FCM clustering algorithm.The values indicate the minimum/average/maximum number of iterations required to determine the clusters. *

Table 6 .
Field DGA data with actual faults.IEC ratio method code; # The DGA data were misdiagnosed by the IEC ratio method.** Low-, mid-, high-temperature overheating are abbreviated to low-, mid-, and high heat, respectively. ++

Table 7 .
Degree to which 28 DGA datasets are associated with each fault.