Improved Consistent Interpretation Approach of Fault Type within Power Transformers Using Dissolved Gas Analysis and Gene Expression Programming

Abstract: Dissolved gas analysis (DGA) of transformer oil is considered the most reliable condition monitoring technique currently used to detect incipient faults within power transformers. While measurement accuracy has become relatively high since the development of various off-line and on-line measuring sensors, the interpretation of DGA results still depends on the level of personnel expertise more than on analytical formulation. Therefore, different interpretation techniques may lead to different conclusions for the same oil sample. Moreover, ratio-based interpretation techniques may fail to interpret DGA data under multiple fault conditions and when the oil sample contains an insignificant amount of the gases used in the specified ratios. This paper introduces an improved approach that overcomes the limitations of conventional DGA interpretation techniques and automates and standardizes the DGA interpretation process. The approach incorporates all conventional DGA interpretation techniques in one expert system to identify the fault type in a more consistent and reliable way. Gene Expression Programming is employed to establish this expert system. Results show that the proposed approach provides more reliable results than the individual conventional methods currently adopted by industry practice worldwide.


Introduction
Because power transformers are the crux of electricity transmission and distribution networks, utilities and other power network stakeholders around the globe aim to adopt reliable, automated and non-invasive techniques for power transformer condition monitoring and diagnosis. To avoid catastrophic transformer failures, which may cause a significant loss of revenue and business interruption due to electricity outages, incipient faults should be identified as soon as they emerge and proper remedial action should be taken. The dielectric oil within a power transformer is regularly tested using various chemical and electrical diagnostic techniques to detect internal faults at an early stage [1,2]. Among current transformer diagnostic techniques, dissolved gas analysis (DGA) is widely accepted as a reliable tool for detecting incipient faults in power transformers [3]. The DGA technique is based on the fact that different gases evolve in transformer oil as the oil and paper insulation decompose under the high thermal and electric stresses they are exposed to during the transformer's operational life [4]. These gases form in specific temperature ranges, as defined in the combustible gas generation temperature chart [5] and Halstead's thermal charts [6]. The gases produced by oil decomposition are hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4) and ethane (C2H6). Carbon monoxide (CO) and carbon dioxide (CO2) can be produced by cellulose degradation, atmospheric leaks or long-term oxidation of the transformer oil [7]. The amount of gas produced and the rate of generation reflect the general health condition of the transformer. They can be used to identify various internal faults and, together with other condition monitoring parameters, to estimate the remaining operational life of the transformer [8][9][10][11].
Partial discharge activity produces H2 and CH4, while arcing generates all gases, including a traceable amount of C2H2 [12]. While DGA measurement techniques are well developed in both real-time and laboratory-based environments, the interpretation of results remains a challenging research area that calls for further improvement in the accuracy of the conventional interpretation methods, as so far there is no globally accepted technique for DGA interpretation [1]. DGA interpretation techniques such as the key gas method [3], the Doernenburg, Rogers and IEC ratio methods [13,14] and the Duval triangle method [15] are widely used by utilities to identify and quantify DGA results. These techniques depend on personnel experience more than on mathematical formulation. Hence, they may result in different conclusions for the same oil sample [16].
Ratio-based methods such as Rogers, Doernenburg and IEC, which employ either four or three gas ratios for DGA data interpretation, are only usable if a substantial amount of each gas employed in the ratios is present. Otherwise, the method leads to an out-of-code ratio and cannot identify the type of fault [6]. Therefore, these methods can be used to identify the type of an existing fault rather than to detect whether a fault is present.
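The dependence of ratio methods on sufficient gas can be illustrated with a minimal Python sketch. It computes the three gas ratios used by the IEC method (C2H2/C2H4, CH4/H2, C2H4/C2H6) and marks a ratio as unusable when either gas falls below a minimum concentration. The threshold `MIN_PPM` and the function names are assumptions made for illustration; they are not taken from any standard or from the paper, and the sketch covers only the insufficient-gas case, not the code tables themselves.

```python
from typing import Dict, Optional

# Hypothetical minimum concentration (ppm) below which a gas is treated as
# too small to form a meaningful ratio. Assumed value, not from a standard.
MIN_PPM = 1.0


def gas_ratio(numerator: float, denominator: float,
              min_ppm: float = MIN_PPM) -> Optional[float]:
    """Return numerator/denominator, or None if either gas is below min_ppm."""
    if numerator < min_ppm or denominator < min_ppm:
        return None
    return numerator / denominator


def iec_ratios(ppm: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Compute the three gas ratios used by the IEC ratio method."""
    return {
        "C2H2/C2H4": gas_ratio(ppm["C2H2"], ppm["C2H4"]),
        "CH4/H2": gas_ratio(ppm["CH4"], ppm["H2"]),
        "C2H4/C2H6": gas_ratio(ppm["C2H4"], ppm["C2H6"]),
    }


def ratios_usable(ratios: Dict[str, Optional[float]]) -> bool:
    """A ratio method can only be applied when every ratio is defined."""
    return all(value is not None for value in ratios.values())


# Concentrations (ppm) of the oil sample discussed in the Results section.
sample = {"H2": 75, "CH4": 87, "C2H6": 58, "C2H4": 40, "C2H2": 10}
ratios = iec_ratios(sample)

# Same sample with no acetylene: one ratio becomes undefined.
no_acetylene = dict(sample, C2H2=0)
```

Here `ratios_usable(ratios)` holds for the first sample, while the sample without acetylene yields an undefined C2H2/C2H4 ratio, so the method could not assign a code to it.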
The key gas method employs a combination of individual gas concentrations and the total combustible gas concentration for fault identification within the transformer [6,17]. While applying this method is straightforward, it is considered very conservative: a transformer may operate safely even though this interpretation technique indicates imminent risk, provided the gas evolution rate is not constantly increasing. For this reason, the key gas method is not widely accepted as an effective tool for evaluating power transformer DGA results [3].
The Duval triangle is a graphical technique that plots the relative concentrations of three gases in a triangle comprising different zones for thermal, partial discharge and arcing faults [18]. The main drawback of this technique is that the triangle does not include an area for normal DGA results. Hence, similar to the ratio-based techniques, this method can only be used to identify the fault type of faulty transformers. Table 1 summarizes the methodology, pros and cons of the five DGA interpretation techniques discussed above.
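As a small illustration of the coordinate step, the sketch below converts three gas concentrations into the relative percentages that locate a sample inside the triangle. It assumes the classic Duval triangle, which uses CH4, C2H4 and C2H2. The function name is hypothetical, and the zone boundaries are deliberately left out because the text does not give them.

```python
from typing import Tuple


def duval_coordinates(ch4: float, c2h4: float, c2h2: float) -> Tuple[float, float, float]:
    """Return the relative percentages (%CH4, %C2H4, %C2H2) of the three gases."""
    total = ch4 + c2h4 + c2h2
    if total <= 0:
        # With no gas present the point is undefined: the triangle has no
        # zone for a gas-free (normal) sample.
        raise ValueError("at least one of the three gases must be present")
    return (100.0 * ch4 / total, 100.0 * c2h4 / total, 100.0 * c2h2 / total)


# Concentrations (ppm) of the oil sample discussed in the Results section.
pct_ch4, pct_c2h4, pct_c2h2 = duval_coordinates(ch4=87, c2h4=40, c2h2=10)
```

Because the three percentages always sum to 100, every sample with any of these gases falls somewhere in a fault zone, which is why the method cannot report a normal condition on its own.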
Owing to the limitations of conventional DGA interpretation techniques and the availability of DGA data, researchers have been motivated to develop computer-based approaches for DGA interpretation using artificial intelligence (AI) techniques, with the main goal of overcoming the drawbacks of the ratio-based methods, particularly when multiple faults exist within the transformer [18][19][20][21][22][23][24][25]. As per [19], conventional DGA interpretation techniques are not consistent and may result in different interpretations for the same oil sample. To automate, standardize and enhance the accuracy and consistency of the conventional DGA interpretation techniques, a fuzzy logic model that incorporates the key features of the Doernenburg, Rogers and IEC ratio methods along with the key gas and Duval triangle methods was developed and presented in [19]. However, that model did not take into account the rate of gas evolution, nor did it automatically adapt its rules based on data history to enhance its accuracy.
Another attempt to enhance DGA interpretation accuracy is presented in [26], in which 386 oil samples were used to validate the proposed interpretation approach. However, that technique relies on the ratio of five gases with respect to total combustible gases without taking into account the history of previous DGA data. Also, its accuracy has not been widely confirmed because of the limited data it was verified against. A particle swarm optimization technique is presented in [27] to classify various faults within power transformers based on DGA results. While the technique showed good accuracy, its practical application may not be easy, in particular for online DGA sensors. In [28], the IEC ratio method is used to train a deep belief network in order to enhance the accuracy of DGA interpretation. However, as it is based on a ratio method, the accuracy of that model in detecting incipient faults cannot be guaranteed. A transformer insulation health index is quantified in [29] based on DGA data and other oil quality factors, without investigating various DGA interpretation techniques. An adaptive neuro-fuzzy inference system for DGA interpretation that facilitates adaptive learning is presented in [30]. However, that model is based on ratio methods only, which makes it suitable for identifying faults with a significant amount of the gases used in such ratios, with limited accuracy in identifying incipient faults with minimal fault gas concentrations. In this paper, a new expert system that takes the gas evolution rate into account and has the ability to learn and tune itself is introduced. This expert system is based on Gene Expression Programming, as briefly explained below.

Gene Expression Programming
Gene Expression Programming (GEP) is a learning technique that can find relationships between variables and build mathematical models that reveal these relationships [31][32][33][34][35]. Like the genetic algorithm (GA) and genetic programming (GP), GEP selects populations of individuals based on their fitness and introduces genetic variation using one or more operators. However, the nature of the individuals differs among the three algorithms. The individuals in GA are linear strings of fixed length (chromosomes), whereas in GP they are nonlinear entities of different sizes and shapes (expression trees). In GEP, the individuals are encoded as linear strings of fixed length that are then expressed as nonlinear entities of different sizes and shapes. Hence, GEP combines the key features of GA and GP. The main feature of GEP is its ability to formulate a mathematical expression between the dependent and independent variables that performs well for all fitness cases. The GEP process, shown in the flowchart of Figure 1, starts by creating a random generation of chromosomes, after which the fitness of each individual is evaluated. The chromosomes are then modified through genetic operations (mutation, gene recombination and transposition) to create a new generation. The process continues until a termination criterion is met, either a maximum number of generations or a minimum error. A gene in GEP comprises a head and a tail. While the length of the head is selected based on the investigated problem, the length of the tail depends on the length of the head and on the maximum number of arguments taken by the functions in the function set.
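How a fixed-length gene is expressed as a tree can be sketched in a few lines of Python. In the standard GEP formulation, the tail length is t = h(n - 1) + 1, where h is the head length and n is the largest number of arguments taken by any function in the function set. The gene is read breadth-first, filling the expression tree level by level. The function set below, the symbol `Q` for square root, the protected operators and all names are illustrative assumptions, not the configuration used in this paper.

```python
import math
from typing import Dict, List, Sequence

# Illustrative function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),  # protected division
    "Q": (1, lambda a: math.sqrt(abs(a))),            # protected square root
}


def tail_length(head_length: int, max_arity: int) -> int:
    """Tail length that guarantees every function can be given its arguments."""
    return head_length * (max_arity - 1) + 1


def evaluate_gene(gene: Sequence[str], terminals: Dict[str, float]) -> float:
    """Decode a gene breadth-first into an expression tree and evaluate it."""
    arities = [FUNCTIONS[s][0] if s in FUNCTIONS else 0 for s in gene]

    # Pass 1: assign children level by level. Only the symbols reached this
    # way are expressed; the rest of the gene is non-coding.
    children: List[List[int]] = []
    next_free = 1
    position = 0
    while position < next_free:
        children.append(list(range(next_free, next_free + arities[position])))
        next_free += arities[position]
        position += 1

    # Pass 2: children always sit at higher positions than their parent,
    # so evaluating from the last expressed symbol back to the root works.
    values = [0.0] * len(children)
    for i in range(len(children) - 1, -1, -1):
        symbol = gene[i]
        if symbol in FUNCTIONS:
            values[i] = FUNCTIONS[symbol][1](*(values[c] for c in children[i]))
        else:
            values[i] = terminals[symbol]
    return values[0]


# Head "Q*+-" (length 4), tail "abcda" (length 5); the final "a" is unused.
gene = list("Q*+-abcda")
result = evaluate_gene(gene, {"a": 1.0, "b": 2.0, "c": 7.0, "d": 4.0})
```

The example gene encodes sqrt((a + b) * (c - d)). Because the gene has a fixed length while the expressed tree may use only part of it, genetic operators can modify any position and the result remains a valid expression.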

Proposed GEP Model
To establish a general DGA interpretation model, all possible transformer faults along with the conventional interpretation techniques are integrated as shown in Table 2. In this table, F2 represents thermal faults of various temperature ranges as per the codes of the five conventional interpretation techniques, F3 represents partial discharge and F4 represents arcing faults. The model is designed to provide a health index (h) between 2 and 8 that represents the three faults listed in Table 2. A value between 0 and 2 is reserved for normal transformer operation and is designated F1, while F5 is assigned to an invalid code in the case of the ratio methods (IEC, Rogers and Doernenburg). As stated above, there is no 100% consistency among the existing DGA interpretation techniques. The approach proposed in [19] did not take into account the rate of gas evolution or self-tuning of the developed rules based on the model's output, and for some DGA samples the degree of criticality was not correctly reflected. To overcome these limitations, the approach is amended to the one shown in Figure 2. In the revised approach, the process starts by applying the key gas method to the measured DGA data. As the key gas method is the most conservative of the existing interpretation techniques, the model reports a normal analysis without further investigation for oil samples whose gas concentrations are below condition 1 as specified by the standard for this method [3]. If, on the other hand, the key gas method detects an abnormal analysis, the model performs further investigation of the same DGA oil sample using the Duval triangle and the three ratio-based methods. Where previous DGA data for the same transformer are available, the rate of gas evolution is calculated by the model and is used in the Duval triangle analysis instead of the absolute DGA measurements.
Availability of the rate of gas evolution helps the model reach a proper asset management decision. If any ratio-based method results in an out-of-code value, the model eliminates its contribution to the final decision. The overall model output is determined using the same weights provided in [19]. However, these weights can be continuously adapted based on the results of the model, which are to be stored and assessed against further oil analysis along with engineering judgment. The model also provides a more reliable health index, from which the fault type and a proper asset management decision can be identified as proposed in Table 3. It is to be noted that while [36] presented an application of GEP to ease the DGA interpretation process, that model did not combine all conventional techniques in one equation based on their accuracy levels. Also, it does not provide a transformer health index. Furthermore, in that paper the normal state is identified based on the IEC ratio method. It is well known that ratio methods (including IEC) are not suitable for identifying the normal condition, as these methods are only valid when a significant amount of the gases used in the ratios exists in the oil sample. This makes them suitable for classifying the type of an existing fault rather than for detecting whether a fault is present.
[Table 3 fragment recovered from extraction: Excessive oil decomposition. Exercise extreme caution. Check gas generation rate weekly or daily. Reduce loading (below 50%). Further oil analysis must be conducted. Oil must be degassed/filtered. Plan for outage.]
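The decision flow described above (key gas screening first, then a weighted combination that ignores out-of-code ratio methods) can be sketched as follows. This is only an illustration under stated assumptions: the weights are hypothetical placeholders, since the values from [19] are not reproduced in the text, and combining the per-method indices by a renormalized weighted average is an assumed rule, not the paper's trained GEP expression.

```python
from typing import Dict, Optional

# Hypothetical weights for illustration only; the paper takes its weights
# from [19], and those values are not given in the text.
WEIGHTS = {
    "key_gas": 0.20,
    "duval": 0.30,
    "iec": 0.20,
    "rogers": 0.15,
    "doernenburg": 0.15,
}

# Index reported for samples that pass the key gas screening (the paper
# reports h = 1 for its normal samples).
NORMAL_INDEX = 1.0


def combined_index(key_gas_normal: bool,
                   method_indices: Dict[str, Optional[float]],
                   weights: Dict[str, float] = WEIGHTS) -> Optional[float]:
    """Combine per-method health indices; None marks an out-of-code result."""
    if key_gas_normal:
        # Key gas is the most conservative method: stop here.
        return NORMAL_INDEX
    valid = {m: h for m, h in method_indices.items() if h is not None}
    if not valid:
        return None  # no method produced a usable result
    # Renormalize so that dropping a method does not bias the index downward.
    total_weight = sum(weights[m] for m in valid)
    return sum(weights[m] * h for m, h in valid.items()) / total_weight


# Abnormal sample where the IEC method returned an out-of-code result.
indices = {"key_gas": 3.0, "duval": 3.5, "iec": None,
           "rogers": 3.0, "doernenburg": 2.5}
h = combined_index(False, indices)
```

Renormalizing by the sum of the remaining weights keeps the index on the same 0 to 8 scale when a ratio method is excluded.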
The input variables of the proposed model are the concentrations of the seven key gases in parts per million (ppm), and the output is divided into four sets covering all health conditions that operating transformers may exhibit, as summarized in Tables 2 and 3. To investigate whether a thermal fault involves cellulose, the ratio CO2/CO is used. CO and CO2 can be found under normal conditions because of atmospheric leaks or long-term oil oxidation, which makes this ratio an unreliable indicator of paper degradation. It can nevertheless be used as a flag for further paper investigation and testing, such as furan analysis and degree of polymerization. GEP is used to build models for the most popular DGA interpretation techniques, namely the key gas method, the Rogers, Doernenburg and IEC ratio methods and the Duval triangle method. The individual outputs of these methods are weighted in accordance with [19] to calculate one health index between 0 and 8, as stated in Table 2.
All models are developed based on each method's guidelines and standards. A total of 660 DGA samples were collected from various operating transformers with different operating conditions, ages and health conditions. The health condition of each transformer was confirmed through engineering judgement along with complementary tests such as partial discharge, furan content, dielectric dissipation factor, moisture content and frequency response analysis [37]. Of the collected data, 70% were used to train the proposed GEP model, which was developed as per the flow chart in Figure 2. The remaining 30% were used to validate the model. Figure 3 shows the health index returned by the model along with the actual health index, calculated from confirmed faults, for the 198 DGA samples used during the validation stage. The root square error between the health index calculated by the model and the actual health index is 0.38, with a correlation coefficient of 0.8, which indicates satisfactory performance of the developed model. The main features of the GEP model are its ability to retune in order to increase its accuracy and its capability to estimate a mathematical correlation between the dependent and independent variables, which is an advantage for practical applications. Figure 4 shows the genes along with the expression tree of the proposed GEP model. In the generated genes and expression tree, parameters d0 to d6 represent the seven key gases input to the model (CH4, C2H4, C2H2, C2H6, H2, CO2, CO), while c0 and c1 are constants generated by the model. It is to be noted that the functions used in the expression tree are user-defined and can be changed based on the nature of the investigated problem.
In the model shown in Figure 4, functions such as arctangent (Atan), cube root (3RT) and square root (sqrt), along with the mathematical operations addition (+), subtraction (−), division (/) and multiplication (*), are selected to define the mathematical correlation between the dependent and independent variables. The expression tree shown in Figure 4 is read like text, from top to bottom and left to right, to generate the corresponding mathematical relationship.
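The validation figures quoted above can be reproduced in form, though not in value, with a short sketch. It shows the 70/30 split arithmetic for 660 samples and how an error measure and a correlation coefficient between predicted and actual health indices are typically computed. It assumes that the reported "root square error" is the usual root-mean-square error and that the correlation coefficient is Pearson's r. The function names and the three-point data set are hypothetical.

```python
import math
from typing import Sequence, Tuple


def split_counts(n_samples: int, train_fraction: float = 0.7) -> Tuple[int, int]:
    """Number of training and validation samples for a given split."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between two sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


n_train, n_valid = split_counts(660)

# Hypothetical health indices, used only to exercise the functions.
actual_h = [1.0, 2.0, 3.0]
model_h = [1.0, 2.0, 4.0]
error = rmse(actual_h, model_h)
r = pearson_r(actual_h, model_h)
```

With 660 samples the split gives 462 training and 198 validation samples, which matches the 198 samples shown in Figure 3.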

Results and Discussions
The model is tested with DGA results of H2 (75 ppm), CH4 (87 ppm), C2H6 (58 ppm), C2H4 (40 ppm), C2H2 (10 ppm), CO (260 ppm) and CO2 (950 ppm), as measured in a transformer oil sample. As the amounts of all key gases and the total dissolved combustible gas are within condition 1 of the IEEE standard [13], the overall model output is based only on the key gas method, as per the flow chart in Figure 2. Thus, the model returns a health index h = 1, i.e., normal condition.

Comparison with Other AI-based Models
As mentioned in the introduction, a fuzzy-logic-based DGA interpretation model is presented in [19]. While this model integrates all conventional DGA interpretation techniques based on pre-specified weighting factors, it does not take into account the evolution rate of the individual gases, dynamic changes in the weighting factors or adaptive learning. To assess the robustness of the new GEP-based model proposed in this paper, the DGA data in [19] are re-assessed using the developed GEP model and compared with the fuzzy logic results of [19], as listed in Table 4. In Table 4, the first three samples show a normal condition because the concentrations of all key gases in these samples are below the fault limit of the key gas method. Hence, both models result in a "no fault" condition and the GEP model provides a health index of 1. Samples 4, 5 and 6 contain high concentrations of C2H4 and CH4, which reflects a thermal fault. Both models result in "thermal fault" and the GEP model indicates a health index between 2 and 4 for the three samples. The high concentration of C2H2 in samples 7, 8 and 9 indicates an arcing fault, and the outputs of the two models agree with this analysis. Depending on the severity of the fault, the GEP model provides a health index between 6 and 8 for these three samples. According to Table 3, these units should be taken out of service to investigate the source of arcing. The high concentration of H2 in DGA samples 10 and 12 in Table 4 indicates low energy discharge (corona), which both models report. While the output of the fuzzy-logic-based model in [19] reflects a corona fault for DGA sample 11, the GEP model results in an arcing fault. This is attributed to the considerable amount of C2H2 in this oil sample.
To compare the proposed model with other artificial intelligence (AI)-based models published in the literature, the DGA results in [38] have been re-assessed using the proposed GEP model. In [38], 10 DGA samples are assessed using four AI-based models, namely artificial neural network (ANN), support vector machine (SVM), extreme learning machine (ELM) and self-adaptive evolutionary extreme learning machine (SaE-ELM). The authors of [38] divided possible faults into partial discharge (PD), discharges of low energy (D1), discharges of high energy (D2), thermal fault below 300 °C (T1), thermal fault between 300 °C and 700 °C (T2) and thermal fault above 700 °C (T3). Table 5 shows the results for the 10 investigated DGA samples as published in [38] along with the result of the GEP model proposed in this paper. While the other models show faulty cases for DGA samples 1, 3, 6 and 8, the proposed GEP model results in "no fault". This is because the concentrations of the key gases in these samples are below the maximum normal limits given in the key gas analysis standard. While the other models reflect a thermal fault for samples 2, 4, 7 and 10, the GEP model suggests a corona fault because of the high concentration of H2 in these samples. Owing to the relatively high concentration of C2H4 in sample 5, all models including GEP suggest a thermal fault. Sample 9 contains high concentrations of all gases except C2H2, which indicates a thermal fault along with high energy discharge. All models in [38] result in a thermal fault, whereas the proposed GEP model indicates high energy discharge.

Evaluation of the Model on Units with DGA History
The trend of gas evolution is investigated for some transformers with a DGA history. Some of these results are presented below.
DGA samples were collected yearly for a 28 MVA, 11/34.7 kV power transformer. Results for the period 2013 to 2016 are shown in Table 6 (all gas concentrations are in ppm). The incremental increase in H2 and CO2 indicates a thermal fault involving cellulose. The GEP model returns a health index of 4.5 for this transformer. More frequent DGA sampling is recommended and further oil assessment should be conducted. Because of the high concentration of H2 in the first DGA sample of another power transformer, shown in Table 7, it was recommended to observe the gas evolution using weekly samples and to degas the oil. After 4 weeks, all key gas concentrations indicated normal operation, as can be seen from sample 5 in the table. The GEP model returned a health index of 0.5 for this transformer. While the CO2/CO ratio in the DGA samples collected for a third power transformer (Table 8) indicates normal cellulose aging, the increasing trend in H2 indicates low energy electrical discharge, and the GEP model provided a health index of 4 for this transformer.
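Where a DGA history exists, the gas evolution rate between consecutive samples can be computed as in the minimal sketch below. The function name and the example concentrations are hypothetical (the values of Tables 6 to 8 are not reproduced in the text), and expressing the rate in ppm per day is an assumed convention.

```python
from datetime import date
from typing import List, Sequence, Tuple


def generation_rates(samples: Sequence[Tuple[date, float]]) -> List[float]:
    """Gas generation rate (ppm/day) between consecutive dated samples.

    `samples` must be sorted by date, oldest first.
    """
    rates = []
    for (d0, c0), (d1, c1) in zip(samples, samples[1:]):
        days = (d1 - d0).days
        if days <= 0:
            raise ValueError("samples must be in strictly increasing date order")
        rates.append((c1 - c0) / days)
    return rates


# Hypothetical yearly H2 readings (ppm) for one transformer.
h2_history = [
    (date(2013, 1, 1), 50.0),
    (date(2014, 1, 1), 86.5),
    (date(2015, 1, 1), 159.5),
]
h2_rates = generation_rates(h2_history)
```

A rate that keeps rising from one interval to the next, as in this made-up history, is the kind of trend that would prompt more frequent sampling, whereas a falling or negative rate after degassing points towards normal operation.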

Advantage of the Proposed Model Over Conventional Interpretation Techniques
To confirm that DGA is not an exact science and that conventional interpretation techniques may lead to different outcomes for the same oil sample, some DGA data are listed in Table 9 with the fault type given by the five conventional interpretation methods along with the fault type suggested by the GEP model. The first 3 samples in Table 9 indicate normal operation by the key gas method. The ratio-based methods indicate either out-of-code or a false interpretation. According to the flow chart in Figure 2, the output of the GEP model is based only on the key gas method and hence gives normal operation for these three samples. It is to be noted that the CO2/CO ratio depends on the individual concentrations of the two gases and is significant only when the concentrations of CO2 and CO become considerable. For example, sample 3 in Table 9 has a ratio of 0.2. However, the individual concentrations of both gases are not significant and hence normal operation can be confirmed. On the other hand, while the CO2/CO ratio in samples 1 and 2 is above 10, the high concentration of both gases should be investigated and furan analysis is recommended.
The high concentration of H2 in sample 4 indicates partial discharge activity. The high concentration of CO and CO2 in this sample may be an indication of paper degradation, even though their ratio is far above 10. The developed GEP model integrates the results of the conventional techniques based on their accuracy levels to provide an overall result (F3). As indicated in the flow chart of Figure 2, when a ratio-based method gives an out-of-code result (e.g., IEC in this case), its contribution to the overall result is eliminated.
The high concentration of C2H2 in sample 5 reflects an arcing fault, which was reported by three conventional techniques along with the developed GEP model. While the above results show good performance of the developed GEP model, there is uncertainty in the health index returned by the model. Also, the accuracy of the model in detecting multiple faults is limited. These limitations can be reduced by training the model on a wide range of DGA data covering all types of faults. Also, as the model was built using the weights published in [19], its accuracy may be improved by adapting these weights using more DGA results.

Conclusions
This paper presents a gene expression programming-based model for interpreting dissolved gas analysis data. The developed model takes into account the DGA history of the transformer and is easy to adapt based on the obtained results. While the model shows satisfactory performance, its accuracy can be improved further by considering more DGA samples. The implemented model reflects the general health condition of the investigated DGA sample using five interpretation techniques that are combined using specific weighting factors, calculated from a consistency analysis performed on DGA samples collected from operating power transformers. These weighting factors are not fixed and can be adapted as more DGA results become available. The developed model is easy to implement within existing online DGA sensors to automate and standardize the interpretation of the measured characteristic gases.
Funding: This research received no external funding.

Conflicts of Interest:
The author declares no conflict of interest.