An Unsupervised Mutual Information Feature Selection Method Based on SVM for Main Transformer Condition Diagnosis in Nuclear Power Plants

: Dissolved gas in oil (DGA) is a common means of monitoring the condition of an oil-immersed transformer. The concentration of dissolved gas and the ratio of different gases are important indexes to judge the condition of power transformers. Monitoring devices for dissolved gas in oil are widely installed in main transformers, but there are few recorded fault data of main transformers. The special operation and maintenance modes of main transformers leads to the fault modes particularity of main transformers. In order to solve the problem of insufﬁcient samples and the feature uncertainty, this paper puts forward an unsupervised mutual information method to select the feature veriﬁed by the optimized support vector machine (SVM) model of particle swarm optimization (PSO) method and tries to ﬁnd the feature sequence with better performance. The methos is validated by data from nuclear power transformers.


Introduction
Power transformers that work under harsh environments would experience thermal decomposition of oil and cellulose insulation materials, such as arcing, corona discharge, low energy sparks, severe overloading, overheating of insulation systems and pump motor failures. These conditions alone or in combination can produce combustible and noncombustible gases [1] Detection of anomalies requires an assessment of the amount of gas produced. Gas in oil-immersed transformers can be used to identify fault types, including thermal and electrical interference. Gases obtained from chromatographic analysis of insulating oils may contain dissolved carbon monoxide (CO), carbon dioxide (CO 2 ), nitrogen (N 2 ), hydrogen (H 2 ), methane (CH 4 ), acetylene (C 2 H 2 ), ethylene (C 2 H 4 ), and ethane (C 2 H 6 ). The composition, formation rate and specific content ratio of dissolved gas can be used to indicate transformer condition.
The composition and content of dissolved gases in oil of transformer insulation can reflect the operation condition of transformer to a great extent thus dissolved gas analysis (DGA) has become an effective method for fault diagnosis of oil-immersed transformers [2].
Organizations such as the Institute of Electrical and Electronics Engineers (IEEE) and the International Electrotechnical Commission (IEC) recommend a variety of diagnostic techniques [3], depending on the type of transformer and operating conditions. Some of the most commonly used techniques include Doernenburg ratio, Rogers ratio, Duval triangle model, etc. These classical diagnostic methods mostly take the ratio of different gases as the characteristic input and then judge the actual operating condition of the transformer by the threshold value formed by experience or statistical methods. Fuzzy network, support vector machine, artificial neural network, and other commonly used artificial intelligence methods are also generally introduced into the field of power transformer fault diagnosis [4][5][6][7]. However, in the studies of different scholars, the features used as the basis of intelligent diagnosis are often different.
In previous studies, in addition to problems in diagnostic methods, there are the following phenomena in monitoring data: less test data, less available data sets and unbalanced data type distribution, which bring great problems to algorithm verification [8].
Main transformers are important equipment for power generation of nuclear facilities. They are in a high-load long-term condition and are more prone to failure caused by aging [9]. Meanwhile, due to the particularity of nuclear power refueling overhaul and the conservative culture of nuclear power [10], the maintenance strategy of nuclear equipment is more rigorous and conservative, and the failure modes of main transformers may be slightly different. The data of the main transformers are classified separately in the IEC database [2], which shows transformer performance difference in nuclear industry.
Due to the particularity of nuclear power transformers, there are less marked data and more constraints on the monitoring data that can be used for research. The features are important inputs of the diagnostic algorithm. High-dimensional features bring high computational cost and the risk of "over-fitting". Dimensionality reduction or selection is an important research direction.
In this paper, an unsupervised mutual information feature algorithm is proposed for feature selection of different features proposed in the current classical algorithm and intelligent algorithm, as a pattern recognition method, SVM can construct the optimal classification hyperplane under the condition of small sample learning and distinguish transformer conditions according to the input features. The main transformer condition diagnosis model based on support vector machine is adopted for diagnosis in this paper, and the case data of main transformer is verified.

Framework of the Feature Selection Method Based on the SVM Model for Main Transformers
The research framework for the feature selection method based on the architecture of the SVM model for the main transformers is shown in Figure 1. The gas concentration values measured in the continuous operation process of nuclear power transformer are obtained, and there is almost no-fault data.
The features used in various power transformer condition diagnosis methods based on dissolved gas in oil are extensively studied. On this basis, the initial feature set is formed.
The unsupervised mutual information feature extraction algorithm is adopted to extract features, and the set of sequence features is obtained according to the weight coefficient from high to low.
In the feature set, different number of feature sets are selected sequentially and verified by optimized SVM model for transformer fault diagnosis.
In order to reduce the contingency of the experiment, the 5-fold verification method is used to process the training samples and test samples to verify the validity of the selection feature in the diagnosis of the nuclear power transformer condition diagnosis.
Based on the accuracy of diagnosis, the feasibility of different feature extraction algorithms in the condition diagnosis of main transformers is analyzed.

Condition Diagnosis Model for Main Transformer
Condition diagnosis model is important to verify the feature selection algorithm and to determine the accuracy and rapidity of transformer condition diagnosis. SVM is a machine learning method based on statistical learning theory, compared with other algorithms, it can well solve practical problems such as small sample, nonlinear [11], the PSO algorithm can converge fast in the parameter optimization. The SVM optimized with PSO in the field of power transformer fault diagnosis has further application [12].

Support Vector Machine
State diagnosis of main transformers, As the case of a typical nonlinear classification problem, the overall plan of SVM is the first use of a nonlinear transform the input space data is mapped to a high-dimensional feature vector space, and then in the feature space of the optimal separating hyperplane is constructed, linear classification, after the last map back to the original space Became a nonlinear classification of input space [13].
SVM settings At present, the commonly used kernel functions are mainly polynomial kernel function, radial basis (RBF) kernel function, hyperbolic tangent (sigmoid) kernel function, and so on. This paper mainly uses the RBF kernel function to the apply to SVM model.

PSO for Optimal Parameters
PSO is a kind of evolutionary computation, the basic idea of which is to find the optimal solution through the cooperation and information sharing between individuals in the group. It mimics a bird in a flock by designing a massless particle with just two properties: speed, which represents how fast it is moving, and position, which represents the direction it is moving. Each particle separately searches for the optimal solution in the search space, and records it as the current individual extreme value, and shares the individual extreme value with other particles in the whole particle swarm and finds the optimal individual extreme value as the current global optimal solution of the whole particle swarm. All particles in a swarm adjust their speed and position based on the current individual extremum they find, and the current global optimal solution shared by the whole swarm [14].
PSO-SVM Parameter Settings C 1 : the initial value is 1.5, local search capability of PSO parameters Popgmin: the initial value is 0.01, the minimum change value of the SVM parameter C.

Feature Selection Algorithms for Main Transformer Condition
This chapter introduces the unsupervised mutual information filtering feature sorting method used in feature selection. In feature selection, the relevance of each feature is calculated first, the importance of the feature is evaluated by the forward sequential search, and finally an ordered feature sequence is output.

Stepwise Feature Selection Process
The process of stepwise feature selection is to select a feature from the unselected feature set each time and add the feature set S. In accordance with the selection order, the feature set outputs an ordered feature sequence.
When initializing, the feature set is empty. The unselected feature set is the complete set of all known features.
After each step selection, the feature set increases the feature set selected in this step, while the feature set not selected reduces the feature set selected in this step. Until the unselected feature set is empty.

Selection Principle
The principle of ''minimum redundancy-maximum correlation" which is similar to the famous supervised feature selection method is adopted [15], and the selection of the mth feature is based on: where U m represents the set of unselected features in the current step f i represents a feature in the unselected feature set in the current step; Rel( f i ) represents Relevance of feature f i , which is the average mutual information between feature f i and any other one in the whole feature set is defined as Rel( f i ). Rel( f i ) can be calculated with Formula (2).
S m−1 is the selected feature set in the current step; Red( f i , f t )is the redundancy of feature f i relative to selected feature f t . Red( f i , f ) t can be calculated with Formula (3). (4).

Relationship with Supervised Algorithms
When the data type is supervised, the labels of the class can represent the information of the whole feature set.
Then relevance of feature f i can be defined as where c in Formula (5) is the class label [16].
Redundancy between feature f i and the selected feature f t is defined as According to the principle of mathematics [17], relevance in an unsupervised algorithm is the lower bound of relevance in a supervised algorithm, and redundancy in an unsupervised algorithm is proportional to the redundancy in a supervised algorithm. When the initial feature set is approximately equal to the labels of the class, the sequence features obtained by the unsupervised algorithm are highly correlated with the sequence features obtained by the supervised algorithm.

Experiment Description
The internal fault mode of power transformer is mainly mechanical fault, thermal fault and electrical fault, the latter two types of faults is the major issues, and mechanical fault is often shown in the form of thermal fault or electrical fault [18]. General power transformers are often subdivided into fault modes according to the degree of heating or arcing.
Due to safety culture of nuclear power plants, maintenance strategy for the main transformers tends to be conservative and strict; The failure of nuclear power transformers is rare to happen, and the failure data that can be accessed to publicly is very few. Available fault data cannot cover all the modes. Therefore, the condition of the main transformers are divided into the following three types in this paper, and only two summative failure modes are reserved and the corresponding as illustrated in Table 1. Low-energy discharge 3 High-energy discharge 4 Thermal fault < 300 • C Thermal fault 2 5 Thermal fault 300 to 700 • C 6 Thermal fault > 700 • C 7 Normal 3 The fault data used in the experiment in this paper are reactor-related transformer data obtained from IEC TC database, and the feature selection data and normal data are the monitoring values of a nuclear power main transformer under normal operation. The data can be obtained in the Supplementary Material. A nuclear power plant generator, 24 kV voltage, is stepped up to 500 kV and connected to the 500 KV power grid through the main transformer. The main transformer is a three single-phase transformer, each phase capacity 410 MVA. The neutral points on the high voltage side are connected and directly grounded. Oil is regularly sampled and analyzed once every 3 months manually. Sampling intervals can sometimes be uneven, depending on special focus judged by sampling staff or adjustment by work schedules.

Sample Data
Gas concentration of DGA method is analyzed in one part per million (PPM). To facilitate the presentation and analysis of the characteristics in the figure, logarithmic processing is performed for each monitoring value in Figures 2 and 3. The box diagram of the monitored data value in the sample data is shown in Figure 2. Sample type distribution and monitoring value distribution of each basic feature are presented in Figure 3.

Basic Features by DGA Condition Monitoring
Typical gases measured in the DGA method of the main transformers include H 2 , CH 4 , C 2 H 2 , C 2 H 4 , C 2 H 6 , CO, CO 2 as illustrated in Figures 2 and 3.
Other features commonly used in transformer condition diagnosis methods are shown in Table 2. Other approach [20] 12 %H 2 fourth % ratio [21] The symbols in the Duval triangle method shown in Table 1 are denoted as Table 1 represent the contents of five characteristic gases, respectively, and Total Combustion Gases (TCG) as in: And %H 2 = 100 * H 2 /(H 2 + C 2 H 6 + CO + CO 2 ). As shown in Figures 2 and 3, the class of the sample is not balanced and several basic features are similar in distribution. Unsupervised feature extraction is adopted to obtain the feature set with maximum correlation and minimum redundancy. Table 3 shows the comparison of feature sequences extracted by the unsupervised mutual information method and supervised mutual information method and their corresponding weights. As obtained from Table 3. Table 3 Extracted feature sequence by unsupervised MI V.S. supervised MI the mutual information of unsupervised feature extraction method selects the same first feature from cases of nuclear power transformer DGA with that of supervised method, while the weight value, generated by supervised feature extraction algorithm is greater than the corresponding values in the unsupervised algorithm, which means supervised method has more dynamic to choose the first feature; In other steps, selected feature is not the same, but the unsupervised algorithm has a strong dynamic at each step.

Diagnostic Precision with Supervised Mutual Information Feature Selection Method versus Unsupervised Approach
According to the framework described in Section 2, a PSO-optimized SVM model is applied to classify cases to reflect the fitness of the selected features.
The feature sequences selected by the supervised mutual information feature selection algorithm and the unsupervised mutual information feature selection algorithm are applied, respectively. Different numbers of features are selected from feature sequences obtained by both methods, and the diagnostic accuracy of the both method is shown in Figure 4, in which the red point is where the best fitness is obtained.  As can be seen from Figure 4, features selected from both unsupervised and supervised feature selection methods have good performance as input to diagnostic accuracy in cases, and both of the models achieve greatly increased diagnostic accuracy in the second feature.

Diagnostic Precision by Features of the Unsupervised Approach with Best Fitness vs. Other Classical Feature Set
Feature set with the highest diagnostic accuracy obtained by unsupervised mutual information feature selection method is used as input to optimized SVM diagnosis model for main transformer cases, other typical feature sets are used in contrast as shown in Table 4. Note: 1 refers to the features of the unsupervised approach with the best fitness obtained as shown in Figure 4 as input to the optimized svm model; 2 refers to the features used in the three ratios method [1]; 3 refers to the features used in some intelligent methods [16]; 4 refers to the features used in the Duval Triangle method [2].
As shown in Table 4, the feature set obtained by the unsupervised mutual information feature selection algorithm, is used as the input of the optimized SVM diagnosis model and performs better than other inputs of the feature set in the case of diagnosis of the main transformer condition diagnosis. Therefore, the algorithm has high applicability.

Conclusions and Analysis
Fault data of main transformer lacks. In addition, the fault mode of the main transformer is different from that of other power transformers. This paper proposes an unsupervised mutual information feature selection method to calculate DGA monitoring data of main transformer and output feature selection sequence. Compared with the supervised mutual information feature selection algorithm, the unsupervised mutual information feature selection algorithm is highly correlated with the sequence features output by the supervised feature selection algorithm in feature selection. In the samples, the training samples and test samples were designed by five-fold method based on the appropriate feature set obtained by the unsupervised mutual information feature selection algorithm. The PSO optimized support vector machine model was used to verify the main transformer fault diagnostic, and the diagnosis accuracy was high. This method is suitable for feature extraction in main transformer fault diagnosis. However, the feature extraction method based on unsupervised mutual information is essentially an embedded feature extraction method with some significant advantages and disadvantages at the same time. The redundancy between features in the selected feature set is minimized, and its limitations depend on the evaluation of candidate solutions by the classification algorithm, which is computationally more expensive. Therefore, the offline data set can be used for training and verification in practical application, and the obtained feature set can be used to judge the condition of nuclear power transformers online.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/su14052700/s1, the experimental data of DGA samples used in Section 5.1 of this paper.