Frequency Feature Learning from Vibration Information of GIS for Mechanical Fault Detection

The reliability of gas insulated switchgear (GIS) is very important for the safe operation of power systems. However, the research on potential faults of GIS is mainly focused on partial discharge, and the research on the intelligent detection technology of the mechanical state of GIS is very scarce. Based on the abnormal vibration signals generated by a GIS fault, a fault diagnosis method consisting of a frequency feature extraction method based on coherent function (CF) and a multi-layer classifier was developed in this paper. First, the Fourier transform was used to analyze the differences and consistency in the frequency spectrum of signals. Secondly, the frequency domain commonalities of the vibration signals were extracted by using CF, and the vibration characteristics were screened twice by using the correlation threshold and frequency threshold to further select the vibration features for diagnosis. Then, a multi-layer classifier composed of two one-class support vector machines (OCSVMs) and one support vector machine (SVM) was designed to classify the faults of GIS. Finally, the feasibility of the feature extraction method was verified by experiments, and compared with other classification methods, the stability and reliability of the proposed classifier were verified, which indicates that the fault diagnosis method promotes the development of an intelligent detection technology of the mechanical state in GIS.


Introduction
As a piece of control and protection equipment in power system [1,2], gas insulated switchgear (GIS) plays a significant role in high-voltage power grids. Discovering potential defects and hidden danger in the process of operation of GIS equipment in time can ensure the reliability and security of power grid operations.
The existing research on the reliability of GIS is mainly focused on the insulation fault diagnosis through signal analysis, and many experts have conducted extensive research on this topic. The main detection methods of partial discharge defects include the electrical method [3][4][5], acoustic method [6,7] and chemical method [8,9]. Aiming at the condition monitoring and diagnosis of gas insulated structures, a real-time measurement system combining signal acquisition, mode generation, feature extraction and defect recognition was proposed [10]. The ultra high frequency (UHF) method was used to analyze the characteristics of partial discharge, and short-time Fourier transform (STFT) [11] was used to describe the time-frequency characteristics [12,13]. Combined weight function classification tools and K-means clustering, and pulse parameters in both time and frequency domains were used to effectively identify noise signals and discharge pulses [14].
Compared with insulation faults, the development of intelligent diagnosis technology for mechanical faults in GIS is very slow. Under the action of electromotive force generated by AC current in conductors, the vibration signal in the fault changes correspondingly compared with the normal situation. In order to realize the intelligent diagnosis of mechanical faults in GIS, it is necessary to study in depth the characteristics of vibration signals of the GIS shell. The empirical mode decomposition (EMD) [15] method was used to analyze the vibration signal, and the characteristic matrix was defined to form the criterion of mechanical fault in GIS [16]. The full-acoustic acquisition method was used to collect different mechanical fault data, and the acoustic characteristics of signal was summarized to conduct fault diagnosis [17]. The transient vibration characteristics of GIS were analyzed by using finite element simulation software ANSYS, and the theoretical basis of mechanical defect detection technology in GIS based on vibration information was provided [18,19]. The vibration mechanism of GIS was studied in depth, and by extracting features of vibration signals using spectrum analysis, a method for detecting the mechanical state of GIS based on vibration information was proposed [20,21]. A new algorithm, which is composed of the k-nearest neighbor algorithm and the fuzzy c-means clustering algorithm, for the mechanical fault diagnosis of ultra-high voltage GIS was proposed to realize the detection of the mechanical state of GIS [22].
Generally speaking, the aforementioned documents have made great contributions to the development of mechanical fault diagnosis technology in GIS. However, due to the non-linearity, signal dispersion and noise interference of the GIS system, it is difficult to extract features and screen feature space. The features extracted from the aforementioned documents are insufficient, and the problem is more prominent when the number of samples is large. In addition, the training process of a single classifier is affected by the overall error rate, so the model may favor the majority class and ignore the minority class. The feature extraction method based on coherent function (CF) [23] can summarize the similarities of a spectrum and get the feature sets, and the union of all typical fault feature sets is selected as the feature atlas of GIS fault description. Holistic learning [24,25] is very common in machine learning [26,27]. A series of single weak classifiers are constructed and combined to classify or predict new data by a weighted or unweighted voting method. One-class support vector machines (OCSVM) [28][29][30] can solve the problem of unbalance between normal data and fault data; beyond this, it can judge the unknown faults, and the feature has been applied in the field of fault diagnosis. For example, system combining model-based diagnosis and data-driven anomaly classifiers for fault isolation used OCSVM to identify unknown faults, and the validity of the method was verified in the internal combustion engines [31,32]. SVM [33,34] can effectively divide the feature space and better classify the fault conditions.
In this paper, a new feature extraction method based on CF was proposed, and a multi-layer classifier composed of OCSVM and support vector machine (SVM) was constructed. The feasibility of the feature extraction method was verified by experiments, and the advantages of the proposed classifier were verified by comparing with the general classification methods such as Softmax [35], SVM, back propagation neural networks (BPNN) [36] and naive Bayes (NB) [37]. The main contributions of this paper can be summarized as follows: (1) GIS mechanical fault is diagnosed by a holistic approach which integrates the vibration signal acquisition system, feature extraction based on CF and a multi-level classifier composed of OCSVMs and SVM; (2) The CF is introduced into the feature screening process, and a method of feature extraction based on CF with double thresholds is proposed, which provides a new idea for feature screening and can fully describe characteristics of the vibration signal; (3) A multi-layer classifier composed of OCSVM and SVM is designed to diagnose GIS faults.
The remainder of this paper is organized as follows: Section 2 introduces the vibration information acquisition system of GIS, the experiment platform and the vibration signal analysis; Section 3 presents the method of extracting vibration features; Section 4 discusses the establishment of GIS fault classifier; Section 5 discusses the parameters of the diagnosis model and the optimization of feature space, and compares the method with other traditional diagnosis methods to prove the improvement in the diagnosis accuracy. Finally, Section 6 summarizes the contributions of the paper.

Experiments
In order to collect the data needed for the study of mechanical fault detection technology in GIS, a YD-81D acceleration sensor, DHF-7-3 charge amplifier and NI PCI-4472B acquisition card were used to build the vibration signal acquisition system, and the parameters of the system are shown in the Table 1. The experimental object is a 110 kV three-phase common-box GIS experimental platform, as shown in Figure 1a. The types of faults simulated in this paper are shown in Figure 1b: (1) Isolation switch fault-the fault can be simulated by adjusting the position of the isolation switch; (2) Looseness of flange screw-the fault can be simulated by loosening three bolts used to fix flange; (3) Looseness of stone bolt-the fault can be simulated by loosening two bolts supporting GIS in section A. The experimental data recorded are shown in Table 2. The vibration data of each of the working conditions were collected 200 times with the alternating current (AC) of 50 Hz and 1000 A, forming the data set of the research in this paper. of feature space, and compares the method with other traditional diagnosis methods to prove the improvement in the diagnosis accuracy. Finally, Section 6 summarizes the contributions of the paper.

Experiments
In order to collect the data needed for the study of mechanical fault detection technology in GIS, a YD-81D acceleration sensor, DHF-7-3 charge amplifier and NI PCI-4472B acquisition card were used to build the vibration signal acquisition system, and the parameters of the system are shown in the Table 1.

Parameters
Value measuring range (g) ±0.5 sensitivity (V/g) 10 maximum output voltage (V) ±5 weight of a sensor (g) 10 sampling rate (kHz) 10 sampling time length (ms) 100 The experimental object is a 110 kV three-phase common-box GIS experimental platform, as shown in Figure 1a. The types of faults simulated in this paper are shown in Figure 1b: (1) Isolation switch fault-the fault can be simulated by adjusting the position of the isolation switch; (2) Looseness of flange screw-the fault can be simulated by loosening three bolts used to fix flange; (3) Looseness of stone bolt-the fault can be simulated by loosening two bolts supporting GIS in section A. The experimental data recorded are shown in Table 2. The vibration data of each of the working conditions were collected 200 times with the alternating current (AC) of 50 Hz and 1000 A, forming the data set of the research in this paper.

Vibration Data Analysis
When AC flows into the conductor, the vibration information shows strong periodicity. The reason is that the excitation source of the GIS vibration is electromagnetic force, of which the frequency is twice that of the AC frequency [16]. Then, the traditional Fourier transform can be used to analyze the amplitude-frequency distribution characteristics of the vibration signal, as showed in Figure 2.

Vibration Data Analysis
When AC flows into the conductor, the vibration information shows strong periodicity. The reason is that the excitation source of the GIS vibration is electromagnetic force, of which the frequency is twice that of the AC frequency [16]. Then, the traditional Fourier transform can be used to analyze the amplitude-frequency distribution characteristics of the vibration signal, as showed in Figure 2. The main energy of the vibration signal is concentrated at the frequency around 1 kHz, and there is less energy at the frequency of 0.1 kHz. The reason is that the GIS system exhibits strong nonlinearity, and its natural frequency is about 1 kHz excited by the electromagnetic force of 100 Hz. When the mechanical fault occurs to GIS, the change of natural frequency leads to differences in spectrum under the same excitation, which are illustrated by the following remarkable differences: (1) the normal case, the isolation switch fault, the looseness of flange screw and the looseness of stone bolt have the highest frequency of vibration signal energy at 0.9 kHz, 1 kHz, 1.1 kHz and 1 kHz, respectively; (2) the energy ratio of 0.7 kHz and 1.2 kHz in fault working conditions is less than that of the normal case; (3) the energy at 2 kHz of vibration signal in flange fault is higher than that of other working conditions. In order to further analyze the frequency domain characteristics of vibration signals in GIS, three samples were taken from the vibration signals collected in each working conditions to perform a Fourier transform to obtain the spectrum, as shown in Figure 3. The main energy of the vibration signal is concentrated at the frequency around 1 kHz, and there is less energy at the frequency of 0.1 kHz. The reason is that the GIS system exhibits strong nonlinearity, and its natural frequency is about 1 kHz excited by the electromagnetic force of 100 Hz. When the mechanical fault occurs to GIS, the change of natural frequency leads to differences in spectrum under the same excitation, which are illustrated by the following remarkable differences: (1) the normal case, the isolation switch fault, the looseness of flange screw and the looseness of stone bolt have the highest frequency of vibration signal energy at 0.9 kHz, 1 kHz, 1.1 kHz and 1 kHz, respectively; (2) the energy ratio of 0.7 kHz and 1.2 kHz in fault working conditions is less than that of the normal case; (3) the energy at 2 kHz of vibration signal in flange fault is higher than that of other working conditions. In order to further analyze the frequency domain characteristics of vibration signals in GIS, three samples were taken from the vibration signals collected in each working conditions to perform a Fourier transform to obtain the spectrum, as shown in Figure 3.  It is observed that there is almost no energy distribution at the frequency point above 2500 Hz, so the spectrum range is set below 2500 Hz, as shown in Figure 3a. Due to the serious overlap of frequency points, the 700-1200 Hz frequency spectrum with more concentrated energy is further selected for observation, as shown in Figure 3b. Figures 2 and 3 indicate the following conclusions: (1) The signal spectrum in the same working condition has a very high similarity, and the energy distribution at most frequency points is basically the same; (2) there is a great difference in the energy distribution at frequency points between vibration signals in different working conditions; (3) the vibration signals have energy distribution at most frequency points, and numerous frequency points with low energy cannot be ignored. The above characteristics can be used to identify the fault types of signals in different working conditions.

Principle of CF
In the field of signal processing, CF is commonly used to measure the degree of linear correlation between two signals in each frequency component. In this paper, the CF is used for feature extraction.
Suppose there are two time-domain signals Sx(t) and Sy(t), the calculation methods of CF are as follows [38]: (1) calculate Fourier spectrum Ax(f) and Ay(f) of Sx(t) and Sy(t), respectively; (2) calculate self-power spectral density functions Sx(f) and Sy(f),  It is observed that there is almost no energy distribution at the frequency point above 2500 Hz, so the spectrum range is set below 2500 Hz, as shown in Figure 3a. Due to the serious overlap of frequency points, the 700-1200 Hz frequency spectrum with more concentrated energy is further selected for observation, as shown in Figure 3b. Figures 2 and 3 indicate the following conclusions: (1) The signal spectrum in the same working condition has a very high similarity, and the energy distribution at most frequency points is basically the same; (2) there is a great difference in the energy distribution at frequency points between vibration signals in different working conditions; (3) the vibration signals have energy distribution at most frequency points, and numerous frequency points with low energy cannot be ignored. The above characteristics can be used to identify the fault types of signals in different working conditions.

Principle of CF
In the field of signal processing, CF is commonly used to measure the degree of linear correlation between two signals in each frequency component. In this paper, the CF is used for feature extraction.
Suppose there are two time-domain signals S x (t) and S y (t), the calculation methods of CF are as follows [38]: (1) calculate Fourier spectrum A x (f ) and A y (f ) of S x (t) and S y (t), respectively; (2) calculate self-power spectral density functions S x (f ) and S y (f ), where A x * (f ) and A y * (f ) are the complex conjugation of A x (f ) and A y (f ), respectively; (3) calculate the cross power spectral density function, (4) calculate the CF of S x (t) and S y (t), The range of C xy (f ) is [0,1], and the larger the value of C xy (f 0 ) at a certain frequency f 0 , the greater the coherence of signal S x (t) and S y (t) at the frequency of f 0 . S x (t) and S y (t) are irrelevant when CF is 0 and completely coherent when CF is 1. There are two advantages of the CF: (1) CF can describe the frequency commonality of two signals; (2) CF is not affected by absolute amplitude of the signals and describes the amplitude similarity of two signals at the same frequency point.
Firstly, two groups of samples are taken out from the normal signals for CF calculation, and the results are shown in Figure 4a. Then, a group of samples are extracted from the vibration signals of the normal case and the isolation switch fault respectively for coherence analysis, and the results are shown in Figure 4b. where Ax * (f) and Ay * (f) are the complex conjugation of Ax(f) and Ay(f), respectively; (3) calculate the cross power spectral density function, The range of Cxy(f) is [0,1], and the larger the value of Cxy(f0) at a certain frequency f0, the greater the coherence of signal Sx(t) and Sy(t) at the frequency of f0. Sx(t) and Sy(t) are irrelevant when CF is 0 and completely coherent when CF is 1. There are two advantages of the CF: (1) CF can describe the frequency commonality of two signals; (2) CF is not affected by absolute amplitude of the signals and describes the amplitude similarity of two signals at the same frequency point.
Firstly, two groups of samples are taken out from the normal signals for CF calculation, and the results are shown in Figure 4a. Then, a group of samples are extracted from the vibration signals of the normal case and the isolation switch fault respectively for coherence analysis, and the results are shown in Figure 4b.   As illustrated in Figure 4a, the waveforms of the vibration signals of the two groups of normal samples are basically the same, while the waveforms of the two groups of signals in Figure 4b are quite different (red is the normal case sample, blue is the isolation switch fault sample). The energy distribution of the two samples is consistent at most frequency points in the same working condition and has a large difference in different working conditions. The CF of two samples is calculated at the frequency points which are multiples of 10 Hz. The results show that the coherence coefficients of many frequency points are close to 1 in the same working condition, while the frequency points with coherence coefficients close to 1 in different working conditions being few in number.

Design Ideas of Feature Construction
Based on the coherence analysis above, the relationship between two signals at a specific frequency point can be described by the coherent coefficient between signals, thereby obtaining the relevant frequency points describing the commonality of the two signals. However, considering the dispersion of vibration signals and noise interference, it is necessary to calculate the coherence of a large number of signals of the same working condition. This paper designs a feature extraction method based on vibration information in different mechanical conditions, and the specific process is shown in Figure 5.  As illustrated in Figure 4a, the waveforms of the vibration signals of the two groups of normal samples are basically the same, while the waveforms of the two groups of signals in Figure 4b are quite different (red is the normal case sample, blue is the isolation switch fault sample). The energy distribution of the two samples is consistent at most frequency points in the same working condition and has a large difference in different working conditions. The CF of two samples is calculated at the frequency points which are multiples of 10 Hz. The results show that the coherence coefficients of many frequency points are close to 1 in the same working condition, while the frequency points with coherence coefficients close to 1 in different working conditions being few in number.

Design Ideas of Feature Construction
Based on the coherence analysis above, the relationship between two signals at a specific frequency point can be described by the coherent coefficient between signals, thereby obtaining the relevant frequency points describing the commonality of the two signals. However, considering the dispersion of vibration signals and noise interference, it is necessary to calculate the coherence of a large number of signals of the same working condition. This paper designs a feature extraction method based on vibration information in different mechanical conditions, and the specific process is shown in Figure 5.  Figure 5. Feature extraction process.
Step 1: Collect m groups of vibration signal samples of a typical type of mechanical defect, and perform a Fourier transform on each sample; Step 2: Calculate the CF of each of two samples, set the strong correlation threshold Rth, judge the coherence coefficient of the selected frequency points (multiples of 10 Hz) and the threshold Rth, and define the frequency point whose coherence coefficient is larger than Rth as the potential common characteristic frequency feature of this kind of working condition; Step 3: Count the number of occurrences of each potential common frequency points, set the frequency threshold N = Nth × , where Nth is the frequency threshold coefficient ( ∈ [0,1]), and define the potential common frequency points whose occurrence times are greater than the threshold N as the clear common frequency points of the certain working condition to build feature space. In this paper, the sample number m is 200, the strong correlation threshold Rth is 0.9, and the frequency threshold coefficient Nth is 0.65. The coherent results of each working condition are shown in Figure 6a  Step 1: Collect m groups of vibration signal samples of a typical type of mechanical defect, and perform a Fourier transform on each sample; Step 2: Calculate the CF of each of two samples, set the strong correlation threshold R th , judge the coherence coefficient of the selected frequency points (multiples of 10 Hz) and the threshold R th , and define the frequency point whose coherence coefficient is larger than R th as the potential common characteristic frequency feature of this kind of working condition; Step 3: Count the number of occurrences of each potential common frequency points, set the frequency threshold N = N th × C 2 m , where N th is the frequency threshold coefficient (N th ∈ [0, 1]), and define the potential common frequency points whose occurrence times are greater than the threshold N as the clear common frequency points of the certain working condition to build feature space.
In this paper, the sample number m is 200, the strong correlation threshold R th is 0.9, and the frequency threshold coefficient N th is 0.65. The coherent results of each working condition are shown in Figure 6a: (1) the characteristic frequency points above 1. 5 Figure 6b shows the characteristic frequency curves in different working conditions. The feature vector and the number of characteristic frequency points were quite different, and the union of feature points was used as the final feature to diagnose fault types.

SVM
SVM is a method to realize the idea of structural risk minimization. The sample space is linearly partitioned by the optimal classification hyperplane. However, the problem of linearly indivisibility is often encountered in practical problems. Therefore, the data samples need to be mapped into highdimensional feature space through non-linear transformation, so that it can be transformed into a linear separable problem. Its classification principle is shown in Figure 7.  Figure 6b shows the characteristic frequency curves in different working conditions. The feature vector and the number of characteristic frequency points were quite different, and the union of feature points was used as the final feature to diagnose fault types.

SVM
SVM is a method to realize the idea of structural risk minimization. The sample space is linearly partitioned by the optimal classification hyperplane. However, the problem of linearly indivisibility is often encountered in practical problems. Therefore, the data samples need to be mapped into high-dimensional feature space through non-linear transformation, so that it can be transformed into a linear separable problem. Its classification principle is shown in Figure 7. For classification of two classes, suppose that the training set of n samples is D = {(xi,yi)|i = 1,2, ,n}, xi∈R n , yi∈{−1,+1}, then the optimal classification hyperplane H [39] can be expressed as: where W is the normal vector of the optimal classification hyperplane, and b is the constant term. Two standard hyperplanes H1 ( Using Lagrange function to solve the above formula, the dual problem of the original problem is described as follows: where αi(i=1,2, ,n) are Lagrange multipliers, and the maximization of Q(α) depends on the training set {xi T xj} and {yiyj}. If αi * is the optimal Lagrange multiplier, the optimal hyperplane function is: where x is test data.

OCSVM
OCSVM is an anomaly detection algorithm based on machine learning. Unlike traditional SVM, OCSVM only needs one class of samples to train an anomaly detection model, which maps training data to high-dimensional feature space through kernel function and solves an optimal hyperplane in feature space to achieve maximum separation between target data and coordinate origin, as shown in Figure 8. For classification of two classes, suppose that the training set of n samples is D = {(x i ,y i )|i = 1,2, · · · ,n}, x i ∈R n , y i ∈{−1,+1}, then the optimal classification hyperplane H [39] can be expressed as: where W is the normal vector of the optimal classification hyperplane, and b is the constant term. Two standard hyperplanes H 1 (W · x + b = +1) and H 2 (W · x + b = −1) are defined, which are planes through the samples closest to the hyperplane and parallel to the classification hyperplane. In order to maximize the classification interval of hyperplanes, the classification hyperplane is constructed by the following formula to correctly classify all samples [40]: Using Lagrange function to solve the above formula, the dual problem of the original problem is described as follows: where α i (i=1,2, · · · ,n) are Lagrange multipliers, and the maximization of Q(α) depends on the training set {x i T x j } and {y i y j }. If α i * is the optimal Lagrange multiplier, the optimal hyperplane function is: where x is test data.

OCSVM
OCSVM is an anomaly detection algorithm based on machine learning. Unlike traditional SVM, OCSVM only needs one class of samples to train an anomaly detection model, which maps training data to high-dimensional feature space through kernel function and solves an optimal hyperplane in feature space to achieve maximum separation between target data and coordinate origin, as shown in Figure 8. The minimum objective function sought by OCSVM [41] can be described as: where i is the number of training samples, xi is the sample data, l is the number of training samples, ϕ (xi) is the map of the original space to the feature space, and ρ are the normal vectors of the hyperplane required in the feature space and compensation, respectively, the adjustable parameter v ( 0 ,1) ∈ is the upper limit of the proportion of error samples in the total sample and the relaxation variable ξi is the degree to which some error samples are misclassified.
The Lagrange function is introduced to get the following formula: where αi and βi are Lagrange factors, and the dual problem is obtained by mapping the sample space to the feature space through the Gauss kernel function [41].
The analytical formula of ρ is obtained by solving Equation (10), The way to find the optimal hyperplane and get the OCSVM based anomaly detection model is represented as Equation (12): For training data x, f(x) indicates that it is positively or negatively located in the hyperplane in high-dimensional space. f(x) is a positive number and x belongs to a normal class; f(x) is a negative number and x belongs to an abnormal class. Therefore, OCSVM can identify non-target samples more accurately. The minimum objective function sought by OCSVM [41] can be described as: where i is the number of training samples, x i is the sample data, l is the number of training samples, φ (x i ) is the map of the original space to the feature space, ω and ρ are the normal vectors of the hyperplane required in the feature space and compensation, respectively, the adjustable parameter v∈ (0, 1) is the upper limit of the proportion of error samples in the total sample and the relaxation variable ξ i is the degree to which some error samples are misclassified. The Lagrange function is introduced to get the following formula: where α i and β i are Lagrange factors, and the dual problem is obtained by mapping the sample space to the feature space through the Gauss kernel function [41].
The analytical formula of ρ is obtained by solving Equation (10), The way to find the optimal hyperplane and get the OCSVM based anomaly detection model is represented as Equation (12): For training data x, f (x) indicates that it is positively or negatively located in the hyperplane in high-dimensional space. f (x) is a positive number and x belongs to a normal class; f (x) is a negative number and x belongs to an abnormal class. Therefore, OCSVM can identify non-target samples more accurately.

Fault Diagnosis Process
The mechanical fault diagnosis technology of GIS proposed in this paper includes two parts: feature extraction and fault diagnosis, and fault diagnosis is divided into state detection and fault recognition. In state detection, the first OCSVM classifier is used to distinguish normal from abnormal cases and solve the problem of unbalance between normal and fault data samples in actual detection. The second OCSVM classifier is used to distinguish known faults from unknown faults and solve the problem of misdiagnosis of unknown faults. In fault recognition, the fault types are judged by an SVM classifier. The specific diagnosis process is shown in Figure 9.

Fault Diagnosis Process
The mechanical fault diagnosis technology of GIS proposed in this paper includes two parts: feature extraction and fault diagnosis, and fault diagnosis is divided into state detection and fault recognition. In state detection, the first OCSVM classifier is used to distinguish normal from abnormal cases and solve the problem of unbalance between normal and fault data samples in actual detection. The second OCSVM classifier is used to distinguish known faults from unknown faults and solve the problem of misdiagnosis of unknown faults. In fault recognition, the fault types are judged by an SVM classifier. The specific diagnosis process is shown in Figure 9. Firstly, 70% of the samples in a normal working condition are randomly selected as training set A for OCSVM1 training; 70% of the samples in each known type of fault are randomly selected as training set B for OCSVM2 training and SVM training; all remaining samples are used as test set C for evaluating the performance of the diagnosis model.
Secondly, the data of test set C is diagnosed by the OCSVM1 model. The signal with the normal test result is regarded as set C1, and the signal with the abnormal result is taken as the set C2.
Then, the C2 set is diagnosed by the OCSVM2 model. The signals which are detected as faults are regarded as set C3 and other data is taken as unknown fault set C4.
Lastly, input C3 set into the SVM diagnosis model for fault type identification.

Discussion of Parameters
Rth and Nth are two very important parameters in the mechanical fault diagnosis technology proposed in this paper, which jointly determine the feature space of different working conditions, and then affect the final diagnosis results.
If Rth is too small, the difference in frequency energy distribution between signals cannot be effectively distinguished, and it will cause excessive characteristic frequency points, which make it difficult to select characteristic frequency points. If Rth is too large, the effect of dispersion between signals is ignored. Firstly, 70% of the samples in a normal working condition are randomly selected as training set A for OCSVM1 training; 70% of the samples in each known type of fault are randomly selected as training set B for OCSVM2 training and SVM training; all remaining samples are used as test set C for evaluating the performance of the diagnosis model.
Secondly, the data of test set C is diagnosed by the OCSVM1 model. The signal with the normal test result is regarded as set C 1 , and the signal with the abnormal result is taken as the set C 2 .
Then, the C 2 set is diagnosed by the OCSVM2 model. The signals which are detected as faults are regarded as set C 3 and other data is taken as unknown fault set C 4 .
Lastly, input C 3 set into the SVM diagnosis model for fault type identification.

Discussion of Parameters
R th and N th are two very important parameters in the mechanical fault diagnosis technology proposed in this paper, which jointly determine the feature space of different working conditions, and then affect the final diagnosis results.
If R th is too small, the difference in frequency energy distribution between signals cannot be effectively distinguished, and it will cause excessive characteristic frequency points, which make it difficult to select characteristic frequency points. If R th is too large, the effect of dispersion between signals is ignored.
If N th is too small, the characteristic frequency points will spread throughout the frequency domain, and the meaning of feature extraction is lost. If N th is too large, too few characteristic frequency points will reduce the accuracy of fault diagnosis.
In order to discuss the values of parameters R th and N th , this paper selected [0.5, 0.95] as the value range of R th , [0.6, 0.95] as the range of N th and 0.025 as the step size. Figure 10 shows the relationship between diagnosis accuracy and two key parameters. The diagnosis accuracy is low when both R th and N th take large or small values. When values of R th and N th are 0.65 and 0.9, respectively, the diagnosis accuracy is the highest, which reaches 98.75%. If Nth is too small, the characteristic frequency points will spread throughout the frequency domain, and the meaning of feature extraction is lost. If Nth is too large, too few characteristic frequency points will reduce the accuracy of fault diagnosis.
In order to discuss the values of parameters Rth and Nth, this paper selected [0.5, 0.95] as the value range of Rth, [0.6, 0.95] as the range of Nth and 0.025 as the step size. Figure 10 shows the relationship between diagnosis accuracy and two key parameters. The diagnosis accuracy is low when both Rth and Nth take large or small values. When values of Rth and Nth are 0.65 and 0.9, respectively, the diagnosis accuracy is the highest, which reaches 98.75%. In addition to Rth and Nth, the main parameters of the diagnosis model using radial basis function (RBF) [42] kernel function are listed in Table 3.

Diagnosis Results
After the parameters are determined, 10 experiments were conducted to test the stability of the diagnosis model, and the results are compared with the methods of Softmax, SVM, BPNN and NB. Figure 11 shows the confusion matrix obtained from the first experiment by the proposed method. The diagnosis accuracy of the normal case, isolation switch fault, looseness of flange screw and looseness of stone bolt are 100%, 100%, 97.5% and 100%, respectively. The 0 sample is diagnosed as unknown faults. In addition to R th and N th , the main parameters of the diagnosis model using radial basis function (RBF) [42] kernel function are listed in Table 3.

Diagnosis Results
After the parameters are determined, 10 experiments were conducted to test the stability of the diagnosis model, and the results are compared with the methods of Softmax, SVM, BPNN and NB. Figure 11 shows the confusion matrix obtained from the first experiment by the proposed method. The diagnosis accuracy of the normal case, isolation switch fault, looseness of flange screw and looseness of stone bolt are 100%, 100%, 97.5% and 100%, respectively. The 0 sample is diagnosed as unknown faults. 1 Figure 11. First diagnosis results (confusion matrix). Figure 11 shows that the proposed method performs well in the first experiment. In order to compare different methods, Tables 4 and 5 show the diagnosis accuracy and F-measure of different methods in the first experiment. Figure 12 shows the comparison of different methods in terms of accuracy and F-measure.  Tables 4 and 5, and Figure 12 exhibit that the accuracy and F-measure of the proposed method are higher than other classification methods in each working condition, which shows that the method is more suitable than standard deep learning for feature learning in the paper. The results of 10 random experiments are shown in Figure 13.  Tables 4 and 5, and Figure 12 exhibit that the accuracy and F-measure of the proposed method are higher than other classification methods in each working condition, which shows that the method is more suitable than standard deep learning for feature learning in the paper. The results of 10 random experiments are shown in Figure 13. In 10 random experiments, the diagnosis accuracy of the proposed method is above 90%, which is generally higher than Softmax, SVM, BPNN and NB. It is only slightly inferior to SVM in the Test 6 test, which indicates that the method proposed in this paper is more stable and has a better diagnosis effect.
In order to further evaluate the diagnosis model, the average and standard deviation were used as indicators to compare the diagnosis accuracy of each working condition with different methods. The statistical results are shown in Figure 14 and Table 6. In 10 random experiments, the diagnosis accuracy of the proposed method is above 90%, which is generally higher than Softmax, SVM, BPNN and NB. It is only slightly inferior to SVM in the Test 6 test, which indicates that the method proposed in this paper is more stable and has a better diagnosis effect.
In order to further evaluate the diagnosis model, the average and standard deviation were used as indicators to compare the diagnosis accuracy of each working condition with different methods. The statistical results are shown in Figure 14 and Table 6.
is generally higher than Softmax, SVM, BPNN and NB. It is only slightly inferior to SVM in the Test 6 test, which indicates that the method proposed in this paper is more stable and has a better diagnosis effect.
In order to further evaluate the diagnosis model, the average and standard deviation were used as indicators to compare the diagnosis accuracy of each working condition with different methods. The statistical results are shown in Figure 14 and Table 6.     Figure 14 and Table 5 show that: (1) the average diagnosis accuracy of the proposed classifier is higher than that of other methods and the standard deviation is lower than that of other methods, which show that the proposed classifier is more stable and reliable; (2) considering the standard deviation, the diagnosis accuracy of the proposed classifier may be lower than other methods, but it is obviously superior to other methods overall, which shows that the method proposed in this paper is more effective for the classification of mechanical working conditions in GIS, and the diagnosis model is more reliable; (3) the feature extraction method proposed in this paper is effective and feasible.
To verify the ability of the diagnosis model to judge the unknown faults, the 20 groups of vibration samples of two composite faults (a combination of isolation switch fault and looseness of stone bolt, a combination of looseness of flange screw and looseness of stone bolt) were collected, respectively. Diagnosis results are shown in Table 7. The diagnostic accuracy of the two composite faults are 85% and 95%, respectively, indicating that the diagnosis model has a good ability to identify unknown faults.

Conclusions
It is important to improve the reliability of the operation of GIS equipment and find out the potential defects in the operation in time. Thus, a holistic approach composed of a method to extract features and a multi-layer classifier was proposed in this study. First, we developed the characteristic description method based on CF. Then, based on OCSVM and SVM, a multi-layer classifier was constructed to conduct fault diagnosis.
The usefulness of feature learning was verified by a comparison among five machine learning methods in a series of experiments. The experimental results indicated that the technique of using CF for feature screening is feasible, and a new idea is provided for feature extraction. At the same time, it also proves that the classifier proposed in the paper is more stable and reliable than other methods.