Mechanical Fault Diagnosis of High Voltage Circuit Breakers Based on Wavelet Time-Frequency Entropy and One-Class Support Vector Machine

Mechanical faults of high voltage circuit breakers (HVCBs) are one of the most important factors affecting the reliability of power system operation. Because samples of each fault type are scarce, some fault conditions can be misrecognized as a normal condition. The fault diagnosis results of HVCBs seriously affect the operation reliability of the entire power system. In order to improve the fault diagnosis accuracy of HVCBs, a method for mechanical fault diagnosis of HVCBs based on wavelet time-frequency entropy (WTFE) and one-class support vector machine (OCSVM) is proposed. In this method, the S-transform (ST) is used to analyze the energy time-frequency distribution of HVCBs' vibration signals. Then, WTFE is selected as the feature vector that reflects the information characteristics of vibration signals in the time and frequency domains. OCSVM is used for judging whether a mechanical fault of HVCBs has occurred or not. In order to improve the fault detection accuracy, a particle swarm optimization (PSO) algorithm is employed to optimize the parameters of OCSVM, including the window width of the kernel function and the error limit. If a mechanical fault is confirmed, a support vector machine (SVM)-based classifier is used to recognize the fault type. The experiments carried out on a real SF6 HVCB demonstrated the improved effectiveness of the new approach.


Introduction
High voltage circuit breakers (HVCBs) play an important role in the protection and control of power systems. Faults of HVCBs directly affect the running of the power system. Mechanical faults of the HVCB operating mechanism are the major cause of HVCB faults. Therefore, research on fault diagnosis methods for HVCBs is very important for the stable operation of electric power systems. The traditional scheduled maintenance scheme results in frequent operations and excessive overhauls. It may lead to needless intervention, and even cause HVCB faults during the maintenance [1][2][3][4]. The International Council on Large Electric Systems (CIGRE) investigated the causes of failure of HVCBs and found that 44% of main faults and 39% of secondary faults are mechanical faults [5]. Extensive diagnostic testing of circuit breakers in [4] shows that vibration analysis is a reliable and appropriate method for non-invasive diagnostic testing. Vibration analysis is an effective signal-based approach to fault diagnosis [6]. Over the past decade, the vibration signatures generated during the operation of mechanical structures have been used for condition monitoring and fault diagnosis with good effect [7][8][9][10][11][12]. An HVCB's vibration signal is a typical non-stationary signal with strong transients. Existing vibration signal processing methods such as short-time energy [7], dynamic time warping (DTW) [8], wavelet packet transform (WPT) [9] and empirical mode decomposition (EMD) [10,11] have all achieved good results in this area. However, these methods also have some disadvantages. Dynamic time warping and short-time energy are based on the original signal, so they are only sensitive to signal changes over time. When wavelet packet decomposition is used, an appropriate wavelet basis is difficult to select. The EMD method suffers from end effects and high computational complexity. The S-transform (ST) is an effective method for time-frequency analysis. It offers a frequency-dependent time-frequency resolution by using a Gaussian window whose width varies inversely with the frequency [13]. Therefore, ST can satisfy the time-frequency resolution requirements of vibration signals in different frequency ranges. Besides, ST can be computed with the fast Fourier transform (FFT), so it is easy to realize in engineering applications. The output of ST is a two-dimensional time-frequency matrix. The characteristics of vibration signals in both the time domain and the frequency domain can be fully extracted from this matrix.
As a description of the randomness status of a chaotic system, Shannon entropy contains the information characteristics of complex signals. It is suitable for feature extraction in non-stationary signal analysis. Since Shannon introduced the concept of entropy in 1948 [14], many types of information entropy have been widely used in many areas. Wavelet entropy (WE) combines the advantages of Shannon entropy and the wavelet transform (WT). It has been widely used for non-stationary signal analysis in diverse fields such as power quality transient analysis [15][16][17], biomedicine [18,19], and fault detection and fault diagnosis [20,21]. In this paper, WTFE is used to describe the unique time-frequency characteristics of different HVCB mechanical statuses by vibration signal analysis.
Neural networks (NNs) [9] and SVM [10,11] have made a significant contribution to fault recognition of HVCBs. Because HVCBs generally operate infrequently, it is quite difficult to get enough vibration samples of different types of HVCB mechanical faults for training multi-class classifiers. The fault recognition of HVCBs is therefore a classification problem with small samples, and multi-class classification methods such as NNs, which rely on many training samples, are not appropriate for analyzing the mechanical status and identifying HVCB faults. SVM, on the other hand, is suitable for classification problems with small training sample sets. In order to analyze the status of HVCBs, all types of fault samples should be included in the SVM training set. However, not all types of mechanical fault samples of HVCBs can be obtained. Some types of fault samples cannot be acquired in large quantities, so the complete range of fault characteristics cannot be covered. The limited fault types and unbalanced fault samples cause the classification boundary to deviate. Thus, some fault samples are easily misrecognized as normal samples. The extreme learning machine (ELM) [22] and pairwise-coupled relevance vector machine (PCRVM) [23] have shown good effects in fault diagnosis in gas turbine generator systems, but applications in mechanical fault diagnosis of HVCBs have not been reported.
A one-class classifier is a kind of pattern recognition method which can be trained by normal samples only. It is suitable for classifying a small sample set. One-class support vector machine (OCSVM) [24] has great potential in the field of fault detection [25,26]. It can effectively determine whether the equipment is working in a fault condition or not. Parameter optimization is an important step that affects the classification performance of OCSVM [27]. Particle swarm optimization (PSO) is a widely used optimization method that iteratively tries to improve a candidate solution with regard to a given measure of quality [28]. It can be used to determine parameters such as the width factor of a kernel function to improve the classification ability of OCSVM.
This paper presents a new ST and OCSVM-based approach for HVCB mechanical fault diagnosis. Firstly, the ST is used to process vibration signals to analyze the energy distribution in the time-frequency plane. Secondly, the WTFE of the ST matrix (STM) is calculated to construct the feature vectors describing the energy distribution of HVCB vibration signals in the time-frequency plane. Then, a PSO-based OCSVM (PSO-OCSVM), which is trained only on normal samples, is used to separate the normal and fault conditions of the HVCB's mechanical operating structure. The PSO algorithm optimizes the parameters and improves the classification ability of the traditional OCSVM. Finally, if the condition of the HVCB is judged as a fault condition by PSO-OCSVM, the type of mechanical fault is recognized by an SVM-based classifier. Three different types of faults are simulated in a field experiment on a real HVCB to verify the validity of the new method.

S-Transform
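The S-transform was proposed by Stockwell in 1996 [13]. The ST result $S(\tau, f)$ of an input signal $h(t)$ is defined by:

$$S(\tau, f) = \int_{-\infty}^{\infty} h(t)\, w(\tau - t, f)\, e^{-i 2\pi f t}\, dt \tag{1}$$

$$w(t, f) = \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{t^{2} f^{2}}{2}} \tag{2}$$

where $w(t, f)$ is the Gaussian window function. The parameter $\tau$ is a displacement factor and controls the location of the Gaussian window on the time axis. $f$ is the frequency, which determines the width of the Gaussian window.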
As an extension of the continuous wavelet transform (CWT), ST can be derived from the CWT. The one-dimensional CWT $W(\tau, d)$ of a signal $h(t)$ is defined as:

$$W(\tau, d) = \int_{-\infty}^{\infty} h(t)\, \psi(t - \tau, d)\, dt \tag{3}$$

where $\psi(t - \tau, d)$ is a mother wavelet, $\tau$ is a displacement factor and $d$ is a scale factor. The scale factor $d$ determines the width of the mother wavelet, while the displacement factor $\tau$ determines the time location where the signal $h(t)$ is analyzed.
Let the dilation factor $d$ be the inverse of the frequency $f$, i.e., $d = 1/f$. Along with Equations (1) and (3), ST can be considered as a CWT with a special mother wavelet multiplied by a phase factor:

$$S(\tau, f) = e^{-i 2\pi f \tau}\, W(\tau, d) \tag{4}$$

where the special mother wavelet is defined as the product of the Gaussian window and a complex vector:

$$\psi(t, f) = \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{t^{2} f^{2}}{2}}\, e^{-i 2\pi f t} \tag{5}$$

The result of ST can be calculated based on Equations (3)-(5):

$$S(\tau, f) = \int_{-\infty}^{\infty} h(t)\, \frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(\tau - t)^{2} f^{2}}{2}}\, e^{-i 2\pi f t}\, dt \tag{6}$$

Equation (6) is equal to Equations (1) and (2) combined. Note that Equation (6) is not a strict CWT because the wavelet in Equation (5) does not satisfy the zero-mean condition for an admissible wavelet.
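The frequency spectrum of ST, i.e., its Fourier transform along the time axis $\tau$, is as follows:

$$\int_{-\infty}^{\infty} S(\tau, f)\, e^{-i 2\pi \beta \tau}\, d\tau = H(\beta + f)\, e^{-\frac{2\pi^{2}\beta^{2}}{f^{2}}} \tag{7}$$

Entropy 2016, 18, 7

Likewise, the ST result of a signal $h(t)$ can be derived by the Fourier transform, that is:

$$S(\tau, f) = \int_{-\infty}^{\infty} H(\beta + f)\, e^{-\frac{2\pi^{2}\beta^{2}}{f^{2}}}\, e^{i 2\pi \beta \tau}\, d\beta, \quad f \neq 0 \tag{8}$$

where $H(f)$ is the spectrum of the signal $h(t)$ and $\beta$ is the frequency which controls the movement of the Gaussian window on the frequency axis. Let $f \to n/NT$ and $\tau \to jT$; the discrete ST can be denoted as:

$$S\left[jT, \frac{n}{NT}\right] = \sum_{m=0}^{N-1} H\left[\frac{m + n}{NT}\right] e^{-\frac{2\pi^{2} m^{2}}{n^{2}}}\, e^{\frac{i 2\pi m j}{N}}, \quad n \neq 0 \tag{9}$$

ST has variable time-frequency resolution. The result of ST is a two-dimensional complex matrix, called the S-matrix. Taking the modulus of each element gives the modulus matrix of ST (STMM). The column vectors of STMM reflect the amplitude-frequency characteristics and the row vectors describe the time domain distribution of the signal at a certain frequency. ST can therefore describe the characteristics of the signal in both the time and frequency domains. Compared with WT, the decomposition of ST in the high frequency part is more detailed. The frequency resolution and anti-noise performance of ST are better than those of WT [29]; therefore, ST is suitable for vibration signal processing.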
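The discrete form of ST described above can be sketched in a few lines of Python. This is only an illustrative, unoptimized sketch using the standard library: the function names `dft`, `s_transform` and `modulus_matrix` are hypothetical, a naive DFT stands in for the FFT, the 1/N scaling is applied at the end rather than inside the DFT, the index m is wrapped to negative shifts (a common implementation choice), and the zero-frequency row is set to the signal mean.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (no 1/N scaling)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def s_transform(h):
    """Return S[n][j]: frequency index n = 0..N//2, time index j = 0..N-1."""
    N = len(h)
    H = dft(h)
    # Assumption: the n = 0 row is set to the signal mean at every time index.
    rows = [[complex(sum(h) / N)] * N]
    for n in range(1, N // 2 + 1):
        # Gaussian window in the frequency domain; m is wrapped so that
        # indices above N/2 act as negative frequency shifts.
        gauss = []
        for m in range(N):
            shift = m if m <= N // 2 else m - N
            gauss.append(math.exp(-2.0 * math.pi ** 2 * shift ** 2 / n ** 2))
        row = []
        for j in range(N):
            acc = sum(
                H[(m + n) % N] * gauss[m] * cmath.exp(2j * math.pi * m * j / N)
                for m in range(N)
            )
            row.append(acc / N)  # 1/N applied here instead of in the DFT
        rows.append(row)
    return rows


def modulus_matrix(S):
    """Element-wise modulus of the S-matrix (the STMM)."""
    return [[abs(v) for v in row] for row in S]


# Toy check: a pure tone at frequency bin 8 should peak in row 8.
N = 64
tone = [math.cos(2 * math.pi * 8 * t / N) for t in range(N)]
stmm = modulus_matrix(s_transform(tone))
peak_row = max(range(len(stmm)), key=lambda n: stmm[n][N // 2])
```

In this layout each entry of `stmm` is one frequency row, so a fixed time index selects the amplitude-frequency profile at that instant. For real vibration records an FFT-based implementation would be needed, since this sketch scales cubically with the record length.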

Wavelet Time-Frequency Entropy
Shannon entropy is an important part of information theory, which describes the degree of disorder of a system. The more orderly the system is, the smaller the entropy is. Shannon entropy $H$ is defined as:

$$H = -\sum_{i=1}^{N} p_i \log p_i \tag{10}$$

where $p_i$ is the probability of the random event $Y = y_i$ and $\sum_{i=1}^{N} p_i = 1$. When $p_i = 0$, there is a convention that $p_i \log p_i = 0$.
As a powerful tool for analyzing the transient features of non-stationary signals, wavelet entropy is the combination of Shannon entropy and the wavelet transform. This combination not only retains the localized time-frequency features of wavelet analysis, but also embodies the representational capacity of Shannon entropy. The distributions of different kinds of fault signals in wavelet phase space are different. Several types of wavelet entropy are defined based on different principles or processing methods, such as wavelet energy entropy (WEE), wavelet time entropy (WTE), wavelet singular entropy (WSE) and wavelet time-frequency entropy (WTFE) [15]. WEE and WTE indicate the information characteristics of a signal in the time domain but fail to indicate the characteristics in the frequency domain. For fault types related to frequency, such as lack of mechanical lubrication, these two methods are ineffective. WSE can map the correlated wavelet space into an independent linear space and indicate the uncertainty of the energy distribution of a signal in the time-frequency domain. It is highly sensitive to transients. Since the vibration signals of HVCBs have strong transients at any moment, the WSE results of the same type of vibration signal may differ markedly; thus WSE is not suitable for extracting features of vibration signals in this study.
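As a small illustration of the Shannon entropy definition and its zero-probability convention, the following Python sketch may help. The function name `shannon_entropy` is hypothetical, and the natural logarithm is an assumption, since the text does not fix the base of the logarithm.

```python
import math


def shannon_entropy(p):
    """Shannon entropy of a discrete distribution p (natural log assumed).

    Terms with p_i = 0 are skipped, which implements the convention
    p_i * log(p_i) = 0.
    """
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])   # most disordered case
certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])       # fully ordered case
```

The uniform distribution gives the largest value for four events, while the fully ordered distribution gives zero, matching the statement that a more orderly system has smaller entropy.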
WTFE is composed of two vectors. The first vector stretches over the whole time space and describes the characteristics of the signal in the time domain. The second vector stretches over the whole frequency space and describes the characteristics of the signal in the frequency domain. In other words, WTFE can measure the information features of the signal at any given instant and frequency. Therefore, WTFE is often used in the field of fault diagnosis and detection. The definition of WTFE is as follows: let $D_j(k)$ ($j = 1, 2, \cdots, m$; $k = 1, 2, \cdots, n$) be the discrete wavelet representation, and let $E_j(k) = |D_j(k)|^2$ denote the wavelet energy at scale $j$ and instant $k$. WTFE is denoted as:

$$W_{TFE} = \left[ W_t(k),\; W_f(j) \right] \tag{11}$$

where:

$$W_t(k) = -\sum_{j=1}^{m} P_t \log P_t, \qquad W_f(j) = -\sum_{k=1}^{n} P_f \log P_f \tag{12}$$

where the probabilities $P_t$ and $P_f$ are defined as follows:

$$P_t = \frac{E_j(k)}{\sum_{j=1}^{m} E_j(k)}, \qquad P_f = \frac{E_j(k)}{\sum_{k=1}^{n} E_j(k)} \tag{13}$$

Similarly, the definition of WSE is as follows: let $D$ be an $m \times n$ matrix constituted by $D_j(k)$. According to singular value decomposition theory, for any $m \times n$ matrix, there exist an $m \times r$ matrix $U$, an $r \times n$ matrix $V$ and an $r \times r$ diagonal matrix $\Lambda$, which make:

$$D_{m \times n} = U_{m \times r}\, \Lambda_{r \times r}\, V_{r \times n} \tag{14}$$

where the diagonal elements $\lambda_l$ ($l = 1, 2, \cdots, r$) of $\Lambda$ are called the singular values of matrix $D$. The singular values are all non-negative and arranged in non-increasing order (i.e., $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r \geq 0$). Then the WSE is defined as:

$$W_{SE} = -\sum_{l=1}^{r} p_l \log p_l \tag{15}$$

where the probability $p_l$ associated with $\lambda_l$ is defined as:

$$p_l = \frac{\lambda_l}{\sum_{l=1}^{r} \lambda_l} \tag{16}$$

Feature Vector Extraction
ST can be considered as a special WT. Thus wavelet entropy theory is also applicable to the feature extraction of signals based on ST. A partition method for the time-frequency plane of the S-matrix is proposed to extract vibration signal features in the time-frequency plane. After statistical analysis, we found that the amplitude of HVCB vibration signals at frequencies higher than 10 kHz is very small. Therefore, this paper mainly analyzes the frequency range from 0 Hz to 10 kHz. Firstly, a time-frequency plane whose frequency range spans 0 Hz to 10 kHz and whose time range spans 0 to 150 ms ("0" is the moment the system receives the operating signal) is constructed by ST. Then, the time-frequency plane is divided into 300 congruent time-frequency blocks, each with a bandwidth of 1 kHz and a time width of 5 ms. The partition method is shown in Figure 1.
Let $E_{ij}$ be the energy of time-frequency block $S_{ij}$ ($i = 1, 2, \ldots, 10$; $j = 1, 2, \ldots, 30$), where $E_{ij}$ is the sum of all elements in the block. Let $E$ be the total energy of the whole time-frequency plane. The normalization of $E_{ij}$ is given as:

$$e_{ij} = \frac{E_{ij}}{E} \tag{17}$$

The time component of WTFE of vibration signals is calculated by:

$$T_j = -\sum_{i=1}^{10} e_{ij} \log e_{ij} \tag{18}$$

Similarly, the frequency component of WTFE of vibration signals is calculated by:

$$F_i = -\sum_{j=1}^{30} e_{ij} \log e_{ij} \tag{19}$$

The feature vector of vibration signals is denoted as $Z = [T\ F]$, where $T = [T_1, \ldots, T_{30}]$ and $F = [F_1, \ldots, F_{10}]$. Then $Z$ is used as the input vector of the OCSVM and SVM classifiers.
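The block partition and entropy features described in this section can be sketched as follows. This is a minimal sketch under stated assumptions: the input is a modulus matrix whose rows are frequency bins (0 Hz to 10 kHz) and whose columns are time samples (0 to 150 ms), the time and frequency components are taken as Shannon sums over the normalized block energies, the natural logarithm is used, and the names `block_energies` and `wtfe_features` are hypothetical.

```python
import math


def block_energies(stmm, n_freq_blocks=10, n_time_blocks=30):
    """Sum the elements of the modulus matrix inside each time-frequency block."""
    rows, cols = len(stmm), len(stmm[0])
    E = [[0.0] * n_time_blocks for _ in range(n_freq_blocks)]
    for r in range(rows):
        i = r * n_freq_blocks // rows          # frequency block index
        for c in range(cols):
            j = c * n_time_blocks // cols      # time block index
            E[i][j] += stmm[r][c]
    return E


def wtfe_features(E):
    """Return Z = T + F built from the normalized block energies."""
    total = sum(sum(row) for row in E)
    e = [[v / total for v in row] for row in E]

    def term(p):
        # Convention: p * log(p) = 0 when p = 0.
        return -p * math.log(p) if p > 0 else 0.0

    n_freq, n_time = len(e), len(e[0])
    T = [sum(term(e[i][j]) for i in range(n_freq)) for j in range(n_time)]
    F = [sum(term(e[i][j]) for j in range(n_time)) for i in range(n_freq)]
    return T + F


# Toy input: a flat 20 x 60 modulus matrix, so every block has equal energy.
flat = [[1.0] * 60 for _ in range(20)]
Z = wtfe_features(block_energies(flat))
```

With the 10 by 30 partition the resulting vector has 30 time entries followed by 10 frequency entries. Real S-matrix sizes need not divide evenly into the blocks; the integer mapping above simply assigns each bin to the block it falls in.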

Condition and Fault Classifier Based on OCSVM and SVM
SVM has good classification ability for classification problems involving small samples and high-dimensional data. However, because some fault training samples are difficult to obtain, there is a risk that the SVM will recognize fault samples as normal samples. In order to avoid this defect, the new method first utilizes OCSVM to determine whether an HVCB mechanical failure has happened or not. When the fault is confirmed, the fault type is then identified by the SVM.

One-Class Support Vector Machine
One-class classification is an important pattern recognition methodology. It can be applied to fields where negative samples are hard to obtain, such as fault detection, fault diagnosis, intrusion detection and disease analysis. Compared to traditional classifiers, which aim to obtain the highest recognition accuracy, the target of a one-class classifier is to identify as many abnormal samples as possible. The latter is able to reduce the possibility that fault states will be mistaken for normal states. Therefore, a one-class classifier is appropriate for the mechanical fault diagnosis of HVCBs, which demands high reliability.
OCSVM is a mature and effective one-class classifier which was presented by Schölkopf et al. [24]. It has good fault analysis performance, including faster training and decision speeds, lower dependence on the number of training samples and better anti-noise performance. The basic idea of OCSVM is to look for a decision hyperplane defined by the support vectors and to maximize the distance from the hyperplane to the origin. Most of the target samples lie on one side of the hyperplane and most of the non-target samples lie on the other side. The principle of OCSVM is shown in Figure 2.
Let $X = [x_1; x_2; \cdots; x_n] \in \mathbb{R}^{n \times m}$ be the training data set, so that $X$ contains $n$ $m$-dimensional feature vectors extracted from normal vibration signal samples. The decision hyperplane of OCSVM is given by:

$$F(x) = \langle w, x \rangle - \rho = 0 \tag{20}$$

The classification method of OCSVM can be described by the following quadratic programming problem:

$$\min_{w, \rho}\ \frac{1}{2}\|w\|^{2} - \rho \quad \text{s.t.}\ \langle w, x_i \rangle \geq \rho,\ i = 1, 2, \cdots, n \tag{21}$$

In order to improve the performance of OCSVM, kernel theory is used for solving the linearly inseparable problem. It supposes that the nonlinear mapping $\phi: x \to \phi(x)$ maps data from the original input space to a linear feature space. A slack variable $\xi_i$ and a margin of error $v$ are introduced in this feature space. $\xi_i$ penalizes the points that deviate from the hyperplane, so that the classifier realizes a soft margin between normal samples and fault samples. $v$ is used to control the upper limit of the proportion of outliers in the training set. Its value range is (0, 1). The expression of the improved OCSVM is as follows:

$$\min_{w, \xi, \rho}\ \frac{1}{2}\|w\|^{2} + \frac{1}{vn}\sum_{i=1}^{n} \xi_i - \rho \quad \text{s.t.}\ \langle w, \phi(x_i) \rangle \geq \rho - \xi_i,\ \xi_i \geq 0 \tag{22}$$

To solve the above optimization problem, a Lagrangian function is constructed as follows:

$$L(w, \xi, \rho, \alpha, \beta) = \frac{1}{2}\|w\|^{2} + \frac{1}{vn}\sum_{i=1}^{n} \xi_i - \rho - \sum_{i=1}^{n} \alpha_i \left( \langle w, \phi(x_i) \rangle - \rho + \xi_i \right) - \sum_{i=1}^{n} \beta_i \xi_i \tag{23}$$

We can obtain the following relations by taking the partial derivative with respect to each variable in Equation (23) and setting it equal to zero:

$$w = \sum_{i=1}^{n} \alpha_i \phi(x_i) \tag{24}$$

$$\alpha_i = \frac{1}{vn} - \beta_i \leq \frac{1}{vn} \tag{25}$$

$$\sum_{i=1}^{n} \alpha_i = 1 \tag{26}$$

According to kernel function theory, the inner product of two vectors in the feature space can be represented by a kernel function in the original input space, that is:

$$K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle \tag{27}$$

Combined with Equation (27), we can get the dual form of this optimization problem:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j) \quad \text{s.t.}\ 0 \leq \alpha_i \leq \frac{1}{vn},\ \sum_{i=1}^{n} \alpha_i = 1 \tag{28}$$

An RBF Gaussian kernel function is adopted, and its form is given as:

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}} \right) \tag{29}$$

where $\sigma$ is the width parameter of the RBF Gaussian kernel function. Equation (28) describes a standard quadratic programming problem. By solving for the parameters $\alpha_i$ and $\rho$, we can get the decision hyperplane in the feature space. The decision equation is written as:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i K(x_i, x) - \rho \right) \tag{30}$$

where the calculation formula of $\rho$ is as follows, for any support vector $x_j$ with $0 < \alpha_j < 1/(vn)$:

$$\rho = \sum_{i=1}^{n} \alpha_i K(x_i, x_j) \tag{31}$$
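To make the role of the kernel and the decision equation concrete, the following Python sketch evaluates an OCSVM decision for given multipliers. It is an illustration only: the names `rbf_kernel` and `ocsvm_decision` are hypothetical, the kernel is assumed to use $2\sigma^2$ in the denominator, and the multipliers and offset in the toy example are hand-picked feasible values rather than a solution of the quadratic programming problem.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel; the 2*sigma^2 denominator is an assumption."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def ocsvm_decision(x, support_vectors, alphas, rho, sigma):
    """Return +1 (normal) or -1 (fault) from the kernel expansion minus rho."""
    score = sum(a * rbf_kernel(sv, x, sigma)
                for sv, a in zip(support_vectors, alphas))
    return 1 if score - rho >= 0 else -1


# Hypothetical values: three "normal" feature vectors with equal multipliers
# (they sum to 1, as the dual constraint requires) and a hand-picked rho.
support_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
alphas = [1.0 / 3.0] * 3
rho, sigma = 0.5, 1.0

near = ocsvm_decision((0.3, 0.3), support_vectors, alphas, rho, sigma)
far = ocsvm_decision((5.0, 5.0), support_vectors, alphas, rho, sigma)
```

A point close to the training samples keeps a large kernel sum and is accepted as normal, while a distant point falls below the offset and is flagged. In practice the multipliers and offset would come from a quadratic programming solver, and a smaller `sigma` tightens the accepted region.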

Advantages of OCSVM for Condition Diagnosis
OCSVM is able to overcome the problem that fault samples are difficult to obtain in HVCB condition monitoring. Compared with the traditional SVM, its decision mode is more inclined to reduce the false acceptance rate (fault states accepted as normal); thus OCSVM is more favorable for improving equipment reliability and more suitable for mechanical condition diagnosis of HVCBs. Figure 3 compares the rationale of OCSVM and SVM, and illustrates the advantages of OCSVM in condition monitoring.
Figure 3 describes the results of two classification approaches on a linearly separable two-class problem in a 2-dimensional plane. The samples on the left side are normal samples, and the samples on the right side are fault samples. The aim of SVM is to find the lines that support the gap between the two kinds of samples, which are the two green dotted lines in Figure 3. The decision line of SVM is located in the middle of the two green dotted lines. OCSVM maps the 2-dimensional input space to a high-dimensional feature space, and then seeks the decision hyperplane which supports all the target samples in this feature space. After the high-dimensional feature space is mapped back to the 2-dimensional space, the hyperplane becomes a closed curve that contains all the target samples, such as the red curve in the figure. Consider a minor fault represented by the blue dot in Figure 3. Although the fault is slight, it should be identified as a fault state. In the situation depicted in the figure, SVM identifies it as a normal condition. Conversely, because OCSVM has a more compact support region, it can correctly recognize the minor fault. In this situation the classification result of OCSVM is more reliable than that of SVM.

An Improved PSO-Based OCSVM
The main constant parameters affecting the classification performance of OCSVM are the margin of error v and the width parameter σ of the RBF kernel function. By adjusting the parameters v and σ, the distance between the hyperplane and the origin will be maximized and the classification performance of OCSVM will be improved. In fact, these two parameters jointly influence OCSVM's classification performance. As the relationship between these two parameters and the fitness value cannot be expressed directly as a function, an intelligent optimization algorithm is used to optimize the values of v and σ. This paper adopts PSO to carry out the related optimization.
PSO is an intelligent algorithm for global optimization which imitates the foraging behavior of flying birds [28,30]. It has few parameters and a simple concept. The basic principle of PSO is as follows: a swarm consisting of $m$ particles flies at a certain speed in a $D$-dimensional search space (in this paper $D = 2$). Each particle is treated as a volumeless individual. The flight speed of a particle is adjusted dynamically according to its own flight experience and that of its companions. The positions of the particles change constantly in flight. The position and speed of the $i$th particle are expressed as $x_i = (x_{i1}, x_{i2}, \cdots, x_{iD})$ and $v_i = (v_{i1}, v_{i2}, \cdots, v_{iD})$ respectively, where $1 \leq i \leq m$. Besides, the best position of the $i$th particle in history is denoted as $p_i = (p_{i1}, p_{i2}, \cdots, p_{iD})$ and the best position found so far by all particles is denoted as $p_g = (p_{g1}, p_{g2}, \cdots, p_{gD})$. For each generation, the speed and position in the $d$th dimension ($1 \leq d \leq D$) are changed according to the following equations:

$$v_{id}^{k+1} = w\, v_{id}^{k} + c_1\, rand_1 \left( p_{id} - x_{id}^{k} \right) + c_2\, rand_2 \left( p_{gd} - x_{id}^{k} \right) \tag{32}$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1} \tag{33}$$

where $w$ is the inertia weight, $c_1$ and $c_2$ are acceleration coefficients, and $rand_1$ and $rand_2$ are two uniformly distributed pseudo-random numbers in the interval [0,1].
For the basic PSO algorithm, we generally define $w = 1$ and $c_1 = c_2 = 2$. The speed of a particle is limited to a maximum of $v_{max}$. Along with Equations (32) and (33), the swarm constantly moves toward better fitness in every iteration step according to the particles' own experience and the shared historical information of the swarm.
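A single update of one particle in the basic PSO can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `pso_step`, the symmetric clamping of each speed component to the range from `-v_max` to `v_max`, and the default `v_max = 1.0` are assumptions for illustration.

```python
import random


def pso_step(x, v, p_best, g_best, w=1.0, c1=2.0, c2=2.0, v_max=1.0, rng=random):
    """Advance one particle by one generation and return (new_x, new_v)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        vd = (w * v[d]
              + c1 * rng.random() * (p_best[d] - x[d])    # own experience
              + c2 * rng.random() * (g_best[d] - x[d]))   # swarm experience
        vd = max(-v_max, min(v_max, vd))                  # speed limit
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v


# Toy case: the particle already sits on both best positions, so only the
# inertia term and the speed limit act on it.
x_new, v_new = pso_step(x=[0.0, 0.0], v=[0.5, -3.0],
                        p_best=[0.0, 0.0], g_best=[0.0, 0.0])
```

For the OCSVM tuning described in this section, the two coordinates of `x` would hold candidate values of v and σ, with the fitness evaluated by the classifier; that evaluation is outside this sketch.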
Like other intelligent optimization algorithms, PSO also has the limitation of premature convergence, which traps the algorithm in local optima and degrades the classification performance of OCSVM. To overcome this defect and improve the convergence speed of the algorithm, this paper proposes an improved PSO algorithm with a linearly varying inertia weight and linearly varying acceleration coefficients.
(1) Adjustment of the inertia weight ω
By decreasing the inertia weight linearly, the algorithm can search for better solutions over the global scope in the early iterations and gains a better local search capability as the number of iterations increases. The algorithm thus maintains a good search ability while avoiding premature convergence. Let $[\omega_{min}, \omega_{max}]$ be the value range of the inertia weight, generally $\omega_{min} = 0.4$ and $\omega_{max} = 0.9$. Let $Iter_{max}$ be the maximum number of iterations. Then the inertia weight of the $i$th iteration is given as:

$$\omega_i = \omega_{max} - \frac{\omega_{max} - \omega_{min}}{Iter_{max}} \times i \tag{34}$$

(2) Adjustment of the acceleration coefficients $c_1$ and $c_2$
Acceleration coefficients reflect the degree of information exchange between particles. On the one hand, with a larger $c_1$ the in-flight behavior of a particle depends more on its own experience, so that particles easily wander within their own local scope. On the other hand, with a larger $c_2$ particles move faster toward the optimal individual, but this may cause premature convergence to a local optimum. In order to resolve this contradiction, researchers usually assign the same constant value to $c_1$ and $c_2$, but sometimes this method cannot meet the needs of the actual situation. We choose a larger $c_1$ and a smaller $c_2$ at the beginning of the algorithm, and then gradually decrease $c_1$ and increase $c_2$. By this adjustment, particles tend to fly over the entire search space in the early stages, so the region that contains the optimal solution is not lost, and they finally tend to fly to the globally optimal solution. Compared to the traditional method, particles learn more from the particles which have reached the historical optimal solution. In this paper, the adjustment of the acceleration coefficients is defined as follows:

$$c_1 = \left( c_{1f} - c_{1i} \right) \frac{iter}{Iter_{max}} + c_{1i} \tag{35}$$

$$c_2 = \left( c_{2f} - c_{2i} \right) \frac{iter}{Iter_{max}} + c_{2i} \tag{36}$$

where $iter$ is the current iteration number, $c_{1i}$ and $c_{1f}$ are the initial value and final value of $c_1$, and $c_{2i}$ and $c_{2f}$ are the initial value and final value of $c_2$. A symmetric variation is used in this paper, namely $c_1$ linearly decreases from 2.5 to 0.5 and $c_2$ linearly increases from 0.5 to 2.5.
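The linear schedules for the inertia weight and the acceleration coefficients can be written as two small functions. This sketch assumes straightforward linear interpolation over the iteration count; the function names are hypothetical, while the default values are the ranges quoted in the text.

```python
def inertia_weight(i, iter_max, w_min=0.4, w_max=0.9):
    """Inertia weight at iteration i, decreasing linearly from w_max to w_min."""
    return w_max - (w_max - w_min) * i / iter_max


def acceleration_coefficients(i, iter_max, c1_i=2.5, c1_f=0.5, c2_i=0.5, c2_f=2.5):
    """Return (c1, c2) at iteration i: c1 falls and c2 rises linearly."""
    frac = i / iter_max
    c1 = (c1_f - c1_i) * frac + c1_i
    c2 = (c2_f - c2_i) * frac + c2_i
    return c1, c2


# Values at the start, middle and end of a hypothetical 100-iteration run.
schedule = [(inertia_weight(i, 100), *acceleration_coefficients(i, 100))
            for i in (0, 50, 100)]
```

Early iterations therefore combine a large inertia weight with a dominant $c_1$ (exploration), and late iterations combine a small inertia weight with a dominant $c_2$ (convergence toward the swarm's best position).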

Fault Diagnostic Process
The flow chart of the diagnosis method is shown in Figure 4.In practical engineering applications, the vibration signal that has been diagnosed as a normal signal can be added to the normal sample set, so the fault diagnosis system can constantly adapt to the change of running conditions of a circuit breaker, and its learning ability can be improved.
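The two-stage flow can be sketched as follows. This is a minimal sketch under stated assumptions: the `diagnose` function and both helper factories are hypothetical names, and the distance-threshold detector and nearest-centroid classifier are simple stand-ins for the trained OCSVM and SVM, chosen only so that the example runs with the standard library.

```python
import math


def diagnose(features, is_normal, classify_fault, normal_set=None):
    """Stage 1 decides normal vs. fault; stage 2 names the fault type."""
    if is_normal(features):
        if normal_set is not None:
            normal_set.append(features)  # grow the normal sample set over time
        return "normal"
    return classify_fault(features)


def make_distance_detector(normal_samples, radius):
    """Stand-in for OCSVM: accept points within `radius` of the normal centroid."""
    centroid = [sum(col) / len(normal_samples) for col in zip(*normal_samples)]
    return lambda x: math.dist(x, centroid) <= radius


def make_nearest_centroid(fault_samples):
    """Stand-in for the SVM: assign the label of the closest class centroid."""
    centroids = {
        label: [sum(col) / len(samples) for col in zip(*samples)]
        for label, samples in fault_samples.items()
    }
    return lambda x: min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


# Toy two-dimensional feature vectors (hypothetical values).
normal_set = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1]]
faults = {"fault I": [[5.0, 0.0], [5.2, 0.1]], "fault II": [[0.0, 5.0], [0.1, 5.3]]}

is_normal = make_distance_detector(normal_set, radius=1.0)
classify_fault = make_nearest_centroid(faults)

result_normal = diagnose([0.1, 0.0], is_normal, classify_fault, normal_set)
result_known = diagnose([4.9, 0.2], is_normal, classify_fault, normal_set)
result_unseen = diagnose([-6.0, -6.0], is_normal, classify_fault, normal_set)
```

Because the first stage is trained on normal samples only, a sample from a fault type that never appeared in training is still rejected as not normal, even though the second stage can only assign it one of the known fault labels.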

Data Collection and Processing
The experiment uses LW9-72.5 series outdoor high voltage SF6 circuit breakers as the test object. The vibration signal acquisition system is built with a CA-YD-182A piezoelectric acceleration sensor made by Jiangsu United Electronic Technology Co., Ltd. (Yangzhou, China) and NI-9234 DAQ devices made by National Instruments (NI, Austin, TX, USA). The acceleration sensor picks up the vibration signal, and the DAQ device records the data at a sampling rate of 25.6 kS/s for a period of 150 ms during the opening operation. The vibration signal acquisition system for a circuit breaker is shown in Figure 5.

According to the above method, the vibration signals are processed by ST. Figure 6 shows the vibration signals and the contour plots of their ST results under different conditions, including the healthy condition and three types of faults. Figure 6 shows that the time-frequency energy distributions of the different types of vibration signals differ clearly. Compared with the normal signal, the energy distribution of the iron core jam fault signal has an apparent time delay; the base screw looseness fault signal has a strong energy distribution in a lower frequency area; and the energy center of the third fault signal has shifted slightly in both the time and frequency domains. The time and frequency characteristics can thus be extracted to analyze the operating condition of a HVCB's mechanical operation system.
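As a quick check on the recording parameters quoted above, the record length and the frequency range available to the time-frequency analysis follow directly from the sampling rate and duration. The variable names are illustrative, and the frequency resolution assumes a transform taken over the whole 150 ms record.

```python
SAMPLING_RATE_HZ = 25_600   # 25.6 kS/s, from the text
RECORD_DURATION_S = 0.150   # 150 ms, from the text

# Number of samples in one recorded opening operation.
n_samples = round(SAMPLING_RATE_HZ * RECORD_DURATION_S)

# Highest frequency that can be represented without aliasing.
nyquist_hz = SAMPLING_RATE_HZ / 2

# Time step between samples, in milliseconds.
sample_interval_ms = 1000 / SAMPLING_RATE_HZ

# Frequency spacing of a transform computed over the full record (assumption).
freq_resolution_hz = 1 / RECORD_DURATION_S

print(n_samples, nyquist_hz, sample_interval_ms, round(freq_resolution_hz, 2))
```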

Feature Extraction and Analysis
We obtain the WTFE feature vectors with the feature vector extraction method described before. The WTFE feature distributions of the four kinds of vibration signals are shown in Figure 7, where the first 30 features reflect the energy distribution of the signal in the time domain (WTFEt) and the other 10 features reflect the energy distribution in the frequency domain (WTFEf). For clarity, only three data points are shown for each type. Figure 7 shows that different types of vibration signals have significantly different feature distributions. Based on these differences, the classifier can achieve a good classification effect. To compare the diagnosis ability of different feature representation methods, we present a comparison between the WTFE and WSE. First, the STMM is divided into 50 submatrices along the time axis. Then the WSE of each submatrix is calculated based on Equations (14)-(16) to form 50-dimensional input feature vectors. The WSE feature distributions of the four kinds of vibration signals are shown in Figure 8.

Figure 8 shows that the WSE feature distributions of different kinds of vibration signals also have different characteristics. However, comparing Figures 7 and 8 reveals some disadvantages of the WSE method. First, the WSE method cannot clearly and visually display the real change rules of the vibration signals. Second, the WSE feature vectors of signals of the same type are more dispersed than the WTFE ones. Third, the difference between the high values and the low values in the WSE feature vector is too small. The latter two characteristics of the WSE method will degrade the performance of a classifier.
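The two quantitative criticisms of WSE, namely within-class dispersion and the spread between high and low feature values, can be checked numerically. The sketch below is one possible way to do so and is not taken from the paper: both metrics and the toy feature vectors are assumptions introduced here for illustration.

```python
from statistics import mean, pstdev


def within_class_dispersion(vectors):
    """Mean per-feature standard deviation across feature vectors of one class."""
    return mean(pstdev(column) for column in zip(*vectors))


def dynamic_range(vector):
    """Difference between the highest and lowest value in one feature vector."""
    return max(vector) - min(vector)


# Toy feature vectors of a single signal type (hypothetical values).
tight_class = [[1.0, 5.0], [1.0, 5.0]]   # samples agree: low dispersion
loose_class = [[0.0, 4.0], [2.0, 6.0]]   # samples disagree: high dispersion

tight_spread = within_class_dispersion(tight_class)
loose_spread = within_class_dispersion(loose_class)
value_range = dynamic_range(tight_class[0])
```

A feature set that scores lower on dispersion and higher on dynamic range for each signal type gives a classifier tighter, better separated clusters to work with.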

Classification Using OCSVM-SVM
We use the improved PSO to optimize the two parameters of OCSVM, i.e., the search takes place in a two-dimensional space. A program implementing the adjustment of the inertia weight and acceleration coefficients described above performs the parameter optimization. The swarm size is 30 particles and the number of iterations is 50. After running PSO, we obtain the optimal solution v = 0.82, σ = 17.68. Figure 9 shows the relationship between fitness and iterations. The globally optimal solution appears in the ninth iteration; in later iterations the other particles simply move closer to the particle that has the optimal fitness, so the average fitness increases gradually.

The diagnosis results of the new approach are shown in Table 1, where the state determination accuracy (STA) reflects the ability of the classifier to determine whether the circuit breaker is healthy or not, and the classification accuracy (CA) reflects the ability of the classifier to identify the specific type of a sample. According to the diagnosis results, the OCSVM-SVM did not recognize all of the normal samples, but it accurately classified all the fault samples into the correct fault type. For the fault diagnosis of circuit breakers, the risk of recognizing fault samples as normal samples is much higher than that of recognizing normal samples as fault samples, so this approach still achieves a good diagnosis effect. When the WSE is selected as the input feature vector, we obtain the comparative results shown in Table 2. Tables 1 and 2 show that the diagnosis results with the WSE method are greatly inferior to those of the new approach, especially for the classification of normal state conditions. Thus the WSE method is not suitable for the feature extraction of vibration signals of HVCBs.
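The two accuracy measures defined above can be computed from true and predicted labels as in the following sketch. The function name, the label strings and the toy label lists are assumptions made for illustration; only the definitions of STA and CA come from the text.

```python
def sta_and_ca(true_labels, predicted_labels, normal="normal"):
    """Return (STA, CA) as fractions between 0 and 1.

    STA counts a sample as correct when the healthy-or-faulty decision is
    right, regardless of which fault type was named. CA counts a sample as
    correct only when the exact label matches.
    """
    pairs = list(zip(true_labels, predicted_labels))
    state_correct = sum((t == normal) == (p == normal) for t, p in pairs)
    type_correct = sum(t == p for t, p in pairs)
    return state_correct / len(pairs), type_correct / len(pairs)


# Toy labels (hypothetical): one normal sample is flagged as a fault, and one
# type II fault is named as type III.
truth = ["normal", "normal", "fault I", "fault II", "fault III"]
predicted = ["normal", "fault I", "fault I", "fault III", "fault III"]

sta, ca = sta_and_ca(truth, predicted)
```

In this toy case the mislabeled type II fault lowers CA but not STA, since the breaker is still correctly reported as faulty.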
To show the merit of OCSVM-SVM against other commonly used classifiers, comparison experiments between the SVM classifier, the extreme learning machine (ELM) classifier and the new approach are designed. The training method and test samples of SVM and ELM are the same as in the new approach. The experimental results are shown in Table 3. Table 3 shows that the SVM and ELM methods have about the same classification ability. Both correctly identify all samples of type I and II faults. For the type III faults, neither classifier identifies all samples: the CA of SVM is 90% and that of ELM is 85%. Since the OCSVM-SVM identifies all samples of fault types I, II and III, the fault recognition ability of the new approach is better than that of SVM and ELM.
In a real power system environment, there may be new types of faults that have not been recorded before. When this happens, multi-class fault classifiers cannot identify the fault type because of a lack of training samples. It is therefore very important that the classifier can still accurately determine that the breaker is in a fault state. Considering this case, we compare the STA of OCSVM-SVM, SVM and ELM. Suppose fault type III is the unknown fault, so that no samples of fault type III are involved in the training of the three classifiers. Twenty sets of type III fault vibration data are selected as the new test samples. The diagnosis results are shown in Table 4.
In Table 4, none of the three methods can correctly identify the specific fault type without training samples, but the STA of OCSVM-SVM is 100% while that of SVM and ELM is 0%. This means that OCSVM-SVM can correctly determine the fault state even for a fault whose type has not been recorded before. Therefore, we can conclude that the OCSVM-SVM method has better fault detection capability and is more suitable for circuit breaker fault diagnosis, which requires high reliability.

Conclusions
This paper presents a new method based on WTFE and improved OCSVM for mechanical fault diagnosis of HVCBs. ST is employed to process and analyze vibration signals. The WTFE is selected as the vibration signal feature; it characterizes the signal in both the time domain and the frequency domain. A new classifier based on OCSVM-SVM is built to improve the classification performance of the diagnosis system, and an improved PSO algorithm is used to optimize the OCSVM parameters. Experimental results show that the new approach has higher STA and CA than the traditional SVM and ELM methods, and the accuracy for some faults increases by more than 10%. The new method shows a conspicuous advantage especially for fault types without training samples. It can therefore significantly increase power system security and reliability.

Figure 1. The partition of the time-frequency plane.

Figure 3. Comparison between the principles of OCSVM and SVM.

Figure 4. The flow of the fault diagnosis process.

Figure 5. The vibration signal acquisition system for a circuit breaker.

Figure 6. (a) The normal signal and its STMM contour plot; (b) The signal of fault type I and its STMM contour plot; (c) The signal of fault type II and its STMM contour plot; (d) The signal of fault type III and its STMM contour plot.

Figure 7. (a) WTFE feature distribution of the normal signals; (b) WTFE feature distribution of the iron core jam fault signals; (c) WTFE feature distribution of the base screw looseness fault signals; (d) WTFE feature distribution of the lack of mechanical lubrication fault signals.

Figure 8. (a) WSE feature distribution of the normal signals; (b) WSE feature distribution of the iron core jam fault signals; (c) WSE feature distribution of the base screw looseness fault signals; (d) WSE feature distribution of the lack of mechanical lubrication fault signals.

Table 1. Diagnosis results using the new approach.

Table 2. Diagnosis results with the feature vector of WSE.

Table 3. Diagnosis results by using the SVM and ELM methods.

Table 4. Diagnosis results of the case of lack of training samples by using the OCSVM-SVM, SVM and ELM methods.