Mechanical Fault Diagnosis of High Voltage Circuit Breakers Based on Variational Mode Decomposition and Multi-Layer Classifier

Mechanical fault diagnosis of high-voltage circuit breakers (HVCBs) based on vibration signal analysis is one of the most significant issues in improving the reliability and reducing the outage cost for power systems. Because training samples and known fault types for HVCBs are limited, existing mechanical fault diagnostic methods tend to misclassify new fault types, for which no training samples exist, as either a normal condition or a wrong fault type. A new mechanical fault diagnosis method for HVCBs based on variational mode decomposition (VMD) and a multi-layer classifier (MLC) is proposed to improve the accuracy of fault diagnosis. First, HVCB vibration signals during operation are measured using an acceleration sensor. Second, a VMD algorithm is used to decompose the vibration signals into several intrinsic mode functions (IMFs). The IMF matrix is divided into submatrices to compute the local singular values (LSV). The maximum singular value of each submatrix is selected to form the feature vector for fault diagnosis. Finally, an MLC composed of two one-class support vector machines (OCSVMs) and a support vector machine (SVM) is constructed to identify the fault type. Two layers of independent OCSVMs are adopted to distinguish normal from fault conditions and known from unknown fault types, respectively. On this basis, the SVM recognizes the specific fault type. Diagnostic experiments are conducted on a real SF6 HVCB in normal and fault states. Three different faults (i.e., jam fault of the iron core, looseness of the base screw, and poor lubrication of the connecting lever) are simulated in a field experiment on a real HVCB to test the feasibility of the proposed method. Results show that the classification accuracy of the new method is superior to other traditional methods.


Introduction
As an integral part of the power system, high-voltage circuit breakers (HVCBs) are responsible for the control and protection of the system. HVCB faults will directly harm system reliability, causing significant outage costs. Therefore, the study of fault diagnostic methods for HVCBs is urgent. An inquiry about HVCB faults by the International Council on Large Electric Systems (CIGRE) showed that 39% of minor faults and 44% of major faults are of mechanical origin [1]. Hence, the research on mechanical fault diagnosis of HVCBs has practical significance. Vibration signals generated during the opening/closing operations of HVCBs contain certain important information associated with the mechanical state of breakers. Runde et al. [2] demonstrated through an extensive HVCB diagnostic test that vibration analysis is a suitable and reliable noninvasive diagnostic method for HVCBs. Analysis is constructed by two OCSVMs and an SVM. The first OCSVM (OCSVM1) trained by normal samples determines whether a test sample is in the fault state. The second OCSVM (OCSVM2) trained by all available fault samples identifies whether the type of the fault samples is new. SVM is adopted to identify the known fault type. Comparative experiments are designed with the measured fault data of real HVCBs to validate the new method.

Acceleration Sensor
Acceleration is a physical quantity that characterizes an object's movement. Vibration is essentially the reciprocating movement of an object. Thus, vibration data can be obtained by measuring the acceleration with an acceleration sensor. The integrated electronics piezoelectric (IEPE) acceleration sensor is widely used and captures HVCB vibration signals well. It also has the following advantages: small size, light weight, low noise, and anti-interference capability. This paper adopts a CA-YD-182A piezoelectric acceleration sensor to measure HVCB vibration data. The main technical indicators of the CA-YD-182A include a ±250 g (g = 9.8 m/s²) measuring range, 20 mV/g sensitivity, 40 kHz natural frequency, 10 kHz frequency response, a maximum output voltage of 6 V, and a weight of 9 g.

Data Acquisition System
In this paper, the CA-YD-182A acceleration sensor and an NI 9234 data acquisition card are used to build the vibration signal acquisition system for HVCBs. The measured object is an LW9-72.5 series outdoor high-voltage SF6 circuit breaker. The acquisition system of HVCB vibration signals and its block diagram are shown in Figure 1. The acceleration sensor measures the vibrational state of the HVCB and produces the corresponding voltage signals, which are digitized by the NI 9234. When the circuit breaker receives an opening command, the system starts sampling. The sampling rate is 25.6 kS/s, and each record lasts 150 ms.

In an actual measurement, the installation location and the mounting method of the acceleration sensor affect the performance of the acquisition system. The measurement position should be chosen so that the sensor does not affect the normal operation of the measured object and lies close to the object or to the point of greatest interest on it. In this paper, the sensor is installed on the mechanism box near the operating mechanism. Acceleration sensor installation methods mainly include handheld magnetic adsorption, glue bonding, and screw fixation. Adhesive mounting is selected according to the practical requirements of HVCB mechanical fault diagnosis.
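The acquisition figures quoted in this section can be tied together with a little arithmetic. The following is a minimal sketch, assuming that the 150 ms value is the length of one record and that the sensor output is linear in acceleration; the function and constant names are illustrative and do not come from the paper.

```python
# Values taken from the text; names are illustrative only.
SAMPLE_RATE_HZ = 25_600       # 25.6 kS/s
RECORD_DURATION_S = 0.150     # 150 ms, assumed to be the record length
SENSITIVITY_V_PER_G = 0.020   # 20 mV/g
G_IN_M_PER_S2 = 9.8           # value of g used in the text


def samples_per_record(rate_hz=SAMPLE_RATE_HZ, duration_s=RECORD_DURATION_S):
    """Number of samples captured during one opening operation."""
    return round(rate_hz * duration_s)


def volts_to_acceleration(voltage_v, sensitivity_v_per_g=SENSITIVITY_V_PER_G):
    """Convert a sensor output voltage to acceleration in m/s^2 (linear model)."""
    return voltage_v / sensitivity_v_per_g * G_IN_M_PER_S2


n_samples = samples_per_record()   # length N of one vibration record
nyquist_hz = SAMPLE_RATE_HZ / 2    # highest frequency representable after sampling
```

Under these assumptions each record holds 3840 samples, and the 12.8 kHz Nyquist frequency lies above the 10 kHz frequency response quoted for the sensor.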

Fault Diagnosis Process
The new method proposed in this paper consists of three parts: feature extraction, state detection, and fault recognition. In feature extraction, the features of the vibration signals are extracted using VMD and the local singular value decomposition (LSVD) method. In state detection, OCSVM1 determines whether the HVCB is in the normal or the fault state. In fault recognition, the fault type is recognized using OCSVM2 and SVM. The fault diagnosis process is shown in Figure 2, in which OCSVM1 is trained on the normal samples, and OCSVM2 is trained on all available fault samples (fault samples with known types). A test sample is therefore classified as a normal condition, a known fault with a specific type, or an unknown fault without a specific type.
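The three-way decision described above can be sketched as a short cascade. This is only an illustration of the control flow, not the authors' implementation: the three classifiers are passed in as plain callables, and the threshold-based stand-ins below are hypothetical placeholders for trained models (in practice they might wrap trained one-class and multi-class SVMs, which is an assumption here).

```python
from typing import Callable, Sequence

Features = Sequence[float]


def diagnose(
    features: Features,
    is_normal: Callable[[Features], bool],        # role of OCSVM1
    is_known_fault: Callable[[Features], bool],   # role of OCSVM2
    classify_fault: Callable[[Features], str],    # role of the SVM
) -> str:
    """Return 'normal', a known fault label, or 'unknown fault'."""
    if is_normal(features):
        return "normal"
    if not is_known_fault(features):
        return "unknown fault"
    return classify_fault(features)


# Hypothetical stand-ins for trained models, for demonstration only.
def demo_is_normal(f: Features) -> bool:
    return f[0] < 1.0


def demo_is_known_fault(f: Features) -> bool:
    return f[0] < 5.0


def demo_classify_fault(f: Features) -> str:
    return "Fault I" if f[1] > 0 else "Fault II"


def demo(f: Features) -> str:
    return diagnose(f, demo_is_normal, demo_is_known_fault, demo_classify_fault)
```

Keeping the second layer separate from the final classifier is what allows a sample to be rejected as an unknown fault instead of being forced into one of the trained fault classes.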

VMD Theory
VMD works by solving a variational problem, so the algorithm can be divided into the construction and the solution of that problem. VMD involves three key concepts: classic Wiener filtering, the Hilbert transform, and frequency mixing.

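• Construction of the variational problem
VMD decomposes an input signal $h$ into $K$ modes. Each mode $m_k$ is mostly compact around a center frequency $\omega_k$. The variational problem seeks the $K$ modes that minimize the sum of the bandwidths of all modes, subject to the constraint that the modes sum to the input signal $h$. The detailed construction scheme is as follows: (1) the analytic signal associated with each mode $m_k$ is computed by the Hilbert transform to obtain a unilateral frequency spectrum; (2) the frequency spectrum of each mode is shifted to baseband by mixing with the exponential $e^{-j\omega_k t}$ tuned to the respective estimated center frequency; (3) the bandwidth is estimated through the squared $L^2$-norm of the gradient of the demodulated signal. The constrained variational problem is written as:
$$\min_{\{m_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * m_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} m_k = h \tag{1}$$
where $\{m_k\} = \{m_1, m_2, \cdots, m_K\}$ is the set of all modes, $\{\omega_k\} = \{\omega_1, \omega_2, \cdots, \omega_K\}$ are the corresponding center frequencies, $\delta(t)$ is the Dirac function, and $*$ denotes convolution.
• Solution of the variational problem
The constrained variational problem can be made unconstrained by introducing a Lagrange multiplier $\alpha$ and a quadratic penalty factor $\eta$. The Lagrange multiplier enforces the constraint strictly, and the quadratic penalty factor guarantees the reconstruction fidelity of the signal in the presence of Gaussian noise. The augmented Lagrange expression is as follows [29]:
$$L(\{m_k\},\{\omega_k\},\alpha) = \eta \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * m_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| h(t) - \sum_{k=1}^{K} m_k(t) \right\|_2^2 + \left\langle \alpha(t),\ h(t) - \sum_{k=1}^{K} m_k(t) \right\rangle \tag{2}$$
The alternating direction method of multipliers (ADMM) is used to find the saddle point of the augmented Lagrange expression. $m_k^{n+1}$, $\omega_k^{n+1}$, and $\alpha^{n+1}$ are alternately updated using the ADMM approach. The updates are as follows (see Appendix A for the detailed solution process):
$$\hat{m}_k^{n+1}(\omega) = \frac{\hat{h}(\omega) - \sum_{i \neq k} \hat{m}_i(\omega) + \hat{\alpha}(\omega)/2}{1 + 2\eta(\omega - \omega_k)^2} \tag{3}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{m}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{m}_k^{n+1}(\omega) \right|^2 d\omega} \tag{4}$$
$$\hat{\alpha}^{n+1}(\omega) = \hat{\alpha}^{n}(\omega) + \tau \left( \hat{h}(\omega) - \sum_{k=1}^{K} \hat{m}_k^{n+1}(\omega) \right) \tag{5}$$
where $\hat{\cdot}$ denotes the Fourier transform (FT) of $\cdot$, and $\tau$ is the update parameter of the Lagrange multiplier. The mode $m_k^{n+1}$ can be obtained as the real part of the inverse FT of $\hat{m}_k^{n+1}$. VMD estimates the mode $m_k$ and the center frequency $\omega_k$ repeatedly through this iteration. For a given convergence tolerance $e > 0$, the termination condition of the iteration is:
$$\sum_{k=1}^{K} \frac{\left\| \hat{m}_k^{n+1} - \hat{m}_k^{n} \right\|_2^2}{\left\| \hat{m}_k^{n} \right\|_2^2} < e \tag{6}$$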
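The two frequency-domain updates at the heart of the iteration, a Wiener-filter update of the mode spectrum followed by a power-weighted mean for the center frequency, can be sketched with NumPy. This is a simplified illustration and not a full VMD implementation: it handles a single mode, works on the one-sided spectrum with the frequency axis in Hz, and keeps the Lagrange multiplier at zero. The function names, the test tone, and the value of the penalty factor are all assumptions.

```python
import numpy as np


def update_mode(h_hat, other_modes_hat, alpha_hat, freqs, center_freq, eta):
    """Wiener-filter style update of one mode spectrum around center_freq."""
    residual = h_hat - other_modes_hat + alpha_hat / 2.0
    return residual / (1.0 + 2.0 * eta * (freqs - center_freq) ** 2)


def update_center_frequency(m_hat, freqs):
    """Center of gravity of the mode's power spectrum (one-sided)."""
    power = np.abs(m_hat) ** 2
    return float(np.sum(freqs * power) / np.sum(power))


# Toy check: a pure 2 kHz tone sampled at 25.6 kS/s.
fs = 25_600
n = 2_560
t = np.arange(n) / fs
h = np.sin(2.0 * np.pi * 2000.0 * t)

h_hat = np.fft.rfft(h)                    # one-sided spectrum of the input
freqs = np.fft.rfftfreq(n, d=1.0 / fs)    # frequency axis in Hz

center = 500.0                            # deliberately poor initial guess
for _ in range(5):
    m_hat = update_mode(h_hat, 0.0, 0.0, freqs, center, eta=1e-4)
    center = update_center_frequency(m_hat, freqs)
```

For this single-tone input the estimated center frequency moves from the initial guess to the tone frequency, which is the behavior the alternating updates rely on when several modes are estimated together.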

Simulated Vibration Signal Analysis Based on VMD
The vibration signal of HVCBs consists of a series of vibration events. It can be described by a set of exponentially decaying sinusoidal signals [5], in which n is the number of vibration events, ε(t) is the unit step function, $A_i$ is the amplitude of the ith vibration event, $\mu_i$ is the attenuation coefficient, $f_i$ is the oscillation frequency, and $t_i$ is the starting time of the vibration. The vibration events V1 to V5 generated by MATLAB compose the simulated vibration signal for HVCBs. The parameters of each vibration event are shown in Table 1. The waveforms of the simulated vibration signal and of each vibration event with a signal-to-noise ratio (SNR) of 20 dB are shown in Figure 3, in which the sampling rate is 25.6 kS/s.
EMD has been proven to be a suitable method for the vibration signal processing of HVCBs. We mainly compare the performances of VMD and EMD in decomposing this simulated vibration signal (with an SNR of 20 dB). In addition, VMD performance is also compared with a few new and improved EMD-related methods, i.e., LMD [9], ensemble EMD (EEMD) [30], and complete EEMD (CEEMD) [31]. The original vibration events and the IMFs decomposed by these five methods are shown in Figure 4.
When the number of vibration events and the corresponding parameters of the simulated signal are known, the performance of each signal-processing method can be determined by comparing the correlation degrees of its modes and the original vibration events. Figure 4b shows that the signal is decomposed into five IMFs by VMD, and each IMF is mostly the same as the corresponding vibration event in Figure 4a. That is, the VMD approach can decompose vibration signals thoroughly. Conversely, we obtain approximately 10 IMFs through the EMD, LMD, EEMD, and CEEMD approaches. In Figure 4c, the modes decomposed by EMD have a serious mode aliasing problem, especially the second mode. Although LMD is better than EMD in some aspects, such as endpoint effect suppression and algorithm speed, it shows almost the same modal aliasing as EMD in this study. Both EEMD and CEEMD can eliminate modal aliasing to a certain extent, with CEEMD performing better. EMD and its derived algorithms cannot effectively separate the vibration events from the composite vibration signal because of the limitations inherent in these algorithms. Consequently, the characteristics (such as starting time and spectrum) of each mode obtained by EMD and other similar methods are almost unrelated to the original signal characteristics; thus, these modes fail to reflect the physical significance of each vibration event, i.e., false modes exist. Therefore, the VMD method is more suitable for the feature extraction of HVCB vibration signals.
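A simulated signal of the kind described in this section can be generated in a few lines. The sketch below assumes the common form of a decaying sinusoid, A·exp(−µ(t − t_i))·sin(2πf(t − t_i)) switched on at t_i by the unit step; the exact expression used in the paper is not reproduced here, so this form is an assumption. The event parameters are hypothetical and are not the values of Table 1, and the 20 dB noise is omitted.

```python
import math


def vibration_event(t, amplitude, mu, freq_hz, t_start):
    """One exponentially decaying sinusoid, zero before t_start (assumed form)."""
    if t < t_start:
        return 0.0
    dt = t - t_start
    return amplitude * math.exp(-mu * dt) * math.sin(2.0 * math.pi * freq_hz * dt)


def simulated_signal(times, events):
    """Sum of all vibration events at each time instant."""
    return [sum(vibration_event(t, *event) for event in events) for t in times]


FS = 25_600          # sampling rate from the text
N_SAMPLES = 3_840    # 150 ms at 25.6 kS/s

# Hypothetical (amplitude, mu in 1/s, frequency in Hz, start time in s) tuples.
EVENTS = [
    (1.0, 400.0, 1000.0, 0.010),
    (2.0, 600.0, 3000.0, 0.030),
    (1.5, 500.0, 5000.0, 0.060),
]

times = [k / FS for k in range(N_SAMPLES)]
signal = simulated_signal(times, EVENTS)
```

Because each event has its own start time and frequency, a decomposition method can be judged by how well its modes line up with the individual events, which is the comparison made in Figure 4.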

Determining the Number of K Modes of VMD
The number of modes K must be predefined in the VMD method. Each mode component of VMD contains local features of the original signal at a center frequency and at different time scales. A large K means that VMD retains abundant frequency components, so the signal reconstructed from the K modes will be highly similar to the original signal. The measured vibration signals of HVCBs contain a large number of vibration components; thus, the analysis should focus on the main vibration events rather than on all vibration components. Therefore, we determine K by comparing the similarity between the reconstructed and original signals.
Distance is a common measure of pattern similarity. The normalized distance (ND) is selected to evaluate the similarity between the original and reconstructed signals for different mode numbers. The ND is defined for two discrete signals $p = (p_1, p_2, \ldots, p_n)$ and $q = (q_1, q_2, \ldots, q_n)$. VMD is used to decompose the simulated vibration signal with different K and to compute the corresponding reconstructed signals. The NDs between the reconstructed and original signals with different K are shown in Figure 5.
Figure 5 shows that the ND almost does not change when K becomes greater than 5 and remains at a near-zero value. In this case, the similarity between the original and reconstructed signals is maximized, i.e., the reconstructed signal contains all the main information characteristics of the original signal. Hence, the optimal number of modes of VMD is set at 5, which is consistent with the number of vibration events contained in the original vibration signal. Accordingly, the ND method is effective for mode number selection.
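The selection rule can be illustrated with a short sketch. The ND formula itself is not reproduced in this text, so the code assumes one common definition, the Euclidean distance between the two signals divided by the norm of the original. The leveling-off rule, the tolerance, and the ND values in the example are likewise hypothetical.

```python
import math


def normalized_distance(p, q):
    """Assumed ND: ||p - q||_2 / ||p||_2, with p taken as the original signal."""
    if len(p) != len(q):
        raise ValueError("signals must have the same length")
    norm_p = math.sqrt(sum(a * a for a in p))
    if norm_p == 0.0:
        raise ValueError("original signal must not be all zeros")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return diff / norm_p


def choose_mode_number(nd_by_k, tolerance):
    """Smallest K after which the ND changes by less than `tolerance`."""
    ks = sorted(nd_by_k)
    for k, k_next in zip(ks, ks[1:]):
        if abs(nd_by_k[k] - nd_by_k[k_next]) < tolerance:
            return k
    return ks[-1]


# Hypothetical ND values for K = 2..7, shaped like a curve that flattens out.
example_nd = {2: 0.60, 3: 0.40, 4: 0.20, 5: 0.02, 6: 0.015, 7: 0.014}
best_k = choose_mode_number(example_nd, tolerance=0.01)
```

With the made-up values above the rule returns the first K at which the curve has flattened, mirroring how the mode number is read off Figure 5.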

SVM
SVM, proposed by Vapnik in 1995, has many advantages in solving small-sample, high-dimensional, and nonlinear pattern recognition problems [32]. The basic principle of SVM is to map the data samples from a low-dimensional space to a high-dimensional one so that data that are inseparable in the low-dimensional space become linearly separable. A linear partition is then used to determine the classification boundary. The classification principle of SVM is shown in Figure 6.
We suppose that the training sample set $(x_i, y_i)$ $(i = 1, 2, \cdots, l;\ x_i \in R^d,\ y_i \in \{-1, 1\})$ is composed of two different sample classes. The samples are linearly separable when a hyperplane $w \cdot x + b = 0$ can correctly divide them into two classes, i.e., when they satisfy:
$$y_i (w \cdot x_i + b) \geq 1, \quad i = 1, 2, \cdots, l \tag{9}$$
The samples that satisfy $|w \cdot x_i + b| = 1$ are called support vectors. The distance between the two classes of support vectors is $2/\|w\|$, i.e., the classification margin is $2/\|w\|$. The goal of SVM is to seek the optimal hyperplane under the constraints in Equation (9) that maximizes $2/\|w\|$, i.e., minimizes $\|w\|^2/2$:
$$\min_{w, b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \geq 1, \quad i = 1, 2, \cdots, l \tag{10}$$
For most situations, the samples in the training set are linearly inseparable. SVM introduces a slack variable $\xi_i \geq 0$ to relax the constraint to $y_i (w \cdot x_i + b) \geq 1 - \xi_i$. Meanwhile, a penalty factor $C$ is introduced to control the degree of punishment for misclassified samples. Thus, the objective function becomes:
$$\min_{w, b, \xi} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \geq 1 - \xi_i,\ \xi_i \geq 0 \tag{11}$$
This problem can be solved through the saddle point of the Lagrange function, which is constructed as:
$$L(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i - \sum_{i=1}^{l} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{l} \beta_i \xi_i \tag{12}$$
where $\alpha_i \geq 0$ and $\beta_i \geq 0$ are Lagrange coefficients. Equation (12) is converted into the following dual problem according to dual theory:
$$\max_{\alpha} \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{l} \alpha_i y_i = 0,\ 0 \leq \alpha_i \leq C \tag{13}$$
The optimal solution of the quadratic programming problem, $\alpha = [\alpha_1, \alpha_2, \cdots, \alpha_l]^T$, can be obtained, followed by the optimal $w$ and $b$. The optimal decision function is:
$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i (x_i \cdot x) + b \right) \tag{14}$$
where $\mathrm{sgn}(z)$ is the sign function, which equals +1 for $z \geq 0$ and −1 otherwise. For a nonlinear classification problem, SVM uses a mapping $\phi(x)$ to map the sample data from a low-dimensional space to a high-dimensional one, making these samples linearly separable. The kernel function is defined as follows:
$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \tag{15}$$
After introducing the kernel function, Equation (13) becomes:
$$\max_{\alpha} \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{l} \alpha_i y_i = 0,\ 0 \leq \alpha_i \leq C \tag{16}$$
The decision function becomes:
$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right) \tag{17}$$
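Once the multipliers are known, evaluating the kernel decision function takes only a few lines. The sketch below assumes a Gaussian (RBF) kernel, which this section does not specify, and uses hand-picked support vectors and multipliers for illustration; they are not the solution of any particular training problem.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decide(x, support_vectors, labels, alphas, b, gamma):
    """Sign of the kernel expansion: +1 for a score >= 0, otherwise -1."""
    score = b + sum(
        alpha * y * rbf_kernel(sv, x, gamma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1


# Hypothetical toy model: one support vector per class, symmetric about the origin.
SUPPORT_VECTORS = [(-1.0, 0.0), (1.0, 0.0)]
LABELS = [-1, 1]
ALPHAS = [1.0, 1.0]   # satisfies sum(alpha_i * y_i) = 0
BIAS = 0.0
GAMMA = 0.5

right = svm_decide((2.0, 0.0), SUPPORT_VECTORS, LABELS, ALPHAS, BIAS, GAMMA)
left = svm_decide((-2.0, 0.0), SUPPORT_VECTORS, LABELS, ALPHAS, BIAS, GAMMA)
```

Only the support vectors enter the sum, which is why a trained SVM stays cheap to evaluate even when the training set is large.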

OCSVM
OCSVM also maps the training data into a high-dimensional feature space by using the kernel function. Unlike SVM, OCSVM aims to separate the sample data from the origin with a maximum margin. The object and non-object samples are located on either side of the hyperplane. The classification principle of OCSVM is shown in Figure 7. For convenience, we still use $\{x_i\}$ $(i = 1, 2, \cdots, l;\ x_i \in R^d)$ to represent the training sample set.
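Similar to SVM, the classification hyperplane of OCSVM is expressed as $w \cdot \phi(x) - b = 0$. OCSVM solves the following quadratic programming problem:
$$\min_{w, \xi, b} \frac{1}{2}\|w\|^2 + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_i - b \quad \text{s.t.} \quad w \cdot \phi(x_i) \geq b - \xi_i,\ \xi_i \geq 0 \tag{18}$$
where $\nu \in (0, 1]$ is the margin of error that controls the number of outliers. The decision function is as follows:
$$f(x) = \mathrm{sgn}\left( w \cdot \phi(x) - b \right) \tag{19}$$
According to Equation (19), the decision function $f(x)$ takes the value +1 or −1. A test sample is considered an object sample when $f(x) = +1$. Therefore, once $w$ and $b$ are solved, we can determine the sample class.
Lagrange multipliers are introduced to solve the above quadratic programming problem. The Lagrange function is as follows:
$$L(w, \xi, b, \alpha, \beta) = \frac{1}{2}\|w\|^2 + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_i - b - \sum_{i=1}^{l} \alpha_i \left( w \cdot \phi(x_i) - b + \xi_i \right) - \sum_{i=1}^{l} \beta_i \xi_i \tag{20}$$
where $\alpha_i, \beta_i \geq 0$ are Lagrange multipliers. We set the partial derivatives of Equation (20) with respect to $w$, $\xi$, and $b$ equal to zero, yielding:
$$w = \sum_{i=1}^{l} \alpha_i \phi(x_i), \quad \alpha_i = \frac{1}{\nu l} - \beta_i \leq \frac{1}{\nu l}, \quad \sum_{i=1}^{l} \alpha_i = 1 \tag{21}$$
Combined with the kernel function in Equation (15), the dual form of this optimization problem is described as:
$$\min_{\alpha} \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j K(x_i, x_j) \quad \text{s.t.} \quad 0 \leq \alpha_i \leq \frac{1}{\nu l},\ \sum_{i=1}^{l} \alpha_i = 1 \tag{22}$$
The support vector is located on the hyperplane; thus, $b$ can be found from a support vector $x_i$ and the corresponding $\alpha_i$:
$$b = w \cdot \phi(x_i) = \sum_{j=1}^{l} \alpha_j K(x_j, x_i) \tag{23}$$
The decision function together with Equation (15) can be transformed into a kernel expansion form:
$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i K(x_i, x) - b \right) \tag{24}$$
Figures 6 and 7 illustrate that the support vectors of OCSVM lie on the classification hyperplane, whereas those of SVM lie on both sides of the hyperplane at a certain distance. Accordingly, OCSVM can identify the non-target samples more accurately and has a higher capability of fault identification than SVM in the fault diagnosis of HVCBs.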
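The way the offset is recovered from a support vector and then used to accept or reject a sample can be sketched as follows. A Gaussian kernel is assumed, and the support vectors and multipliers are hand-picked so that they sum to one; they illustrate the formulas and are not the output of an actual training run.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def ocsvm_offset(support_vectors, alphas, sv_index, gamma):
    """Offset b obtained by evaluating the kernel expansion at one support vector."""
    x_s = support_vectors[sv_index]
    return sum(a * rbf_kernel(sv, x_s, gamma) for sv, a in zip(support_vectors, alphas))


def ocsvm_decide(x, support_vectors, alphas, b, gamma):
    """+1 if x falls on the object side of the hyperplane, otherwise -1."""
    score = sum(a * rbf_kernel(sv, x, gamma) for sv, a in zip(support_vectors, alphas)) - b
    return 1 if score >= 0 else -1


# Hypothetical model: three support vectors with equal multipliers (sum = 1).
SVS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ALPHAS = [1.0 / 3.0] * 3
GAMMA = 1.0

b = ocsvm_offset(SVS, ALPHAS, sv_index=0, gamma=GAMMA)
inside = ocsvm_decide((0.3, 0.3), SVS, ALPHAS, b, GAMMA)
outside = ocsvm_decide((5.0, 5.0), SVS, ALPHAS, b, GAMMA)
```

A sample far from every support vector receives a score close to −b and is rejected, which is the property the multi-layer classifier exploits to flag conditions it was never trained on.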

Singular Value Decomposition (SVD)
SVD [33] is an important matrix decomposition method that is widely used in feature extraction. According to SVD theory, for an $m \times n$ matrix $A$ ($A \in R^{m \times n}$), there must exist two orthogonal matrices $U_{m \times m}$ and $V_{n \times n}$, and a diagonal matrix $\Lambda$, satisfying:
$$A = U \Lambda V^T, \quad \Lambda = \begin{bmatrix} \mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_r) & 0 \\ 0 & 0 \end{bmatrix} \tag{25}$$
where $\lambda_i$ $(i = 1, 2, \cdots, r)$ are the singular values of matrix $A$, and $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r \geq 0$. The singular values tend to correspond to the important information implied in the matrix, and the importance is positively correlated with the value. The SVD of a matrix has the following property: we assume matrices $A, B \in R^{m \times n}$, and the singular values of $A$ and $B$ are $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_R \geq 0$ and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_R \geq 0$, respectively, where $R = \min(m, n)$. Then:
$$|\lambda_i - \sigma_i| \leq \|A - B\|_2, \quad i = 1, 2, \cdots, R \tag{26}$$
This property indicates that when matrix $A$ is slightly disturbed, the changes in its singular values are not greater than the spectral norm of the perturbation matrix. Hence, the singular values of a matrix are insensitive to changes in the matrix elements.
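The stability property can be checked numerically. The sketch below perturbs a random matrix and compares the largest shift in any singular value with the spectral norm of the perturbation; the matrix size, the random seed, and the noise level are arbitrary choices made for illustration.

```python
import numpy as np


def singular_value_shift(a, b):
    """Return (largest |lambda_i - sigma_i|, spectral norm of a - b)."""
    sv_a = np.linalg.svd(a, compute_uv=False)   # singular values, descending
    sv_b = np.linalg.svd(b, compute_uv=False)
    max_shift = float(np.max(np.abs(sv_a - sv_b)))
    bound = float(np.linalg.norm(a - b, 2))     # largest singular value of a - b
    return max_shift, bound


rng = np.random.default_rng(0)
A = rng.standard_normal((10, 200))              # stand-in for a K x N matrix
E = 1e-3 * rng.standard_normal((10, 200))       # small disturbance
max_shift, bound = singular_value_shift(A, A + E)
```

The shift never exceeds the bound, so small measurement noise in the matrix entries translates into equally small changes in the features derived from the singular values.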

Feature Extraction Based on LSVD
In the feature extraction of the vibration signal of circuit breakers, a few energy-based features, such as the time segmentation energy entropy (TSEE), are often used as signal features [9]. However, the energy feature of the signal is sometimes not enough to reflect the fault characteristics of the signal accurately. SVD is an effective method to extract the algebraic feature of a matrix, which can better reflect the changes in the internal characteristics of the signal.
The LSVD method is used in this study to extract HVCB vibration features and to improve the disturbance detection capability of SVD. A sample sequence of length N can be decomposed into K IMFs by VMD. The data length of each IMF is also N. Hence, the size of the IMF matrix is K × N. The research in [34] showed that the singular values of the entire matrix cannot indicate the local and detailed features of the matrix. For some faults of HVCBs, such as a time delay fault, the singular values of the entire matrix tend not to reflect the fault characteristic information. Therefore, more detailed local information in the time domain is required. The local information of HVCB vibration signals in different time periods is obtained using the LSVD method, which is as follows:
(1) VMD is used to decompose the HVCB vibration signals to obtain the IMF matrix.
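The idea of local singular values can be sketched with NumPy. The remaining steps of the procedure are not reproduced here, so the code follows the summary given in the abstract: the IMF matrix is divided into submatrices and the maximum singular value of each submatrix becomes one feature. Splitting along the time axis into 20 equal blocks is an assumption, as are the function name and the synthetic test matrix.

```python
import numpy as np


def lsv_features(imf_matrix, n_blocks=20):
    """Largest singular value of each time-domain block of a K x N IMF matrix."""
    imf_matrix = np.asarray(imf_matrix, dtype=float)
    blocks = np.array_split(imf_matrix, n_blocks, axis=1)   # split along time
    return np.array([np.linalg.svd(block, compute_uv=False)[0] for block in blocks])


# Synthetic K x N matrix: silence everywhere except a short burst in one mode.
K, N = 10, 3840
imfs = np.zeros((K, N))
imfs[2, 1000:1100] = np.sin(2.0 * np.pi * 0.05 * np.arange(100))

features = lsv_features(imfs)

# The same burst delayed by 400 samples moves the peak to a later feature.
delayed = np.roll(imfs, 400, axis=1)
features_delayed = lsv_features(delayed)
```

Because each feature covers its own time window, a delayed vibration shows up as a shifted peak in the feature vector, which singular values computed on the whole matrix would not reveal.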

Feature Analysis of Measured Vibration Signal
HVCB vibration data are collected using the acquisition system in Figure 1. As mentioned previously, the number of VMD modes should be predefined. According to the abovementioned method for determining the number of modes, we use VMD to decompose the four types of vibration signals with different K. The NDs between their corresponding reconstructed and measured signals are then computed, which are shown in Figure 9.  Figure 9 shows that the NDs of the four signal types decrease with the increase of K. When K is greater than 10, the changes in ND values show signs of leveling off. The number of K modes is set to 10 to guarantee that the four signal types can be effectively decomposed.
The normal and fault vibration signals are decomposed by VMD, and the corresponding IMFs are shown in Figure 10. The ten modes of each signal type are arranged from top to bottom in order of increasing center frequency, and the red dashed line indicates the starting time $t_s$ of a normal vibration signal. Figure 10 reveals some characteristics of the fault signals in the time or frequency domain. Compared with the normal state, the vibration of Fault I has a significant time delay. The amplitudes of the last seven modes of Fault II are significantly smaller than those of the normal state, i.e., the vibration is concentrated in a lower-frequency area. The vibration duration in the different modes of Fault III is longer than that of the other signal types because of the poor lubrication of the connecting lever.
The LSVD method is adopted to extract the features of the vibration signals. The LSV feature vectors of the normal condition and the three fault conditions are shown in Figure 11. For clarity, only three feature vectors are displayed for each type. Figure 11 shows that the feature vectors of the different types of vibration signals differ significantly. The peak of the feature vector appears around the fourth feature for the normal condition, the seventh feature for Fault I, the sixth feature for Fault II, and the fifth feature for Fault III. The variations in the 10th to 20th features of the four signals are also different. The classifier can therefore separate the signal types according to the differences among these feature vectors. A comparison of Figures 8 and 11 shows that these feature vectors roughly reflect the energy distributions of the corresponding vibration signals in the time domain.
To validate the LSV feature vectors, we also extract features from the vibration signals with the whole SVD (WSVD) method, in which the entire IMF matrix is directly decomposed into K (K = 10 here) singular values by SVD [35]. The whole singular value (WSV) feature vectors of the four types of vibration signals are shown in Figure 12. As Figure 12 shows, the WSVD method may fail to distinguish normal signals from Fault I signals. Fault I is essentially a time-delay fault whose vibration pattern is the same as that of normal signals. Thus, almost all the major elements of the IMF matrix of Fault I are the same as those of the normal signal, and the WSV feature of Fault I is nearly equal to that of the normal condition. Unlike LSVD, the WSVD method cannot directly reflect how the vibration of the original signal evolves over time. Thus, the LSVD approach is more suitable for the feature extraction of HVCB vibration signals.
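The argument above can be checked on a toy example. The sketch below assumes that LSVD splits the IMF matrix (modes by samples) into equal-width time segments and keeps the largest singular value of each; the segment count of 20, the function names, and the synthetic damped-sinusoid "IMFs" are illustrative assumptions, not values taken from the experiments.

```python
import numpy as np


def lsv_features(imf_matrix, n_segments=20):
    """Largest singular value of each time segment of a (modes x samples) matrix."""
    segments = np.array_split(imf_matrix, n_segments, axis=1)
    return np.array([np.linalg.svd(seg, compute_uv=False)[0] for seg in segments])


def wsv_features(imf_matrix):
    """Singular values of the whole IMF matrix (one per mode)."""
    return np.linalg.svd(imf_matrix, compute_uv=False)


def toy_imf_matrix(start, n_modes=10, n_samples=2000):
    """Synthetic stand-in for an IMF matrix: damped sinusoids starting at `start`."""
    t = np.arange(n_samples - start)
    imf = np.zeros((n_modes, n_samples))
    for k in range(n_modes):
        imf[k, start:] = np.exp(-t / 50.0) * np.sin(2 * np.pi * 0.01 * (k + 1) * t)
    return imf


normal = toy_imf_matrix(start=200)    # vibration begins at sample 200
delayed = toy_imf_matrix(start=500)   # same vibration, delayed by 300 samples

# Whole-matrix singular values barely notice the delay ...
print(np.allclose(wsv_features(normal), wsv_features(delayed)))
# ... while the peak of the local singular values moves with it.
print(np.argmax(lsv_features(normal)), np.argmax(lsv_features(delayed)))
```

Shifting the columns of a matrix in time leaves its singular values essentially unchanged, which is why the WSV features of a pure time-delay fault coincide with those of the normal state, whereas the per-segment features track where in time the energy sits.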

Fault Classification Using MLC
The LSV feature vectors are fed into the MLC to obtain the classification results. The MLC consists of three classifiers: OCSVM1, OCSVM2, and SVM. These classifiers need to be trained first. For each type of vibration signal, 40 recordings are available. We randomly select 20 of them as training samples and use the other 20 as test samples. OCSVM1 is trained using the normal training samples, whereas OCSVM2 and SVM are trained using the fault training samples. SVM is the most widely used classifier in HVCB fault diagnosis and has achieved good classification results.
We compare the classification performances of MLC and SVM. The experimental results are shown in Table 2, where "New Fault" refers to a fault type that has not been recorded before, i.e., an unknown fault type. According to Table 2, the MLC method correctly recognizes all three types of fault states, with classification accuracies of 100%. In contrast, SVM recognizes three samples of Fault III as normal samples, and the corresponding classification accuracy is 85%. This comparison shows that the new approach has a higher capability of fault identification. For the normal state, two samples are wrongly classified by MLC and one by SVM. For HVCBs, normal samples that are recognized as fault samples do not cause accidents or outage costs, so the operational reliability of the device is not reduced by the new method. Therefore, the new method improves the accuracy of fault diagnosis while ensuring the reliability of HVCBs. When the WSV is selected as the input feature vector of the classifier, the corresponding classification results using MLC and SVM are shown in Table 3. As Tables 2 and 3 show, the accuracy of fault diagnosis using the WSVD method is lower than that using the LSVD method, which illustrates that the WSVD approach is unsuitable for representing the features of HVCB vibration signals. Even in this situation, the overall classification accuracy of MLC remains higher than that of SVM.
The case in which a new fault type without training samples appears among the test samples is also considered. We assume that Fault III is the new fault, so the training samples of Fault III do not participate in the training of OCSVM2 and SVM, and the test samples of Fault III are selected as the test sample set. The classification results of MLC and SVM in this situation are compared in Table 4. Table 4 shows that when a new fault type occurs, SVM cannot accurately identify the fault samples because it lacks the corresponding training. All fault samples are recognized as the normal state, which significantly reduces the fault diagnosis accuracy of SVM. In contrast, MLC identifies the fault state with 100% accuracy. Thus, the new method has higher accuracy for the diagnosis of unknown fault types. When a new fault is recognized, its specific fault type can be determined from the overall report made by the maintenance personnel. In this way, fault samples continue to accumulate and more fault types can be covered.
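The three-stage decision described in this section can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `MultiLayerClassifier`, the RBF kernel, the values of `nu` and `gamma`, and the two-dimensional synthetic "feature vectors" are all hypothetical, chosen only so that the cascade logic (OCSVM1, then OCSVM2, then SVM) is easy to follow.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM


class MultiLayerClassifier:
    """Cascade sketch: OCSVM1 (normal?) -> OCSVM2 (known fault?) -> SVM (which fault?)."""

    def __init__(self, nu=0.1, gamma=0.2):
        # Kernel and hyperparameters are illustrative assumptions.
        self.ocsvm1 = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
        self.ocsvm2 = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
        self.svm = SVC(kernel="rbf", gamma=gamma)

    def fit(self, normal_x, fault_x, fault_y):
        self.ocsvm1.fit(normal_x)        # trained on normal samples only
        self.ocsvm2.fit(fault_x)         # trained on all known fault samples
        self.svm.fit(fault_x, fault_y)   # separates the known fault types
        return self

    def predict(self, x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        is_normal = self.ocsvm1.predict(x) == 1   # +1 means inlier
        is_known = self.ocsvm2.predict(x) == 1
        fault_type = self.svm.predict(x)
        return [
            "normal" if n else (str(f) if k else "new fault")
            for n, k, f in zip(is_normal, is_known, fault_type)
        ]


# Synthetic, well-separated clusters standing in for LSV feature vectors.
rng = np.random.default_rng(0)


def cluster(center, n=20):
    return rng.normal(loc=center, scale=0.3, size=(n, 2))


normal_train = cluster([0.0, 0.0])
fault_train = np.vstack([cluster([5.0, 0.0]), cluster([0.0, 5.0])])
fault_labels = ["Fault I"] * 20 + ["Fault II"] * 20

clf = MultiLayerClassifier().fit(normal_train, fault_train, fault_labels)
print(clf.predict([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]]))
```

The last query point lies far from every training cluster, so both one-class stages reject it and it is reported as a new fault instead of being forced into one of the known classes, which is the failure mode of a plain multi-class SVM discussed above.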

Conclusions
This paper proposes a diagnosis method for HVCB mechanical faults based on VMD and MLC. The simulation and practical tests demonstrate the following advantages of the new approach: (1) Compared with EMD, the modes decomposed by VMD have a clearer physical meaning. VMD reduces the influence of false modes on feature extraction and represents the features of vibration signals better.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
As mentioned in the main text, the variational problem is written as: The corresponding augmented Lagrangian is constructed as: The variational problem (A1) can be solved through the saddle point of the augmented Lagrangian (A2). In this paper, the ADMM approach is used to solve the saddle point, in which $m_k^{n+1}$ is updated as follows: For simplicity, $\cdot^{n+1}$ and $\cdot^{n}$ are omitted for the fixed directions $m_{i \neq k}$ and $\omega_k$, respectively. They represent the most recent available updates. Problem (A4) can be solved in the frequency domain based on the Parseval-Plancherel theorem, which is as follows: Equation (A6) can be written as the integral over the non-negative frequencies using Hermitian symmetry, which is as follows: The solution of the quadratic optimization problem is: Equation (A8) shows that $\hat{m}_k^{n+1}$ is the Wiener filtering of the current residual with the signal prior $1/(\omega - \omega_k)^2$. $\hat{m}_k(\omega)$ can be transformed into the mode $m_k(t)$ using the inverse FT.
Similarly, $\omega_k^{n+1}$ is updated as follows: The center frequencies $\omega_k$ appear only in the first term of Equation (A2). Thus, the relevant problem can be written as: This optimization problem is transformed into the Fourier domain and eventually turns into the following form: The solution of this quadratic problem is: It shows that the new $\omega_k$ is the center of gravity of the power spectrum of the recent mode.
Finally, the update of $\alpha^{n+1}$ is as follows: where $\tau$ is the update parameter of the Lagrange multiplier.
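For reference, the standard VMD formulation and its ADMM updates can be written in the notation of this appendix as follows. This is a sketch and not a transcription of the authors' equations: $m_k$, $\omega_k$, $\alpha$ (the Lagrange multiplier), and $\tau$ follow the text above, whereas $f$ for the measured signal and $\beta$ for the quadratic penalty weight are assumed symbols, and no correspondence to the specific equation numbers (A1) to (A8) is implied.

```latex
% Constrained variational problem (f and \beta are assumed symbols)
\min_{\{m_k\},\{\omega_k\}} \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * m_k(t) \right]
  e^{-j\omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{K} m_k(t) = f(t)

% Augmented Lagrangian; the center frequencies appear only in the first term
L\left(\{m_k\},\{\omega_k\},\alpha\right) =
  \beta \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * m_k(t) \right]
  e^{-j\omega_k t} \right\|_2^2
  + \left\| f(t) - \sum_{k=1}^{K} m_k(t) \right\|_2^2
  + \left\langle \alpha(t),\, f(t) - \sum_{k=1}^{K} m_k(t) \right\rangle

% Mode update: Wiener filtering of the current residual
\hat{m}_k^{n+1}(\omega) =
  \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{m}_i(\omega) + \hat{\alpha}(\omega)/2}
       {1 + 2\beta\,(\omega - \omega_k)^2}

% Center-frequency update: center of gravity of the mode's power spectrum
\omega_k^{n+1} =
  \frac{\int_0^{\infty} \omega \left| \hat{m}_k(\omega) \right|^2 \, d\omega}
       {\int_0^{\infty} \left| \hat{m}_k(\omega) \right|^2 \, d\omega}

% Multiplier update with step parameter \tau
\hat{\alpha}^{n+1}(\omega) = \hat{\alpha}^{n}(\omega)
  + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{m}_k^{n+1}(\omega) \right)
```

The denominator of the mode update is what gives the filter its $1/(\omega - \omega_k)^2$ prior: components far from $\omega_k$ are strongly attenuated, and a larger $\beta$ narrows the bandwidth of each mode.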