1. Introduction
The universal circuit breaker is a critical control and protection device in power systems [1,2]. Its operational state directly influences the security of the power grid and the reliability of the power supply [3]. However, universal circuit breakers are susceptible to malfunctions that can seriously jeopardize the safe operation of the electrical system [4]. Therefore, diagnosing the operational state of universal circuit breakers and detecting faults is paramount [5].
Research on the fault diagnosis of universal circuit breakers focuses primarily on two areas: feature extraction from vibration signals and fault diagnosis models. Universal circuit breakers generate vibration signals during operation that contain substantial characteristic information [6,7]. However, these vibration signals are complex and high-dimensional, so they require signal processing and feature extraction. Insufficient extraction may lead to under-representation, while too many feature dimensions lead to redundant representations; both reduce diagnostic accuracy [8]. Researchers have conducted extensive investigations into feature extraction methods for universal circuit breakers. The Fourier transform is effective at identifying frequency information and processing periodic signals. However, it struggles with nonlinear and non-stationary signals and cannot capture instantaneous changes in the time and frequency domains of the signal [9]. The wavelet transform efficiently highlights localized information in the time-frequency domain and helps analyze nonlinear and non-stationary vibration signals [10]. However, it is susceptible to the influence of adjacent harmonics. Empirical Mode Decomposition (EMD) is suited to processing nonlinear and non-stationary signals. It can address the issue of adjacent harmonics by decomposing signals into multiple Intrinsic Mode Functions (IMFs) [11]. However, EMD suffers from mode aliasing and end effects [12]. Compared to EMD [13], Ensemble Empirical Mode Decomposition (EEMD) alleviates the mode aliasing and end effect problems. However, it requires manual selection of the amplitude parameter, lacks a solid theoretical basis, and can lead to inaccurate signal decomposition. Variational Mode Decomposition (VMD) is a method that can effectively handle nonlinear and non-stationary signals [14] and exhibits superior fault feature extraction capabilities. VMD can resolve harmonic issues through IMF decomposition [15] and avoid mode aliasing and end effects. Moreover, VMD is grounded in a solid theoretical foundation [16], allowing for the principled selection of hyperparameters, and its results have been extensively verified. Applying VMD to feature extraction from universal circuit breaker vibration signals [17] is therefore expected to yield promising results.
For universal circuit breakers, fault diagnosis models often utilize support vector machines (SVM), random forests, and neural networks [18,19,20,21]. SVM can handle high-dimensional vibration signal data and maintain good diagnostic accuracy in small sample spaces. However, adjusting its kernel function and hyperparameters can be challenging. Additionally, SVM is inherently a binary classifier, which increases model complexity when dealing with multi-classification problems [22]. Random forests are effective in handling multi-classification problems and are easier to tune. However, they have limitations regarding model expressiveness and high-dimensional feature mapping [23]. Neural network models are widely employed because they capture highly complex data relationships in large sample spaces and effectively handle multi-classification problems. However, the working characteristics of universal circuit breakers make it difficult to obtain many fault samples, which limits the potential application of neural network models [24,25,26]. The emergence of deep belief networks (DBNs) [27] has provided new approaches to address these challenges. DBNs possess excellent nonlinear mapping capabilities, allowing them to learn complex mapping relationships efficiently [28,29,30]. They demonstrate good model representation ability even in small sample spaces, presenting a significant advantage in fault diagnosis. However, the performance of DBNs is greatly influenced by parameters such as the number of neurons in the hidden layers, the network learning rate, and the number of backward fine-tuning iterations. The proper configuration of these parameters is crucial for achieving high recognition accuracy. To address this, we introduce a promising intelligent optimization algorithm, the whale optimization algorithm (WOA) [31,32,33,34]. WOA offers advantages such as avoiding local optima, high adaptability, and fast and accurate parameter optimization. It provides an effective means to optimize DBN parameters and improve fault diagnosis accuracy.
Based on the analysis above, this paper proposes a universal circuit breaker fault diagnosis method that combines VMD and an improved DBN. The method first decomposes the vibration signal into multiple modal components using VMD. Time- and frequency-domain features of these modal components are then extracted to construct the feature sample space. The parameters of the DBN are optimized using the WOA, and the optimized DBN is used for fault diagnosis of universal circuit breakers, resulting in a multi-classification fault diagnosis model suitable for small sample spaces.
  2. Experiment Setup and Data Acquisition
This section focuses on the design of the experimental setup for collecting vibration signals and generating corresponding faulty vibration data for fault diagnosis.
  2.1. Experimental Dataset
Figure 1 illustrates the experimental setup for collecting vibration signals under operating conditions. This paper used the universal circuit breaker model HSW1-2000, manufactured by Hangzhou Zhijiang Switchgear Stock Co., Ltd., Hangzhou, China, which is shown in Figure 1a. The sensor that acquires the vibration signal was mounted on the universal circuit breaker’s top beam, which satisfied the installation location and detection direction specifications while enabling non-invasive detection. The data acquisition system in Figure 1b consisted of a vibration sensor, a constant current adapter, and an oscilloscope. The vibration sensor converted mechanical vibrations into electrical signals that could be easily transmitted, transformed, processed, and stored. The constant current adapter provided the vibration sensor with its appropriate supply, and the oscilloscope was connected to the vibration sensor to collect the signal from it.
 Under various operating conditions, the experimental platform gathered vibration signals from the universal circuit breaker. Six distinct operating conditions were simulated during the universal circuit breaker closing process: normal condition, mechanical structure stuck, insufficient core travel, core stuck, high operating voltage, and low operating voltage. The vibration acceleration sensor was a YK-YD500, with a maximum measurable impact acceleration of 500 g. The vibration signals were sampled at 500 kHz over a sampling duration of 20 ms.
Table 1 shows the configuration of the data collection system for universal circuit breakers. The sampled signal was stored on a USB flash disk using the oscilloscope and subsequently transferred to the upper computer for further analysis.
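As a quick check of what these acquisition settings imply, the following minimal sketch computes the record length, sample spacing, and Nyquist limit from the 500 kHz sampling frequency and 20 ms duration quoted above. The variable names are illustrative only and do not come from the original acquisition software.

```python
# Sampling settings quoted in the text.
SAMPLING_RATE_HZ = 500_000   # 500 kHz
DURATION_MS = 20             # 20 ms record

# Integer arithmetic avoids floating-point rounding in the sample count.
samples_per_record = SAMPLING_RATE_HZ * DURATION_MS // 1000
sample_interval_us = 1_000_000 / SAMPLING_RATE_HZ
nyquist_hz = SAMPLING_RATE_HZ // 2

print(samples_per_record)   # 10000 samples per vibration record
print(sample_interval_us)   # 2.0 microseconds between samples
print(nyquist_hz)           # 250000 Hz highest resolvable frequency
```

Each recorded closing event therefore corresponds to 10,000 samples under these settings.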
 In this paper, five different fault conditions were simulated in the laboratory using specific methods, as described below:
- Mechanical structure stuck fault: A 5 mm-thick copper sheet was wrapped twice around the bottom collision mechanism to simulate this fault. This created resistance during switching and closing impacts, as Figure 2a depicts.
- Insufficient core travel fault: A 10 mm × 10 mm × 0.5 mm wooden block was fixed to the core travel terminals, which prevented them from fully resetting during the switching operation. Refer to Figure 2b for the fault simulation method.
- Core stuck fault: A cylindrical copper sheet with a thickness of 1 mm was inserted around the iron core of the opening and closing tripper. This generated resistance during the opening and closing operation, as shown in Figure 2c.
- High operating voltage fault: The operating voltage was artificially set to 110% of the rated voltage to simulate the fault. For the fault simulation method, refer to Figure 2d.
- Low operating voltage fault: The operating voltage was artificially set to 90% of the rated voltage to simulate the fault. For the fault simulation method, refer to Figure 2e.
For universal circuit breakers, the closing spring stores energy for the opening spring during the closing process. The vibration signal energy was found to be higher during the closing process than during the opening process, and more notable fault signatures appear in the vibration signal during the closing operation. Therefore, this study collects and analyzes the vibration signals generated while the universal circuit breaker is closing. A total of 100 sets of vibration signals were measured for each operating condition using the data acquisition system. Table 2 provides an overview of the data distribution.
Figure 3 displays the time-domain vibration signals of the universal circuit breaker under the different fault simulation scenarios. These signals exhibit prominent nonlinear and non-stationary characteristics. The vibration signal generation process in universal circuit breakers is brief, typically lasting around 20 ms, as depicted in Figure 3. Despite the short duration, these signals contain significant characteristic information. However, effective fault diagnosis is difficult to achieve by analyzing the time-domain signals directly [35]. Therefore, VMD decomposition and feature extraction are performed on the vibration signals [36].
   2.2. VMD Approach for the Vibration Signal
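VMD is used to analyze the signal and extract fault characteristics. Dragomiretskiy et al. proposed VMD in 2014 [37]. The essence of this method is to decompose the original signal $f(t)$ into $K$ time series $u_k(t)$ by iteratively solving a variational model. The constrained variational model is
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$
where $K$ is the number of decomposed modes; $u_k$ is the $k$th modal component after decomposition; $\omega_k$ is the center frequency of each modal component after decomposition; $*$ is the convolution operation; and $f(t)$ is the original signal.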
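By introducing the quadratic penalty factor $\alpha$ and the Lagrange operator $\lambda(t)$, the VMD algorithm transforms the constrained problem into an unconstrained variational problem whose expression is
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$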
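The alternating direction method of multipliers is used to iteratively update $\hat{u}_k$, $\omega_k$, and $\hat{\lambda}$ to find the “saddle point” of Equation (2) and obtain the optimal solution:
Step 1: Initialize $\hat{u}_k^{1}$, $\omega_k^{1}$, $\hat{\lambda}^{1}$, and $n = 0$.
Step 2: Let $n = n + 1$ and start from $k = 1$.
Step 3: Update $\hat{u}_k$ and $\omega_k$ iteratively for $k = 1, \dots, K$:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k^{n}\right)^2}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}$$
Step 4: Update the Lagrange operator:
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right)$$
where $\tau$ is the update parameter of the Lagrangian multiplier; $\tau = 0$ is usually set.
Step 5: Repeat Step 2 to Step 4 until the condition for terminating the iteration is met:
$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon$$
where $\varepsilon$ is usually set to $1 \times 10^{-6}$.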
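The iteration in Steps 1 to 5 can be sketched in a few lines of NumPy. The sketch below is a deliberately simplified illustration, not the implementation used in this paper: it works on the one-sided spectrum from `np.fft.rfft`, omits the mirror extension used by reference VMD implementations, and initializes the center frequencies uniformly. The function name `vmd_sketch` and the two-tone test signal are assumptions made for illustration.

```python
import numpy as np

def vmd_sketch(signal, K, alpha, tau=0.0, eps=1e-6, max_iter=500):
    """Simplified VMD on the one-sided spectrum (no mirror extension)."""
    n_samples = len(signal)
    freqs = np.fft.rfftfreq(n_samples)          # normalized frequency, 0..0.5
    f_hat = np.fft.rfft(signal)
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(K) / K              # assumed uniform initial centers
    lam = np.zeros(freqs.size, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter-like update of mode k around its center frequency.
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency = power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Lagrange multiplier update (no effect when tau = 0).
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Relative change of the modes, summed over k.
        num = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
        den = np.sum(np.abs(u_prev) ** 2, axis=1) + 1e-12
        if np.sum(num / den) < eps:
            break

    modes = np.fft.irfft(u_hat, n=n_samples, axis=1)
    return modes, omega

# Synthetic two-tone signal (illustrative, not measured breaker data).
t = np.arange(1000)
x = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
modes, omega = vmd_sketch(x, K=2, alpha=2000)
print(np.sort(omega))  # center frequencies close to 0.05 and 0.2 cycles/sample
```

On this synthetic signal the two recovered center frequencies settle near the two tone frequencies, which mirrors the separation of modal components by center frequency described in the text.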
Based on prior experience [14], comparative experiments were used to select the VMD parameters applicable to the vibration signals of universal circuit breakers, namely the number of modes $K$ (eight modal components are extracted here) and the quadratic penalty factor $\alpha$. The results of the comparison experiments are shown in Section 4; this section uses the selected parameters. Using the mechanical structure stuck fault as an example, Figure 4 displays the result of decomposing the signal with VMD. The same treatment is applied to the remaining five operating conditions. To validate the effectiveness of the VMD decomposition, the modal components obtained from the decomposition are subjected to spectral analysis, and the results are presented in Figure 5. The modal components are essentially separated from one another, each having its own center frequency.
To analyze the universal circuit breaker’s vibration signals more comprehensively, the extracted time-domain features include standard deviation, mean square root, peak-to-peak value, square root of the mean square value, kurtosis, skewness, waveform factor, crest factor, impulse factor, margin factor, generalized energy, variance, absolute average, maximum value, and shape factor, as shown in 
Table 3.
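To make the time-domain indicators concrete, the sketch below computes a subset of them for one modal component. The formulas are the definitions commonly used in vibration analysis and are assumptions here, since the exact expressions of Table 3 are not reproduced in this text; the function name `time_domain_features` is likewise illustrative.

```python
import numpy as np

def time_domain_features(x):
    """Common time-domain indicators for one modal component (assumed definitions)."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    mean = x.mean()
    std = x.std()
    rms = np.sqrt(np.mean(x ** 2))          # root mean square value
    peak = abs_x.max()
    abs_mean = abs_x.mean()
    sqrt_amp = np.mean(np.sqrt(abs_x)) ** 2  # square-root amplitude
    return {
        "standard_deviation": std,
        "variance": std ** 2,
        "peak_to_peak": x.max() - x.min(),
        "rms": rms,
        "absolute_average": abs_mean,
        "maximum_value": x.max(),
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "shape_factor": rms / abs_mean,
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "margin_factor": peak / sqrt_amp,
    }

# Tiny square-wave example whose indicators are easy to verify by hand.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(features["peak_to_peak"], features["rms"], features["crest_factor"])
```

For the alternating ±1 sequence, the peak-to-peak value is 2 and the RMS, crest factor, and shape factor are all 1, which gives a simple sanity check before applying the function to real modal components.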
Performing a discrete Fourier transform on the vibration signal yields the spectrum and the power spectrum over $V$ frequency points indexed by $v = 1, 2, 3, \dots, V$. The frequency-domain features are built from the $v$th frequency value, the weighted average of all frequency values, the amplitude of the $v$th frequency value, and the weighted average of all frequency amplitudes. The spectrum and power spectrum are analyzed to obtain the following frequency-domain features: spectrum mean, spectrum standard deviation, spectrum root mean square, power spectrum mean, power spectrum variance, power spectrum entropy, power spectral centroid, centroid frequency, center frequency, standard deviation of frequency, spectral skewness, and kurtosis of frequency, as shown in Table 4.
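A few of these frequency-domain indicators can be sketched as follows. The definitions (power-weighted centroid, power-weighted standard deviation of frequency, Shannon entropy of the normalized power spectrum) are common conventions assumed for illustration and may differ in detail from those in Table 4; the function name and the 100 Hz test tone are hypothetical.

```python
import numpy as np

def frequency_domain_features(x, fs):
    """A few spectrum-based indicators for one modal component (assumed definitions)."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)   # frequency of each spectral line
    amplitude = np.abs(np.fft.rfft(x)) / x.size   # amplitude spectrum
    power = amplitude ** 2                        # power spectrum
    p_norm = power / power.sum()                  # power as a probability mass
    centroid = np.sum(freqs * p_norm)
    freq_std = np.sqrt(np.sum((freqs - centroid) ** 2 * p_norm))
    nonzero = p_norm[p_norm > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    return {
        "spectrum_mean": amplitude.mean(),
        "spectrum_std": amplitude.std(),
        "power_spectrum_mean": power.mean(),
        "power_spectrum_variance": power.var(),
        "power_spectrum_entropy": entropy,
        "centroid_frequency": centroid,
        "frequency_std": freq_std,
    }

# Pure 100 Hz tone sampled at 1 kHz (synthetic check signal).
fs = 1000
tone = np.sin(2 * np.pi * 100 * np.arange(1000) / fs)
spec = frequency_domain_features(tone, fs)
print(round(spec["centroid_frequency"], 6))  # 100.0
```

For a single tone, nearly all power sits in one spectral line, so the centroid frequency matches the tone frequency while the frequency standard deviation and power spectrum entropy are close to zero.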
In summary, each vibration signal is first subjected to VMD, which extracts eight modal components. Each modal component yields 27 time- and frequency-domain features (15 time-domain and 12 frequency-domain), resulting in 8 × 27 = 216 features per signal. These features are subsequently fused and normalized for further analysis. The same data processing approach is applied to the remaining five operating conditions, extracting characteristic data for each one. The generated dataset, as shown in Figure 6, exhibits significant differences among the fault types. This observation suggests that both time- and frequency-domain signal features can serve as effective indicators for distinguishing different fault types, and that these features can serve as input vectors for the subsequent fault diagnosis task.
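The fusion and normalization step can be sketched as below. The text does not state which normalization was used, so column-wise min-max scaling is an assumption here, and the random numbers merely stand in for real feature values to show the resulting shapes (6 operating conditions × 100 records, 216 features each).

```python
import numpy as np

N_MODES, N_FEATURES = 8, 27   # eight modal components, 27 features each

def fuse_features(per_mode_features):
    """Flatten an (8, 27) per-mode feature table into one 216-element vector."""
    arr = np.asarray(per_mode_features, dtype=float)
    if arr.shape != (N_MODES, N_FEATURES):
        raise ValueError(f"expected shape {(N_MODES, N_FEATURES)}, got {arr.shape}")
    return arr.reshape(-1)

def min_max_normalize(X):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

# Placeholder data only: random values in place of extracted features.
rng = np.random.default_rng(0)
n_records = 6 * 100
X = np.stack([fuse_features(rng.random((N_MODES, N_FEATURES))) for _ in range(n_records)])
X_norm = min_max_normalize(X)
print(X_norm.shape)  # (600, 216)
```

The 216-column matrix produced this way has the same width as the input layer of the network structures listed later in the paper.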
  3. Methodology
  3.1. Deep Belief Network
A DBN is a neural network architecture that stacks multiple Restricted Boltzmann Machines (RBMs) to form a multi-layer network. The DBN is designed to extract essential features from data by progressively abstracting information from lower to higher layers. Each RBM in the DBN is a two-layer network consisting of a visible layer (v) and a hidden layer (h), with weights (w) connecting the layers. Figure 7 illustrates a DBN structural model stacked from three RBMs. The first visible layer v1 receives the initial input data and forms the first RBM (RBM1) with the first hidden layer h1. The first hidden layer h1 then serves as the second visible layer v2, forming a second RBM (RBM2) with the second hidden layer h2. Similarly, the second hidden layer h2 serves as the third visible layer v3, forming a third RBM (RBM3) with the third hidden layer h3. Units within a layer are not connected to one another, while data are passed between layers through the activation function according to the RBM learning rule. After an RBM learns the input data of the lower layer, its output is used as the input for the higher layer. This process continues layer by layer, forming a feature representation at the higher layer that is more abstract and representative than that of the lower layer.
The DBN integrates data and categories through a probabilistic generative model. It trains and optimizes the network layer by layer, generating a network structure that maximizes the probability of the training data. This approach allows correlations between data points to be extracted and facilitates classification and identification. Neurons within the same layer operate independently, resulting in a network with strong independence, enhanced computational power, and improved training speed.
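The layer-by-layer training described above can be illustrated with a minimal NumPy sketch of greedy RBM pretraining using one-step contrastive divergence (CD-1). This is a generic textbook-style sketch under stated assumptions, not the network trained in this paper: the layer sizes, learning rate, epoch count, and random input are placeholders, and the supervised backward fine-tuning stage is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_prob(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_prob(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr):
        h0 = self.hidden_prob(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_prob(h0_sample)      # reconstruction of the input
        h1 = self.hidden_prob(v1)
        n = v0.shape[0]
        # Positive phase minus negative phase, averaged over the batch.
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

def pretrain_dbn(X, layer_sizes, epochs=5, lr=0.1, seed=0):
    """Greedy pretraining: each RBM's hidden output feeds the next RBM."""
    rng = np.random.default_rng(seed)
    rbms, data = [], X
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(data, lr)
        data = rbm.hidden_prob(data)           # becomes the next visible layer
        rbms.append(rbm)
    return rbms, data

# Placeholder input: 20 normalized feature vectors of width 216.
X_demo = np.random.default_rng(1).random((20, 216))
rbms, top_features = pretrain_dbn(X_demo, layer_sizes=[64, 32, 16])
print(top_features.shape)  # (20, 16)
```

In a full DBN classifier, an output layer would be added on top of the last hidden layer and the whole stack fine-tuned with labeled data.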
  3.2. Whale Optimization Algorithm
Mirjalili et al. proposed WOA, a bionic optimization method [31], based on the hunting strategy of humpback whales. As a global optimization method, WOA has strong exploration and exploitation capabilities. The target prey in WOA is the current optimal solution. During the feeding phase, the whale uses a bubble-net technique while swimming in a shrinking circle around the target. In addition to using bubble nets, whales may also search for prey at random: to facilitate global exploration, during the exploration phase the whales are pushed to swim towards a randomly selected whale rather than towards the best candidate solution currently available. At each stage of optimization, either the spiral model or the shrink-wrap (shrinking encircling) mechanism is chosen, each with a 50% probability, to update the whale’s location. Figure 8 displays the flowchart for WOA.
The following are the precise stages of WOA:
Step 1: Set up the WOA parameters: the spiral size constant $b$, the maximum number of iterations, the size of the whale population, and the dimension of the search space. Additionally, initialize the whales’ positions at random.
Step 2: Evaluate the fitness of each whale and identify the best individual; its position is taken as the prey location.
        
Step 3: When the number of iterations $t$ reaches the maximum number of iterations, the procedure ends with the output of the optimal solution; otherwise, go on to step 4.
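Step 4: Assign a probability $p$ and the coefficient vectors $\vec{A}$ and $\vec{C}$ at random. The whale’s position is updated according to the adaptive change of the search vector $\vec{A}$ and the probability $p$. When $p < 0.5$ and $|\vec{A}| < 1$, the shrink-wrap mechanism updates the position as
$$\vec{D} = \left| \vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t) \right|, \qquad \vec{X}(t+1) = \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}$$
$$\vec{A} = 2\vec{a} \cdot \vec{r} - \vec{a}, \qquad \vec{C} = 2\vec{r}$$
where $t$ is the current iteration number; $\vec{X}(t)$ and $\vec{X}(t+1)$ represent the current and next position vectors, respectively; $\vec{X}^{*}(t)$ denotes the current optimal position vector; $\vec{r}$ is a random vector in the range [0, 1]; $\vec{a}$ is a vector that declines linearly from 2 to 0 throughout the computation; and $\vec{A}$ and $\vec{C}$ are two coefficient vectors. When $p < 0.5$ and $|\vec{A}| \geq 1$, the whale is located via a random search:
$$\vec{D} = \left| \vec{C} \cdot \vec{X}_{rand}(t) - \vec{X}(t) \right|, \qquad \vec{X}(t+1) = \vec{X}_{rand}(t) - \vec{A} \cdot \vec{D}$$
where the location vector $\vec{X}_{rand}$ is random.
When $p \geq 0.5$, the whale’s location is updated as it spirals toward the target position:
$$\vec{X}(t+1) = \vec{D}' \, e^{bl} \cos(2\pi l) + \vec{X}^{*}(t), \qquad \vec{D}' = \left| \vec{X}^{*}(t) - \vec{X}(t) \right|$$
where $b$ is the spiral size constant and $l$ is a random number in the range [−1, 1].
After the whale position has been updated, the procedure returns to step 2.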
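The stages above can be sketched as a short NumPy routine. This is a minimal illustration under stated assumptions rather than the authors' code: the coefficients A and C are drawn as scalars per whale (a common simplification), positions are clipped to the search bounds, and the sphere function is used as a stand-in objective. The names `woa_minimize` and `sphere` and all default values are hypothetical.

```python
import numpy as np

def woa_minimize(fitness, dim, bounds, n_whales=30, max_iter=300, b=1.0, seed=0):
    """Minimal whale optimization sketch; returns (best_position, best_score)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_whales, dim))       # random initial positions
    scores = np.array([fitness(x) for x in X])
    best, best_score = X[scores.argmin()].copy(), scores.min()

    for t in range(max_iter):
        a = 2.0 * (1.0 - t / max_iter)                  # declines linearly from 2 to 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a              # scalar coefficient (simplification)
            C = 2.0 * rng.random()
            p = rng.random()
            if p < 0.5:
                # |A| < 1: shrink around the best whale; otherwise random search.
                target = best if abs(A) < 1.0 else X[rng.integers(n_whales)]
                D = np.abs(C * target - X[i])
                new_x = target - A * D
            else:
                # Spiral update toward the best position.
                l = rng.uniform(-1.0, 1.0)
                D_prime = np.abs(best - X[i])
                new_x = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(new_x, lo, hi)
        # Re-evaluate the population and keep the best solution found so far.
        scores = np.array([fitness(x) for x in X])
        if scores.min() < best_score:
            best, best_score = X[scores.argmin()].copy(), scores.min()
    return best, best_score

def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return float(np.sum(x ** 2))

best_position, best_score = woa_minimize(sphere, dim=5, bounds=(-10.0, 10.0))
print(best_score)  # a value very close to 0
```

Because `a` shrinks to zero, |A| < 1 holds throughout the second half of the run, so the population gradually shifts from random exploration to exploitation around the best whale.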
  3.3. Deep Belief Networks Optimized by Whale Optimization Algorithm
The hyperparameter settings of a DBN have a direct impact on its learning capacity. Compared to single-hidden-layer structures, multi-hidden-layer structures have superior learning capabilities. Nevertheless, when there are more than four hidden layers, the model’s capacity for generalization diminishes and its classification accuracy suffers [38]. Consequently, the DBN in this research is composed of three hidden layers. The control variable approach is typically used to determine the DBN’s hyperparameters, but this widely used approach has the following shortcomings [39]:
- The abundance of search parameters makes it challenging to implement.
- The search scope is narrow, so the value found is only the best within that range rather than the true optimum.
- It lacks a scientific theoretical foundation.
Therefore, a method called WOA-DBN is proposed to optimize the hyperparameters of the DBN using WOA; its flowchart is illustrated in Figure 9. The hyperparameters to be identified are the number of neurons in each of the three hidden layers, the number of backward fine-tuning iterations, and the network learning rate.
The following are WOA-DBN’s significant steps:
Step 1: The search range of each DBN hyperparameter to be optimized is determined beforehand empirically. These hyperparameters are the numbers of neurons in the first, second, and third hidden layers, the network learning rate η, and the number of backward fine-tuning iterations. The number of neurons in each of the three hidden layers ranges from 1 to 1000, the network learning rate η ranges from 0.000001 to 0.1, and the number of backward fine-tuning iterations is searched in the range of 1 to 100. The DBN hyperparameters being optimized define the whale position in WOA: each whale position is a five-dimensional vector whose components are the three hidden-layer neuron counts, the network learning rate, and the number of backward fine-tuning iterations.
Step 2: The WOA parameters, including the maximum number of iterations, the size of the whale population, and the spiral size constant, are established by trials and evaluations. The dimension is set to the number of DBN hyperparameters being optimized.
Step 3: Based on the fitness value, the fittest individual is selected as the prey, and its location is chosen as the target position. The fitness function is computed over all samples from the actual value of the $i$th sample and the output value of the $i$th sample obtained from the WOA-DBN model.
Step 4: The probability $p$ and the coefficient vectors $\vec{A}$ and $\vec{C}$ are assigned at random, and the individual whales’ positions are updated. The shrink-wrap mechanism is employed when $p < 0.5$ and $|\vec{A}| < 1$. A random search for prey is conducted when $p < 0.5$ and $|\vec{A}| \geq 1$. The spiral model is applied when $p \geq 0.5$.
Step 5: When the number of iterations reaches the maximum number of iterations, the optimal position of each individual whale is output. Otherwise, the process returns to step 3.
Step 6: The whale with the best position is selected; this position represents the optimized DBN parameters.
After WOA optimization, the structure of the optimal DBN model is 216-471-668-456-6, and the network learning rate is $\eta = 9.06878 \times 10^{-4}$. The number of backward fine-tuning iterations and the other parameters of the WOA-optimized DBN network are listed in Table 5.
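One way to connect a whale position to a concrete DBN configuration is to clip each component to its search range and round the integer-valued ones, as in the sketch below. The search ranges are those quoted in Step 1; the function names, the dictionary keys, the use of the misclassification rate as fitness, and the fine-tuning value 50.7 in the example are all hypothetical, since the paper's fitness expression and optimized fine-tuning count are not reproduced in this text.

```python
import numpy as np

# Search ranges quoted in Step 1, as (lower, upper) pairs.
SEARCH_RANGES = [
    (1, 1000),     # neurons in hidden layer 1
    (1, 1000),     # neurons in hidden layer 2
    (1, 1000),     # neurons in hidden layer 3
    (1e-6, 0.1),   # network learning rate
    (1, 100),      # number of backward fine-tuning iterations
]

def decode_position(position):
    """Map a 5-dimensional whale position to DBN hyperparameters."""
    lo = np.array([r[0] for r in SEARCH_RANGES], dtype=float)
    hi = np.array([r[1] for r in SEARCH_RANGES], dtype=float)
    clipped = np.clip(np.asarray(position, dtype=float), lo, hi)
    return {
        "hidden_layers": tuple(int(round(v)) for v in clipped[:3]),
        "learning_rate": float(clipped[3]),
        "fine_tune_iters": int(round(clipped[4])),
    }

def error_rate(y_true, y_pred):
    """Assumed fitness: fraction of misclassified samples (lower is better)."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# The last component (50.7) is a made-up value used only to show rounding.
params = decode_position([471.2, 668.4, 456.0, 9.06878e-4, 50.7])
print(params["hidden_layers"])           # (471, 668, 456)
print(error_rate([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.25
```

Inside the optimizer, each candidate position would be decoded this way, a DBN trained with the resulting settings, and the fitness returned to WOA.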
  3.4. Fault Diagnosis Based on VMD and WOA-DBN
Figure 10 illustrates the universal circuit breaker fault diagnostic procedure using VMD and WOA-DBN.
 Step 1: Obtain the vibration signal of the universal circuit breaker in various fault situations.
Step 2: Use VMD to analyze the signal and obtain a series of modal components.
Step 3: Extract the time-domain and frequency-domain features of the modal components to form the feature vectors.
Step 4: Use WOA to optimize the DBN parameters, and then use the optimized DBN as the fault diagnosis model.
  4. Results and Discussion
The proposed approach is compared with current mainstream machine learning algorithms: SVM, Backpropagation Neural Network (BPNN), Long Short-Term Memory network (LSTM), Stacked Denoising Autoencoder (SDAE), and DBN. The parameters of the models are set as follows:
- 1. SVM: The time- and frequency-domain features of the multiple modal components obtained by VMD were used as data inputs, and the RBF (Radial Basis Function) was selected as the kernel function. Based on the grid search results, the penalty coefficient was set to C = 9 and the RBF kernel parameter to g = 0.008. The stopping tolerance for training was set to $1 \times 10^{-3}$.
- 2. BPNN: The time- and frequency-domain features of the multiple modal components after VMD decomposition were used as data inputs, the three-layer network structure 216-100-10 with WOA optimization was used, and the network learning rate was set to η = 0.001.
- 3. LSTM: The time- and frequency-domain features of the multiple modal components after VMD decomposition were used as data inputs, the network structure was 216-100-50-6, and the network learning rate was set to η = 0.001.
- 4. SDAE: The time- and frequency-domain features of the multiple modal components after VMD decomposition were used as data inputs, the network structure was 216-100-50-6, and the network learning rate was set to η = 0.001.
- 5. DBN: The time- and frequency-domain features were used as data inputs, and the hidden-layer sizes were chosen with a stepwise decreasing approach. The network structure was 216-200-100-50-6, and the network learning rate was set to η = 0.001.
- 6. WOA-DBN: The network structure was 216-471-668-456-6, with $\eta = 9.06878 \times 10^{-4}$.
Overall, 10 repeated experiments were conducted in this paper, with each model using identical training and test sets. The results of the 10 experiments are displayed in Figure 11. Figure 11 shows that the accuracy of the SVM, BPNN, and LSTM models was low, with average accuracies of 0.7253, 0.7938, and 0.7929, respectively. SVM relies heavily on expert experience and requires multiple attempts to select suitable features for a particular task, which results in lower model accuracy. On small-sample datasets, the BPNN model was ineffective because it tends to settle into a local optimum during training and needs plenty of data. The SDAE model had an accuracy of 0.8403, limited by its restricted feature processing capability. The DBN model had an accuracy of 0.8631; because DBN relies heavily on the selection of hyperparameters, using appropriate hyperparameters can significantly raise its accuracy.
Figure 12 shows the box plots and scatter plots for the 10 training runs. The box of the WOA-DBN approach adopted in this paper sits highest and is the most compact, indicating high-precision diagnosis of the universal circuit breaker with little variation between runs. In addition, the scatter plot shows that the WOA-DBN model had the smallest dispersion and the best stability, indicating that it is more robust.
 In this experiment, the recognition rate of each fault category was tracked for every model, both to better understand per-category performance and to rule out extreme recognition rates for specific types of faults. Figure 13 presents radar charts of the fault diagnosis results of the various diagnosis approaches for each fault. Every axis denotes an operating condition of the universal circuit breaker, and the length along the axis indicates how accurately that condition is identified by each technique. The results show that the approach described in this work identifies the different kinds of faults well, achieving the highest diagnostic accuracy for every state. Meanwhile, the WOA-DBN model’s minimum recognition rate across the fault types was 95.24%, demonstrating its stability and dependability.
The average accuracy, standard deviation, precision, recall, F1-score, and testing time of the 10 replicate experiments are shown in Table 6. The WOA-DBN model outperformed the other models on many indices. The WOA-DBN algorithm took the longest time to test, but no more than 1 s, which is close to that of the other algorithms. This disadvantage in testing time is therefore negligible, and the model has good prospects for engineering applications.
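For reference, indices of this kind can be derived from a confusion matrix as sketched below. Macro averaging over classes is an assumption, as the text does not state how precision, recall, and F1-score were averaged, and the 2 × 2 matrix is a toy example rather than a result from this study.

```python
import numpy as np

def classification_metrics(confusion):
    """Accuracy and macro-averaged precision, recall, F1 from a confusion matrix.

    Rows are true classes and columns are predicted classes. Every class is
    assumed to be predicted at least once (otherwise precision is undefined).
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)
    recall = tp / cm.sum(axis=1)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }

# Toy two-class confusion matrix (not data from the paper).
metrics = classification_metrics([[8, 2], [1, 9]])
print(round(metrics["accuracy"], 4))  # 0.85
```

The same function applies unchanged to the six-class case, where the confusion matrix would be 6 × 6.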
To verify the VMD parameter settings, comparative experiments were conducted using VMD parameters with K = 6, 8, 10 and α = 2000, 3000, 4000, with WOA-DBN as the diagnostic model in every case. Table 7 presents the comparative indices of the diagnostic models under the different VMD parameter configurations. Among the combinations tested, the selected VMD parameters were the most effective for processing the vibration signals of the universal circuit breaker.
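The parameter comparison amounts to a small grid search, which can be enumerated as in the sketch below. Evaluating every pairing of the listed K and α values is an assumption about how the experiments were organized; the helper names are hypothetical, and the scores passed to `select_best` are dummy placeholders, not results from Table 7.

```python
from itertools import product

K_VALUES = (6, 8, 10)
ALPHA_VALUES = (2000, 3000, 4000)

def vmd_parameter_grid():
    """All (K, alpha) combinations, assuming a full grid is evaluated."""
    return [{"K": k, "alpha": a} for k, a in product(K_VALUES, ALPHA_VALUES)]

def select_best(results):
    """Pick the parameter set with the highest score from (params, score) pairs."""
    return max(results, key=lambda item: item[1])[0]

grid = vmd_parameter_grid()
print(len(grid))  # 9

# Dummy scores (0.0, 0.1, ...) only demonstrate the selection logic.
dummy_results = [(params, 0.1 * i) for i, params in enumerate(grid)]
print(select_best(dummy_results))  # {'K': 10, 'alpha': 4000}
```

In practice each grid entry would be scored by running the full VMD, feature extraction, and WOA-DBN pipeline on the same training and test sets.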
  5. Conclusions
This paper proposes an approach based on VMD and WOA-DBN to diagnose the different types of faults in universal circuit breakers. The approach uses variational mode decomposition to decompose the original vibration signal and extracts time- and frequency-domain features from the modal components. These feature vectors form the dataset of the fault diagnosis model. To train the DBN near the optimal solution, WOA was used to conduct a global search over the network learning rate, the number of backward fine-tuning iterations, and the number of neurons in the hidden layers, yielding the optimal hyperparameters of the DBN fault diagnosis model. Fault data from the universal circuit breaker model HSW1-2000 were used for validation, and SVM, BPNN, LSTM, SDAE, and DBN were chosen as diagnostic methods for comparison. According to the results, WOA-DBN had a diagnostic accuracy of 96.63%, with a standard deviation of 0.0103. Compared with the comparison models, WOA-DBN selected more suitable training parameters, distinguished different features effectively, and showed clear benefits in classification accuracy. It thereby addresses both the inaccuracy of feature extraction from a single vibration signal and the low accuracy of the fault diagnosis model, and it accurately classifies and identifies the faults of universal circuit breakers. The empirical results indicate that the proposed approach offers fresh concepts for fault diagnosis using vibration signals. It provides guidance for research on universal circuit breaker fault diagnosis and may also be applied to the fault diagnosis of rolling bearings, rotating machinery, and other mechanical equipment that generates vibration signals.
However, the method presented in this paper still has issues that deserve further research; for example, no attempt has been made to fuse vibration signals, current signals, and sound signals for joint fault diagnosis. Therefore, the next step will be to introduce multi-category data for information fusion and to use more advanced algorithms to optimize the model, further improving the generality of the proposed method.