1. Introduction
Modern mechanical equipment is developing toward greater complexity and automation; hence, a minor malfunction of a single part may trigger a serious chain reaction. Rolling bearings, as important components of rotating machinery, are widely used in wind turbines, compressors, high-speed railways, and other modern mechanical equipment. A bearing failure may cause the failure of the whole machine, resulting in significant economic loss and even casualties. Therefore, it is necessary to monitor the working condition of rolling bearings and diagnose bearing faults in time to prevent potential accidents, which can be realized by the intelligent fault diagnosis methods widely studied in recent years [1,2].
The framework of an intelligent fault diagnosis method can be divided into three parts [3]: signal acquisition, fault feature extraction, and fault classification. Signal acquisition covers many kinds of signals, such as vibration signals [4], acoustic signals [5] and electrical signals [6], among which vibration signals receive the most attention because they contain rich essential information about mechanical faults. Common signal processing methods for feature extraction include the short-time Fourier transform (STFT) [7], wavelet transform (WT) [8], empirical mode decomposition (EMD) [9], principal component analysis (PCA) [10] and stochastic resonance (SR) [11,12]. Fault classification is used to determine the health condition of mechanical equipment; common classification methods include deep learning, the support vector machine (SVM) [13], and the K-nearest neighbor (KNN) algorithm [14]. The selection of the fault feature extraction method and the fault classification method is key to establishing a vibration-signal-based fault diagnosis method.
In practical applications, the fault features in vibration signals are often weak when the fault is at an early stage or the faulty part operates in a harsh working environment. Such weak features are difficult to extract with ordinary feature extraction methods, which limits the weak-fault diagnosis performance of the corresponding intelligent methods. Therefore, a feature extraction method capable of weak-signal enhancement should be selected in these cases. Stochastic resonance (SR)-based methods are a class of weak feature extraction methods that can obtain a higher signal-to-noise ratio (SNR) than other traditional feature extraction methods [15]. Stochastic resonance is a common nonlinear phenomenon, first proposed by Benzi in the 1980s to describe the periodicity of the Earth's ice ages in climatology [16]; it occurs under the cooperation of a weak input signal, noise and a nonlinear system, which enables the weak signal to increase its amplitude by absorbing energy from the noise. Hence, this interesting phenomenon is widely applied in physics and engineering research [17], such as energy harvesting [18] and weak-signal detection in bearing fault diagnosis [19,20]. Previous research shows that the weak-signal enhancement performance of an SR system is significantly influenced by the nonlinear system, which may be monostable [21,22], bistable [23,24,25], tri-stable [26] or multi-stable [27]. Among these nonlinear systems, bistable systems such as the Duffing system are widely studied due to their superior weak-signal detection performance [28].
However, SR systems need to meet small-parameter conditions [29] (the frequency and amplitude of the weak signal should lie within small-parameter ranges) due to the adiabatic approximation theory, which cannot be satisfied by most practical vibration signals. To solve this issue, a multiparameter-adjusting SR system has been proposed [30] by introducing an amplitude transformation coefficient and a frequency transformation coefficient, which transform the amplitude and frequency of the signal into appropriate small-parameter ranges; SR for large-parameter signals can thereby be achieved. To realize the SR output of the multiparameter-adjusting system, the optimal parameters must be obtained, which can be done with adaptive multiparameter optimization methods [31], such as particle swarm optimization (PSO) [32], the genetic algorithm (GA) [33] and the simulated annealing (SA) algorithm [34]. However, each of these algorithms has limitations in achieving optimal SR outputs. For example, the GA has a powerful global search capability but easily becomes trapped in a local optimum [35], while SA can escape local optima but converges slowly [36]. Therefore, combining some of these algorithms can improve their optimization performance, which has been studied in the literature [37].
As for the fault classification method, deep learning has achieved great success in many fields in recent years due to its powerful feature extraction and classification capabilities [38,39]. Deep learning technologies include the deep neural network (DNN) [40], recurrent neural network (RNN) [41] and convolutional neural network (CNN) [42]. By stacking multiple neural layers, they can automatically extract features from the input and make predictions accordingly, which makes them applicable to many fields, including the fault diagnosis of mechanical equipment. In recent years, deep learning has been used as an advanced classification tool that can effectively classify the signals obtained from traditional feature extraction methods [43]. In this paper, the CNN, one of the main types of deep learning technology that has been widely applied in fault classification, was selected as the fault classification method. However, its classification accuracy is not high enough when the quality of the raw data is poor [44], as with weak fault signals. Therefore, it is necessary to pre-process the raw data with a weak-fault feature extraction method to extract useful features, which can then be classified by the CNN.
In this work, a fault enhancement classification method is proposed for high-performance bearing weak-fault diagnosis; it combines adaptive SR, driven by a hybrid optimization method merging the SA and GA, as the weak-fault feature extraction method, with a standard CNN as the fault classification method. This paper is organized as follows. The multiparameter-adjusting bistable Duffing system, the hybrid optimization method and a mapping method are introduced in Section 2. In Section 3, the CNN is presented and its parameters are appropriately selected. A series of simulations and experiments are conducted in Section 4 to verify the proposed signal pre-processing and fault diagnosis methods. Conclusions are drawn in Section 5.
2. Data Preprocessing Based on Adaptive Multiparameter Optimization of SR
In this section, the classical bistable Duffing system that we investigated previously is introduced as a data pre-processing model for the subsequent classification algorithm. A hybrid intelligent optimization algorithm combining the genetic algorithm (GA) and simulated annealing (SA) is used to achieve stochastic resonance (SR) in this system and obtain the optimal parameters. To make the SR outputs suitable for classification by the CNN, a mapping method based on a noise intensity sequence is further proposed to convert the time-domain output of the Duffing system into an image.
2.1. Introduction of the Bistable Duffing System That Can Achieve SR
The bistable Duffing system, a typical nonlinear system that can achieve SR, can be described as [20]:

ẍ(t) + kẋ(t) − ax(t) + bx³(t) = s(t) + N(t),  s(t) = A sin(2πf₀t),  N(t) = √(2D) ξ(t),(1)

where k denotes the damping ratio; a and b are system parameters deciding the potential function of the system; s(t) indicates a harmonic characteristic signal with amplitude A and frequency f₀; N(t) indicates a Gaussian white noise with noise intensity D, where ξ(t) is a zero-mean, unit-variance Gaussian white noise. In this system, s(t) + N(t) is defined as the input signal, and x(t) is the output signal, which can be obtained by solving Equation (1) numerically.
In Equation (1), the term ax(t) − bx³(t) can be understood as the tangential force of a bistable potential field given by U(x) = −ax²/2 + bx⁴/4, which has two stable equilibrium points at x = ±√(a/b) and one unstable equilibrium point at x = 0, as shown in Figure 1. A potential barrier with a height of ΔU = a²/(4b) separates two symmetrical potential wells, showing why the Duffing system is bistable. Moreover, the output x(t) of Equation (1) can be understood as the trajectory of a unit-mass Brownian particle moving in the potential field U(x), subject to the damping force −kẋ(t) and the external excitation s(t) + N(t) as well. Stochastic resonance indicates an optimal matching between the signal, the noise and the nonlinear system. When SR occurs, the particle gains energy from the noise and crosses the barrier regularly even though the amplitude of the signal is relatively low, thus enhancing the weak features of the signal. Hence, SR provides a feasible way to extract the features of the input signal from the enhanced output signal, especially under weak-signal conditions.
Previous research shows that, due to the adiabatic approximation theory, the bistable Duffing system can only achieve SR under small-parameter conditions, i.e., the amplitude, frequency and noise intensity should all be small [30]; however, most practical engineering signals do not satisfy this requirement. Hence, to enhance the signal features of such large-parameter signals, an improved multiparameter-adjusting SR model based on the bistable Duffing system was proposed by the authors previously [45]. By introducing two adjusted parameters R and m, this model can be written as:

ẍ(t) + kẋ(t) − ax(t) + bx³(t) = R[s(t) + N(t)],(2)

where R is the amplitude transformation coefficient used to transform the amplitude of the input signal into an appropriate range, and m is the scale transformation coefficient used to compress the time scale of the input signal. The scale transformation can be simply realized by applying a time step of m/f_s instead of 1/f_s in the numerical calculation, where f_s denotes the sampling frequency of the system. The frequency of the characteristic signal (f₀) is therefore regarded as f₀/m in the numerical calculation, so a large input frequency can be compressed accordingly by setting an appropriate value of m.
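To make the numerical scheme concrete, the following minimal sketch integrates the multiparameter-adjusting model with a fourth-order Runge-Kutta scheme (the solver named by the authors later in this work) using the scale-transformed step m/f_s. The function name, the default parameter values, and the symbols k, a, b, R and m are illustrative assumptions, not the authors' exact settings:

```python
import numpy as np

def duffing_sr_output(u, fs, k=0.5, a=1.0, b=1.0, R=0.01, m=2000):
    """Solve x'' + k x' - a x + b x^3 = R * u(t) with a fourth-order
    Runge-Kutta scheme, using the scale-transformed step h = m / fs
    instead of 1 / fs (this realizes the frequency compression f0 -> f0/m)."""
    h = m / fs                       # scale-transformed time step
    x, v = 0.0, 0.0                  # initial displacement and velocity
    out = np.empty(len(u))

    def acc(xx, vv, f):
        # acceleration implied by the bistable Duffing equation
        return -k * vv + a * xx - b * xx**3 + R * f

    for i, f in enumerate(u):        # forcing held constant within a step
        k1x, k1v = v, acc(x, v, f)
        k2x, k2v = v + h * k1v / 2, acc(x + h * k1x / 2, v + h * k1v / 2, f)
        k3x, k3v = v + h * k2v / 2, acc(x + h * k2x / 2, v + h * k2v / 2, f)
        k4x, k4v = v + h * k3v, acc(x + h * k3x, v + h * k3v, f)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        out[i] = x
    return out
```

Holding the forcing sample constant within each step is a common simplification for sampled inputs; a production implementation might interpolate u between samples.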
The output signal-to-noise ratio (SNR) of the Duffing system can be taken as the objective function to decide whether the system achieves SR. The input and output SNRs of the system are defined as:

SNR_in = 10 lg[A_i² / (Σ_f S_i²(f) − A_i²)],  SNR_out = 10 lg[A_o² / (Σ_f S_o²(f) − A_o²)],(3)

where S_i(f) represents the single-side spectrum of the input s(t) + N(t), and S_o(f) represents the single-side spectrum of the system output x(t). Moreover, A_i and A_o indicate the amplitudes of the system input signal and output signal at the characteristic frequency f₀.
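A minimal sketch of this SNR computation from the single-side FFT amplitude spectrum follows; the helper name and the choice of the FFT bin nearest the characteristic frequency are assumptions made for illustration:

```python
import numpy as np

def output_snr(x, fs, f0):
    """Single-side-spectrum SNR (in dB) at characteristic frequency f0:
    power of the spectral line at f0 over the power of all other lines."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) / n       # single-side amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    k0 = np.argmin(np.abs(freqs - f0))      # spectral line nearest f0
    signal_power = spec[k0] ** 2
    noise_power = np.sum(spec ** 2) - signal_power
    return 10 * np.log10(signal_power / noise_power)
```

The same routine serves for both SNR_in and SNR_out, applied to the system input and output respectively.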
2.2. Hybrid Optimization Algorithm Combining GA and SA
To adaptively achieve SR in a bistable Duffing system for an input signal with fixed parameters, an optimization algorithm is needed to obtain a group of appropriate system parameters (a, b), damping ratio (k) and adjusted parameters (R, m) that match the fixed input signal. Among various optimization algorithms, the genetic algorithm (GA) is an effective intelligent optimization algorithm when the objective function is not differentiable, and it can obtain a local optimum greater than 90% of the global optimum in a short time [35]. In the GA, every individual represents a solution, and the principle is to obtain the optimal population by selecting parents for crossover and mutation according to a fitness function.
However, the results obtained from the GA are local optima, which can still be improved with respect to the objective function (such as the output SNR of the Duffing system). This improvement can be obtained by adopting simulated annealing (SA), another important optimization algorithm, proposed by Metropolis et al. in 1953 based on the solid annealing process in physics. The SA can accept a solution worse than the current one with a certain probability, giving it the capacity to jump out of a local optimum and approach the global optimum. The probability of accepting a new solution under the Metropolis criterion in this paper is defined as:

P = 1, if E_n ≥ E_t;  P = exp[−(E_t − E_n)/(KT)], if E_n < E_t,(4)

where E_n and E_t are the objective values of the new condition and the temporary optimal condition, respectively; T is the current updated temperature, and K is a Boltzmann-like constant set empirically in this work [34]. The new condition is undoubtedly accepted as the updated temporary optimal condition when E_n ≥ E_t; when E_n < E_t, the new condition can still be accepted if the acceptance probability P is greater than a random number drawn from (0, 1), thus finally obtaining a satisfactory optimization result. The main disadvantage of the SA is its slow optimization speed, which is unacceptable in practical conditions such as adaptive multiparameter optimization of SR for big data.
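For an SNR-maximization problem, the acceptance rule above can be sketched as follows (the function name and the default value of K are illustrative; the paper's exact constant is not reproduced here):

```python
import math
import random

def metropolis_accept(snr_new, snr_best, T, K=1.0):
    """Metropolis criterion for SNR maximization: always accept an
    improvement; accept a worse solution with probability
    exp(-(snr_best - snr_new) / (K * T)), which shrinks as T cools."""
    if snr_new >= snr_best:
        return True
    p = math.exp(-(snr_best - snr_new) / (K * T))
    return p > random.random()   # random number drawn from [0, 1)
```

At high temperature almost any candidate is accepted, which is what lets the algorithm escape local optima; as T decreases, the search gradually becomes greedy.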
Therefore, a hybrid optimization algorithm (HOA) combining the advantages of the GA and the SA was utilized in this work for high-performance multiparameter optimization of SR. In the HOA, the Metropolis criterion of the SA is added to the parent selection of the crossover and mutation stages within the GA framework. Hence, the HOA can reach a better solution in a short time through the SA's capacity to jump out of local optima. As a result, an acceptable SNR of the Duffing system can be obtained in a short time using the HOA.
2.3. Data Optimization Based on Multiparameter-Adjusting SR of Duffing System
According to previous analyses, the proposed HOA provides an effective approach to achieving multiparameter-adjusting SR in a Duffing system, thus improving the quality of the input raw signal, and enhancing its SNR. The relevant data optimization method is presented in this subsection.
It is noted that the fourth-order Runge-Kutta algorithm is adopted in this work to solve the Duffing system. The HOA used in this paper adopts a binary encoding format, and each parameter consists of 15-bit binary numbers to guarantee sufficient resolution. Moreover, the value of m is pre-set to an appropriate value to ensure that the calculation results will not overflow. Hence, the optimization parameter dimension of the Duffing system is 4 (a, b, k, and R), and the flowchart of the optimization process to achieve multiparameter-adjusting SR of the Duffing system using the hybrid optimization algorithm is shown in Figure 2.
In order to use roulette selection to choose the crossover parents and mutation parents of the HOA (see Figure 2), a fitness function F is defined:

F_j^(g) = SNR_j^(g) − SNR_min^(g) + ε,  g = 1, 2, …, G,(5)

where G is the maximum number of iterations; SNR_min^(g) denotes the minimum value of the SNR_out in the whole population in the g-th iteration, and SNR_j^(g) denotes the value of the SNR_out of the j-th individual in the g-th iteration. A quite small ε is used in Equation (5) to keep the fitness function from equaling 0, so the optimization parameters corresponding to the minimum SNR_out are effectively abandoned; hence, the value of the fitness function F is always positive. The other parameters of the HOA (population number, crossover probability, mutation probability, initial temperature, minimum temperature and temperature-update weight) were set empirically in this work. It should also be mentioned that random numbers r₁ and r₂ are drawn in the crossover and mutation stages of the GA.
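A sketch of the fitness of Equation (5) and the roulette-wheel parent selection it drives is given below; the function names, the default ε, and the fixed RNG seed are illustrative assumptions:

```python
import numpy as np

def fitness(snrs, eps=1e-6):
    """Shifted-SNR fitness per Equation (5): always positive, and the
    worst individual's fitness is only eps, so it is effectively
    abandoned by the roulette wheel."""
    snrs = np.asarray(snrs, dtype=float)
    return snrs - snrs.min() + eps

def roulette_select(snrs, n, rng=None):
    """Draw n parent indices with probability proportional to fitness."""
    if rng is None:
        rng = np.random.default_rng(0)
    f = fitness(snrs)
    return rng.choice(len(snrs), size=n, p=f / f.sum())
```

Individuals with higher SNR_out therefore dominate the parent pool, while the worst individual is selected with probability close to zero.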
2.4. SR-Based Mapping Method with a Noise Intensity Sequence
Using SR-based methods requires engineers with extensive experience, so finding the characteristic frequency of a bearing fault vibration signal takes considerable time and manpower. This can be avoided by the intelligent diagnosis method proposed in this work, which combines SR with a neural network classifier. For this purpose, the SR output of the Duffing system should be converted into a grey image, which can then be used for feature extraction and fault classification by the neural network. The mapping method proceeds as follows.
First, M continuous time-domain points are intercepted from a raw signal to form a new signal s₀(t), which is generally noisy. To produce more feature information in one image, a sequence of noises N_j(t) with increasing noise intensities D_j is further added to the signal s₀(t). Therefore, a sequence of input signals is obtained:

u_j(t) = s₀(t) + N_j(t),  j = 1, 2, …, M_n,(6)

where M_n is the pre-set number of input signals.
Next, by inputting each u_j(t) into the Duffing system of Equation (2), an optimal parameter set (a, b, k, R) and the output signal x_j(t) can be obtained using the proposed data optimization method. The matrix of output signals X, whose j-th column is x_j(t), can then be converted into a visual grey matrix G according to:

G(i, j) = round[255 × (X(i, j) − X_min) / (X_max − X_min)],(7)

where i = 1, 2, …, M and j = 1, 2, …, M_n; X_max and X_min are the maximum and minimum values of X. It is noted that M and M_n are pre-set in this work.
Hence, for each detected signal, a grey image can be obtained by adding a group of noises with different intensities to the input signal and processing the results with the adaptive multiparameter-adjusting Duffing system. More detected signals produce more grey images, which can then be used by the convolutional neural network (CNN) for feature extraction and fault diagnosis.
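The min-max mapping of Equation (7) can be sketched in a few lines (the function name is illustrative):

```python
import numpy as np

def to_grey_image(outputs):
    """Map a matrix of Duffing-system outputs (one column or row per
    noise intensity) to an 8-bit grey image by min-max scaling the
    whole matrix to the range 0..255, as in Equation (7)."""
    X = np.asarray(outputs, dtype=float)
    g = (X - X.min()) / (X.max() - X.min())
    return np.round(255 * g).astype(np.uint8)
```

Scaling over the whole matrix (rather than per row) preserves the relative amplitude between outputs obtained under different injected noise intensities.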
3. Construction of the CNN
In this section, basic knowledge of the CNN is briefly introduced. The network architecture used for fault classification was obtained by modifying the parameters of the conventional visual geometry group (VGG) net architecture to match the resolution of the grey images obtained by the proposed mapping method. Compared to the conventional VGG, batch normalization (BN) modules are added in the convolutional layers and dropout modules are added in the fully connected (FC) layers to enhance the generalization of the VGG in this work.
3.1. Brief Introduction of CNN
The architecture of the CNN is briefly introduced in this subsection. A CNN consists of several filter stages and one classification stage [46]. The filter stages contain convolutional layers, activation layers, BN layers, and pooling layers.
The convolutional layer convolves local regions of the input with kernel filters, and the following activation layer generates a feature map. The kernel that extracts the local features is shared within each filter, thus reducing the complexity of the CNN. The convolution process is described as follows:

y_j^(l+1)(i) = K_j^l ∗ x^l(i) + b_j^l,(8)

where ∗ is the convolutional operator; K_j^l and b_j^l represent the weight and bias of the j-th kernel filter from layer l to layer l + 1; x^l(i) denotes the i-th local region of layer l, and y_j^(l+1)(i) denotes the corresponding output of layer l + 1 calculated by convolution. Moreover, zero padding is used in the convolution to make full use of all the features of the grey images.
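The padded ("same") convolution of one kernel over a grey image can be sketched as below. As is standard in CNNs, the kernel is applied without flipping (cross-correlation); the function name is illustrative:

```python
import numpy as np

def conv2d_same(img, kernel, bias=0.0):
    """Zero-padded 'same' 2-D convolution (CNN-style cross-correlation)
    of a grey image with a single kernel; output has the input's shape,
    so border features of the image are not discarded."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))   # zero padding
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel) + bias
    return out
```

In a real network this result would then pass through the ReLU activation, BN and max-pooling layers described above.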
In the activation layer, the Rectified Linear Unit (ReLU) is widely used as the nonlinear activation function to improve the expressive ability of the whole network. The ReLU can also prevent overfitting by setting the output of some neurons to zero, resulting in sparsity of the network.
A BN layer is further designed to speed up training and convergence of the network and reduce the shift of internal covariance. The pooling layer generally adopts the max-pooling layer, which enhances the generalization of the model by reducing the parameters while retaining the main features.
Moreover, the classification stage is composed of several FC layers. The FC layers enhance the generalization of the model after convolution, and the number of neurons in the output layer corresponds to the number of bearing health conditions.
3.2. Architecture of the Proposed CNN Model
The whole architecture of the CNN used in this work is shown in Figure 3, which includes convolutional layers, ReLU layers, BN layers, max-pooling layers, and FC layers. The number of convolutional layers depends on the size of the grey image produced by the proposed mapping method. Small convolutional kernels make the network deeper, which helps to improve its generalization ability, so a small convolutional kernel size is used accordingly. BN is applied after the convolutional layers to accelerate the training process, and the ReLU is utilized in the next layer to prevent overfitting. Max-pooling with a small kernel is used to reduce the number of network parameters. The classification stage includes three FC layers, and the output layer has ten outputs, representing ten different bearing health conditions.
In the training process, the number of iterations was set to 300, and an Adam optimizer was utilized to minimize the loss function, with the learning rate initialized to a fixed value. After every 100 iterations, the learning rate was reduced by a factor of 10 to obtain more accurate optimal solutions.
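This step-decay schedule can be sketched as follows; the initial rate lr0 is an assumed placeholder, since the paper's exact value is not stated here:

```python
def learning_rate(iteration, lr0=1e-3, drop_every=100, factor=10):
    """Step decay: divide the learning rate by `factor` after every
    `drop_every` iterations (lr0 is an assumed initial value)."""
    return lr0 / factor ** (iteration // drop_every)
```

Over the 300 training iterations described above, the rate is therefore divided by 10 twice, giving three plateaus.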
4. Verification of the Proposed Method
The proposed method, which can be used for fault classification and fault identification in practical applications, is verified in this section. It is necessary to point out that the computer used for the numerical simulations had an Intel(R) Core(TM) i5-10400 CPU and 16.00 GB of RAM.
For practical noisy fault signals, the characteristic frequency f₀ of the fault signal is usually unknown in advance. Therefore, the characteristic frequency should be pre-estimated as f_est according to the specific working environment before fault diagnosis, and the objective function SNR_out for optimization and SNR_in for comparison are redefined as follows:

SNR_out = 10 lg[A_o′² / (Σ_f S_o²(f) − A_o′²)],(9)

A_o′ = max{S_o(f) : |f − f_est/m| ≤ qΔf},  A_i′ = max{S_i(f) : |f − f_est| ≤ qΔf′},(10)

SNR_in = 10 lg[A_i′² / (Σ_f S_i²(f) − A_i′²)],(11)

where Δf is the adjusted frequency resolution after the scale transformation, Δf′ is the frequency resolution of the raw input, and q is the number of spectral lines considered on each side of the pre-estimated frequency. Consequently, several spectral lines around the pre-estimated frequency are involved to avoid missing the characteristic spectral line.
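Assuming the redefinition takes the strongest spectral line within a few bins of the pre-estimated frequency as the signal component, the idea can be sketched as follows (helper name and defaults are illustrative):

```python
import numpy as np

def band_snr(x, fs, f_est, n_lines=3):
    """SNR (in dB) when the exact characteristic frequency is unknown:
    the strongest spectral line within n_lines bins of the pre-estimated
    frequency f_est is treated as the signal, so the characteristic line
    is not missed if the estimate is slightly off."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) / n
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    k0 = np.argmin(np.abs(freqs - f_est))
    lo, hi = max(k0 - n_lines, 0), min(k0 + n_lines + 1, len(spec))
    signal_power = np.max(spec[lo:hi] ** 2)
    noise_power = np.sum(spec ** 2) - signal_power
    return 10 * np.log10(signal_power / noise_power)
```

A wider band tolerates a larger estimation error but raises the chance of picking up a noise line instead.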
4.1. Verification of the HOA
In this subsection, the advantage of the HOA is verified by processing a simulated signal with different injected noises. The signal was set as a pure harmonic signal with fixed amplitude and frequency, and the sampling frequency and number of sampling points were fixed as well. This signal was injected with noises of intensities ranging from 0.04 to 5.12, and the obtained noisy signals were then input into the multiparameter-adjusting Duffing system of Equation (2). Both the HOA and the conventional GA were used to find the optimal SNR_out for each signal.
Before the optimizations, the value of m should be determined, and the range of each adjustable parameter (a, b, k, R) should be selected as well. The scale transformation parameter m was fixed at a large value according to the large signal frequency and the large sampling frequency f_s, to ensure that the calculation results would not overflow in the numerical simulation. Moreover, R is the amplitude transformation coefficient, whose range should be determined according to the amplitude of the input signal. Based on our previous research [30], the transformed amplitude should lie between 0.001 and 0.1, where the amplitude is evaluated from the single-side spectrum of the input signal over a manually selected frequency range containing the characteristic frequency. For the given signals with different noises, the range of R was therefore selected to guarantee that the transformed amplitudes fall within this range, and the ranges of the other three parameters (a, b, k) were set artificially for the optimization. Under different noise intensities, the optimal SNR_out of the Duffing system obtained from both the HOA and the GA with a population number of 500 are shown in Figure 4. The SNR_in of the input signals with different noise intensities are plotted in this figure as well.
Figure 4 shows that the SNR_in of the input signal presents a decreasing trend as the noise intensity increases, while both the HOA and the GA achieve a relatively high SNR_out regardless of the noise intensity, demonstrating the feasibility of the multiparameter optimization algorithms in achieving SR in the Duffing system. Moreover, in most cases the optimal SNR_out obtained from the HOA is larger than that obtained from the GA. The advantage of the HOA can also be concluded quantitatively: the average value of the optimal SNR_out obtained from the HOA is larger than that obtained from the GA. This result indicates that the HOA has a higher probability of obtaining a better local optimum than the GA.
In addition, a large population number conveniently yields a local optimum close to the global one with a large SNR_out, but it takes considerable extra time, whereas in practical analysis the time available for fault diagnosis is short. Therefore, a smaller population number should be used in practical engineering to balance time cost and SNR_out. Its influence on the classification results is studied in Section 4.2.4. The population number was set to 50 in the remainder of this section.
4.2. Application for Practical Bearing Fault Data Classification
4.2.1. Introduction of the Used Bearing Fault Data Set
In this subsection, the vibration signals of the rotating bearing from the bearing data center of Case Western Reserve University (CWRU) were processed by the proposed method, thus verifying its feasibility in bearing fault data classification and fault diagnosis. The test rig is shown in
Figure 5, which contained a motor with a load of up to 3 hp, a torque transducer or encoder, and a dynamometer.
In the test rig, the test bearings, which were deep groove ball bearings of type 6205-2RS JEM SKF, were used to support the motor shaft. The bearing details are listed in
Table 1. In the test, motor bearings were seeded with faults using electro-discharge machining. The diameters of the faults ranged from 0.007 to 0.04 inches, and the faults were separately located at the inner ring, outer ring and rolling element. Faulty bearings were installed onto the test motor, and the vibration data was recorded under the load of 0 to 3 hp (the motor speed was 1797 to 1720 rpm). Therefore, the bearings contained different faults with different health conditions, producing a variety of vibration signals of faulty bearings when they operated.
The bearing data used in the experiment was sampled at the end of the drive with a sampling frequency of 12,000 Hz. As the location of the fault relative to the bearing load area affected the vibration response of the whole motor system, the bearing data at 6 o’clock, 3 o’clock and 12 o’clock directions of the bearing load area were listed, respectively. In this work, the bearing data at 6 o’clock was used for verification.
However, the bearing data set only contains one bearing record for each fault type, which is not enough for training. To obtain more fault signals and make the classification results more general, each bearing record was expanded to 400 samples, as shown in Figure 6. The first 512 time-domain points of each bearing record are intercepted to form a new signal s₁. Next, the 257th to 768th time-domain points are intercepted to form a new signal s₂. More new signals can be obtained in the same way, so each bearing record was expanded to 400 new signals. Moreover, to reduce the amount of computation and save time, the parameter set optimized for s₁ was reused for the other expanded signals.
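This sliding-window expansion, assuming the 512-point window and 256-point stride implied by the 257th-768th example, can be sketched as:

```python
import numpy as np

def expand_samples(signal, n_samples=400, window=512, stride=256):
    """Expand one long bearing record into overlapping samples:
    sample j covers points [j*stride, j*stride + window), so sample 2
    starts at the 257th point, matching the description above."""
    needed = (n_samples - 1) * stride + window
    assert len(signal) >= needed, "record too short for requested samples"
    return np.stack([signal[j * stride : j * stride + window]
                     for j in range(n_samples)])
```

With a 50% overlap, 400 samples require only about 102,656 points of the raw record, well within the length of a typical CWRU recording.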
The details of the datasets are shown in Table 2. The datasets contain the signals of the 10 different health-condition categories under four different loads of 0, 1, 2 and 3 hp, represented as datasets A, B, C and D, respectively. Each bearing signal was expanded to 400 samples, among which 320 were training samples and 80 were testing samples.
4.2.2. Optimization Results
In this subsection, the samples of dataset A are taken as examples to be processed by the adaptive optimization of SR. As mentioned at the beginning of Section 4, the characteristic frequencies of the faulty bearings should be estimated first to calculate the SNR_out. When the system rotates at a constant speed, the characteristic frequency of the bearings can be calculated according to Table 1.
In the optimization processes, the scale transformation parameter m was fixed, from which the adjusted frequency resolution at the sampling frequency of 12,000 Hz follows. The optimization ranges of the other adjustable parameters are pre-set, as shown in Table 3. Note that the characteristic frequency of the normal bearing signals cannot be estimated; as a result, the parameters of the Duffing system cannot be optimized for normal bearing signals. The normal bearing signals are nevertheless processed by the Duffing system with a group of fixed parameters (a, b, k, R) to maintain uniformity with the faulty bearing signals.
Hence, the SNR_out of the Duffing system with different input signals can be optimized by the HOA. For example, the values of the optimized SNR_out for the faulty signals of dataset A are plotted against the intensity of the injected noise in Figure 7. It should be mentioned that the SNR_in curves are not drawn because the number of signal points is too small to locate the characteristic frequency precisely; however, the SNR_in can be estimated using Equation (11), and the values are less than −30 dB.
Figure 7 shows that all the optimized SNR_out values are above −20 dB, a significant enhancement compared to the SNR_in, demonstrating the feasibility and effectiveness of the HOA in enhancing the weak features of practical signals.
4.2.3. Accuracies of Classification
The classification accuracies of the bearing signals are presented in this subsection. It is noted that the batch size has a significant influence on the coverage speed and classification result. In this work, the batch size was set as 32 to obtain the highest classification accuracy with a relatively high convergence speed. Through simulations, an accuracy of 100% can be obtained in
Table 4 for the 10 categories of all datasets presented in
Table 2, which is higher than that obtained from other classification methods including SVM, Multilayer Perceptron (MLP), and DNN [
47], showing that the proposed method has a good performance in fault classification and feature extraction. Moreover,
Figure 8 shows the confusion matrix of dataset A, which clearly shows that each label was classified well.
However, an accuracy of 100% is difficult to achieve in practical engineering, as the number of fault categories is far more than 10. To study the classification performance of the proposed method for more fault categories, the datasets A, B, C and D presented in Table 2 were combined in pairs to increase the number of health conditions to 20. The new datasets, each including 20 categories of bearing signals, were processed by the proposed method, and the classification accuracies of the testing data are shown in Table 5; an accuracy of more than 96.9% was achieved. For comparison, the raw signals, which were not pre-processed by the multiparameter-adjusting Duffing system, were also classified using both the proposed CNN and a traditional CNN, and the optimized signals were also classified using the traditional CNN. The resulting accuracies are likewise shown in Table 5. One can see that the classification accuracy is enhanced either by pre-processing the raw signals or by using our CNN architecture, and it is highest when both are adopted. Therefore, both the optimization method and our CNN architecture play important roles in enhancing the classification accuracy, showing that the proposed method realizes fault classification and feature extraction better than conventional methods. As the accuracy differs between training runs, each classification accuracy in Table 5 is the average over five simulations. To observe which health conditions were difficult to classify, Figure 9 shows the confusion matrix of the combined datasets A and B using the optimized signals with our CNN architecture. Only the two normal signals under different loads were misclassified, which means that the two normal signals contain similar information and features and are difficult to distinguish.
4.2.4. Influence of the Population Number on Classification Accuracies and Calculation Time
In addition to the classification accuracy, the calculation time is another important index for evaluating the performance of a classification method. Both indexes are affected by the population number, which is studied in this subsection.
The datasets shown in Table 4 were re-processed using the proposed method with different population numbers. The accuracies and calculation times against the population number are shown in Figure 10. It can be seen from Figure 10a that when the population number increased from 1 to 50, the classification accuracies rose only slightly, by less than 1%, meaning that the population number has a relatively small influence on the classification accuracy. Figure 10b shows that increasing the population number significantly increases the calculation time: when the population number increased from 1 to 50, the time cost of the optimization process increased from 30 s to 6500 s. Therefore, a small population number makes it possible to obtain an acceptable high-accuracy classification result in a short time using the proposed method.
4.2.5. Visualizations of Feature Maps and Networks
Generally, the CNN is an efficient tool for extracting features, but it is hard to understand how it processes grey images. In this subsection, the feature maps and networks are visualized for a better understanding of the powerful feature extraction and classification capabilities of the CNN.
For dataset A, Figure 11 shows the feature distributions of some representative layers of the CNN, visualized by t-distributed stochastic neighbor embedding (t-SNE) [48]. The features of the CNN input signals, which are the output signals of the Duffing system, are not yet separable. The features are progressively separated by each convolutional layer, and the clusters of each fault type become obvious after the fourth convolutional layer. Moreover, the features in the fully connected layer are even easier to divide, and an accuracy of 100% is obtained.