Article

Trend Feature Consistency Guided Deep Learning Method for Minor Fault Diagnosis

School of Logistic Engineering, Shanghai Maritime University, Shanghai 201306, China
*
Author to whom correspondence should be addressed.
Entropy 2023, 25(2), 242; https://doi.org/10.3390/e25020242
Submission received: 14 December 2022 / Revised: 13 January 2023 / Accepted: 16 January 2023 / Published: 28 January 2023

Abstract

Deep learning can be applied in the field of fault diagnosis without an accurate mechanism model. However, the accurate diagnosis of minor faults using deep learning is limited by the training sample size. When only a small number of noise-polluted samples is available, it is crucial to design a new learning mechanism for the training of deep neural networks to make them more powerful in feature representation. The new learning mechanism is accomplished by designing a new loss function such that both accurate feature representation, driven by the consistency of trend features, and accurate fault classification, driven by the consistency of fault direction, can be secured. In this way, a more robust and more reliable fault diagnosis model using deep neural networks can be established to effectively discriminate faults with equal or similar membership values in the fault classifier, which is unavailable for traditional methods. Validation on gearbox fault diagnosis shows that 100 training samples polluted with strong noise are adequate for the proposed method to successfully train deep neural networks to achieve satisfactory fault diagnosis accuracy, while more than 1500 training samples are required for traditional methods to achieve comparable fault diagnosis accuracy.

1. Introduction

As a key component of the motor drive system, a healthy gearbox is critical to the safe operation of electric vehicles and autonomous ships, since gearboxes are prone to faults due to heavy loads or mechanical deterioration [1,2,3]. Therefore, fault diagnosis for the critical components of the motor drive system is vitally important [4]. However, diagnosing early minor faults is a challenging task, because the fault features are weak and easily submerged by strong noise interference, which makes them difficult to extract and identify [5].
In general, the existing minor fault diagnosis methods can be classified into three categories: physical model-based methods, expert knowledge-based methods, and data-driven methods [6,7]. However, it is difficult to establish accurate physical models for complex systems, which limits the application of physical model-based methods in the engineering field [8]. On the other hand, expert knowledge-based methods require unique expertise in specific areas, which limits their generalization [9]. Data-driven methods have received wide attention from engineers, since only monitoring data of the operation status are required [10]. As a data-driven approach, deep learning is effective for extracting fault features from monitoring data. Compared with shallow learning methods, deep learning has the ability to approximate complex functions by means of layer-by-layer feature extraction [11]. According to their network structures, deep learning methods can be classified into four classes: the deep belief network (DBN), convolutional neural network (CNN), recurrent neural network (RNN), and stacked auto-encoder (SAE) [12,13,14,15]. Since 1D vibration signals can be easily collected, the deep neural network (DNN) constructed from SAEs is preferred in deep learning-based gearbox fault diagnosis. However, the features of minor faults are generally very weak, and they can easily be buried in strong environmental noise and high-order harmonic components.
The comprehensive extraction of minor fault features is hindered by strong noise and by the limited feature extraction ability of traditional DNNs. The existing DNN-based minor fault diagnosis methods can be classified into three classes: methods using preprocessing for denoising, methods using post-processing for fusion, and methods using DNNs with more powerful feature extraction.
On the aspect of preprocessing for DNNs, Chen et al. [16] performed the Fast Fourier Transform (FFT) to obtain the frequency spectrum of fault signals, which was then fed to an SAE; this can effectively diagnose faults with weak symptoms in the time domain but significant symptoms in the frequency domain. However, FFT is a linear transform, so it cannot capture minor faults with nonlinear and non-stationary characteristics. Li et al. [17] used Variational Mode Decomposition (VMD) as a preprocessing tool to decompose the original noisy vibration signal into different components; the decomposed signal is then fed to an SAE, which performs well in the fault diagnosis of a planetary gearbox with insignificant wear faults. However, the decomposition level of the VMD algorithm cannot be selected adaptively. Tang et al. [18] used complete ensemble empirical mode decomposition with adaptive noise together with the FFT as a preprocessing tool to extract time–frequency features, which are then fed to an SAE to achieve the accurate diagnosis of minor faults polluted by noise. Although signal preprocessing can alleviate the difficulty of diagnosing minor faults submerged by strong noise, the minor fault features may be mistakenly removed by simply preprocessing the monitoring data. This is the side effect faced by DNN-based fault diagnosis using preprocessing.
The problems mentioned above can be avoided by using post-processing methods based on decision-level fusion. On the aspect of post-processing, Jin et al. [19] designed a decision fusion mechanism for multiple feature extraction: multiple optimization criteria are used to train a unique SDAE model, so that different features involved in a unique dataset are extracted from different points of view; different classification results are then obtained with these features, and a decision mechanism based on multi-response linear regression is developed to fuse them, which effectively improves the accuracy of planetary gearbox minor fault detection. Zhang et al. [20] proposed a decision fusion mechanism to fuse the classifications of a CNN and an SDAE: for the accurate diagnosis of minor faults, D-S evidence theory is used to combine the CNN's capability to process 2D images with the SDAE's capability to process 1D sequences. Although decision-level fusion can effectively solve the problems faced by the preprocessing methods, information loss is serious, since the fusion is performed only on the final output of the SAE, without taking deep fusion into account.
To this end, new networks or learning techniques are needed to achieve more powerful DNN feature extraction. The stacked denoising auto-encoder (SDAE) is a commonly used tool to alleviate noise effects. On the aspect of powerful feature extraction, Lu et al. [21] used an SDAE to suppress noise so that gearbox weak fault features buried by noise can be better recognized. Chen et al. [22] developed an improved SDAE model with moving windows to reconstruct clean operating data from data polluted with different levels of noise. Rashidi and Zhao [23] proposed a correlative SAE and a correlative DNN, realizing output-related fault diagnosis by building new constructive and demoting loss functions, respectively. Zhu et al. [24] designed a stacked pruning sparse denoising auto-encoder to diagnose minor faults; the developed pruning strategy removes less-contributing neurons by pruning the output of the SDAE, so that the weak features of minor faults can be better extracted once the non-superior units are pruned out. However, this pruning method may also discard useful information, which is not a satisfactory outcome. Zhou et al. [25] proposed a new DNN structure with a sparse gate to reduce the influence of less-contributing neurons, thus achieving minor fault diagnosis on data affected by noise. Although the SDAE does well in noise reduction, it is still limited by poor feature representation capacity when the idea of information fusion is not used.
The question of how to gain a more comprehensive feature representation using a fusion strategy is therefore significant. Shao et al. [26] developed a mechanism to fuse the features extracted by auto-encoders with different activation functions; the extracted features of two auto-encoders are merged before being fed into the classifier to accurately recognize minor faults. Kong et al. [27] first built several SAEs with different activation functions to extract features from different aspects and then designed a fusion mechanism with a feature pool to merge the different features, which can be used for minor fault diagnosis. Shi et al. [28] developed a mechanism to fuse the outputs of a DNN fed with the raw data, FFT data, and WPT data, respectively; the purpose of the fusion is to obtain more comprehensive features from the time, frequency, and time–frequency domains to recognize minor faults. Zhou et al. [29] developed a deep multi-scale feature fusion mechanism to fuse the features extracted on adjacent layers of a unique SAE; it performs well on minor faults since combining features on adjacent layers compensates for the information missed during layer-by-layer feature extraction. Although Refs. [26,27,28,29] used the same network structure to extract different features through different fusion mechanisms, a single mode of data is used to fuse features extracted from different aspects, which inevitably results in feature redundancy. Designing a new learning algorithm to fuse data from different modes is significant for comprehensive feature extraction. Subsequently, Zhou et al. [30] proposed an alternative fusion mechanism to fuse features extracted by an SAE from 1D vibration data and by a CNN from 2D image data, respectively. Ravikumar et al. [31] built a multi-scale deep residual learning and stacked LSTM model to achieve gearbox fault diagnosis by fusing features extracted by multiple CNN models and feeding them into a stacked LSTM model. Since these methods make full use of heterogeneous data, they can achieve more accurate diagnosis of minor faults.
However, the methods mentioned above cannot effectively extract the trend features involved in 1D time series. Fusing trend features extracted by an LSTM with static features extracted by a CNN can address this problem. Chen et al. [32] proposed an embedded LSTM-CNN auto-encoder to extract trend features that contain both local features and degradation trend information from vibration data; the fused features can be fed into a fault classifier such that minor faults can be well diagnosed. Zhou et al. [33] proposed a fusion network with a new training mechanism to diagnose gearbox minor faults, which fuses features extracted by an LSTM from the 1D time series and by a CNN from the 2D image data. Although trend features can be extracted by an LSTM, it is difficult to train owing to its large number of network parameters, and it may suffer from vanishing gradients when dealing with long time series.
This paper focuses on solving the problem encountered by traditional DNNs to diagnose minor faults using a small number of training samples polluted with strong noise. A new training mechanism is established by designing a new loss function where trend feature consistency and fault orientation consistency are both taken into account, such that similar minor faults can be well discriminated. The contributions of this paper are as follows:
1. This paper establishes a new training mechanism for a DNN-based fault diagnosis model to make it more powerful in trend feature representation and in discriminating similar faults that have equal or similar membership values.
2. A new loss function for layer-by-layer pre-training is designed by taking the consistency of auto-correlation and cross-correlation into account to characterize the trend features involved in the fault signal, so that the extracted features can be more accurate. A loss function considering the consistency of fault orientation is also designed for the backpropagation adjustment of the model to ensure its capability to discriminate similar faults.
3. In the engineering field, when only a small number of fault samples polluted by strong noise is available, the proposed method is of great significance, since it provides a more advanced learning mechanism for DNNs to accurately extract the potentially weak features of minor faults. Thus, it provides an effective means to secure the accuracy of minor fault diagnosis.
The structure of this paper is as follows. In Section 2, the relevant basic theories are briefly introduced. Section 3 details the specific improvement measures of the proposed method. Section 4 validates the effectiveness and superiority of the proposed method on gearbox and bearing datasets and compares it with other mainstream diagnostic techniques. Finally, the conclusions are summarized in Section 5.

2. Related Theories

2.1. Deep Neural Network Based on Stacked Auto-Encoder

A DNN [34] is built by stacking multiple AEs; its structure is shown in Figure 1. An auto-encoder reconstructs the input of the network through its encoding and decoding networks. The training process of a DNN is divided into two parts: the first is bottom-up, layer-by-layer unsupervised pre-training to obtain a valid feature representation of the original input training data; this is followed by top-down supervised global fine-tuning of the network parameters [35,36].
The forward propagation of an auto-encoder is given by the following formula [37]:
$$\hat{x}=\sigma_d\left(W^{(2)}\,\sigma_e\left(W^{(1)}x+b^{(1)}\right)+b^{(2)}\right) \tag{1}$$
where $x=\{x_1,x_2,\ldots,x_m\}$ denotes the training data. For each sample $x_i=[x_{i1},x_{i2},\ldots,x_{iN}]^T$, the encoder maps the input $x_i$ to its encoding, and the decoder obtains the reconstruction $\hat{x}_i$ of the input by decoding the encoded data. $\hat{x}=\{\hat{x}_1,\hat{x}_2,\ldots,\hat{x}_m\}$ is the reconstruction result of the AE, with each reconstruction $\hat{x}_i=[\hat{x}_{i1},\hat{x}_{i2},\ldots,\hat{x}_{iN}]^T$. The traditional amplitude-based feature learning loss function is as follows:
$$loss_{MSE}=\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|x_i-\hat{x}_i\right\|^2 \tag{2}$$
where $m$ is the number of samples. For a traditional auto-encoder, the network parameters are updated by reverse derivation through the chain rule, as shown in Equations (3)–(6).
$$\frac{\partial loss_{MSE}}{\partial W^{(2)}}=-\sum_{i=1}^{m}\left(x_i-\hat{x}_i\right)\,\sigma'(Z_d)\,\sigma_e\left(W^{(1)}x+b^{(1)}\right) \tag{3}$$

$$W^{(2)}=W^{(2)}-\frac{\partial loss_{MSE}}{\partial W^{(2)}} \tag{4}$$

$$\frac{\partial loss_{MSE}}{\partial b^{(2)}}=-\sum_{i=1}^{m}\left(x_i-\hat{x}_i\right)\,\sigma'(Z_d) \tag{5}$$

$$b^{(2)}=b^{(2)}-\frac{\partial loss_{MSE}}{\partial b^{(2)}} \tag{6}$$
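For concreteness, the forward pass of Equation (1) and the loss of Equation (2) can be sketched in code. The following is a minimal illustration assuming a single tanh auto-encoder with placeholder layer sizes; it is not the implementation used in this paper, and automatic differentiation replaces the hand-derived chain rule of Equations (3)–(6).

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """One AE: x -> sigma_e(W1 x + b1) -> sigma_d(W2 h + b2), cf. Equation (1)."""
    def __init__(self, n_in: int, n_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_hidden)   # W(1), b(1)
        self.decoder = nn.Linear(n_hidden, n_in)   # W(2), b(2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.encoder(x))            # encoding
        return torch.tanh(self.decoder(h))         # reconstruction x_hat

ae = AutoEncoder(n_in=400, n_hidden=600)           # sizes are illustrative only
x = torch.randn(100, 400)                          # m = 100 samples, N = 400 dims
x_hat = ae(x)
loss_mse = 0.5 * ((x - x_hat) ** 2).sum(dim=1).mean()   # Equation (2)
loss_mse.backward()                                # autograd evaluates Eqs. (3) and (5)
```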

2.2. Cross-Correlation Coefficient

In signal processing, the cross-correlation coefficient is a measure that describes the degree of correlation between signals and thus enables the identification, detection, and extraction of signals [38], which is defined as shown in Equation (7).
$$\rho_{xy}=\frac{\sum_{i=1}^{n}x_i y_i}{\sqrt{\sum_{i=1}^{n}x_i^{2}}\sqrt{\sum_{i=1}^{n}y_i^{2}}} \tag{7}$$
where $x_i$ and $y_i$ are discrete signals of length $n$. By the Cauchy–Schwarz inequality, $|\rho_{xy}|\le 1$, with $\rho_{xy}=1$ if and only if $x_i=y_i$, in which case the correlation between $x_i$ and $y_i$ is maximal. When $x_i$ and $y_i$ are correlated, $\rho_{xy}$ ranges from 0 to 1; on the contrary, when $x_i$ is completely independent of $y_i$, $\rho_{xy}=0$. The cross-correlation of samples is defined as shown in Equation (8).
$$\rho_{xy}=\frac{\sum_{j=1}^{m}\sum_{i=1}^{n}x_{ij} y_{ij}}{\sqrt{\sum_{j=1}^{m}\sum_{i=1}^{n}x_{ij}^{2}}\sqrt{\sum_{j=1}^{m}\sum_{i=1}^{n}y_{ij}^{2}}} \tag{8}$$
n is the dimension of the sample, and m is the number of samples.
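As a small numerical illustration of Equation (7) (not tied to any dataset in this paper), the coefficient can be computed directly:

```python
import numpy as np

def cross_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Cross-correlation coefficient of Equations (7)/(8); accepts 1D signals
    or (n, m) sample matrices, since the sums simply run over all entries."""
    return np.sum(x * y) / (np.sqrt(np.sum(x ** 2)) * np.sqrt(np.sum(y ** 2)))

x = np.sin(np.linspace(0, 10, 500))
print(cross_correlation(x, x))    # 1.0: identical signals, maximal correlation
print(cross_correlation(x, -x))   # -1.0: completely opposite signals
```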

2.3. Auto-Correlation Coefficient

The auto-correlation coefficient of a random signal reflects the degree of correlation of the signal itself at different times [39]. The auto-correlation coefficient is defined as shown in Equation (9).
$$\gamma_h=\frac{\operatorname{cov}\left(x_i,x_{i-h}\right)}{\sqrt{D\left(x_i\right)D\left(x_{i-h}\right)}},\quad h=1,2,\ldots \tag{9}$$
where $h$ is the order (lag) of the auto-correlation coefficient and $D(x_i)$ is the variance of $x$. The set of auto-correlation coefficients $\gamma_h$ is called the auto-correlation function. In probability and statistical parameter estimation, the variance and covariance of the population $x_t$ contain unknown parameters, so it is necessary to rely on the sample auto-correlation coefficient. For a given set of samples $x=\{x_1,x_2,\ldots,x_m\}$, the sample auto-correlation coefficient with lag $h$ is defined as shown in Equation (10).
$$\gamma_h=\frac{\sum_{i=1}^{n}\sum_{j=h+1}^{m}\left(x_{ji}-\bar{x}_i\right)\left(x_{(j-h)i}-\bar{x}_i\right)}{\sum_{j=1}^{m}\left(x_{ji}-\bar{x}_i\right)^2},\quad 0\le h\le m-1 \tag{10}$$
where $\bar{x}_i$ is the mean of $x$ in the $i$-th dimension, $n$ is the dimension of the sample, and $m$ is the number of samples.
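A direct NumPy transcription of Equation (10) for a single dimension (illustrative only) reads:

```python
import numpy as np

def sample_autocorr(x: np.ndarray, h: int) -> float:
    """Sample auto-correlation coefficient at lag h, per Equation (10)."""
    xm = x - x.mean()
    return np.sum(xm[h:] * xm[:len(x) - h]) / np.sum(xm ** 2)

x = np.sin(np.linspace(0, 20 * np.pi, 1000))
print([round(sample_autocorr(x, h), 3) for h in (1, 5, 25)])  # decreases as the lag grows
```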

2.4. Cosine Distance

In geometry, the angle cosine is used to measure the difference between the directions of two vectors; in machine learning, this concept is used to measure differences between sample vectors. Compared with distance measures, the cosine distance pays more attention to the difference in direction between two vectors rather than to the difference in distance or length [40]. The cosine distance in $C$-dimensional space is
$$\cos\left(y_{pre},y_{real}\right)=\frac{\sum_{i=1}^{C}y_{pre}^{\,i}\,y_{real}^{\,i}}{\sqrt{\sum_{i=1}^{C}\left(y_{pre}^{\,i}\right)^2}\sqrt{\sum_{i=1}^{C}\left(y_{real}^{\,i}\right)^2}} \tag{11}$$
where $y_{pre}$ is the predicted label, and $y_{real}$ is the real label. This cosine value represents the similarity of the two vectors: the smaller the included angle (the closer to 0 degrees), the closer the cosine value is to 1, and the more consistent their directions are. When the directions of the two vectors are completely opposite, the cosine of the included angle takes its minimum value of −1; when the cosine value is 0, the two vectors are orthogonal and the included angle is 90°. Cosine similarity is therefore unrelated to the magnitudes of the vectors and depends only on their directions. Here, the vectors $y_{pre}$ and $y_{real}$ are one-dimensional, so the cosine distance is generalized from the one-dimensional vector form to the two-dimensional matrix form as follows:
$$\cos\left(y_{pre},y_{real}\right)=\frac{\sum_{j=1}^{n}\sum_{i=1}^{C}y_{pre}^{\,ij}\,y_{real}^{\,ij}}{\sqrt{\sum_{i=1}^{C}\left(y_{pre}^{\,ij}\right)^2}\sqrt{\sum_{i=1}^{C}\left(y_{real}^{\,ij}\right)^2}} \tag{12}$$
where $y_{pre}, y_{real} \in \mathbb{R}^{C\times n}$, $C$ is the total number of labels, and $n$ is the sample size.

3. Minor Fault Diagnosis Method Using DNN Guided by Consistency of Trend Features and Consistency of Fault Orientation

3.1. Incapability of DNN to Extract Separable Features of Similar Faults

The basic idea of the traditional AE learning process is to ensure that the features obtained by the encoding network can reconstruct the input data of the encoder. Its optimization criterion constructs the loss function only from the amplitude consistency of the signal, so trend differences may remain between the input and output, which leads to the loss of the trend features of the fault signal during representation. However, this lost trend information is key to minor fault diagnosis. Figure 2 shows that a minor fault is difficult to diagnose because the fault feature extracted using the current learning mechanism is similar to other features.
In Figure 2, the solid black line represents a normal signal F0. The cyan dashed line represents the first fault signal F1, which is generated by subtracting a slowly drifting signal from F0. The magenta dotted line represents the second fault signal F2, which is constructed by subtracting the slowly drifting signal at the odd sample times of F0 and adding it at the even sample times. The dash-dotted red line represents the third fault signal F3, which is generated by adding the slowly drifting signal at the odd sample times of F0 and subtracting it at the even sample times. It can easily be seen that the loss defined only by amplitude consistency is equal for $loss_{MSE}(F0,F1)$, $loss_{MSE}(F0,F2)$, and $loss_{MSE}(F0,F3)$, which means that it is difficult to discriminate the three similar faults mentioned above. On the other hand, Figure 2 shows that the trend features of F1, F2, and F3 are completely different, which means that they are three different types of faults. Thus, the question of how to design a new training mechanism to extract potential trend features is crucial to minor fault diagnosis.
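The effect can be reproduced numerically. The sketch below constructs signals in the spirit of Figure 2 (the sinusoidal shape and drift magnitude are invented for illustration) and confirms that the amplitude-based MSE cannot separate the three faults, while a lag-1 trend statistic already separates the alternating faults F2 and F3 from F0 and F1:

```python
import numpy as np

t = np.arange(200)
f0 = np.sin(2 * np.pi * t / 40)              # normal signal F0 (illustrative shape)
drift = 0.1 * np.ones_like(f0)               # slowly drifting signal (illustrative size)
sign_alt = np.where(t % 2 == 1, -1.0, 1.0)   # -1 at odd sample times, +1 at even ones

f1 = f0 - drift                              # F1: subtract the drift everywhere
f2 = f0 + sign_alt * drift                   # F2: subtract at odd, add at even times
f3 = f0 - sign_alt * drift                   # F3: add at odd, subtract at even times

def mse(a, b):
    return 0.5 * np.mean((a - b) ** 2)

def autocorr1(x):                            # lag-1 sample auto-correlation, Eq. (10)
    xm = x - x.mean()
    return np.sum(xm[1:] * xm[:-1]) / np.sum(xm ** 2)

print([round(mse(f0, f), 4) for f in (f1, f2, f3)])        # all equal: MSE cannot separate them
print([round(autocorr1(f), 3) for f in (f0, f1, f2, f3)])  # F2 and F3 deviate in trend
```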
If the features of the three faults mentioned above are not well represented, it will lead to fault classification errors due to the proximity of the first and second memberships in the fault classification. Figure 3 shows that it is possible to misclassify similar faults if the method is incapable of obtaining an adequate difference between the first and the second membership. In Figure 3, the red dots indicate the first membership and the green dots indicate the second membership.

3.2. Minor Fault Diagnosis Using DNN Guided by Consistency of Trend Features and Consistency of Fault Orientation

3.2.1. Trend Feature-Guided Training of a Unique AE

This section focuses on developing a new training mechanism by designing a new loss function to extract more accurate features by the AE. Rather than defining the loss function related to amplitude consistency, we choose to design a new loss function taking both amplitude consistency and trend feature consistency into account, such that additional trend features that are helpful for minor fault diagnosis can be well extracted. As shown in Figure 4, the auto-correlation for the input signal of the AE and the cross-correlation between the input and output of the AE (CASAE) are used to characterize the trend feature.
The main idea is to minimize the difference between the input and output of the AE by using the loss function designed in Equation (13).
$$loss\left(W_{en},b_{en},W_{de},b_{de}\right)=loss_{MSE}+loss_{c\_c}+loss_{a\_c}=\frac{1}{2m}\sum_{i=1}^{m}\left\|x_i-\hat{x}_i\right\|^2+\frac{1}{m}\sum_{i=1}^{m}\left(1-\rho(x_i,\hat{x}_i)\right)^2+\frac{1}{2\,lags}\sum_{h=1}^{lags}\left(\gamma_h\left(x(t),x(t-h)\right)-\gamma_h\left(\hat{x}(t),\hat{x}(t-h)\right)\right)^2 \tag{13}$$
where $W_{en}$ and $b_{en}$ are the weight matrix and bias of the encoder, and $W_{de}$ and $b_{de}$ are those of the decoder. $x_i$ is the input data of the AE, and $\hat{x}_i$ is the reconstructed result of the AE. The cross-correlation coefficient is used as a trend-feature reconstruction error index, $\rho(x,\hat{x})=\frac{\operatorname{cov}(x,\hat{x})}{s_x s_{\hat{x}}}$, where $\operatorname{cov}(x,\hat{x})$ is the covariance of $x$ and $\hat{x}$, and $s_x$ is the standard deviation of $x$. The other reconstruction error index, $\gamma_h\left(x(t),x(t-h)\right)=\frac{\sum_i\left(x_i-\bar{x}\right)\left(x_{i-h}-\bar{x}\right)}{s^2(x)}$, where $s^2(x)$ is the variance of $x$ and $h$ is the order of the auto-correlation coefficient, is also used to extract the trend feature.
l o s s c _ c aims to extract the trend feature of the correlation between the input and output of the AE, which can be defined as in Equation (14).
$$loss_{c\_c}=\frac{1}{m}\sum_{i=1}^{m}\left(1-\rho(x_i,\hat{x}_i)\right)^2=\frac{1}{m}\sum_{i=1}^{m}\left(1-\frac{\sum_{i=1}^{m}\left(x_i-\bar{x}\right)\left(\hat{x}_i-\bar{\hat{x}}\right)}{\sqrt{\sum_{i=1}^{m}\left(x_i-\bar{x}\right)^2}\sqrt{\sum_{i=1}^{m}\left(\hat{x}_i-\bar{\hat{x}}\right)^2}}\right)^2,\qquad \bar{x}=\frac{1}{m}\sum_{i=1}^{m}x_i,\ \ \bar{\hat{x}}=\frac{1}{m}\sum_{i=1}^{m}\hat{x}_i \tag{14}$$
From Equation (14), the trend consistency between the input and output data is characterized by the cross-correlation. The optimization criterion of minimizing the reconstruction error during feature extraction can be achieved when the trend features of the input and output of the AE are consistent.
l o s s a _ c aims to extract the trend distribution features for the input and output of AE data and can be defined as in Equation (15).
$$loss_{a\_c}=\frac{1}{2\,lags}\sum_{j=1}^{N}\sum_{h=1}^{lags}\left(\frac{\sum_{i=h+1}^{m}\left(x_{ij}-\bar{x}_j\right)\left(x_{(i-h)j}-\bar{x}_j\right)}{\sum_{i=1}^{m}\left(x_{ij}-\bar{x}_j\right)^2}-\frac{\sum_{i=h+1}^{m}\left(\hat{x}_{ij}-\bar{\hat{x}}_j\right)\left(\hat{x}_{(i-h)j}-\bar{\hat{x}}_j\right)}{\sum_{i=1}^{m}\left(\hat{x}_{ij}-\bar{\hat{x}}_j\right)^2}\right)^2 \tag{15}$$
From Equation (15), if the input and output data of the AE are identical, then their cross-correlations and auto-correlations are equal as well. When the amplitude difference is already consistent, $loss_{a\_c}$ in Equation (13) can still capture the remaining difference in trend features between the input and output of the AE.
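To make the new pre-training criterion concrete, a minimal PyTorch sketch of Equation (13) follows. It is an illustrative reading rather than the authors' released code: here the cross-correlation of Equation (14) is computed per feature dimension across the batch (computing it per sample across dimensions is an equally plausible reading), and the argument lags plays the role of $lags$ in Equation (15).

```python
import torch

def trend_consistency_loss(x: torch.Tensor, x_hat: torch.Tensor, lags: int = 5) -> torch.Tensor:
    """loss = loss_MSE + loss_c_c + loss_a_c, per Equations (13)-(15).
    x, x_hat: (m, N) batch of inputs and their reconstructions."""
    eps = 1e-8

    # Amplitude consistency, Equation (2)
    loss_mse = 0.5 * ((x - x_hat) ** 2).sum(dim=1).mean()

    # Cross-correlation consistency, Equation (14)
    xc = x - x.mean(dim=0, keepdim=True)
    xhc = x_hat - x_hat.mean(dim=0, keepdim=True)
    rho = (xc * xhc).sum(dim=0) / (xc.norm(dim=0) * xhc.norm(dim=0) + eps)
    loss_cc = ((1.0 - rho) ** 2).mean()

    # Auto-correlation consistency per feature dimension, Equation (15)
    def autocorr(z: torch.Tensor, h: int) -> torch.Tensor:
        zc = z - z.mean(dim=0, keepdim=True)
        return (zc[h:] * zc[:-h]).sum(dim=0) / ((zc ** 2).sum(dim=0) + eps)

    loss_ac = sum(((autocorr(x, h) - autocorr(x_hat, h)) ** 2).sum()
                  for h in range(1, lags + 1)) / (2 * lags)

    return loss_mse + loss_cc + loss_ac
```

Because this loss is differentiable, a framework's automatic differentiation reproduces the hand-derived gradients that follow.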
The detailed backpropagation updates corresponding to the new loss function are given in Equations (16)–(19):
$$W_{de}=W_{de}-lr\times\left(\frac{\partial loss_{MSE}}{\partial W_{de}}+\frac{\partial loss_{c\_c}}{\partial W_{de}}+\frac{\partial loss_{a\_c}}{\partial W_{de}}\right) \tag{16}$$

$$b_{de}=b_{de}-lr\times\left(\frac{\partial loss_{MSE}}{\partial b_{de}}+\frac{\partial loss_{c\_c}}{\partial b_{de}}+\frac{\partial loss_{a\_c}}{\partial b_{de}}\right) \tag{17}$$

$$W_{en}=W_{en}-lr\times\left(\frac{\partial loss_{MSE}}{\partial W_{en}}+\frac{\partial loss_{c\_c}}{\partial W_{en}}+\frac{\partial loss_{a\_c}}{\partial W_{en}}\right) \tag{18}$$

$$b_{en}=b_{en}-lr\times\left(\frac{\partial loss_{MSE}}{\partial b_{en}}+\frac{\partial loss_{c\_c}}{\partial b_{en}}+\frac{\partial loss_{a\_c}}{\partial b_{en}}\right) \tag{19}$$
where $lr$ is the learning rate; $loss_{MSE}$, $loss_{c\_c}$, and $loss_{a\_c}$ are, respectively, the loss functions constructed as mentioned above, and $*$ stands for each parameter of the AE in the partial derivatives. The gradient for $loss_{MSE}$ is calculated as follows:
$$\frac{\partial loss_{MSE}}{\partial W_{de}}=-\frac{1}{m}\sum_{i=1}^{m}\left(x_i-\hat{x}_i\right)\,\sigma'_{de}(z_{de})\,\sigma_{en}(z_{en}) \tag{20}$$

$$\frac{\partial loss_{MSE}}{\partial b_{de}}=-\frac{1}{m}\sum_{i=1}^{m}\left(x_i-\hat{x}_i\right)\,\sigma'_{de}(z_{de}) \tag{21}$$

$$\frac{\partial loss_{MSE}}{\partial W_{en}}=-\frac{1}{m}\sum_{i=1}^{m}\left(x_i-\hat{x}_i\right)\,\sigma'_{de}(z_{de})\,W_{de}\,\sigma'_{en}(z_{en})\,x \tag{22}$$

$$\frac{\partial loss_{MSE}}{\partial b_{en}}=-\frac{1}{m}\sum_{i=1}^{m}\left(x_i-\hat{x}_i\right)\,\sigma'_{de}(z_{de})\,W_{de}\,\sigma'_{en}(z_{en}) \tag{23}$$
The gradient for l o s s c _ c is calculated as follows:
$$\frac{\partial loss_{c\_c}}{\partial W_{de}}=-\frac{1}{m}\sum_{i=1}^{m}\left(1-\rho(x_i,\hat{x}_i)\right)\frac{\partial \rho(x_i,\hat{x}_i)}{\partial \hat{x}_i}\,\sigma'_{de}(z_{de})\,\sigma_{en}(z_{en}) \tag{24}$$

$$\frac{\partial loss_{c\_c}}{\partial b_{de}}=-\frac{1}{m}\sum_{i=1}^{m}\left(1-\rho(x_i,\hat{x}_i)\right)\frac{\partial \rho(x_i,\hat{x}_i)}{\partial \hat{x}_i}\,\sigma'_{de}(Z_{de}) \tag{25}$$

$$\frac{\partial loss_{c\_c}}{\partial W_{en}}=-\frac{1}{m}\sum_{i=1}^{m}\left(1-\rho(x_i,\hat{x}_i)\right)\frac{\partial \rho(x_i,\hat{x}_i)}{\partial \hat{x}_i}\,\sigma'_{de}(z_{de})\,W_{de}\,\sigma'_{en}(z_{en})\,x \tag{26}$$

$$\frac{\partial loss_{c\_c}}{\partial b_{en}}=-\frac{1}{m}\sum_{i=1}^{m}\left(1-\rho(x_i,\hat{x}_i)\right)\frac{\partial \rho(x_i,\hat{x}_i)}{\partial \hat{x}_i}\,\sigma'_{de}(z_{de})\,W_{de}\,\sigma'_{en}(z_{en}) \tag{27}$$

where $\rho(x_i,\hat{x}_i)$ is expanded as in Equation (14).
The gradient for l o s s a _ c is calculated as follows:
$$\frac{\partial loss_{a\_c}}{\partial W_{de}}=-\frac{1}{lags}\sum_{j=1}^{N}\sum_{h=1}^{lags}\left(\gamma_h^{(j)}(x)-\gamma_h^{(j)}(\hat{x})\right)\frac{\partial \gamma_h^{(j)}(\hat{x})}{\partial \hat{x}}\,\sigma'_{de}(z_{de})\,\sigma_{en}(z_{en}) \tag{28}$$

$$\frac{\partial loss_{a\_c}}{\partial b_{de}}=-\frac{1}{lags}\sum_{j=1}^{N}\sum_{h=1}^{lags}\left(\gamma_h^{(j)}(x)-\gamma_h^{(j)}(\hat{x})\right)\frac{\partial \gamma_h^{(j)}(\hat{x})}{\partial \hat{x}}\,\sigma'_{de}(Z_{de}) \tag{29}$$

$$\frac{\partial loss_{a\_c}}{\partial W_{en}}=-\frac{1}{lags}\sum_{j=1}^{N}\sum_{h=1}^{lags}\left(\gamma_h^{(j)}(x)-\gamma_h^{(j)}(\hat{x})\right)\frac{\partial \gamma_h^{(j)}(\hat{x})}{\partial \hat{x}}\,\sigma'_{de}(z_{de})\,W_{de}\,\sigma'_{en}(z_{en})\,x \tag{30}$$

$$\frac{\partial loss_{a\_c}}{\partial b_{en}}=-\frac{1}{lags}\sum_{j=1}^{N}\sum_{h=1}^{lags}\left(\gamma_h^{(j)}(x)-\gamma_h^{(j)}(\hat{x})\right)\frac{\partial \gamma_h^{(j)}(\hat{x})}{\partial \hat{x}}\,\sigma'_{de}(z_{de})\,W_{de}\,\sigma'_{en}(z_{en}) \tag{31}$$

where $\gamma_h^{(j)}(x)=\frac{\sum_{i=h+1}^{m}\left(x_{ij}-\bar{x}_j\right)\left(x_{(i-h)j}-\bar{x}_j\right)}{\sum_{i=1}^{m}\left(x_{ij}-\bar{x}_j\right)^2}$ is the sample auto-correlation of the $j$-th dimension, as in Equation (15).
where $\sigma'_{de}(z_{de})=1-\left[\sigma_{de}(z_{de})\right]^2$ and $\sigma'_{en}(z_{en})=1-\left[\sigma_{en}(z_{en})\right]^2$ are the derivatives of the tanh activation functions. The total gradients of the weights and biases can be calculated via Equations (32)–(35):
$$\frac{\partial loss}{\partial W_{de}}=\frac{\partial loss_{MSE}}{\partial W_{de}}+\frac{\partial loss_{c\_c}}{\partial W_{de}}+\frac{\partial loss_{a\_c}}{\partial W_{de}} \tag{32}$$

$$\frac{\partial loss}{\partial b_{de}}=\frac{\partial loss_{MSE}}{\partial b_{de}}+\frac{\partial loss_{c\_c}}{\partial b_{de}}+\frac{\partial loss_{a\_c}}{\partial b_{de}} \tag{33}$$

$$\frac{\partial loss}{\partial W_{en}}=\frac{\partial loss_{MSE}}{\partial W_{en}}+\frac{\partial loss_{c\_c}}{\partial W_{en}}+\frac{\partial loss_{a\_c}}{\partial W_{en}} \tag{34}$$

$$\frac{\partial loss}{\partial b_{en}}=\frac{\partial loss_{MSE}}{\partial b_{en}}+\frac{\partial loss_{c\_c}}{\partial b_{en}}+\frac{\partial loss_{a\_c}}{\partial b_{en}} \tag{35}$$
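In a framework with automatic differentiation, the whole chain of Equations (20)–(31) reduces to one backward pass, and Equations (16)–(19) are plain gradient-descent updates. A self-contained sketch of a single pre-training step (with a simplified amplitude-plus-correlation loss and placeholder sizes and data, for illustration only):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
enc = nn.Linear(400, 600)                     # W_en, b_en (sizes illustrative)
dec = nn.Linear(600, 400)                     # W_de, b_de
params = list(enc.parameters()) + list(dec.parameters())
lr = 1e-3                                     # learning rate lr of Eqs. (16)-(19)

x = torch.randn(100, 400)                     # one batch of placeholder samples
x_hat = torch.tanh(dec(torch.tanh(enc(x))))   # forward pass, Equation (1)

# Simplified composite loss: amplitude term plus a cross-correlation term
xc, xhc = x - x.mean(0), x_hat - x_hat.mean(0)
rho = (xc * xhc).sum(0) / (xc.norm(dim=0) * xhc.norm(dim=0) + 1e-8)
loss = 0.5 * ((x - x_hat) ** 2).sum(1).mean() + ((1 - rho) ** 2).mean()

loss.backward()                               # autograd evaluates Eqs. (20)-(27)
with torch.no_grad():
    for p in params:                          # the updates of Eqs. (16)-(19)
        p -= lr * p.grad
        p.grad = None
```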

3.2.2. Orientation Consistency-Guided Training of SAE in Stage of Backpropagation

The training of SAE involves a supervised global fine-tuning process with an additional classifier on the top layer of the DNN. If similar faults, as shown in Figure 3, occur in the system, the traditional training mechanism is unable to extract separable features for them, as the membership of the classifier is guided by amplitude consistency without considering the consistency of fault orientation.
In order to solve the above-mentioned problem, this section designs a fault orientation consistency-guided training mechanism for SAE. Aiming to discriminate between the similar faults shown in Figure 2, amplitude consistency and fault orientation consistency are used to guide the global adjustment of the backpropagation stage. A schematic diagram of the training mechanism guided by fault orientation consistency for the global adjustment of the DNN (G-CASAE) is shown in Figure 5.
This section focuses on designing a new classification mechanism for the SAE for the case in which the fault features are similar, taking both amplitude consistency and fault orientation consistency into account, to address the problem that the first and second memberships of the predicted labels are close. The main idea is to minimize the difference between the predicted labels and the real labels by using the loss function designed in Equation (36).
$$loss_{a\_o}=loss_{dis}+loss_{orn}=\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|y_{pre}^{\,i}-y_{real}^{\,i}\right\|^2+\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|1-\frac{y_{pre}^{\,i}\cdot y_{real}^{\,i}}{\left\|y_{pre}^{\,i}\right\|\left\|y_{real}^{\,i}\right\|}\right\|^2 \tag{36}$$
where y p r e is the predicted label, and y r e a l is the real label.
$loss_{orn}$ aims to extract the orientation difference between the predicted labels and the real labels, as shown in Equation (37).
$$loss_{orn}=\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|1-\frac{y_{pre}^{\,i}\cdot y_{real}^{\,i}}{\left\|y_{pre}^{\,i}\right\|\left\|y_{real}^{\,i}\right\|}\right\|^2 \tag{37}$$
where $\frac{y_{pre}^{\,i}\cdot y_{real}^{\,i}}{\|y_{pre}^{\,i}\|\,\|y_{real}^{\,i}\|}$ is used to measure the orientation consistency between the predicted label and the real label, independently of their amplitude difference.
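A sketch of the fine-tuning loss of Equations (36) and (37), assuming $y_{pre}$ and $y_{real}$ are $(m, C)$ matrices of predicted memberships and one-hot true labels (an illustrative reading, not the authors' code):

```python
import torch

def orientation_consistency_loss(y_pre: torch.Tensor, y_real: torch.Tensor) -> torch.Tensor:
    """loss_a_o = loss_dis + loss_orn, per Equations (36)-(37)."""
    eps = 1e-8
    loss_dis = 0.5 * ((y_pre - y_real) ** 2).sum(dim=1).mean()
    cos = (y_pre * y_real).sum(dim=1) / (y_pre.norm(dim=1) * y_real.norm(dim=1) + eps)
    loss_orn = 0.5 * ((1.0 - cos) ** 2).mean()
    return loss_dis + loss_orn

# The ambiguous prediction (near-equal first and second memberships, as in
# Figure 3) incurs a much larger orientation penalty than the confident one.
y_true = torch.tensor([[1.0, 0.0, 0.0]])
print(orientation_consistency_loss(torch.tensor([[0.9, 0.1, 0.0]]), y_true))
print(orientation_consistency_loss(torch.tensor([[0.5, 0.5, 0.0]]), y_true))
```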

3.2.3. Trend Feature Consistency-Driven Deep Learning for Minor Fault Diagnosis

When the online data $x_{online}(t)$ are collected, they are fed into the trained deep neural network with global fine-tuning (G-CASAE) for feature extraction. $Feature_{CF}(t)$ denotes the features used for classification, which are extracted from the online data by the trained model, as shown in Equation (38).
$$Feature_{CF}(t)=G_{CF}\left(Net_{G\text{-}CASAE},\,Tr_{global},\,x_{online}(t)\right) \tag{38}$$
where $Net_{G\text{-}CASAE}$ is the network model structure, and $Tr_{global}$ are the trained model parameters. The feature $Feature_{CF}(t)$ is fed into the classifier for fault diagnosis, as shown in Equations (39) and (40).
$$M_{\theta,G\text{-}CASAE}(t)=\begin{bmatrix}p(label=1\mid Feature_{CF}(t);\theta_{classifier})\\ p(label=2\mid Feature_{CF}(t);\theta_{classifier})\\ \vdots\\ p(label=L\mid Feature_{CF}(t);\theta_{classifier})\end{bmatrix}=\frac{1}{\sum_{l=1}^{L}e^{\theta_{f_l}^{T}Feature_{CF}(t)}}\begin{bmatrix}e^{\theta_{f_1}^{T}Feature_{CF}(t)}\\ e^{\theta_{f_2}^{T}Feature_{CF}(t)}\\ \vdots\\ e^{\theta_{f_L}^{T}Feature_{CF}(t)}\end{bmatrix} \tag{39}$$
$$label\left[x_{online}(t)\right]=\underset{l=1,2,\ldots,L}{\mathrm{argmax}}\ M_{\theta_{classifier},G\text{-}CASAE}(t)\mid\left(x_{online}(t);\theta_{classifier}\right) \tag{40}$$
where $\theta_{classifier}$ denotes the trained parameters of the classifier, and $M_{\theta,G\text{-}CASAE}(t)$ is the probability that the online sample belongs to each category. $label\left[x_{online}(t)\right]$ is the fault diagnosis result at time $t$ for the online data. The specific steps of the algorithm are described as follows (Algorithm 1).
Algorithm 1: Gradient Descent Algorithm for Training G-CASAE with Constructive Loss Function
Step 1:
The data acquisition system collects the multivariable monitoring signals of the key components of the rotating machinery.
Step 2:
Prepare the SAE for fault diagnosis.
Step 3:
Set the parameters of SAE, including the number of neurons for each layer, the learning rate, and the maximum generation number or threshold for exit training.
Step 4:
Initialize all parameters of each layer W , b to be learned by backpropagation.
Step 5:
Amplitude consistency and trend consistency are used to design the new SAE loss function for layer-by-layer feature extraction to extract the weak fault trend feature, which aims to enhance the feature extraction capability from the monitoring signals.
Step 6:
The learned features are fed into the classification layer, which is trained with the loss function constructed from amplitude distance consistency and orientation consistency.
Step 7:
Save the network model parameters.
Step 8:
Extraction and diagnosis of minor fault features using the trained model.
Step 9:
Output fault diagnosis results.
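The online stage (Steps 8 and 9, i.e., Equations (38)–(40)) amounts to a layer-by-layer forward pass followed by a softmax and an argmax. A compact sketch, assuming the trained encoder stack and classifier are available as torch modules (all names and sizes here are hypothetical placeholders):

```python
import torch
import torch.nn as nn

def diagnose(encoders: list, classifier: nn.Linear, x_online: torch.Tensor) -> int:
    """Equations (38)-(40): extract Feature_CF(t), then softmax + argmax."""
    feature = x_online
    for enc in encoders:                                 # Equation (38)
        feature = torch.tanh(enc(feature))
    probs = torch.softmax(classifier(feature), dim=-1)   # membership vector, Eq. (39)
    return int(probs.argmax(dim=-1))                     # fault label, Equation (40)

# Hypothetical usage with a 2-layer stack trained as in Algorithm 1:
encoders = [nn.Linear(400, 600), nn.Linear(600, 100)]
classifier = nn.Linear(100, 5)                           # 5 health states: H, F1-F4
print(diagnose(encoders, classifier, torch.randn(400)))
```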
A flowchart of the feature extraction process of the new learning method is shown in Figure 6.

4. Experimental Analysis

Rolling bearings and gearboxes are important components in the motor drive system for autonomous ships. In this section, the effectiveness of the proposed model algorithm is verified using a gearbox dataset and rolling bearing dataset.

4.1. Fault Diagnosis for Parallel Gearbox

4.1.1. Dataset 1: Gearbox Data

The gearbox data obtained from the QPZZ-II rotary machinery vibration test platform are used as the benchmark data to test the efficiency of the proposed method [41]. The experimental platform is shown in Figure 7. It mainly consists of a drive motor, bearings, a parallel gearbox, a governor, and so on. The structural parameters of the gearbox are as follows: the gear module is 2 mm, the large gear has 75 teeth, and the small gear has 55 teeth. During the experiment, the drive motor ran at 880 rpm, with a braking torque output current of 0.2 A. The gears were cut on one or both sides by wire cutting to simulate broken-tooth and wear faults, and the pitting fault was simulated by the EDM technique. Health monitoring data of the gearbox were collected for H: normal, F1: wear, F2: pitting, F3: broken tooth, and F4: pitting and wear, respectively. The layout of the sensors used to collect the data is shown in Table 1, and the sampling frequency was 5120 Hz. The raw monitoring signals collected from 9 channels were fed directly into the proposed model to identify the health status of the gearbox.
In Experiment 1, five sets of gearbox data corresponding to the different fault conditions were collected. Table 2 shows the experimental scenarios for Experiment 1.
To verify the superiority of the proposed method, five other minor fault diagnosis methods were compared, as shown in Table 3.

4.1.2. Results and Analysis of Experiment 1

The gearbox data are not subject to any preprocessing or artificial feature extraction, as we aim to use the original data for fault diagnosis. When the network models are constructed, the influence of the number of neurons in each layer is considered. Firstly, the number of neurons in the input layer depends on the dimension of the input samples. Secondly, the number of neurons in the hidden layers follows the principle of first lifting the input data into a high-dimensional space and then compressing the high-dimensional features in turn. The designed model structure and the compared models are shown in Table 4.
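As an illustration of that sizing principle (the widths below are placeholders, not the values of Table 4): the first hidden layer lifts the input into a higher-dimensional space before successive layers compress it.

```python
import torch.nn as nn

# Widths follow the stated design principle and are illustrative only.
model = nn.Sequential(
    nn.Linear(400, 800), nn.Tanh(),   # lift the input to a high-dimensional space
    nn.Linear(800, 400), nn.Tanh(),   # then compress the features in turn
    nn.Linear(400, 100), nn.Tanh(),
    nn.Linear(100, 5),                # classifier over the 5 gearbox health states
)
```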
Simultaneously, in order to illustrate the superiority of the proposed G-CASAE algorithm, it is compared with five other methods. The diagnostic accuracy of each neural network model with a training sample size of 100 in Experiment 1 is given in Table 5.
The fault diagnosis results with a training sample size of 100 are shown in Table 5. As can be seen from columns 2 to 4 in row 2, when the training sample size is small, deep learning can be less effective than shallow learning for fault diagnosis due to overfitting. However, BP is unable to mine the intrinsic features of the collected data under the influence of strong noise, so it is still necessary to use deep learning for fault diagnosis. Comparing columns 4 and 5 in row 2, the correlation-based LSTM fault diagnosis model has a significant advantage in signal trend feature extraction, and it can effectively alleviate the overfitting problem of SAE and SDAE with small samples as well. Comparing columns 5 and 6 in row 2, MSFSAE can effectively alleviate the poor diagnosis caused by small samples and inconspicuous fault features; it is better than the traditional approach of using only the last layer of features for fault diagnosis, but redundant features remain. From the contrastive analysis of columns 6 and 7 in row 2, the proposed CASAE method not only considers amplitude consistency in the process of feature extraction but also uses trend consistency to extract the fault features of signals. Compared with MSFSAE, the designed method can effectively guide the learning parameters in the unsupervised pre-training process toward the optimal parameters for relevant feature extraction, which eliminates the interference of redundant features. The comparison of columns 7 and 8 in row 2 shows that, when the label similarity measure is added to the global fine-tuning loss function of the SAE, the proposed G-CASAE method can effectively solve the misclassification caused by similar memberships of predicted labels and improve the accuracy of fault diagnosis.
The comparative analysis of rows 2 to 5 in Table 5 shows that the diagnostic accuracy decreases as the noise increases, but the proposed G-CASAE method still shows good robustness. Comparing columns 6 and 8 of row 5 in Table 5 indicates that the accuracy of the proposed G-CASAE method is 32.5% higher than that of the feature-fusion MSFSAE method when the noise is stronger. Table 6 illustrates that the traditional methods require large sample sizes to achieve the diagnostic accuracy that the proposed method attains with small samples at a signal-to-noise ratio of 20 dB.
Through the analysis at a signal-to-noise ratio of 20 dB and a sample size of 1500 in Table 6, it can be found that, when the sample size is large enough, the five traditional methods can achieve diagnostic accuracy similar to that of the designed method with small samples. Therefore, the fault diagnosis performance of the proposed algorithm is obviously superior to that of traditional fault diagnosis methods under small samples and strong noise.
To summarize, the experimental results show that the designed G-CASAE method has better performance and stronger noise robustness than the other methods. To further visualize the diagnostic effect, the confusion matrix for each type of fault diagnosis is shown in Figure 8, where a darker color indicates that more samples of that type are correctly diagnosed.
To further illustrate the superiority and robustness of the proposed method, the confusion matrices of these methods are given in Figure 8. As can be seen in Figure 8, the proposed G-CASAE has the highest number of correctly diagnosed samples for all fault types. It is clear from the diagnostic results of the seven confusion matrices that there is a mutual misclassification problem in the second and third rows. From Figure 8a–d, the five compared models have the most misclassifications for the pitting–wear composite fault, because its weak fault features are not easily extracted by the traditional methods, which in turn leads to poor identification results. For the traditional methods, the feature extraction capability is limited because the loss function is constructed only from the amplitude of the signal during feature extraction, so features with small signal-to-signal differences cannot be extracted. In contrast, we modified the loss function in the feature extraction process so that the essential information of the original data is extracted not only by amplitude consistency but also by trend consistency; the proposed CASAE method thus solves the problem that similarity between fault samples prevents the extraction of key subtle differences and leads to low diagnostic accuracy. The proposed G-CASAE method further adds the label similarity loss function in the global fine-tuning after unsupervised pre-training to extract relevant features, which solves the misclassification of predicted labels caused by similar fault features and further improves the fault diagnosis accuracy of the model.

4.1.3. Results and Analysis of Experiment 2

In order to verify the reliability of the algorithm under different amounts of data, Experiment 2 is designed by adding experimental samples. The superiority of G-CASAE is further examined through learning and training with a larger sample size. In this stage, the experimental scenarios for Experiment 2 were set up for the gearbox data, as shown in Table 7.
The network models of Experiment 2 were the same as those of Experiment 1. The diagnostic results of each model are shown in Table 8.
Comparing columns 2 to 5 of Table 4 and Table 8, only the number of training samples is increased. Increasing the training sample size effectively alleviates the overfitting problem of deep learning when samples are insufficient. However, the robustness of the comparison models cannot be guaranteed even when the sample size is increased. In contrast, the proposed method still achieves high diagnostic accuracy on poor-quality test data, which further illustrates that the features extracted at different levels by the method proposed in this paper better reflect the nature of the data.
Comparing Table 6 and Table 9 shows that the diagnostic accuracy of the proposed algorithm reaches 86.12% with 500 samples at an SNR of 20 dB. For the existing fault diagnosis methods to achieve accuracy consistent with the proposed method under the same noise, at least 2500 training samples would be required. In summary, where existing fault diagnosis methods fail in the context of small samples with strong noise, the proposed method still delivers diagnostic accuracy that meets practical engineering requirements.
To further present the diagnostic effect of the proposed method, the confusion matrix for each model in Experiment 2 is given in Figure 9.
As can be seen from Figure 9, the diagnostic accuracy of both the proposed CASAE and G-CASAE methods improves significantly when the number of samples is increased. This is because sufficient samples provide more fault information, and the effective feature extraction method can capture the fault features well, which in turn facilitates minor fault diagnosis.

4.2. Fault Diagnosis for the Rolling Bearing

4.2.1. Dataset 2: Rolling Bearing Data

The bearing data collected from the rotary machinery vibration test bench of Case Western Reserve University are used as the benchmark data to test the effectiveness of the proposed method [42]. The experimental platform is shown in Figure 10; it is mainly composed of a 1.5 kW (2 HP) motor, a torque transducer, and a dynamometer. In this experiment, test data were selected for the drive-end bearing at a speed of 1772 rpm with a motor load of 1 HP, and a single-point fault of 0.007 inches was seeded on the bearing by the electro-discharge machining technique. The sampling frequency was 12 kHz. Health monitoring data of the bearing were collected for H: normal, F1: inner ring fault, F2: outer ring fault, and F3: rolling ball fault, respectively. The collected vibration signals were used to construct the dataset with a sliding window of 400 and a step size of 50.
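A sketch of the stated dataset construction, assuming a 1D vibration signal stored as a NumPy array; the window length of 400 and step size of 50 are as given above, while the signal itself is a placeholder.

```python
import numpy as np

def sliding_window(signal: np.ndarray, width: int = 400, step: int = 50) -> np.ndarray:
    """Cut a 1D vibration signal into overlapping samples of length `width`."""
    starts = range(0, len(signal) - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])

signal = np.random.randn(120_000)   # placeholder for one channel recorded at 12 kHz
samples = sliding_window(signal)
print(samples.shape)                # (2393, 400)
```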
In order to further verify the effectiveness of the G-CASAE algorithm, the bearing experimental samples were constructed so as to avoid insufficient model feature learning due to a small sample size. In this section, the sample set is constructed using a sliding window of size 400 to preprocess the bearing data, and the experimental scenarios set up with the obtained samples are shown in Table 10.

4.2.2. Results and Analysis of Experiment 3

To further verify the effectiveness and generalization performance of the proposed algorithm, Experiment 3 was used to verify the designed algorithm. The network model structures and parameters used in Experiment 3 are shown in Table 11. Simultaneously, we compared and analyzed our method with the above five methods and calculated the correct diagnosis accuracy for each type of fault. The experimental results are shown in the following Table 12.
From Table 12, the proposed G-CASAE method also achieves high diagnostic accuracy on the bearing dataset; compared with the other five models, its diagnostic accuracy reaches 95.625%. Comparing columns 3 to 5 and column 7 in row 2, the improved feature extraction method in unsupervised pre-training increases diagnostic accuracy by at least 10.4% compared with traditional deep learning methods such as LSTM. Comparing columns 6 and 7 in row 2, the proposed method is 4.75% more accurate than the feature fusion method. The comparison of columns 7 and 8 in row 2 shows that, after effective unsupervised feature pre-training, adding the cosine similarity measure to the supervised global fine-tuning process takes label similarity into account during training, and the diagnostic accuracy of the proposed method is thereby further improved.
In addition, from rows 2 to 5 of Table 12, it can be seen that all comparison methods are affected differently by noise of different intensities, and the proposed method has strong anti-interference ability and good generalization performance.
A comparative analysis of column 6 in row 6 and columns 7 to 8 in row 5 of Table 13 shows that, when the bearing data are affected by the same noise, the traditional methods require a large number of samples to achieve fault diagnosis accuracy comparable to that of the proposed method. Comparing Experiments 1, 2, and 3, it can be seen that the proposed algorithm is applicable to the identification and diagnosis of minor fault monitoring data of both gearboxes and rolling bearings. Moreover, the experimental results are consistent across the different datasets, which shows that G-CASAE performs effective fault feature extraction and diagnosis in scenarios with small samples and strong noise.
The result further illustrates the superiority of the designed G-CASAE method. The confusion matrix of the diagnostic effect of each network model is shown in Figure 11.
It can be seen from Figure 11 that the proposed method is superior to the existing methods in the case that only 400 samples polluted with strong noise are available for training the DNN-based fault diagnosis model. This is because it can well discriminate F1 and F2, which are typical similar faults.
It can also be seen from Table 13 that 400 training samples polluted with strong noise are adequate for the proposed method to successfully train the DNN to achieve satisfactory fault diagnosis accuracy, while more than 2000 training samples are required for the traditional methods to achieve comparable fault diagnosis accuracy.

5. Conclusions

Deep learning can be applied to the fault diagnosis of rotating machinery, where the diagnostic accuracy often depends on the number and quality of the training samples. Since a small number of fault samples polluted by strong noise is common in the engineering field, it is necessary to develop a new training mechanism to accurately extract separable features suitable for discriminating similar faults when only a small number of training samples polluted by strong noise is available. A trend feature consistency-guided deep learning method for minor fault diagnosis is proposed in this paper to make the DNN more powerful in feature representation and similar-fault discrimination. The proposed algorithm designs new loss functions for the accurate representation of data features, guided by trend feature consistency, and for the accurate classification of faults, guided by fault orientation consistency. The method overcomes the problem that faults with similar memberships cannot be effectively distinguished by traditional methods. Experimental validation on benchmark datasets shows that the proposed method is superior to traditional methods, in the sense that an improvement of more than 10% in diagnostic accuracy can be achieved when only a small number of training samples polluted with strong noise is available. Compared with traditional methods, far fewer training samples are sufficient for the proposed method to train a satisfactory DNN-based fault diagnosis model.
The experimental results also show that when the training samples are polluted with strong noise, the fault diagnosis accuracy decreases, although it remains superior to that of the existing methods. In the case that the quality of the training samples is too poor to establish a satisfactory fault diagnosis model, establishing a federated learning model to incorporate training data provided by different clients may be possible, which will be our future task.

Author Contributions

Methodology, C.W.; Resources, X.H.; Writing—original draft, P.J.; Writing—review & editing, F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (62073213) and the National Natural Science Foundation of China Youth Fund (52205111).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data involved in this article have been presented in the article.

Acknowledgments

The authors would like to thank the provider of the QPZZ-II rotary machinery vibration platform for the gearbox vibration data and Case Western Reserve University for providing the motor bearing vibration data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, R.N.; Yang, B.Y.; Zio, E.; Chen, X.F. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mech. Syst. Signal Process. 2018, 108, 33–47. [Google Scholar] [CrossRef]
  2. Adamsab, K. Machine learning algorithms for rotating machinery bearing fault diagnostics. Mater. Today Proc. 2021, 44, 4931–4933. [Google Scholar] [CrossRef]
  3. Dou, D.Y.; Yang, J.G.; Liu, J.T.; Zhao, Y.K. A rule-based intelligent method for fault diagnosis of rotating machinery. Knowl. Based Syst. 2012, 36, 1–8. [Google Scholar] [CrossRef]
  4. Quinde, I.R.; Sumba, J.C.; Ochoa, L.E.; Vallejo Guevara, A., Jr.; Morales-Menendez, R. Bearing Fault Diagnosis Based on Optimal Time-Frequency Representation Method. IFAC-PapersOnLine 2019, 52, 194–199. [Google Scholar] [CrossRef]
  5. Xu, B.M.; Shi, J.C.; Zhong, M.; Zhang, J. Incipient fault diagnosis of planetary gearboxes based on an adaptive parameter-induced stochastic resonance method. Appl. Acoust. 2022, 188, 108587. [Google Scholar] [CrossRef]
  6. Lei, Y.G.; Lin, J.; Zuo, M.J.; He, Z.J. Condition monitoring and fault diagnosis of planetary gearboxes: A review. Measurement 2014, 48, 292–305. [Google Scholar] [CrossRef]
  7. Youssef, A.; Delpha, C.; Diallo, D. An optimal fault detection threshold for early detection using Kullback–Leibler Divergence for unknown distribution data. Signal Process. 2016, 120, 266–279. [Google Scholar] [CrossRef]
  8. Liu, J.; Wang, W.; Golnaraghi, F. An Enhanced Diagnostic Scheme for Bearing Condition Monitoring. IEEE Trans. Instrum. Meas. 2010, 59, 309–321. [Google Scholar]
  9. Ren, L.; Xu, Z.Y.; Yan, X.Q. Single-sensor incipient fault detection. IEEE Sens. J. 2011, 11, 2102–2107. [Google Scholar] [CrossRef]
  10. Rathore, M.S.; Harsha, S.P. Degradation Pattern of High Speed Roller Bearings Using a Data-Driven Deep Learning Approach. J. Signal Process. Syst. 2022, 94, 1557–1568. [Google Scholar] [CrossRef]
  11. Chang, C.H. Deep and shallow architecture of multilayer neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2477–2486. [Google Scholar] [CrossRef] [PubMed]
  12. Zhang, C.; Lim, P.; Qin, A.K.; Tan, K.C. Multi-objective deep belief networks ensemble for remaining useful life estimation in prognostics. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2306–2318. [Google Scholar] [CrossRef] [PubMed]
  13. Chen, Z.Q.; Li, C.; Sánchezr, R.V. Gearbox fault identification and classification with convolutional neural networks. Shock Vib. 2015, 2015, 390134. [Google Scholar] [CrossRef] [Green Version]
  14. Guo, L.; Li, N.P.; Jia, F.; Lei, Y.; Lin, J. A recurrent neural network based health indicator for remaining useful life prediction of bearings. Neurocomputing 2017, 240, 98–109. [Google Scholar] [CrossRef]
  15. Jia, F.; Lei, Y.G.; Guo, L.; Lin, J.; Xing, S. A neural network constructed by deep learning technique and its application to intelligent fault diagnosis of machines. Neurocomputing 2018, 272, 619–628. [Google Scholar] [CrossRef]
  16. Chen, Z.Q.; Deng, S.C.; Chen, X.D.; Li, C.; René-Vinicio, S.; Qin, H.F. Deep neural networks-based rolling bearing fault diagnosis. Microelectron. Reliab. 2017, 75, 327–333. [Google Scholar] [CrossRef]
  17. Li, Y.; Cheng, G.; Liu, C.; Chen, X. Study on planetary gear fault diagnosis based on variational mode decomposition and deep neural networks. Measurement 2018, 130, 94–104. [Google Scholar] [CrossRef]
  18. Tang, Z.H.; Wang, M.J.; Ouyang, T.H.; Che, F. A wind turbine bearing fault diagnosis method based on fused depth features in time–frequency domain. Energy Rep. 2022, 8, 12727–12739. [Google Scholar] [CrossRef]
  19. Jin, Q.; Wang, Y.R.; Wang, J. Planetary gearboxes fault diagnosis based on multiple feature extraction and information fusion combined with deep learning. China Mech. Eng. 2019, 30, 196–204. [Google Scholar]
  20. Zhang, L.Z.; Tan, J.W.; Xu, W.X.; Jing, L.M. Fault Diagnosis Methods of Rolling Bearings Based on Decision Fusion of Multiple Deep Learning Models. Modul. Mach. Tool Automatic Manuf. Tech. 2019, 8, 59–62. [Google Scholar]
  21. Lu, C.; Wang, Z.Y.; Qin, W.L.; Ma, J. Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Process. 2017, 130, 377–388. [Google Scholar] [CrossRef]
  22. Chen, J.S.; Li, J.; Chen, W.G.; Wang, Y.Y.; Jiang, T.Y. Anomaly detection for wind turbines based on the reconstruction of condition parameters using stacked denoising autoencoders. Renew. Energy 2020, 147, 1469–1480. [Google Scholar] [CrossRef]
  23. Rashidi, B.; Zhao, Q. Output-related fault detection in non-stationary processes using constructive correlative-SAE and demoting correlative-DNN. Appl. Soft Comput. 2022, 123, 108898. [Google Scholar] [CrossRef]
  24. Zhu, H.P.; Cheng, J.X.; Zhang, C.; Wu, J.; Shao, X.Y. Stacked pruning sparse denoising autoencoder based intelligent fault diagnosis of rolling bearings. Appl. Soft Comput. 2020, 88, 106060. [Google Scholar] [CrossRef]
  25. Zhou, F.N.; Sun, T.; Hu, X.; Wang, T.Z.; Wen, C.L. A sparse denoising deep neural network for improving fault diagnosis performance. Signal Image Video Process. 2021, 15, 1889–1898. [Google Scholar] [CrossRef]
  26. Shao, H.D.; Jiang, H.K.; Lin, Y.; Li, X.Q. A novel method for intelligent fault diagnosis of rolling bearings using ensemble deep auto-encoders. Mech. Syst. Signal Process. 2018, 102, 278–297. [Google Scholar] [CrossRef]
  27. Kong, X.G.; Mao, G.; Wang, Q.B.; Ma, H.B.; Yang, W. A multi-ensemble method based on deep auto-encoders for fault diagnosis of rolling bearings. Measurement 2020, 151, 107132. [Google Scholar] [CrossRef]
  28. Shi, C.M.; Panoutsos, G.; Luo, B.; Liu, H.Q. Using Multiple-Feature-Spaces-Based Deep Learning for Tool Condition Monitoring in Ultraprecision on Manufacturing. IEEE Trans. Ind. Electron. 2019, 66, 3794–3803. [Google Scholar] [CrossRef] [Green Version]
  29. Zhou, F.N.; Zhang, Z.Q.; Chen, D.M. Bearing fault diagnosis based on DNN using multi-scale feature fusion. In Proceedings of the 2020 35th Youth Academic Annual Conference of Chinese Association of Automation (YAC), Zhanjiang, China, 16–18 October 2020; pp. 150–155. [Google Scholar]
  30. Zhou, F.N.; He, Y.F.; Han, H.T. Fault Diagnosis of Multi-source Heterogeneous Information Fusion Based on Deep Learning. In Proceedings of the 2019 IEEE 8th Data Driven Control and Learning Systems Conference (DDCLS), Dali, China, 24–27 May 2019; pp. 1295–1300. [Google Scholar]
  31. Ravikumar, K.N.; Yadav, A.; Kumar, H.; Gangadharan, K.V.; Narasimhadhan, A.V. Gearbox fault diagnosis based on Multi-Scale deep residual learning and stacked LSTM model. Measurement 2021, 186, 110099. [Google Scholar] [CrossRef]
  32. Chen, Z.P.; Zhu, H.P.; Wu, J.; Fan, L.Z. Health indicator construction for degradation assessment by embedded LSTM-CNN autoencoder and growing self-organized map. Knowl. Based Syst. 2022, 252, 109399. [Google Scholar] [CrossRef]
  33. Zhou, F.N.; Zhang, Z.Q.; Chen, D.M. Real-time fault diagnosis using deep fusion of features extracted by parallel long short-term memory with peephole and convolutional neural network. Proc. Inst. Mech. Eng. Part I J. Syst. Control Eng. 2021, 235, 1873–1897. [Google Scholar] [CrossRef]
  34. Nguyen, V.; Cheng, J.S.; Thai, V. Stacked Auto-encoder Based Feature Transfer Learning and Optimized LSSVM-PSO Classifier in Bearing Fault Diagnosis. Meas. Sci. Rev. 2022, 22, 177–186. [Google Scholar] [CrossRef]
  35. Yang, Z.; Xu, B.B.; Luo, W.; Chen, F. Autoencoder-based representation learning and its application in intelligent fault diagnosis: A review. Measurement 2022, 189, 110460. [Google Scholar] [CrossRef]
  36. Hoang, D.T.; Kang, H.J. A survey on Deep Learning based bearing fault diagnosis. Neurocomputing 2019, 335, 327–335. [Google Scholar] [CrossRef]
  37. Lin, M.; Wang, G.J.; Xie, C.; Eugene Stanley, C. Cross-correlations and influence in world gold markets. Phys. A Stat. Mech. Its Appl. 2018, 490, 504–512. [Google Scholar] [CrossRef]
  38. Li, J.W.; Jiang, C. A novel imprecise stochastic process model for time-variant or dynamic uncertainty quantification. Chin. J. Aeronaut. 2022, 35, 255–267. [Google Scholar] [CrossRef]
  39. Lin, D.Y.; Li, Y.Q.; Cheng, Y.; Prasad, S.; New, T.L.; Dong, S.; Guo, A.Y. Multi-view 3D object retrieval leveraging the aggregation of view and instance attentive features. Knowl. Based Syst. 2022, 247, 108745. [Google Scholar] [CrossRef]
  40. Shao, H.D.; Jiang, H.K.; Zhao, H.W.; Wang, F. A novel deep autoencoder feature learning method for rotating machinery fault diagnosis. Mech. Syst. Signal Process. 2017, 95, 187–204. [Google Scholar] [CrossRef]
  41. Gearbox Data Set [DB/OL]. Available online: http://www.pudn.com/Download/item/id/3205015.html (accessed on 21 June 2022).
  42. Case Western Reserve University Bearing Data Center Website. Available online: http://www.eecs.case.edu/laboratory/bearing (accessed on 24 August 2022).
Figure 1. The structure of an auto-encoder.
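As a companion to Figure 1, the following is a minimal sketch of the encoder/decoder pair that makes up an auto-encoder. It is written in PyTorch for illustration only; the layer sizes and sigmoid activations are our assumptions, not parameters taken from the paper.

```python
# Minimal auto-encoder sketch matching the structure in Figure 1:
# an encoder compresses the input, a decoder reconstructs it.
# Layer sizes here are illustrative placeholders.
import torch
from torch import nn

class AutoEncoder(nn.Module):
    def __init__(self, n_in: int = 9, n_hidden: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)        # compressed feature representation
        return self.decoder(h)     # reconstruction of the input
```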
Figure 2. Distribution trends of the input data and the output data.
Figure 3. Visualization of minor-fault membership degrees for different methods: (a) traditional deep learning minor fault diagnosis, (b) denoising-based deep learning minor fault diagnosis, (c) correlation-based deep learning minor fault diagnosis, (d) fusion-based deep learning minor fault diagnosis. Dots indicate the first membership degree and triangles indicate the second membership degree.
Figure 4. Schematic diagram of the proposed CASAE method.
Figure 5. Schematic diagram of the proposed G-CASAE method.
Figure 6. Flowchart of the proposed G-CASAE method.
Figure 7. QPZZ-II rotary machinery vibration test bench.
Figure 8. Confusion matrices for fault diagnosis of the parallel gearbox without added noise, with a training sample size of 100: (a) BP, (b) SAE, (c) SDAE, (d) LSTM, (e) MSFSAE, (f) CASAE, (g) G-CASAE.
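The per-model confusion matrices in Figures 8, 9, and 11 tabulate, for each true class, how the test samples were assigned across the predicted classes. A minimal sketch of how such matrices could be computed and plotted is shown below, using scikit-learn; the variables `y_true` and `y_pred` are hypothetical placeholders standing in for a trained classifier's test output, not the authors' actual code.

```python
# Sketch: computing and plotting a confusion matrix for a 5-class
# gearbox diagnosis task (classes from Table 2). Predictions here are
# synthetic placeholders for illustration only.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

labels = ["H", "F1", "F2", "F3", "F4"]           # gearbox health classes

rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=2500)            # 5 classes x 500 test samples
# Corrupt 10% of the labels to mimic an imperfect classifier.
y_pred = np.where(rng.random(2500) < 0.9, y_true, rng.integers(0, 5, size=2500))

cm = confusion_matrix(y_true, y_pred)             # rows: true, columns: predicted
ConfusionMatrixDisplay(cm, display_labels=labels).plot(cmap="Blues")
plt.show()
```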
Figure 9. Confusion matrices for fault diagnosis of the parallel gearbox without added noise, with a training sample size of 500: (a) BP, (b) SAE, (c) SDAE, (d) LSTM, (e) MSFSAE, (f) CASAE, (g) G-CASAE.
Figure 10. Bearing signal acquisition experimental platform.
Figure 11. Confusion matrices for fault diagnosis of the rolling bearing without added noise, with a training sample size of 400: (a) BP, (b) SAE, (c) SDAE, (d) LSTM, (e) MSFSAE, (f) CASAE, (g) G-CASAE.
Table 1. Sensor installation positions.

| Channel | Serial Number in Figure 7 | Sensor Location and Type |
|---|---|---|
| CH1 | 1 | Photoelectric speed |
| CH2 | 2 | Input shaft horizontal displacement |
| CH3 | 3 | Input shaft vertical displacement |
| CH4 | 4 | Input shaft left end cover vertical acceleration |
| CH5 | 5 | Output shaft left end cover horizontal acceleration |
| CH6 | 6 | Output shaft left end cover vertical acceleration |
| CH7 | 7 | Output shaft right end cover horizontal acceleration |
| CH8 | 8 | Output shaft right end cover vertical acceleration |
| CH9 | 9 | Output shaft load side bearing magneto-electric speed |
Table 2. Experiment 1 design for gearbox data.

| Fault Type | Gearbox Health Condition | Training Sample Size | Testing Sample Size |
|---|---|---|---|
| H | Normal | 20 | 500 |
| F1 | Wear | 20 | 500 |
| F2 | Pitting and wear | 20 | 500 |
| F3 | Pitting | 20 | 500 |
| F4 | Broken tooth | 20 | 500 |
Table 3. The related fault diagnosis models for comparison.

| Model | Model Description |
|---|---|
| BP | Backpropagation neural network |
| SAE | Stacked auto-encoder |
| SDAE | Stacked denoising auto-encoder |
| LSTM | Long short-term memory neural network |
| MSFSAE [26] | Multi-scale feature fusion stacked auto-encoder |
| CASAE | Cross-correlation and auto-correlation stacked auto-encoder |
| G-CASAE | Global fine-tuning of cross-correlation and auto-correlation stacked auto-encoder |
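To make the CASAE entry in Table 3 concrete, the sketch below shows one way a correlation-consistency term can be combined with the standard reconstruction loss so that the reconstruction preserves the trend of the input. This is a hedged illustration in PyTorch, assuming a Pearson-correlation penalty and a weighting factor `alpha`; it is not the paper's exact loss formulation.

```python
# Sketch of a trend-consistency loss in the spirit of CASAE: MSE
# reconstruction plus a penalty driving the Pearson correlation between
# input and reconstruction toward 1. `alpha` and the exact combination
# are illustrative assumptions.
import torch

def pearson_corr(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Per-sample Pearson correlation between rows of x and y."""
    xc = x - x.mean(dim=1, keepdim=True)
    yc = y - y.mean(dim=1, keepdim=True)
    num = (xc * yc).sum(dim=1)
    den = torch.sqrt((xc ** 2).sum(dim=1) * (yc ** 2).sum(dim=1)) + 1e-8
    return num / den

def correlation_consistency_loss(x, x_hat, alpha=0.1):
    mse = torch.mean((x - x_hat) ** 2)          # reconstruction error
    corr = pearson_corr(x, x_hat).mean()        # trend consistency in [-1, 1]
    return mse + alpha * (1.0 - corr)           # corr -> 1 minimizes the penalty
```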
Table 4. Network model parameters in the gearbox diagnosis experiment.

| Model | Model Parameters |
|---|---|
| BP | Number of layers: 5; neurons per layer: 9/150/70/36/5; learning rate: 0.007 |
| SAE | Number of layers: 5; neurons per layer: 9/150/70/36/5; learning rate: 0.005 |
| SDAE | Number of layers: 5; neurons per layer: 9/150/70/36/5; learning rate: 0.005 |
| LSTM | Cell number: 3; hidden neurons per cell: 60; learning rate: 0.0004 |
| MSFSAE [26] | Number of layers: 5; neurons per layer: 9/150/70/36/5; learning rate: 0.005 |
| CASAE | Number of layers: 5; neurons per layer: 9/150/70/36/5; learning rate: 0.005 |
| G-CASAE | Number of layers: 5; neurons per layer: 9/150/70/36/5; learning rate: 0.005 |
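For readers who want to reproduce the 9/150/70/36/5 topology listed in Table 4, the sketch below builds it as a plain feed-forward network in PyTorch. The sigmoid activations and the use of `nn.Sequential` are our assumptions for illustration; the stacked auto-encoder variants in the paper additionally pretrain the hidden layers layer by layer before fine-tuning, which is not shown here.

```python
# Sketch of the 9/150/70/36/5 topology from Table 4 (gearbox experiment):
# 9 input features, three hidden layers, 5 health-class outputs.
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(9, 150), nn.Sigmoid(),     # input: 9 monitoring features
    nn.Linear(150, 70), nn.Sigmoid(),
    nn.Linear(70, 36), nn.Sigmoid(),
    nn.Linear(36, 5),                    # output: 5 gearbox health classes
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)  # rate from Table 4
```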
Table 5. Fault diagnosis results of different methods with different sample quality for 100 training samples.

| SNR (dB) | BP | SAE | SDAE | LSTM | MSFSAE | CASAE | G-CASAE |
|---|---|---|---|---|---|---|---|
| None | 78.70% | 75.40% | 78.20% | 84.00% | 85.60% | 90.20% | 93.40% |
| 60 | 72.60% | 70.40% | 71.20% | 79.20% | 81.60% | 86.40% | 90.48% |
| 40 | 58.48% | 53.60% | 56.40% | 61.60% | 67.20% | 84.80% | 86.20% |
| 20 | 39.80% | 38.20% | 39.20% | 49.60% | 51.20% | 82.40% | 83.70% |
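The SNR rows in Tables 5, 8, and 12 correspond to samples corrupted with noise at the stated signal-to-noise ratio. The paper does not spell out its noise model in this section, so the sketch below assumes additive white Gaussian noise scaled to a target SNR in dB; the function name and the toy signal are illustrative.

```python
# Sketch of generating noisy test conditions at a target SNR (dB),
# assuming additive white Gaussian noise: SNR = 10 * log10(Ps / Pn).
import numpy as np

def add_noise_at_snr(signal: np.ndarray, snr_db: float) -> np.ndarray:
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

x = np.sin(2 * np.pi * 50 * np.linspace(0, 1, 1000))   # toy vibration signal
x_snr20 = add_noise_at_snr(x, 20.0)                    # matches the "SNR: 20" rows
```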
Table 6. Diagnostic accuracy of the proposed method with 100 training samples and the traditional methods with 1500 training samples at SNR = 20 dB.

| Model | BP | SAE | SDAE | LSTM | MSFSAE | CASAE | G-CASAE |
|---|---|---|---|---|---|---|---|
| Training sample size | 1500 | 1500 | 1500 | 1500 | 1500 | 100 | 100 |
| Accuracy | 64.40% | 72.00% | 74.60% | 80.80% | 83.40% | 82.40% | 83.70% |
Table 7. Experiment 2 design for gearbox data.

| Fault Type | Gearbox Health Condition | Training Sample Size | Testing Sample Size |
|---|---|---|---|
| H | Normal | 100 | 500 |
| F1 | Wear | 100 | 500 |
| F2 | Pitting and wear | 100 | 500 |
| F3 | Pitting | 100 | 500 |
| F4 | Broken tooth | 100 | 500 |
Table 8. Fault diagnosis results of different methods with different sample quality for 500 training samples.

| SNR (dB) | BP | SAE | SDAE | LSTM | MSFSAE | CASAE | G-CASAE |
|---|---|---|---|---|---|---|---|
| None | 79.60% | 84.00% | 86.16% | 89.68% | 92.40% | 96.76% | 98.64% |
| 60 | 75.48% | 80.28% | 83.56% | 80.00% | 86.84% | 93.56% | 94.16% |
| 40 | 60.72% | 65.40% | 72.00% | 75.96% | 77.72% | 89.72% | 90.52% |
| 20 | 42.48% | 46.20% | 48.72% | 54.12% | 59.40% | 84.52% | 86.12% |
Table 9. Diagnostic accuracy of the proposed method with 500 training samples and the traditional methods with 2500 training samples at SNR = 20 dB.

| Model | BP | SAE | SDAE | LSTM | MSFSAE | CASAE | G-CASAE |
|---|---|---|---|---|---|---|---|
| Training sample size | 2500 | 2500 | 2500 | 2500 | 2500 | 500 | 500 |
| Accuracy | 66.80% | 75.80% | 76.60% | 82.80% | 86.40% | 84.52% | 86.12% |

Note: the CASAE and G-CASAE columns use 500 training samples, consistent with the table caption and with the SNR = 20 dB row of Table 8.
Table 10. Experiment 3 design for rolling bearing data.

| Fault Type | Bearing Health Condition | Training Sample Size | Testing Sample Size |
|---|---|---|---|
| H | Normal | 100 | 1000 |
| F1 | Inner ring | 100 | 1000 |
| F2 | Outer ring | 100 | 1000 |
| F3 | Rolling element | 100 | 1000 |
Table 11. Network model parameters in the bearing diagnosis experiment.

| Model | Model Parameters |
|---|---|
| BP | Number of layers: 5; neurons per layer: 400/600/200/100/4; learning rate: 0.001 |
| SAE | Number of layers: 5; neurons per layer: 400/600/200/100/4; learning rate: 0.004 |
| SDAE | Number of layers: 5; neurons per layer: 400/600/200/100/4; learning rate: 0.004 |
| LSTM | Cell number: 8; hidden neurons per cell: 50; learning rate: 0.0004 |
| MSFSAE [26] | Number of layers: 5; neurons per layer: 400/600/200/100/4; learning rate: 0.004 |
| CASAE | Number of layers: 5; neurons per layer: 400/600/200/100/4; learning rate: 0.004 |
| G-CASAE | Number of layers: 5; neurons per layer: 400/600/200/100/4; learning rate: 0.004 |
Table 12. Fault diagnosis results of different methods with different sample quality for 400 training samples.

| SNR (dB) | BP | SAE | SDAE | LSTM | MSFSAE [26] | CASAE | G-CASAE |
|---|---|---|---|---|---|---|---|
| None | 70.10% | 75.03% | 78.15% | 85.23% | 90.88% | 95.63% | 98.85% |
| 20 | 63.05% | 72.88% | 74.10% | 82.93% | 86.13% | 92.25% | 93.13% |
| 10 | 55.70% | 62.70% | 70.65% | 78.25% | 79.75% | 87.65% | 89.87% |
| 2 | 45.20% | 58.53% | 63.30% | 69.85% | 75.28% | 84.38% | 86.70% |
Table 13. Diagnostic accuracy of the proposed method with 400 training samples and the traditional methods with 2000 training samples at SNR = 2 dB.

| Model | BP | SAE | SDAE | LSTM | MSFSAE [26] | CASAE | G-CASAE |
|---|---|---|---|---|---|---|---|
| Training sample size | 2000 | 2000 | 2000 | 2000 | 2000 | 400 | 400 |
| Accuracy | 68.83% | 74.62% | 76.28% | 83.53% | 85.33% | 84.38% | 86.70% |