A Novel Method for Intelligent Single Fault Detection of Bearings Using SAE and Improved D–S Evidence Theory

To realize single fault detection (SFD) from multi-fault coupled bearing data, and to support further research on multi-fault bearing conditions, this paper proposes a method that combines feature self-extraction by a Sparse Auto-Encoder (SAE) with result fusion by an improved Dempster–Shafer (D–S) evidence theory. Compressed multi-fault signal features of bearings were extracted by SAE from the data of multiple vibration sensors. Data sets were constructed from the extracted features to train Support Vector Machines (SVMs) according to the rule of single fault detection (R-SFD) proposed in this paper. Fault detection results were obtained with the improved D–S evidence theory (IDS), which corrects the zero factors in the Basic Probability Assignment (BPA) and modifies the evidence weights using the Pearson Correlation Coefficient (PCC). Extensive evaluations on data sets from the experiment platform showed that the proposed method can realize single fault detection from multi-fault bearings. Fault detection accuracy increased with the output feature dimension of the SAE; when the feature dimension reached 200, the average detection accuracy of the three sensors for bearing inner, outer, and ball faults was 87.36%, 87.86% and 84.46%, respectively. After fusing the sensors' results with IDS, the detection accuracy for the three fault types reached 99.12%, 99.33% and 98.46%, which is 0.38%, 2.06% and 0.76% higher, respectively, than with the traditional D–S evidence theory. This indicates the effectiveness of improving the D–S evidence theory through PCC-based evidence weighting.


Introduction
Bearings are one of the most common and critical components of rotating machinery, and their health is directly related to the production efficiency and safety. Any failure of bearings may cause huge time and money costs of repairing broken machines. Bearing faults that are not diagnosed in time can also pose serious hidden danger to the whole equipment. For safety and cost considerations, the bearing fault diagnosis research is of great significance [1].
Bearing fault diagnosis has been studied for a long time [2,3]. In data-driven fault diagnosis, recent studies have shown that automatic feature extraction and automatic diagnosis have become an important research field [4]. Considering the critical task of detecting bearing faults early, Hoang [5] presented an automatic bearing fault diagnosis method. Evidence theory is applied widely in many fields [21][22][23]. However, the classic D-S evidence theory is imperfect, because the fusion results may be counter-intuitive when the evidence is highly conflicting. For this reason, many scholars have worked on improving the classic D-S evidence theory [24,25]. In this paper, for the bearing multi-fault data, the D-S evidence theory is improved by calculating the PCC and using it to distribute the evidence weights. The fused results show the benefits of the improved method.
The contributions of this paper are as follows: (1) The problem of how to realize intelligent single fault detection (SFD) from multi-fault data by a feature auto-extraction method is presented. This paper trains classifiers to distinguish whether or not the data contain a single target fault, so that each trained classifier serves as the identification tool for the specific fault type it was trained on. (2) This paper proposes an SAE-based feature auto-extraction method to process the vibration data of different sensors. For feature extraction from multi-sensor data with multiple coupled fault types, SAE demonstrates excellent computing and feature compression capabilities, and the extracted features are important for the single fault detection in this study. (3) The improved D-S evidence method for bearing fault diagnosis is used to combine the classification results of multiple sensors. To address the difficulty of separating faults with this data-driven method, to improve fault detection accuracy, and to resolve the common paradoxes of the traditional D-S evidence theory, IDS is proposed and applied to fuse the detection results of multiple sensors.
The remaining part of this paper was organized as follows. In Section 2, some concepts and basic theories related to the paper's work were presented. In Section 3, the SAE and IDS evidence theory for bearing single fault detection were shown in detail. In Section 4, the experiment platform was shown and the detailed results analysis of the proposed approach was evaluated. Finally, the studies of this paper were concluded in Section 5.

Related Theory
This part introduces the basic theories and methods related to this paper. Section 2.1 discusses the SAE, its input and output, and its fundamental questions. Section 2.2 presents the basic fusion and computational principles of D-S evidence theory [26]. Section 2.3 gives an overview of the basic concepts of PCC. These basic theories support the research of this paper.

Sparse Auto-Encoder (SAE)
SAE is a variant of the Auto-Encoder (AE), obtained by adding sparsity constraints to the AE. Similar variants include the Regularized Auto-Encoder, which adds a regular term to the loss function, and the Denoising Auto-Encoder, which corrupts the input data before performing encoding and decoding.
As shown in Figure 1, the input layer and hidden layer constitute the encoding process, and the hidden layer and output layer constitute the decoding process. The goal of this structure is that, after encoding and decoding, the output data remain as consistent as possible with the input data. This means that the output of the hidden layer in the trained network is a feature representation of the original data after encoding. This feature contains relatively complete information of the original data, so SAE can be used to extract the features of data.
Assume a sample set $x \in \{x^{(n)}\}_{n=1}^{m}$, where m is the number of samples. The encoding process function can be expressed as
$$h = f(w^{(1)}, b_x) = s_f(w^{(1)} x + b_x) \quad (1)$$
where $s_f$ is the hidden layer activation function, $w^{(1)}$ is the weight matrix from the input layer to the hidden layer, and $b_x$ is the bias. The decoding process function can be expressed as
$$\hat{x} = g((w^{(1)})^{T}, b_h^{(1)}) = s_g((w^{(1)})^{T} h + b_h^{(1)}) \quad (2)$$
where $s_g$ is the activation function of the decoding output layer, $(w^{(1)})^{T}$ is the weight matrix from the hidden layer to the output layer, and $b_h^{(1)}$ is the bias. It is necessary to ensure that $x$ and $\hat{x}$ are as close as possible, so that the encoded data h can be regarded as a characteristic of the original data that reflects its information. The function that evaluates the error between output and input is $J_{sparse}(W, b)$:
$$J_{sparse}(W, b) = J(W, b) + \beta \sum_{j=1}^{s_2} KL(\rho \,\|\, \rho_j) \quad (3)$$
$$J(W, b) = \frac{1}{m} \sum_{i=1}^{m} L(x^{(i)}, \hat{x}^{(i)}) + \lambda \, \Omega_{weight} \quad (4)$$
where, in Equation (3), β is the sparsity parameter, generally taking 0.05~0.5. The divergence is $KL(\rho \,\|\, \rho_j) = \rho \log\frac{\rho}{\rho_j} + (1-\rho) \log\frac{1-\rho}{1-\rho_j}$, where ρ is given to limit the activation of the hidden layer and $\rho_j$ is the average activation value of the j-th neuron. $J(W, b)$ is the cost function of the AE, as in Equation (4), where the first term is the mean squared term with $L(x^{(i)}, \hat{x}^{(i)}) = \| h_{W,b}(x^{(i)}) - x^{(i)} \|^2$, λ is the weight attenuation coefficient, and $\Omega_{weight}$ is the weight regularization that prevents overfitting:
$$\Omega_{weight} = \frac{1}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W_{ji}^{(l)} \right)^2$$
Here $n_l$ is the number of layers in the network, l is the serial number of the layer, $s_l$ denotes the number of nodes in the l-th layer, and $W_{ji}^{(l)}$ denotes the weights connecting the l-th layer and the (l + 1)-th layer.
Through the above calculation, the encoded representation of the original data is obtained, and the output features of the hidden layer are used to train the classifiers. The encoded feature dimension changes with the number of hidden layer neurons.
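To make the terms of the cost function concrete, the following is a minimal pure-Python sketch of the encoding step, the tied-weight decoding step, and a sparse cost of the same form as Equations (3) and (4). It is an illustration under stated assumptions rather than the implementation used in the experiments: the sigmoid activation, the values of `rho`, `beta` and `lam`, and all function names are hypothetical, and the weight regularization counts the shared weight matrix once.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, w, b_x):
    """h = s_f(w x + b_x); w holds one row of input weights per hidden neuron."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + b)
            for row, b in zip(w, b_x)]

def decode(h, w, b_h):
    """x_hat = s_g(w^T h + b_h); the decoder reuses the transposed encoder weights."""
    n_in = len(w[0])
    return [sigmoid(sum(w[j][i] * h[j] for j in range(len(h))) + b_h[i])
            for i in range(n_in)]

def kl(rho, rho_j):
    """KL divergence between Bernoulli variables with means rho and rho_j."""
    return (rho * math.log(rho / rho_j)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_j)))

def sparse_cost(samples, w, b_x, b_h, rho=0.05, beta=0.1, lam=1e-4):
    """Mean squared reconstruction error + weight decay + sparsity penalty.

    rho, beta and lam are illustrative values, not the paper's settings.
    """
    m = len(samples)
    hidden = [encode(x, w, b_x) for x in samples]
    recon = [decode(h, w, b_h) for h in hidden]
    mse = sum(sum((r - v) ** 2 for r, v in zip(rec, x))
              for rec, x in zip(recon, samples)) / m
    weight_decay = 0.5 * sum(v * v for row in w for v in row)
    # Average activation of each hidden neuron over all samples.
    rho_hat = [sum(h[j] for h in hidden) / m for j in range(len(w))]
    sparsity = sum(kl(rho, r) for r in rho_hat)
    return mse + lam * weight_decay + beta * sparsity

if __name__ == "__main__":
    w = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]]        # 3 inputs -> 2 hidden neurons
    samples = [[0.2, 0.4, 0.6], [0.9, 0.1, 0.5]]
    print(encode(samples[0], w, [0.0, 0.0]))          # 2-dimensional feature vector
    print(sparse_cost(samples, w, [0.0, 0.0], [0.0, 0.0, 0.0]))
```

The list returned by `encode` plays the role of the extracted feature vector; its length equals the number of hidden neurons, which is the coding dimension varied in the experiments.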

Basic Concept of D-S Evidence Theory
D-S evidence theory is an uncertainty reasoning method proposed by Dempster in 1967 and perfected by Shafer in 1976. It can deal with decision problems with unknowingness and uncertainty without prior probability. D-S evidence theory has shown better performance in data fusion-based classification compared to the traditional probability theory due to its capability to grasp the unknown and uncertainty [27]. With all of that, this paper chose and improved the D-S evidence theory for sensors' results fusion to enhance diagnosis accuracy.
Evidence theory can deal with the uncertainty caused by unknown evidence conditions. It is based on a non-empty finite set U whose elements are mutually exclusive. U is called the recognition framework, and E is any subset in the power set $2^U$ of U. The function m assigns to each E a basic probability m(E), which indicates the exact degree of trust in E; m is a measure on the subsets of U and is called a basic probability assignment function (BPAF). This function is subject to the following conditions:
$$m(\varnothing) = 0, \qquad \sum_{E \subseteq U} m(E) = 1$$
The belief and plausibility functions are defined as
$$\mathrm{Bel}(E) = \sum_{B \subseteq E} m(B), \qquad \mathrm{Pl}(E) = \sum_{B \cap E \neq \varnothing} m(B)$$
where Bel(E) is the belief function, which represents the sum of the likelihood metrics for all subsets of E, Pl(E) is the plausibility function, and [Bel(E), Pl(E)] is called the confidence interval, which describes the uncertainty interval of the proposition, as shown in Figure 2.

The synthetic rule of D-S evidence combines any two evidences on the recognition framework. Let $\mathrm{Bel}_1$ and $\mathrm{Bel}_2$ be two belief functions on the same recognition frame U, and let $m_1$ and $m_2$ be the corresponding basic probability assignments (BPAs), with focal elements $E_1, E_2, \ldots, E_k$ and $F_1, F_2, \ldots, F_r$, respectively. The combination rule is as follows:
$$m(E) = \begin{cases} \dfrac{1}{1-K} \sum\limits_{E_i \cap F_j = E} m_1(E_i)\, m_2(F_j), & E \neq \varnothing \\ 0, & E = \varnothing \end{cases}$$
$$K = \sum_{E_i \cap F_j = \varnothing} m_1(E_i)\, m_2(F_j)$$
where K is the collision factor. The K value reflects the degree of conflict between the two evidences: the larger K is, the greater the conflict, and the smaller K is, the smaller the conflict. When K = 1, the two evidences are in total conflict and the D-S combination rule is invalid. The coefficient 1/(1 − K) is the normalization factor, which is used to avoid assigning a non-zero probability to the empty set during synthesis. Therefore, when multiple evidences are fused, appropriate measures should be taken to avoid fusion failure. However, some common paradoxes still exist. BPAs for common paradoxes are shown in Table 1.
Table 1. Basic Probability Assignments (BPAs) for common paradoxes [28].

(Table 1 columns: Paradoxes, Evidences, Propositions.)
Because of its own calculation rules, the traditional D-S evidence theory fails to fuse evidence in these cases. To solve these conflict problems, this paper proposes an improved D-S evidence theory based on the PCC and a modified fusion method. The specific improvements are introduced in detail in Section 3.
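The combination rule can be illustrated with a short sketch. The function below applies Dempster's rule to two BPAs stored as dictionaries that map focal elements (frozensets) to masses, and returns both the fused BPA and the collision factor K. The function name, the data layout and the example masses are assumptions made for illustration only.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule for two BPAs given as {frozenset: mass} dictionaries."""
    fused = {}
    conflict = 0.0  # K: total mass of focal-element pairs with empty intersection
    for (e, p), (f, q) in product(m1.items(), m2.items()):
        inter = e & f
        if inter:
            fused[inter] = fused.get(inter, 0.0) + p * q
        else:
            conflict += p * q
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict (K = 1): the combination rule is undefined")
    # Normalize by 1 / (1 - K) so that the fused masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in fused.items()}, conflict

if __name__ == "__main__":
    A, B = frozenset("A"), frozenset("B")
    AB = A | B
    m1 = {A: 0.7, B: 0.2, AB: 0.1}
    m2 = {A: 0.6, B: 0.3, AB: 0.1}
    fused, K = combine(m1, m2)
    print(K)         # collision factor, 0.33 up to rounding
    print(fused[A])  # 0.55 / 0.67, about 0.82
```

In this example both evidences favour A, so fusion raises the mass of A above either input, while K = 0.33 records how much product mass fell on contradictory pairs.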

Pearson Correlation Coefficient (PCC)
PCC is the covariance of the two variables divided by the product of their standard deviations. The form of the definition involves a "product moment", that is, the mean (the first moment about the origin) of the product of the mean-adjusted random variables.
Assuming there are two variables X and Y, the PCC between them is defined as the quotient of their covariance and the product of their standard deviations:
$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - E[X])(Y - E[Y])]}{\sigma_X \sigma_Y}$$
where E is the mathematical expectation and cov is the covariance, and
$$\sigma_X^2 = E[(X - E[X])^2]$$
where $\sigma_X^2$ is the variance, which measures how far a set of numbers is spread out. In statistics, the Pearson correlation coefficient is used to measure the correlation between two variables X and Y, with values between −1 and 1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. The strength of the correlation is usually judged by the ranges of values shown in Table 2.
Table 2. The relationship of correlation coefficient and relevant degree.
(Table 2 columns: Correlation Coefficient, Relevant Degree. Only the entry "Very weak or no correlation" survives in this copy.)
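As a quick check of the definition, the sketch below computes the PCC directly from the covariance and the standard deviations using only the standard library. The function name and the sample vectors are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sigma_x == 0.0 or sigma_y == 0.0:
        raise ValueError("PCC is undefined for a constant variable")
    return cov / (sigma_x * sigma_y)

if __name__ == "__main__":
    print(pearson([1, 2, 3], [2, 4, 6]))   # total positive linear correlation
    print(pearson([1, 2, 3], [3, 2, 1]))   # total negative linear correlation
    print(pearson([1, 2, 3, 4], [2, 1, 4, 3]))
```

The coefficient is undefined when either variable has zero variance, which is why the sketch raises an error in that case instead of returning a value.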

The SAE and Improved D-S Evidence Theory for Bearing Single Fault Detection (SFD)
In this part, the theories introduced above are applied to bearing single fault detection. The detailed fault detection frame is introduced in Section 3.1. The rule of single fault detection (R-SFD) is shown in Section 3.2. The improved fusion method of D-S evidence theory is introduced in detail in Section 3.3.

The Frame of Bearing Single Fault Detection
For the diagnosis and fault detection of bearing multi-fault signals, current methods mostly depend on manual analysis of traditional signals. The specific method has been introduced in detail in the previous section. Inspired by research on deep learning and traditional single fault diagnosis, this paper proposes a single fault detection (SFD) method for bearings based on SAE, which realizes feature self-extraction from the vibration signal data of multi-fault bearings. The flowchart of the method is shown in Figure 3.
As Figure 3 shows, the data of multiple sensors are first collected from the experimental platform, with the sensor positions arranged according to the platform. Then, training data sets and test data sets are created from the sensors' raw data; the number of training and test samples must be the same for every sensor in order to ensure the weight consistency of each sensor. Next, the feature extractors (SAE) are trained on the training data, one extractor per sensor, and the trained extractor models are used for feature extraction on the training and test data sets. This is followed by classification training on the obtained feature sets, where each sensor's feature set corresponds to one classifier. Then, the test results of all trained classifiers are combined for fusion calculation by IDS. Finally, the fusion results are compared with the real labels of the test data to obtain the accuracy of the proposed method.
Figure 3. Flowchart of the proposed method. ① Signal acquisition: collect the bearing multi-sensor data according to requirements and get the data sets of sensor1, sensor2, …; ② Data pre-processing: random sampling from the experimental bearing multi-sensor data sets to form the training sets and test sets; ③ Feature extraction: feature extraction for the training and testing sets; ④ Classifier training: classification training on every sensor's feature data; ⑤ IDS fusion: fusion calculation for the classification results of all classifiers by improved D-S evidence theory; ⑥ Accuracy: compare the real labels with the fusion results of all the test data sets to get the accuracy of the proposed method.

Rule of Single Fault Detection (R-SFD)
In the flowchart in Figure 3, the fault detector part is as shown in Figure 4. The training rule and the single fault detection rule are presented in this section.

(1) SAE training on the sets of training samples and feature extraction for each type of training data. x_a_train is the set of all data that include fault a. For example, if the single fault conditions are a, b and c, then the combined multi-fault types are ab, ac, bc and abc, and f_a_train represents the trained feature set of a, ab, ac and abc. The same is true for the other faults. This training principle is shown in Figure 5.
(3) The classification outputs of the multiple sensors' data are fused, and the fusion result is used for label determination. Finally, all the trained classifiers are used to test whether the faults exist in the test data, and the fault diagnosis accuracy of the proposed method is obtained. As shown in Figure 3, the test label is compared with the real label to obtain the accuracy of the proposed method in single fault detection.
Entropy 2019, 21, x 9 of 21
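The grouping rule described above for x_a_train can be sketched as follows. For a target fault, every health condition whose name contains that fault forms the positive class, and every other condition forms the negative class. The label "n" for the normal condition and the function name are assumptions introduced for this example.

```python
# Health conditions written with the a, b, c notation of the text.
# "n" (normal, no fault) is a hypothetical label added for illustration.
CONDITIONS = ["n", "a", "b", "c", "ab", "ac", "bc", "abc"]

def split_by_target(target, conditions=CONDITIONS):
    """Return (conditions containing the target fault, conditions without it)."""
    positive = [c for c in conditions if target in c]
    negative = [c for c in conditions if target not in c]
    return positive, negative

if __name__ == "__main__":
    for fault in ("a", "b", "c"):
        positive, negative = split_by_target(fault)
        print(fault, "present in:", positive, "absent in:", negative)
```

For fault a, the positive class is a, ab, ac and abc, which matches the composition of f_a_train given in the text; one binary classifier per target fault is then trained on such a split.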

The Improved D-S Evidence Theory (IDS)
The traditional D-S evidence theory presents a good method for evidence fusion. However, it has some limitations, which can lead to failure of evidence fusion under certain circumstances, such as complex practical environments with probable conflicts between different evidence, or data coming from various fields [20]. Existing research has shown that, compared with the traditional D-S evidence theory, improved methods are effective in solving the paradoxes caused by uncertain and unknown evidence, and the modified algorithms are more suitable for handling all the paradoxes [24,28]. In practice, the different types of paradox do not all appear in one kind of data. Considering that, this paper chose PCC to correct the evidence weight, and the results showed its effectiveness.
For the bearing data and research objectives, the improvements of D-S evidence theory in this paper are as follows:
(1) Overcoming the "one-vote veto" phenomenon. In the D-S evidence theory, if the BPA of a proposition is 0 in any evidence before fusion, then according to the combination rule the proposition is completely negated after fusion, no matter how strongly the other evidences support it. An example is a recognition framework U = {A, B, C} with three evidences in which a single evidence assigns a BPA of 0 to one proposition. This phenomenon is called "one-vote veto". To address it, this paper uses a micro BPA to reconstruct the original BPA; the micro BPA is set to 0.01 in this paper. The details are given in Algorithm 1.
Algorithm 1. "One-vote veto" elimination: every 0 item in the BPA matrix of the n evidences is replaced with the micro BPA 0.01, where n is the number of evidences and X is the identification framework.
(2) Modifying the evidence weight by PCC. The fusion procedure is as follows:
1 Calculate the PCC between every pair of evidences to form the similarity measure matrix (Equation (15)). Define the support level of evidence $m_i$ as $\mathrm{Sup}(m_i)$; the sum of all elements of the similarity measure matrix in the row of $m_i$, except the element for $m_i$ itself, represents the degree of support between the evidence $m_i$ and the other evidence:
$$\mathrm{Sup}(m_i) = \sum_{j=1,\, j \neq i}^{n} \rho_{ij}$$
Then define the trust degree of the evidence $m_i$ as $\mathrm{Cred}(m_i)$ by standardizing the support degree, so that the requirements of D-S evidence theory are satisfied, namely $\mathrm{Cred}(m_i) \in [0, 1]$ and $\sum_{i=1}^{n} \mathrm{Cred}(m_i) = 1$:
$$\mathrm{Cred}(m_i) = \frac{\mathrm{Sup}(m_i)}{\sum_{j=1}^{n} \mathrm{Sup}(m_j)}$$
Finally, an evidence credibility with total negative linear correlation is set to its absolute value; the others remain unchanged.
2 Correct the original BPA using the obtained evidence credibility $\mathrm{Cred}(m_i)$. Define the corrected BPA as $m_i^{*}(X)$.
3 As Algorithm 1 shows, replace the 0 items of the corrected BPA matrix with 0.01.
4 According to the D-S fusion calculation rules, obtain the first fusion support vector.
5 Add the BPA vector obtained from the first fusion to the original BPA matrix to get $M_{new}$.
6 Modify the 0 values of $M_{new}$ as in step 3.
7 Perform the D-S fusion calculation once again as in step 4 to get the final fusion results.
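Two ingredients of the improved method can be sketched in a few lines: the replacement of 0 items by the micro BPA of 0.01, and the credibility obtained by normalizing the PCC-based support. The sketch assumes BPAs whose focal elements are all singletons, for which Dempster's rule reduces to a normalized product, and it covers only evidence sets with positive pairwise correlation. The correction of the BPA by the credibility (step 2) and the handling of negatively correlated evidence are not reproduced here. All function names and example values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def credibility(bpas):
    """Cred(m_i) = Sup(m_i) / sum_j Sup(m_j), with Sup(m_i) = sum_{j != i} rho_ij."""
    n = len(bpas)
    sup = [sum(pearson(bpas[i], bpas[j]) for j in range(n) if j != i)
           for i in range(n)]
    total = sum(sup)
    if total == 0:
        raise ValueError("supports sum to zero; credibility is undefined")
    return [s / total for s in sup]

def replace_zeros(bpas, micro=0.01):
    """Replace every 0 item in the BPA matrix with the micro BPA."""
    return [[micro if v == 0 else v for v in m] for m in bpas]

def fuse_singletons(bpas):
    """Dempster's rule when every focal element is a singleton:
    the fused mass is the normalized product of the input masses."""
    k = len(bpas[0])
    prod = [math.prod(m[x] for m in bpas) for x in range(k)]
    total = sum(prod)
    if total == 0:
        raise ValueError("total conflict: the combination rule is undefined")
    return [p / total for p in prod]

if __name__ == "__main__":
    # Framework U = {A, B, C}; the second evidence vetoes A with a BPA of 0.
    veto = [[0.7, 0.2, 0.1], [0.0, 0.8, 0.2], [0.8, 0.1, 0.1]]
    print(fuse_singletons(veto))                 # mass of A is exactly 0
    print(fuse_singletons(replace_zeros(veto)))  # A keeps a non-zero mass
    agreeing = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]]
    print(credibility(agreeing))                 # three weights that sum to 1
```

Because the fused result is renormalized, the rows are not rescaled after zero replacement in this sketch; whether the original method renormalizes each evidence at that point is not stated in the text.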

Data Acquisition and Experimental Analysis
Following the single fault detection method proposed in this paper, the SAE self-extraction method was applied to obtain the features of the data sets, and three SVMs were used to classify whether each of the three faults is included; the specific classification method has been introduced in Section 3. In this section, experiments are conducted to detect whether the outer, inner, and ball faults exist or not. The data of the different sensors were processed with the same steps. Finally, the results of the sensors were fused.

Data Preparation
The data were collected on the experiment platform shown in Figure 7. The test bearing is placed at the right end of the test bench. The sampling frequency is 2k, the motor speed is 2500 rpm, and the bearing model is deep groove ball bearing 6900zz. In order to obtain the vibration data of the bearing comprehensively, three sensors are installed in the directions of X, Y and Z, as Figure 8 shows. The data acquisition card is a DH5920 multi-channel dynamic signal test and analysis system made by DONGHUA company. The entire experimental process is without load. The bearing health status in this paper includes eight types: normal (N), outer ring fault (O), ball fault (B), inner ring fault (I), outer ring and ball fault (OB), outer ring and inner ring fault (OI), ball and inner ring fault (BI), and outer ring, ball and inner ring fault (OBI). The faults are cylindrical grooves with a diameter of 0.2 mm and a depth of 0.1 mm, machined by electrical discharge machining (EDM). The machined faults are shown in Figure 9; from left to right are the outer ring fault, the ball fault, and the inner ring fault.
According to the requirements of this experiment, the experimental data were collected and the training and testing sample sets were randomly selected, as shown in Table 3. The three digits of each tag indicate, in order, the presence (1) or absence (0) of the outer ring, ball, and inner ring fault.
Table 3. Experimental data sets.

Type   Train Set Samples   Test Set Samples   Sample Length   Tags
N      700                 300                1024            000
O      700                 300                1024            100
B      700                 300                1024            010
I      700                 300                1024            001
OB     700                 300                1024            110
OI     700                 300                1024            101
BI     700                 300                1024            011
OBI    700                 300                1024            111

Analysis
The fault detection model in this article involves four steps. Firstly, features are extracted by SAE from the training data. Secondly, the SVM classifiers are trained on the extracted features according to the rule of single fault detection (R-SFD) described before. This paper uses the default parameters to train the SVM: the penalty parameter C is 1.0, the kernel function is the radial basis function (RBF), and the output is a probability estimate, in preparation for the fusion method. Thirdly, the trained model is evaluated on the test sets. Finally, IDS is applied to fuse the results of the different sensors and to determine whether or not each fault is included.
In this study, the proposed method was used to analyze the mixed multi-fault status of rolling bearings. Seven sets of experiments were performed with different coding dimensions, three sensors in different directions, and an improved fusion method for multiple sensors. Figures 11-13 show that the idea of intelligent SFD from multi-fault data is meaningful and effective. The average accuracies for the three different faults and for the three sensors at different positions are shown in Figures 14 and 15. Figure 16 illustrates the advantage of fusion and why a fusion method suits fault detection on multiple sensors. The experimental results of the improved fusion method, which illustrate the advantages of the improvement and their causes, are shown in Figures 17 and 18 and Table 4.
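The last two steps can be illustrated for a single fault detector. Assuming each sensor's SVM outputs a probability p that the target fault is present, and treating (p, 1 − p) as a BPA over the two propositions "fault present" and "fault absent", the traditional D-S rule reduces to a normalized product. The sketch below uses this plain D-S fusion, not IDS; the 0.5 decision threshold, the function names and the example probabilities are assumptions.

```python
def fuse_binary(probabilities):
    """Fuse per-sensor probabilities that the target fault is present."""
    present = 1.0
    absent = 1.0
    for p in probabilities:
        present *= p
        absent *= 1.0 - p
    total = present + absent
    if total == 0.0:
        raise ValueError("total conflict between sensors")
    return present / total

def accuracy(prob_rows, labels, threshold=0.5):
    """Share of test samples whose fused decision matches the real label."""
    hits = sum((fuse_binary(row) > threshold) == bool(label)
               for row, label in zip(prob_rows, labels))
    return hits / len(labels)

if __name__ == "__main__":
    # One row per test sample: hypothetical outputs of the X, Y and Z sensors.
    rows = [[0.8, 0.4, 0.7], [0.3, 0.2, 0.6], [0.9, 0.6, 0.7]]
    labels = [1, 0, 0]
    print(fuse_binary(rows[0]))    # 0.224 / 0.26, about 0.86
    print(accuracy(rows, labels))  # two of the three samples are correct
```

In the first row the Y-axis output alone (0.4) would reject the fault, while the fused value accepts it, which is the kind of correction that multi-sensor fusion is intended to provide.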

Effectiveness Analysis of Single Fault Detection Method by SAE and D-S Evidence Theory
The coding dimension reflects the number of extracted features, so it directly affects whether the output of the hidden layer can completely retain the fault features that are needed, and it also directly affects the classification accuracy. To study the relationship between the number of features and the detection accuracy, this paper carries out the experimental analysis with a series of gradually increasing coding dimensions. As the coding dimension increases, so does the detection accuracy, as shown in Figures 11-13.

In Figures 11-13, there is the detailed accuracy of every type of fault detection, whether it is the inner, ball, or outer fault. From the presented results, the research on intelligent SFD is meaningful, the method of SAE proved its effectiveness to features extraction of a single fault from multi-fault bearings' data. Firstly, the trends of accuracy rate are consistent with expectations as the feature dimension changes, the accuracy rises quickly at the beginning, and keeps stable in the feature dimension around 150 to 200, but as the number of features' dimension increases, the calculation amount to extract features increases, too. From the results that the three pictures show, significant differences can be seen that show the sensors of different positions have different detection accuracies to different faults. Generally, the data of the X-axis sensor has the best detection results for all three faults except for several fewer features of 25 to 100 on inner and outer faults. It is obvious that it is more difficult to detect the fault by the data of the Y-axis sensor than the other two, the accuracy is always below 0.85. In order to compare the detection capabilities of the three sensors on three faults, we averaged the three fault accuracy rates of each sensor and obtained the results shown in Figures  14 and 15.  In Figure 14, it can be seen that no matter how the coding dimension increases, except the coding dimension of 25, the order of accuracy from big to small of three sensors is X, Z and Y, and the gap of detection accuracy becomes larger between the X-axis sensor and others when the feature dimension increases. This indicates that the positions of sensors have a great influence on the fault detection. In Figure 15, the results show that there are differences in the difficulty of detecting different faults, and faults, thoughts of fusion method are easy to be considered.
The main factor affecting SFD accuracy is the feature dimension. Across multiple experiments, the accuracy essentially reached a steady state at coding dimensions of 150 and 200, where the SVM achieved a high classification accuracy for fault detection and the fusion result was essentially the best. The accuracy of fault detection improved by the IDS fusion method is shown in Figures 17 and 18 and Table 5.
Next, the experimental results of the proposed method are presented together with a detailed analysis. The coding dimension determines the number of extracted features, so it directly affects whether the hidden-layer output can fully retain the required fault features, and hence the classification accuracy. To study the relationship between the number of features and detection accuracy, experiments were run over a graded series of coding dimensions. Detection accuracy increases with the coding dimension, as shown in Figures 11-13.
Figures 11-13 give the detailed detection accuracy for each fault type: inner, ball, and outer. The results support the value of intelligent SFD: SAE proved effective at extracting the features of a single fault from multi-fault bearing data. First, the accuracy trends are consistent with expectations: as the feature dimension grows, the accuracy rises quickly at first and then stabilizes at feature dimensions of around 150 to 200. However, the computation required to extract the features also grows with the feature dimension. The three figures also show that sensors at different positions differ markedly in their detection accuracy for different faults. In general, the X-axis sensor gives the best detection results for all three faults, except at the lower feature dimensions of 25 to 100 for inner and outer faults. Detection is clearly hardest with the Y-axis sensor data, whose accuracy is always below 0.85. To compare the detection capabilities of the three sensors across the three faults, we averaged the three fault accuracy rates of each sensor and obtained the results shown in Figures 14 and 15.
Figure 14 shows that, except at a coding dimension of 25, the sensors rank X, Z, Y from highest to lowest accuracy at every coding dimension, and the gap between the X-axis sensor and the others widens as the feature dimension increases. This indicates that sensor position strongly influences fault detection. Figure 15 shows that the faults differ in how difficult they are to detect, and that the ball fault is relatively difficult to identify compared with the other faults in this paper's experiment. It can therefore be concluded that the X-axis signal is the most useful for fault detection and that the ball fault is harder to detect than the others. These differences among sensors and faults naturally motivate a fusion method.
In addition, Figures 11-13 show that the detection accuracy of all three faults after D-S fusion is always higher than that of any single sensor, so combining the results of different sensors improves accuracy for every fault type. To compare clearly the fault detection accuracy with and without D-S evidence theory, the comparison results are shown in Figure 16.
In Figure 16, as in the single-fault comparison of Figures 11-13, the fused results are always higher than the average accuracy of the three sensors. As the feature dimension increases, the fused detection accuracy of the inner fault rises above that of the other two, which never occurred in the average accuracy. This indicates that fusion benefits the inner fault more than the outer fault, and it again confirms the benefit of combining different sensors. Overall, the figure makes the advantages of fusion clear. Traditional D-S performs multi-sensor fusion well: it reaches a high recognition accuracy at dimensions of about 150 to 200 and remains steady there.
The figures also show that the fused result is the best in each fault situation: at a feature dimension of 150, the D-S results are 0.1209, 0.1133 and 0.1242 higher than the unfused results for inner, outer, and ball fault detection, respectively.
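The fusion step discussed above can be sketched as follows. This is a minimal illustration of Dempster's rule of combination restricted to singleton hypotheses, not the exact pipeline of this paper. The function names (`dempster_combine`, `fuse`), the two-hypothesis frame, and the BPA values are hypothetical and chosen only for illustration; they are not measured results.

```python
def dempster_combine(m1, m2):
    """Combine two BPAs defined on the same singleton hypotheses."""
    # Mass on which both sources agree, per hypothesis.
    joint = {h: m1[h] * m2[h] for h in m1}
    agreement = sum(joint.values())  # equals 1 - K, with K the conflict factor
    if agreement == 0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Normalise so that the fused masses sum to 1.
    return {h: v / agreement for h, v in joint.items()}


def fuse(bpas):
    """Fold Dempster's rule over a list of BPAs (one per sensor)."""
    fused = bpas[0]
    for m in bpas[1:]:
        fused = dempster_combine(fused, m)
    return fused


# Hypothetical BPAs for one sample, one per sensor (illustrative values only).
sensor_x = {"fault": 0.80, "no_fault": 0.20}
sensor_y = {"fault": 0.60, "no_fault": 0.40}
sensor_z = {"fault": 0.70, "no_fault": 0.30}

fused = fuse([sensor_x, sensor_y, sensor_z])
print(fused)
```

With these illustrative inputs the fused belief in "fault" is 0.336 / 0.36, about 0.933, which is higher than the belief of any single sensor. This is the qualitative effect described above: sensors that agree reinforce one another.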

Evaluation of the Improved D-S (IDS) Evidence Fusion Algorithm and Its Effect in Bearing Fault Detection
To illustrate the effectiveness of the IDS method, the BPAs listed in Table 1 are used for the calculation. The results of several existing improvements to D-S evidence theory and of the IDS proposed in this paper are shown in Table 4.
Yager and Sun both resolve conflicts by allotting the conflict factor to the unknown proposition Θ, and Sun allots all the paradoxes directly to it. Assigning mass to Θ increases the uncertainty, so neither method can handle any of the paradoxes. Murphy, Deng and IDS were consistent in judging the paradoxes, but IDS outperforms the first two on all paradoxes. This shows that using PCC to weight the evidence, together with the improved fusion method, yields more correct decisions. Overall, the IDS fusion method proposed in this paper achieves better results on the three paradoxes than the other methods above.
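The effect of moving conflict to Θ can be seen in a small sketch. It compares Dempster's rule with Yager's rule on Zadeh's classic high-conflict example. These are not the BPAs of Table 1, which are not reproduced in this section, and the function names are hypothetical.

```python
from itertools import product

FRAME = frozenset({"A", "B", "C"})  # Theta, the whole frame of discernment


def conjunctive(m1, m2):
    """Unnormalised conjunctive combination; returns (masses, conflict K)."""
    out, conflict = {}, 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            out[inter] = out.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # mass falling on the empty set
    return out, conflict


def dempster(m1, m2):
    """Dempster's rule: discard the conflict and renormalise."""
    out, _ = conjunctive(m1, m2)
    total = sum(out.values())  # mathematically equal to 1 - K
    if total == 0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: v / total for s, v in out.items()}


def yager(m1, m2):
    """Yager's rule: move the whole conflict mass K to Theta."""
    out, conflict = conjunctive(m1, m2)
    out[FRAME] = out.get(FRAME, 0.0) + conflict
    return out


A, B, C = (frozenset({x}) for x in "ABC")
m1 = {A: 0.99, B: 0.01}  # source 1 strongly supports A
m2 = {B: 0.01, C: 0.99}  # source 2 strongly supports C

d = dempster(m1, m2)
y = yager(m1, m2)
print("Dempster:", d)
print("Yager:   ", y)
```

Dempster's rule assigns all of the mass to B, which both sources considered very unlikely. Yager's rule leaves only 0.0001 on B and places 0.9999 on Θ: the counter-intuitive result is avoided, but almost everything becomes "unknown", which is the increase in uncertainty noted above.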
Given the problems of D-S evidence theory described earlier, the improved method was also evaluated on the bearing data; the results are shown in Figures 17 and 18.
Figure 17 shows that IDS achieves higher accuracy than D-S. To show the effect of the improvement clearly, the difference between IDS and D-S is plotted in Figure 18. As the feature dimension increases, the difference first increases and then decreases for IF detection, stays relatively stable and high for OF detection, and rises sharply and then decreases slowly for BF detection. At the largest difference, the accuracy of IDS is 1.41%, 1.78% and 2.33% higher than D-S for IF, OF and BF detection, respectively. This comparison with D-S evidence theory shows that PCC can effectively compute the weights of different sensors, and the improved results again demonstrate how the data of different sensors affect fault identification. Considering the three fault types together, IDS outperforms D-S most clearly at dimensions of 150 and 200; the results of IDS and D-S at these feature dimensions are shown in Table 5.
As Table 5 shows, the accuracy for IF, OF and BF is 0.987, 0.991 and 0.977 at a feature dimension of 150; at a dimension of 200, it rises to 0.991, 0.993 and 0.985, which is 0.42%, 0.21% and 0.79% higher, respectively. The improvement from the fusion method is most evident for BF detection, where the detection accuracy is 2.33% and 2.06% higher than D-S. It can be concluded from the bearing fault detection data that PCC plays a significant role in improving D-S by reassigning the sensor weights.
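The idea of PCC-based weighting can be sketched as follows. This is one plausible reading, not the paper's exact IDS formulation, which is not reproduced in this section. Assumptions: zero masses are replaced by a small `eps` and renormalised; each piece of evidence is weighted by its summed Pearson correlation with the others, with negative correlations clipped to zero; the weighted-average evidence is then combined with itself n-1 times, in the style of Murphy and Deng. All names and BPA values are hypothetical.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def smooth_zeros(m, eps=1e-4):
    """Assumed zero-factor correction: lift zeros to eps, then renormalise."""
    raised = [max(x, eps) for x in m]
    total = sum(raised)
    return [x / total for x in raised]


def pcc_weights(bpas):
    """Weight each BPA by its (clipped) correlation with all the others."""
    n = len(bpas)
    support = [
        sum(max(pearson(bpas[i], bpas[j]), 0.0) for j in range(n) if j != i)
        for i in range(n)
    ]
    total = sum(support)
    return [s / total for s in support] if total else [1.0 / n] * n


def dempster(m1, m2):
    """Dempster's rule for singleton hypotheses given as aligned lists."""
    joint = [a * b for a, b in zip(m1, m2)]
    total = sum(joint)
    return [x / total for x in joint]


def ids_like_fusion(bpas):
    """Weighted-average the evidence, then self-combine n-1 times."""
    bpas = [smooth_zeros(m) for m in bpas]
    weights = pcc_weights(bpas)
    avg = [sum(w * m[k] for w, m in zip(weights, bpas))
           for k in range(len(bpas[0]))]
    fused = avg
    for _ in range(len(bpas) - 1):
        fused = dempster(fused, avg)
    return fused, weights


# Hypothetical BPAs over three hypotheses; the third source conflicts.
bpas = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.0, 0.1, 0.9]]

fused, weights = ids_like_fusion(bpas)
smoothed = [smooth_zeros(m) for m in bpas]
plain = dempster(dempster(smoothed[0], smoothed[1]), smoothed[2])
print("weights:", weights)
print("PCC-weighted fusion:", fused)
print("plain Dempster:     ", plain)
```

In this illustrative case the conflicting third source receives a weight of zero, so the weighted fusion follows the two agreeing sources and favours the first hypothesis, whereas plain Dempster combination is pulled toward the third hypothesis by the single dissenting source.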

Conclusions
A method for single fault detection (SFD) of bearings based on SAE and improved D-S evidence theory was proposed in this study. Compressed features were automatically extracted by SAE and classified by SVM. Because the output feature dimension of SAE is controllable, it is useful for analyzing how effectively the features of a single fault are automatically extracted. Because different sensors contain different fault feature information, the weights of the sensors should be corrected. The Pearson correlation coefficient was used to calculate and modify the evidence weights in D-S evidence theory, and the improved fusion method was then used to fuse the SVMs' results. Single fault detection of bearings was thereby realized, and its accuracy was improved by integrating multi-sensor information. The experimental results showed that the features of a single fault can be automatically extracted from multi-fault bearing data by the automatic feature extraction algorithm and classified by the classification algorithm. This single fault detection method showed great advantages over traditional manual signal analysis. In addition, the dimension of the features extracted by SAE has a strong impact on the results: good results require more features to be extracted, but the more features are extracted, the more time it costs. Finally, fusing the results of different sensors with the proposed IDS method is highly effective. When the feature dimension is 200, the detection accuracies of inner, outer, and ball faults by the improved method are 11.77%, 11.48% and 14.00% higher, respectively, than the average accuracy of the three sensors, and 0.38%, 2.06% and 0.76% higher than the traditional D-S evidence theory, reaching 99.12%, 99.33% and 98.46%.
As a distinct approach to studying multi-fault bearings, single fault detection (SFD) is an important breakthrough point, and automatic fault feature extraction is an important trend within it. In the feature self-extraction method, the measurement of feature dimensions and the calculations in this paper need to be improved in future work. The fusion method can improve the results considerably if enough different data can be obtained. The IDS of this paper relies heavily on manual calculation and its calculation steps are too complex; further research on D-S evidence theory should simplify the calculation steps while still resolving the paradoxes.