Gearbox Failure Diagnosis Using a Multisensor Data-Fusion Machine-Learning-Based Approach

Failure detection and diagnosis are of crucial importance for the reliable and safe operation of industrial equipment and systems, while gearbox failures are one of the main factors leading to long-term downtime. Condition-based maintenance addresses this issue using several expert systems for early failure diagnosis to avoid unplanned shutdowns. In this context, this paper provides a comparative study of two machine-learning-based approaches for gearbox failure diagnosis. The first uses linear predictive coefficients for signal processing and long short-term memory for learning, while the second is based on mel-frequency cepstral coefficients for signal processing, a convolutional neural network for feature extraction, and long short-term memory for classification. This comparative study proposes an improved predictive method using the early fusion technique of multisource sensing data. Using an experimental dataset, the proposals were tested, and their effectiveness was evaluated considering predictions based on statistical metrics.


Introduction
In recent years, the industry has undergone significant development requiring the use of increasingly complex rotating machinery [1] that needs to be monitored and maintained to avoid unplanned shutdowns [2]. Condition-based maintenance (CBM) is, therefore, the tool of choice for monitoring rotating machines' state of health [3]. In this context, a CBM strategy includes failure detection, diagnosis, and prognosis to estimate the remaining useful life [4].
Rotating-machine diagnosis can be carried out using model- or data-based approaches [5]. While model-based techniques require accurate models that include the machine parameters, signal-based approaches have the advantage of being driven by data without any prior knowledge of the monitored system. Signals reflecting the rotating machine's state of health need to be acquired, however [6]. The acquired noisy signals are then processed to extract useful information, which is used for failure detection and diagnosis [7].
Signal processing is ensured using different types of approaches, such as time-, frequency-, and time-frequency-domain analyses [8,9]. In this context, time-frequency is considered to be the analytical approach of choice, particularly for nonstationary signals, because of its ability to simultaneously capture features from the time and frequency domains [10]. In this field, signal-processing techniques are often combined with artificial-intelligence tools [11] to automate the diagnostic process and minimize human involvement [12,13]. In terms of artificial intelligence, machine learning is the solution of choice.
The main contributions of this paper are as follows:
• a comparative study between two methodologies for gearbox diagnosis based on LPC-LSTM and MFCC-CNN-LSTM. This study highlights key features of technique suitability in an industrial context, particularly Industry 4.0;
• the use of multisensor data fusion (early fusion) to improve the diagnostic reliability of the above-considered methodologies. In this context, the proposed early fusion-based fault diagnosis methodology clearly decreases training time and the amount of data to be stored, and improves accuracy.
The proposed methodologies were tested using a dataset collected from a specifically developed test rig, and evaluated by diagnostic metrics to highlight their industrial application interest.
This paper is organized as follows. Section 2 presents the theoretical background of the proposed methodologies. Section 3 evaluates the methodologies on the basis of an experimental dataset. A conclusion and future prospects end the paper.

Linear Prediction Coefficients
As measured rotating-machinery signals are often nonstationary and can be highly noisy, there is a clear need for increasingly efficient signal-processing techniques to improve failure-diagnosis accuracy [32,33]. In this context, LPC, widely used especially in speech recognition for signal analysis and feature extraction, is an interesting option for investigating failure diagnosis signal processing.
LPC is based on the fact that each sample $s(n)$ can be written as a sum of $P$ past samples $s(n-k)$, weighted by the model parameters $a_k$ and added to a residual term $Gu(n)$, as follows [34,35]:
$$s(n) = \sum_{k=1}^{P} a_k\, s(n-k) + G\,u(n)$$
Equivalently, this model can be reformulated in the frequency domain as a digital filter:
$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{P} a_k z^{-k}}$$
Estimating $s(n)$ can be performed by a linear approximation from the previous $P$ samples:
$$\hat{s}(n) = \sum_{k=1}^{P} a_k\, s(n-k)$$
Prediction-coefficient determination is based on minimizing the error between the original and approximated signals:
$$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{P} a_k\, s(n-k)$$
The obtained coefficients $a_k$ are the image of the processed signal that carries discriminating information among different classes. These coefficients are the inputs of the learning network.
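As an illustration of how the prediction coefficients can be estimated, the following minimal sketch uses the autocorrelation (least-squares) method, which is one common way of minimizing the prediction error. It is not necessarily the routine used in this study; the function name `lpc_coefficients` and the synthetic second-order test signal are assumptions made for the example.

```python
import numpy as np

def lpc_coefficients(signal, order=15):
    """Estimate a_1..a_P such that s(n) ~ sum_k a_k * s(n - k).

    Autocorrelation method: solve the Toeplitz normal equations R a = r.
    """
    s = np.asarray(signal, dtype=float)
    # Autocorrelation values r[0], ..., r[order]
    r = np.array([np.dot(s[: len(s) - k], s[k:]) for k in range(order + 1)])
    # Symmetric Toeplitz matrix built from r[0], ..., r[order - 1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Synthetic check (illustrative only): a signal generated by a known
# second-order recursion should give back approximately the same coefficients.
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])
noise = rng.standard_normal(20000)
s = np.zeros(20000)
for n in range(2, len(s)):
    s[n] = true_a[0] * s[n - 1] + true_a[1] * s[n - 2] + noise[n]

est_a = lpc_coefficients(s, order=2)        # close to [1.3, -0.6]
features = lpc_coefficients(s, order=15)    # 15 coefficients, as used later in the paper
```

With an order of 15, each signal segment is reduced to a 15-element feature vector, which is what makes this representation compact compared to the raw signal.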

Mel-Frequency Cepstral Coefficients
This signal-processing technique first consists of windowing the signal into short frames, each of which is as close as possible to a stationary signal. Each frame is then processed by a discrete Fourier transform (DFT). The resulting spectra are then filtered to extract the information of each frequency band. The mel-frequency spectrum uses triangular windowing, which allows for calculating the logarithm of the energy in each filter, as shown in Figure 3. Lastly, applying a discrete cosine transform to the mel log-power values yields the cepstral coefficients [22].
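The processing chain described above (framing, DFT, triangular mel filtering, logarithm, and discrete cosine transform) can be sketched as follows. This is a simplified illustration under stated assumptions: the frame length (512), hop size (256), number of filters (26), Hamming window, and the common 2595·log10(1 + f/700) mel formula are choices made for the example and are not taken from the paper; only the 25.6 kHz sampling frequency, the 0.5 s segment length, and the 14 retained coefficients follow the text.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def mfcc(signal, fs, n_fft=512, hop=256, n_filters=26, n_coeffs=14):
    """Return an (n_frames, n_coeffs) array of cepstral coefficients."""
    s = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(s) - n_fft) // hop
    # 1) Windowing into short, overlapping frames
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = s[idx] * np.hamming(n_fft)
    # 2) DFT and power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    # 3) Triangular mel filtering and logarithm of the energy in each filter
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, fs).T + 1e-10)
    # 4) Discrete cosine transform (type II) of the log energies
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n[None, :] + 1)
                   / (2 * n_filters))
    return log_energy @ basis.T

# Illustrative input: a 0.5 s, 1 kHz tone sampled at 25.6 kHz
fs = 25600
t = np.arange(int(0.5 * fs)) / fs
feats = mfcc(np.sin(2 * np.pi * 1000 * t), fs)
```

The resulting two-dimensional array (frames by coefficients) is the kind of spectral image that a 2D-CNN can take as input.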

Convolutional Neural Network
While several algorithms are used for feature extraction, CNNs are effective in many application domains ranging from medicine to object detection. CNNs are primarily composed of a succession of convolutional layers using different filter sizes to generate features and pooling (max and average) layers using a nonlinear downsampler to extract local features [36].
In this work, a 2D-CNN is proposed for feature extraction from MFCC spectral images to distinguish between different gearbox failures.

Long Short-Term Memory
As CNNs are generally unable to learn features from nonstationary signals such as vibratory measurements, RNNs were introduced [16]. They, however, suffer from vanishing gradients during training. To tackle this issue, LSTM networks were introduced as a new RNN variant. This variant allows for controlling the generated information flow, and solves the gradient-vanishing issue with dynamic learning features [13].
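LSTM gate equations are formulated as follows [16].
Input gate:
$$i_t = \sigma\left(W_i x_t + R_i h_{t-1} + b_i\right) \tag{6}$$
$$u_t = \tanh\left(W_u x_t + R_u h_{t-1} + b_u\right) \tag{7}$$
Forgetting gate:
$$f_t = \sigma\left(W_f x_t + R_f h_{t-1} + b_f\right) \tag{8}$$
Output gate:
$$o_t = \sigma\left(W_o x_t + R_o h_{t-1} + b_o\right) \tag{9}$$
Next LSTM state:
$$C_t = f_t \odot C_{t-1} + i_t \odot u_t \tag{10}$$
$$h_t = o_t \odot \tanh\left(C_t\right) \tag{11}$$
where $\sigma$ and $\tanh$ are the sigmoid and hyperbolic tangent activation functions, respectively; $\odot$ denotes element-wise multiplication; $W_{i,u,f,o} \in \mathbb{R}^{N \times M}$, $R_{i,u,f,o} \in \mathbb{R}^{N \times N}$, and $b_{i,u,f,o} \in \mathbb{R}^{N}$ are the (input, recurrent, and bias) learnable (input, update, forget, and output) weights, respectively, where $N$ denotes the size of the hidden layer per LSTM cell, and $M$ is the feature size. $x_t$ is the current input, $h_{t-1}$ and $h_t$ are the previous and actual hidden states, and $C_{t-1}$ and $C_t$ are the previous and actual memory cell values. Equations (6) to (11) manage the flow of information in an LSTM node (Figure 4).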
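To make the role of the gates concrete, a single LSTM time step can be sketched as below, using the standard LSTM formulation with input, update, forget, and output weights. The function `lstm_step`, the dictionary layout of the weights, and the random initial values are assumptions made for illustration; the sizes M = 15 and N = 100 simply mirror the feature size and hidden-layer size used later in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, R, b):
    """One LSTM time step.

    W, R, b are dicts keyed by 'i' (input), 'u' (update), 'f' (forget),
    'o' (output), holding the input weights (N x M), recurrent weights
    (N x N), and biases (N,), respectively.
    """
    i_t = sigmoid(W["i"] @ x_t + R["i"] @ h_prev + b["i"])   # input gate
    u_t = np.tanh(W["u"] @ x_t + R["u"] @ h_prev + b["u"])   # candidate update
    f_t = sigmoid(W["f"] @ x_t + R["f"] @ h_prev + b["f"])   # forget gate
    o_t = sigmoid(W["o"] @ x_t + R["o"] @ h_prev + b["o"])   # output gate
    c_t = f_t * c_prev + i_t * u_t                           # new memory cell
    h_t = o_t * np.tanh(c_t)                                 # new hidden state
    return h_t, c_t

# Illustrative run over a short random sequence (placeholder data)
M, N = 15, 100
rng = np.random.default_rng(0)
W = {g: 0.1 * rng.standard_normal((N, M)) for g in "iufo"}
R = {g: 0.1 * rng.standard_normal((N, N)) for g in "iufo"}
b = {g: np.zeros(N) for g in "iufo"}
h, c = np.zeros(N), np.zeros(N)
for x_t in rng.standard_normal((5, M)):
    h, c = lstm_step(x_t, h, c, W, R, b)
```

Because the memory cell is updated additively and scaled by the forget gate, information can be carried over many time steps, which is how this structure mitigates the vanishing-gradient problem.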

Evaluation and Classification
The last step of the proposed methodologies is failure diagnosis based on the above-defined networks. Classifications are assessed using two criteria, accuracy and the confusion matrix [37], where accuracy is used for a general evaluation, and the confusion matrix is used for the detailed evaluation of each fault.
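The two criteria can be illustrated with a short sketch. The labels below are a made-up three-class example, not results from this study; rows of the matrix correspond to true classes and columns to predicted classes.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count samples of true class t (row) predicted as class p (column)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Overall accuracy: correctly classified samples over all samples."""
    return np.trace(cm) / cm.sum()

# Hypothetical labels for three classes (illustration only)
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, 3)
acc = accuracy(cm)
```

The diagonal of the matrix gives the per-class correct classifications, while the off-diagonal entries show which classes are confused with each other, information that a single accuracy value hides.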

Experimental Test Bench and Dataset
For validation purposes, a specific test bench, namely, HTM90, including gearbox and bearing failures, was used (Figure 5). This bench is dedicated to the emulation of mechanical faults in rotating machines (gear, rolling, misalignment, etc.). It mainly consists of a motor, gearbox, and various healthy and faulty components to carry out fault-detection and -diagnosis tests. To build the dataset, signals were acquired through three prepolarized piezoelectric 4188-C-001 microphones from Brüel & Kjær (radial-vertical (RV), axial-horizontal (AH), and radial-horizontal (RH)). Another channel was devoted to a tachometer. The electrical signal of the microphones was acquired using a Brüel & Kjær 3050-A-060 acquisition board, which has 6 LEMO 7-pin channels and a maximal sampling frequency of 50 kHz.
The testing procedure consisted of the following steps: (1) the three microphones were connected to the acquisition board using a 7-pin connector cable (AO-0414); (2) the microphones' technical specifications (sensor type, sensitivity, etc.) were set in the Brüel & Kjær Pulse Labshop software; (3) lastly, the acquisition frequency was set to 25.6 kHz. The main bench components and specifications were as follows: (1) a DC motor (Baldor AP7422, type 2424P, 0.25 HP, 3450 rpm); (2) a speed set to 1500 rpm (25 Hz) using a tachometer connected to a digital display (speed control). The motor was connected by a flexible coupling to a drive shaft supported by a rolling platform, and the other side of the shaft was similarly connected to the gearbox. This gearbox consisted of a single gear stage supported by four bearings, as shown in Figure 5.
Tests were performed at room temperature (25 °C) with lubrication after each installation. The bearings used had the following specifications: 1621-RS, 12.7 mm inner diameter, 34.925 mm outer diameter, and 11.112 mm width. Healthy and faulty (inner race failure) bearings are illustrated in Figure 6. The spur gears used were Boston Gear YD54A (20° pressure angle, 54 teeth) and YD18-3/4 (20° pressure angle, 18 teeth) for gearbox input and output, respectively, as shown in Figure 7 (healthy gear); Figure 8 shows the faulty gear used.
Recording began after microphone installation over a 500 mm radius of the gearbox for each configuration shown in Table 1, in the three directions, namely, RV, AH, and RH. A 40 s recording was adopted for each failure; each recording was split into 0.5 s pieces, leading to a total of 80 samples for each failure. The test bench allowed for emulating 12 failures by combining four gear states (healthy, broken side, broken tooth, and notched) with three bearing states (healthy, inner race failure, and rusty bearing), as shown in Table 1. Samples of the signals obtained from each failure-class simulation are shown in Figure 9. These signals were later processed using MATLAB (from MathWorks, licensed to Ecole Militaire Polytechnique, Algiers, Algeria).
This framework is acoustical fault diagnosis, which has several advantages over other monitoring techniques, such as vibration and current. Among these advantages are the following: (1) noncontact measuring, which can be useful in harsh and severe environments (e.g., high temperatures and corrosion) [38,39]; (2) a cheap and practical technique to deploy compared to vibration- or current-based monitoring [39,40]; (3) machine diagnosis is often preceded by fault-source location by a microphone array. It is then easier to use a few microphones for diagnostic purposes [41].
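The segmentation of the recordings described above (40 s recordings at 25.6 kHz split into 0.5 s pieces) can be sketched as follows. The function name `split_recording` and the all-zero placeholder recording are assumptions for illustration; the original processing was carried out in MATLAB.

```python
import numpy as np

FS = 25_600        # sampling frequency in Hz (from the text)
RECORD_S = 40      # length of one recording in seconds (from the text)
SEGMENT_S = 0.5    # length of one sample in seconds (from the text)

def split_recording(recording, fs=FS, segment_s=SEGMENT_S):
    """Split a 1D recording into non-overlapping segments of segment_s seconds."""
    seg_len = int(round(fs * segment_s))
    n_segments = len(recording) // seg_len
    # Any incomplete trailing segment is dropped
    return np.reshape(recording[: n_segments * seg_len], (n_segments, seg_len))

recording = np.zeros(FS * RECORD_S)      # placeholder for one microphone channel
segments = split_recording(recording)    # 80 segments of 12,800 points each

N_CLASSES = 4 * 3                        # four gear states x three bearing states
```

Each of the 12 failure classes therefore contributes 80 samples of 12,800 points per microphone.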

LPC-LSTM-Based Failure-Diagnosis Methodology
All the above-mentioned samples were processed by LPC to estimate the first 15 signal coefficients for the 12 considered failures, as shown in Figure 10. Afterwards, the obtained coefficients were fed to the LSTM network for learning. This step allowed for identifying the features common to samples of the same class and the features discriminating between classes. LSTM failure learning and classification are illustrated in Figure 11. The considered network consisted of four layers: an input layer, a 100-node LSTM layer, a 10-node fully connected layer, and a softmax layer for classification. Regarding training, the options used were as follows: Adam solver; maximum number of epochs, 100; minibatch size, 27; and initial learning rate of 0.001, with a drop factor of 0.6 every 30 epochs.
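The learning-rate schedule stated above (initial rate of 0.001 multiplied by 0.6 every 30 epochs) can be written as a piecewise-constant function. This sketch assumes 1-based epoch numbering with the first drop applied at epoch 31; the exact boundary convention depends on the training software and is an assumption here.

```python
def learning_rate(epoch, initial_lr=0.001, drop_factor=0.6, drop_period=30):
    """Piecewise-constant learning rate for a 1-based epoch index."""
    n_drops = (epoch - 1) // drop_period   # number of drops applied so far
    return initial_lr * drop_factor ** n_drops

# Learning rate over the 100 training epochs
schedule = [learning_rate(e) for e in range(1, 101)]
```

Under this convention, the rate is 0.001 for epochs 1 to 30, 0.0006 for epochs 31 to 60, 0.00036 for epochs 61 to 90, and 0.000216 for the last 10 epochs.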

LPC-LSTM Methodology Results and Evaluation
Data drawn from the experimental dataset were used for testing. In this case, the prediction assessments of the three microphones are illustrated in Figures 12-14 in terms of confusion matrix, and in Table 2 in terms of accuracy. The achieved results showed quite interesting performance, with around 90% accuracy. When analyzing the confusion matrices, two misclassification types were found. The first concerned classes that were misclassified by one microphone, but perfectly classified by the two others. For example, the 6th failure was perfectly classified by the first and third microphones, while 6/24 samples were misclassified by the second microphone. The same applied to the 9th failure, with 24/24 for the first and second microphones, and 6/24 samples missed by the third microphone. This led to the important conclusion that misclassifications by one microphone can be perfectly retrieved by the others.
The second misclassification type concerned failed samples in each class. For example, in the 8th class, there were 4/24 failed samples for the first microphone, of which 3/24 were assigned to the 2nd class and 1/24 to the 11th class. On the other hand, the third microphone failed 4/24, of which 2/24 were assigned to the 9th class and 2/24 to the 11th class. Another example concerned the 12th class, where the second microphone missed 1/24 in the 3rd class, 1/24 in the 7th class, and 1/24 in the 2nd class. On the other hand, the third microphone missed 1/24 in the 5th class, and 5/24 in the 11th class. This second type of misclassification allowed us to highlight that the samples missed by one microphone are not necessarily those missed by another.
These two types of analysis lead to the conclusion that classification performance could be improved by merging data from different microphones.

MFCC-CNN-LSTM-Based Failure-Diagnosis Methodology
MFCC is proposed for investigation, as it is particularly efficient for processing acoustic signals, such as those of the gearbox-failure dataset used here.
In this context, with a sampling frequency of 25.6 kHz, the 2D MFCC spectral images, illustrated in Figure 15, were used as CNN inputs for feature extraction. The convolutional network consisted of a succession of layers, as shown in Figure 16, with a 2D input layer of size 14 × 48 and a convolutional layer with stride and padding equal to 2 and 1, respectively. To enhance learning, a batch-normalization layer was used to ensure that the features are in the same range. A ReLU layer was then used to set values below zero to zero, yielding a non-negative output. Before learning began, a flatten layer was used to arrange the resulting image in vector form. At this level, a specific architecture is proposed to enhance failure-diagnosis results, taking the results of the above-mentioned convolutional operations as input. The proposed network architecture consisted of 3 layers: an LSTM layer with 10 nodes followed by a fully connected layer of 12 nodes, and a softmax layer. Regarding training, which was computed on a CPU, the options used were an Adam optimizer, a learning rate of 0.001 with a drop factor of 0.6 every 30 epochs, and a minibatch size of 27.
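As a worked illustration of how the stride and padding shape the feature maps, the spatial output size of a convolutional layer can be computed with the usual formula floor((n + 2p - k) / s) + 1. The filter size is not stated in the text, so the 3 × 3 kernel below is purely hypothetical; only the 14 × 48 input size, the stride of 2, and the padding of 1 come from the paper.

```python
def conv_output_size(n_in, kernel, stride=2, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (n_in + 2 * padding - kernel) // stride + 1

KERNEL = 3  # hypothetical filter size; not given in the text

out_h = conv_output_size(14, KERNEL)   # along the 14-element dimension
out_w = conv_output_size(48, KERNEL)   # along the 48-element dimension
```

Under this assumption, each filter turns a 14 × 48 image into a 7 × 24 feature map, so the flattened vector passed on to the LSTM layer has 7 × 24 elements per filter.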

MFCC-CNN-LSTM Methodology Results and Evaluation
The accuracy results given in Table 3 highlight the improvement brought by MFCC (about 7%) compared to the LPC-LSTM methodology. Confusion-matrix analysis in Figures 17-19 confirmed that most failures were better classified. This is because the MFCC spectrum representation provides more time and frequency details from nonlinear and nonstationary signals [29], and because CNNs are known for their strong ability to extract useful features.
Figure 18. Confusion matrix (2nd microphone).
Figure 19. Confusion matrix (3rd microphone).
Despite the improvement in accuracy, this approach suffers from a computational burden related to the slow training of the convolutional layers [42], which is due to the successive operations performed during training (convolution, pooling, etc.). This drawback limits the usefulness of convolutional networks for real-time diagnosis. In addition, the amount of data to be managed by a CNN is very large. It typically consists of 14 × 48 elements for each spectral MFCC image against the 15 coefficients obtained by LPC, and the number of images is further multiplied at each convolutional layer by the different filters used. This large amount of data can lead to memory saturation and thereby block the monitoring process, especially when monitoring several systems at the same time. Therefore, in view of the confusion-matrix analysis (Figures 12-14) and the disadvantages of the MFCC-CNN-LSTM approach, multisensor data fusion was adopted to improve the results obtained using the LPC-LSTM approach.
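As a rough order-of-magnitude check of the figures quoted above, counting only the network input for one sample from one microphone (intermediate feature maps are ignored here):
$$14 \times 48 = 672 \ \text{elements (MFCC image)} \quad \text{versus} \quad 15 \ \text{coefficients (LPC)}, \qquad \frac{672}{15} = 44.8$$
Each MFCC input is thus roughly 45 times larger than the corresponding LPC input, before any convolutional feature maps are generated.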

LPC-LSTM Early Fusion-Based Failure Diagnosis
In the machine-learning literature on classification and regression, accuracy enhancement is often pursued by discarding one technique in favor of another. However, the analysis of other metrics, such as the confusion matrix, can help in improving prediction results with simple methods such as multichannel data fusion [20,43].
The main objective of this study was to show the effectiveness of early fusion for failure-diagnosis performance enhancement. In this context, signal merging allows for extracting discriminant features between the different classes obtained from different sensors. This leads to better prediction results than those obtained by using each signal separately. Early fusion is a machine-learning solution where fusion takes place when the learning network is trained, i.e., the features from all sensors are merged before learning. This allows for collecting a set of features related to each class from the input signals while leading to better efficiency and higher confidence.
In this context, the signals of the three microphones are processed by LPC, as shown in Section 3.2, and the 45 coefficients obtained from the three signals (15 from each signal) are input to the learning network, as shown in Figure 20. Learning then allows for selecting, from each microphone, the features that discriminate each class. The confusion matrix obtained after fault diagnosis is shown in Figure 21, which clearly highlights the benefit of using early fusion, as 100% failure-diagnosis accuracy is achieved, compared to less than 90% for the same signals used separately, as shown in Table 4.
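The early-fusion step described above amounts to concatenating the per-microphone feature vectors before they enter the learning network. A minimal sketch follows, in which the function name `early_fusion` and the random placeholder features are assumptions; in practice, each matrix would hold the 15 LPC coefficients computed from one microphone.

```python
import numpy as np

N_COEFFS = 15   # LPC coefficients per microphone (from the text)

def early_fusion(lpc_rv, lpc_ah, lpc_rh):
    """Concatenate per-microphone features (n_samples, 15) into (n_samples, 45)."""
    return np.concatenate([lpc_rv, lpc_ah, lpc_rh], axis=1)

# Placeholder feature matrices for the RV, AH, and RH microphones
rng = np.random.default_rng(0)
n_samples = 80
mics = [rng.standard_normal((n_samples, N_COEFFS)) for _ in range(3)]
fused = early_fusion(*mics)   # one 45-element feature vector per sample
```

Because fusion happens at the feature level, a single network is trained on the 45-element vectors instead of three separate networks on 15-element vectors.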

The achieved results clearly show the value of multisensor data fusion compared to that of a monosensor approach. This is mainly due to the difficulty of determining the optimal position of a single microphone to capture the maximal amount of information, especially without prior knowledge of the likely fault source. In addition, for complex machines, there may be interference from multiple faults. Such interference influences the microphones in different ways depending on their orientation and their distance from the interfering fault sources [38,44].
The data-fusion technique based on LPC-LSTM led to encouraging results compared to those of other techniques. This is due to the small amount of data after processing (15 coefficients per signal) compared to the original signal size or to a transform giving a signal of significant length, such as the spectrum used for fusion in [45]. In addition, unlike the MFCC-CNN-LSTM approach and the image fusion in [46], no convolutional steps, which suffer from slow training speed [42], are required.

Conclusions
This paper provided a comparative study of two machine-learning-based approaches for gearbox failure diagnosis. The first used linear predictive coefficients for signal processing and long short-term memory for learning, while the second was based on mel-frequency cepstral coefficients for signal processing, a convolutional neural network for feature extraction, and long short-term memory for classification. In this context, the objective was to clearly highlight the importance of signal processing before learning. In addition to highlighting the advantage of using mel-frequency cepstral coefficients to enhance failure-diagnosis accuracy, the study showed that accuracy can be further improved using multisensor data fusion. Indeed, fusion removes the need to interpret the diagnosis result of each microphone separately, which reduces interpretation time, in addition to improving diagnostic reliability and accuracy.
The proposed gearbox failure diagnosis methodologies were evaluated using an experimental dataset built from a specific test bench with gearbox and bearing failures.
Future investigations will focus on the optimization of learning-network hyperparameters to decrease training time and increase the number of diagnosed failures.