Current-Based Bearing Fault Diagnosis Using Deep Learning Algorithms

Artificial intelligence algorithms and vibration signature monitoring are common approaches to perform early bearing damage identification in induction motors. However, this approach is unfeasible in most industrial applications because these machines are unable to perform their nominal functions under damaged conditions. In addition, many machines are installed at inaccessible sites, or their housing prevents the installation of new sensors. In contrast, current signature monitoring is available in most industrial machines because the devices that control, supply and protect these systems use the stator current. Another significant advantage is that the stator phases lose symmetry under bearing damage and can therefore be treated as multiple independent sources. Thus, this paper introduces a new approach based on fractional wavelet denoising and a deep learning algorithm to perform bearing damage diagnosis from stator currents. Several convolutional neural networks extract features from multiple sources to perform supervised learning. An information fusion (IF) algorithm then creates a new feature set and performs the classification. Furthermore, this paper introduces a new method for positive-unlabeled learning: the flattened layers of several feature maps are fed to the fuzzy c-means algorithm, which performs novelty detection instead of clustering in a dynamic IF context. Experimental and on-site tests are reported with promising results.


Introduction
Induction motors are present in most industrial processes because of their versatility, efficiency and robustness in severe operating conditions. Recent studies have reported that 40% of the operational failures of these machines are related to bearing damage, which can be separated into two categories. The first category is punctual damage, which appears on a delimited bearing surface and produces an impulsive mechanical vibration. The second category is distributed damage, which produces continuous mechanical vibrations with low-magnitude harmonics [1,2].
The most common approach to bearing condition monitoring and damage diagnosis is to acquire vibration signals from accelerometers and apply supervised learning algorithms. However, many industrial motors cannot provide vibration signals because they are installed at inaccessible locations or their housings are unsuitable for new devices. Vibration data acquisition is also expensive, requiring new sensors and devices for transduction, transmission and processing [3]. In contrast, current-based data acquisition is available in most industrial electric motors because the stator current is monitored for control, supply and protection purposes. Moreover, each motor phase is a distinct data source for the machine status because the phases lose symmetry due to disturbances, interference, noise, intrinsic conditions, bearing damage or other reasons [4,5].
In this context, the wavelet transform (WT) has become a common signal processing tool because of its multiresolution analysis capability. The fractional wavelet

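The FS are obtained by interpolating polynomial B-splines to fractional degrees. The centered fractional B-splines of degree α are defined through the convolution operator as
$$\beta_*^{\alpha}(x) = \Delta_*^{\alpha+1} * |x|_*^{\alpha}, \qquad |x|_*^{\alpha} = \frac{|x|^{\alpha}}{-2\,\Gamma(\alpha+1)\sin\left(\frac{\pi}{2}\alpha\right)} \quad (\alpha \text{ not even}),$$
where $\Delta_*^{\alpha} \leftrightarrow |1 - e^{-j\omega}|^{\alpha}$ denotes the symmetric fractional finite difference operator. The Fourier transforms of the one-sided (causal) and centered fractional B-splines are
$$\hat{\beta}_+^{\alpha}(\omega) = \left(\frac{1 - e^{-j\omega}}{j\omega}\right)^{\alpha+1}, \qquad \hat{\beta}_*^{\alpha}(\omega) = \left|\frac{1 - e^{-j\omega}}{j\omega}\right|^{\alpha+1} = \left|\frac{\sin(\omega/2)}{\omega/2}\right|^{\alpha+1}.$$
Fractional B-splines satisfy all the requirements to construct a wavelet basis for α > −1/2, and they obey the two-scale relation
$$\beta^{\alpha}\left(\frac{x}{2}\right) = \sum_{k \in \mathbb{Z}} h^{\alpha}(k)\, \beta^{\alpha}(x - k),$$
where the filters $h_+^{\alpha}(k)$, $h_-^{\alpha}(k)$ and $h_*^{\alpha}(k)$ are defined by
$$H_+^{\alpha}(e^{j\omega}) = \frac{(1 + e^{-j\omega})^{\alpha+1}}{2^{\alpha}}, \qquad H_*^{\alpha}(e^{j\omega}) = \frac{|1 + e^{-j\omega}|^{\alpha+1}}{2^{\alpha}}.$$
The anti-causal filter (−) is obtained by substituting $h_-^{\alpha}(k) = h_+^{\alpha}(-k)$. The general approach to orthonormalizing the fractional splines generates the scaling function
$$\varphi^{\alpha}(x) = \sum_{k \in \mathbb{Z}} \left(a_{\varphi}^{\alpha}\right)^{-1/2}(k)\, \beta^{\alpha}(x - k),$$
where $(a_{\varphi}^{\alpha})^{-1/2}(k)$ is the convolution inverse square root of the autocorrelation sequence of the FS, whose Fourier transform is
$$A_{\varphi}^{\alpha}(e^{j\omega}) = \sum_{n \in \mathbb{Z}} \left|\hat{\beta}^{\alpha}(\omega + 2\pi n)\right|^{2}.$$
This leads to the corresponding two-scale relation $\varphi^{\alpha}(x/2) = \sqrt{2} \sum_{k} h_{\varphi}^{\alpha}(k)\, \varphi^{\alpha}(x - k)$, with the low-pass and high-pass filters
$$H_{\varphi}^{\alpha}(e^{j\omega}) = \frac{H^{\alpha}(e^{j\omega})}{\sqrt{2}} \sqrt{\frac{A_{\varphi}^{\alpha}(e^{j\omega})}{A_{\varphi}^{\alpha}(e^{j2\omega})}}, \qquad G_{\varphi}^{\alpha}(e^{j\omega}) = e^{-j\omega}\, \overline{H_{\varphi}^{\alpha}\left(e^{j(\omega + \pi)}\right)}.$$
As α → ∞, these filters tend to ideal low-pass and high-pass filters [6,21].
The overlapping group shrinkage (OGS) algorithm reconstructs the denoised signal from the wavelet coefficients by performing a convex regularization that minimizes a cost function [22,23]. Therefore, in this paper, the equivalent filter bank denoises the raw signals before they are fed to the feature maps.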
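To make the OGS step more concrete, the following is a minimal NumPy sketch of the majorization-minimization iteration for overlapping group shrinkage with an ℓ1-type group penalty, applied to a generic 1-D vector of wavelet coefficients. The function names, the group size `group_size`, the weight `lam` and the iteration count are illustrative assumptions rather than the settings used in this paper, and the fractional wavelet filter bank itself is not included.

```python
import numpy as np


def ogs_cost(x, y, lam, group_size):
    """Convex cost 0.5*||y - x||^2 + lam * (sum of l2 norms of all overlapping groups)."""
    group_energy = np.convolve(np.abs(x) ** 2, np.ones(group_size), mode="full")
    return 0.5 * np.sum((y - x) ** 2) + lam * np.sum(np.sqrt(group_energy))


def ogs_denoise(y, lam, group_size, n_iter=25):
    """Overlapping group shrinkage by majorization-minimization (MM).

    Each iteration minimizes a diagonal quadratic majorizer of the cost, so the
    cost never increases and every sample is shrunk toward zero (|x| <= |y|).
    """
    y = np.asarray(y, dtype=float)
    if lam <= 0:
        return y.copy()  # no regularization: the minimizer is y itself
    h = np.ones(group_size)
    x = y.copy()
    with np.errstate(divide="ignore"):
        for _ in range(n_iter):
            # l2 norm of every group of `group_size` consecutive samples (zero-padded ends)
            r = np.sqrt(np.convolve(np.abs(x) ** 2, h, mode="full"))
            # for each sample, accumulate 1/r over all groups that contain it
            v = 1.0 + lam * np.convolve(1.0 / r, h, mode="valid")
            # samples inside all-zero groups get v = inf and remain exactly 0
            x = y / v
    return x
```

For example, `ogs_denoise(coeffs, lam=0.8, group_size=4)` shrinks isolated small coefficients strongly while preserving clusters of large coefficients, which is the behavior that motivates group sparsity for impulsive, bearing-related components.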

Long Short-Term Memory
A recurrent neural network (RNN) is a class of ANN that identifies patterns in sequential data. However, RNNs suffer from vanishing and exploding gradients during backpropagation. Long short-term memory (LSTM) mitigates the vanishing gradient problem with a memory cell that improves the RNN units [24,25]. Figure 1 shows a typical LSTM cell.

Three gates control the information flow through the cell state: the forget gate (f), the input gate (i) and the output gate (o) [25]. The forget gate $f_t$ determines how much previous information should be removed or kept:
$$f_t = \sigma\left(w_f \cdot [h_{t-1}, x_t] + b_f\right),$$
where σ is the sigmoid function, $w_f$ is the weight, $x_t$ is the current input, $h_{t-1}$ is the output of the previous cell and $b_f$ is the bias. The input gate determines how the current input $x_t$ and the previous output $h_{t-1}$ update the current cell state $C_t$:
$$i_t = \sigma\left(w_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(w_C \cdot [h_{t-1}, x_t] + b_C\right), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.$$
The output gate controls which part of the cell state is passed to the output:
$$o_t = \sigma\left(w_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t).$$
Thus, the LSTM can memorize relevant time-dependent features to discriminate long-time delay events with overlapping low-frequency components [26].
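The gate description in this section can be sketched as a single LSTM forward step in NumPy. This is a minimal illustration of the standard LSTM cell, assuming weight matrices that act on the concatenation $[h_{t-1}, x_t]$; the dictionary layout of `W` and `b` and the helper names are hypothetical choices for readability, not the configuration of the networks in this paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps gate name ('f', 'i', 'c', 'o') to a (hidden, hidden + input) matrix
    applied to [h_{t-1}, x_t]; b maps the same names to (hidden,) bias vectors.
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate: how much of C_{t-1} to keep
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate: how much new content to write
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde      # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                # cell output / new hidden state
    return h_t, c_t


def lstm_forward(xs, W, b, hidden):
    """Run the cell over a sequence xs of shape (T, input); return the last hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
    return h
```

Only the last hidden state is returned because, in the shallow arrangement described later, the last LSTM cell is what feeds the flattened layer.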

Convolutional Neural Networks
The basic CNN contains three structures that extract features, perform classification and represent the decision with a probabilistic function. The convolution layer (CL) performs dot products, preserving the spatial structure of the previous layer and outputting abstract features [27,28]. The convolutional process is described as follows:
$$x_j^{l} = f\left(x^{l-1} * k_j^{l} + b_j^{l}\right), \qquad j = 1, \ldots, n_l,$$
where * is the convolution operation and $x^{l-1}$ denotes the input data from the previous layer. Each layer l consists of $n_l$ kernels with weight matrices $k_j^l$ and bias vectors $b_j^l$. The nonlinear activation function $f(\cdot)$ outputs the $n_l$ matrices $x_j^l$, one per kernel of layer l. The leaky ReLU activation function is the identity for positive values and has a small slope for negative values, which mitigates gradient problems. The pooling layer (PL) down-samples the previous CL to control the feature map size while retaining abstract information. Max pooling, $x_j^{l+1} = \max(x_j^l)$ over each pooling window, is the most common down-sampling operation in CNN models. Therefore, a feature map is an independent structure of successive CLs and PLs whose size and depth are controlled to extract increasingly abstract features. The last PL can feed LSTM cells to extract time-dependent features or be transformed into a flattened layer (FL) that feeds an ANN or another classifier [27,28]. The third structure is the SoftMax operator, which transforms the classifier outputs $z_i$ into a normalized probability vector:
$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{q=1}^{c} e^{z_q}}, \qquad i = 1, \ldots, c,$$
where c is the number of classes. These three structures are used in different configurations to extract abstract features and improve classification accuracy. The feature maps can also be arranged in parallel with profound and shallow receptive fields to extract features from multiple sources [29].
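The three CNN structures described above can be sketched with plain NumPy: a convolution layer followed by leaky ReLU, non-overlapping max pooling and a SoftMax output. This is a minimal single-channel illustration, not the paper's implementation; the leaky ReLU slope of 0.01 and the 2 × 2 pooling window are assumptions, since the text does not report them. As in most deep learning frameworks, the "convolution" is computed as cross-correlation (no kernel flip).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def leaky_relu(x, slope=0.01):
    """Identity for positive inputs, small linear slope for negative inputs."""
    return np.where(x > 0, x, slope * x)


def conv_layer(x, kernels, biases, slope=0.01):
    """Valid 2-D convolution of one (H, W) map with n kernels (n, kh, kw), then leaky ReLU."""
    _, kh, kw = kernels.shape
    windows = sliding_window_view(x, (kh, kw))          # (H-kh+1, W-kw+1, kh, kw)
    out = np.einsum("rcij,nij->nrc", windows, kernels)  # one output map per kernel
    return leaky_relu(out + biases[:, None, None], slope)


def max_pool(x, size=2):
    """Non-overlapping max pooling over (n, H, W) maps; leftover rows/columns are dropped."""
    n, H, W = x.shape
    H2, W2 = H // size, W // size
    x = x[:, : H2 * size, : W2 * size]
    return x.reshape(n, H2, size, W2, size).max(axis=(2, 4))


def softmax(z):
    """Numerically stable SoftMax of a 1-D score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

For a 64 × 64 grayscale input and, say, eight 5 × 5 kernels, `conv_layer` returns an (8, 60, 60) array and `max_pool` reduces it to (8, 30, 30); stacking such pairs and reshaping the last output gives the flattened layer.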

Fuzzy C-Means Algorithm
The one-dimensional FL with data $X = \{x_1, x_2, \ldots, x_k\}$ is fed to the FCM algorithm to divide X into several clusters. The objective function of the FCM is defined as follows:
$$J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{k} u_{ij}^{m} \left\| x_j - v_i \right\|^{2}, \qquad \sum_{i=1}^{c} u_{ij} = 1 \;\; \forall j,$$
where $U = [u_{ij}]_{c \times k}$ is the membership matrix, m > 1 is the fuzzifier, $v_i$ are the prototypes and c is the number of clusters. The solution for updating the partition matrix and the prototypes is given by:
$$u_{ij} = \left[ \sum_{l=1}^{c} \left( \frac{\|x_j - v_i\|}{\|x_j - v_l\|} \right)^{\frac{2}{m-1}} \right]^{-1}, \qquad v_i = \frac{\sum_{j=1}^{k} u_{ij}^{m} x_j}{\sum_{j=1}^{k} u_{ij}^{m}}.$$
A common stopping criterion is a threshold on the change between two successive partitions, $\|U^{(t+1)} - U^{(t)}\| < \varepsilon$ [30]. The FCM can perform bearing damage detection and classification in an unlabeled context and has presented remarkable results with vibration-based signals. Its main advantage is that the regularization, cluster shape, cost function and membership function can be modified to improve performance. The intra-cluster variance can also be minimized by adjusting the fuzzifier while keeping an adequate boundary. By contrast, the support vector data description (SVDD) and the one-class SVM present boundary problems, including loose boundaries, data rejection and outlier misclassification, among others, which increase the complexity of interpreting and classifying the distribution.
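The alternating updates of the standard FCM can be written compactly in NumPy. The following is a minimal sketch of the classic algorithm only (none of the FCM variants compared later); the random initialization, the small distance floor and the default tolerance are assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard FCM. X has shape (k, d); a 1-D array is treated as k scalar samples.

    Returns the membership matrix U (c, k) and the prototypes V (c, d).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)                      # memberships of each sample sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)       # prototype update
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # (c, k) distances
        d = np.fmax(d, 1e-12)                              # avoid division by zero
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))  # (c, c, k)
        U_new = 1.0 / ratio.sum(axis=1)                    # membership update
        converged = np.linalg.norm(U_new - U) < eps        # successive-partition threshold
        U = U_new
        if converged:
            break
    Um = U ** m
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)           # prototypes consistent with final U
    return U, V
```

In the PUL setting described next, `c=2` would be used on the flattened-layer values of a batch, and the two prototypes become the pseudo-cluster centers that are monitored over time.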
In this paper, the Fisher discriminant ratio (FDR) and the Kullback-Leibler divergence (KLD) monitor the cluster centers and the distribution behavior in the PUL context. Initially, an FL from a healthy source is fed to the FCM algorithm, configured to identify two clusters. Therefore, if the input remains healthy, the distance between the centers and the divergence between the distributions should remain approximately constant over successive batches.
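A minimal sketch of the two measures is shown below, assuming 1-D samples. The FDR uses the common two-class form $(\mu_a - \mu_b)^2 / (\sigma_a^2 + \sigma_b^2)$, and the KLD is estimated from histograms on shared bin edges with a small smoothing constant; the number of bins and the smoothing value are hypothetical choices, and the paper may compute the PDFs with a different numerical method.

```python
import numpy as np


def fisher_discriminant_ratio(a, b):
    """Two-class FDR for 1-D samples: (mean_a - mean_b)^2 / (var_a + var_b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())


def kl_divergence(p_samples, q_samples, bins=32, eps=1e-10):
    """Histogram estimate of D_KL(P || Q) on shared bin edges, lightly smoothed."""
    p_samples = np.asarray(p_samples, dtype=float)
    q_samples = np.asarray(q_samples, dtype=float)
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, edges = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=edges)
    p = p + eps          # smoothing keeps log(p / q) finite for empty bins
    q = q + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Applied to a previous (healthy) batch and a current batch, both values stay near their baseline while the input is unchanged and grow when the distribution or the centers shift.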

Information Fusion
Multiple sources can be merged into a new feature set through information fusion (IF) algorithms [31,32]. Assuming that s samples, c classes and n independent sources and classifiers are available, the feature set from each source is represented as
$$X_n = \left\{x_1^n, x_2^n, \ldots, x_s^n\right\}.$$
Therefore, the n feature sets are fed to n CNNs to classify the c classes $C_i$ in each feature set $X_n$. The conditional probability of class $C_i$ based on the observation of CNN k on sample $x_j^k$ is defined as
$$p_{ij}^{k} = P\left(C_i \mid x_j^k, \mathrm{CNN}_k\right), \qquad i = 1, \ldots, c, \quad j = 1, \ldots, s.$$
For each CNN k, these probabilities are arranged in the matrix $P_k = [p_{ij}^k]$ of size c × s. The outputs of all classifiers are then merged into the (c × n) × s matrix
$$P = \begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_n \end{bmatrix}.$$
The task of analyzing multiple sources thus becomes the task of classifying the new feature set P. The IF approach for the FCM is similar: the flattened vectors from n feature maps are fed to n FCM algorithms, and the s samples are replaced by batches with measures $m_j$ (KLD and FDR) to form the feature sets
$$X_n = \left\{m_1^n, m_2^n, \ldots, m_s^n\right\}.$$
Each of the n FCM algorithms then identifies the healthy class (c = 1) or detects a novelty (c = 2) in $X_n$. The conditional probability of class $C_i$ based on the observation of FCM k on measure $m_j^k$ is defined as
$$p_{ij}^{k} = P\left(C_i \mid m_j^k\right).$$
This new approach leads to $P_k$ and P matrices similar to those of Equation (10), which depend on the FCM performance, batch size and number of measures.
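The construction of the fused matrix P can be sketched as follows, assuming each classifier emits raw scores of shape (c, s) that are converted column-wise into SoftMax probabilities. The helper names are hypothetical; the sketch only shows the stacking that turns n per-source probability matrices into one feature set for a second-stage classifier.

```python
import numpy as np


def softmax_columns(scores):
    """Column-wise SoftMax: raw scores (c, s) -> class probabilities P_k (c, s)."""
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)


def fuse(prob_matrices):
    """Stack n probability matrices P_k of shape (c, s) into P of shape (c * n, s)."""
    shapes = {p.shape for p in prob_matrices}
    if len(shapes) != 1:
        raise ValueError("all P_k must share the same (c, s) shape")
    return np.vstack(prob_matrices)
```

With four sources and two feature map arrangements (n = 8) and four classes (c = 4), P has 32 rows; its transpose `P.T` is an s × 32 design matrix on which a classifier such as the SVM used in this paper can be trained.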

Datasets
The tests were performed with all the current-based signals of the dataset developed by the Chair of Design and Drive Technology at the University of Paderborn, Germany, which contains current-based signals from an induction motor. Two current probes acquired signals from the test rig with a sampling frequency of 64 kHz, rotor speeds of 900 rpm and 1500 rpm (N09 and N15) and loading conditions of 0.1 Nm and 0.7 Nm (M01 and M07). The damage classes were healthy, incipient, distributed and punctual damage [33]. In this test rig, the test bearings were located outside the induction motor to extract more sensitive vibration-based information. However, the motor's internal bearings produced some effects in the data distribution that might cause misclassification in machine learning algorithms that learn from external bearing damage.
This work also used the test rig available at CISE (Electromechatronic Systems Research Centre) at the University of Beira Interior, Portugal, to acquire current-based signals with damaged bearings. The test rig consisted of an inverter-fed three-phase squirrel-cage induction motor, a programmable AC power source (Chroma) rated 0 to 300 V, 12 kVA, 192 A and 15 Hz to 1.2 kHz, a USB-6366 data acquisition device (National Instruments) and a mechanical system that provided a stable load with speed control. Two current probes sent the stator currents to the acquisition board with a sampling frequency of 44 kHz, producing samples at a rotor speed of 1800 rpm and loading conditions of 0.1 Nm and 0.7 Nm. The first CISE damaged bearing had incipient punctual damage in both rings caused by electrical discharges. The inner ring damage diameter was 1.5 mm, and the spheres and cage remained intact. The outer ring had two opposite damages with diameters of 2.0 mm and 1.5 mm. This type of damage is common in industrial machinery but is absent from the Paderborn dataset. The second CISE damaged bearing had a punctual damage (hole) of 2.0 mm in the outer ring. Unlike the Paderborn test rig, the CISE test rig placed the damaged bearing at the fan on the drive-end side of the induction motor.

Pre-Processing
Both stator phase currents (R1 and R2) of each bearing condition from both datasets were denoised with the FWT and reconstructed with the OGS algorithm to generate the F1 and F2 signals. Signal segments were then rearranged into square t × t matrices to convert the 1-D signals into grayscale images, with t chosen as a power of two (base 2). In this work, the segment length was bounded by the lowest motor speed, which produced (64,000 × 60)/900 ≈ 4266 samples per revolution. Therefore, t² = 4096 = 64² samples, the largest power-of-two square within one revolution, produced normalized grayscale images of 64 × 64 pixels. The sets of grayscale images from the R1, R2, F1 and F2 sources were fed to two independent arrangements (A1 and A2) of feature maps with a profound and a shallow receptive field, respectively. Table 1 summarizes the profound configuration (A1), a conventional feature map with four CLs and PLs that extracts deeper abstract features. The kernel size of the CLs decreased layer by layer while the number of kernels increased, concentrating the abstract information into more compact structures and allowing the extraction of more abstract features with different kernel configurations. The PLs controlled the output size through down-sampling while grouping relevant information, allowing the CLs to increase the number of kernels. The last PL fed an FL with a dimension of 1 × 160. Thus, this feature map could extract abstract information at each CL, increasing the number of kernels to diversify the feature types.
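The segment-length arithmetic and the 1-D to 2-D conversion can be sketched as below. The rule used in `image_side` (largest power-of-two side t with t² not exceeding the samples in one revolution) reproduces the 64 × 64 size stated above for 64 kHz and 900 rpm; the row-major reshaping and the per-image min-max normalization to [0, 1] are assumptions, since the exact normalization is not specified.

```python
import math

import numpy as np


def image_side(fs_hz, rpm):
    """Largest power-of-two side t such that t * t <= samples per revolution."""
    samples_per_rev = fs_hz * 60.0 / rpm
    return 2 ** int(math.floor(math.log2(math.sqrt(samples_per_rev))))


def signal_to_images(signal, side):
    """Cut a 1-D signal into consecutive side*side segments, each min-max scaled to [0, 1]."""
    x = np.asarray(signal, dtype=float)
    n = side * side
    n_img = len(x) // n                       # an incomplete trailing segment is dropped
    imgs = x[: n_img * n].reshape(n_img, side, side)  # row-major fill of each image
    lo = imgs.min(axis=(1, 2), keepdims=True)
    hi = imgs.max(axis=(1, 2), keepdims=True)
    return (imgs - lo) / np.maximum(hi - lo, 1e-12)
```

For the Paderborn signals, `image_side(64_000, 900)` evaluates 3,840,000 / 900 ≈ 4266.7 samples per revolution, whose square root (≈ 65.3) rounds down to the power of two 64.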
The shallow receptive field arrangement (A2) was a feature map with two successive CLs and PLs followed by LSTM cells to extract long time-dependent features. In this configuration, the last cell of LSTM 2 also fed an FL. Table 2 summarizes the shallow configuration.
This feature map was a particular arrangement (A2) designed to extract shallow abstract features within a large receptive field. The CLs and PLs controlled the feature map size, avoiding deep features and allowing diversity in the kernels. The PLs reduced the feature map size, revealing inner relations in each kernel. PL 2 fed the LSTM cells, which behaved as recurrent neural networks, saving relevant time-dependent features to feed an FL.
In summary, the multiple sources (4) were fed to each feature map arrangement (2) to generate eight independent FLs. Supervised learning was performed by eight ANNs trained with stochastic gradient descent, a learning rate of 0.0015, a momentum of 0.5 and L2 regularization. The training set contained 1000 samples per class (4000 in total), while the test set had 250 samples per class. Each SoftMax had four outputs, one per class. The IF unified the outputs of all SoftMax layers into the P matrix, and a support vector machine performed the classification. All possible four-class combinations with different severity levels were evaluated, and the results are presented in terms of average accuracy.
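The optimizer settings above can be illustrated with a single parameter update. This sketch assumes the classical (heavy-ball) momentum formulation with the L2 penalty folded into the gradient; the learning rate and momentum are the values reported in the text, whereas `weight_decay=1e-4` is a hypothetical placeholder because the L2 coefficient is not reported. Frameworks differ slightly in how they define momentum, so this is not necessarily the exact update used in the paper.

```python
import numpy as np


def sgd_momentum_step(w, grad, velocity, lr=0.0015, momentum=0.5, weight_decay=1e-4):
    """One SGD step with classical momentum and an L2 penalty 0.5 * weight_decay * ||w||^2.

    weight_decay is a hypothetical value; lr and momentum follow the text.
    """
    velocity = momentum * velocity - lr * (grad + weight_decay * w)  # decayed running step
    return w + velocity, velocity
```

Applied repeatedly to a simple quadratic loss, the momentum term of 0.5 roughly doubles the effective step size (lr / (1 − momentum)) compared with plain SGD at the same learning rate.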
In the PUL context, by contrast, the objective was to identify incipient bearing damage using the KLD and FDR measures from the FCM algorithms. Therefore, all possible combinations of healthy versus damaged signals with the A1 and A2 arrangements were evaluated.

Supervised Learning
This research compared the performance of the raw (R1 and R2) and denoised (F1 and F2) sources to verify the effectiveness of the IF. Table 3 summarizes the average accuracy under three operating conditions (N15M01, N09M07 and N15M07). The F1 and F2 sources alone reached an accuracy similar to that of the IF with all four signals. From an implementation standpoint, one could therefore fuse only the F1 and F2 sources instead of all four, reducing the computational effort while keeping a high accuracy. However, all tests in this paper were conducted with four sources in an IF context. Condition N15M07 improved the accuracy for each type of damage, while the other two conditions decreased the average. An SVM with a linear kernel and a soft margin performed the classification of the P matrix. The setup of hyperparameters, training, stop criteria and kernel configurations is omitted for the sake of brevity. Polynomial and Gaussian kernels achieved similar results, although with longer convergence times.

Unlabeled Learning
Five recent FCM-based algorithms were evaluated for novelty detection in a one-class PU context. The first was the FCM with genetic optimization (FCM-GO), which searched for a suboptimal solution [34]. The Gustafson-Kessel (GK) clustering algorithm employed the Mahalanobis distance to update the centers and proto-clusters. The FN-DBSC could be characterized by a convex function with a particular set of hyperparameters [35]. The FCM with a focal point (FCMFP) introduced a regularization term into the loss function [35]. Lastly, Gath-Geva (GG) clustering is an extended version of the FCM that first estimates the sizes and densities of the clusters [36]. These fuzzy-based algorithms could perform novelty detection in the PUL context with an appropriate initialization method.
Assuming that two clusters were present in the PU data distribution, a previous batch $(-\tau, t_0)$ defined the centers and boundaries of these pseudo-clusters. In parallel, the KLD and FDR measured the distribution and center behavior. A successive batch of current data, starting at $t_0$, was then used to calculate two new pseudo-clusters. Comparing the KLD and FDR of the previous and current batches revealed changes in the PU data. Figure 2 summarizes the cluster behavior of the FCM algorithms in a one-class PU novelty detection paradigm.
Energies 2021, 14, x FOR PEER REVIEW 9 of 15

Thus, the KLD and FDR measures could identify changes in the data distribution to perform novelty detection.
In this case, a healthy bearing signal produced only small changes in these measures because its outliers and noise were uncorrelated with bearing damage. Consequently, when damage arose, the previous cluster contained data from the healthy bearing $(-\tau, t_0)$, while the current cluster contained data from the damaged bearing. This discrepancy produced the center movement and the divergence between distributions, because the clusters acquired data from the same signal under different conditions. Indeed, healthy bearing distributions can be described by symmetric alpha-stable probability density functions (PDFs), whereas damaged bearing distributions can be described by non-symmetric alpha-stable PDFs with elongated, exponential or dense tails that depend on the damage type and location. This is the principal advantage of the KLD, which can monitor such complex distributions by computing the PDFs with numerical methods. Furthermore, more severe failures induced more significant changes in the distribution and center behavior. The relative distance between centers thus provided a measure to monitor the evolution of bearing damage, quantifying its severity. Therefore, early bearing damage detection can be extended to damage severity monitoring.
In this paper, the multiple sources and arrangements (A1 and A2) created the FLs that fed eight FCM algorithms, which measured the KLD and FDR and created the P matrix. The cumulative summation of each KLD and FDR, combined with changes in P, enabled early bearing damage detection in PUL. Table 4 presents the average performance of the algorithms for healthy versus early bearing damage identification under different load and speed conditions. These algorithms presented similar performance in a one-class PU context, confirming that identifying incipient bearing damage from the stator current under several operating conditions is challenging. The identification of distributed and punctual damage was performed with higher accuracy in the N15M07 condition, but these results are omitted for the sake of brevity. This research also evaluated these algorithms with time, frequency and time-frequency features, and the accuracy reached 88% in the best-case scenarios. Furthermore, the performance of these FCM algorithms was similar to that of the supervised learning approach in Table 3, indicating promising experimental results.
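The cumulative summation of a per-batch measure can be illustrated with a one-sided CUSUM detector. This is a minimal sketch of one possible interpretation, not the paper's exact rule: it assumes the first `n_ref` batches are healthy and define the reference mean and spread, and the `drift` and `threshold` values are hypothetical tuning parameters.

```python
import numpy as np


def cusum_alarm(measures, n_ref, drift=0.5, threshold=5.0):
    """One-sided CUSUM on a per-batch measure such as the KLD or FDR.

    The first n_ref batches are assumed healthy and define the reference mean
    and standard deviation. Returns the index of the first alarm, or None.
    """
    m = np.asarray(measures, dtype=float)
    mu = m[:n_ref].mean()
    sd = m[:n_ref].std() + 1e-12
    s = 0.0
    for t in range(n_ref, len(m)):
        # accumulate standardized upward deviations; drift absorbs normal fluctuations
        s = max(0.0, s + (m[t] - mu) / sd - drift)
        if s > threshold:
            return t
    return None
```

In an IF setting, one such detector per FCM output (and per measure) could be run in parallel, and their alarms combined, for instance by majority vote; that combination rule is likewise an assumption.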
In this context, both approaches (CNN-IF and FCM-IF) achieved high accuracy in condition monitoring and bearing damage identification because of the FWT and LSTM, leaving conventional techniques (e.g., the kurtogram and spectral envelope) to provide the damage location (inner ring, outer ring or spheres). Indeed, well-known methods can predict the location of punctual and distributed bearing damage with high accuracy through vibration signal analysis [37,38]. However, with current-based signals, it was non-trivial to extract the relevant information without an adequate denoising technique (FWT) or without monitoring relevant long-time behavior (LSTM). A remarkable example is that the FCM-IF could detect a novelty in current-based signals (e.g., a change in distribution) even when the information was insufficient (e.g., harmonics buried in noise) to locate the damage with a kurtogram or spectral envelope.

On-Site Tests
On-site tests were conducted on a wastewater pump driven by an electric motor at a gas processing facility (Figure 3). Initially, supervised learning was performed with historical data, allowing the CNN-IF algorithm to be trained and tested with two incipient bearing damage samples caused by wear and pitting, three punctual damages (electrical discharge, scratches and low-severity pitting) and two distributed damages.
This motor operated under two predominant speed conditions, 1500 rpm and 1800 rpm, with a variable load that depended on process demand and without vibration condition monitoring. The training accuracy reached 92.15%, 95.26% and 93.08% for incipient, punctual and distributed damage identification, respectively, similar to the results in Table 3. In these tests, the data acquisition avoided load transients, pausing the training until the wastewater process reached a more stable and stationary regime. This approach reduced the misclassification of the supervised algorithm.
The CNN-IF algorithm ran in real time for sixteen weeks, monitoring the bearings under both speed conditions with a variable load until it detected an incipient distributed damage caused by wear. The kurtogram and spectral envelope analysis of the R1 and R2 current-based signals identified the same damage 48 h later. Indeed, the low-magnitude harmonics, the poor SNR and the loss of information in the magnetic field reduced the performance of these approaches. Moreover, the CNN-IF could perform transfer learning from test benches to on-site historical data (target source), preserving the relevant inner structure and retraining partially with on-site data when available. Transfer learning between similar on-site machines was also possible.
The real-time test in a one-class PU context was then conducted with FCM-IF algorithms to perform early bearing damage detection in a centrifugal pump driven by the electric motor presented in Figure 4. In this case, the CNN-IF method was inviable because only two bearing damages caused by wear were reported in two years of historical data. This industrial motor pump was the main machine at this facility, running at 1800 rpm with variable loading that depended on processing demand. The motor condition monitoring was performed by current envelope signatures while an automatized protection system prevented high levels of vibration and current. Therefore, there was no vibration-based condition monitoring or other dedicated systems to perform an independent bearing damage analysis. Figure 5 present the behavior of the most sensitive KLD, FDR and FDR moving average (FDR-MA) of the F1 source and the The real-time test in a one-class PU context was then conducted with FCM-IF algorithms to perform early bearing damage detection in a centrifugal pump driven by the electric motor presented in Figure 4. In this case, the CNN-IF method was inviable because only two bearing damages caused by wear were reported in two years of historical data. This industrial motor pump was the main machine at this facility, running at 1800 rpm with variable loading that depended on processing demand. The CNN-IF algorithm ran in real-time for sixteen weeks, performing bearing damage monitoring in both speed conditions with a variable load until the detection of an incipient distributed damage caused by wear. The kurtogram and the spectral envelope analysis using the R1 and R2 current-based signals were able to identify the same damage 48 h later. Indeed, the low magnitude harmonics, the poor SNR and the loss of information in the magnetic field reduced the performance of these approaches. 
Thus, the CNN-IF could perform transfer learning from test benches to on-site historical data (target source), saving the relevant inner structure to retrain partially with on-site data if available. It was also possible to perform transfer learning between similar on-site machines.
The real-time test in a one-class PU context was then conducted with the FCM-IF algorithms to perform early bearing damage detection in a centrifugal pump driven by the electric motor presented in Figure 4. In this case, the CNN-IF method was infeasible because only two bearing damages caused by wear had been reported in two years of historical data. This industrial motor pump was the main machine at this facility, running at 1800 rpm with a variable load that depended on processing demand. Motor condition monitoring was performed with current envelope signatures, while an automated protection system prevented high levels of vibration and current. Therefore, no vibration-based condition monitoring or other dedicated system was available to perform an independent bearing damage analysis. Figure 5 presents the behavior of the most sensitive KLD, FDR and FDR moving average (FDR-MA) of the F1 source and the FCM-GO algorithm. In this case, the bearing damage was caused by wear, and early detection occurred at sample 240 with either the KLD or the FDR. The current envelope signature identified the same damage at sample 282, a difference of approximately 50 h. Indeed, the most sensitive FDR and KLD changed drastically around sample 200, indicating that the distribution was becoming different and that the centers were moving toward a new pattern. These changes could be identified with the FDR and KLD because the FWT extracted relevant information and the LSTM retained the abstract long-term behavior of the healthy signal. In contrast, incipient wear was difficult to detect by analyzing current-based signals with a kurtogram or spectral envelope, because the distributed damage produced low-magnitude harmonics and energy information that were buried in noise due to the poor SNR.
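Standard fuzzy c-means (FCM) alternates between updating the cluster centers and the fuzzy memberships. The Python sketch below implements the textbook algorithm with fuzzifier m = 2 on one-dimensional data; it does not reproduce the FCM-GO variant, the initialization method or the flattened-layer features used in this work, and the two synthetic groups of feature values are hypothetical.

```python
import random

def fuzzy_c_means(data, c=2, m=2.0, iters=200, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on 1-D samples; returns (centers, memberships U[c][n])."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    rng = random.Random(seed)
    n = len(data)
    # Random initial memberships, normalized so each sample's memberships sum to 1
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    centers = []
    for _ in range(iters):
        # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            centers.append(sum(wk * xk for wk, xk in zip(w, data)) / sum(w))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d = [abs(data[k] - v) + 1e-12 for v in centers]  # avoid division by zero
            for i in range(c):
                new_U[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
        shift = max(abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n))
        U = new_U
        if shift < tol:
            break
    return centers, U

# Hypothetical batch of feature values: two well-separated groups
rng = random.Random(0)
batch = [rng.gauss(0.0, 0.3) for _ in range(50)] + [rng.gauss(5.0, 0.3) for _ in range(50)]
centers, U = fuzzy_c_means(batch, c=2)
print(sorted(centers))
```

Running FCM on successive batches yields the centers and membership distributions whose drift the KLD and FDR measures are meant to capture.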
Furthermore, every FCM variant considered in this work was run in real time on this electric motor. The results were similar to those in Figure 5, outperforming the current envelope signature by an average of 50 h. Thus, the performance of this approach was independent of the choice of FCM-IF algorithm but depended on the sources, feature maps, measures and initialization method. After the bearing damage detection (novelty detection), the clusters moved apart gradually because the successive data (damage versus evolving damage) produced similar centers and distributions. This effect occurred after 300 samples. In addition, slight variations in the KLD and FDR indexes might indicate that the severity evolved. Both on-site motors were driven by inverters, but this methodology could also be applied to line-connected motors.
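The FDR-MA curve and the novelty decision can be illustrated with a trailing moving average and a simple threshold learned from a healthy baseline. The window length, the mean plus three standard deviations rule and the synthetic FDR series below are hypothetical choices for illustration, not the settings used in this work.

```python
def moving_average(values, window):
    """Trailing moving average; the first window - 1 outputs average fewer points."""
    out, acc = [], 0.0
    for k, v in enumerate(values):
        acc += v
        if k >= window:
            acc -= values[k - window]
        out.append(acc / min(k + 1, window))
    return out

def first_novelty(index_series, baseline_len, k_sigma=3.0):
    """First sample after the baseline exceeding mean + k_sigma * std of the baseline."""
    base = index_series[:baseline_len]
    mu = sum(base) / baseline_len
    sd = (sum((b - mu) ** 2 for b in base) / baseline_len) ** 0.5
    threshold = mu + k_sigma * sd
    for k in range(baseline_len, len(index_series)):
        if index_series[k] > threshold:
            return k
    return None

# Hypothetical FDR series: flat healthy baseline, then a drift after sample 200
fdr = [0.1 + 0.01 * ((k * 7) % 5 - 2) for k in range(200)]
fdr += [0.1 + 0.05 * (k - 200) for k in range(200, 300)]
print(first_novelty(fdr, 200), first_novelty(moving_average(fdr, 10), 200))
```

Smoothing with a moving average trades a small detection delay for robustness against isolated spikes in the raw index.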

Conclusions
This research introduced the challenges of current-based condition monitoring and early bearing damage diagnosis. Classic supervised learning methods that extract features in the time, frequency and time-frequency domains provided high accuracy in vibration-based analysis. However, these methods were insufficient for current-based approaches because of the poor SNR and low-magnitude harmonics. The principal drawbacks of current-based bearing condition monitoring are the poor SNR, the loss of information in the magnetic field, saturation harmonics, electrical faults, interference and indirect measurement, among others. Consequently, the traditional signal processing techniques that denoise and extract information from vibration-based signals performed worse in current-based analysis. Current-based bearing condition monitoring has less information available (e.g., due to indirect measurement) and greater feature extraction complexity (e.g., due to the poor SNR). Thus, this paper introduced two new approaches that combine denoising methods and machine learning to detect incipient bearing damage from current-based signals with high accuracy.
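To illustrate the general idea of wavelet denoising at a poor SNR, the sketch below applies an orthonormal Haar wavelet transform with soft thresholding and the universal threshold sigma * sqrt(2 ln N), where sigma is estimated from the median absolute value of the finest detail coefficients. This is a swapped-in textbook technique, not the fractional B-spline wavelet developed in this work, and the test signal is synthetic.

```python
import math
import random

def haar_forward(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    a = [(x[2 * i] + x[2 * i + 1]) * s for i in range(half)]
    d = [(x[2 * i] - x[2 * i + 1]) * s for i in range(half)]
    return a, d

def haar_inverse(a, d):
    """Exact inverse of haar_forward."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) * s, (ai - di) * s])
    return x

def soft_threshold(v, t):
    """Shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def wavelet_denoise(x, levels=4):
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)
    finest = sorted(abs(v) for v in details[0])
    sigma = finest[len(finest) // 2] / 0.6745  # robust noise estimate
    t = sigma * math.sqrt(2.0 * math.log(len(x)))  # universal threshold
    for d in reversed(details):  # rebuild from the coarsest level up
        approx = haar_inverse(approx, [soft_threshold(v, t) for v in d])
    return approx

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Synthetic test: a slow sinusoid in Gaussian noise (hypothetical stand-in for a current signal)
rng = random.Random(3)
n = 1024
clean = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
denoised = wavelet_denoise(noisy)
print(mse(noisy, clean), mse(denoised, clean))
```

The approximation coefficients are kept untouched and only the detail coefficients are shrunk, which is why slowly varying content survives while broadband noise is attenuated.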
Therefore, the first contribution of this paper was the development of the fractional wavelet B-spline to denoise two phases of the stator current, taking advantage of multiple-source analysis. The CNN feature maps then extracted deep and shallow features from each source, while the shallow map contained LSTM cells that identified long-term time-dependent behavior. The ANN and SoftMax layers performed the classification, and the information fusion algorithm merged each SoftMax classification into a new matrix. This approach brought multiple-source information fusion into a supervised classification task. This contribution improved the accuracy of current-based approaches because two arrangements of feature maps extracted more relevant and abstract features with different receptive fields from multiple sources.
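The merging of per-source SoftMax outputs into a matrix can be sketched as follows. Each row of the matrix holds the SoftMax vector of one source (for example, one stator phase), and the fused decision is a weighted average of the rows followed by an argmax. This averaging rule, the equal weights, the class names and the logits are assumptions for illustration, not the IF algorithm of this work.

```python
import math

def softmax(logits):
    """Numerically stable SoftMax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_softmax(per_source_probs, weights=None):
    """Build the sources x classes IF matrix and fuse it by weighted averaging."""
    n_src = len(per_source_probs)
    n_cls = len(per_source_probs[0])
    if weights is None:
        weights = [1.0 / n_src] * n_src
    matrix = [list(row) for row in per_source_probs]
    fused = [sum(weights[s] * matrix[s][c] for s in range(n_src)) for c in range(n_cls)]
    label = max(range(n_cls), key=lambda c: fused[c])
    return matrix, fused, label

classes = ["healthy", "inner race", "outer race"]  # hypothetical class labels
phase_a = softmax([2.0, 0.5, 0.1])  # hypothetical logits from source 1
phase_b = softmax([1.0, 1.5, 0.2])  # hypothetical logits from source 2
matrix, fused, label = fuse_softmax([phase_a, phase_b])
print(classes[label], fused)
```

Because each row is a probability vector and the weights sum to 1, the fused vector remains a probability distribution; a learned fusion layer could replace the fixed averaging.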
Acquiring a labeled database is infeasible in most industrial applications because industrial motors are prevented from operating under damaged conditions. Therefore, the second contribution of this work used multiple sources in two arrangements of feature maps and several FCM algorithms to perform bearing damage identification in a one-class positive unlabeled context. This new approach calculated the KLD and FDR from successive FCM batches and fed them to an information fusion algorithm that merged these measures into a new matrix for bearing condition monitoring and early damage identification.
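The KLD and FDR between successive batches, merged into a measure matrix, can be sketched under a Gaussian assumption: each batch of one-dimensional values from one source is summarized by its mean and variance, and the closed-form univariate Gaussian KLD and the Fisher discriminant ratio are computed between consecutive batches. The Gaussian fit, the column layout and the toy batches are assumptions for illustration, not necessarily how this work computes the measures.

```python
import math

def gaussian_stats(batch, eps=1e-12):
    """Mean and (floored) variance of a 1-D batch."""
    n = len(batch)
    mu = sum(batch) / n
    var = sum((v - mu) ** 2 for v in batch) / n
    return mu, max(var, eps)

def kld_gauss(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for univariate Gaussians."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def fdr(mu1, var1, mu2, var2):
    """Fisher discriminant ratio between two univariate distributions."""
    return (mu1 - mu2) ** 2 / (var1 + var2)

def measure_matrix(batches_per_source):
    """Rows: consecutive-batch transitions; columns: (KLD, FDR) for each source."""
    n_batches = len(batches_per_source[0])
    rows = []
    for b in range(1, n_batches):
        row = []
        for batches in batches_per_source:
            prev = gaussian_stats(batches[b - 1])
            curr = gaussian_stats(batches[b])
            row.extend([kld_gauss(*prev, *curr), fdr(*prev, *curr)])
        rows.append(row)
    return rows

# Hypothetical batches: source 1 shifts at the third batch, source 2 stays stable
source_1 = [[0, 1, 0, 1], [0, 1, 0, 1], [2, 3, 2, 3]]
source_2 = [[0, 2, 0, 2], [0, 2, 0, 2], [0, 2, 0, 2]]
M = measure_matrix([source_1, source_2])
for row in M:
    print(row)
```

Both measures are zero when consecutive batches share the same statistics and grow as the distribution drifts, so a sudden rise in any column of the matrix flags a possible novelty in that source.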
Experimental tests with the Paderborn and CISE datasets were performed with the most representative types of damage and severity under several operating conditions using the FW-CNN-IF and FW-FCM-IF algorithms. Both contributions achieved remarkable results for incipient and distributed damage detection from current-based signals. Furthermore, on-site tests were performed in a gas processing facility, where these algorithms outperformed the harmonic and envelope spectrum analysis in every case.