Fault Diagnosis System for Induction Motors by CNN Using Empirical Wavelet Transform

Detecting faults related to the operating condition of induction motors is a very important task for avoiding system failure. In this paper, a novel methodology is demonstrated to detect the working condition of a three-phase induction motor and classify it as faulty or healthy. Electrical current signal data are collected for five different types of fault and one normal operating condition of the induction motors. The first part of the methodology is a pattern recognition technique based on the empirical wavelet transform, which transforms the raw current signal into two-dimensional (2-D) grayscale images containing the fault-related information. Second, a deep CNN (Convolutional Neural Network) model is proposed to automatically extract robust features from the grayscale images to diagnose the faults in the induction motors. The experimental results show that the proposed methodology achieves competitive accuracy in the fault diagnosis of induction motors and outperforms traditional statistical and other deep learning methods.


Introduction
Because of their simple design, low cost, low maintenance, and easy operation, induction motors are among the most commonly used rotating machines in industry. Although these machines are reliable and robust, induction motor failures are still expected because of the various stresses the motors encounter during operation. Such failures are mainly caused by mechanical or electrical forces. Different types of machinery faults, such as broken bars, bearing faults, an unbalanced rotor, stator faults, and winding faults, have been discussed in the literature [1,2]. Many studies have been conducted on fault diagnosis in recent years. Early detection of problems is vital to save time and costs, so that remedial measures can be taken to avoid an entire system failure [3]. Fault diagnosis methods can be broadly classified into signal-based, model-based, active/hybrid, and knowledge-based methods [4,5]. Knowledge-based methods, also called data-driven methods, require a large amount of historical data to find the signal patterns for the fault diagnosis of the system.
Predictive maintenance and data-driven methods commonly analyze signals such as current, temperature, electrical tension, and vibration, which are captured by sensors [6,7]. Signal-based features are extracted for the fault diagnosis. However, the extracted features must undergo feature selection to avoid redundant information and to significantly reduce the feature dimensions, which can improve the performance by retaining important

Related Works
In recent years, many signal processing techniques in the frequency, time, and time-frequency domains have been studied to extract features and detect the machine operating condition using classification methods. Time-frequency domain methods are generally preferred for analyzing and extracting features from non-stationary signals. Wang et al. [25] used wavelet scalogram images as the input to a CNN to learn features and detect faults. Lee et al. [26] analyzed corrupted raw signals and the effect of noise on training a CNN model. Ge et al. [27] studied and theoretically analyzed the empirical mode decomposition (EMD) method. Lei et al. [28] used the EMD method to extract features from vibration signals and discussed a kurtosis-based method for fault diagnosis. Pandya et al. [11] constructed an efficient KNN classifier using an asymmetric proximity function for fault diagnosis. Yang et al. [10] proposed an SVM-based method to diagnose the fault patterns of roller bearings. Ngaopitakkul et al. [9] proposed an ANN-based decision algorithm for fault diagnosis using the discrete wavelet transform (DWT) and backpropagation neural networks. In their approach, the fault current signals are decomposed with the Daubechies (db4) mother wavelet, and the first-scale DWT coefficients, which capture the high-frequency component, are used to detect the fault. Ma et al. [29] proposed a method based on the complete ensemble EMD (CEEMD), which enhances the mode characteristics by introducing adaptive noise, to extract features for diagnosing the bearing faults of rotating machines. Ge et al. [30] proposed a fault diagnosis method based on an empirical wavelet transform sub-modal hypothesis test and ambiguity correlation classification to diagnose rolling bearing faults using vibration signals.
However, Ge et al. [30] concentrated only on rolling bearing faults. Deng et al. [31] studied a fault diagnosis method that extracts a new feature by combining Hilbert transform coefficients, correlation coefficients, and the ensemble empirical mode decomposition (EEMD). The EEMD decomposes the vibration signal into multiple intrinsic mode functions (IMFs) with distinct frequencies. Agarawal et al. [32] presented a comparative study of ANN and SVM using continuous wavelet transforms and energy entropy methods to diagnose and classify rolling element bearing faults. The mother wavelet is selected from four real-valued base wavelets using energy and entropy criteria, statistical features are extracted from the wavelet coefficients of real signals, and these features are fed to the ANN and SVM to classify the bearing faults. In their comparison, SVM performed better than ANN. Jayaswal et al. [33] provided a brief review of recent studies on ANN, fuzzy logic, and wavelet transforms for diagnosing rotating machinery faults from raw vibration signals; however, special attention is given only to rolling element bearing faults. Bin et al. [34] studied a method that uses wavelet coefficients and empirical mode decomposition to extract features and a multi-layer perceptron network to classify faults. However, ANN-based studies raise two main concerns: (1) a large dependency on prior knowledge of signal processing methods and on diagnostic expertise; and (2) ANNs for induction motor fault diagnosis may have limited capacity to learn complex, nonlinear relationships from the large amount of information contained in motor currents. Thus, it is essential to study deep network architectures for fault diagnosis.
Compared with traditional machine learning, deep learning can learn feature representations directly from data, and it has therefore been extensively used in machine health monitoring systems [35]. Jia et al. [36] proposed a neural network-based fault diagnosis method using an auto-encoder. Cho et al. [37] used recurrent neural networks (RNNs) and dynamic Bayesian modeling for fault detection in induction motors. However, in an RNN, information flows sequentially through the hidden states, which makes it much slower than a CNN. Deep learning models such as deep auto-encoders (DAE), deep belief networks (DBN), and CNNs have been studied for fault diagnosis [13,14,16]. Ince et al. [20] used a one-dimensional (1-D) CNN for real-time motor fault diagnosis. Xu et al. [38] proposed a method based on the Gabor wavelet and a neural network for intelligent image detection, using the Gabor wavelet transform to extract features from images. Abdeljaber et al. [39] proposed a 1-D CNN for real-time structural damage detection. Furthermore, there are various ways to represent machinery data in a 2-D format. Chong [40] proposed an effective feature extraction method that converts 1-D vibration signals into 2-D grayscale images. Gaowei et al. [41] proposed a method based on a deep CNN and random forest ensemble learning with remarkable performance; however, they focused only on bearing fault diagnosis. Lu et al. [42] converted signals to images using a bispectrum and used a probabilistic neural network as the image classifier. Kang et al. [43] used 2-D grayscale images created with Shannon wavelets for induction motor fault diagnosis. However, these conversion methods require expert knowledge. Although many studies apply methods such as neural networks directly to raw signals to diagnose and classify faults, data preprocessing remains a highly important step in deep learning.
Processing large quantities of data and examining many parameters makes data preprocessing difficult, and data with different characteristics require different feature extraction methods. Many studies use frequency, time-frequency, and histogram representations to convert signals into images for classification. Similarly, in the proposed study, a two-dimensional matrix generated from wavelet coefficient values is represented as an image. The benefit of using an image instead of the raw one-dimensional current signal is that the image can capture both spatial and temporal dependencies. Moreover, the CNN is a popular deep learning algorithm for image datasets and is traditionally two-dimensional. The benefit of a CNN over a conventional fully connected neural network is its ability to develop an internal representation of a two-dimensional image or matrix of values. This helps the model learn the position and scale of different structures in the image or matrix data, and weight sharing, together with learning high-level features, reduces the number of parameters involved. In this study, an efficient 1-D signal to 2-D grayscale image representation based on the empirical wavelet transform is proposed. This method is free of any predefined parameters and eliminates the need for expert intervention.
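To make the weight-sharing argument concrete, the short Python sketch below counts trainable parameters for a 3 × 3 convolution with 32 filters on a 32 × 32 grayscale input (sizes used later in this paper) and, as an illustrative assumption for comparison, a fully connected layer with 32 units on the same flattened input. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One shared kernel per filter plus one bias per filter, independent of image size."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(n_inputs, units):
    """Every input is connected to every unit, plus one bias per unit."""
    return n_inputs * units + units


conv = conv2d_params(3, 3, 1, 32)        # 3*3*1*32 + 32
dense = dense_params(32 * 32 * 1, 32)    # (32*32)*32 + 32
print(conv, dense)  # 320 32800
```

The convolution needs 320 parameters regardless of the image size, whereas the dense layer's count grows with the number of input pixels (32,800 here).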

Proposed Methodology
This section describes the proposed EWT-CNN-based fault diagnosis methodology. As part of data preprocessing, the raw current signal is converted into images using EWT modes. Then, a deep CNN model is presented to extract and learn the features for the fault diagnosis.

Pattern Recognition Technique
Because most common data-driven methods cannot deal directly with the original signals for fault diagnosis, preprocessing the raw signal is necessary. In recent years, the empirical mode decomposition (EMD) algorithm proposed by Huang et al. [44] has gained great interest in signal analysis because of its ability to separate stationary and non-stationary components of a signal. However, although its adaptability is appealing, the main issue with this approach is its lack of a mathematical theory. To address this problem, the ensemble EMD (EEMD) was proposed: it computes several EMD decompositions of the original signal corrupted by different realizations of added white noise and averages them to obtain the final EEMD. This method is appealing, but it increases the computational cost [45].
Wavelet analysis is currently one of the most widely used tools for analyzing signals; the extensive literature on wavelet theory [46][47][48] can be consulted for further details. In the temporal domain, with a scaling factor s > 0 and a translation factor u ∈ R, the wavelet dictionary {ψ_{u,s}} is defined as:

$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\left(\frac{t-u}{s}\right)$$

The scaling factor s stretches or compresses the wavelet function to change its oscillation frequency, and the translation factor u shifts the position of the time window. The wavelet functions determine the local features and time-frequency properties, which allows them to capture the non-stationary characteristics of the signal effectively. Many wavelet functions have been studied, such as Morlet, Meyer, Symlet, Gabor, Coiflet, and Haar [49][50][51][52]. All these approaches either use a prescribed scale subdivision or exploit the output of the classic wavelet transform in a clever way; none of them builds a fully adaptive wavelet transform. The proposed method therefore uses the empirical wavelet transform (EWT), which builds a family of wavelets adapted to the processed signal [24,30]. Unlike the classic wavelet transform, which has a single mathematical formulation, the EWT is defined step by step. Its main idea is to extract the different modes of a signal from Fourier supports detected in the spectrum of the processed signal.
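The dictionary definition can be illustrated numerically. The sketch below is an illustrative example, not part of the proposed method: it assumes the Mexican hat (Ricker) function as a stand-in mother wavelet ψ, builds the atom ψ_{u,s}, and approximates the wavelet coefficient ⟨f, ψ_{u,s}⟩ with a Riemann sum on a uniform grid. The function names are hypothetical. The 1/√s factor keeps every atom at the same energy as ψ, which the printed values show.

```python
import numpy as np


def mexican_hat(t):
    """Example mother wavelet psi(t) (Mexican hat / Ricker, unnormalized); an assumption."""
    return (1.0 - t**2) * np.exp(-(t**2) / 2.0)


def atom(t, u, s, psi=mexican_hat):
    """Dictionary element psi_{u,s}(t) = psi((t - u) / s) / sqrt(s), with s > 0."""
    return psi((t - u) / s) / np.sqrt(s)


def coefficient(f, t, u, s, psi=mexican_hat):
    """Approximate <f, psi_{u,s}> on a uniform grid t by a Riemann sum (real-valued psi)."""
    return np.sum(f * atom(t, u, s, psi)) * (t[1] - t[0])


t = np.linspace(-50.0, 50.0, 20001)
energy_mother = coefficient(atom(t, 0.0, 1.0), t, 0.0, 1.0)   # ||psi||^2
energy_scaled = coefficient(atom(t, 3.0, 2.0), t, 3.0, 2.0)   # ||psi_{3,2}||^2
print(energy_mother, energy_scaled, 3 * np.sqrt(np.pi) / 4)    # all approximately 1.329
```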
The empirical wavelet transform proposed in [24] can be summarized in the following steps:
Step 1: Compute the Fourier transform of the processed input signal.
Step 2: Segment the Fourier spectrum by detecting its local maxima.
Step 3: Sort the local maxima in decreasing order.
Step 4: Define the boundaries of each segment as the centers between two successive maxima.
Step 5: Follow the construction idea of Meyer's wavelets to obtain a tight frame.
Step 6: Obtain the corresponding signal filters (the modes, as defined in [24]).
The resulting empirical wavelets correspond to dilated versions of a single mother wavelet in the temporal domain; however, the dilation factors do not follow a prescribed scheme but are detected empirically. For further details on the EWT, we refer the reader to [24].
A three-phase current signal is collected from each induction motor. For each phase, ten cycles (one full cycle has 167 data points), i.e., 1670 continuous points, are sampled. The 1670 points are then converted via the EWT into a 1670 × N time-frequency spectrum consisting of the coefficient matrices, where N is the number of modes; choosing an appropriate N retains sufficient characteristics of the raw signal. Finally, the time-frequency spectrum is represented as a grayscale image.
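The six steps can be sketched in NumPy as follows. This is a simplified reading of the construction in [24], not the authors' code: it keeps the N largest spectral maxima (so N is the number of modes), places the N - 1 boundaries midway between them, builds Meyer-type filters with the transition polynomial β(x) = x^4(35 - 84x + 70x^2 - 20x^3) and a single γ kept below the non-overlap limit, and, as a simplification, treats the last band as a high-pass filter. The function names `detect_boundaries`, `ewt_filters`, and `ewt` are hypothetical, and the three-tone signal merely stands in for one 1670-point phase current.

```python
import numpy as np


def beta(x):
    """Transition polynomial of Meyer-type filters: beta(0) = 0 and beta(1) = 1."""
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)


def detect_boundaries(signal, n_modes):
    """Steps 1-4: return up to n_modes - 1 boundary frequencies (radians) in (0, pi)."""
    spectrum = np.abs(np.fft.rfft(signal))                  # Step 1: |F(omega)| on [0, pi]
    omega = 2 * np.pi * np.arange(spectrum.size) / len(signal)
    # Step 2: interior local maxima of the magnitude spectrum
    inner = spectrum[1:-1]
    peaks = np.where((inner > spectrum[:-2]) & (inner >= spectrum[2:]))[0] + 1
    # Step 3: sort the maxima by decreasing magnitude and keep the n_modes largest
    peaks = peaks[np.argsort(spectrum[peaks])[::-1][:n_modes]]
    centers = np.sort(omega[peaks])
    # Step 4: each boundary lies midway between two successive maxima
    return (centers[:-1] + centers[1:]) / 2.0


def ewt_filters(length, boundaries):
    """Step 5: Meyer-type filters whose squares sum to one (a tight frame)."""
    w = np.abs(2 * np.pi * np.fft.fftfreq(length))          # |omega| on the FFT grid
    edges = np.concatenate(([0.0], boundaries, [np.pi]))
    # gamma below min (w_{n+1} - w_n) / (w_{n+1} + w_n) keeps transition zones disjoint
    gamma = 0.9 * np.min(np.diff(edges) / (edges[1:] + edges[:-1]))
    filters = []
    for n in range(len(boundaries) + 1):
        h = np.ones(length)
        if n > 0:                                            # rising edge at boundary n - 1
            lo, hi = (1 - gamma) * boundaries[n - 1], (1 + gamma) * boundaries[n - 1]
            rise = np.sin(np.pi / 2 * beta((w - lo) / (hi - lo)))
            h = np.where(w < lo, 0.0, np.where(w <= hi, rise, h))
        if n < len(boundaries):                              # falling edge at boundary n
            lo, hi = (1 - gamma) * boundaries[n], (1 + gamma) * boundaries[n]
            fall = np.cos(np.pi / 2 * beta((w - lo) / (hi - lo)))
            h = np.where(w > hi, 0.0, np.where(w >= lo, fall, h))
        filters.append(h)
    return filters


def ewt(signal, n_modes):
    """Step 6: filter the signal with each empirical wavelet; modes has shape (len, N)."""
    signal = np.asarray(signal, dtype=float)
    filters = ewt_filters(signal.size, detect_boundaries(signal, n_modes))
    spectrum = np.fft.fft(signal)
    modes = np.stack([np.fft.ifft(spectrum * h).real for h in filters], axis=1)
    return modes, filters


# Stand-in for one 1670-point phase current: three tones with whole cycles in the window
t = np.arange(1670) / 1670.0
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 120 * t) + 0.2 * np.sin(2 * np.pi * 300 * t)
modes, filters = ewt(x, n_modes=3)
print(modes.shape)  # (1670, 3)
```

Because the squared filters sum to one, the signal can be recovered from its modes by filtering each mode once more and summing; the returned matrix has the 1670 × N shape of the time-frequency spectrum described above.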
Figure 1 shows the raw current signals collected from the induction motors operating under the different faulty/healthy conditions and the same load condition. These raw signals are nearly indistinguishable, and it is almost impossible to diagnose the fault condition of the motors from them directly. Figure 2 shows the same set of raw signals after EWT processing; they are clearly distinguishable from each other. Hence, preprocessing the raw current signals with the EWT is necessary to reveal distinguishable patterns.
Symmetry 2019, 11, x FOR PEER REVIEW 5 of 15
Training the CNN model on the 1670 × N image would be difficult because of its computational complexity. A simple image resizing method from scikit-image [53] is therefore used to reduce the image size. Figure 3 illustrates the entire workflow of the proposed method. Figure 4 shows the resized (32 × 32) grayscale images for each fault type and for the healthy motor data, which are clearly distinguishable.
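One way to turn the 1670 × N coefficient matrix into a 32 × 32 grayscale image is sketched below with scikit-image's `skimage.transform.resize`. The min-max intensity scaling to [0, 1], bilinear interpolation (`order=1`), anti-aliasing, and N = 5 are assumptions, since these details are not specified; `modes_to_image` is a hypothetical helper, and the random matrix is only a placeholder for real EWT modes.

```python
import numpy as np
from skimage.transform import resize


def modes_to_image(modes, size=(32, 32)):
    """Turn a (samples x N) EWT coefficient matrix into a grayscale image in [0, 1]."""
    modes = np.asarray(modes, dtype=float)
    lo, hi = modes.min(), modes.max()
    # Min-max scaling to [0, 1] (assumed intensity mapping)
    gray = (modes - lo) / (hi - lo) if hi > lo else np.zeros_like(modes)
    # Bilinear resampling with anti-aliasing: 1670 rows shrink to 32, N columns grow to 32
    return resize(gray, size, order=1, anti_aliasing=True)


coefficients = np.random.default_rng(0).normal(size=(1670, 5))   # placeholder, N = 5
image = modes_to_image(coefficients)
print(image.shape, image.dtype)
```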

Proposed Deep Convolutional Neural Network
After converting the raw current signals into grayscale images, a deep CNN model is designed and trained for feature learning. The proposed deep CNN has a three-stage structure. Each stage is a feature learning stage at a different feature level and consists of convolution, activation, and pooling layers. Figure 5 illustrates the architecture of the proposed CNN model, which consists of three convolutional layers with 32, 64, and 128 filters of size 3 × 3, respectively, each followed by a 2 × 2 max-pooling layer. The most commonly used activation functions are the hyperbolic tangent, softmax, ReLU, and sigmoid functions [54]. Among them, ReLU has proven to be more effective than the others. However, ReLU units can die during training: a large gradient flowing through a ReLU neuron can update its weights such that the neuron never activates again on any data point. The leaky ReLU (leaky rectified linear unit) is an attempt to solve this problem [55,56]; it is therefore applied as the activation function in each stage to introduce non-linearity, allowing the CNN to learn complex models. Pooling reduces the resolution of its input by subsampling, and max pooling is used in the proposed model.
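The dying-ReLU argument can be seen directly in the derivatives. The NumPy sketch below compares ReLU and leaky ReLU gradients for a few pre-activations; the slope α = 0.01 is an illustrative assumption, since the negative slope used in the proposed model is not stated.

```python
import numpy as np


def relu_grad(x):
    # Zero gradient for every negative pre-activation: a unit stuck there stops learning
    return (x > 0).astype(float)


def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)


def leaky_relu_grad(x, alpha=0.01):
    # A small non-zero slope alpha keeps gradients flowing for negative inputs
    return np.where(x > 0, 1.0, alpha)


z = np.array([-3.0, -0.5, 2.0])   # pre-activations of a unit pushed mostly negative
print(relu_grad(z), leaky_relu_grad(z))
```

For the two negative inputs, ReLU passes back a zero gradient, so the weights feeding that unit receive no update from them, whereas leaky ReLU still passes back α.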
Training the CNN model involves learning all the weights and biases, and optimizing these parameters is important for efficient feature learning. Apart from the trainable parameters, hyperparameters such as the learning rate and the dropout rate must also be tuned. Dropout is an important regularization technique for CNNs that greatly helps prevent overfitting and improves generalization [57]; a dropout rate of 0.4 is used in the proposed CNN. The weights are optimized with adaptive moment estimation (Adam), a gradient-based optimizer that uses the gradients computed by backpropagation. Adam adapts the learning rate for each parameter, which avoids manually searching for the best learning rate [58]. At the end of the three stages, the feature maps are flattened and classified into six classes by a fully connected layer.
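A Keras sketch of the architecture described above follows. The layer sizes (32 × 32 × 1 input, 32/64/128 filters of 3 × 3, 2 × 2 max pooling, dropout 0.4, six-way output) and the Adam optimizer come from the text; the `same` padding, the dropout position before flattening, the leaky-ReLU slope of 0.1, the softmax output, and the sparse categorical cross-entropy loss are assumptions. `build_model` is a hypothetical helper, not the authors' code.

```python
import numpy as np
import tensorflow as tf


def build_model(input_shape=(32, 32, 1), num_classes=6, negative_slope=0.1, dropout=0.4):
    """Three conv -> leaky ReLU -> max-pool stages, then dropout, flatten, dense classifier."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for filters in (32, 64, 128):                            # one stage per feature level
        model.add(tf.keras.layers.Conv2D(filters, (3, 3), padding="same"))
        # Slope passed positionally: named `alpha` in Keras 2, `negative_slope` in Keras 3
        model.add(tf.keras.layers.LeakyReLU(negative_slope))
        model.add(tf.keras.layers.MaxPooling2D((2, 2)))      # 32 -> 16 -> 8 -> 4
    model.add(tf.keras.layers.Dropout(dropout))
    model.add(tf.keras.layers.Flatten())                     # 4 * 4 * 128 = 2048 features
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_model()
probabilities = model(np.zeros((2, 32, 32, 1), dtype="float32"))
print(probabilities.shape, model.count_params())
```

Training would then follow the settings reported in this paper, e.g. `model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=150, batch_size=32)`, where the array names are placeholders.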

Experimental Results and Discussion
To assess the performance of the proposed methodology, raw current signal data from an experimental setup involving six induction motors with the same specifications are used. The data comprise one healthy and five faulty types of raw current signals collected from this setup. The six types of current signals correspond to the healthy condition of the motor and to the five faulty conditions described below [59]. The data preprocessing and the CNN model are implemented in Python 3.6 with TensorFlow and run on a 64-bit Windows operating system.

Faults in Induction Motors
Motors undergo various types of failure modes, mostly due to electrical and mechanical forces, which eventually take the entire system out of its normal working condition. This section deals with five types of faults: bearing axis deviation, stator and rotor friction, rotor aluminum end ring break, bearing noise, and poor insulation.

1. Bearing Axis Deviation: The bearing is a precise structure that may be affected if it is disturbed by external forces. After the motor is connected to the load, an earthquake, a collision, or the assembly process may offset the midpoints at both ends of the connection, which causes heating problems and unwanted noise. For this experiment, a normal motor at full load is used, and the coupling is shifted 0.5 mm upward to imitate the deviation condition. The experimental motor model is shown in Figure 6d.

2. Stator and Rotor Friction and Poor Insulation: Friction, overheating, insulation aging, dampness, and corona can short-circuit the stator or rotor coil, which will break down if the fault is not diagnosed. When the insulation between adjacent turns of the stator coil is damaged, a short circuit occurs. When the motor is started, the short-circuit current is high because of the excessive voltage difference between the affected turns of the stator winding, and the motor will burn out. The experimental motor model is shown in Figure 6a.

3. Rotor Aluminum End Ring Break: Outer ring damage is one of the most common faults. If the starting frequency is very high and/or the motor is overloaded, the rotor bar will break because of the excessive current. For this experiment, a hole with a diameter of 7 mm and a depth of 30 mm is made in the rotor bar to simulate the fault condition. The experimental motor model is shown in Figure 6b.

4. Bearing Noise: Damage to the bearing's outer race is one of the most common bearing faults. The bearing structure is always kept precise; however, if it is disturbed by an external force or by other parts of the bearing, numerous irregular harmonics appear in the measured spectrum. For this experiment, a hole with a diameter and depth of 1 mm is made in the outer race to simulate the fault condition. The experimental motor model is shown in Figure 6c.
The proposed method uses the raw motor current signals to find patterns for diagnosing the motor faults listed above.

Dataset
The dataset collected from the experiment consists of 900 samples [60]. Fifty samples are collected from the healthy motor and from each faulty motor at 100% (full) load. Because three-phase induction motors are used in this study, there are three current signals with different phases, and each phase current is included when preparing the dataset. Hence, 150 raw current data samples are prepared for the healthy motor and for each of the five faulty motors, as described in Table 1. The dataset is divided into three parts, as described in Table 2: 70% (630 samples) for training and 15% (135 samples) for validation, both used together to train the CNN model, and the remaining 15% (135 image samples) to test the trained CNN model. Cross-validation techniques are often used for simple models with few trainable parameters, such as linear regression, logistic regression, small neural networks, and support vector machines, whereas a CNN with many parameters allows too many possible architectural variations. Nevertheless, in this study, the proposed CNN model is trained and evaluated using k-fold cross-validation with the data split ratio shown in Table 2.
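The sample counts and the 70/15/15 split can be reproduced with the NumPy sketch below. It assumes a random, non-stratified split with a fixed seed and samples ordered by condition (150 per condition); how samples were actually assigned to each subset is not stated, and the constant names are hypothetical.

```python
import numpy as np

SAMPLES_PER_MOTOR = 50   # recordings per motor, each with three phase currents
PHASES = 3
CONDITIONS = 6           # one healthy + five faulty

n_total = SAMPLES_PER_MOTOR * PHASES * CONDITIONS
labels = np.repeat(np.arange(CONDITIONS), SAMPLES_PER_MOTOR * PHASES)   # 150 per condition

rng = np.random.default_rng(seed=0)          # assumed seed for a reproducible shuffle
order = rng.permutation(n_total)
n_train = n_total * 70 // 100                # integer arithmetic avoids float rounding
n_val = n_total * 15 // 100
train_idx, val_idx, test_idx = np.split(order, [n_train, n_train + n_val])
print(n_total, len(train_idx), len(val_idx), len(test_idx))   # 900 630 135 135
```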

CNN Performance Evaluation Results
The proposed CNN model is trained for 150 epochs to learn robust features for each faulty condition and for the normal operating condition. A five-fold cross-validation is applied manually to evaluate model training and testing. In each fold, the CNN model extracts and learns features from the 630 training samples while being validated against the 135 validation samples at each iteration, and the trained model is then evaluated on the 135 test samples. The accuracies and losses from the five folds, split as described in Table 2, are averaged to monitor training, as shown in Figure 7.
The proposed CNN model is trained and tested with batch sizes of 16, 32, and 64, and the best results are obtained with a batch size of 32. To choose the number of epochs, the CNN model is trained for 50 to 200 epochs while its classification performance is analyzed. The average training and validation accuracies and losses collected at each iteration of the cross-validated training are plotted in Figure 7. The CNN model reaches a training accuracy of almost 100% with a validation accuracy of around 91%. Over the 150 epochs, the proposed CNN model learns robust and generalized features from the EWT grayscale images, allowing it to diagnose the motor faults and classify the motors as faulty or healthy.
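Averaging per-epoch curves over folds, as done for Figure 7, can be sketched as follows. It assumes each fold's record is a dictionary of per-epoch lists keyed by metric name, such as the `history` attribute of the object returned by Keras `Model.fit` (keys like `accuracy` and `val_accuracy`); `average_curves` is a hypothetical helper, and the toy numbers are illustrative only, not results from this study.

```python
import numpy as np


def average_curves(fold_histories, key):
    """Mean of one per-epoch metric across folds; all folds must have the same epoch count."""
    curves = np.array([history[key] for history in fold_histories], dtype=float)
    return curves.mean(axis=0)   # shape: (epochs,)


# Toy per-fold records (illustrative numbers, two epochs each)
folds = [
    {"accuracy": [0.50, 0.70], "val_accuracy": [0.40, 0.60]},
    {"accuracy": [0.60, 0.90], "val_accuracy": [0.50, 0.80]},
]
print(average_curves(folds, "accuracy"), average_curves(folds, "val_accuracy"))
```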
The trained CNN model is evaluated on the 135 test samples. As the classification report in Table 3 shows, it achieves an average accuracy of 97% on the test dataset, which indicates that the features learned during training generalize to unseen samples and separate the respective faulty and healthy conditions.
The proposed model classifies the healthy condition, the bearing axis deviation fault, the rotor aluminum end ring break fault and the bearing noise fault more effectively than the other faults. However, the model needs further tuning for the poor insulation fault condition. Poor insulation can also arise as a result of stator and rotor friction and of bearing axis deviation, so the current signatures of these conditions may overlap; hence, some samples are misclassified between these classes. Figure 8 shows the confusion matrix of the well-trained CNN model on the test dataset (135 samples). Almost all test samples are correctly classified; the few misclassifications involve the poor insulation condition and other faulty conditions.

To assess the performance of the proposed deep CNN model, it was compared with several traditional statistical and deep learning models, listed in Table 4: the deep belief network (DBN) [16], SVM [61], sparse filter [18], ANN [16] and the adaptive deep convolutional neural network (ADCNN) [62]. These methods were evaluated on the same dataset as the proposed model and, like it, were trained using five-fold cross-validation. Each method was then tested on the same 135 test samples, and its prediction accuracy is reported in Table 4. The comparison shows that the proposed deep CNN model achieves a prediction accuracy of 97.37%, higher than that of all the other methods.
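A confusion matrix such as the one in Figure 8 can also be read programmatically to locate mix-ups like those involving poor insulation. In the minimal standard-library sketch below, rows are true classes and columns are predicted classes, and the largest off-diagonal cells identify the most frequent confusions. The class labels and function names are hypothetical. The overall accuracy, the quantity compared across methods in Table 4, is the diagonal sum divided by the number of test samples.

```python
from typing import Hashable, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    """Rows are true classes and columns are predicted classes, both ordered as `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def top_confusions(matrix: List[List[int]], labels: Sequence[Hashable],
                   n: int = 3) -> List[Tuple[Hashable, Hashable, int]]:
    """Return the n largest off-diagonal cells as (true, predicted, count), largest first."""
    cells = [(labels[i], labels[j], matrix[i][j])
             for i in range(len(labels)) for j in range(len(labels))
             if i != j and matrix[i][j] > 0]
    cells.sort(key=lambda cell: cell[2], reverse=True)  # stable: ties keep row-major order
    return cells[:n]


def accuracy_from_matrix(matrix: List[List[int]]) -> float:
    """Overall accuracy: correctly classified samples (the diagonal) over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))
```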

Conclusions and Future Work
An effective methodology was presented to diagnose faults in a three-phase induction motor based on EWT and a deep CNN. The main contributions of this study are a method that converts time-series data, such as current signals, into grayscale images using EWT, and a deep CNN model that classifies these EWT grayscale images for fault diagnosis. The methodology was tested on five fault types of the induction motor (bearing axis deviation, stator and rotor friction, rotor aluminum end ring break, bearing noise and poor insulation) together with the healthy condition. It achieved an accuracy of 97.37%, outperforming the other traditional and deep learning methods considered. We also demonstrated that the proposed methodology, which uses a single variable as the input feature, yields promising results compared with rule-based diagnosis systems that use multiple features for fault diagnosis.
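This section does not spell out how the EWT output becomes a grayscale image. One plausible construction, offered here purely as an assumption, stacks the EWT mode components of a signal segment as image rows and min-max scales the result to 8-bit gray levels. The standard-library sketch below covers only this final scaling step. It assumes the modes have already been computed by some EWT implementation (not shown) and all have the same length. `modes_to_grayscale` is a hypothetical name.

```python
from typing import List, Sequence


def modes_to_grayscale(modes: Sequence[Sequence[float]]) -> List[List[int]]:
    """Stack equal-length EWT mode signals as image rows and min-max scale to 0..255.

    Assumption: `modes` is the output of some EWT implementation (not shown here),
    with one row per mode and one column per time sample.
    """
    if not modes or len({len(m) for m in modes}) != 1 or len(modes[0]) == 0:
        raise ValueError("expected a non-empty list of equal-length, non-empty modes")
    lo = min(min(m) for m in modes)
    hi = max(max(m) for m in modes)
    if hi == lo:
        return [[0] * len(m) for m in modes]  # a constant input becomes an all-black image
    scale = 255 / (hi - lo)
    # round() maps each value to the nearest integer gray level (ties go to the even level).
    return [[round((x - lo) * scale) for x in m] for m in modes]
```

Scaling over the whole image, rather than per row, preserves the relative amplitudes of the modes. Per-row scaling would instead emphasize the shape of each mode.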
The proposed methodology has the following limitations. First, the dataset used in the experiment was comparatively small; many more samples need to be collected under different load conditions, such as no load, half load and full load. Second, data from motors with different specifications need to be collected so that more generalized features can be learned. Third, the dataset needs to cover the most common fault types in induction motors to reduce misclassifications. Based on these limitations, our future work will focus on collecting more data from induction motors with different specifications operating at different loads, and on gathering data on the most common fault types in induction motors to reduce misclassification. Furthermore, CNN-based transfer learning could be studied to reduce training costs.