Frequency Occurrence Plot-Based Convolutional Neural Network for Motor Fault Diagnosis

A novel motor fault diagnosis method using only the motor current signature is developed using a frequency occurrence plot-based convolutional neural network (FOP-CNN). In this study, a healthy motor and four identical motors with synthetically applied fault conditions—bearing axis deviation, stator coil inter-turn short circuiting, a broken rotor strip, and outer bearing ring damage—are tested. A set of 150 three-second sampling stator current signals from each motor fault condition is taken under five artificial coupling loads (0%, 25%, 50%, 75% and 100%). The sampling signals are collected and processed into frequency occurrence plots (FOPs), which later serve as CNN inputs. This is done by first transforming the time series signals into their frequency spectra and then converting these into two-dimensional FOPs. Fivefold stratified sampling cross-validation is performed. When motor load variations are considered as input labels, FOP-CNN predicts motor fault conditions with a 92.37% classification accuracy. It precisely classifies and recalls the bearing axis deviation fault and the healthy condition with 99.92% and 96.13% f-scores, respectively. When motor loading variations are not used as input data labels, FOP-CNN still satisfactorily predicts the motor condition with an 80.25% overall accuracy. FOP-CNN serves as a new feature extraction technique for time series input signals such as vibration sensor, thermocouple, and acoustic signals.


Introduction
Prognostics and health management (PHM) has modernized industry in terms of equipment reliability, attracting both academia and industry practice [1]. In the PHM strategy, diagnostics and prognostics are two important mechanisms applied in machine condition-based maintenance. A diagnostic mechanism detects, isolates, and identifies the present machine condition. Driven primarily by machines such as motors and generators, modern industries have been improved by advanced diagnostics, yielding better preventive maintenance, improved safety, and increased reliability [2]. Deep learning, an emerging branch of artificial intelligence (AI), has been playing an important role in this PHM modernization.
For prognostics and the health management of machines, bearing fault diagnosis is one of the well-known applications of deep learning (DL). The recent survey in [3] and the review in [4] provide comprehensive assessments of different state-of-the-art DL-based machine health monitoring systems applied to bearing fault diagnostics. These systems vary by their different settings; thus, there is always the need to provide alternatives to help AI practitioners choose the best-suited algorithm.
Feature selection is one of the primary concerns for effective deep learning (DL) applications. Vibrations or acoustic signals tend to be widely used features in bearing fault diagnosis [4][5][6]. Various

Synthetic Motor Fault Conditions
There are four typical motor fault conditions: bearing axis misalignment, inter-turn short circuiting, a broken rotor strip, and an outer ring bearing fault. Each is synthetically applied to a respective test motor.

Bearing Axis Misalignment
The bearing axis deviation fault happens when a motor is eccentrically coupled to its load. Improper installation, changes or damage to motor bases cause the motor shaft to misalign with the coupling load. Similar to [40], an artificially created eccentricity misalignment experiment with an elevation of 0.5 mm, as illustrated in Figure 1a, is also used to simulate this fault.
Electronics 2020, 9, x FOR PEER REVIEW 4 of 18

Stator Inter-Turn Short Circuiting
The aging insulation of the stator coil due to the long operation period of motors is often believed to be the primary reason for motor overheating. In severe cases, it causes short-circuits between turns of the same phase or even in different phases. To simulate this motor stator turn-to-turn short circuit fault, two adjacent turns of the stator winding of a test motor are intentionally short-circuited by breaking their insulation and allowing them to make contact, as shown in Figure 1b.

Broken Rotor Strip Fault
Excessive current due to long-term overloading is often seen as the reason for a broken rotor strip fault. This fault is synthetically simulated by drilling directly into one side of the rotor bar, similar to [41]. Considering its serious impact on the motor, an experimental detection was performed after the first drill. After verifying that the motor still ran well, a second drill was performed, as shown in Figure 1c.

Outer Ring Bearing Fault
Lastly, outer ring damage is a common bearing fault; it increases machine vibration whenever a bearing ball passes over the damaged area [42]. In this simulation, a hole is drilled in the outer bearing ring of a test motor. Electrically conducted heat is applied to the hole, ensuring that its residue is removed and that no physical deformations remain after drilling, as shown in Figure 1d.

Data Collection
The experimental simulation with the previous test motors produces a total of 3750 time series records: 150 three-second current signals for each of the five motor conditions under each of the five loading variations. Three full-cycle sample waves of the current signals, plotted in Figure 2, show the five motor conditions under three different motor loadings. It can be observed that the healthy motor's current magnitude upper-bounds that of all fault-conditioned motors in all three loadings. This may be because healthy motors have less energy dissipation than faulty motors.

Data Preprocessing
With the generated raw motor data, two data processing techniques are used before learning. First, a signal data transformation, from the time domain to the frequency spectrum, is performed using a frequency transformation. Second, novel frequency occurrence plot (FOP) image generation is employed to convert the frequency spectra into FOPs. These plots will then serve as inputs for the convolutional neural network (CNN) model.

Fast Fourier Transform
Fourier analysis is a widely known tool for converting a time series signal into a frequency spectrum representation, and vice versa. It has a form called the discrete-time Fourier transform (DTFT) that analyzes discrete-time samples whose intervals have units of time. Given a sampled signal X(j), j = 0, 1, . . . , N − 1 with N sampling points, the sequence in (1), as a function of frequency n, gives the complex Fourier amplitudes. The expression in (2) is a principal Nth root of unity in the complex Fourier series. For this motor digital signal application, the DTFT is implemented via the discrete Fourier transform (DFT).
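Equations (1) and (2) did not survive extraction; assuming the standard DFT convention and the symbols used in the text (X(j), N, n, W), they presumably take the following form:

```latex
% Presumed reconstruction of Eqs. (1)-(2): the standard DFT of X(j)
A(n) = \sum_{j=0}^{N-1} X(j)\, W^{nj}, \qquad n = 0, 1, \ldots, N-1, \tag{1}
```

```latex
W = e^{-2\pi i / N} \tag{2}
```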
Fast Fourier transform (FFT) is an efficient algorithm for computing the DFT [43]. With the previously generated time series motor current signals, FFT, as implemented in the SciPy library [44], is used to transform the signals into frequency spectra; an example spectrum of a healthy motor signal is shown in Figure 3. Three data preprocessing techniques are then applied to suppress potential noise and to ease the learning process of the proposed fault classification system.
First, data clipping truncates the converted data to the 0-500 Hz range; it is assumed here that motor fault signatures do not appear beyond 500 Hz, and widening the frequency range would lower the FOP image resolution, which may degrade CNN classification performance. Second, 90th-percentile clipping sets the magnitudes of less significant frequencies to zero, suppressing possible noise. Finally, since the magnitudes of the operating frequency (60 Hz) and its sideband frequencies (around 59-61 Hz) are far greater than those of the other modal frequencies, a log-function normalization is performed. Figure 4 plots a sample of a preprocessed frequency spectrum dataset for the healthy motor condition.
The presence of other frequencies with noticeable amplitudes is often believed to be caused by motor faults. Identifying these frequencies for each motor fault type is difficult because of the complex frequency spectra, as seen in Figure 5. Differences among the five motor conditions can be observed, but they seem difficult to distinguish by human visual recognition alone.
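The transform and the three preprocessing steps can be sketched as follows. NumPy's FFT is used here to keep the example self-contained (the paper uses the SciPy library [44]); the sampling rate and the exact form of the log normalization are assumptions, as the paper does not state them.

```python
import numpy as np

FS = 2000      # assumed sampling rate in Hz (not stated in the paper)

def spectrum(signal, fs=FS):
    """FFT of a time-domain current signal into (frequency, magnitude) arrays."""
    mags = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs, mags

def preprocess(freqs, mags):
    """The three preprocessing steps: band clipping, percentile clipping, log."""
    keep = freqs <= 500.0                     # 1. clip to the 0-500 Hz band
    freqs, mags = freqs[keep], mags[keep]
    cut = np.percentile(mags, 90)             # 2. zero the bottom 90% of bins
    mags = np.where(mags < cut, 0.0, mags)
    return freqs, np.log1p(mags)              # 3. log normalization (assumed form)

t = np.arange(0, 3.0, 1.0 / FS)               # a three-second sample window
sig = np.sin(2 * np.pi * 60 * t)              # idealized 60 Hz stator current
f, m = preprocess(*spectrum(sig))
peak_hz = f[np.argmax(m)]                     # dominant bin lands at 60 Hz
```

With real stator current data, the 59-61 Hz sidebands and fault-related modal frequencies would survive the percentile clipping alongside the 60 Hz peak.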

Frequency Occurrence Plots
Let a metric space M be defined, and let A(i) ∈ M denote the ith point of the previously defined frequency spectrum A. A frequency occurrence plot is defined in (3), which marks where spectrum points A(i) and A(j) coincide.
Here, ε is the mapping resolution, which scales the difference between two identical signals in the ith row and jth column. The matrix FOP(i, j) is transformed into a color map: the data are first normalized and scaled, then mapped into an RGB color map using the Matplotlib library [45]. Thus, ε has no effect on the color mapping but is useful for visualization purposes. After a series of trials, we use ε = 0.001, which produces distinctive occurrence plots.
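Since Eq. (3) is not reproduced here, the following sketch assumes a recurrence-plot-style construction: pairwise absolute differences of spectrum points scaled by ε, then normalized and mapped to RGB. A simple stand-in colormap replaces Matplotlib's; both the equation's form and the colormap are assumptions.

```python
import numpy as np

def frequency_occurrence_plot(spectrum, eps=0.001):
    """Hypothetical form of Eq. (3): |A(i) - A(j)| for every pair (i, j),
    scaled by the mapping resolution eps."""
    a = np.asarray(spectrum, dtype=float)
    return np.abs(a[:, None] - a[None, :]) / eps

def fop_to_rgb(fop):
    """Normalize the FOP and map it to RGB. Normalization cancels the eps
    scaling, which is why eps affects visualization only."""
    norm = (fop - fop.min()) / (fop.max() - fop.min() + 1e-12)
    # crude blue-to-red stand-in colormap (the paper uses Matplotlib [45])
    return np.stack([norm, 1.0 - np.abs(norm - 0.5) * 2.0, 1.0 - norm], axis=-1)

spec = np.random.rand(217)        # FOPs in the paper are 217 x 217 images
rgb = fop_to_rgb(frequency_occurrence_plot(spec))
```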
Figure 6 displays a sample illustration of how a frequency occurrence plot (FOP) is produced from a sample in the 500 Hz frequency spectrum range. Each plot has a 217 × 217 resolution. Brightly colored lines, vertical and horizontal, represent higher magnitudes in the frequency spectrum; accordingly, the motor operating frequency at 60 Hz has the brightest color. The other bright lines represent frequencies with significant magnitudes, which may have been caused by the different motor faults.

Deep Learning Implementation
A convolutional neural network (CNN), a powerful deep learning tool for image recognition, is used to learn and classify faults from the generated FOPs in this study. Initially, FOPs are converted into three images, each representing its red, green, and blue (RGB) color features with the same image size. These serve as initial inputs for a CNN model.
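The RGB decomposition described above amounts to simple channel slicing; a minimal sketch (array names are illustrative):

```python
import numpy as np

# Split a 217 x 217 RGB FOP image into three single-channel CNN input planes.
fop_rgb = np.random.rand(217, 217, 3)            # stand-in for a generated FOP
red, green, blue = (fop_rgb[..., c] for c in range(3))
x = np.stack([red, green, blue], axis=0)         # (3, 217, 217) CNN input tensor
```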

Convolutional Neural Network
The architecture of the employed sequential CNN is shown in Figure 7. The CNN model takes as input the three color channels extracted from the original FOP. It comprises convolution, max pooling, dense, flatten, and dropout stages that lead to the final fault classification. Convolution layers use a filter matrix to obtain convolved feature maps by performing convolution operations over the array of input image pixels. The max pooling layer, on the other hand, applies a moving two-dimensional window to the incoming matrix and outputs its maximum value, down-sampling the input, reducing its dimension, and generalizing its internal features. We use a 2 × 2 window for the two max pooling layers, each halving its output size. A dense layer is simply a linear operation in which each input is connected to every output with a weight. The first dense layer has a large number of output units, so dropout is applied; flattening, which linearizes a two-dimensional array into a vector, is also used. Dropout is a popular, well-known regularization technique that reduces the risk of overfitting; it is applied and tuned with different values per layer. Finally, another dense layer serves as the output layer, whose five output units represent the five motor conditions. The model is trained using batch gradient descent with the Adam optimizer, a widely used optimization method for deep learning applications that is favorably chosen over other stochastic optimization methods [46].
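As an illustration of the 2 × 2 max pooling step described above (a NumPy sketch, not the paper's implementation):

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2: each output pixel is the maximum of a
    2 x 2 window, halving both spatial dimensions."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2   # trim odd edges
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
pooled = max_pool_2x2(img)    # shape (2, 2): [[5, 7], [13, 15]]
```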

Supervised Learning
The supervised learning of the CNN is summarized in Figure 8. Five steps are performed in this implementation: model selection, model training and testing, model performance comparison, test scenarios, and performance validation.

Model Selection
First, the CNN model architecture and its hyper-parameters are chosen during model selection using a brute-force method. This manual selection remains a limitation of this study, because finding optimal values tends to be computationally expensive and complex.

Model Training and Testing
Simultaneously, the FOPs generated in the previous section are split into training and testing datasets. Using the selected model parameters, model training is first performed by learning the patterns and features of the training FOPs and evaluating the training performance. Then, another set of FOPs, the testing FOPs, is used to test the trained CNN model and evaluate its testing performance. In the third stage, the training and testing performances are compared to check for overfitting.

Model Performance Evaluation
The supervised training and testing of the CNN are evaluated using the training and testing datasets, respectively. There are two typical modes of evaluation. First, the loss function of the CNN model is determined, usually in the form of a loss function graph. This measures the consistency between the predicted value and the actual label of the input FOPs during the training phase, based on the loss function used by the CNN. The robustness of the model increases as the loss value decreases. To determine whether the model has overfitting issues, the loss function of the testing dataset is also determined and compared with that of the training dataset at every epoch. Categorical cross entropy (CCE) is commonly used and has been shown to perform robustly, even with synthetically generated noisy labels [47].
Second, the classification accuracy in (4) is used to measure the model's prediction accuracy. To further evaluate performance in terms of false positive and false negative classifications, the F-score in (5) is also used. In addition, a confusion matrix provides a visualization of each class's classification performance.
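Equations (4) and (5) are not reproduced in this excerpt; assuming the standard definitions of accuracy, precision, recall, and F-score, a per-class computation looks like:

```python
import numpy as np

def classification_metrics(y_true, y_pred, positive):
    """Accuracy and per-class precision/recall/F-score in their standard forms
    (assumed to match the paper's Eqs. (4) and (5))."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = np.mean(y_true == y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))   # true positives
    fp = np.sum((y_pred == positive) & (y_true != positive))   # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f

acc, p, r, f = classification_metrics([0, 0, 1, 1], [0, 1, 1, 1], positive=1)
```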

Test Scenarios
A motor usually operates under different loading conditions throughout its operation, and in actual practice, monitoring motor load values may have practical uses for a company. This study thus simulates two test scenarios. First, a simulation is performed when the motor loading condition is available: five separate train-test CNN models are simulated, corresponding to the five motor loading conditions. Each model is trained and tested with 600 and 150 frequency occurrence plots (FOPs), respectively. All five CNN models are then combined into a single, generalized model.
The second case is performed when the motor loading condition is assumed to be unavailable. Only one CNN model is trained, using the entire dataset at once, without labeling the data by motor loading condition. A total of 3750 frequency occurrence plots (FOPs) are used: 3000 for training and 750 for testing. Both cases have similar partitions and equal numbers of FOPs.

Performance Validation
Testing datasets are used to evaluate the same model chosen from the previous step. To avoid model bias, each testing dataset is completely different from its training dataset. After testing, the testing performance is compared with the previous training performance. This is repeated via fivefold cross-validation with stratified sampling data partition, as shown in Figure 9.
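A minimal sketch of the fivefold stratified partition, assuming round-robin assignment within each shuffled class (the paper's exact partitioning procedure is not specified):

```python
import numpy as np

def stratified_folds(labels, k=5, seed=0):
    """Deal each class's shuffled indices round-robin into k folds, so every
    fold preserves the overall class balance."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for i, j in enumerate(idx):
            folds[i % k].append(j)
    return [np.sort(fold) for fold in folds]

# 5 motor conditions x 150 samples each, as in one loading condition's dataset
y = np.repeat(np.arange(5), 150)
folds = stratified_folds(y, k=5)   # 5 folds of 150 samples, 30 per class
```

In each cross-validation run, one fold serves as the testing set and the remaining four as the training set.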

Step 5 (model selection and training): for i = 1:n, train the CNN using the training FOP datasets and determine the training loss function.
Step 6 (model training and validation): test the CNN using the testing datasets and determine the test loss function.
Step 7 (model overall performance evaluation): measure the average classification accuracy, f-score, precision, and recall.
The scikit-learn platform in [49] is used to implement the performance evaluation.
Figure 10 shows a matrix of FOPs in which all five motor fault conditions under five different loading conditions are compared. The motor operating frequency of 60 Hz, with its sidebands, is noticeable in all plots, since all test motors have identical specifications. The healthy motor seems to have the clearest plots, while the stator coil turn-to-turn and outer ring bearing damage faults tend to share similarly messy plots. In addition, there are variations in the occurrence plots of each motor fault condition under different motor loadings. For example, the healthy motor tends to have a clearer plot under no load or full load, and similar observations can be made for bearing axis misalignment (Fault 1). Motors with the other fault conditions tend to be messy at any loading condition; a smoother plot is expected for the healthy motor condition.

Computer Simulation Specifications
This supervised train-test simulation is performed in an Intel(R) Core (TM) i5-4590 CPU processor at 3.30 GHz with 16 GB of RAM installed memory. Since this study is concerned with supervised learning, computational speed is not the primary objective and can be sped up with newer specifications.

Results and Discussion
The CCE loss functions of the five models, corresponding to the five motor coupling loads, are shown in Figure 11a-e. All five models tend to converge to a CCE loss value of less than 0.25. Early convergence occurred in some runs during the early epochs; applying dropout helps the model escape it, and such behavior is often attributed to convergence to a local optimum. On the other hand, when the motor loading condition is not used as an input label, the model in Figure 12 tends to converge at a slightly higher loss value of 0.50 after five cross-validation runs. All of the models are still converging, but further training seems to produce no significant performance changes and would only increase the risk of overfitting.


Figure 11. Five train-test categorical cross entropy (CCE) loss function graphs (a-e) of the five motor loading conditions, respectively, with five train-test runs of cross-validation.
Figure 12. CCE loss graph with five train-test runs of cross-validation for the second case, when the motor load condition is ignored as an input label.
Figure 13a shows the average loss function of the combined five models, where the motor loading condition is used as an input label, against the model where it is not. It is evident that FOP-CNN tends to predict better when the load condition is used. Figure 13b also presents their classification performances; the average classification accuracy graphs of both cases show similar convergence. Both cases have identical FOP-CNN parameters, as shown in Table 2. This further verifies the graphical differences observed across motor loading conditions (see Figure 10); simulating separate models, as performed in the first case, may have avoided the difficulty caused by these differences. However, both cases still reach practical accuracies of 92% and 80%, respectively.
The classification reports of both cases are also taken. Each motor fault condition has 150 balanced samples. The first case model classifies bearing misalignment (Fault 1) with perfect recall (100%) and near-perfect precision (99.88%), as shown in Table 3. This seems intuitive, since the energy loss caused by this fault has a direct effect on the motor's current signature. The model also classifies healthy motors with 100% precision, and has little difficulty recalling the other faults, with 92.22% recall. Two motor fault conditions, the stator inter-turn fault (Fault 2) and the broken rotor strip (Fault 3), both perform particularly well, with F-scores of 87.70% and 94.74%, respectively. However, outer bearing ring damage (Fault 4) performs the worst, with an F-score of 83.38%. When predicting Fault 4, there is strong confusion with Fault 2, as shown in the confusion matrix of Figure 14a. Compared to the other motor fault conditions, this fault may prove difficult to predict using FOP-CNN based only on the motor's current signature: on average, 15% of Fault 4 samples are misclassified as the stator inter-turn fault, and 5.20% of healthy motor samples are misclassified as Fault 4. This suggests that the synthetic physical damage inflicted on the outer bearing ring causes only an insignificant, friction-related energy loss; such a small loss may have no direct effect on, and lead to no changes in, the motor's current signature.

In the second case, the FOP-CNN model still performs like the first case model, but with relatively lower classification accuracy. Predictions of the bearing misalignment fault and the healthy state have better F-scores than the other fault conditions, as shown in Table 4. The classification of the bearing axis misalignment fault (Fault 1) also has the highest precision, at 93.40%. Fault 4 has the lowest recall, at only 62.60%, which drags down the overall performance. The model classifies all motor fault conditions with at least 83.20% accuracy, except outer bearing ring damage (Fault 4), where it reaches only 62.95% accuracy, as shown in its confusion matrix in Figure 14b.
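The per-class precision, recall, and F-scores reported in Tables 3 and 4 follow directly from the confusion matrices. A minimal NumPy sketch, using a hypothetical 3-class matrix (the counts are illustrative, not the paper's actual results):

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, and F-score per class from a square confusion matrix
    (rows = true labels, columns = predicted labels)."""
    tp = np.diag(cm).astype(float)          # correctly classified counts
    precision = tp / cm.sum(axis=0)         # TP / (TP + FP), column-wise
    recall = tp / cm.sum(axis=1)            # TP / (TP + FN), row-wise
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical confusion matrix for three classes.
cm = np.array([[48,  1,  1],
               [ 2, 45,  3],
               [ 0,  5, 45]])

p, r, f = per_class_metrics(cm)
# class 0: precision = 48/50 = 0.96, recall = 48/50 = 0.96, F-score = 0.96
```

Off-diagonal mass in a row shows where a class "leaks" into another prediction, which is exactly the Fault 4 vs. Fault 2 confusion visible in Figure 14a.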
When predicting Fault 4, the model often confuses it with the stator inter-turn fault (Fault 2) and broken rotor strip (Fault 3) conditions. This performance is similar to that of the first case model, where Fault 4 is likewise the worst-performing class. The performances of other motor fault detection algorithms are shown in Table 5. Comparatively, the proposed FOP-CNN performs competitively with these known algorithms. It is important to note, however, that these algorithms were evaluated under different case settings; thus, a comparison based on classification accuracy alone is difficult to justify.
Moreover, note that the dataset used for FOP-CNN is identical to that used for the empirical wavelet transform convolutional neural network (EWT-CNN) [50], but with more samples collected.

Table 5. Comparison with other algorithms based on [50].

Methods                 Testing Accuracy (%)
ANN [51]                81.8
DBN [51]                96.4
SVM [31]                89.8
Sparse filter [52]      92.2
ADCNN [53]              96.2
EWT-CNN [50]            97.4
FOP-CNN (proposed)      92.4

The difference between training and testing accuracies is the most common indicator of overfitting, and an especially important property in determining whether a train-test learning algorithm is robust and reliable. The larger the difference, the poorer the learning generalization, and thus the less reliable the method. As shown in Table 6, the proposed FOP-CNN is more robust than the best-performing algorithm, with a 13-fold lower learning difference. This means that the model is more generalized and can therefore predict motor faults more reliably. It is commonly known in the literature that the larger the dataset, the more reliable and accurate the learning model tends to be, which is the case for the proposed algorithm.
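The robustness indicator used here is simply the train-test accuracy gap. A small sketch with illustrative accuracy values (hypothetical model names and numbers, not the reported experimental figures):

```python
def generalization_gap(train_acc, test_acc):
    """Train-test accuracy difference: a larger gap suggests poorer
    generalization, i.e. more overfitting."""
    return round(train_acc - test_acc, 4)

# Illustrative (train, test) accuracies only.
candidates = {
    "model_a": (0.999, 0.974),  # higher test accuracy, but a larger gap
    "model_b": (0.926, 0.924),  # lower test accuracy, but a much smaller gap
}

gaps = {name: generalization_gap(tr, te) for name, (tr, te) in candidates.items()}
most_robust = min(gaps, key=gaps.get)  # smallest gap wins
```

This mirrors the comparison in the text: a model with a slightly lower test accuracy but a much smaller gap can be preferred as the more reliable one.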

Conclusions
A novel motor fault diagnosis is successfully performed using only motor stator current signals and a frequency occurrence plot-based convolutional neural network (FOP-CNN). Five motor conditions, namely bearing axis deviation, stator coil turn-to-turn short circuit, broken rotor strip, outer bearing ring damage, and a healthy motor, are considered and simulated under five motor loading conditions (0%, 25%, 50%, 75%, and 100% coupled load). The diagnosis is also evaluated under two case scenarios: when the motor loading condition is considered as an input label and when it is not. It was found that FOP-CNN tends to perform more robustly when the motor load condition is available and is considered as an input label of the model. However, FOP-CNN still performed satisfactorily when the loading condition was not considered as an input label. Together, the two cases give users the option of whether or not to install motor-coupled load monitoring.
FOP-CNN easily predicts the bearing axis deviation fault and healthy motor conditions. It can also satisfactorily predict stator coil turn-to-turn short circuit faults, broken rotor strips, and outer bearing ring damage faults. When the motor loading condition is not available, FOP-CNN can still predict all motor fault conditions satisfactorily, except the outer bearing ring damage fault. Future research on motor fault diagnosis based on other signals, such as those from vibration sensors and thermocouples, can also use FOP-CNN. This deep learning model likewise paves the way for new feature extraction techniques in time series applications.
Author Contributions: E.J.P. contributed to the conceptualization, data curation, formal analysis, investigation, methodology, project administration, validation, visualization, writing original draft, review, and editing. Y.-T.C. contributed to the conceptualization, data curation, formal analysis, investigation, methodology, software, and validation. H.-C.C. contributed to the funding acquisition, investigation, methodology, project administration, resources, and validation. C.-C.K. is the corresponding author and contributed to the conceptualization, data curation, formal analysis, funding acquisition, methodology, project administration, resources, review and editing. All authors have read and agreed to the published version of the manuscript.