Demagnetization Fault Diagnosis of a PMSM Using Auto-Encoder and K-Means Clustering

In recent years, many motor fault diagnosis methods have been proposed based on the analysis of vibration, sound, electrical signals, and so on. To detect motor faults without additional sensors, this study developed a fault diagnosis methodology that uses the signals from a motor servo driver. Based on the servo driver signals, demagnetization fault diagnosis of permanent magnet synchronous motors (PMSMs) was implemented using an autoencoder and the K-means algorithm. The diagnosis covered three states: normal, mild demagnetization fault, and severe demagnetization fault. The experimental results indicate that the proposed method detects PMSM demagnetization with 96% accuracy.


Introduction
Permanent magnet synchronous motors (PMSMs) are widely used in consumer products and in industry. However, PMSMs often experience a significant temperature rise under high-speed or high-load operation. The heat may cause irreversible demagnetization and degrade PMSM performance. For predictive maintenance, several studies on PMSM demagnetization fault diagnosis have been carried out; signal analysis and intelligent learning algorithms are the common approaches.
Many signal analysis methods have been developed based on frequency-domain and time-frequency-domain analysis. Ishikawa [1] proposed a demagnetization fault diagnosis method for PMSMs based on vibration signals analyzed with a fast Fourier transform (FFT); the demagnetization condition was determined by comparing the frequencies and amplitudes of normal and demagnetized motors. Many other physical signals can also be used for fault diagnosis [1][2][3][4][5][6], but the additional sensors they require increase the cost. To reduce cost, demagnetization fault diagnosis based on stator current signal analysis is also popular, as it requires no additional sensors [7].
Because they handle nonlinear dynamic systems well, machine learning algorithms have been widely used in fault diagnosis fields such as photovoltaic arrays, marine diesel engines, and hydraulic brake systems [8][9][10]. The dynamic PMSM model is also nonlinear [11], and several recent studies have therefore used machine learning techniques for PMSM demagnetization fault diagnosis. Zhu [2] proposed a PMSM demagnetization fault diagnosis method using a back propagation neural network (BPNN) with acoustic noise data; however, it requires an additional microphone, which makes data collection more expensive. Kao [12] developed a PMSM fault classifier based on a convolutional neural network that predicts five failure modes. These investigations used supervised learning models, which have been popular in recent fault diagnosis research [12][13][14][15][16][17][18][19]. However, it is hard to accurately label different demagnetization states in large volumes of experimental data. To simplify label engineering, this paper proposes a PMSM demagnetization fault diagnosis method based on an unsupervised learning model. No additional sensor is required for data collection: five physical signals (current, voltage, speed, power, and torque) are captured directly from the motor driver. Using multiple physical signals simultaneously, rather than a single signal, helps reduce the influence of data noise. The proposed method has two parts: an autoencoder and the K-means algorithm. The autoencoder determines whether the motor is demagnetized, and K-means clustering determines the demagnetization level. The flowchart of the demagnetization fault diagnosis is shown in Figure 1.
This paper is organized as follows. Section 2 introduces the experimental setup and data processing. Section 3 describes how the PMSM demagnetization state is detected using the autoencoder. Section 4 describes how the PMSM demagnetization level is obtained with K-means clustering. Section 5 presents the experimental results. Finally, Section 6 presents the discussion and conclusions.

Experimental Setup and Data Processing
To establish a machine learning model, the first step is data collection and pre-processing. This section introduces the PMSM experimental setup for the different demagnetization states and then outlines the data processing methods.

Demagnetization Fault Implementation
In this study, the experimental data were collected under healthy and demagnetization fault conditions. To create a demagnetization fault in a PMSM, the PMSM was held fixed by a dyno motor, and a reverse excitation current was applied on the d-axis to generate a reverse magnetic field [20], as shown in Figure 2.
The specifications of the test PMSMs used in this paper are shown in Table 1. The test PMSMs fall into three categories: the first is in the normal state; the second has 10% demagnetization, called the mild demagnetization fault; and the third has 25% demagnetization, called the severe demagnetization fault. Figure 3 shows the experimental setup for data collection: the test PMSM with a given demagnetization degree was operated at the desired speed, while a lower-power PMSM (400 W) was controlled to apply a load to it. Both motors were operated simultaneously to collect the training data.
Data were acquired through the RS485 module for 1 s at a sampling frequency of 1000 Hz. This rate satisfies the Nyquist sampling theorem because the maximum motor speed was 2000 rpm and the corresponding maximum electrical frequency was 133.33 Hz, well below half the sampling frequency (500 Hz). Five physical signals (current, voltage, speed, power, and torque) were collected from the motor drive under different PMSM operating conditions. Based on the T-N curve of the test PMSM provided by the vendor, PMSM operation was limited to the "continuous duty" region during data collection, as shown in Figure 4. The collected data were divided into two parts: training data and test data. The training data were used to train the unsupervised model (the autoencoder), so they were collected from normal PMSM operation only. A normal PMSM was operated at speeds between 200 and 2000 rpm and loads between 0.5 and 2.5 Nm. There were 50 operating conditions in total, and data collection was repeated 20 times for each condition, giving 1000 datasets for training the unsupervised model. Similarly, the test data were collected from the normal and demagnetized PMSMs under 10 random operating conditions, giving a total of 600 datasets, as shown in Table 2.
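The sampling-rate argument and the training-set size can be checked with a few lines of arithmetic. The pole-pair count is not stated in the text; 4 pole pairs is an inference, being the value that maps 2000 rpm to 133.33 Hz.

```python
# Sampling check for the data acquisition described above.
max_speed_rpm = 2000.0
pole_pairs = 4            # assumption: inferred from 133.33 Hz at 2000 rpm, not stated in the text
fs_hz = 1000.0            # RS485 sampling frequency
duration_s = 1.0          # length of each record

f_mech_hz = max_speed_rpm / 60.0            # mechanical rotation frequency, about 33.33 Hz
f_elec_hz = pole_pairs * f_mech_hz          # electrical frequency, about 133.33 Hz
nyquist_ok = fs_hz > 2.0 * f_elec_hz        # 1000 Hz must exceed about 266.67 Hz
samples_per_signal = int(fs_hz * duration_s)

# Dataset size stated in the text
n_train = 50 * 20         # 50 operating conditions x 20 repetitions

print(f"f_elec = {f_elec_hz:.2f} Hz, Nyquist satisfied: {nyquist_ok}, "
      f"{samples_per_signal} samples per signal, {n_train} training datasets")
```

With five signals per record, each dataset holds 5 x 1000 = 5000 raw samples before feature extraction.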

Data Processing
Initially, we obtained the five physical signals (current, voltage, speed, power, and torque) from the motor drive through the RS485 module. We then converted each of these five time series into its root mean square (RMS) value:
$$X_{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} X_i^2}, \qquad N = 1000,$$
where $X$ denotes any of the five physical signals, $X_i$ is the $i$th sample, and $X_{RMS}$ is the RMS of each set of 1000 samples. Through this feature transformation, the original 5000 data points of each dataset were compressed into five values, reducing the data volume. These values were then normalized by dividing by the maximum current, maximum voltage, maximum speed, maximum power, and maximum torque, respectively:
$$X_{NOR} = \frac{X_{RMS}}{X_{MAX}},$$
where $X_{MAX}$ is the maximum value among all $X_{RMS}$ of the same signal, and $X_{NOR}$ is the normalized $X_{RMS}$. Normalization removes the dependence on physical units and improves the convergence speed and accuracy of the unsupervised learning model [21].
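The RMS and max-normalization steps can be sketched in a few lines of NumPy. The helper names are hypothetical, and computing $X_{MAX}$ once (for example on the training set) and reusing it for test data is an assumption; the text does not say which data the maxima are taken from.

```python
import numpy as np

def rms_features(records):
    """Convert raw records of shape (n_records, n_signals, n_samples) to RMS values
    of shape (n_records, n_signals): X_RMS = sqrt(mean(X_i^2))."""
    records = np.asarray(records, dtype=float)
    return np.sqrt(np.mean(records ** 2, axis=-1))

def max_normalize(x_rms, x_max=None):
    """Divide each signal column by its maximum RMS value: X_NOR = X_RMS / X_MAX.
    If x_max is None it is computed from x_rms and returned, so the same scaling
    can be reused for test data (assumption, see lead-in)."""
    x_rms = np.asarray(x_rms, dtype=float)
    if x_max is None:
        x_max = x_rms.max(axis=0)
    return x_rms / x_max, x_max

# Example: 3 hypothetical records, 5 signals, 1000 samples each (1 s at 1000 Hz)
t = np.arange(1000) / 1000.0
base = np.sqrt(2.0) * np.sin(2 * np.pi * 50 * t)   # a sine whose RMS is 1
records = np.stack([np.stack([k * base for k in range(1, 6)]) * (r + 1) for r in range(3)])
x_rms = rms_features(records)          # shape (3, 5)
x_nor, x_max = max_normalize(x_rms)    # every column now peaks at 1.0
```

Each record of 5 x 1000 samples becomes a single 5-element feature vector, which is what the autoencoder receives as input.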

PMSM Demagnetization State Detection
This section first introduces the training of the autoencoder, then describes how the normal surface is fitted, and finally presents the anomaly detection procedure in detail.

Autoencoder Training
We started by selecting an unsupervised learning model. Autoencoders require no prior labeling, extract features effectively, and filter noise [22]. Therefore, we used an autoencoder as an unsupervised dimensionality reduction model. An autoencoder is a type of artificial neural network; unlike a conventional feedforward network, its output layer has the same number of nodes as its input layer, because its purpose is to reconstruct its inputs rather than to predict target values from them. No labeling is therefore required before learning, and the autoencoder is regarded as an unsupervised learning model. Its architecture has two main parts: the first is an encoder and the second is a decoder. For simplicity, assume a neural network with a single hidden layer. The encoder maps the input vector $s \in \mathbb{R}^A$ to the hidden representation $h \in \mathbb{R}^B$ as follows:
$$h = f(W_1 s + b_1),$$
where $f(\cdot)$ is a nonlinear activation function, $W_1 \in \mathbb{R}^{B \times A}$ is a weight matrix, and $b_1 \in \mathbb{R}^B$ is a bias vector. The decoder tries to reconstruct the input using the following expression:
$$\hat{s} = f(W_2 h + b_2),$$
where $W_2 \in \mathbb{R}^{A \times B}$ is the decoder weight matrix and $b_2 \in \mathbb{R}^A$ is the decoder bias vector. Substituting the encoder expression into the decoder expression gives the complete autoencoder:
$$\hat{s} = f\big(W_2 f(W_1 s + b_1) + b_2\big).$$
The autoencoder was trained on a training dataset composed only of normal data by minimizing the mean squared error (MSE) loss function:
$$\min_{p} \; \frac{1}{N}\sum_{n=1}^{N} \lVert s_n - \hat{s}_n \rVert^2,$$
where $N$ is the number of training samples and $p$ contains all the parameters of the autoencoder, i.e., the elements of $W_1$, $W_2$, $b_1$, and $b_2$.
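The encoder and decoder expressions map directly onto matrix operations. The NumPy sketch below assumes tanh for $f$ and randomly initialized (untrained) parameters; it only illustrates the shapes of $W_1$, $b_1$, $W_2$, $b_2$ and evaluates the MSE loss on a hypothetical batch.

```python
import numpy as np

A, B = 5, 3  # input size (five normalized RMS signals) and feature size (3-D vector h)
rng = np.random.default_rng(0)

# Parameters p = {W1, b1, W2, b2}; random initialization is illustrative only.
W1 = rng.normal(scale=0.5, size=(B, A))  # encoder weights, shape B x A
b1 = np.zeros(B)                         # encoder bias, length B
W2 = rng.normal(scale=0.5, size=(A, B))  # decoder weights, shape A x B
b2 = np.zeros(A)                         # decoder bias, length A

def f(x):
    # Nonlinear activation; tanh matches the activation reported for training.
    return np.tanh(x)

def encode(S):
    """h = f(W1 s + b1), applied row-wise to a batch S of shape (N, A)."""
    return f(S @ W1.T + b1)

def decode(H):
    """s_hat = f(W2 h + b2), applied row-wise to a batch H of shape (N, B)."""
    return f(H @ W2.T + b2)

def mse_loss(S):
    """(1/N) * sum_n ||s_n - s_hat_n||^2 over the batch."""
    S_hat = decode(encode(S))
    return float(np.mean(np.sum((S - S_hat) ** 2, axis=1)))

S = rng.uniform(0.0, 1.0, size=(8, A))  # hypothetical normalized inputs in [0, 1]
H = encode(S)                           # shape (8, 3): the compressed features
loss = mse_loss(S)
```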
The autoencoder presented above has only two layers, one for the encoder and one for the decoder; adding layers to each part yields a deep autoencoder. In general, the feature space $\mathbb{R}^B$ has lower dimensionality than the input space $\mathbb{R}^A$, so the feature vector $h$ can be regarded as a compressed representation of the input $s$. Hence, the encoder performs dimensionality reduction. In this study, training continued until the loss converged. We then saved the encoder part of the trained autoencoder and used it to extract the feature vector $h$ from the training dataset for the next step. The feature vector $h$ was set to three dimensions so that the results could be visualized.
In the training process, the autoencoder was trained on the normalized training dataset, which contains normal data only. The parameter settings of the proposed autoencoder are shown in Table 3. The autoencoder was trained using the Adam algorithm [23] with a learning rate of 0.001 for 100 epochs. The hyperbolic tangent was selected as the activation function, and the mean squared error was chosen as the loss function. As shown in Figure 5a, the model loss had converged after 100 training epochs, meeting the criterion for completing model training; the three-dimensional vectors of the training dataset are displayed in Figure 5b.
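The training settings above can be sketched in plain NumPy with hand-written backpropagation and the standard Adam update (learning rate 0.001, 100 epochs, tanh). This is a minimal sketch, not the authors' implementation: it assumes a single hidden layer of size 3, tanh in both encoder and decoder, a mini-batch size of 32 (hypothetical, since Table 3 is not reproduced here), and a synthetic stand-in for the normalized training data.

```python
import numpy as np

def train_autoencoder(S, B=3, lr=1e-3, epochs=100, batch_size=32, seed=0):
    """Train a one-hidden-layer tanh autoencoder on the rows of S (N x A) with Adam.

    Returns the learned parameters and the full-dataset MSE after each epoch."""
    rng = np.random.default_rng(seed)
    N, A = S.shape
    params = {"W1": rng.normal(scale=0.5, size=(B, A)), "b1": np.zeros(B),
              "W2": rng.normal(scale=0.5, size=(A, B)), "b2": np.zeros(A)}
    m = {k: np.zeros_like(p) for k, p in params.items()}  # Adam first-moment estimates
    v = {k: np.zeros_like(p) for k, p in params.items()}  # Adam second-moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8                  # standard Adam defaults
    t = 0
    history = []
    for _ in range(epochs):
        order = rng.permutation(N)
        for start in range(0, N, batch_size):
            s = S[order[start:start + batch_size]]
            n = len(s)
            # Forward pass: h = tanh(W1 s + b1), s_hat = tanh(W2 h + b2)
            h = np.tanh(s @ params["W1"].T + params["b1"])
            s_hat = np.tanh(h @ params["W2"].T + params["b2"])
            # Backward pass for L = (1/n) * sum ||s_hat - s||^2
            d_z2 = 2.0 * (s_hat - s) / n * (1.0 - s_hat ** 2)
            d_z1 = (d_z2 @ params["W2"]) * (1.0 - h ** 2)
            grads = {"W2": d_z2.T @ h, "b2": d_z2.sum(axis=0),
                     "W1": d_z1.T @ s, "b1": d_z1.sum(axis=0)}
            # Adam update with bias correction
            t += 1
            for k in params:
                m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
                v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
                m_hat = m[k] / (1 - beta1 ** t)
                v_hat = v[k] / (1 - beta2 ** t)
                params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
        # Record the full-dataset reconstruction loss after each epoch
        H = np.tanh(S @ params["W1"].T + params["b1"])
        S_hat = np.tanh(H @ params["W2"].T + params["b2"])
        history.append(float(np.mean(np.sum((S - S_hat) ** 2, axis=1))))
    return params, history

# Hypothetical stand-in for the 1000 normalized training vectors (5 signals each)
rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, size=(1000, 2))
S_train = np.column_stack([u[:, 0], u[:, 1], 0.5 * u[:, 0] + 0.3 * u[:, 1],
                           0.2 + 0.6 * u[:, 0], 0.1 + 0.4 * u[:, 1]])
params, history = train_autoencoder(S_train)

def encoder(S):
    """Saved encoder part: maps normalized 5-D inputs to 3-D features h."""
    return np.tanh(S @ params["W1"].T + params["b1"])

H_train = encoder(S_train)  # (1000, 3) vectors used for surface fitting
```

Plotting `history` gives a loss curve analogous to Figure 5a; only the encoder half is kept after training, as described above.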

Surface Fitting
Initially, we obtained the three-dimensional vectors $h$ from the training dataset. Because the training dataset consisted only of normal data, the surface fitted to these vectors is called the "normal surface". The normal surface was expressed as:
$$z = p_0 + p_1 x + p_2 y,$$
where $x$, $y$, and $z$ are the three-dimensional coordinates of the space $\mathbb{R}^B$, and $p_0$, $p_1$, and $p_2$ are the parameters of the normal surface. The normal surface was fitted using the linear least squares method, which minimizes the sum of squared residuals [24]:
$$\min_{p_0,\,p_1,\,p_2} \; \sum_{i=1}^{N} \big(z_i - (p_0 + p_1 x_i + p_2 y_i)\big)^2,$$
where $x_i$, $y_i$, and $z_i$ are the components of the $i$th three-dimensional vector $h$. In this study, the fit was considered complete only when R-squared exceeded 0.9. The normal surface was then saved for anomaly detection.
In the experiment, we fitted the normal surface to the three-dimensional vectors of the training dataset and calculated statistical metrics to evaluate the fit. Figure 6 plots the normal surface. Table 4 lists the parameters $p_0$, $p_1$, and $p_2$ together with common goodness-of-fit metrics: the sum of squared errors (SSE), root mean squared error (RMSE), and R-squared. SSE and RMSE were close to 0, and R-squared was greater than 0.9. Hence, the surface fitting met the completion criterion, and we saved the trained encoder and the normal surface for anomaly detection.
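A minimal sketch of the least-squares fit and the Table 4 metrics, assuming the normal surface is the plane $z = p_0 + p_1 x + p_2 y$ and computing RMSE as $\sqrt{SSE/N}$ (some fitting tools divide by the residual degrees of freedom instead). The data are synthetic, not the paper's.

```python
import numpy as np

def fit_normal_plane(H):
    """Least-squares fit of z = p0 + p1*x + p2*y to 3-D feature vectors H (N x 3).

    Returns the parameters (p0, p1, p2) and a dict of goodness-of-fit metrics."""
    x, y, z = H[:, 0], H[:, 1], H[:, 2]
    X = np.column_stack([np.ones_like(x), x, y])   # design matrix [1, x, y]
    p, *_ = np.linalg.lstsq(X, z, rcond=None)      # minimizes the sum of squared residuals
    residuals = z - X @ p
    sse = float(np.sum(residuals ** 2))
    rmse = float(np.sqrt(sse / len(z)))            # assumption: RMSE = sqrt(SSE / N)
    sst = float(np.sum((z - z.mean()) ** 2))
    r2 = 1.0 - sse / sst
    return p, {"SSE": sse, "RMSE": rmse, "R2": r2}

# Synthetic feature vectors lying close to the plane z = 0.1 + 0.2x - 0.3y
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
y = rng.uniform(-1.0, 1.0, 500)
z = 0.1 + 0.2 * x - 0.3 * y + rng.normal(0.0, 0.01, 500)
p, metrics = fit_normal_plane(np.column_stack([x, y, z]))
accepted = metrics["R2"] > 0.9   # the completion criterion used in the text
```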
Energies 2020, 13, x FOR PEER REVIEW 7 of 13

Anomaly Detection
We started by loading the encoder trained during the autoencoder training stage. Next, we used the preprocessed test data as the encoder input. Finally, the encoder was run to obtain the three-dimensional vectors $h$ of the test data.
Next, we loaded the normal surface fitted in the surface fitting stage and computed the distance from each test vector $h$ to the normal surface using the following equation:
$$d = \frac{\lvert p_0 + p_1 x_i + p_2 y_i - z_i \rvert}{\sqrt{p_1^2 + p_2^2 + 1}},$$
where $x_i$, $y_i$, and $z_i$ are the components of the three-dimensional vector $h$; $d$ is the distance from $h$ to the normal surface; and $p_0$, $p_1$, and $p_2$ are the parameters of the normal surface. A judgement equation was then constructed to determine whether the data were abnormal:
$$\text{state} = \begin{cases} 0, & d \le \epsilon \\ 1, & d > \epsilon \end{cases}$$
where ϵ is the threshold and $d$ is the distance from $h$ to the normal surface. If $d$ is less than or equal to ϵ, the motor state is 0, meaning that the motor is in a normal state; if $d$ is greater than ϵ, the motor state is 1, meaning that the motor is in an abnormal state. The distance between $h$ and the normal surface thus determines whether demagnetization has occurred. Determining the demagnetization level requires the next stage, clustering.
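The distance and judgement equations can be sketched as follows, assuming the normal surface is the plane $z = p_0 + p_1 x + p_2 y$, so that $d$ is the perpendicular point-to-plane distance. The helper `threshold_from_training` is hypothetical: it sets ϵ to the largest distance observed on normal training data, in the spirit of the maximum-residual rule used in the experiment.

```python
import numpy as np

def distance_to_plane(H, p):
    """Perpendicular distance from each row of H (N x 3) to z = p0 + p1*x + p2*y."""
    p0, p1, p2 = p
    x, y, z = H[:, 0], H[:, 1], H[:, 2]
    return np.abs(p0 + p1 * x + p2 * y - z) / np.sqrt(p1 ** 2 + p2 ** 2 + 1.0)

def motor_state(d, eps):
    """Judgement rule: 0 (normal) if d <= eps, 1 (abnormal) if d > eps."""
    return (np.asarray(d) > eps).astype(int)

def threshold_from_training(H_train, p):
    """Hypothetical helper: eps = largest distance seen on normal training data."""
    return float(distance_to_plane(H_train, p).max())

# Example with a hypothetical plane z = x and the threshold reported in the text
p = (0.0, 1.0, 0.0)
H_test = np.array([[0.0, 0.0, 0.0005],   # close to the plane
                   [0.0, 0.0, 0.01]])    # far from the plane
d = distance_to_plane(H_test, p)
states = motor_state(d, eps=0.00102)     # -> one normal, one abnormal
```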
In the experiment, we used the preprocessed test data as input, loaded the trained encoder, and obtained the three-dimensional vectors of the test data, shown in Figure 7.
Figure 7. Three-dimensional vectors of test data. The distribution of three test PMSMs: blue circles represent data for the normal PMSM, green triangles represent data for the PMSM with 10% demagnetization fault, and red squares represent data for the PMSM with 25% demagnetization fault.
We also plotted the test points against the normal surface, as shown in Figure 8a. Since the normal surface was obtained by minimizing the sum of squared residuals of the training data, the maximum residual served as the threshold for deciding whether demagnetization had occurred. In this study, the threshold ϵ was set to 0.00102 to distinguish between normal and abnormal data. Figure 8b shows the distance from each test point to the normal surface together with the threshold: blue bars represent distances for the normal PMSM, green bars for the PMSM with 10% demagnetization fault, red bars for the PMSM with 25% demagnetization fault, and the black horizontal line represents the threshold. The figure also shows that the more severely the PMSM was demagnetized, the farther its points lay from the normal surface. We therefore used the threshold to distinguish between normal and abnormal data. The confusion matrix of anomaly detection is displayed in Table 5. Of the 600 test data, 576 predictions were correct and 24 were errors. Table 6 shows the accuracy of anomaly detection: 94.5% for normal data and 96.8% for abnormal data. The overall accuracy of anomaly detection was 96%.

PMSM Demagnetization Level Clustering
In this study, we chose K-means clustering as the unsupervised clustering model. This algorithm requires no prior labeling, is relatively simple to implement, scales to large datasets, and is guaranteed to converge (to a local optimum) [25]. K-means is an iterative algorithm that partitions the dataset into K distinct, non-overlapping clusters, with each data point belonging to exactly one cluster. It makes the data points within each cluster as similar as possible while keeping the clusters as different as possible, by assigning data points to clusters so that the sum of squared distances between the data points and their cluster centroid is minimized. The less variation there is within clusters, the more homogeneous the data points in each cluster.
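The assignment and update steps of K-means (Lloyd's algorithm) can be sketched in NumPy. This is a generic illustration with random initial centroids and hypothetical two-group data, not the configuration used in the study; once trained, the model classifies a new feature vector by its nearest centroid.

```python
import numpy as np

def nearest(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with random initial centroids.

    Returns (labels, centroids, sse), where sse is the within-cluster sum of
    squared distances that K-means tries to minimize."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = nearest(X, centroids)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = nearest(X, centroids)
    sse = float(np.sum((X - centroids[labels]) ** 2))
    return labels, centroids, sse

# Hypothetical 3-D feature vectors forming two well-separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 3)),
               rng.normal(5.0, 0.3, size=(20, 3))])
labels, centroids, sse = kmeans(X, k=2)
new_label = nearest(np.array([[5.1, 5.0, 4.9]]), centroids)[0]  # classify a new vector
```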
First, we took the three-dimensional vectors $h$ of the test data that anomaly detection had classified as abnormal and used them as the input of the K-means model. The model then determined whether the demagnetization fault of the PMSM was mild or severe. If the fault was mild, the diagnosis system continued to run; if the fault was severe, the diagnosis system sent a warning signal.
Choosing the number of clusters in K-means clustering is difficult. A common remedy is the elbow method, proposed by Robert L. Thorndike in 1953. It uses the within-cluster sum of squared errors (SSE) to measure clustering quality. As the number of clusters increases, the within-cluster SSE keeps decreasing, but beyond a certain point additional clusters no longer improve the clustering substantially. The point where the SSE curve bends, the "elbow", is taken as the optimal number of clusters.
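The elbow procedure amounts to running K-means for k = 1, 2, ... and recording the within-cluster SSE. The sketch below includes a compact K-means so that it runs on its own. The paper picks the elbow by inspecting the curve; `pick_elbow`, which chooses the k with the largest second difference of the SSE curve, is a hypothetical automated stand-in.

```python
import numpy as np

def kmeans_sse(X, k, n_iter=100, seed=0):
    """Within-cluster SSE after a basic Lloyd's-algorithm run with k clusters."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
        c = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else c[j]
                      for j in range(k)])
    labels = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(axis=1)
    return float(np.sum((X - c[labels]) ** 2))

def elbow_curve(X, k_max=6):
    """SSE for k = 1..k_max (the curve plotted in an elbow diagram)."""
    return [kmeans_sse(X, k) for k in range(1, k_max + 1)]

def pick_elbow(sse):
    """Hypothetical heuristic: k with the largest second difference of the SSE curve."""
    sse = np.asarray(sse)
    second_diff = sse[:-2] - 2 * sse[1:-1] + sse[2:]   # entries for k = 2..k_max-1
    return int(np.argmax(second_diff)) + 2

# Hypothetical abnormal-data features forming two groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 3)),
               rng.normal(5.0, 0.3, size=(20, 3))])
curve = elbow_curve(X)
k_best = pick_elbow(curve)
```

As in the experiment, the SSE falls steeply from one to two clusters and only slowly afterwards, so the curve bends at k = 2.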
Applying the elbow method to the abnormal test data gave the results shown in Figure 9, where the horizontal axis is the number of clusters and the vertical axis is the SSE. The SSE drops most sharply from one to two clusters and then declines slowly. Therefore, two clusters was taken as the elbow and used as the number of clusters in K-means clustering.
In the experiment, we obtained the three-dimensional vectors of the test data considered abnormal in anomaly detection, as shown in Figure 10a, and clustered them with K-means. Figure 10b displays the result: the abnormal data were divided into two clusters, the first composed mainly of the PMSM with 10% demagnetization fault and the second composed mainly of the PMSM with 25% demagnetization fault. The confusion matrix of clustering is displayed in Table 7. Of the 398 test data considered abnormal, 386 predictions were correct and 12 were errors. Table 8 shows the clustering accuracy: 99.5% for the PMSM with 10% demagnetization fault and 100% for the PMSM with 25% demagnetization fault. The overall clustering accuracy was 97%.
Figure 10. Clustering: (a) data considered abnormal in anomaly detection and (b) results of clustering.

Experimental Results
Combining anomaly detection and clustering, the confusion matrix of the total test data is summarized in Table 9. Of the 600 test data, 575 predictions were correct and 25 were errors. Table 10 shows the per-class accuracy on the test dataset: 94.5% for the normal PMSM, 93% for the PMSM with 10% demagnetization fault, and 100% for the PMSM with 25% demagnetization fault. The overall accuracy of the proposed method was 96%. These results show that the proposed unsupervised demagnetization fault diagnosis system can accurately diagnose PMSM demagnetization faults on a real experimental test bench.
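The reported stage-level counts can be cross-checked with a few lines of arithmetic. The per-class counts below are not given in the text; they are inferred under the assumption that each of the three motors contributed 200 of the 600 test datasets, and Tables 9 and 10 remain the authoritative figures.

```python
# Stage-level totals as reported in the text
anomaly_acc = 576 / 600   # anomaly detection (Tables 5 and 6)
cluster_acc = 386 / 398   # clustering of data flagged abnormal (Tables 7 and 8)
overall_acc = 575 / 600   # complete diagnosis pipeline (Tables 9 and 10)

# Assumption: 200 test datasets per motor (600 / 3); counts inferred, not reported
per_class_acc = {"normal": 0.945, "10% demagnetization": 0.93, "25% demagnetization": 1.0}
per_class_correct = {name: round(acc * 200) for name, acc in per_class_acc.items()}
total_correct = sum(per_class_correct.values())   # compare with the 575 reported above

for name, value in [("anomaly detection", anomaly_acc), ("clustering", cluster_acc),
                    ("overall", overall_acc)]:
    print(f"{name}: {value:.1%}")
print(per_class_correct, total_correct)
```

Under this assumption the inferred per-class counts sum to the reported 575 correct predictions; note that 575/600 is 95.8%, which the text rounds to 96%.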

Conclusions
In this paper, an unsupervised demagnetization fault diagnosis method for PMSMs was proposed. Unlike recent research, which often relies on supervised learning, the unsupervised approach simplifies label engineering in the training process. To reduce costs, five physical signals were captured directly from the motor driver as training and test data, rather than only the stator current signal used in [4]. Using multiple physical signals simultaneously reduced the influence of data noise compared with using a single physical signal.
In this study, PMSM demagnetization fault diagnosis was performed for three states: normal, 10% demagnetization fault, and 25% demagnetization fault. A total of 1000 training datasets and 600 test datasets were used. The experimental results showed that the accuracy of anomaly detection was 96%, the accuracy of clustering was 97%, and the accuracy of the total diagnosis system was 96%. These results confirm that the proposed method is feasible for PMSM demagnetization fault diagnosis.