Design of a Chamfering Tool Diagnosis System Using Autoencoder Learning Method

Abstract: In this paper, an autoencoder learning method is proposed for the diagnosis of chamfering tool equipment. The autoencoder uses an unsupervised learning architecture, and its training dataset requires only positive samples, which is quite suitable for industrial production lines. An abnormal tool can be diagnosed by comparing the output and input of the autoencoder neural network, and an adjustable threshold can effectively improve accuracy. The method adapts well to the working environment even when the data contain multiple signals. In the experimental setup, the main diagnostic signal is the motor current, which reflects the torque change when the tool is abnormal. A four-step conversion is developed to process the current signal: (1) current-to-voltage conversion, (2) analog-to-digital conversion, (3) downsampling, and (4) discrete Fourier transform. The dataset is used to find the best autoencoder parameters by grid search. In the training results, the testing accuracy, true positive rate, and precision are 87.5%, 83.33%, and 90.91%, respectively. The best autoencoder model is evaluated by online testing, i.e., loading the diagnosis model in the production line and evaluating it there. It is shown that the proposed method can effectively detect abnormal conditions. With the original threshold, the online accuracy, true positive rate, and precision are 75%, 90%, and 69.23%, respectively. After adjusting the threshold, the accuracy rises to 90%, while the true positive rate and precision reach 80% and 100%, respectively.


Introduction
Machine learning has become increasingly mature with the advancement of technology. Many traditional industries have been transformed into intelligent factories to optimize yield and profit. Machining processes such as turning, milling, and planing generate data that are usually recorded in the time domain, and the machining error is affected by the status of the tool.
The most common time-domain classification technique is to use algorithms with recurrent characteristics. For example, Wenpeng Yin analyzed datasets with different characteristics using the recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU) [1]. Tasks including emotion classification, sentence content classification, part-of-speech classification, and path selection were compared across the algorithms above and convolutional neural networks (CNN). Due to the improvement in computer performance in recent years, time-series models can perform more complex operations. For example, H. Dinkel integrated complex convolutions with other algorithms, such as long short-term memory and fully connected deep neural networks [2]. The parameters above are determined in the grid search. The early stopping method is used to save the model when the testing accuracy reaches its highest point [22,23]; this method avoids overfitting problems.
The goal of this paper is to improve the yield and cost of the chamfering task, so diagnosing the tool state is the most direct approach. As shown in the literature above, machine learning is more accurate and faster than an expert system, so machine learning is adopted in this paper. In addition, the material is filtered [24,25] to unify the material size before processing. Although the application is similar to [17,18], the unobvious features of the dataset and the unpredictable noise of the environment make high performance difficult to achieve. Furthermore, this paper aims to use a low-cost controller for edge computing, and such a controller is less suitable for complex models. Therefore, the motor current is used for diagnosis in this paper.
This article proceeds with the following sections: the second section describes the working environment of the actual production line and how the dataset is designed; the third section describes the algorithm used; the fourth section states the details of the experiment; the fifth section evaluates and analyzes data from the actual production line; the last section draws conclusions from the above results and outlines future work.

System Structure
The architecture diagram is shown in Figure 1. The master controller, a personal computer (PC), was used to control the 3-dimensional servo system; it sent control commands to the devices through the controller area network (CAN) bus via the PC-CAN interface. The chamfering process is described in Figure 2. Due to the height difference of the chopstick tubes, the Z-axis stroke of the 3D servo table needed to be adjusted. The phase current of the chamfering motor changed when the material was scraped, so a microcontroller unit (MCU), the Renesas RX231, was used to sample the current. When the motor current varied, the MCU notified the master to control the drill stroke and triggered the AI unit to sample the motor current.

The chamfering tool was diagnosed by the AI model based on the current samples taken during the drilling stroke. The AI unit is an arithmetic unit developed by Renesas; it is composed of a higher-performance MCU. Both the AI unit and the MCU sit behind the processing platform. The processing platform, shown in Figure 3, is one of the stations in the overall product processing system. Models trained by Tensorflow can be loaded onto the AI unit. This approach is relatively low in cost and can perform edge operations to improve overall system performance.


Dataset Collection
Data collection is always an important part of machine learning. In this paper, the microcontroller unit (MCU) detected the U-phase current of the chamfering motor through a current sensor. Only one phase current of the three-phase motor was sampled, because the three phase signals are essentially the same. The current is translated to a voltage as follows, where V_IOUT, V_CC, and I_p are the output voltage, the logic power voltage, and the source current, respectively. V_CC was set to 5 V in this paper. These relations of the signal are given in [26]. In order to capture the complete chamfering process, the diagnostic task was started when the current rose slightly.
The definition of the sample label is more rigorous at the training stage. Figure 4a shows a normal chamfering tool, and Figure 4b shows an abnormal one. Although both can complete the processing, the training phase needed to find the threshold, so the definition was stricter. After that, the threshold was adjusted to an appropriate value based on the application during the machine evaluation. The data were originally collected at a sampling rate of 20 kHz by the analog-to-digital converter (ADC) of the MCU.
The conversion result and the recovered voltage value are as follows, where D_output is the digital output after the ADC converts V_IOUT, V_ref is the reference voltage, and N is the number of bits of the ADC. V_ref and N were set to 5 and 12, respectively, as shown in [27], giving Equations (4) and (6). In order to reduce the complexity of the model, the dataset was downsampled before training. When the AI unit was triggered, the signal was sampled at a rate of 2 kHz for 768 ms, yielding 1536 sampling points. The samples were then padded to 2048 points with the average value. Finally, 1024 points remained after the DFT, and they were saved for training and testing. The DFT is given as follows, where X_fd is the frequency-domain data and N is the data length. In fact, the signal characteristics of the two classes are very similar. The time-domain raw data are shown in Figure 5: Figure 5a is the motor current for a normal tool, and Figure 5b is that for an abnormal tool. The spectra are shown in Figure 6: Figure 6a shows the normal chamfering tool, and Figure 6b shows the abnormal one. Finding key features manually is time-consuming, so machine learning was used to solve the problem in this paper.
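The four-step conversion above can be sketched as follows. This is a minimal NumPy sketch under the buffer sizes stated in the text (20 kHz raw sampling, 2 kHz downsampled, 1536 points padded to 2048, 1024 spectrum bins); the decimation-by-10 downsampling and all function names are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

FS_RAW = 20_000      # raw ADC sampling rate (Hz)
FS_DOWN = 2_000      # downsampled rate (Hz)
N_FFT = 2_048        # padded length before the DFT

def preprocess(raw_counts, v_ref=5.0, n_bits=12):
    """Sketch of the four-step conversion described above."""
    # (2) ADC counts back to volts: V = D * V_ref / 2^N
    volts = np.asarray(raw_counts, dtype=float) * v_ref / (2 ** n_bits)
    # (3) downsample 20 kHz -> 2 kHz by keeping every 10th point
    x = volts[:: FS_RAW // FS_DOWN]          # 768 ms -> 1536 points
    # pad ("stuff") to 2048 points with the average value
    pad = np.full(N_FFT - len(x), x.mean())
    x = np.concatenate([x, pad])
    # (4) DFT; the input is real, so the spectrum is Hermitian and
    # only the first N/2 = 1024 bins carry information
    return np.abs(np.fft.fft(x))[: N_FFT // 2]
```

The returned 1024-point magnitude spectrum is what would be stored for training and testing.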

Methodology
The diagnostic methods include three types: (1) statistical methods, (2) neighbor-based methods, and (3) dimensionality-reduction-based methods. The autoencoder belongs to the last type and is implemented here for reconstruction-error detection.
The autoencoder (AE) is also a kind of artificial neural network, as shown in Figure 7, and it is given as follows, where x_pre is a neuron of the first layer, K is the number of neurons, w is the weight of a neuron, bias is an offset, and x_out is the output of the neural network. The AE includes an input layer, hidden layers, and an output layer, as in Figure 8. The AE scheme can be divided into two parts, an encoder and a decoder. The number of neurons decreases successively through the hidden layers of the encoder block: feature extraction is performed on the input data, and the most important information is kept as the layer widths shrink. After that, the output layer is reconstructed by the decoder, in which the number of neurons per hidden layer successively increases, restoring the information from the hidden representation. During training, the objective function is the following, where x_out, x_in, and N are the output of the AE, the input of the AE, and the number of inputs, respectively. The characteristics of the AE include data compression and reconstruction.
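The neuron equation and the reconstruction objective above can be sketched directly in NumPy. This is an illustrative forward pass only, assuming a tanh activation for the hidden layer and a linear output layer; the paper's actual activations and layer sizes are determined by the grid search described later.

```python
import numpy as np

def layer(x_pre, w, bias, act=np.tanh):
    # one neuron layer: x_out = f( sum_k w_k * x_pre_k + bias )
    return act(x_pre @ w + bias)

def ae_forward(x_in, w_enc, b_enc, w_dec, b_dec):
    # encoder compresses the input, decoder reconstructs it (linear output)
    h = layer(x_in, w_enc, b_enc)      # fewer neurons: feature extraction
    return h @ w_dec + b_dec           # back to the input dimension

def objective(x_out, x_in):
    # mean squared reconstruction error over the N inputs
    return np.mean((x_out - x_in) ** 2)
```

Training would minimize `objective` over the weights; a perfectly reconstructed input yields zero error.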
In more complex applications, the AE is used to extract the important parts from a large amount of input information via its data-compression feature, and the extracted data are then sent to another classifier. This method can effectively reduce the complexity of classification. The reconstruction error is calculated from the difference between the original input signal and the decompressed output signal; this error can then be used to detect defects. The model can also be trained with positive samples only, so the method is well suited to abnormality diagnosis when abnormal data are scarce. Its detection performance is good and the model complexity is low, so this model is used in many applications of yield detection and device detection.

Cross-Validation
To develop and evaluate the prediction model, three CV techniques, namely K-fold CV, leave-one-out CV, and independent-dataset testing, are frequently used [28][29][30][31]. To reduce the influence of dataset noise, we attempted K-fold CV with K = 4, so that the hyperparameters found by the grid search would not depend on a single split of the dataset. However, the grid search contained a large number of combinations and the hardware performance was limited, so the experiment would have taken a long time to compute. We finally chose to skip the CV techniques.
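For reference, the attempted K-fold split (K = 4) amounts to partitioning the shuffled sample indices into four folds and rotating the validation fold. This is a generic sketch of the standard procedure, not the paper's code; the seed and helper name are illustrative.

```python
import numpy as np

def kfold_splits(n_samples, k=4, seed=0):
    """Yield (train, validation) index arrays for K-fold CV (K = 4 here)."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        # every fold serves as the validation set exactly once
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]
```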

Evaluation Metrics
If there is noise in the training and validation sets, the threshold is set too large, and a larger threshold lets abnormal tools go undetected during diagnosis. This paper therefore traverses the values of the training and validation sets to find the best threshold during training. Although this method reduces the precision (PRE), defined as follows, where true positive (TP) means an abnormal tool correctly identified as abnormal and false positive (FP) means a normal tool incorrectly identified as abnormal, it improves the other metrics. The ACC and TPR are defined as follows, where true negative (TN) means a normal tool correctly identified as normal and false negative (FN) means an abnormal tool incorrectly identified as normal.
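The three metrics follow the standard confusion-matrix definitions, with "positive" meaning the abnormal tool. A minimal sketch; the example counts in the comment are one hypothetical confusion matrix that happens to be arithmetically consistent with the online figures (75%, 90%, 69.23%), not counts reported by the paper.

```python
def diagnosis_metrics(tp, fp, tn, fn):
    """ACC, TPR, and PRE from confusion counts; 'positive' = abnormal tool."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    tpr = tp / (tp + fn)                    # true positive rate (recall)
    pre = tp / (tp + fp)                    # precision
    return acc, tpr, pre

# e.g., a hypothetical matrix TP=9, FP=4, TN=6, FN=1 over 20 samples
# gives ACC = 0.75, TPR = 0.9, PRE = 9/13 ~= 0.6923
```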

Parameter Optimization
The parameter settings vary between applications, and manual adjustment is quite impractical. This paper uses a grid search to test the model under different combinations of parameters. The traversed AE parameters are as follows: the zoom-in and zoom-out sizes increase from 1.25 to 15 at intervals of 0.25; both denote the neuron-scaling ratio between the previous layer and the current layer. The number of hidden layers is one or three, and the reconstruction-error type is the maximum of absolute differences (MAD), the sum of absolute differences (SAD), or the sum of squared differences (SSD), defined as follows. In the training parameters, the learning rate is set to 0.001 and the number of epochs to 3000. There are a total of 342 combinations. The input size is 1024, as follows, where N is the number of time-domain data points, set to 2048 here; because the spectrum of a real signal is Hermitian, the maximum useful index K is half of N. After completing the grid search, several of the better model parameters are listed in Tables 1 and 2; these tables evaluate each combination and select the top three models. In Table 1, the maximum error value over the training and validation sets is used as the threshold. In Table 2, the appropriate value found by traversing the samples is used as the threshold. Obviously, the latter method can effectively improve the accuracy rate (ACC) and the true positive rate (TPR) while only sacrificing a little PRE. Figure 9 shows the training curve of the best AE model. The blue line is the testing accuracy, the green line the testing TPR, the red line the testing PRE, and the orange line the MAD convergence process. The figure shows that the model achieves its best accuracy at 1726 epochs, where the TPR and PRE are also at their best values. In this paper, the highest accuracy is used to save the best model before the model begins to overfit.
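The three error measures and the grid enumeration can be sketched as below. The ratio range follows the stated 1.25 to 15 in steps of 0.25; the exact parameter set producing the stated 342 combinations is not fully specified in the text, so this enumeration is illustrative rather than exact.

```python
import itertools
import numpy as np

# the three reconstruction-error measures compared by the grid search
def mad(x_out, x_in):
    return np.max(np.abs(x_out - x_in))     # maximum of absolute difference

def sad(x_out, x_in):
    return np.sum(np.abs(x_out - x_in))     # sum of absolute difference

def ssd(x_out, x_in):
    return np.sum((x_out - x_in) ** 2)      # sum of squared difference

# hypothetical enumeration of the stated grid: scaling ratios 1.25..15.0
# in steps of 0.25, one or three hidden layers, three error types
zoom_ratios = np.arange(1.25, 15.0 + 1e-9, 0.25)
grid = list(itertools.product(zoom_ratios, [1, 3], [mad, sad, ssd]))
```

Each grid entry would be trained for 3000 epochs at a learning rate of 0.001 and scored on the test set.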
In Figure 10, the MAD of the overall dataset is calculated based on the best model. As shown in Figure 10, the MAD in the abnormal-sample area is significantly increased, while there are still several samples with large MAD in the normal-sample area; these may be caused by power or mechanism errors in the production line.

Online Analysis
In this paper, the best model is loaded into the AI unit, and the model is evaluated with 10 normal samples and 10 abnormal samples. Figure 11 shows the calculated results, i.e., the MAD computed by the AE. The accuracy of the online evaluation is only 75%, the TPR is 90%, and the PRE is 69.23%. The figure shows a significant difference between the normal and the abnormal chamfering tools. Because the label definition was stricter during training, a chamfering tool is judged to be abnormal even when it is only slightly worn. The remaining errors may be caused by system noise.
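The online decision rule itself is simple: a sample is flagged abnormal when its MAD exceeds the threshold, and the threshold remains adjustable at deployment. The scores below are made-up illustrative values, not measured data; they only show how raising the threshold trades sensitivity for precision.

```python
import numpy as np

def diagnose(mad_scores, threshold):
    """Flag a sample as abnormal (True) when its reconstruction error
    (MAD) exceeds the threshold; the threshold is adjustable at deployment."""
    return (np.asarray(mad_scores, dtype=float) > threshold).tolist()

# hypothetical scores: raising the threshold trades TPR for precision
scores = [12.0, 18.0, 25.0, 33.0, 41.0]
strict = diagnose(scores, 20)    # [False, False, True, True, True]
relaxed = diagnose(scores, 30)   # [False, False, False, True, True]
```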

Figure 11. Online test result.

Discussion
This paper proposes an autoencoder and signal-processing scheme for a chamfering tool diagnosis system. The machine learning algorithm is implemented and deployed on an AI unit. In the online evaluation, the error tolerance is low because training was completed under the strict label definition. In fact, high processing precision is not required in this application, so the judgment threshold can be increased. For example, when the MAD threshold is set to 30, as shown in Figure 12, the number of errors drops to two: the accuracy rate increases to 90% and the PRE rises to 100%, while the TPR decreases to 80%. Adjustable detection makes the application flexible and suitable for our requirements, and illustrates that the AE can run on a low-cost controller. In the future, we can try to correct the position of the machine and the tubular material to reduce noise, or use multi-sensing technology to increase the input characteristics and improve the diagnostic accuracy. Beyond optimizing the approach, the method can be applied to other diagnoses, such as stamping press fixtures, rail drill tools, and lathe tools.
