A CNN Prediction Method for Belt Grinding Tool Wear in a Polishing Process Utilizing 3-Axes Force and Vibration Data

This paper presents a tool wear monitoring methodology for the abrasive belt grinding process that applies a convolutional neural network (CNN) to vibration and force signatures. A belt tool typically has randomly oriented abrasive grains and a grit size chosen for coarse or fine material removal. Degradation of the belt condition is a critical phenomenon that affects workpiece quality during grinding. This work focuses on identifying and studying the force and vibration signals, taken from sensors along an axis or a combination of axes, that carry important information about the contact conditions, i.e., belt wear. The three axes of the two sensors are aligned and labelled as the X-axis (parallel to the direction of tool motion during the abrasive process), the Y-axis (perpendicular to the direction of tool motion during the abrasive process) and the Z-axis (parallel to the direction of the tool during the retract movement). The grinding process was performed on a mild-steel workpiece using a customized abrasive belt grinder attached to a multi-axis robot. The vibration and force signals along the three axes (X, Y and Z) were acquired for four discrete, sequential belt wear conditions: brand-new, 5-min cycle time, 15-min cycle time and worn-out. The raw signals from the different sensor axes were used to train a 10-layer CNN architecture in a supervised manner to distinguish the belt wear states. All possible combinations of the three sensor axes (X, Y, Z, XY, XZ, YZ and XYZ) were fed as inputs to the CNN model to rank the axes (or combinations of axes) by how distinctly they represent the belt wear state. The CNN classification results revealed that the XZ-axes and YZ-axes combinations of the accelerometer provide more accurate predictions than the other combinations, indicating that the information from the Z-axis of the accelerometer is more significant than that from the other two axes.
In addition, for the dynamometer, the XY-axes combination yielded higher CNN accuracy than the other combinations.


Introduction
In manufacturing industries, the requirements for high-quality precision parts with complex geometries have been increasing rapidly [1][2][3]. A workpiece generally goes through various stages such as machining, finishing and polishing to attain the prescribed design specifications and tolerances [4]. Scientific and technological development, modern control theory, and advanced machining techniques using multi-axis robot arms have enabled automated finishing and polishing of curved surfaces with uniform quality.
The surface finish quality of a complex-geometry part in manufacturing processes such as grinding and polishing depends primarily on two variables: (1) the condition of the belt grinding tool and (2) the combination of operating parameters such as cutting speed, force, feed rate, polymer wheel hardness and grit size. Monitoring the condition of the belt grinding tool is necessary because undetected tool deterioration alters the material removal mechanism and ultimately degrades the workpiece's surface quality. The polishing process parameters are also important, as they must be adapted during manufacturing to conditions such as (1) the curvature of the workpiece, (2) changes in the abrasive tool wear condition, (3) the hardness of the workpiece, which is generally not fully uniform, and (4) the area of the workpiece being processed, e.g., the edges or the part center. This study focuses on predicting the belt grinding tool condition, from brand-new to worn-out, under a fixed set of manufacturing parameters.
Studies on tool condition monitoring in the manufacturing literature mainly concentrate on cutting tools and the milling process [5][6][7][8][9], while studies on belt grinding tool condition and its prediction are still limited [10]. Abrasive belt grinding is a modification of the traditional rigid grinding process; its advantage over traditional grinding lies in the ease with which it can uniformly machine intricate workpiece geometries [11,12]. The polymer backing wheel, on which the belt embedded with abrasive grains rests, enables the tool to conform to intricate surfaces [13]. Because of this compliant backing, the process is highly nonlinear and dependent on the process parameters [11,14]. The belt grinding tool consists of three primary parts: the driving mechanism, the polymer wheel and the belt itself. The driving mechanism generally consists of at least two wheels around which the belt rotates. The compliance depends primarily on the type and contour of the backing material, while the belt grit size and grain composition define its longevity and material removal characteristics. Apart from these three primary components, operating choices such as the applied force, depth of cut, feed direction, lubrication and feed angle also affect the process dynamics [15].
Of all the process parameters, only the belt degradation over the cycle time cannot be controlled. The belt degrades as the abrasive grains wear off or break down, resulting in a loss of tool performance [16]. Several approaches have been proposed for monitoring tool states in the grinding process using sensor data acquired during the process and state-of-the-art machine learning (ML) algorithms [17][18][19][20]. Beyond tool wear, a Deep Learning (DL) auto-encoder architecture has been used to perform pixel-level classification of weld seam/bead states [20]. Spectrograms computed from sound signals have been used with a DL method to classify the wear states of the abrasive belt [10]. A multi-sensor fusion of vision and sound has been used with a light gradient boosting machine (LightGBM) algorithm to monitor the in-process grinding material removal rate (MRR) [21,22].
Equipping manufacturing processes with intelligent methods opens possibilities for addressing the existing challenges. This study proposes an alternative strategy for monitoring belt states that uses a CNN with signals from an accelerometer and a force sensor as inputs. When ML and DL methods are applied to tool condition monitoring, the inputs representing the dynamic conditions between the abrasive grinding tool and the workpiece play an important role. Previous work has shown that vibration, force and torque variables in the polishing process can serve as indicators of the tool wear condition [17]. To date, most tool wear monitoring methods in abrasive processes such as grinding and polishing rely on manual inspection, typically conducted as an offline measurement. Offline measurement interrupts the grinding and polishing process, because the working coupon must be dismounted and re-mounted to its reference point, which disrupts the production line. This paper aims to open the possibility of using DL methods for belt tool condition prediction and their potential implementation in online monitoring systems.
CNN is a DL algorithm based on the convolution of sliding kernels [23]. CNNs are commonly used for classification and prediction because they can handle raw data with minimal pre-processing, avoiding the intermediate step of computing sparse representations such as time, frequency and time-frequency features. CNNs have been applied to 1-dimensional (1-D) signals, e.g., audio and vibration signals, and to 2-dimensional (2-D) data, e.g., images. More details of the CNN algorithm are given in Section 2. In manufacturing, and especially in tool condition monitoring, CNNs have been applied, for example, to end milling [7]. The authors' previous work [13] applied analysis of variance (ANOVA) combined with an adaptive neuro-fuzzy inference system (ANFIS) to model material removal. The combined ANOVA and ANFIS method was used to obtain the optimum configuration between the process parameters/variables of abrasive belt grinding, such as RPM, feed, force, rubber hardness and grit size, and the stock material removal rate. In the present work, the CNN method is applied to vibration and force signals from a three-axis accelerometer and a three-axis dynamometer, collected during the polishing process, to monitor belt degradation. A new belt grinding tool was used to polish the mild-steel workpiece under fixed manufacturing parameters until it was worn. The polishing was performed using a customized abrasive belt grinder attached to a multi-axis robot arm. The vibration and force signals were acquired for four discrete, sequential conditions of the belt grinding tool from new to worn (brand-new, 5-min cycle time, 15-min cycle time and worn-out) during the polishing of the mild-steel workpiece.
The '5-min cycle time' and '15-min cycle time' labels mean that the belt was prepared by polishing continuously for 5 and 15 minutes, respectively. This study also aims to correlate the vibration and force sensor axis directions with the belt grinding condition.
The paper is organized into 5 Sections. Section 1 briefly reviews belt degradation, its influence on surface quality, and real-time monitoring strategies. Section 2 gives a brief overview of the CNN architecture used in this work. Section 3 introduces the robotized belt grinding experimental setup, processing parameters and data acquisition setup. Section 4 presents and discusses the CNN classification results on the sensor data. Finally, Section 5 summarizes the findings of this investigation and outlines future work.

Convolutional Neural Network (CNN)
CNN is a feed-forward neural network inspired by the brain's visual cortex that specializes in processing data with a grid-like or sequential structure [24]. CNNs are specifically designed to handle input in the form of multiple arrays. This initial input stage of a CNN is analogous to the eyes perceiving an image, and it is followed by a training process for further recognition of the scene [25]. To predict well in difficult scenarios, however, a CNN must in many cases be designed with a more complex architecture. Although CNNs require more computational time and hardware resources than traditional ML methods (e.g., support vector machine (SVM), artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS)), they are very efficient at processing raw data with minimal pre-processing.
CNNs are well known as one of the first DL methods applied to tool wear prediction, compared with other DL methods such as recurrent neural networks (RNN), gated neural networks (GNN) and long short-term memory (LSTM) [10,11,21,26]. CNN architectures differ depending on the problem at hand. Well-known models have different numbers of layers, e.g., VGG-16 (16 layers), LeNet, GoogLeNet (22 layers), AlexNet (8 layers) and ResNet (152 layers), and can use different operations such as ReLU (rectified linear unit), dropout and batch normalization. The parameters used for model training are primarily chosen to reduce computational cost and hardware utilization, either by exhaustive search or by trial and error [27,28].
The first modern CNN, LeNet-5, used by Tavakkoli et al. [29], has a 7-layer structure (excluding the input layer): C1, S2, C3, S4, C5, F6 and the output layer. In addition, Zhang et al. [27] reported several types of layers that should be considered in a CNN model for better performance. Following [27], the CNN layers considered in this work are as follows:
• The sub-sampling layer reduces the dimensions of its input. Its goal is to reduce the number of trainable parameters needed in the CNN, which speeds up computation and reduces the resources required, e.g., to identify images. By optimizing the CNN architecture in this way, good accuracy can be obtained at a low computational cost.

• The convolutional layer imitates the properties of the brain's visual cortex to learn features from input images. The filters (kernels) trained in this layer can identify specific shapes; for example, when applied to an image, a filter might learn to detect edges. The convolution layer is described by Equation (1), which computes the filtered value at each pixel of the image. A detailed illustration is presented in Figures 1-5, and a summary of the whole convolution process is shown in Figure 6.

• The loss layer is the output of the CNN and an important part of the neural network. Its loss function measures the difference between the model output and the ground truth. The loss value drives the gradients that determine how the weights are updated during backpropagation.

• The fully connected layer is found in most neural networks. A fully connected layer computes its output by multiplying its input by the layer's weight matrix.

• The ReLU layer acts as a thresholding activation function that introduces non-linearity into the model. It is defined as f(x) = max(0, x): each value of the feature map is replaced by the larger of x and 0. The ReLU layer process is illustrated in Figure 7.

• The pooling layer is similar to the convolutional layer in that it slides a window over the feature map. Its operation is essential for reducing the spatial size of the feature map, and this dimension reduction lowers the computational cost. The layer is also useful for extracting dominant features that are rotation- and position-invariant, which makes model training more effective. There are several types of pooling, e.g., maximum, average and minimum pooling, chosen based on model performance. An example of max pooling from feature map 1 to feature map 9 is presented in Figure 8.

• Figure 9 shows how a 2D input image is transformed as it passes through a series of convolutional layers to produce a feature map. A pooling layer is generally used between convolutions to reduce the feature map size. Max pooling is the pooling method most often used in CNNs; average pooling is used less often.

• The flattening layer converts matrix data (2D or 3D) into a one-dimensional array that is passed to the next layer, typically before the output layer or the SoftMax activation. This layer places all pixel values in one row and connects them to the last layer. Figure 10 illustrates the conversion of a 3 × 3 matrix into a one-dimensional array.

The CNN architecture discussed in this section is suitable for image classification, object detection, object localization and image segmentation problems. Case studies in image recognition using CNN typically have three stages: the input, CNN and output stages. A detailed explanation of the CNN stages is presented in [30][31][32]. Figure 11 shows the stages carried out by the CNN from the input image through the convolutional, pooling and flattening layers.

Figure 11. Step-by-step process of CNN from convolutional layer, pooling layer to flattening layer.
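The layer operations listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the 5 × 5 image, the 2 × 2 kernel and the fully connected weights are made-up values, and the convolution is written as cross-correlation (no kernel flip), as most deep-learning libraries do.

```python
import math


def conv2d(image, kernel):
    """'Valid' 2D convolution, implemented as cross-correlation (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise f(x) = max(0, x)."""
    return [[max(0, x) for x in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[i + m][j + n] for m in range(size) for n in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def flatten(fmap):
    """Place all values of a 2D feature map in one row."""
    return [x for row in fmap for x in row]


def dense(x, weights, bias):
    """Fully connected layer: matrix-vector product plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Turn class scores into probabilities (max subtracted for stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Loss between the predicted probabilities and the ground-truth class."""
    return -math.log(probs[true_class])


# Illustrative inputs (made-up values).
image = [[1, 2, 0, 1, 3],
         [0, 1, 3, 1, 0],
         [2, 1, 0, 0, 1],
         [1, 0, 1, 2, 2],
         [0, 3, 1, 0, 1]]
kernel = [[1, 0],
          [0, -1]]                  # simple diagonal-difference filter
weights = [[0.1, 0.2, 0.0, 0.1],    # 4 hypothetical classes x 4 inputs
           [0.0, 0.5, 0.1, 0.0],
           [0.2, 0.0, 0.3, 0.1],
           [0.1, 0.1, 0.1, 0.1]]
bias = [0.0, 0.0, 0.0, 0.0]

fmap = conv2d(image, kernel)        # 4 x 4 feature map
activated = relu(fmap)              # negative values set to 0
pooled = max_pool(activated)        # 2 x 2 after 2 x 2 max pooling
flat = flatten(pooled)              # 4 values in one row
scores = dense(flat, weights, bias)
probs = softmax(scores)
loss = cross_entropy(probs, true_class=1)
print(pooled, flat, [round(p, 3) for p in probs], round(loss, 3))
```

Here the 2 × 2 kernel turns the 5 × 5 image into a 4 × 4 feature map, ReLU clips its negative entries, 2 × 2 max pooling halves each dimension to [[1, 3], [2, 1]], and flattening yields the row [1, 3, 2, 1] that feeds the fully connected layer and the loss.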

CNN Structure Used in the Present Study
The structure of a CNN varies depending on the application; see, for example, the structure presented in [33]. The details of the CNN architecture used to classify the belt wear states are presented in Figure 12. The architecture consists of 10 layers: 6 one-dimensional (1D) convolution layers, 1 max-pooling layer, 1 global average pooling layer, 1 dropout layer and 1 fully connected layer. Each 1D convolution layer contains 200 filters, whose weights are initialized with random values before training, and a kernel of length 4 is used in the convolution between the signal and the filter. At the first stage, the CNN receives the raw 1D time-series signals from a three-axis dynamometer (X, Y and Z) and a three-axis accelerometer (X, Y and Z). In the first layer (L1), each raw signal (X, Y and Z) from the dynamometer and accelerometer is convolved with the filter kernels, and the output of each convolution layer is used as the input to the next layer. In the fourth layer (L4), a max-pooling layer is applied to reduce the number of inputs. In the next three layers (L5, L6 and L7), the outputs of the max-pooling layer are convolved with the filter kernels. In the eighth layer (L8), a global average pooling layer is applied to reduce the number of parameters: all values from each filter are averaged into a single value, which reduces the total to 1200 values (6 inputs × 200 filters). In the ninth layer (L9), a dropout layer is applied to prevent overfitting during training. Finally, all nodes of the dropout layer are connected to the fully connected layer (L10). Max pooling was chosen for L4 because it yields a high-contrast feature map of the convolution results in the middle of the feature engineering process.
In contrast, average pooling was chosen after the last convolution (L8) to smooth the feature maps.
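The layer stack described above can be checked with a short shape trace. This is a sketch under stated assumptions, not the authors' code: the 200-sample window, 200 filters, kernel length 4, layer order and 4 output classes come from the text, whereas 'valid' convolutions with stride 1, a max-pooling window of 2 and one convolutional stack per input signal (so that global average pooling yields 6 × 200 = 1200 values) are assumptions made only for this illustration. `trace_shapes` is a hypothetical helper.

```python
# Hypothetical shape trace of the 10-layer 1D CNN described in the text.
# From the text: 200-sample windows, 200 filters, kernel length 4, layer order,
# 4 belt wear classes. ASSUMED: 'valid' convolutions with stride 1, max-pooling
# window 2, one convolutional stack per input signal.

WINDOW = 200
N_FILTERS = 200
KERNEL = 4
POOL = 2          # assumed, not stated in the text
N_CLASSES = 4

LAYERS = ["conv", "conv", "conv", "maxpool", "conv", "conv", "conv",
          "global_avg_pool", "dropout", "dense"]


def trace_shapes(n_inputs, window=WINDOW):
    """Return (layer label, kind, output shape) for L1..L10."""
    length, channels = window, 1
    shape = (length, channels)
    trace = []
    for index, kind in enumerate(LAYERS, start=1):
        if kind == "conv":
            length -= KERNEL - 1            # valid convolution, stride 1
            channels = N_FILTERS
            shape = (length, channels)
        elif kind == "maxpool":
            length //= POOL
            shape = (length, channels)
        elif kind == "global_avg_pool":
            shape = (n_inputs * channels,)  # one mean per filter per input
        elif kind == "dense":
            shape = (N_CLASSES,)
        # dropout leaves the shape unchanged
        trace.append((f"L{index}", kind, shape))
    return trace


for label, kind, shape in trace_shapes(n_inputs=6):
    print(label, kind, shape)
```

Under these assumptions the time axis shrinks from 200 to 86 samples before global average pooling. With 'same' padding or another pooling window the intermediate lengths would change, but the 1200-value vector at L8 depends only on the number of inputs and filters.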

Experimental Setup
This study uses signatures from a 3-axis accelerometer and a 3-axis dynamometer to predict the belt wear that affects polishing quality. The following subsection presents the experimental polishing setup and the data acquisition procedure in detail.

Polishing Process Equipment, Sensors and Data Acquisition
The experiments were performed using a multi-axis robot arm fitted with an abrasive belt grinding setup, as presented in Figure 13. The setup consisted of an ABB 6660-205-193 robot that was primarily used to impart motion in the grinding direction. The belt grinder used in this work was electrically powered and can run at 11,000 RPM under no-load conditions. It can polish and grind with belts about 8" to 3/4" wide × 18" long. The belt grinder was coupled to a force-control unit at the end effector of the ABB robot by a customized bracket. The force control operated in a closed loop with the robot controller to ensure that the force imparted in the normal direction (Z-axis) equals the force input by the operator; the normal direction is usually preferred to achieve uniform material removal. A force of 20 N was applied in all experimental trials. The experimental conditions used in this work are listed in Table 1. Four belt conditions, representing four classes of wear level, were prepared beforehand by performing actual grinding at different cycle times, as listed in Table 2. During the polishing experiments, the belt wear condition (four levels) was the only varied process parameter; other process parameters such as feed rate, grinding speed and force control were held constant, as presented in Table 1. A Kistler 8763A500 triaxial accelerometer was attached near the tension arm of the electric belt grinder to obtain the vibration signal during polishing (i.e., contact between the belt tool and the workpiece), as presented in Figure 13. In addition, a Kistler 9254 three-component dynamometer was placed below the mild-steel workpiece, as shown in Figure 13. An NI data acquisition (DAQ) module and the LabVIEW environment were used to digitize the vibration signals at a sampling frequency of 2 kHz.
For the dynamometer, a DAQ device supported by the DEWESoft platform was used to acquire the force signals in the normal and tangential directions. The force and accelerometer signals were synchronized offline.
ABB RobotStudio was used to configure the robot and plan the tool path. The tool path for the linear run performed in this work has five zones, A-E, as presented in Figure 14. The region A-B was used to align the tool head over the machining region. Zone B-C was used to attain the force required for grinding. Zone C-D was the zone in which force control was actively applied. Finally, Zone D-E was used to decelerate the force sensor and the robot motion. All experiments in this study were carried out under non-lubricated conditions on a mild steel workpiece.

Vibration and Force Data Acquisition
This paper extends previous studies that applied ML methods to sensor signatures [17,18]. Pandiyan et al. [17] used the short-time Fourier transform (STFT) on acoustic emission signals to detect changes in the contact mechanisms caused by tool wear in abrasive belt grinding. In contrast, the present study develops CNN methods to predict the wear level of the belt tool from 3-axis accelerometer and 3-axis dynamometer signatures instead of acoustic data. In [18], sparse features extracted from all three axes of the accelerometer and force signals were used as inputs to ML methods. However, extracting features and feeding them into ML algorithms is a two-step procedure and is computationally expensive. In this study, the CNN algorithm evaluates which axis or axes of the sensor signatures are most relevant for classifying the abrasive belt wear state using the raw sensor signals. Examples of raw vibration and force signals along the X-axis for the different belt tool states, i.e., brand-new, 5-min cycle time, 15-min cycle time and worn-out (from top to bottom), are presented in Figure 15a,b. For interested readers, the complete raw signal plots corresponding to Figure 15 are presented in [18]. The raw vibration and force signals were acquired at a sampling frequency of 2 kHz for 3.5 s, resulting in 7000 data points, as presented in Figure 15. Prior to the convolution process, the raw datasets were segmented into smaller windows of 200 data points with a 10% overlap. Each 200-point window corresponds to 0.1 s at the 2 kHz sampling frequency. According to previous work [18], 200 data points can still capture the dynamic behavior of the polishing process.
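The windowing step described above can be sketched as follows. This assumes each sensor axis is stored as a plain list of samples and that an incomplete final window is discarded; the paper does not say how the tail of a record is handled. `segment` is a hypothetical helper, and the ramp signal is a placeholder rather than measured data.

```python
# Sketch of the segmentation step: a 3.5 s record sampled at 2 kHz
# (7000 samples) is cut into 200-sample windows that overlap by 10%.
# ASSUMPTION: any incomplete window at the end of the record is dropped.

FS_HZ = 2000
WINDOW = 200
OVERLAP = 0.10


def segment(signal, window=WINDOW, overlap=OVERLAP):
    """Split a 1D signal into fixed-length, partially overlapping windows."""
    step = window - int(round(window * overlap))   # 200 - 20 = 180 samples
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, step)]


record = list(range(int(3.5 * FS_HZ)))   # placeholder for one axis, 7000 samples
windows = segment(record)
print(len(record), len(windows), WINDOW / FS_HZ)   # → 7000 38 0.1
```

With these settings consecutive windows start 180 samples apart, so each window shares its last 20 samples with the next one, and one 7000-sample record yields 38 full windows.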
Pre-processing the raw dataset produced 64,000 new rows for each class, i.e., 256,000 rows for all four classes combined (brand-new, 5-min cycle time, 15-min cycle time and worn-out). These data were then divided into a training set and a validation set: the CNN model used 192,000 of the 256,000 rows (75%) for training and the remaining 64,000 rows (25%) for validation.
Sample input signals for the CNN from the three directions (X, Y and Z-axes) of the accelerometer are presented in Figure 16. The visualization of the vibration signals for the four belt conditions shows that distinguishing them visually is challenging. Examples of force signals from the three directions (X, Y and Z-axes) of the dynamometer are presented in Figure 17; as with the vibration signals, differentiating the belt conditions visually is challenging. Figure 18 illustrates the preparation of the CNN input data: raw vibration and force data were collected from three axes and then divided into smaller windows for CNN training.

The following results were obtained by applying the CNN design illustrated in Figure 12 to the accelerometer data. For clarity, the classification labels in the confusion matrices of Tables 3-6 and Figures 19 and 20 are renamed as presented in Table 2. As depicted in Table 3a, the CNN classification on the X-axis correctly classifies all 796 datasets of the "brand-new" class. In the "5-min cycle time" class, 796 datasets were classified correctly, 2 datasets were misclassified as "15-min cycle time" and 2 as "worn-out". In the "15-min cycle time" class, all 798 datasets were classified correctly. In the "worn-out" class, 793 datasets were classified correctly, and only 1 dataset was identified as "15-min cycle time".
Table 3b shows that, for the Y-axis, 795 datasets of the "brand-new" class were classified correctly and 1 dataset was misclassified as "5-min cycle time". In the "5-min cycle time" class, 798 datasets were classified correctly, but 2 datasets were misclassified as "worn-out", the most distant wear state. In the "15-min cycle time" class, 717 datasets were classified correctly, but 83 datasets were misclassified as "worn-out". The worst classification occurs in the "worn-out" class: 670 datasets were classified correctly, while 89 datasets were misclassified as "brand-new", 16 as "5-min cycle time" and 19 as "15-min cycle time".

Table 3. Training curves and confusion matrix of single-axis vibration data. Each panel shows the model accuracy and loss curves and the confusion matrix for: (a) vibration data of X-axis; (b) vibration data of Y-axis; (c) vibration data of Z-axis. Note: please see Table 2 for detailed information on the classification labels.

Table 3c shows the CNN results on the Z-axis vibration signals. In the "brand-new" class, 790 datasets were classified correctly, 1 dataset was misclassified as "5-min cycle time" and 5 datasets as "worn-out". In the "5-min cycle time" class, 791 datasets were classified correctly, 3 were misclassified as "15-min cycle time" and 6 as "worn-out". In the "15-min cycle time" class, 748 datasets were classified correctly and 52 were misclassified as "worn-out". As in Table 3b, the poorest classification occurred in the "worn-out" class, where 2 datasets were misclassified as "brand-new", 47 as "5-min cycle time" and 26 as "15-min cycle time". Table 3 also shows the accuracy and loss curves during CNN training with accelerometer data from the X, Y or Z axis. Processing 7672 samples for 50 epochs takes 17 to 20 s. When trained on X-axis accelerometer data, the CNN model achieves higher accuracy (95.78%) than with the other axes (Y-axis: 87.47%; Z-axis: 89.43%).

Table 4. Training curves and confusion matrix of double-axis vibration data.

Each panel shows the model accuracy and loss curves and the confusion matrix for: (a) vibration data of XY-axis; (b) vibration data of XZ-axis; (c) vibration data of YZ-axis. Note: please see Table 2 for detailed information on the classification labels.
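To make the per-class behaviour described for Table 3b concrete, the sketch below computes per-class recall (correctly classified datasets divided by all datasets of that true class) from the Y-axis vibration counts quoted in the text. The row/column ordering and the helper name `per_class_recall` are choices made for this illustration only.

```python
# Per-class recall from the Y-axis vibration confusion-matrix counts quoted
# for Table 3b. Rows = true class, columns = predicted class, in the order
# brand-new, 5-min cycle time, 15-min cycle time, worn-out.

CLASSES = ["brand-new", "5-min cycle time", "15-min cycle time", "worn-out"]

TABLE_3B = [
    [795, 1, 0, 0],
    [0, 798, 0, 2],
    [0, 0, 717, 83],
    [89, 16, 19, 670],
]


def per_class_recall(matrix):
    """Fraction of each true class that lies on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


recalls = per_class_recall(TABLE_3B)
for name, r in zip(CLASSES, recalls):
    print(f"{name:>18}: {r:.3f}")
```

The worn-out row has the lowest recall (about 0.84), which matches the observation that this class is the hardest to separate using Y-axis vibration alone.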

Vibration Signal of Double-Axis Accelerometer (XY, XZ and YZ)
The sensor data from two axes were combined to examine the dynamic behavior of the abrasive process, and this combined signal was input to the CNN algorithm. The confusion matrices of the CNN model using two axes (XY, XZ and YZ) of the accelerometer are presented in Table 4. Table 4b indicates that the XZ-axes combination achieves higher CNN accuracy than the other combinations. Table 4a indicates that all 796 datasets of the "brand-new" class were classified correctly. In the "5-min cycle time" class, 797 datasets were classified correctly, 1 dataset was misclassified as "brand-new" and 2 as "15-min cycle time". In the "15-min cycle time" class, all 800 datasets were classified correctly. In the "worn-out" class, 788 datasets were classified correctly and 6 were misclassified as "15-min cycle time". Table 4 also shows the accuracy and loss curves during CNN training with the XY, XZ or YZ input combinations. Processing 7672 samples for 50 epochs takes 17 to 20 s. When trained on XZ-axes accelerometer data, the CNN model achieves higher accuracy (96.40%) than with the other combinations (XY-axes: 95.06%; YZ-axes: 94.92%).

Table 5. Model accuracy and loss and confusion matrix of single-axis force data.

Each panel shows the model accuracy and loss curves and the confusion matrix for: (a) force data of X-axis; (b) force data of Y-axis; (c) force data of Z-axis. Note: please see Table 2 for detailed information on the classification labels.

Figure 19 shows the confusion matrix of the CNN model that used data from all three axes of the accelerometer. In the "brand-new" class, 795 datasets were classified correctly and 1 dataset was misclassified as "5-min cycle time". In the "5-min cycle time" class, 794 datasets were classified correctly and 6 were misclassified as "15-min cycle time". In the "15-min cycle time" class, all 800 datasets were classified correctly. In the "worn-out" class, by contrast, 765 datasets were classified correctly and 29 were misclassified as "15-min cycle time". Figure 19 also shows the model accuracy and loss during CNN training with all accelerometer inputs; processing 7672 samples for 50 epochs takes 18.2 s. When trained on XYZ-axes accelerometer data, the CNN model achieves an accuracy of 94.23%.

Table 6. Training curves and confusion matrix of double-axis force data.

Each panel shows the model accuracy and loss curves and the confusion matrix for: (a) force data of XY-axis; (b) force data of XZ-axis; (c) force data of YZ-axis. Note: please see Table 2 for detailed information on the classification labels.

CNN Results of Force Signals
This sub-section provides the CNN classification results of force data from three categories: (1) single-axis (X, Y and Z), (2) double-axis (XY, XZ and YZ) and (3) triple-axis (XYZ).
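The seven input configurations evaluated in this study (X, Y, Z, XY, XZ, YZ and XYZ) can be enumerated programmatically. The paper does not state how two or three axes are merged into one CNN input; the sketch below assumes, purely for illustration, that the selected axis windows are stacked as parallel input channels. `stack_channels` and the sample values are hypothetical.

```python
# Enumerate the single-, double- and triple-axis input combinations and stack
# the selected axis windows as channels of one CNN input.
# ASSUMPTION: combining axes = stacking them as parallel input channels.
from itertools import combinations

AXES = ("X", "Y", "Z")

# X, Y, Z, XY, XZ, YZ, XYZ
COMBINATIONS = ["".join(c) for k in (1, 2, 3) for c in combinations(AXES, k)]


def stack_channels(window_by_axis, combo):
    """Return a (time, channels) nested list for the axes in `combo`.

    window_by_axis maps an axis name to one equally long window of samples.
    """
    columns = [window_by_axis[axis] for axis in combo]
    return [list(sample) for sample in zip(*columns)]


# Hypothetical 5-sample windows from the three accelerometer axes.
window = {"X": [0.1, 0.2, 0.3, 0.2, 0.1],
          "Y": [0.0, -0.1, 0.0, 0.1, 0.0],
          "Z": [1.0, 1.1, 0.9, 1.0, 1.2]}

print(COMBINATIONS)
xz_input = stack_channels(window, "XZ")   # 5 time steps x 2 channels
print(xz_input[0])
```

Stacking keeps the time alignment between axes, so a 1D convolution sees the X and Z samples of the same instant together; other fusion schemes (e.g., separate branches per axis) would be equally plausible readings of the text.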

Force Signal of Single-Axis Dynamometer (X, Y and Z)
The CNN method was also applied to the three-axis force data, and the resulting confusion matrices are presented in Table 5. For the X-axis (Table 5a), 793 datasets of the "brand-new" class were classified correctly, and 3 datasets were misclassified as "5-min cycle time". In the "5-min cycle time" class, 797 datasets were classified correctly and 3 were identified as "brand-new". In the "15-min cycle time" class, 798 datasets were classified correctly, 1 was misclassified as "brand-new" and 1 as "worn-out". In the "worn-out" class, 747 datasets were classified correctly; however, 47 were misclassified as "15-min cycle time".
For the Y-axis (Table 5b), 794 datasets of the "brand-new" class were classified correctly, and 2 were misclassified as "5-min cycle time". In the "5-min cycle time" class, 796 datasets were classified correctly and 4 were identified as "worn-out". In the "15-min cycle time" class, 771 datasets were classified correctly and 29 were misclassified as "worn-out". Meanwhile, in the "worn-out" class, 764 datasets were classified correctly and 30 were misclassified as "15-min cycle time".
For the Z-axis (Table 5c), 794 datasets of the "brand-new" class were classified correctly, while 2 were misclassified as "5-min cycle time". In the "5-min cycle time" class, 798 datasets were classified correctly and 2 were misclassified as "15-min cycle time". In the "15-min cycle time" class, 788 datasets were classified correctly, while 12 were misclassified as "worn-out". In the "worn-out" class, 785 datasets were classified correctly and 9 were misclassified as "15-min cycle time".
Table 5 also shows the accuracy and loss curves during CNN training using force data from the X, Y or Z axis. Processing the 7672 data with 50 epochs took approximately 17 to 20 s. When trained on data from the X-axis of the dynamometer, the CNN model achieved a higher accuracy (93.07%) than with the other axes (Y-axis: 92.14%; Z-axis: 92.94%).
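As a minimal sketch of how overall and per-class figures follow from a confusion matrix, the snippet below encodes the Table 5a (X-axis) counts reported above, with rows as true classes and columns as predicted classes in the order brand-new, 5-min cycle time, 15-min cycle time, worn-out. The function names are illustrative, and the overall fraction computed here is derived only from these counts, so it need not be the same quantity as the training-curve accuracy quoted in the text.

```python
# Class order assumed: brand-new, 5-min cycle time, 15-min cycle time, worn-out
CLASSES = ["brand-new", "5-min", "15-min", "worn-out"]

# Table 5a (X-axis force) counts as reported in the text:
# rows = true class, columns = predicted class
TABLE_5A = [
    [793, 3, 0, 0],
    [3, 797, 0, 0],
    [1, 0, 798, 1],
    [0, 0, 47, 747],
]


def overall_accuracy(cm):
    """Fraction of all samples on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Correct predictions divided by the number of true samples in each class."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


if __name__ == "__main__":
    print(f"overall: {overall_accuracy(TABLE_5A):.4f}")
    for name, recall in zip(CLASSES, per_class_recall(TABLE_5A)):
        print(f"{name:>9}: {recall:.4f}")
```

With these counts the worn-out class has the lowest recall (747 of 794), which reflects the 47 worn-out samples predicted as "15-min cycle time".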

Force Signal of Double-Axis Dynamometer (XY, XZ and YZ)
The force signatures from two directions were combined to examine the dynamic behavior during the abrasive process, and the combined signal was fed into the CNN algorithm. The confusion matrices from the CNN model using two-axis inputs (XY, XZ and YZ) are presented in Table 6. Among these combinations, the XY-axes (Table 6a) yield the highest accuracy. In detail, all 796 datasets of the "brand-new" class were classified correctly. In the "5-min cycle time" class, 797 datasets were classified correctly, 1 was identified as "brand-new" and 2 were misclassified as "15-min cycle time". All 800 datasets of the "15-min cycle time" class were also classified correctly. In the "worn-out" class, 788 datasets were classified correctly and 6 were misidentified as "15-min cycle time". Table 6 also shows the accuracy and loss curves during CNN training with the XY, XZ and YZ input combinations. Processing the 7672 data with 50 epochs took 17 to 20 s. When trained on the XY-axes of the dynamometer, the CNN model achieved a higher accuracy (99.80%) than with the other combinations (XZ-axes: 95.53%; YZ-axes: 96.47%).
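The text does not specify how two or three axes are merged before entering the CNN. One common option, assumed here purely for illustration, is to treat each axis as an input channel of a 1D signal window. The sketch below enumerates the seven single-, double- and triple-axis configurations studied in the paper and stacks the selected axes into a window of shape (length, channels); the function names `axis_combinations` and `stack_channels` are hypothetical.

```python
from itertools import combinations

AXES = ("X", "Y", "Z")


def axis_combinations(axes=AXES):
    """All non-empty axis subsets in the paper's order: X, Y, Z, XY, XZ, YZ, XYZ."""
    combos = []
    for k in range(1, len(axes) + 1):
        combos.extend("".join(c) for c in combinations(axes, k))
    return combos


def stack_channels(signals, combo):
    """Stack the selected axes sample by sample (assumed channel-wise fusion).

    signals: dict mapping axis name -> list of samples of equal length.
    combo: string such as "XZ".
    Returns a list of rows, one per time step, each holding len(combo) values,
    i.e. a (length, channels) window.
    """
    selected = [signals[axis] for axis in combo]
    if len({len(s) for s in selected}) != 1:
        raise ValueError("all axis signals must have the same length")
    return [list(step) for step in zip(*selected)]
```

Under this assumption, only the number of input channels changes between configurations (1, 2 or 3), so the same network body can be reused with a different first-layer input width.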

Force Signal of Triple-Axis Dynamometer (XYZ)
The model accuracy and the confusion matrix from the CNN model using all three axes of the dynamometer are presented in Figure 20. In the "brand-new" class, 793 datasets were classified correctly and 3 were misclassified as "5-min cycle time". In the "5-min cycle time" class, 797 datasets were classified correctly, 2 were misclassified as "brand-new" and 1 as "15-min cycle time". In the "15-min cycle time" class, 798 datasets were classified correctly and 2 were misclassified as "worn-out". In the "worn-out" class, 790 datasets were classified correctly and 4 were misclassified as "15-min cycle time". Figure 20 also shows the accuracy and loss curves during CNN training using all dynamometer inputs; processing the 7672 data with 50 epochs took 18.2 s. When trained on the XYZ-axes of the dynamometer, the CNN model achieved an accuracy of 96.34%.
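Precision complements the per-class counts above: it asks how many of the samples predicted as a class truly belong to it. For example, the "15-min cycle time" column of Figure 20 collects 1 sample from the 5-min class and 4 from the worn-out class, so its precision is 798/803. The sketch below computes per-class precision, recall and F1 from the Figure 20 counts; these metrics are not reported in the paper and are shown only as an illustrative extension, and the function names are hypothetical.

```python
# Figure 20 (XYZ force) counts as reported in the text;
# rows = true class, columns = predicted class, order:
# brand-new, 5-min cycle time, 15-min cycle time, worn-out
FIG_20 = [
    [793, 3, 0, 0],
    [2, 797, 1, 0],
    [0, 0, 798, 2],
    [0, 0, 4, 790],
]


def precision_recall_f1(cm):
    """Per-class (precision, recall, F1) tuples from a square confusion matrix."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])  # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    scores = precision_recall_f1(cm)
    return sum(f1 for _, _, f1 in scores) / len(scores)
```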

Accelerometer (Vibration) Data
This section provides a statistical summary of the CNN prediction and classification of the abrasive belt grinding condition, from brand-new to worn-out. The summary covers four metrics: training time, testing time, accuracy on test data and loss on test data.
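The mean ± standard deviation values and box plots in this section summarize repeated runs of each axis configuration. The paper does not state how these statistics were computed; the sketch below assumes the sample standard deviation and linearly interpolated quartiles (`statistics.quantiles` with `method="inclusive"`, which matches NumPy's default percentile interpolation). The box of a box plot spans Q1 to Q3, so its width is the interquartile range. The function names are hypothetical.

```python
import statistics


def summarize_runs(values):
    """Summary of one metric (e.g. training time in s) over repeated runs.

    Returns mean, sample standard deviation, quartiles and interquartile range.
    Requires at least two runs.
    """
    if len(values) < 2:
        raise ValueError("need at least two runs")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,  # width of the box in a box plot
        "min": min(values),
        "max": max(values),
    }


def format_mean_std(values, digits=2):
    """Render a metric in the paper's 'mean ± std' style."""
    s = summarize_runs(values)
    return f"{s['mean']:.{digits}f} ± {s['std']:.{digits}f}"
```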
Training Time
Table 7 summarizes the 'training time' of the CNN model. Each axis combination of the three-axis accelerometer required a different time to train the model. Table 7 indicates that the XY-axes combination of the accelerometer required less 'training time' (14.71 ± 0.51 s) than the other axis combinations. A box plot of the 'training time' of the CNN model for the different accelerometer axis combinations is presented in Figure 21. All input combinations showed a small standard deviation (Q3-Med), as presented in Table 7, which indicates that the training time of the proposed architecture was consistent across runs. The highest time required to train the model is shown by the XY combination (29.64 s). With a single input axis (X, Y or Z), the training times are almost identical. However, using more inputs (all axes) did not reduce the training time. Each box plot divides the data into four quartiles (1st to 4th), which cover the 0-25%, 25-50%, 50-75% and 75-100% ranges of the data distribution, respectively. In the box plots of Figures 21-28, the grey and yellow regions indicate the second and third quartiles, respectively. A wider box indicates a larger spread of the data.
Testing Time
Table 8 summarizes the 'testing time' of the CNN model. Each axis combination of the three-axis accelerometer required a different time to test the model. Table 8 indicates that the YZ-axes combination of the accelerometer required less time (0.22 ± 0.01 s) than the other axis combinations. A box plot of the 'testing time' of the CNN model for the different accelerometer axis combinations is presented in Figure 22. All input combinations showed a low standard deviation (Q3-Med), as presented in Table 8, which indicates that the testing time of the proposed architecture was consistent across runs.
The highest time required to test the model is shown by the XZ combination (0.33 s). When testing the model with a single input axis (X, Y or Z), the required processing time is almost identical. However, using more inputs (all axes) did not reduce the testing time, as confirmed by comparing the YZ and XYZ combinations.
Table 9 summarizes the accuracy of the CNN model. Each axis combination of the three-axis accelerometer yielded a different accuracy. Table 9 indicates that the XY-axes combination of the accelerometer achieved a better accuracy (1.00 ± 0.00) than the other combinations. The lowest accuracy was found when the model was trained on the Y-axis (0.95 ± 0.03). A box plot of the 'accuracy on test data' of the CNN model for the different accelerometer axis combinations is presented in Figure 23. The CNN model trained on the XY-axes datasets showed the highest accuracy (1.00 ± 0.00). In contrast, the CNN model based on the Y-axis showed the lowest accuracy (0.89 ± 0.02). Additionally, the CNN models trained on the Y and Z datasets showed a wider deviation than the other axis combinations.
Table 10 shows the loss value of the CNN model. The loss value reflects the total prediction error on the evaluation and test datasets. Each axis combination of the three-axis accelerometer yielded a different loss value. Table 10 indicates that the XY-axes combination of the accelerometer achieved a better (lower) loss value (0.02 ± 0.01) than the other combinations. The worst loss value was found when the model was trained on the Y-axis (0.20 ± 0.05). A box plot of the 'loss on test data' of the CNN model for the different accelerometer axis combinations is presented in Figure 24. The CNN model trained on the XY-axes datasets showed the lowest loss (0.02 ± 0.01). In contrast, the CNN models based on the Y and Z axes showed higher loss and a wider deviation than the other axis combinations.
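The paper does not name its loss function. For a four-class softmax classifier, categorical cross-entropy is a common choice, so the sketch below assumes it: the loss on a test set is the average negative log-probability that the model assigns to the true class. Lower values mean the model is both correct and confident, which is why a model can reach an accuracy near 1.00 while its loss is still above zero. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probabilities, true_labels, eps=1e-12):
    """Mean negative log-likelihood of the true class (assumed loss).

    probabilities: list of per-sample probability vectors (each sums to 1).
    true_labels: list of integer class indices (0 = brand-new ... 3 = worn-out).
    eps clips probabilities away from 0 to avoid log(0).
    """
    if len(probabilities) != len(true_labels):
        raise ValueError("one label per sample is required")
    losses = [-math.log(max(p[y], eps)) for p, y in zip(probabilities, true_labels)]
    return sum(losses) / len(losses)
```

For instance, a model that assigns probability 0.9 to the correct class for every sample has a loss of about 0.105 (that is, -ln 0.9) despite classifying every sample correctly.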

Dynamometer (Force) Data
This section provides a statistical summary of the CNN prediction and classification of the abrasive belt grinding condition, from brand-new to worn-out. The statistical summary includes 'training time', 'testing time', 'accuracy on test data' and 'loss on test data'.
Training Time
Table 11 summarizes the 'training time' of the CNN model. Each axis combination of the three-axis dynamometer required a different time to train the model. Table 11 indicates that the XY-axes combination of the dynamometer required less training time (14.23 ± 0.54 s) than the other axis combinations. A box plot of the 'training time' of the CNN model for the different dynamometer axis combinations is presented in Figure 25. All input combinations showed a small standard deviation, as presented in Table 11, which indicates that the training time of the proposed architecture was consistent across runs. The highest training time was observed for the Z-axis (32.49 s). With double- and triple-axis inputs, the model required less 'training time' than with single-axis inputs, especially the XY combination with 14.23 s on average.
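Training and testing times such as those in Tables 11 and 12 are wall-clock measurements over repeated runs. The paper does not describe its timing procedure; the sketch below assumes timing with a monotonic high-resolution clock (`time.perf_counter`) around hypothetical `train_fn` and `test_fn` callables, which stand in for the actual CNN fit and evaluation steps, repeated over several runs so that mean ± std and box plots can be computed.

```python
import time
from statistics import mean, stdev


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def benchmark(train_fn, test_fn, n_runs=10):
    """Time n_runs independent train/test cycles.

    train_fn() -> model and test_fn(model) -> anything are hypothetical
    placeholders for the CNN fit and evaluation steps.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    train_times, test_times = [], []
    for _ in range(n_runs):
        model, t_train = time_call(train_fn)
        _, t_test = time_call(test_fn, model)
        train_times.append(t_train)
        test_times.append(t_test)
    return {
        "train_mean": mean(train_times),
        "train_std": stdev(train_times) if n_runs > 1 else 0.0,
        "test_mean": mean(test_times),
        "test_std": stdev(test_times) if n_runs > 1 else 0.0,
        "train_times": train_times,
        "test_times": test_times,
    }
```

Whether data loading and preprocessing sit inside or outside the timed region changes the absolute numbers, so keeping that choice fixed across axis combinations keeps the comparison fair.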
Testing Time
Table 12 summarizes the 'testing time' of the CNN model. Each axis combination of the three-axis dynamometer required a different time to test the model. Table 12 indicates that the XZ-axes combination of the dynamometer required less time (0.22 ± 0.01 s) than the other axis combinations. A box plot of the 'testing time' of the CNN model for the different dynamometer axis combinations is presented in Figure 26. All input combinations showed a small standard deviation, as presented in Table 12, which indicates that the testing time of the proposed architecture was consistent across runs. The highest time required to test the model is shown by the XZ combination (0.21 s) compared to the single input axes (X, Y and Z) and the double input axes (XY and YZ). In addition, using more inputs (XYZ) did not reduce the testing time further.
Table 13 summarizes the accuracy of the CNN model. Each axis combination of the three-axis dynamometer yielded a different accuracy. Table 13 indicates that the XZ-axes combination of the dynamometer achieved a better accuracy (1.00 ± 0.00) than the other axis combinations. The lowest accuracy was found when the model was trained on the Y-axis datasets (0.95 ± 0.02). A box plot of the 'accuracy on test data' of the CNN model for the different dynamometer axis combinations is presented in Figure 27. The CNN model trained on the XZ-axes datasets showed the highest accuracy (1.00 ± 0.00), whereas the CNN model based on the Y-axis showed the lowest accuracy (0.95 ± 0.02). Additionally, the CNN models trained on the Y and Z datasets showed a wider deviation than the other axis combinations.
Table 14 shows the loss value of the CNN model. The loss value reflects the total prediction error on the evaluation and test datasets. Each axis combination of the three-axis dynamometer yielded a different loss value. Table 14 indicates that the XZ-axes combination of the dynamometer achieved a better (lower) loss value (0.02 ± 0.01) than the other combinations. The worst loss value was found when the model was trained on the Y-axis (0.15 ± 0.03). A box plot of the 'loss on test data' of the CNN model for the different dynamometer axis combinations is presented in Figure 28. The CNN model trained on the XZ-axes datasets showed the lowest loss (0.02 ± 0.01). In contrast, the CNN models based on the Y and Z axes showed higher loss and a wider deviation than the other axis combinations.

Discussion
Based on the CNN results presented in Section 4.1 for vibration signals and Section 4.2 for force signals, the sensor data from different axes vary significantly in how well they represent the dynamic behavior of the abrasive process, which is directly related to the belt grinding condition.
For the vibration signals, the X-axis shows the highest classification accuracy among the single-axis assessments, compared with the other two axes (Y-axis and Z-axis). Among the combined axes, the XZ and YZ combinations achieve higher accuracy than the XY combination. Moreover, combining all three axes (XYZ) does not improve the accuracy. The accuracy summary of the three-axis accelerometer signal assessment is presented in Table 15. For the force signals, all single-axis assessments show a significant accuracy drop due to many misclassified datasets. In addition, among the double-axis combinations, XY achieves higher accuracy than XZ and YZ. Moreover, the triple-axis (XYZ) combination does not show better accuracy than the single-axis and double-axis assessments. The accuracy summary of the three-axis dynamometer signal assessment is presented in Table 16.
It remains to be investigated whether the proposed DL prediction can be generalized and translated across different materials during grinding, which is the research direction in progress.