Prediction of Kerf Width in Laser Cutting of Thin Non-Oriented Electrical Steel Sheets Using Convolutional Neural Network

Kerf width is one of the most important quality attributes in the cutting of thin metallic sheets. The aim of this study was to develop a convolutional neural network (CNN) model for the analysis and prediction of kerf width in the laser cutting of thin non-oriented electrical steel sheets. Three input process parameters were considered, namely, laser power, cutting speed, and pulse frequency, while one output parameter, kerf width, was evaluated. In total, 40 sets of experimental data were obtained for the development of the CNN model, including 36 sets for training with k-fold cross-validation and four sets for testing. Compared with a deep neural network (DNN) model and an extreme learning machine (ELM) model, the developed CNN model had the lowest mean absolute percentage error (MAPE) of 4.76% for the final test dataset in predicting kerf width. This indicates that the proposed CNN model is suitable for kerf width prediction in the laser cutting of thin non-oriented electrical steel sheets.


Introduction
Non-oriented electrical steels are produced from Fe-Si or Fe-Si-Al alloys and used as the core material in electrical machinery [1]. Generally, the stator and rotor of electric motors are formed by laminating non-oriented electrical steel sheets with a thickness of 0.1 mm to 1 mm. Such parts are usually stamped, which is cost-effective for mass production but offers limited precision. However, the expensive fixtures and tools it requires are a drawback of stamping for low-volume production or rapid prototyping [2]. Laser cutting is an alternative to stamping that can minimize the cost of small-quantity production [2]. In addition, the plastic and elastic stresses induced by mechanical cutting can deteriorate the magnetic properties of electrical steel [3] and the efficiency of the core [4][5][6]. Shear deformation at the cutting edge is typical of conventional mechanical cutting processes and may have a detrimental effect on core performance in electrical machinery [7,8]. The magnetic field and flux density of electrical steels are also affected by residual stress [9,10]. In laser cutting, by contrast, no remarkable shear deformation is found at the cutting edge [11].
Several process parameters influence the kerf quality of laser cutting, including laser power, pulse frequency, and cutting speed. A high-quality kerf in non-oriented electrical steel sheet is achievable with a proper choice of laser process parameters. Therefore, predicting kerf width for various combinations of laser cutting process parameters is essential. Several methods have been proposed for kerf width prediction in laser cutting of metals [12][13][14][15][16][17]. Mathematical models [12][13][14] have been widely used to assess kerf quality in laser cutting processes, and statistical analysis has been conducted to study the effect of process parameters on laser cutting quality [13,14]. Recently, artificial intelligence (AI) techniques have attracted more attention for effectively predicting the kerf quality of laser cutting [15,17]. Although a variety of AI methods exist for predicting cutting quality in laser cutting [15][16][17], studies on the prediction of kerf width in laser cutting of non-oriented electrical steel are limited.
The convolutional neural network (CNN) model is mainly used for image analysis in different fields, e.g., remote sensing [18,19] and biometrics [20,21]. Its major advantage is the convolutional approach: the CNN learns the relevant features automatically during training, so a manual feature extraction step is not needed. However, CNNs are also of interest for applications that do not involve images, such as one-dimensional data analysis. To improve the classification performance for patient-specific electrocardiograms (ECGs), a CNN model was developed that combines the two major tasks of feature extraction and classification, allowing data to be classified quickly and accurately [22]. Moreover, a simple CNN-based method was applied to the classification of vibrational spectroscopic data [23]. It was demonstrated that the CNN could select important spectral regions, was less dependent on preprocessing, and achieved excellent performance on raw data [23]. Another CNN method was developed for chemometric data regression [24], in which the standard CNN architecture was modified for one-dimensional data analysis. These promising, highly accurate results suggest that CNN models can perform feature extraction for the classification and regression of one-dimensional data [22][23][24]. For this reason, a CNN was employed in this study for kerf width prediction in the laser cutting of electrical steel sheets.
The objective of this study was to develop an effective CNN model for the prediction of kerf width in the laser cutting of thin metallic sheets. In particular, experiments were performed using a pulsed laser to cut 0.1 mm thick non-oriented electrical steel sheets. After cutting, the kerf width was measured using a charge-coupled device (CCD) camera. The three input parameters considered were laser power, pulse frequency, and cutting speed. The experimental data were used to train and test the CNN model through k-fold cross-validation. The kerf width prediction performance of the developed CNN model was also compared with that of other AI-based methods, namely, a deep neural network (DNN) and an extreme learning machine (ELM).

Experimental Setup
The experiments were carried out using a 20 W ytterbium pulsed fiber laser with a wavelength of 1064 nm (YLP-1-100-20, IPG Photonics Co., Oxford, MA, USA). As shown in Figure 1, the laser system consisted of scanning mirrors and a focusing lens attached to a computer numerical control (CNC) machine. The mirrors and lens moved together on a movable mechanism controlled by the CNC at cutting speeds ranging from 0.1 to 0.5 mm/s. Laser power (P), cutting speed (v), and pulse frequency (f) were selected as the input process parameters; the values considered are listed in Table 1. A schematic of the laser cutting of a 20 mm long straight slit on a commercial thin non-oriented electrical steel sheet is shown in Figure 2. The workpiece measured 20 mm in width, 40 mm in length, and 0.1 mm in thickness (ST-100, Nikkin Denji Kogyo Co., Ltd., Saitama, Japan). After cutting, the kerf width of each sample was measured using a computer vision system consisting of a light source, a CCD camera (BFS-U3-51S5M-C, VS Technology Co., Tokyo, Japan) with a CMOS sensor (IMX250, Sony Co., Tokyo, Japan), and a lens (VS-TCH3-60, VS Technology Co., Tokyo, Japan). As shown in Figure 2, the kerf width was determined as the average of the measurements along segment AB.
Mathematics 2021, 9, x FOR PEER REVIEW 3 of 12

Convolutional Neural Network Model
The CNN structure for kerf width prediction is shown in Figure 3. The first part of the CNN model performed feature extraction, and the fully connected layers that followed performed the regression analysis. The developed CNN model contained seven layers: the input layer, one convolutional layer, one flattened layer, three fully connected hidden layers, and the output layer. There were three input nodes, namely, laser power, pulse frequency, and cutting speed, and the output node was the kerf width. In total, 40 sets of experimental data (Table 2) were obtained for training and testing the developed CNN model, including 36 sets for training and four sets for testing. Note that dataset Nos. 37-40 in Table 2 were used for final testing.
In this study, the convolutional layer processed the input data using 64 convolution filters and output a feature map. Each convolutional filter extracted certain patterns from the input data. The convolutional operation in layer l is defined as follows [25]:

$$y_i^{l+1}(j) = K_i^l * x^l(j) + b_i^l, \tag{1}$$

where $K_i^l$ and $b_i^l$ denote the weight matrix and bias vector of the i-th filter in the l-th layer, $x^l(j)$ represents the input of the l-th layer, the asterisk denotes the convolution operator, and $y_i^{l+1}(j)$ is the output of the convolutional operation.
As shown in Figure 3, the input layer had L × 3 inputs, where L = 36 was the number of datasets used for training. The convolutional layer had a depth of 64 filters. Each of the 64 filters was applied to the windows in the input layer to generate L × 1 hidden features via the convolution operation. The convolutional layer was designed to improve accuracy and speed up training [26]. The rectified linear unit (ReLU) was then applied to increase the nonlinearity of the CNN [25,26], which can be expressed as follows [25]:

$$a_i^{l+1}(j) = f\left(y_i^{l+1}(j)\right) = \max\left\{0,\ y_i^{l+1}(j)\right\}, \tag{2}$$

where $a_i^{l+1}(j)$ denotes the activation of $y_i^{l+1}(j)$.
The flattened layer merged the extracted features into a single vector, which served as the input to the first of the three fully connected layers. The fully connected layers are similar to the hidden layers of a traditional artificial neural network (ANN) and had 64, 32, and 16 nodes, respectively. The output of each node j in the fully connected layers can be expressed as follows [25]:

$$z_j = \sum_{i} w_{ij}\, x_i + b_j, \tag{3}$$

where $x_i$ is the i-th input to the node, and w and b represent weight and bias, respectively. ReLU was used in all fully connected layers as a typical nonlinear activation function for the regression analysis [26].
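The forward pass described above (Equations (1) to (3), followed by flattening and the 64-32-16 fully connected stack) can be sketched in plain Python. This is a minimal, untrained illustration, not the authors' Keras model. Several details are assumptions: a kernel size of 3, so that each filter spans all three inputs and yields one feature per sample (consistent with the L × 1 feature maps described above); uniform weight initialization; a linear output node; and the hypothetical helper names `conv1d_valid`, `dense`, `init_params`, `extract_features`, and `predict`.

```python
import random

def conv1d_valid(x, kernel, bias):
    """Equation (1) for one filter: 'valid' 1-D convolution with stride 1.

    As in most deep learning libraries, this is a cross-correlation
    (the kernel is not flipped).
    """
    k = len(kernel)
    return [sum(kernel[m] * x[j + m] for m in range(k)) + bias
            for j in range(len(x) - k + 1)]

def relu(values):
    """Equation (2): element-wise max(0, y)."""
    return [max(0.0, v) for v in values]

def dense(a, weights, biases):
    """Equation (3): z_j = sum_i w_ij * a_i + b_j (weights[i][j] links input i to node j)."""
    return [sum(a[i] * weights[i][j] for i in range(len(a))) + biases[j]
            for j in range(len(biases))]

def init_params(rng, n_inputs=3, n_filters=64, kernel_size=3, hidden=(64, 32, 16)):
    """Randomly initialize filters and fully connected layers (assumed scheme, untrained)."""
    filters = [([rng.uniform(-0.5, 0.5) for _ in range(kernel_size)], 0.0)
               for _ in range(n_filters)]
    # Flattened length = number of filters x positions per feature map.
    sizes = [n_filters * (n_inputs - kernel_size + 1), *hidden, 1]
    layers = [([[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)],
               [0.0] * n_out)
              for n_in, n_out in zip(sizes, sizes[1:])]
    return filters, layers

def extract_features(sample, filters):
    """Convolution + ReLU for every filter, then flatten into one vector."""
    return [v for kernel, bias in filters
            for v in relu(conv1d_valid(sample, kernel, bias))]

def predict(sample, params):
    filters, layers = params
    a = extract_features(sample, filters)
    for weights, biases in layers[:-1]:   # hidden layers (64, 32, 16 nodes) with ReLU
        a = relu(dense(a, weights, biases))
    weights, biases = layers[-1]          # assumed linear output node: kerf width
    return dense(a, weights, biases)[0]

rng = random.Random(0)
params = init_params(rng)
x = [0.5, 0.2, 0.8]  # hypothetical normalized (power, frequency, speed)
features = extract_features(x, params[0])
print(len(features))  # 64
print(predict(x, params))
```

With a kernel that spans the whole input, each filter reduces to a weighted sum of the three process parameters. A smaller kernel (e.g. `kernel_size=2`) would instead produce two features per filter and a 128-element flattened vector.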
The backpropagation algorithm was used in the backward pass to adjust the weights of all layers so as to minimize the error between the predicted and experimental values [27]. The adaptive moment estimation (ADAM) algorithm [26] served as the optimizer within this procedure: backpropagation first computed the gradient of the error function [27], and ADAM then used this gradient to perform the gradient descent step. The weights were updated according to Equation (4), in which ADAM adapts the effective step size of each weight using running estimates of the first and second moments of the gradient [26]:

$$w_{ij}^{t+1} = w_{ij}^{t} - \eta \frac{\partial E}{\partial w_{ij}^{t}}, \tag{4}$$

where i is the node number of the previous layer, j is the node number of the next layer, $w_{ij}^{t}$ is the weight at time t, $w_{ij}^{t+1}$ is the updated weight at time t + 1, η is the learning rate, and E is the error function.
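For reference, the ADAM update can be sketched in plain Python. This is a minimal version of the standard rule of Kingma and Ba, using that paper's default hyperparameters (lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8); these defaults, the `Adam` class, and the toy quadratic error function are illustrative assumptions, not settings reported for this study. It shows how the gradient from backpropagation (the ∂E/∂w term in Equation (4)) is rescaled per weight by bias-corrected running moments.

```python
import math

class Adam:
    """Standard Adam update rule (Kingma & Ba) for a flat list of parameters."""

    def __init__(self, n_params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of the gradient (first moment)
        self.v = [0.0] * n_params  # running mean of the squared gradient (second moment)
        self.t = 0                 # time step

    def step(self, params, grads):
        """Return updated parameters given gradients dE/dw from backpropagation."""
        self.t += 1
        updated = []
        for i, (w, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated

# Toy problem (hypothetical): minimize E(w) = (w - 3)^2, whose gradient is 2 (w - 3).
opt = Adam(n_params=1, lr=0.1)
w = [0.0]
for _ in range(1000):
    w = opt.step(w, [2 * (w[0] - 3.0)])
print(w[0])  # approaches 3.0
```

Because each step divides by the square root of the running second moment, the first update has magnitude close to `lr` regardless of the raw gradient scale, which is one reason ADAM is less sensitive than plain gradient descent to the choice of η.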

Deep Neural Network Model
Unlike the CNN model, the DNN model makes no assumption about the structure of the input, and it tends to perform worse at feature extraction. It consists of one input layer, one or more hidden layers, and one output layer, as shown in Figure 4. Layer k transforms the activity of the previous layer ($h_{k-1}$) into the current layer ($h_k$) by multiplying it by a weight matrix $W_k$, adding a bias vector $b_k$, and applying a nonlinear activation function f, as expressed below [28].
$$h_k = f\left(W_k h_{k-1} + b_k\right) \tag{5}$$

As shown in Figure 4, the DNN employed in this study contained five layers, namely, the input layer, three hidden layers, and the output layer. The same three input nodes were used, namely, laser power, pulse frequency, and cutting speed. The hidden layers had 64, 32, and 16 nodes, respectively, and used the ReLU activation function. The output node was the kerf width. The ADAM algorithm was used for training, with a maximum of 300 iterations.
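Equation (5) and the size of the 3-64-32-16-1 network described above can be made concrete with a short plain-Python sketch. The helper names `dnn_layer` and `dense_param_count` are hypothetical, $W_k$ is stored as a list of rows (one row per node of layer k), and ReLU is used for the hidden layers as stated above.

```python
def relu(z):
    return max(0.0, z)

def dnn_layer(h_prev, W, b, f=relu):
    """Equation (5): h_k = f(W_k h_{k-1} + b_k), with W_k given as a list of rows."""
    return [f(sum(w * h for w, h in zip(row, h_prev)) + b_k)
            for row, b_k in zip(W, b)]

def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# One layer with two inputs and two nodes.
print(dnn_layer([1.0, -2.0], [[1.0, 1.0], [2.0, 0.0]], [0.0, 0.5]))  # [0.0, 2.5]

# DNN of this study: 3 inputs -> 64 -> 32 -> 16 hidden nodes -> 1 output.
print(dense_param_count([3, 64, 32, 16, 1]))  # 2881
```

Each layer contributes n_in × n_out weights plus n_out biases, giving 256 + 2080 + 528 + 17 = 2881 trainable parameters to be fitted from 36 training samples.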

Extreme Learning Machine Model
The concept of an ELM model was proposed by Huang et al. [29]. It is a single-hidden-layer feedforward neural network. The input weights and biases are randomly assigned without any iterative computation, and the output weights are calculated by the Moore-Penrose generalized inverse matrix [29].
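The structure of the ELM model used in this study is shown in Figure 5. It had a three-layer architecture. Consider a set of N samples $(x_i, t_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ represents the input data and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$ represents the output data. The output of the network can be expressed as follows [29]:

$$o_j = \sum_{i=1}^{L} \beta_i\, G\left(w_i \cdot x_j + b_i\right), \quad j = 1, 2, \ldots, N, \tag{6}$$

where L is the number of hidden nodes, $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T$ and $b_i$ denote the learning parameters of the i-th hidden node, $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T$ is the weight vector connecting the i-th hidden node and the output nodes, and G(x) is the activation function. Requiring the network outputs $o_j$ to equal the targets $t_j$, the N equations in Equation (6) can be written compactly in matrix form as follows [29]:

$$H\beta = T, \tag{7}$$

$$H = \begin{bmatrix} G(w_1 \cdot x_1 + b_1) & \cdots & G(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ G(w_1 \cdot x_N + b_1) & \cdots & G(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \tag{8}$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}, \tag{9}$$

where H is the hidden layer output matrix [29].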
The output weights are computed using the following equation [29]:

$$\beta = H^{\dagger} T, \tag{10}$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix H. For the ELM model employed in this study, the minimum error occurred with 34 hidden nodes. ReLU was also used as the activation function for the hidden nodes, as shown in Figure 5. A high-level neural network application programming interface (API) library (Keras 2.2.4, Google LLC, Mountain View, CA, USA) in the Python programming language (Python 3.7, Python Software Foundation, Wilmington, DE, USA) was employed to build the CNN, DNN, and ELM models.
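The ELM procedure above (a random, untrained hidden layer followed by output weights from Equation (10)) can be sketched in plain Python for a single output such as kerf width. This is an illustrative sketch under stated assumptions, not the authors' implementation. The input weights and biases are drawn uniformly from [-1, 1], the data are hypothetical, and the sketch uses 6 hidden nodes (rather than the 34 of this study) because it has only 12 toy samples. To stay within the standard library, the Moore-Penrose solution is approximated by the regularized normal equations $(H^T H + \lambda I)\beta = H^T t$ with a tiny, hypothetical λ (`lam`); their solution tends to $H^{\dagger} t$ as λ → 0. With NumPy available, `numpy.linalg.pinv(H) @ t` computes $H^{\dagger} t$ directly.

```python
import random

def relu(z):
    return max(0.0, z)

def hidden_output_matrix(X, W, b):
    """Equation (8): H[j][i] = G(w_i . x_j + b_i) for sample j and hidden node i."""
    return [[relu(sum(w_ik * x_k for w_ik, x_k in zip(w_i, x)) + b_i)
             for w_i, b_i in zip(W, b)] for x in X]

def solve(A, y):
    """Solve the square system A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def elm_fit(X, t, n_hidden, lam=1e-8, seed=0):
    """Random hidden layer (never trained), then least-squares output weights."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden_output_matrix(X, W, b)
    # (H^T H + lam*I) beta = H^T t; the lam*I term keeps the system positive definite.
    HtH = [[sum(row[p] * row[q] for row in H) + (lam if p == q else 0.0)
            for q in range(n_hidden)] for p in range(n_hidden)]
    Ht_t = [sum(row[p] * t_j for row, t_j in zip(H, t)) for p in range(n_hidden)]
    return W, b, solve(HtH, Ht_t)

def elm_predict(X, model):
    W, b, beta = model
    return [sum(h * beta_i for h, beta_i in zip(row, beta))
            for row in hidden_output_matrix(X, W, b)]

# Hypothetical normalized inputs (power, frequency, speed) and kerf widths.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(12)]
t = [0.05 + 0.02 * x[0] - 0.03 * x[2] for x in X]
model = elm_fit(X, t, n_hidden=6)
pred = elm_predict(X, model)
```

Only the output weights are fitted, and in a single linear solve; this is what makes ELM training fast compared with the iterative ADAM training of the CNN and DNN.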

Optimal CNN Model through k-Fold Cross-Validation
To evaluate the performance of the developed CNN model, the mean absolute percentage error (MAPE) was used as a statistical criterion, which can be expressed as follows:

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|, \tag{11}$$

where N is the number of samples, and $y_i$ and $\hat{y}_i$ represent the i-th experimental measurement and CNN prediction, respectively.
One of the most common techniques for model evaluation is k-fold cross-validation [30], which allows a model such as the CNN to be evaluated more fully and accurately with a small dataset. The main idea behind cross-validation is that every observation in the dataset has the opportunity to be tested. The key configuration parameter for k-fold cross-validation is k, which defines the number of folds into which a given dataset is split. Figure 6 shows an example of a k-fold cross-validation procedure with k = 10. To choose an appropriate k value, the results from each fold are averaged to produce a single estimate. First, the training dataset is divided into k roughly equal parts. Then, for each m = 1, 2, 3, …, k, $\mathrm{MAPE}_m$ is computed for the m-th fold, as shown in Figure 6.
The average MAPE over the k folds is then calculated as follows:

$$\overline{\mathrm{MAPE}} = \frac{1}{k} \sum_{m=1}^{k} \mathrm{MAPE}_m. \tag{12}$$

The process is then repeated for various values of k, and the best k is the one with the smallest average MAPE. In this study, k = 5, 6, 7, 8, 9, and 10 were compared to determine the optimal value. The MAPE values in Figure 7 reveal that, for the given CNN model and datasets, the MAPE exceeded 7.9% for all k values except k = 8 and 10, and that k = 10 gave slightly more accurate predictions than k = 8. Therefore, k = 10 was selected as the optimal number of folds for the CNN model.
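The evaluation loop above (Equation (11) for each fold, Equation (12) across folds, and the search over k = 5 to 10) can be sketched as follows. Several choices are assumptions for illustration only: contiguous, unshuffled folds in which the first n mod k folds hold one extra sample; a mean-value stand-in model; and synthetic data. In the study, the CNN would be retrained on each training split in place of the hypothetical `fit_mean` and `predict_mean`.

```python
import random

def mape(y_true, y_pred):
    """Equation (11): mean absolute percentage error, in percent."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return 100.0 / len(y_true) * sum(abs((y - y_hat) / y)
                                     for y, y_hat in zip(y_true, y_pred))

def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of roughly equal size."""
    sizes = [n_samples // k + (1 if m < n_samples % k else 0) for m in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validated_mape(X, y, k, fit, predict):
    """Equation (12): average of MAPE_m over the k validation folds."""
    scores = []
    for val_idx in kfold_indices(len(y), k):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_hat = predict(model, [X[i] for i in val_idx])
        scores.append(mape([y[i] for i in val_idx], y_hat))
    return sum(scores) / k

def select_k(X, y, candidates, fit, predict):
    """Return the k with the smallest average MAPE, plus all scores."""
    scores = {k: cross_validated_mape(X, y, k, fit, predict) for k in candidates}
    return min(scores, key=scores.get), scores

# Hypothetical stand-in model: always predicts the mean training kerf width.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, X_val):
    return [model] * len(X_val)

rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(36)]  # 36 training sets
y = [40.0 + 10.0 * x[2] + rng.gauss(0.0, 1.0) for x in X]  # hypothetical kerf widths
best_k, scores = select_k(X, y, range(5, 11), fit_mean, predict_mean)
print(best_k, round(scores[best_k], 2))
```

With 36 training sets and k = 10, the folds hold 4, 4, 4, 4, 4, 4, 3, 3, 3, and 3 samples, so every observation is used for validation exactly once.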

Comparison with Other ANN Models
A random forest algorithm was applied to explore the importance of the selected input parameters with respect to the output [31]. As shown in Figure 8, cutting speed was the most important of the three inputs affecting the kerf width. The relative importance of laser power, pulse frequency, and cutting speed on kerf width was 0.15, 0.37, and 0.48, respectively.
To verify the effectiveness of the proposed CNN model, it was compared with the DNN and ELM models described above. Figure 9 compares the experimental and predicted kerf widths for the same final test dataset using the three ANN models. The prediction accuracy of the three models was evaluated using Equation (11), and the resulting MAPE values for the same final test dataset are listed in Table 3. As shown in Table 3, the CNN model exhibited an average MAPE of 4.76% in kerf width prediction, lower than the 15.24% of the ELM model and the 16.93% of the DNN model. This suggests that the structure of the CNN model developed in this study considerably improved feature learning compared with the DNN and ELM models, which had limited performance when using the original data without feature extraction.
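To put the Table 3 values in relative terms, the reduction in MAPE achieved by the CNN can be computed directly from the reported figures. This is simple arithmetic on the published numbers, not an additional result; the helper `relative_reduction` is hypothetical, and with only four samples in the final test dataset such percentages should be read with caution.

```python
# MAPE values (%) on the final test dataset, as reported in Table 3.
mape_results = {"CNN": 4.76, "ELM": 15.24, "DNN": 16.93}

def relative_reduction(baseline, improved):
    """Percentage by which `improved` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline

for name in ("ELM", "DNN"):
    r = relative_reduction(mape_results[name], mape_results["CNN"])
    print(f"CNN vs {name}: {r:.1f}% lower MAPE")
# CNN vs ELM: 68.8% lower MAPE
# CNN vs DNN: 71.9% lower MAPE
```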

Conclusions
In summary, a CNN model was developed to predict the kerf width in the laser cutting of non-oriented electrical steel sheets. The results indicate that the CNN structure is suitable for accurate kerf width prediction. The conclusions drawn from the results are as follows:
(1) The k-fold cross-validation method was employed to evaluate the generalization ability of the developed CNN model and to select the number of folds. For k = 10, the average MAPE of the validation dataset had the lowest value of 7.51% among the k values tested.
(2) In comparison with other ANN methods such as DNN and ELM, the CNN approach developed in this study performed best, achieving the lowest MAPE of 4.76% on the same final test dataset. Therefore, the developed CNN with k-fold cross-validation is effective for kerf width prediction in the given laser cutting process.
(3) The relative importance of the three input parameters was analyzed using a random forest algorithm. The most important variable for kerf width was the cutting speed, followed by the pulse frequency and the laser power.