1. Introduction
Driven by artificial intelligence (AI) and internet of things (IoT) technologies, manufacturers are paying increasing attention to smart manufacturing [1]. Predicting product quality is fundamental in smart manufacturing. For manufactured products whose quality cannot be measured quickly or conveniently, it is desirable to predict the product quality quickly and accurately based on static data, such as manufacturing parameters tuned before production, as well as dynamic data, such as manufacturing conditions gathered during production.
This paper focuses on product quality prediction for wire electrical discharge machining (WEDM) [2]. Specifically, it focuses on predicting the surface roughness Ra of the WEDM product. The surface roughness Ra is defined as the arithmetic mean of the absolute values of the profile deviations from the mean line of the roughness profile. Surface roughness has a strong influence on product properties, such as friction, corrosion resistance, wear resistance, light reflection, lubricant retention, heat transfer and distribution, strength, and fatigue strength. Hence, this paper focuses on predicting WEDM product surface roughness rather than other product qualities, such as the drum-shaped error and geometric accuracy.
WEDM is a thermo-electrical process that can produce complex 2D and 3D shapes from electrically conductive workpieces by using sparks of electrical discharges. The schematic diagram of a WEDM machine is shown in Figure 1 [3]. The workpiece material and a wire electrode are subject to a pulse voltage, usually of tens or even hundreds of volts. However, they are separated by dielectric fluid (e.g., deionized water), so they are insulated from each other. The wire electrode is usually made of copper, brass, or tungsten. It is wound between two spools and travels at a constant velocity. The workpiece is moved toward the wire electrode for machining. When the workpiece is very close to the wire electrode (e.g., when the gap between them is less than a few µm), the insulation breaks down and a plasma channel is formed in a small area. Discharge occurs between the workpiece and the wire electrode, generating sparks that produce intense heat, with temperatures of 8000 °C to 12,000 °C, to melt or even vaporize the workpiece material. The heat also vaporizes the dielectric fluid, causing a strong explosion that removes (or flushes away) the workpiece material debris. The pulse voltage can be used to control the discharges. When the pulse is off, the discharge stops and the insulation is restored. When the pulse is on again, discharges reoccur to remove workpiece material. Repeating the voltage pulse-on and pulse-off periods thus makes it possible to machine materials, even those of high strength and toughness.
The WEDM product quality, such as the surface roughness, is affected by many machining parameters, including the pulse-on time, pulse-off time, open voltage, gap voltage, peak current, wire tension, wire material, wire diameter, wire feed rate, servo feed rate, dielectric flushing pressure, dielectric flow rate, conductivity of the dielectric fluid, workpiece height, and thermal conductivity of the workpiece [3]. Due to the large number of parameters and their combinations, researchers usually fix some parameters and vary only a few when performing WEDM experiments to gather data. The gathered data are then analyzed and modeled to optimize WEDM processes [4,5,6,7,8,9,10,11,12,13,14] and to predict WEDM product quality [15,16,17,18,19,20,21,22,23,24,25,26,27,28].
A method, called MTF-CLSTM, is proposed in this paper to integrate the Markov transition field (MTF) [29] and the convolutional long short-term memory (CLSTM) neural network for WEDM product quality prediction. The MTF is used to represent dynamic WEDM manufacturing conditions as images. The images are then fed into a convolutional neural network (CNN) [30] to further extract features. Finally, the extracted features, along with static manufacturing parameters, are fed into a long short-term memory (LSTM) neural network [31] to predict the surface roughness of the WEDM product right after manufacturing. MTF-CLSTM is compared with related work [19,20,21,22,23,24,25,26,27,28] in many aspects. Only one existing method [27] resembles MTF-CLSTM in predicting WEDM workpiece surface roughness from dynamic manufacturing conditions along with static machining parameters. Experiments are conducted to evaluate the performance of MTF-CLSTM, and they show that MTF-CLSTM significantly outperforms the existing method in terms of the prediction mean absolute percentage error (MAPE).
The rest of this paper is organized as follows. Section 2 introduces some background knowledge, and Section 3 reviews related research results. The proposed method is elaborated in Section 4. Performance evaluation and comparisons of the proposed method with related methods are presented in Section 5. Finally, concluding remarks are drawn in Section 6.
  2. Background Knowledge
This section describes some background knowledge, including the MTF, CNN, and LSTM models. Below, the models are introduced one by one in separate subsections.
  2.1. Markov Transition Field (MTF)
The Markov transition field (MTF) is closely related to the Markov chain, as introduced below. The Markov chain can be used to model the state-to-state transitions of a system [32]. It uses the state transition diagram or the state transition matrix (also called the Markov transition matrix) to describe the probabilities of a state transiting to itself or to other states. For example, Figure 2 is the state transition diagram corresponding to a 4-state Markov chain.
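Below is the Markov transition matrix $P$ corresponding to the state transition diagram shown in Figure 2. In the diagram and the matrix, $s_1$, $s_2$, $s_3$, and $s_4$ are the four states, and $p_{ij}$ is the probability of state $s_i$ transiting to state $s_j$, where $1 \le i, j \le 4$, and $\sum_{j=1}^{4} p_{ij} = 1$.

$$P = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ p_{41} & p_{42} & p_{43} & p_{44} \end{bmatrix} \quad (1)$$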
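In general, an $m$-state Markov chain with states $s_1, s_2, \ldots, s_m$ can be represented by an $m \times m$ Markov transition matrix $P$, where $p_{ij}$ is the probability of state $s_i$ transiting to state $s_j$, $1 \le i, j \le m$, and $\sum_{j=1}^{m} p_{ij} = 1$, as shown below.

$$P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix} \quad (2)$$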
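As a small illustration of the Markov transition matrix, the following Python sketch checks the row-sum property and propagates a state distribution by one and two time steps. The matrix values and the helper names `is_row_stochastic` and `step` are hypothetical and chosen only for illustration; they are not taken from Figure 2.

```python
# Hypothetical 4-state transition matrix; the probabilities are illustrative only.
P = [
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.3, 0.3, 0.3, 0.1],
    [0.0, 0.2, 0.3, 0.5],
]


def is_row_stochastic(matrix, tol=1e-9):
    """Return True if all entries lie in [0, 1] and every row sums to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def step(distribution, matrix):
    """Propagate a distribution over states by one time step (d' = d P)."""
    m = len(matrix)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(m)) for j in range(m)
    ]


start = [1.0, 0.0, 0.0, 0.0]   # the chain is in the first state with certainty
after_one = step(start, P)     # equals the first row of P
after_two = step(after_one, P) # two-step transition probabilities from the first state
```

Starting with certainty in the first state, one step reproduces the first row of the matrix, which is exactly the meaning of the entries in that row.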
Wang and Oates introduced the concept of the Markov transition field [29], as described below. Given a time series $X = (x_1, x_2, \ldots, x_n)$, a data point $x_t$ at time step $t$ is first assigned to a corresponding state $s_i$ (or a quantile bin $q_i$), where $1 \le i \le m$, and $m$ is the number of states (or quantile bins). In this way, an $m \times m$ Markov transition matrix $P$ associated with the time series $X$ can be derived by first calculating $c_{ij}$, which is the count of data points in state $s_i$ transiting to state $s_j$. Afterwards, each entry $p_{ij}$ of $P$ can be derived as $p_{ij} = c_{ij} / \sum_{r=1}^{m} c_{ir}$. It can easily be checked that $\sum_{j=1}^{m} p_{ij} = 1$. The Markov transition field captures the multi-span transition probability between any two data points in $X$. It is an $n \times n$ matrix, as given below.

$$\mathrm{MTF} = \begin{bmatrix} p_{ij \mid x_1 \in s_i,\, x_1 \in s_j} & \cdots & p_{ij \mid x_1 \in s_i,\, x_n \in s_j} \\ \vdots & \ddots & \vdots \\ p_{ij \mid x_n \in s_i,\, x_1 \in s_j} & \cdots & p_{ij \mid x_n \in s_i,\, x_n \in s_j} \end{bmatrix} \quad (3)$$

In the above equation, $1 \le k, l \le n$, and $p_{ij \mid x_k \in s_i,\, x_l \in s_j}$ is the probability that state $s_i$ of data point $x_k$ at time step $k$ transits to state $s_j$ of data point $x_l$ at time step $l$. Compared with the Markov transition matrix, the Markov transition field carries extra temporal information besides the state transition probabilities. It is thus more suitable for representing and extracting features of time series. For a time series with a large number $n$ of data points, the associated Markov transition field is a large $n \times n$ matrix, which is usually regarded as an image for the purpose of analysis and visualization.
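The construction described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names are hypothetical, quantile bins are assigned by rank (so ties are split arbitrarily), states are numbered from 0, and a state with no outgoing transition gets an all-zero row.

```python
def quantile_states(series, m):
    """Assign each data point to one of m quantile bins (states 0..m-1) by rank."""
    n = len(series)
    order = sorted(range(n), key=lambda t: series[t])
    states = [0] * n
    for rank, t in enumerate(order):
        states[t] = min(rank * m // n, m - 1)
    return states


def transition_matrix(states, m):
    """Count state-to-state transitions and normalize each row to probabilities."""
    counts = [[0] * m for _ in range(m)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state with no outgoing transition keeps an all-zero row (assumption).
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def markov_transition_field(series, m):
    """Build the n x n field whose (k, l) entry is the transition probability
    from the state of data point k to the state of data point l."""
    states = quantile_states(series, m)
    p = transition_matrix(states, m)
    n = len(series)
    return [[p[states[k]][states[l]] for l in range(n)] for k in range(n)]


# Hypothetical 8-point time series, binned into 2 states.
series = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6, 0.75]
mtf = markov_transition_field(series, m=2)
```

For a series of n points the result is an n-by-n list of lists, which can be rendered directly as a grayscale image.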
  2.2. Convolutional Neural Network (CNN)
The convolutional neural network (CNN) is a powerful and efficient artificial neural network with the characteristics of neural parameter sharing and sparsity of neural connections. As shown in Figure 3, a CNN usually takes an image as input and contains the input layer, several groups of convolutional layers and pooling layers, one or more fully connected layers, and the output layer [30].
Filters (or kernels) are applied in convolutional layers to slide over the image and perform the convolution operation for extracting image features, which are called feature maps. One filter generates one feature map, corresponding to a channel to be fed into the following layer. Note that a non-linear activation function, such as the rectified linear unit (ReLU) function, is applied to the convolution result to produce the feature map. Filters have different sizes and different hyperparameters, such as the stride and the padding. A filter with width w, length l, stride s, and padding p slides over the image in a left-to-right and top-to-bottom manner. The filter jumps s pixels for every move, with p rows or columns of zeros padded on each image border. Filters are also used in the pooling layer to slide over image maps for the purpose of subsampling the image maps (i.e., reducing the image map sizes) while maintaining critical image map features. The maximum pooling layer, which returns the maximum value in the filter region, and the average pooling layer, which returns the average of the values in the filter region, are two typical pooling layers. Fully connected layers (or dense layers) come after the convolutional and the pooling layers.
For connecting to the dense layer, the image maps generated by the last pooling layer are flattened; that is, they are transformed from a multi-dimensional shape into a one-dimensional shape and concatenated as a multi-tuple vector. The dropout mechanism is usually applied in the dense layers to avoid over-fitting. The multi-tuple vector then goes through zero, one, or more dense layers, and finally the output layer. The softmax function is used in the output layer when the CNN is used for classifying the input image. However, another activation function, such as the sigmoid function, is used when the CNN is used for generating values associated with the input image.
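To make the roles of the filter size, stride, and padding concrete, the following Python sketch slides one filter over a single-channel image with zero padding and applies ReLU. The function names and the toy image are hypothetical; as in most CNN libraries, the "convolution" is computed without flipping the filter.

```python
def conv_output_size(n, w, s, p):
    """Output length along one axis for input n, filter w, stride s, padding p."""
    return (n + 2 * p - w) // s + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Slide one filter over a single-channel image, then apply ReLU."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    # Zero-pad the image borders.
    padded = [[0.0] * (n_cols + 2 * padding) for _ in range(n_rows + 2 * padding)]
    for r in range(n_rows):
        for c in range(n_cols):
            padded[r + padding][c + padding] = image[r][c]
    feature_map = []
    for r in range(0, len(padded) - k_rows + 1, stride):
        out_row = []
        for c in range(0, len(padded[0]) - k_cols + 1, stride):
            acc = sum(
                padded[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            out_row.append(max(0.0, acc))  # ReLU activation
        feature_map.append(out_row)
    return feature_map


# Toy 3 x 3 image and a 2 x 2 diagonal-difference filter (illustrative values).
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[-1, 0], [0, 1]]
feature_map = conv2d(image, kernel)  # 2 x 2 map, every entry equals 4
```

With a 3 × 3 filter, a stride of 1, and a padding of 1, `conv_output_size` returns the input size unchanged, which is what "same" padding means.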
  2.3. Long Short-Term Memory (LSTM) Neural Network
The long short-term memory (LSTM) neural network [31] is a well-known and effective deep learning model. It is a special type of recurrent neural network (RNN) [33]. RNNs are suitable for processing time series, as they add a loop to a neuron, allowing the output at the current time point to be used as an input at the next time point. Figure 4 shows the structure of the RNN and its corresponding logical structure for processing time series. In Figure 4, $x_t$ and $h_t$ are the input data and the output (or hidden state) at time point $t$, respectively. The equation associated with the RNN is shown below.

$$h_t = f(W \cdot [h_{t-1}, x_t] + b) \quad (4)$$

where $f$ stands for an activation function, $W$ stands for the weights, $[\cdot,\cdot]$ stands for the concatenation operation, and $b$ stands for the bias. The RNN behaves as if it could memorize previous input values and output results to generate the current output. However, it suffers from the vanishing gradient and exploding gradient problems when the neural network link weights are tuned with the traditional gradient descent error backpropagation mechanism. Therefore, it is difficult for the RNN to capture the dependency between input data that are far apart in the time series.
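The recurrence can be illustrated with a few lines of Python. This is a minimal sketch assuming the hyperbolic tangent as the activation function and a zero initial hidden state; the function names and the weight values are hypothetical.

```python
import math


def rnn_step(h_prev, x_t, W, b):
    """One RNN step: h_t = tanh(W . [h_{t-1}, x_t] + b)."""
    z = h_prev + x_t  # list concatenation implements [h_{t-1}, x_t]
    return [
        math.tanh(sum(w * v for w, v in zip(row, z)) + bias)
        for row, bias in zip(W, b)
    ]


def run_rnn(series, W, b, hidden_size):
    """Feed a scalar time series through the RNN and return the last hidden state."""
    h = [0.0] * hidden_size  # zero initial hidden state (assumption)
    for x_t in series:
        h = rnn_step(h, [x_t], W, b)
    return h


# One hidden unit, one input feature: W has shape 1 x (1 + 1).
W = [[0.5, 1.0]]
b = [0.0]
h_one = run_rnn([1.0], W, b, hidden_size=1)
h_two = run_rnn([1.0, 0.0], W, b, hidden_size=1)
```

The second call shows how the first input still influences the output after a later input of zero, through the hidden state carried by the loop.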
The LSTM neural network [31] can mitigate the vanishing gradient and exploding gradient problems of the RNN by including in an LSTM unit a memory cell and three gates: the input gate, the output gate, and the forget gate. Please refer to Figure 5 for the details of the unit structure of an LSTM neural network, by which information can be added to or removed from the memory cell via the control of the gates. The weights associated with the gates can be learned so that the memory cell can store the most necessary historical information to produce the most proper output. Six Equations (5)–(10) associated with the LSTM neural network are shown below. They are for the forget gate, input gate, intermediate value of the memory cell, memory cell, output gate, and output (hidden state), respectively. In the equations, $[\cdot,\cdot]$ stands for the concatenation operation, $\times$ stands for the Hadamard product (element-wise product) operation, $W$ stands for the weights, $b$ stands for the bias, $\sigma$ stands for the sigmoid function, and $\tanh$ stands for the hyperbolic tangent function.

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (5)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (6)$$
$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \quad (7)$$
$$c_t = f_t \times c_{t-1} + i_t \times \tilde{c}_t \quad (8)$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (9)$$
$$h_t = o_t \times \tanh(c_t) \quad (10)$$
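The gate computations of one LSTM unit can be sketched in plain Python as follows, in the order forget gate, input gate, intermediate memory cell value, memory cell, output gate, and hidden state. The parameter dictionary keys (`W_f`, `b_f`, and so on) and the all-zero weights are hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W . v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(h_prev, c_prev, x_t, p):
    """One LSTM step returning the new hidden state and memory cell."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]      # forget gate
    i_t = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]      # input gate
    c_hat = [math.tanh(a) for a in affine(p["W_c"], p["b_c"], z)]  # candidate value
    # Element-wise (Hadamard) products update the memory cell.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]             # hidden state
    return h_t, c_t


# One LSTM unit with one input feature and all-zero parameters (illustrative).
params = {name: [[0.0, 0.0]] for name in ("W_f", "W_i", "W_c", "W_o")}
params.update({name: [0.0] for name in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step([0.0], [1.0], [0.0], params)
```

With all-zero parameters every gate evaluates to 0.5, so half of the previous memory cell content is kept, which shows how the forget gate scales the stored information.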
  3. Related Work
This section reviews some research studies [19,20,21,22,23,24,25,26,27,28] related to WEDM product surface roughness prediction. They applied different prediction methods to different WEDM machines with varying machining parameters for machining workpieces of various materials. Below, the studies are elaborated one by one.
The study [19] used two methods, namely linear regression (LR) and an artificial neural network (ANN) with only one hidden layer, to predict the surface roughness of WEDM products made of AISI 4340 steel. The WEDM machine was from the Accutex GE series [34], using a brass wire electrode with a diameter of 0.25 mm. Four machining parameters, namely the pulse-on time, open voltage, wire feed rate, and dielectric flushing pressure, were used as inputs of the two methods. The MAPEs of the surface roughness prediction of the two methods were 7.17% and 4.94%, respectively.
The research [20] used a second-order regression method to predict the surface roughness of grade-2 titanium workpieces machined by the Electronica Sprintcut 734 WEDM machine [35]. The electrode was a brass wire with a diameter of 0.25 mm. The Box-Behnken design for the response surface methodology (RSM) was applied for setting up experiments. Six machining parameters, namely the pulse-on time, pulse-off time, peak current, gap voltage, wire feed, and wire tension, were used as inputs of the method for predicting WEDM product surface roughness. The MAPE of the prediction was 3%.
The research [21] focused on the prediction and the comparison of WEDM performance for the Al7075-TiB$_2$ (aluminum 7075-titanium diboride) in-situ composite in terms of the surface roughness, material removal rate, and dimensional error. The machining parameters considered by the research were the pulse-on time, pulse-off time, current, and bed speed. They were selected based on the Taguchi L27 orthogonal array. The ANN model was used for predicting WEDM performance. The minimum and the maximum deviations between the measured and the predicted surface roughness were 1.3% and 12.51%, respectively.
The research [22] estimated the surface roughness, accuracy, material removal rate, and electrode wear for the workpiece material of Al(5% wt)-Si$_3$N$_4$ based on machining parameters, such as the pulse-on time, pulse-off time, current, and bed speed. The Taguchi L27 orthogonal array was used to select parameters, and the ANN model was used to perform the prediction. The experiments were carried out on the Concord DK7720C WEDM machine [36] using a molybdenum wire of 0.18 mm diameter as the electrode. The predicted results were shown to coincide with the measured results.
The research [23] proposed a WEDM machining quality prediction method for workpieces of Inconel 718 material in terms of the surface roughness, cutting speed, material removal rate, and sparking gap. The prediction method was based on the cascade forward neural network (CFNN) and considered five machining parameters, namely the pulse-on time, pulse-off time, peak current, servo voltage, and flushing pressure. The Taguchi L256 orthogonal array was applied to set the machining parameter level combinations. The experiments were carried out on the Sodick AQ537L WEDM machine [37] using a brass wire of 0.25 mm diameter as the wire electrode. The surface roughness prediction MAPE of the method was 2.00%.
The research [24] used the Electronica Ultracut S0 WEDM machine [35] to conduct machining experiments on workpieces of the Al 2124 SiCp (0% wt, 15% wt, 20% wt) metal matrix composite (MMC) material for performing dimensional analysis (DA) and for modeling an ANN to predict the workpiece surface roughness and the material removal rate. Machining parameters, such as the pulse-on time, pulse-off time, duty cycle, wire feed rate, wire tension, peak current, and gap voltage, were taken as inputs. Furthermore, the density, thermal conductivity, thermal expansion, and SiC powder weight of the workpiece material were also taken as inputs. The surface roughness predictions of the DA and the ANN had correlation coefficients ($R^2$) of 0.92345 and 0.99999, respectively.
In the research [25], the Agie Charmilles CUT 20P WEDM machine [38] was used for machining workpieces of Al-Sn-SiC MMC materials with varying Sn and SiC weight percentages (5%, 10%, 15%, and 20% Sn wt% and SiC wt%) alloyed into aluminum. The workpiece was 6 mm in height, and a brass wire of 0.25 mm diameter was used as the wire electrode. Machining parameters, such as the pulse-on time, pulse-off time, and wire feed rate, along with Sn wt% and SiC wt%, were taken as inputs of an ANN for predicting the surface roughness. The Taguchi L32 orthogonal array was used for the experimental design of machining parameter combinations. The ANN model predicted the WEDM workpiece surface roughness with a correlation coefficient ($R^2$) of 0.9851.
The research [26] used the support vector machine (SVM) to predict the surface roughness of WEDM workpieces. The workpiece was made of AA6063, which is an Al-Si-Mg-based alloy, with a size of 150 mm by 100 mm by 15 mm. Four machining parameters were taken as SVM inputs; they were the pulse-on time, pulse-off time, servo voltage, and peak current. Experiments were designed according to the full factorial design with 3 levels. The mean square error (MSE) and correlation coefficient $R^2$ of the prediction were 0.389178 μm and 0.963426, respectively.
Two methods, based on the deep neural network (DNN) and the Markov chain deep neural network (MC-DNN), were proposed in [27] to predict the surface roughness of workpieces machined by the Chmer Q4025L WEDM machine [39]. The first method took static machining parameters, namely the pulse-on time, pulse-off time, open voltage, servo voltage, and wire tension, as inputs to perform prediction before machining. The second method took the above-mentioned static parameters along with dynamically changing machining conditions to perform prediction after machining. The conditions were the gap voltage, servo feed rate, normal-state count, and abnormal-state count. The condition data were regarded as time series and modeled by the Markov chain to derive the Markov transition matrix, whose entries served as features to be fed into the DNN for predicting the workpiece surface roughness. The MAPEs of the predictions of the two methods were 4.9% and 4.68%, respectively.
The research [28] conducted experiments of machining workpieces of Al7075 aluminum alloy with the Sodick SL400Q WEDM machine [37]. It proposed using four methods, namely support vector regression (SVR), quadratic support vector regression (Q-SVR), the extreme learning machine (ELM), and the weighted extreme learning machine (W-ELM), to predict workpiece surface roughness based on machining parameters, such as the pulse-on time, open voltage, dielectric flushing pressure, and wire feed. The prediction correlation coefficients $R^2$ of the four methods were 0.8824, 0.9613, 0.9411, and 0.9720, respectively. It was shown that the W-ELM model had the best prediction performance.
  4. The Proposed Method
The proposed method, called MTF-CLSTM, integrates the MTF model and the CLSTM neural network for WEDM product quality prediction right after manufacturing. The framework of the proposed MTF-CLSTM method is shown in Figure 6. Note that the combination of the CNN and the LSTM neural network is called the CLSTM neural network. This is why the proposed method is called the MTF-CLSTM method. MTF-CLSTM uses the MTF model to represent dynamic WEDM manufacturing conditions as images. It then uses the CNN to extract features of the images. The LSTM network then takes the features along with static machining parameters as the input data for predicting WEDM workpiece surface roughness. Note that we use the LSTM network rather than other models, as it is useful for identifying the relationship between input data points that may be far apart.
The MTF-CLSTM method takes five static machining parameters and four dynamically-changing machining conditions as inputs to perform prediction right after machining. The static machining parameters are the pulse-on time, pulse-off time, open voltage, servo voltage, and wire tension, whereas the dynamically-changing machining conditions are the gap voltage, servo feed rate, normal-state count, and abnormal-state count. The proposed MTF-CLSTM is elaborated below.
First, dynamic WEDM manufacturing condition data are fed into the MTF model to be represented as images. Four dynamically changing machining conditions are used by the proposed method. Hence, there are four sets of data, each of which corresponds to a time series. Each time series is transformed by the MTF model into an image. Figure 7 shows the process of the MTF transform. As shown in Figure 7, data points in a time series are classified into quantile bins (or states). The Markov transition matrix of state transition probabilities associated with the time series is then derived. The matrix is afterward used to derive the Markov transition field, which is equivalent to an image, called an MTF image. For the sake of analysis efficiency, the image further goes through a blurring process, which is similar to downsampling by average pooling, to obtain a smaller image, called a blurred MTF image. Note that the Markov transition field contains extra temporal features besides the state transition probability features.
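The blurring step can be illustrated with the following Python sketch, which averages non-overlapping square blocks of an image. It is a minimal sketch assuming that the image side length is an integer multiple of the target side length; the function name `blur` and the example sizes are hypothetical.

```python
def blur(image, out_size):
    """Downsize a square image by averaging non-overlapping square blocks."""
    n = len(image)
    if n % out_size != 0:
        raise ValueError("image size must be a multiple of the output size")
    k = n // out_size  # side length of each averaged block
    blurred = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            block = [
                image[r * k + i][c * k + j] for i in range(k) for j in range(k)
            ]
            row.append(sum(block) / len(block))
        blurred.append(row)
    return blurred


# Toy 4 x 4 image reduced to 2 x 2 by averaging 2 x 2 blocks.
small = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
small_blurred = blur(small, 2)  # [[3.5, 5.5], [11.5, 13.5]]
```

The same function would reduce, for example, a 128 × 128 MTF image to 16 × 16 by averaging 8 × 8 blocks.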
After the four sets of dynamically changing machining conditions are transformed into four MTF images, the four images are fed into the CNN as a four-channel image to extract more detailed features. The CNN used in the proposed method is shown in Figure 8. Specifically, the CNN takes images of the $16 \times 16 \times 4$ shape as inputs. Its first convolutional layer has 64 filters of size 
 with the stride of 1 and ‘same’ padding (i.e., padding the proper number of zeros so that the feature maps have the same size as the input images). The first pooling layer is an average pooling layer using filters of the size 
 with the stride of 2. The second convolutional layer has 16 filters of size 
 with the stride of 1 and ‘same’ padding. The second pooling layer is also an average pooling layer using filters of the size 
 with the stride of 2 and ‘same’ padding. Note that the LeakyReLU function is used as the activation function in both the first and the second convolutional layers. After the second pooling layer, there are image maps of the $4 \times 4 \times 16$ shape, which are flattened as a 256-tuple vector.
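The layer shapes can be traced with a short Python sketch. It rests on two assumptions that are not stated explicitly in the text: the input has the shape 16 × 16 × 4, and both pooling layers use a pool size of 2. Under these assumptions the trace reproduces the 256-tuple flattened vector; with 'same' padding and a stride of 1, the convolution output size does not depend on the filter size. The function names are hypothetical.

```python
def conv_same(shape, n_filters):
    """Stride-1 convolution with 'same' padding keeps height and width."""
    height, width, _ = shape
    return (height, width, n_filters)


def pool(shape, pool_size, stride, same_padding=False):
    """Pooling changes height and width and keeps the channel count."""
    height, width, channels = shape
    if same_padding:
        out_h = -(-height // stride)  # ceiling division
        out_w = -(-width // stride)
    else:
        out_h = (height - pool_size) // stride + 1
        out_w = (width - pool_size) // stride + 1
    return (out_h, out_w, channels)


shape = (16, 16, 4)                            # assumed input: four blurred MTF images
shape = conv_same(shape, 64)                   # first convolutional layer
shape = pool(shape, 2, 2)                      # first average pooling (assumed size 2)
shape = conv_same(shape, 16)                   # second convolutional layer
shape = pool(shape, 2, 2, same_padding=True)   # second average pooling (assumed size 2)

flattened = shape[0] * shape[1] * shape[2]     # length of the flattened vector
lstm_input_length = flattened + 5              # plus five static machining parameters
```

The final shape is 4 × 4 × 16, so the flattened vector has 256 entries and the vector passed on to the LSTM neural network has 261 entries.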
The flattened 256-tuple vector and the 5 static machining parameters are combined as a 261-tuple vector to be fed into the LSTM neural network for predicting the WEDM workpiece surface roughness. As shown in Figure 9, the LSTM neural network adopted by the MTF-CLSTM method contains 128 LSTM units and takes the hyperbolic tangent function as the activation function. The LSTM neural network outputs the surface roughness prediction after the 261-tuple vector is entirely fed into it.
  5. Performance Evaluation and Comparison
Experiments are conducted to evaluate the performance of the proposed MTF-CLSTM method for comparisons with related methods [19,20,21,22,23,24,25,26,27,28]. The experiments are performed on the Chmer Q4025L WEDM machine [39] for machining SKD61 steel workpieces with a length of 10 mm, a width of 10 mm, and a height of 30 mm. A brass wire with a diameter of 0.25 mm is used as the wire electrode. A total of 195 data files are gathered, which are divided into a training dataset of 185 files and a test dataset of 10 files. Five static manufacturing parameters and four dynamic manufacturing conditions are recorded in each data file. Each data file can be regarded as a time series of length 128.
The 128 data points of a data file are fed into the MTF model to be represented as a $128 \times 128$ image. The image is downsized to be a $16 \times 16$ blurred image, called a blurred MTF image. Note that the number of states (or quantile bins) of the MTF model is taken as 3, 4, or 5. By the MTF model, 4 images of the $16 \times 16$ size are generated, each of which corresponds to a dynamic manufacturing condition. The 4 images are then fed into the CNN for feature extraction with each image as a channel. The CNN layers include two convolutional layers and two pooling layers. The first convolutional layer has 64 filters, all with the size  and the stride 1, followed by an average pooling layer with the pool size of . The second convolutional layer has 16 filters, all with the size  and the stride 1, followed by an average pooling layer with the pool size . The activation function for each convolutional layer neuron is LeakyReLU. After the CNN flatten layer, 256 features are extracted. The 256 features, along with the 5 static manufacturing parameters, are then fed into the LSTM neural network with 128 LSTM units for identifying the time-dependency relationships between features to predict the surface roughness Ra of the WEDM workpiece. When training the CLSTM model, the batch size is taken as 32 with 150 epochs. Furthermore, the 10-fold cross validation and the early stopping mechanisms are used to avoid over-fitting the model.
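The two over-fitting safeguards mentioned above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the training code of this paper: the function names are hypothetical, the folds are contiguous and unshuffled, and the `patience` parameter of the early stopping rule is an assumed hyperparameter that the text does not specify.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k (training, validation) pairs."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        training = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((training, validation))
        start = stop
    return folds


def should_stop(val_losses, patience):
    """Return True if the validation loss has not improved in the last `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before


# 10-fold split of the 185 training files.
folds = k_fold_indices(185, 10)
```

In practice the indices would usually be shuffled before splitting, and the stopping rule would be evaluated after every epoch, up to the 150-epoch limit.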
The experimental results show that the mean absolute percentage errors (MAPEs) of the MTF-CLSTM method using the 3-, 4-, and 5-state MTF are 3.11%, 2.94%, and 3.24%, respectively. Figure 10 shows the predicted values versus the measured values (ground truth) of the WEDM product Ra in the unit of μm.
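For reference, the MAPE can be computed as in the following Python sketch. The function name and the Ra values are hypothetical and serve only to illustrate the calculation; they are not experimental data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical measured and predicted Ra values in micrometers.
measured = [2.0, 4.0]
predicted = [1.0, 5.0]
error = mape(measured, predicted)  # (50% + 25%) / 2 = 37.5%
```

Because each error is divided by the measured value, the MAPE is undefined when a measured value is zero, which does not occur for surface roughness.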
Table 1 shows the comparisons of the proposed method with related methods [19,20,21,22,23,24,25,26,27,28] for predicting WEDM workpiece surface roughness. Among the compared methods, the MC-DNN method proposed in [27] is the only one that, like MTF-CLSTM, predicts WEDM workpiece surface roughness by using both static machining parameters and dynamic manufacturing conditions. From Table 1, it can be observed that the proposed MTF-CLSTM method significantly outperforms the MC-DNN method for all the cases of the 3-, 4-, and 5-state MTFs in terms of the surface roughness prediction MAPE.
   6. Conclusions
This paper proposes a method, called MTF-CLSTM, that integrates the MTF model, the CNN, and the LSTM neural network for WEDM product quality prediction right after machining. MTF-CLSTM first uses the MTF model to transform the gathered data into images to extract temporal information and state transition probability information. It further uses the CNN to extract more detailed spatial features from the images. Finally, the LSTM neural network is used to capture temporal relationships between data that may be far apart. Experiments are conducted to evaluate the performance of the proposed method. The prediction MAPEs of the proposed method using the 3-, 4-, and 5-state MTF are 3.11%, 2.94%, and 3.24%, respectively. It can be observed that MTF-CLSTM outperforms DNN and MC-DNN, which are two methods using the same experimental settings as MTF-CLSTM. Besides performance, the proposed method is also compared with related research [19,20,21,22,23,24,25,26,27,28] in many other aspects, such as the WEDM machine used, workpiece material, workpiece size, and parameters used for the prediction.
In the future, the authors plan to apply the proposed MTF-CLSTM method to predict other product quality indicators, such as the dimension error and the material removal rate. The authors also plan to apply hyperparameter optimization techniques [40], such as Bayesian optimization and its variants, multi-bandit mechanisms, and population-based training (PBT) approaches, to facilitate hyperparameter tuning and improve performance. The hyperparameters to be tuned include the filter size, the number of filters, the number of layers, the number of neurons per layer, the dropout rate, and the activation functions.