Spark Analysis Based on the CNN-GRU Model for WEDM Process

Wire electrical discharge machining (WEDM), widely used in the manufacturing industry to fabricate micro and precision parts, is a nontraditional machining method in which discharge energy is transformed into thermal energy to remove material efficiently. A great amount of research has been conducted based on pulse characteristics; however, little research on spark image-based approaches has been reported. This paper proposes a discharge spark image-based approach. A model is introduced to predict the discharge status from spark image features acquired by a synchronous high-speed image and waveform acquisition system. First, the relationship between the spark image features (e.g., area, energy, energy density, and distribution) and the discharge status is explored through a set of experiments. Traditional methods claim that the pulse waveform of the "short" status corresponds to a non-machining status; through experiments based on spark images, we conclude that this is not always true. Second, a deep learning model based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) is proposed to predict the discharge status. A time series of spark image features extracted by the CNN forms a 3D feature space, which is used by the GRU to predict the discharge status. Moreover, a quantitative labeling method for the machining state is proposed to improve the stability of the model. Owing to the effective features and the quantitative labeling method, the proposed approach achieves better prediction results than a single GRU model.


Introduction
Wire electrical discharge machining (WEDM) is a non-conventional machining method that removes material through the high temperature produced by a series of repetitive electrical discharges of short duration and high current density between the wire tool and the workpiece [1][2][3][4]. Because each discharge erodes only a minute amount of material, WEDM is commonly used to machine micro and precision parts. For example, Ahmed et al. [5] manufactured high-aspect-ratio thin structures of micrometer thickness (117-500 µm) from D2 steel by WEDM. To produce microchannels with the desired geometry and acceptable surface quality, Saleh et al. [6] investigated the capability of WEDM to produce microchannels in the nickel-based alloy Monel 400. WEDM is a typical variant of EDM; both are based on the phenomenon of spark erosion, and much research has been carried out on improving performance measures, optimizing process variables, and monitoring and controlling the sparking process [4,7]. WEDM parameter settings as well compute the weightage of the criteria by applying the ARAS ranking method and AHP method, respectively [27]. Suganthi et al. [28] compared ANN and ANFIS models and showed that ANFIS outperformed ANN in modeling and prediction accuracy. Sarkheyli et al. [29] proposed a hybrid technique based on ANFIS and a modified genetic algorithm (MGA) to train a model that predicts the SR and MRR in the WEDM process. Recently, Naresh et al. [22] also concluded that ANFIS is a more accurate and effective soft computing method than ANN for predicting WEDM responses such as the MRR and SR of Nitinol alloy. In addition, Somashekhar et al. [30] combined ANN and a genetic algorithm (GA) to optimize the MRR in micro-EDM and showed that a back-propagation network combined with the GA can successfully synthesize the optimum input conditions that maximize the MRR. Ong et al. [31] developed a radial basis function neural network model with a small mean-squared error (MSE) to predict the MRR and EWR of the EDM process. Ming et al. [32] optimized cutting parameters in the WEDM process by integrating an ANN with a wolf pack algorithm based on the strategy of the leader (LWPA). They found that the integrated ANN-LWPA system reduced the value of the fitness function compared with an experimental regression model, an ANN model, and the conventional LWPA. Furthermore, Yan et al. developed a servo control system based on a fuzzy rule-based control and adjusting strategy, as well as a hierarchical adaptive control system that uses an ANN to estimate the workpiece height online; compared with the commonly used gap voltage control system, these systems reduced wire breakage and improved machining stability and speed [33,34].
Previous research on the modeling and optimization of WEDM (or EDM) has mainly used classical approaches such as Taguchi methods and ANN, or hybrid methods such as ANN-LWPA and GPR-MOGA, and has studied performance in terms of process parameters such as pulse current and wire speed. However, these studies lack a visual perspective, even though visual information such as images of sparks and wire vibrations is useful and important in the WEDM process.
In recent years, some novel research has emerged. Zhang et al. [35] first proposed a hybrid WEDM technique assisted by ultrasonic vibration (USV) and a magnetic field (MF) to improve machining performance. They then carried out theoretical and experimental studies to explain its improvement mechanism and achieved a higher MRR (44.0%) and a lower SR (30.5%) [36]. Recently, Ablyaz et al. [37] found a 118% increase in MRR and a 613.6% enhancement in micro-hardness under the influence of a magnetic field during EDM of an Al-SiC metal matrix composite. An analysis of the influence of pulse type on process performance indicators showed that MRR and TWR increased with the number of normal pulses, while TWR decreased as arcs and delayed pulses increased [38]. Moreover, Aggarwal et al. [39] found that the cutting rate and surface roughness were significantly affected by the input parameters (Ton, Toff, SV, WF) during WEDM of Ni-27Cu-3.15Al-2Fe-1.5Mn, and derived empirical relations between them. Gurupavan et al. [40] proposed a machine vision system that provides wire electrode status and workpiece surface texture information in WEDM of aluminum silicon nitride (AlSi3N4) composite by acquiring images of the wire electrode and the machined surface specimens. Sanchez et al. [41] presented computer simulation software for analyzing errors in WEDM taper-cutting, which considerably reduced experimental work. To address the limitations of existing servo systems in machining semiconductors by WEDM, Liu et al. [42] developed a new servo system based on current pulse probability detection. Zhang et al. [43] used wavelet moment analysis (WMA), Hu moment analysis (HMA), fractal dimension analysis (FDA), local geometric characteristics (LGC), and global geometric characteristics (GGC) to extract waveform image features and reduce image dimension, and then developed a two-stage classification method based on SVM and regression to discriminate and classify discharge pulses online in the WEDM-HS process. This is because high-frequency discharges and micro-energy discharges can seriously distort the discharge signal [44].
The approaches mentioned above perform well in some cases; however, they suffer from limitations such as low efficiency, instability, and even system breakdown [45], for the following reasons: (1) the voltage and current signals are nonstationary, nonlinear, and internally coupled; (2) conventional methods apply lagging (hysteretic) control, and because the discharge state changes so fast, a control strategy derived from historical states is not always suitable for the current state.
Different from previous research, this paper presents a novel approach and perspective for predicting the discharge status from spark images captured by a high-speed camera. Considering the spark phenomenon in WEDM, spark images are collected with a high-speed camera and a series of experimental analyses is conducted. Most of the research reviewed above focused only on the relationship between processing technique and electrical parameters, ignoring the essential phenomenon of the WEDM (or EDM) process: the generation of electric sparks. Although some research has begun to apply ANN [46] and other intelligent algorithms to control systems and online prediction [47,48], there is still great potential for improvement. In other words, methods based on electrical parameters and traditional intelligent algorithms encounter a bottleneck because of the limitations mentioned above. With the increase in computing power, artificial neural networks have become more effective than traditional methods for image feature extraction and sequence feature extraction [49]. Recently, Zhang et al. [50] presented an intelligent pulse classification method using different recurrent neural networks (RNNs), and the results verified that RNNs perform well in sequence recognition tasks in the EDM process. Lee et al. [51] combined a CNN and an RNN to extract time-dependent and time-independent features in the chemical mechanical planarization process. Bustillo et al. [52] found that AdaBoost ensembles provided the highest accuracy and were more easily optimized than artificial neural networks when optimizing a friction-drilling process. In addition, Chen et al. [53] noted that extracting signal characteristics is fairly time-consuming and therefore proposed a multi-scale CNN and LSTM model for bearing fault diagnosis.
Their study found that the deep learning method performs better than traditional methods such as empirical mode decomposition, fast Fourier transform, and discrete wavelet transform. Considering the essential phenomenon of sparks in the WEDM process and the advantages of recent image processing and deep learning methods, this paper proposes a new spark image identification method based on a convolutional neural network (CNN) and a GRU to predict the discharge status. Through the CNN, the features of spark images are extracted by a series of convolution kernels. To train the deep learning network, the spark images are labeled by mapping the voltage-current state (through the waveform areas and power) onto them.
In past studies, discrete values were generally used to define the processing state, such as open circuit, short circuit, processing, and other states. Since a discrete processing state is generally obtained by thresholding, it is very sensitive to the boundary values. This can make the state unstable and increase the probability of misjudgment. To overcome this problem, this paper defines the processing state with continuous quantities: the area of the voltage waveform, the area of the current waveform, and the power. On one hand, using continuous values to evaluate the machining state greatly improves the stability of the model. On the other hand, using continuous processing states as labels simplifies the design of the subsequent neural network model: it turns the classification task into a regression task and avoids the convergence difficulties caused by frequent jumps of discrete labels for nearly identical feature inputs.
Since the frames in the collected spark image sequence are correlated, the current frame may retain part of the information of its previous frame. This residual information becomes interference when a single frame is used as the input of the network model. In view of this, this paper proposes two models, named "Sequence2Sequence" and "Image2Sequence", to predict the discharge status from spark images. Both models take information about the current and past frames as input. In this way, the information in past frames reflects the motion trajectory and motion state of the spark, which are important indicators of the processing state.
Therefore, the continuous labeling definition and the two network models proposed in this paper are important steps toward determining the relationship between spark images and the machining state. The spark image is the most essential and direct phenomenon of the WEDM process, and the relationship between spark images and the processing state can support the exploration of higher-precision processing technology and lower-cost multi-station real-time control systems. This paper is organized as follows. Section 2 introduces the working principle of WEDM, the main characteristics of spark images, the principles of RNN and CNN, and the dynamic time warping (DTW) algorithm. Section 3 introduces the synchronous acquisition and preprocessing of voltage, current, and image data. Section 4 describes the experimental setup. In Section 5, the experimental data are analyzed statistically based on the theory in Section 2. In Section 6, the two RNN models are trained on the data, and the results are analyzed and discussed. Finally, conclusions are given.

Figure 1 shows an image of a spark during processing. Let H and W denote the height and width of the spark image, respectively. Point (x0, y0) denotes the spark center, and p(x0, y0) denotes the pixel value at (x0, y0). According to the characteristics of sparks, eight kinds of features are defined as follows; their representation information is given in Table 1.

Table 1. Features and representation information.

Features Representation Information
Area
Represents the area of the spark in the image. To some extent, it reflects the amount of erosion in processing.

Energy
Represents the energy of the spark in the image. It is closely related to processing parameters such as current and voltage.

Energy density
Reflects the concentration of energy. It is the amount of energy per unit area, which is closely related to the processing state at the processing center.

Spark area distribution
Represents the area distribution of the processing region. It is closely related to the wire direction.

Spark energy distribution
Represents the energy distribution of the processing region. It is closely related to the wire direction.

Spark number
Represents the number of sparks. It reflects the morphological characteristics of the spark process, such as the gathering sparks generated by discharge and the dissipating sparks generated in the open-circuit state.

HU moment
Represents other geometric features of the spark region in the image which are invariant to rotation, translation, scale, and so on.
As the working ranges of the response variables vary in both units and magnitude, normalization of the data is crucial. Each response is normalized into a dimensionless value so that the responses are comparable with each other. The feature extraction and normalization methods are given as follows.

Spark Feature
Let M(x, y) denote the threshold image of the spark image, obtained by applying the threshold function (Equation (1)) to P(x, y), the pixel value of point (x, y); M(x, y) is 1 for pixels in the spark region and 0 otherwise. The area of the spark (S) is counted as

$$S = \sum_{x=1}^{W} \sum_{y=1}^{H} M(x, y) \qquad (2)$$

and the normalized area (S_n) is calculated as

$$S_n = \frac{S}{H \times W} \qquad (3)$$

where H and W denote the height and width of the spark image, respectively.
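The area feature can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' implementation: the binarization threshold `t = 128` and the function names `threshold_image` and `normalized_area` are hypothetical, because the threshold value used in the paper is not given in this section.

```python
import numpy as np


def threshold_image(p: np.ndarray, t: int = 128) -> np.ndarray:
    """Binary threshold image M(x, y): 1 where P(x, y) > t, else 0.

    The threshold t = 128 is a hypothetical placeholder, not a value from the paper.
    """
    return (p > t).astype(np.uint8)


def normalized_area(p: np.ndarray, t: int = 128) -> float:
    """Normalized spark area S_n = S / (H * W), where S counts spark pixels."""
    h, w = p.shape
    s = int(threshold_image(p, t).sum())  # S: number of pixels with M(x, y) = 1
    return s / (h * w)


# Usage: a 10 x 10 dark frame containing a bright 2 x 3 spark patch.
frame = np.zeros((10, 10), dtype=np.uint8)
frame[4:6, 3:6] = 250
print(normalized_area(frame))  # 6 spark pixels / 100 pixels -> 0.06
```

Dividing by H × W keeps S_n in [0, 1] regardless of camera resolution, which is what makes it comparable with the other normalized features.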

Energy (E)
According to a previous study [54], the Gaussian distribution of heat input proposed by Patel et al. has been used to approximate the heat from the plasma. The heat flux q_w(r) at radius r follows this Gaussian distribution [13] and is governed by R_pc, the spark radius (µm) at the work surface, and by the maximum heat flux q_0, which can be calculated [13] from F_c, the fraction of the total EDM spark power going to the cathode, the discharge voltage V (V), and the discharge current I (A).
Ikai et al. [55] derived a semiempirical equation for the spark radius R_pc, the "equivalent heat input radius", as a function of the discharge current I and the spark on-time T_d; this equation is more realistic than other approaches. In this paper, the spark energy is defined as a function of the spark radius (Equation (7)), where point (x0, y0) denotes the spark center of the spark image and f(d) is a function of the distance d between a spark point and the spark center. The normalized energy (E_n) is the spark energy divided by K, where K denotes the total energy when the spark image is entirely white, i.e., P(x, y) = 255 in Equation (7).
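A minimal sketch of the normalized energy E_n follows. The exact form of f(d) is not reproduced in this section, so the code assumes a Gaussian weight f(d) = exp(-4.5 (d/R)^2) in pixel units; the weight, the pixel radius `radius`, and the function name `normalized_energy` are all illustrative assumptions. K is computed as described in the text: the same weighted sum for an all-white image (every pixel 255).

```python
import numpy as np


def normalized_energy(p: np.ndarray, center: tuple, radius: float) -> float:
    """E_n = E / K with an ASSUMED Gaussian distance weight f(d)."""
    h, w = p.shape
    ys, xs = np.mgrid[0:h, 0:w]             # row (y) and column (x) coordinates
    x0, y0 = center
    d = np.hypot(xs - x0, ys - y0)           # distance of each pixel to the spark center
    f = np.exp(-4.5 * (d / radius) ** 2)     # assumed f(d); the paper's form may differ
    e = float((p.astype(float) * f).sum())   # E: distance-weighted sum of pixel values
    k = float((255.0 * f).sum())             # K: E for an all-white (255) image
    return e / k


# Usage: one bright pixel exactly at the spark center of a 21 x 21 frame.
frame = np.zeros((21, 21), dtype=np.uint8)
frame[10, 10] = 255
print(normalized_energy(frame, center=(10, 10), radius=5.0))
```

Because K uses the same weights as E, E_n always lies in [0, 1], and a bright pixel contributes more the closer it is to the spark center.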

Spark Energy Density (ESR)
$$ESR = \frac{E_n}{S_n} \qquad (12)$$

where E_n and S_n are calculated by Equations (11) and (3), respectively. Equation (12) shows that ESR reflects the concentration of energy, i.e., the energy per unit spark area.

Spark Area Distribution (SD k )
As shown in Figure 2, the spark image is divided into four parts. According to Equation (2), the spark area of each part can be calculated as follows.
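The four-part split can be sketched as follows. The sketch assumes, hypothetically, that the threshold image is divided into quadrants around the spark center (x0, y0) and that each SD_k is the fraction of the total spark area falling in part k; Figure 2 may define the parts and the normalization differently, and the function name `area_distribution` is illustrative.

```python
import numpy as np


def area_distribution(m: np.ndarray, center: tuple) -> np.ndarray:
    """SD_k (k = 1..4): share of spark pixels in each quadrant around the spark center.

    The quadrant order (top-left, top-right, bottom-left, bottom-right) and the
    normalization by the total spark area are assumptions for illustration.
    """
    x0, y0 = center  # x = column index, y = row index
    parts = [m[:y0, :x0], m[:y0, x0:], m[y0:, :x0], m[y0:, x0:]]
    counts = np.array([part.sum() for part in parts], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts  # no spark: all zeros


# Usage: a binary threshold image M(x, y) with four spark pixels.
m = np.zeros((10, 10), dtype=np.uint8)
m[1, 1] = m[1, 2] = m[2, 1] = 1  # three spark pixels in the top-left quadrant
m[8, 8] = 1                      # one spark pixel in the bottom-right quadrant
print(area_distribution(m, center=(5, 5)))  # SD = [0.75, 0, 0, 0.25]
```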

Spark Energy Distribution (ED k )
Similarly to the calculations of spark area distribution, the spark energy distributions can be calculated based on Equation (7).
ED_k reflects the direction of the explosion and indirectly reflects the distribution of erosion on the workpiece.
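Analogously, ED_k can be sketched as the share of distance-weighted pixel energy per part. Two assumptions carry over here and are not taken from the paper: the image is split into quadrants around the spark center, and the per-pixel energy is P(x, y) multiplied by an assumed Gaussian weight f(d) = exp(-4.5 (d/R)^2). The name `energy_distribution` is illustrative.

```python
import numpy as np


def energy_distribution(p: np.ndarray, center: tuple, radius: float) -> np.ndarray:
    """ED_k (k = 1..4): share of weighted spark energy in each quadrant.

    Uses an ASSUMED Gaussian weight f(d) and a quadrant split around the center
    (order: top-left, top-right, bottom-left, bottom-right).
    """
    h, w = p.shape
    x0, y0 = center
    ys, xs = np.mgrid[0:h, 0:w]
    f = np.exp(-4.5 * (np.hypot(xs - x0, ys - y0) / radius) ** 2)  # assumed f(d)
    e = p.astype(float) * f                                         # per-pixel energy
    parts = [e[:y0, :x0], e[:y0, x0:], e[y0:, :x0], e[y0:, x0:]]
    energies = np.array([part.sum() for part in parts])
    total = energies.sum()
    return energies / total if total > 0 else energies


# Usage: two equally bright pixels, equidistant from the center, in opposite quadrants.
p = np.zeros((10, 10), dtype=np.uint8)
p[3, 3] = p[7, 7] = 100
print(energy_distribution(p, center=(5, 5), radius=4.0))  # ED = [0.5, 0, 0, 0.5]
```

A lopsided ED_k vector then points toward the quadrant into which the discharge energy was directed, which is the "direction of the explosion" the text refers to.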

HU Moment
Classical geometric moments m_pq of an image I_xy are calculated with the equation

$$m_{pq} = \sum_{x} \sum_{y} x^{p} y^{q} I_{xy}$$

Hu [56] first proposed seven invariant moments u_1-u_7 using the second- and third-order normalized central moments. Hu moments are widely used in image recognition because of basic properties including rotation, translation, and scale invariance [57,58].
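For reference, the seven Hu invariants can be computed directly from their textbook definitions, as in the NumPy sketch below (the function name `hu_moments` and the test shape are illustrative). It builds raw moments m_pq, the intensity centroid, normalized central moments η_pq = μ_pq / m_00^(1 + (p + q)/2), and then Hu's seven combinations. OpenCV offers the same computation through `cv2.moments` and `cv2.HuMoments`; whether the paper used either is not stated here.

```python
import numpy as np


def hu_moments(img) -> np.ndarray:
    """Seven Hu invariant moments from normalized central moments."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]  # y = row, x = column

    def m(p, q):  # raw geometric moment m_pq
        return float((xs ** p * ys ** q * img).sum())

    m00 = m(0, 0)
    xc, yc = m(1, 0) / m00, m(0, 1) / m00  # intensity centroid

    def eta(p, q):  # normalized central moment eta_pq
        mu = float(((xs - xc) ** p * (ys - yc) ** q * img).sum())
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2) + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2) - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ])


# Usage: an L-shaped blob keeps the same invariants after a 90-degree rotation.
shape = np.zeros((20, 20))
shape[3:11, 3:5] = 1.0  # vertical bar
shape[9:11, 5:9] = 1.0  # horizontal bar
print(np.allclose(hu_moments(shape), hu_moments(np.rot90(shape))))  # True
```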

Dynamic Time Warping
In time series acquisition, the electrical parameters and spark images differ in several respects, such as sampling rate, physical properties, and the time shift of the phenomena they record. Additionally, unavoidable noise in the acquisition system introduces time shifts between the two types of time series even when they describe the same discharge status. Consequently, Euclidean distance is not appropriate for measuring the similarity of the two types of time series. Euclidean distance and its variants have several drawbacks that make them inappropriate for certain applications [59]:
(1) It compares only time series of the same length.
(2) It does not handle outliers or noise.
(3) It is very sensitive to six signal transformations: shifting, uniform amplitude scaling, uniform time scaling, uniform bi-scaling, time warping, and nonuniform amplitude scaling.
DTW has proven to be a very effective similarity measure, since it minimizes the effects of shifting and distortion in time [60]. In this study, the sampling rate of the current and voltage differs from that of the spark images, so the sampled sequences differ in length. The DTW algorithm is therefore used to measure similarity, and the results are shown in Figure 3.
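The classic dynamic-programming form of DTW can be sketched in a few lines of Python. This is the textbook O(n·m) algorithm with absolute difference as the local cost, not necessarily the exact variant or cost function used in the paper; the name `dtw_distance` is illustrative.

```python
import numpy as np


def dtw_distance(a, b) -> float:
    """Classic O(n*m) dynamic time warping distance with |a_i - b_j| as local cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # acc[i, j]: best cost aligning a[:i] with b[:j]
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: repeat b_j, repeat a_i, or advance both
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


# Usage: the same shape sampled at two different rates (lengths 5 and 10).
a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
print(dtw_distance(a, b))  # 0.0
```

Euclidean distance cannot even be evaluated on these two sequences because their lengths differ, whereas DTW aligns each sample of `a` with its two counterparts in `b` at zero cost.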

Spark Feature
The relationship between discharge pulses and discharge states has been investigated in much previous research. The spark image features described above contain essential and significant information about the processing parameters and conditions in WEDM, such as current, power, wire direction, and workpiece erosion. However, a spark does not disappear from the image immediately, and its morphological and motion features do not appear immediately either. As a result, spark image features are not related to discharge states directly, frame by frame; rather, the relationship is nonlinear and spans multiple frames.

Sequence to Sequence Model
According to the calculation method of the spark features, the features extracted from all spark frames form a feature array of shape (Len, 18, 1), where Len is the number of spark frames in a process. As shown in Figure 4, the first model proposed in this paper, called the "sequence to sequence" model, is based on an RNN and takes the feature array of several frames as input and the corresponding label array as output. The output of an RNN depends on the data of several time steps; in other words, owing to the memory function of its network structure, an RNN can mine the relationship between frames in the spark image sequence or its feature sequence [61]. Hence, it can predict the processing states accurately from several frames of spark images. Given a sequence of inputs (x_1, …, x_T), a standard RNN computes a sequence of outputs (y_1, …, y_T) by iterating the equations

$$h_t = \sigma\left(W_{hx} x_t + W_{hh} h_{t-1}\right)$$
$$y_t = W_{yh} h_t$$

where W_hx and W_hh denote the weights of the input layer and hidden layer of the RNN, respectively; h_{t−1} and h_t denote the output of the hidden layer of the RNN at the previous and present time steps; and W_yh denotes the weight of the output layer. The traditional RNN is known to suffer from the vanishing gradient problem [62]. The gated recurrent unit (GRU) is an improvement on the traditional RNN that has fewer parameters and can learn long-term dependencies [63,64]. The structure of the GRU is shown in Figure 5. The update gate decides whether to pass the previous output (h_{t−1}) on to the next cell (as h_t). The reset gate applies additional operations with its own set of weights to decide how much of the previous output enters the candidate activation. The variables in Figure 5 are updated by the following formulas:

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right)$$
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right)$$
$$h'_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot h'_t$$

where x_t is the input vector, h_t is the output vector, h'_t is the candidate activation vector, z_t is the update gate vector, r_t is the reset gate vector, W (with or without subscripts) denotes the weight parameters, σ is the sigmoid activation function, and tanh is the hyperbolic tangent activation function.

Figure 5. The structure of GRU cell.
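A minimal NumPy sketch of a GRU cell in a common formulation may help make the gating concrete: update gate z_t = σ(W_z[h_{t−1}, x_t]), reset gate r_t = σ(W_r[h_{t−1}, x_t]), candidate h'_t = tanh(W_h[r_t ⊙ h_{t−1}, x_t]), and output h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h'_t. Biases are omitted, the weights are random stand-ins, and the names `gru_step` and `gru_sequence` are illustrative; this is not the trained model from the paper, and frameworks such as PyTorch use their own (equivalent in spirit) gate conventions.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step; every weight matrix acts on a concatenation [h, x] (no biases)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                                       # update gate z_t
    r = sigmoid(W_r @ hx)                                       # reset gate r_t
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))  # candidate h'_t
    return (1.0 - z) * h_prev + z * h_cand                      # output h_t


def gru_sequence(xs, n_hidden, W_z, W_r, W_h):
    """Run the cell over a (T, n_in) feature sequence; returns the (T, n_hidden) outputs."""
    h = np.zeros(n_hidden)
    outputs = []
    for x_t in xs:
        h = gru_step(x_t, h, W_z, W_r, W_h)
        outputs.append(h)
    return np.stack(outputs)


# Usage: 18 spark features per frame (as in the (Len, 18, 1) array), 8 hidden units (arbitrary).
rng = np.random.default_rng(0)
n_in, n_hid = 18, 8
W_z, W_r, W_h = (rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for _ in range(3))
frames = rng.normal(size=(5, n_in))  # a 5-frame feature sequence (random stand-in data)
outputs = gru_sequence(frames, n_hid, W_z, W_r, W_h)
print(outputs.shape)  # (5, 8)
```

Because h_t is a gate-weighted blend of h_{t−1} and a tanh output, the state stays bounded, and a z_t close to 0 lets information from earlier frames pass through largely unchanged, which is the property that helps with the multi-frame spark dependence described above.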

Image to Sequence Model
As shown in Figure 6, the other model proposed in this paper, called the "image to sequence" model, combines a CNN with an RNN. Unlike the "sequence to sequence" model above, it extracts features with the CNN rather than with the invariant calculation methods introduced earlier in this paper. CNNs are widely used for image feature extraction [65]. In Figure 6, the basic block of the CNN contains a convolutional layer, a max pooling layer, a ReLU activation layer, and a batch normalization (BN) layer. Mathematically, the computational process can be described as

$$S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n)$$
$$\mathrm{ReLU}(x) = \max(0, x)$$
$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}}, \qquad y = \gamma \hat{x} + \beta$$

where I and K are the input image and kernel, x is the input value of the ReLU activation function or the BN function, and μ_B and σ_B² are the mean and variance of x over a mini-batch. In the BN operation, ε is an arbitrarily small constant added to the denominator for numerical stability, and the parameters γ and β are learned during optimization.
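The basic CNN block can be sketched in plain NumPy to show what each layer does to a frame. This is a toy forward pass under stated assumptions, not the trained network: one hypothetical 3 × 3 kernel, "valid" cross-correlation (which is what most CNN libraries compute as "convolution"), 2 × 2 max pooling, and BN statistics taken over the batch axis with γ = 1 and β = 0. The layer order simply follows the listing in the text and may not reflect the actual network; all function names are illustrative.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation S(i, j) = sum_m sum_n I(i+m, j+n) K(m, n)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2 x 2 max pooling (an odd last row/column is dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x)


def batch_norm(batch: np.ndarray, gamma=1.0, beta=0.0, eps=1e-5) -> np.ndarray:
    """Normalize each position over the batch axis, then scale by gamma and shift by beta."""
    mu = batch.mean(axis=0)
    var = batch.var(axis=0)
    return gamma * (batch - mu) / np.sqrt(var + eps) + beta


def basic_block(frames: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """conv -> max pool -> ReLU per frame, then BN across the batch."""
    feats = np.stack([relu(max_pool2x2(conv2d(f, kernel))) for f in frames])
    return batch_norm(feats)


# Usage: a batch of 4 stand-in 8 x 8 frames and one random 3 x 3 kernel.
rng = np.random.default_rng(1)
frames = rng.random((4, 8, 8))
kernel = rng.normal(size=(3, 3))
features = basic_block(frames, kernel)
print(features.shape)  # (4, 3, 3): 8x8 -> conv 6x6 -> pool 3x3
```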

Then, the output of the CNN is connected to the input of the RNN to mine the relationship between the features of successive frames. Because the frame sequence and the sequence of discharge states calculated from the current and voltage differ in length, a connection part is necessary to match these two unequal sequences: the RNN's output is fed into the connection part, whose final output gives the discharge states of WEDM.
In summary, the "image to sequence" model extracts spatial features (the features of one spark frame) with the CNN, and then extracts temporal features (the features of several previous frames) with the RNN.
Figure 6. "Image to sequence" model.
Both models are trained on the experimental samples, which are divided into training, validation, and test sets.

Synchronous Acquisition of Spark Image and Voltage Data
The spark images are captured by a MEMRECAM ACS-1 M60 high-speed camera manufactured by NAC Image Technology Inc., Tokyo, Japan. The voltage is measured by an NI USB-6366 device manufactured by National Instruments Inc., Austin, TX, USA. An acquisition and control system for WEDM based on LabVIEW is developed to synchronize the acquisition of image and voltage data.
According to the synchronization software of the high-speed camera, the delay from software trigger initiation to the start of the shutter is shown in Figure 7.

LabVIEW generates the signal that makes the NI device acquire the voltage and output an external signal that triggers the high-speed camera to capture a sequence of frames.
At the same time, LabVIEW controls an MCU device to output square wave signals that control WEDM movements such as pulse generation and servo movement.
After the acquisition and control system is established, the timing sequence of WEDM control and synchronous data acquisition is as shown in Figure 8. Under the condition of no short circuit, the servo is given a constant value during the collection process. The NI device generates a synchronous clock signal for the high-speed camera and then outputs the trigger signal once the clock signal output has stabilized.

Spark Feature
The waveform data were filtered with a median filter, and the image data were filtered with a background difference algorithm.

Waveform Data
The waveform data were filtered with a median filter. Specifically, let r_l denote the left rank and r_r the right rank of the filter window; the filtered value y_i at sample x_i is then the median of the subset {x_{i−r_l}, …, x_{i−1}, x_i, x_{i+1}, …, x_{i+r_r}} of the input sequence:
$$y_i = \operatorname{median}\{x_{i-r_l}, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_{i+r_r}\}$$
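The median filter defined above can be sketched in a few lines of Python. This is an illustrative version, not the authors' code: the function name `median_filter` is hypothetical, and truncating the window at the two ends of the sequence is an assumed boundary rule that the text does not specify.

```python
from statistics import median
from typing import List, Sequence


def median_filter(x: Sequence[float], r_l: int, r_r: int) -> List[float]:
    """Moving median with r_l samples to the left and r_r to the right.

    y[i] = median(x[i - r_l], ..., x[i], ..., x[i + r_r]). At the ends of the
    sequence the window is truncated to the samples that exist (an assumed
    boundary rule).
    """
    n = len(x)
    y = []
    for i in range(n):
        window = x[max(0, i - r_l): min(n, i + r_r + 1)]
        y.append(float(median(window)))
    return y
```

With r_l = r_r = 1, a single-sample spike such as `[0, 0, 9, 0, 0]` is removed entirely, while a step edge such as `[0, 0, 0, 5, 5, 5]` is preserved, which is why a median filter suits pulse waveforms with sharp rising edges.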
In Figure 9, the red line represents the pulse waveform, the green line represents the area of the waveform during the voltage pulse duration, and the blue line represents the power of the waveform corresponding to the green line. The power is calculated from the rising-edge time r_s and the pulse duration t_d.
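As a rough illustration of the area quantity (the green line), the sketch below integrates the sampled voltage from each rising edge r_s over the pulse duration t_d. It is a hedged sketch under stated assumptions: rising edges are detected by a simple upward threshold crossing, the integral uses the trapezoidal rule, the function names are hypothetical, and the power equation itself is not reconstructed.

```python
from typing import List, Tuple


def rising_edges(v: List[float], threshold: float) -> List[int]:
    """Sample indices where the voltage first crosses `threshold` upward (assumed detector)."""
    return [i for i in range(1, len(v)) if v[i - 1] < threshold <= v[i]]


def pulse_areas(v: List[float], dt: float, threshold: float,
                t_d: float) -> List[Tuple[int, float]]:
    """Area under the waveform from each rising edge r_s to r_s + t_d.

    v: voltage samples spaced dt seconds apart; t_d: pulse duration in seconds.
    Returns (r_s index, area) pairs, integrated with the trapezoidal rule.
    """
    n_d = int(round(t_d / dt))          # pulse duration expressed in samples
    out = []
    for r_s in rising_edges(v, threshold):
        seg = v[r_s:r_s + n_d + 1]      # samples covering [r_s, r_s + t_d]
        area = sum((a + b) * dt / 2 for a, b in zip(seg, seg[1:]))
        out.append((r_s, area))
    return out
```

For a 2 MHz acquisition rate (Table 4), dt would be 0.5 µs.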

Image Data
The acquired image Y can be modeled as the real spark image X corrupted by internal noise α, which includes camera noise and line noise, and external noise β, which includes environmental disturbances and background objects. Because of the high discharge energy, sparks are very bright, so during image acquisition the signal energy of the spark region is much larger than that of the noise. At the same time, the background changes relatively little during cutting, and because the high-speed camera is set to a short exposure time, the signal energy of ambient objects in the background is negligible compared with that of the spark.
The background difference algorithm removes the environmental disturbances and most of the noise that changes little between frames. As shown in Figure 10, the difference image Y_bd is obtained by subtracting the background image Y_0 from the current frame Y_t, i.e., $Y_{bd} = Y_t - Y_0$.
Then, the residual random noise is removed by pixel brightness thresholding and pixel inversion.
Finally, the filtered results are shown in Figure 11.
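The three image-filtering steps (background difference, brightness threshold, pixel inversion) can be sketched with NumPy as follows. The function name `extract_spark`, the 8-bit grayscale input, clipping negative differences to zero, and the threshold value of 30 are all assumptions for illustration; the paper does not give these details.

```python
import numpy as np


def extract_spark(frame: np.ndarray, background: np.ndarray,
                  threshold: int = 30) -> np.ndarray:
    """Background difference, brightness threshold, then pixel inversion.

    frame (Y_t) and background (Y_0) are 8-bit grayscale images of equal shape.
    Returns an 8-bit image where background pixels are white (255) and spark
    pixels appear darker the brighter they were.
    """
    # Y_bd = Y_t - Y_0, computed in a signed type so it cannot wrap around
    diff = frame.astype(np.int16) - background.astype(np.int16)
    diff = np.clip(diff, 0, 255).astype(np.uint8)   # sparks only add brightness
    # Remove residual random noise below the (hypothetical) brightness threshold
    diff[diff < threshold] = 0
    # Pixel inversion
    return (255 - diff).astype(np.uint8)
```

Computing the difference in `int16` avoids the silent wrap-around that direct `uint8` subtraction would produce wherever the background is brighter than the frame.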

Experiments and Analytics
A three-axis WEDM machine was used to conduct the machining experiments. The workpiece was AISI 1045 carbon steel (Table 2), which is widely used in industrial production. The other machining parameters of each experiment are shown in Table 3. To satisfy the Nyquist criterion, the pulse data acquisition frequency was set to 2 MHz and the image acquisition frequency to 5000 fps (see Table 4). A total of 10,000 frames and 4,000,000 pulse data points were collected synchronously during the machining process.

Table 4. Acquisition conditions.
Acquisition condition     Value
Pulse sample frequency    2,000,000 Hz
Image sample frequency    5000 fps
Workpiece                 AISI 1045 carbon steel

The pulse frequency f is computed as $f = \frac{1}{T}$, where T is the pulse cycle.
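The correspondence between the two sampling rates and the pulse cycle can be checked with a few lines of arithmetic. The sketch uses the Table 4 rates and exact fractions to avoid floating-point rounding; the 200 µs pulse cycle is an illustrative value chosen to give the 5 kHz case discussed below.

```python
from fractions import Fraction

PULSE_SAMPLE_HZ = 2_000_000   # Table 4: pulse (waveform) sample frequency
IMAGE_FPS = 5000              # Table 4: image sample frequency


def pulse_frequency(T: Fraction) -> Fraction:
    """Pulse frequency f = 1 / T, with the pulse cycle T in seconds."""
    return 1 / T


T = Fraction(1, 5000)                               # 200 microsecond pulse cycle
f = pulse_frequency(T)                              # 5000 Hz, i.e. 5 kHz
points_per_frame = PULSE_SAMPLE_HZ // IMAGE_FPS     # waveform samples per frame
points_per_cycle = PULSE_SAMPLE_HZ / f              # waveform samples per pulse cycle
frames_per_cycle = IMAGE_FPS / f                    # image frames per pulse cycle
total_points = 10_000 * points_per_frame            # samples behind 10,000 frames
```

Both `points_per_frame` and `points_per_cycle` evaluate to 400, `frames_per_cycle` to 1, and `total_points` to 4,000,000, matching the figures quoted in the text.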
If the pulse frequency is 5 kHz, one spark frame corresponds to exactly one pulse cycle, that is, one spark frame corresponds to 400 pulse data points. The higher the image sampling rate, the more detail is captured within a pulse cycle; through experiments with the high-speed camera, we found that a sampling rate of 5 kHz (5000 fps) works best. Table 5 shows the statistical data of each experiment. For clarity, the curves of energy, area and ESR against the different factors are plotted in Figure 12.
In Figure 12a, the total spark energy (E_sum) is hardly affected by the pulse frequency, because the total electrical energy depends on the duty ratio rather than on the frequency. The small fluctuations of the frequency curve arise because sparks occur with different probabilities at different frequencies, so E_sum fluctuates naturally. It is further speculated that E_sum reaches a maximum at a certain processing frequency; however, this cannot be proved directly, because this study does not include enough single-factor experiments on frequency. E_sum also changes greatly with the power of the processing pulse: at the maximum machining power, E_sum is close to three times its original value. At the same time, E_sum is strongly affected by the cutting speed, first rising and then falling, which indicates that within a certain range, increasing the cutting speed increases E_sum. Nevertheless, an excessively high cutting speed leads to partial short circuits or even full short circuits, in which no spark is generated, so E_sum decreases.
Figure 12b shows the relationship between the total spark area (S_sum) and each parameter. S_sum is negatively correlated with the pulse frequency, that is, the higher the processing pulse frequency, the smaller S_sum. However, the relationship between S_sum and the machining power or cutting speed is uncertain.
The ratio of energy to area (ESR) combines E_sum and S_sum and emphasizes the spark image information at the processing center while ignoring diffused spark image information. Therefore, ESR most directly reflects the processing state at the current moment. As shown in Figure 12c, ESR tends to increase with the processing pulse frequency, the processing pulse power and the cutting speed. This indicates that both the occurrence frequency and the amplitude (brightness) of spark discharges at the center point increase with these three parameters.
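A minimal sketch of how energy, area and ESR might be computed from background-subtracted frames is shown below. The definitions are assumptions, not the paper's: a pixel counts as spark if its brightness reaches a hypothetical threshold, the spark "energy" is the summed brightness of those pixels, the "area" is their count, and E_sum and S_sum are sums over the frames of one experiment.

```python
import numpy as np


def spark_features(diff: np.ndarray, threshold: int = 30):
    """Return (E, S, ESR) for one background-subtracted grayscale frame.

    Assumed definitions: spark pixels have brightness >= threshold,
    E = summed brightness of spark pixels, S = number of spark pixels,
    ESR = E / S (0 when the frame contains no spark).
    """
    mask = diff >= threshold
    S = int(mask.sum())
    E = float(diff[mask].astype(np.float64).sum())
    return E, S, (E / S if S else 0.0)


def experiment_totals(frames, threshold: int = 30):
    """E_sum and S_sum over a sequence of frames, and their ratio ESR."""
    feats = [spark_features(f, threshold) for f in frames]
    e_sum = sum(e for e, _, _ in feats)
    s_sum = sum(s for _, s, _ in feats)
    return e_sum, s_sum, (e_sum / s_sum if s_sum else 0.0)
```

Under these definitions a small but very bright spark at the processing center yields a high ESR, while a large, dim, diffused spark yields a low one, consistent with the interpretation of ESR given above.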

Statistical Analysis of Experimental Data
Furthermore, Figure 13a shows the ratio of the spark energy above and below the workpiece (E_up:E_down), where 'above' refers to positions 1 and 2 of the distribution, that is, E_up = E1 + E2. Similarly, Figure 13b shows the ratio of the spark area above and below the workpiece (S_up:S_down). The results show that, with all other processing conditions equal (except the wire direction), E_up:E_down and S_up:S_down are predominantly above 1 when the wire moves up and below 1 when it moves down. In short, E_up > E_down and S_up > S_down when the wire moves upward. The experiments also show that when the wire moves up, more sparks explode at the top, mainly because the fragments of the exploded spark are driven by the force of the moving wire. For details, see Figure 13c.
To visualize a spark process during the experiment, Figures 14 and 15 show the image details and the statistical data. The spark process in Figure 14 is described as follows.
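The up/down ratios can be illustrated by splitting each spark image into four positions around the processing center. The text only states that positions 1 and 2 are above the workpiece; the quadrant layout used here (1 = upper-left, 2 = upper-right, 3 = lower-left, 4 = lower-right), the function names and the use of summed brightness as energy are hypothetical.

```python
import numpy as np


def quadrant_energies(img: np.ndarray, center_row: int, center_col: int):
    """Summed brightness in four positions around the processing center.

    Hypothetical numbering: 1 = upper-left, 2 = upper-right,
    3 = lower-left, 4 = lower-right (rows above center_row are 'above').
    """
    img = img.astype(np.float64)
    return {
        1: float(img[:center_row, :center_col].sum()),
        2: float(img[:center_row, center_col:].sum()),
        3: float(img[center_row:, :center_col].sum()),
        4: float(img[center_row:, center_col:].sum()),
    }


def up_down_ratio(q):
    """(q1 + q2) / (q3 + q4); inf if only 'above' has sparks, nan if neither does."""
    up = q[1] + q[2]
    down = q[3] + q[4]
    if down == 0:
        return float("inf") if up > 0 else float("nan")
    return up / down
```

Applied to per-position areas instead of energies, the same ratio gives S_up:S_down. A value above 1 means more spark above the workpiece, as observed for upward wire motion.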
Normally, the pulse is in the open state and the spark image shows no spark, so all characteristic values are zero. When sparking starts, the pulse changes to the processing status and the spark image shows one small, approximately circular spark point located at the processing center. It has a small area (S_n) but a large amount of energy (E_n), i.e., its energy-to-area ratio (ESR) is very high. In the following frames, the number, area, energy, and average speed of the sparks first increase and then decrease. In a spark process of 50 frames (Figure 14), ESR is the first feature to reach its peak, because the pulses in the processing status occur mainly in the early part of the process. The number of ESR peaks is correlated with the number of processing pulses (Figure 14c). Generally, the spark area follows a more stable trend, because a spark that has already occurred dissipates steadily. Its peak always lags behind the energy peak, which produces a high ESR when the spark occurs; this in turn explains why ESR peaks first. In terms of feature distribution, the energy and area values of the four quadrants are mainly affected by the wire direction. A composite of the 50 spark images captured during the machining process is shown in Figure 14b: the left image shows the brightness of the sparks, while the right image shows their shape and distribution. The corresponding pie chart is shown in Figure 14d.
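The peak-timing observations for the spark process in Figure 14 (ESR peaks first, and the area peak lags the energy peak) can be expressed with a few helper functions over per-frame feature series. The function names are hypothetical, and the numbers in the usage note are made up for illustration, not measured data.

```python
from typing import List, Sequence


def peak_frame(series: Sequence[float]) -> int:
    """Index of the first frame at which the series reaches its maximum."""
    return max(range(len(series)), key=lambda i: series[i])


def esr_series(energy: Sequence[float], area: Sequence[float]) -> List[float]:
    """Per-frame ESR = E / S; frames without spark (S = 0) get 0."""
    return [e / s if s else 0.0 for e, s in zip(energy, area)]


def area_lag(energy: Sequence[float], area: Sequence[float]) -> int:
    """Frames by which the area peak lags the energy peak (positive = later)."""
    return peak_frame(area) - peak_frame(energy)
```

For example, with made-up series energy = [0, 80, 60, 30, 10] and area = [0, 2, 6, 8, 5], the area peak lags the energy peak by 2 frames and ESR peaks at frame 1, before the area has grown.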

Training Results and Discussion
Figures 16 and 17 show the prediction results and the training results of both models, respectively. Figure 16a shows that the RNN could predict discharge states to some extent, but the result was unsatisfactory: adjusting the input length or the number of RNN layers did not improve it at all. The training loss of the "sequence to sequence" model is shown in Figure 17a. The loss is clearly unstable, although it tends to decrease. This is because the input dataset contains the main features, but they are not sufficient to reconstruct the original spark image data. Hence, the advantage of this model is fast training, while its disadvantage is low prediction precision. Figure 17b shows the loss of the "image to sequence" model, which oscillates somewhat but decreases gradually. Figure 16b,d show the prediction tracking. The test data include the training dataset (frames 3000 to 15,000) and the testing dataset (frames 1 to 3000). The prediction tracking on the training dataset, shown in Figure 16c, works well: the prediction clearly follows the label, and the number and values of the peaks are nearly equal to those of the label. On the testing dataset, the number of predicted peaks still matches the label.
Compared to the "sequence to sequence" model, the "image to sequence" model extracts the features automatically. In summary, the "image to sequence" model trains slowly but achieves high prediction accuracy, while the "sequence to sequence" model trains fast but with low prediction accuracy. The mean precision over the whole dataset is 95% for the "image to sequence" model and 90% for the "sequence to sequence" model.
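To make the "image to sequence" idea concrete, the sketch below is a minimal, untrained NumPy forward pass: a small convolution layer with ReLU and global average pooling turns each spark frame into a feature vector, and a GRU cell consumes these vectors in time order, emitting one continuous status value per frame. Everything specific here is hypothetical: the class name `TinyCnnGru`, the layer sizes, the random initialization, and the sigmoid output head. It is not the paper's architecture or training procedure, and a practical implementation would use a deep learning framework.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyCnnGru:
    """Untrained CNN-GRU forward pass: frames (T, H, W) -> one value per frame."""

    def __init__(self, n_kernels=4, k=3, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.kernels = rng.normal(0.0, 0.1, (n_kernels, k, k))
        # GRU parameters for the update (z), reset (r) and candidate (n) gates
        self.W = {g: rng.normal(0.0, 0.1, (hidden, n_kernels)) for g in "zrn"}
        self.U = {g: rng.normal(0.0, 0.1, (hidden, hidden)) for g in "zrn"}
        self.b = {g: np.zeros(hidden) for g in "zrn"}
        self.w_out = rng.normal(0.0, 0.1, hidden)  # linear read-out head
        self.hidden = hidden

    def cnn_features(self, img):
        """'Valid' cross-correlation (as in CNN layers), ReLU, global average pooling."""
        n_k, kh, kw = self.kernels.shape
        h, w = img.shape
        maps = np.empty((n_k, h - kh + 1, w - kw + 1))
        for r in range(h - kh + 1):
            for c in range(w - kw + 1):
                patch = img[r:r + kh, c:c + kw]
                maps[:, r, c] = np.tensordot(self.kernels, patch, axes=([1, 2], [0, 1]))
        return np.maximum(maps, 0.0).mean(axis=(1, 2))  # shape (n_kernels,)

    def gru_step(self, x, h):
        """One GRU update: h_new = (1 - z) * h + z * n."""
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])
        n = np.tanh(self.W["n"] @ x + self.U["n"] @ (r * h) + self.b["n"])
        return (1.0 - z) * h + z * n

    def forward(self, frames):
        """Process frames in time order; return one status value in (0, 1) per frame."""
        h = np.zeros(self.hidden)
        outputs = []
        for img in frames:
            h = self.gru_step(self.cnn_features(img), h)
            outputs.append(sigmoid(self.w_out @ h))
        return np.array(outputs)
```

The sigmoid head reflects the idea of a continuous (quantitative) discharge-status label rather than discrete classes; in training, its output would be regressed against such labels.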


Conclusions
The motivation of this paper is to analyze the spark images of wire electrical discharge machining using image processing and machine learning technology. First, the relationship between spark images and discharge status is studied by extracting image features with traditional algorithms, and it is concluded that the spark image features are related to the discharge status. To predict the discharge status from spark images, a CNN-GRU model is proposed, which extracts image features with a CNN and predicts the discharge status with a GRU. Experimental results show that the proposed model performs better than the GRU model. The contributions of this paper are as follows.
Firstly, unlike the traditional research perspective, this paper proposes a new perspective on the machining state of WEDM, namely predicting the machining state from WEDM spark images. It is found that, during machining, a pulse waveform of "short" status does not necessarily represent the non-machining status. Hence, it is difficult to distinguish "short" from "short discharge" using electrical parameters alone, whereas the spark image provides clear evidence to distinguish them. Because spark images have distinct morphological and kinematic characteristics and are direct manifestations of the machining process, they can represent the machining status. Additionally, these spark image characteristics are more regular than those of the pulse waveform. Using traditional image feature extraction methods, the regularities between the spark image and the discharge status are obtained through experimental analysis. They are summarized as follows. In the machining process, the power of the discharge pulse directly affects the total spark energy (E_sum) calculated from the image. Meanwhile, within a certain range, the higher the cutting speed, the greater E_sum. However, an excessively high cutting speed leads to short circuits, and E_sum decreases consequently.
The total spark area in the image (S_sum) is negatively correlated with the discharge frequency and shows no clear correlation with the discharge power or cutting speed. The energy density (ESR) of the spark image focuses on the machining center points, so it can directly reflect the processing state. Within a certain range of cutting speed, its value is positively correlated with discharge frequency, power, and cutting speed. The spark distribution in the image (including area and energy) is mainly related to the wire direction.
Secondly, this paper is among the first to define the discharge status as a continuous quantity. The advantages of this approach are that (1) it improves the stability of the model and (2) it makes the model easier to converge. This facilitates the design of deep learning models that explore the relationship between discharge status and spark images.
Thirdly, the proposed "sequence to sequence" model was used to explore the relationship between spark characteristics and discharge status. Further, the proposed "image to sequence" model was trained to extract spark image features with a CNN and identify the discharge status with a GRU. Experimental results show that spark images can be used to accurately predict and track the machining status. The mean precision over the whole dataset is 95% for the "image to sequence" model and 90% for the "sequence to sequence" model.
In this paper, the regularities between the spark image and the discharge state are studied through statistical analysis and a deep learning model. In future work, the "image to sequence" model and the method presented in this paper can be further improved in accuracy, stability and speed. Future research directions include on-line monitoring of the discharge state and closed-loop control of WEDM based on spark images. This paper also provides a spark image-based solution for monitoring the discharge state of multi-groove WEDM.