Fault Detection and Identification of Blast Furnace Ironmaking Process Using the Gated Recurrent Unit Network

Abstract: It is of critical importance to keep a steady operation of the blast furnace to facilitate the production of high-quality hot metal. In order to monitor the state of the blast furnace, this article proposes a fault detection and identification method based on the multidimensional Gated Recurrent Unit (GRU) network, a kind of recurrent neural network that is highly effective in handling process dynamics. Compared with conventional recurrent neural networks, the GRU has a simpler structure and involves fewer parameters. In fault detection, a moving window approach is applied and a GRU model is constructed for each process variable to generate a series of residuals, which are further monitored using the support vector data description (SVDD) method. Once a fault is detected, fault identification is performed using contribution analysis. Application to a real blast furnace fault shows that the proposed method is effective.


Introduction
Maintaining the blast furnace system at a stable status is critical to ensure efficient production of high-quality hot metal [1]. Therefore, condition monitoring of the blast furnace ironmaking process is a significant issue. During operation of the blast furnace ironmaking process, different kinds of faults may happen, such as hanging, low stockline and abnormal gas flow. If these faults cannot be detected and identified accurately and in time, they may lead to losses in production rate or even a serious accident.
The problem of fault detection and diagnosis for the blast furnace ironmaking process is a long-standing and well-researched topic. Traditional methods like expert knowledge and fuzzy logic have been well developed in different kinds of expert systems [2]. However, constructing and maintaining an up-to-date knowledge base is difficult. Alternatively, classification-based algorithms like the support vector machine have been applied to diagnose faults in blast furnaces [3]. Liu et al. [4] proposed a strategy based on a cost-conscious least squares support vector machine (LS-SVM) to achieve rapid diagnosis of blast furnace faults. An et al. [5] proposed a multi-class support vector machine to diagnose blast furnace faults. The main assumption of classification-based methods is that sufficient faulty samples can be collected, which is often not true for a real blast furnace. More recently, multivariate statistical methods have become popular in the monitoring of blast furnaces. For example, Vanhatalo applied principal component analysis (PCA) to monitor the status of an experimental blast furnace [6]. A two-stage PCA was considered to deal with the multi-modal distribution of blast furnace data [7]. Shang et al. [8] developed a recursive transformed component statistical analysis (RTCSA)-based algorithm to monitor incipient faults in the ironmaking process. In addition, other PCA-based approaches have been introduced to monitor process faults, such as robust PCA [9] and convex hull-based PCA [10].
In order to deal with process dynamics, Zeng et al. applied a state space model to extract residuals from the process data and used the support vector data description (SVDD) to detect blast furnace faults [11]. Vanhatalo and Kulahci [12] considered the impact of autocorrelation on statistical methods like PCA. Dynamic principal component analysis (DPCA) [13,14] and dynamic linear discriminant analysis (DLDA) [15] have also been used to handle dynamic processes. From the above analysis, it can be seen that handling process dynamics has become an important task in fault detection and diagnosis of the blast furnace.
In this paper, a new process monitoring method based on the GRU network [16] is proposed to detect and identify process faults in the blast furnace. The GRU network is a recent type of recurrent neural network (RNN). Compared with conventional RNN variants like the long short-term memory (LSTM) network, it has comparable capability to handle process dynamics, but with a simpler structure and fewer parameters. In fault detection, a GRU neural network is used to make a prediction for each process variable, so that the process dynamics can be filtered out and a series of residuals generated. The generated residuals are then monitored using the support vector data description (SVDD) method [11]. Faulty variables are then identified by inspecting the deviation of the residuals from the normal operating condition (NOC). The benefits of the proposed method can be summarized as follows: (i) the introduction of the GRU network can fully capture the dynamic characteristics of the blast furnace data; (ii) faulty variables can be identified by investigating the residual of each variable, which greatly simplifies the subsequent fault diagnosis task.

Methodologies
This section describes the methodologies applied in fault detection and identification of the blast furnace system. Section 2.1 briefly introduces the GRU network, which is an extension of the LSTM network. Section 2.2 describes the SVDD classifier.

GRU Neural Network
GRU is a type of recurrent neural network. The main difference between RNN and feed-forward artificial neural network is in their structure. In a feed-forward artificial neural network, signals travel from the inputs to outputs and the flow of information is in the forward direction only. Since there is no backward/feedback flow, the name of "feed-forward" is justified. In contrast, an RNN allows feedback from output to input and hence it is called "recurrent". In addition, the output of the previous time step/state in RNN will be used as the input of the next time step, which is different from feed-forward neural network that considers fixed length input and fixed length output only. With this kind of recurrent structure, RNN can be used to learn the characteristics of the time series and make predictions. A widely used RNN is the LSTM network, which is very suitable to capture long-term dependencies and also able to avoid the vanishing gradient problem. As an improvement of LSTM, GRU network inherits its advantages, whilst having an optimized structure and fewer parameters, resulting in lower computation load and better generalization ability.

The Structure of LSTM Cell
LSTM [17] was originally proposed in 1997 to solve the vanishing gradient problem faced by RNNs [18]. The main difference between LSTM and the standard RNN is the handling of long-term dependencies. In the standard RNN, each cycle involves only the last state and the current input; because each prediction involves only the state at the last moment, the RNN can only establish dependency relationships between states over a short time. In contrast, LSTM can establish dependencies between states at arbitrarily long intervals, hence the name "Long Short-Term Memory network". In addition, the LSTM has a cell state update process similar to a conveyor belt. The old cell state remains on the conveyor belt until it needs to be forgotten by a structure called a "gate". Through this conveyor belt structure, LSTM can take long-term memory from the conveyor belt at any time for learning the characteristics of a time series and making predictions. An LSTM unit consists of a cell, an input gate, a forget gate and an output gate. The cell records state values over different time intervals and the three gates control the flow of information, enabling the LSTM to keep, utilize or discard a state when necessary.
Let x_t denote the data sample at the t-th time instance, C_{t−1} the cell value and h_{t−1} the hidden state of the cell at the (t−1)-th time instance. The information of previous time steps is stored in C_{t−1} and h_{t−1}. The input gate regulates to what extent a new value x_t is transferred into the cell, the forget gate controls to what extent C_{t−1} remains in the cell and the output gate regulates to what extent the cell value is used to calculate the output activation. The structure of a standard LSTM cell is shown in Figure 1, where the green box, blue box and red box correspond to the input gate, the output gate and the forget gate, respectively. The forget gate is formulated as:

f_t = σ(W_f·[h_{t−1}, x_t] + b_f)

where W_f is the weight matrix of the forget gate; σ is the sigmoid activation function; b_f is the bias vector of the forget gate; and [h_{t−1}, x_t] is the concatenation of the previous hidden state h_{t−1} and the current input x_t. The input gate decides what new information will be saved in the cell value and can be described mathematically as follows.
i_t = σ(W_i·[h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C·[h_{t−1}, x_t] + b_C)

Here, W_i is the weight matrix of the input gate and b_i is its bias vector; W_C and b_C generate the candidate memory. The input gate adds new information generated by the current input to the cell value, creating the new memories i_t and C̃_t. The current cell value C_t is then updated from the previous cell value C_{t−1} and the new memories as:

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t

where ⊙ denotes element-wise multiplication.
Finally, the hidden state h_t is updated in the output gate as:

o_t = σ(W_o·[h_{t−1}, x_t] + b_o)
h_t = o_t ⊙ tanh(C_t)

where W_o is the weight matrix of the output gate and b_o its bias vector. In this way, the cell value C_t and the hidden state h_t are updated whenever a new sample x_t is available.
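To make the gate equations concrete, the following NumPy sketch implements one forward step of a standard LSTM cell. It is an illustration of the equations above rather than the network actually trained in this work; the weight shapes (hidden dimension d_h, input dimension d) and function names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One forward step of a standard LSTM cell."""
    v = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ v + b_f)           # forget gate: how much of C_{t-1} to keep
    i_t = sigmoid(W_i @ v + b_i)           # input gate: how much new memory to add
    C_tilde = np.tanh(W_C @ v + b_C)       # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde     # cell value update
    o_t = sigmoid(W_o @ v + b_o)           # output gate
    h_t = o_t * np.tanh(C_t)               # hidden state
    return h_t, C_t
```

With all weights and biases at zero, every gate evaluates to 0.5, so the cell simply halves its previous value, which is a quick sanity check on the update rule.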

The Structure of GRU Cell
The GRU is a refined version of LSTM with a simpler structure [19]. The main difference between GRU and LSTM is in the process of forgetting and updating cell values. In the LSTM network, the update of cell values is controlled by two gates, the forget gate and the input gate; since two gate structures are required, the structure of LSTM is relatively complex. In contrast, GRU controls both the forgetting coefficient and the update coefficient of the output with a single update gate, so it involves fewer matrix multiplications. Through this simplification, the GRU retains the functions of the LSTM while reducing network training time. More specifically, it consists of an update gate and a reset gate, which reduces the number of parameters by roughly one quarter compared with LSTM. The reset gate determines how much previous memory is retained and the update gate determines how much new information is combined with the previous memory. The structure of the GRU cell is shown in Figure 2; in contrast to LSTM, the GRU has only two gate functions. The update gate is shown in the blue box in Figure 2 and the reset gate in the red box. The forward transfer formulations of the GRU are as follows.
r_t = σ(W_r·[h_{t−1}, x_t] + b_r)
z_t = σ(W_z·[h_{t−1}, x_t] + b_z)
h̃_t = tanh(W_h·[r_t ⊙ h_{t−1}, x_t] + b_h)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

where r_t is the reset gate determining how much information in the previous hidden state should be forgotten; z_t is the update gate determining how much new information should be brought to the next cell; h̃_t is the intermediate (candidate) state; and h_t is the hidden state. For the update gate, a greater value of z_t means that more new information is brought into the next state. For the reset gate, a smaller value of r_t means that more information from the previous state is ignored [16].
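A minimal NumPy sketch of one GRU forward step, following the four equations above (illustrative only; the weight names are assumptions, and the real network is trained with a deep learning framework):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell_step(x_t, h_prev, W_r, b_r, W_z, b_z, W_h, b_h):
    """One forward step of a GRU cell."""
    v = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ v + b_r)                 # reset gate
    z_t = sigmoid(W_z @ v + b_z)                 # update gate
    v_r = np.concatenate([r_t * h_prev, x_t])    # reset applied to previous state
    h_tilde = np.tanh(W_h @ v_r + b_h)           # candidate state
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde   # interpolate old and new
    return h_t
```

Note that only three weight matrices are needed here, versus four in the LSTM sketch, which is the parameter saving discussed in the text.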

Support Vector Data Description
SVDD is a kernel method, which maps the data samples into a high-dimensional feature space through a non-linear mapping. In the high-dimensional feature space, a compact hypersphere with the minimum radius that covers as many data samples as possible is obtained by solving an optimization problem. SVDD is generally used for anomaly detection: if a new sample is mapped inside the hypersphere, it is regarded as normal; otherwise it is faulty.
Given a data set x_i ∈ R^d, i = 1, ..., N, let a ∈ R^d denote the center of the hypersphere and R its radius. The following objective function is obtained for SVDD:

min_{R, a, ξ} R² + C Σ_{i=1}^{N} ξ_i
s.t. ||x_i − a||² ≤ R² + ξ_i, i = 1, ..., N    (11)
Here, ξ_i is the relaxation (slack) factor satisfying ξ_i ≥ 0, ∀i, and C is the penalty parameter. The optimization problem in Equation (11) can be transformed using Lagrangian multipliers as:

L = R² + C Σ_i ξ_i − Σ_i α_i (R² + ξ_i − ||x_i − a||²) − Σ_i γ_i ξ_i    (12)
where α_i and γ_i are the Lagrange multipliers, satisfying α_i ≥ 0 and γ_i ≥ 0. Setting the partial derivatives of Equation (12) with respect to R, a and ξ_i to zero yields:

Σ_i α_i = 1,  a = Σ_i α_i x_i,  C − α_i − γ_i = 0    (13)

Substituting Equation (13) into Equation (12), the dual problem is obtained:

L = Σ_i α_i (x_i · x_i) − Σ_{i,j} α_i α_j (x_i · x_j),  0 ≤ α_i ≤ C    (14)

The samples with α_i > 0 are the support vectors. To judge whether a new sample y ∈ R^d lies inside the hypersphere, its squared distance to the center is compared with R²:

D² = ||y − a||² = K(y, y) − 2 Σ_i α_i K(y, x_i) + Σ_{i,j} α_i α_j K(x_i, x_j) ≤ R²    (15)

Here the kernel function K(x_i, x_j) replaces the inner product x_i · x_j; the Gaussian kernel is used:

K(x_i, x_j) = exp(−||x_i − x_j||² / σ²)
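Given the multipliers α_i from solving the dual problem (14), the squared distance in Equation (15) can be evaluated directly. The sketch below illustrates this with an assumed Gaussian-kernel width σ; a production implementation would obtain the α_i from a quadratic programming solver rather than supplying them by hand.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """K(a, b) = exp(-||a - b||^2 / sigma^2)."""
    return np.exp(-np.sum((a - b) ** 2) / sigma ** 2)

def svdd_squared_distance(y, sv, alpha, sigma):
    """D^2 = K(y,y) - 2 sum_i a_i K(y,x_i) + sum_ij a_i a_j K(x_i,x_j)."""
    k_yy = gaussian_kernel(y, y, sigma)  # equals 1 for the Gaussian kernel
    cross = sum(a * gaussian_kernel(y, x, sigma) for a, x in zip(alpha, sv))
    center = sum(ai * aj * gaussian_kernel(xi, xj, sigma)
                 for ai, xi in zip(alpha, sv) for aj, xj in zip(alpha, sv))
    return k_yy - 2.0 * cross + center

def is_faulty(y, sv, alpha, sigma, R2):
    """Alarm if the new sample falls outside the hypersphere."""
    return svdd_squared_distance(y, sv, alpha, sigma) > R2
```

For a sample coinciding with the (single) support vector, D² = 0; for a sample far from all support vectors, the cross term vanishes and D² approaches 1 + Σ_{i,j} α_i α_j K(x_i, x_j).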

Fault Detection and Identification Strategy
In order to detect and identify a process fault, it is essential to characterize the normal operating condition (NOC). Hence, a training dataset collected under normal operating conditions is used to construct the GRU neural network. The GRU neural network generates model residuals, which are further used to construct monitoring statistics using SVDD. As described earlier, the GRU model is capable of extracting the spatial and temporal signatures in the data that are important for characterizing the complex ironmaking process. The general framework for fault detection and identification based on GRU-SVDD is described in detail in the following subsections.

Fault Detection
In order to detect a process fault, it is required to train a model based on the NOC data. In the ironmaking process, this involves training a GRU with multiple time series to model temporal dynamics and correlations between process variables.
The GRU model is trained on historical normal data. Specifically, the GRU model uses the past information captured by its cell value and current observation to predict the next observation.
Assume a training set x_i ∈ R^d, i = 1, ..., N is collected under NOC; a moving window approach can be applied with window length n, n ≪ N. Taking the first window as an example, the structure of a two-layer GRU is shown in Figure 3. Here, h_i ∈ R^{d_h1} denotes the hidden state of the first layer at the i-th time, h'_i ∈ R^{d_h2} denotes the hidden state of the second layer and x̂_{n+1} ∈ R^d is the predicted value. The hidden state h_i of the first layer is the input to the second layer of the GRU model. The final output is then obtained from a dense layer as:

x̂_{n+1} = W h'_n + b

Here x̂_{n+1} is the prediction of x at the (n+1)-th time instance, W is the weight matrix of the dense layer and b is the bias term. A GRU model with more layers could also be used; for the sake of simplicity, however, a two-layer GRU is considered here.
The model parameters can be trained based on the N − n + 1 windows. Once the model parameters are estimated, the model output x̂_i can be predicted from the past n samples, and a series of residuals can be obtained as e_i = |x_i − x̂_i|, i = n + 1, ..., N. The residual series obtained from the GRU under NOC is then fed into the SVDD to estimate its parameters, namely the center a and the radius R of the hypersphere. Whenever a new sample is available, the residual obtained from the GRU is fed into the SVDD to calculate the squared distance D² according to Equation (15). If D² is greater than R², the sample is declared faulty; otherwise it is normal.
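The moving-window residual generation can be sketched as follows. The `predict` callable is a hypothetical stand-in for the trained two-layer GRU, mapping an (n, d) window to a prediction of the next sample:

```python
import numpy as np

def residual_series(X, predict, n):
    """One-step-ahead residuals e_i = |x_i - x_hat_i|.

    X: (N, d) data matrix (NOC training data or online samples);
    predict: maps an (n, d) window to a length-d prediction of the next sample;
    returns an (N - n, d) residual matrix.
    """
    N = X.shape[0]
    E = np.empty((N - n, X.shape[1]))
    for i in range(n, N):
        E[i - n] = np.abs(X[i] - predict(X[i - n:i]))
    return E
```

During offline training these residuals are fed to the SVDD to fit a and R; online, each new residual row is scored against the fitted hypersphere.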

Fault Identification
Once a fault is detected, the next goal is to identify which variables are the most affected and contribute most to the monitoring statistic. Assume a fault is detected between times t_1 and t_2, and let e_i = (e_i^1, e_i^2, ..., e_i^d) denote the residual vector of the d process variables, i = t_1, ..., t_2. The normalized residual E_i^l can be used to evaluate the impact of the fault on the l-th variable:

E_i^l = (e_i^l − μ_l) / σ_l

where μ_l is the mean value and σ_l the standard deviation of the GRU residuals of the l-th variable over the NOC training data. For a clearer exhibition, the deviations E_i^l of each variable are accumulated to obtain the total contribution rate CR_l = Σ_{i=t_1}^{t_2} E_i^l. With the contribution rates obtained, operators can know which variables are most sensitive to the process fault, and can use the contribution plots to identify which kind of fault has occurred.
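The contribution-rate computation reduces to a few array operations. In this sketch, `mu` and `sigma` are the per-variable statistics of the NOC residuals, and the ranking output is an added convenience not described in the text:

```python
import numpy as np

def contribution_rates(E_fault, mu, sigma):
    """Accumulated normalized residuals CR_l per variable.

    E_fault: (m, d) GRU residuals over the detected fault interval [t_1, t_2];
    mu, sigma: length-d mean and standard deviation of the NOC residuals.
    Returns the contribution rates and the variable indices sorted by impact.
    """
    E_norm = (E_fault - mu) / sigma          # E_i^l = (e_i^l - mu_l) / sigma_l
    CR = E_norm.sum(axis=0)                  # CR_l = sum over the fault interval
    ranking = np.argsort(CR)[::-1]           # most affected variables first
    return CR, ranking
```

Plotting CR as a bar chart over the variable index gives the contribution plot used for fault identification.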
For completeness, the overall flowchart is summarized in Figure 4, including both the offline training stage (left) and the online monitoring stage (right). The offline training stage can be summarized as follows: 1. Obtain historical NOC data; 2. Remove extreme values and normalize the training data to zero mean and unit variance; 3. Set the initial parameters of the GRU model and train the model; 4. If the GRU model is valid, feed the GRU residuals into the SVDD model and obtain the threshold R² of the D² statistic.
The online detection stage can be summarized as follows: 1. Collect online samples; 2. Normalize the online samples; 3. Use the GRU model trained in the offline process to make prediction and get the residuals; 4. Calculate the D 2 statistic using SVDD; 5. Determine whether to alarm by comparing the D 2 statistic and the threshold R 2 . If D 2 is greater than R 2 , the process is faulty, otherwise it is normal. 6. If the process is faulty, isolate and identify which variables are most severely affected.
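One iteration of the online stage can be stitched together as below; the `predict` and `svdd_d2` callables are placeholders for the trained GRU and the fitted SVDD distance function, and the signature is an illustrative assumption:

```python
import numpy as np

def monitor_step(window, x_new, predict, svdd_d2, R2, mean, std):
    """Steps 2-5 of the online detection stage for one incoming sample."""
    x_norm = (x_new - mean) / std            # step 2: normalize with training stats
    e = np.abs(x_norm - predict(window))     # step 3: GRU prediction residual
    D2 = svdd_d2(e)                          # step 4: SVDD D^2 statistic
    return D2 > R2                           # step 5: alarm if threshold exceeded
```

If the alarm is raised, the residual row e is appended to the fault interval and passed to the contribution analysis of Section 3.2 (step 6).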

Application Studies
This section presents the application results of the proposed GRU-based fault detection and identification method on datasets collected from a blast furnace (with an inner volume of 2500 m³) in China. Two case studies are considered: Section 4.1 applies the method to a hanging fault and Section 4.2 presents the results for a fault involving fluctuation of the molten iron temperature. The process variables used in the first case are listed in Table 1. For comparison, the LSTM-SVDD and PCA-SVDD [20] methods are considered. In order to reduce the impact of extreme values in the process data, the Hampel filter [21] is used to process the training set before it is fed into the GRU network. During the training of the GRU network, the mean square error loss function and the Adam optimizer are used [22]. The length of the moving window n is set to 99 by trial and error. The numbers of hidden states in the first layer (d_h1) and the second layer (d_h2) are determined in a similar way. Figure 5 shows the modelling errors of the GRU network for u_1 under different combinations of d_h1 and d_h2.
Considering both the modelling error and the structural complexity, the fourth combination in Figure 5 is used, so that d_h1 = 32 and d_h2 = 200. To prevent overfitting, dropout is used in the training process by randomly discarding a fraction of the units; a dropout rate of p_d = 0.2 is selected. The GRU network uses past values to predict current values, so the predictions contain not only the past information of the variable itself but are also affected by other related variables. Therefore, when a fault happens, the predictions will deviate from the actual values. The modeling results of the GRU model are shown in Figure 6, where clear changes can be observed in several variables (e.g., the CO, CO_2 and H_2 concentrations). The obtained residuals are then fed into the SVDD model to perform fault detection.
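The Hampel outlier filter used in preprocessing can be sketched as below. The half-window width and threshold are illustrative defaults, since the values used in this work are not specified:

```python
import numpy as np

def hampel_filter(x, half_window=10, n_sigmas=3.0):
    """Replace samples deviating from the local median by more than
    n_sigmas * MAD with that median (per-variable outlier removal)."""
    y = np.asarray(x, dtype=float).copy()
    k = 1.4826  # makes the MAD consistent with the std of Gaussian data
    for i in range(len(y)):
        lo, hi = max(0, i - half_window), min(len(y), i + half_window + 1)
        window = y[lo:hi]
        med = np.median(window)
        mad = k * np.median(np.abs(window - med))
        if np.abs(y[i] - med) > n_sigmas * mad:
            y[i] = med
    return y
```

Unlike a plain moving-average smoother, the median/MAD statistics are robust, so an isolated spike is replaced without distorting the surrounding normal samples.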

Fault Detection and Identification
From the previous subsection, the GRU model can be used to generate residuals. As shown in Figure 6, there is an obvious change after the 2000th sample. In order to detect this change, SVDD is used with parameters σ = 10 and C = 0.01. With a 99% confidence limit, the monitoring results using SVDD are shown in Figure 7a. From Figure 7a it can be seen that significant violation of the confidence limit occurs, indicating that a fault is happening in the blast furnace system. This is in accordance with the fact that the last 400 samples correspond to a hanging fault. For comparison, the monitoring results using LSTM-SVDD and PCA-SVDD are shown in Figure 7b,c. The LSTM network has the same structure and parameters as the GRU network in GRU-SVDD. For PCA-SVDD, PCA is first performed on the training data and SVDD is used to monitor the residual subspace; the number of principal components retained is 3. Comparing Figure 7a-c, it can be seen that GRU-SVDD and LSTM-SVDD have higher sensitivity than PCA-SVDD. The detection rates of the three methods are shown in Table 2.

Table 2 confirms that GRU-SVDD and LSTM-SVDD have better detection rates. Considering the simpler structure of GRU, GRU-SVDD is clearly the preferable method. After the hanging fault is detected, fault identification is performed based on the GRU residuals. Figure 8 shows the sample-by-sample GRU residuals, with deeper color indicating greater residuals. For a clearer inspection, Figure 9 presents the accumulated normalized GRU residuals. Figure 9 shows that the hanging fault has a significant impact on the concentration of the flue gas, with the most significant changes in the CO_2, CO and H_2 concentrations. This will lead experienced operators to inspect the gas flow and check whether some kind of hanging fault is happening in the system.

Case 2: Abnormal Molten Iron Temperature
In this subsection, a second faulty condition from the same blast furnace is considered. The fault involves an abnormal fluctuation of the molten iron temperature, which caused the operators to adjust the quantity of blast u_1 as well as the temperature of blast u_2, resulting in a reduction in blast quantity and blast temperature and fluctuations in a series of variables related to the gas flow. In the later stage, the fault was corrected; however, the temperature of blast was kept at a relatively low level for the sake of safety. Similar to the hanging fault case, 2000 samples collected under normal operating conditions are used for model training, and a faulty dataset containing 1000 samples is considered. This time, 10 process variables are considered, as listed in Table 3.

Table 3. Process variables considered in Case 2:
u_1: quantity of blast
u_2: temperature of blast
u_3: pressure of blast
u_4: quantity of oxygen blasted
u_5: temperature of cold blast
u_6: top pressure
u_7: CO concentration in top gas
u_8: CO_2 concentration in top gas
u_9: H_2 concentration in top gas
u_10: pressure of cold blast

Compared to Table 1, three additional variables are included: the temperature of cold blast (u_5), the top pressure (u_6) and the pressure of cold blast (u_10). It should be noted that some of them (u_5 and u_10) are redundant variables that are highly correlated with other variables. The purpose of introducing these variables is to show the capability of the proposed method in dealing with variable redundancy.
Similar to Section 4.1, the proposed GRU method is applied with the same parameter values. The prediction results for the 10 variables are shown in Figure 10. For a clearer exhibition, only the 1000 faulty samples are presented. It can be seen that for the first 200 samples the prediction accuracy is acceptable. After that, an obvious fluctuation can be observed and the prediction accuracy deteriorates. After the 2450th sample, the prediction accuracy for all variables except u_2 returns to normal.
After the predictions are obtained, SVDD is applied and the monitoring results are shown in Figure 10b. It can be seen that the fault was successfully detected, since a significant number of violations can be observed after the 2250th sample. This again indicates the good fault detection capability of the proposed method. It should be noted that violations can still be observed even after the fault was corrected at the 2450th sample. This can be explained: to avoid a further fault, the operators decided to reduce the temperature of blast (u_2), which caused the violations. This is confirmed by the subsequent fault identification results in Figure 11.
In Figure 11, the first plot shows the fault identification results for samples 2250 to 2450, while the second plot shows the results for samples 2450 to 3000. From the first plot it can be clearly seen that all variables except u_4 have significant contributions to the fault, indicating that a significant anomaly has arisen. This is expected: to correct the fault, both the quantity of blast and the temperature of blast were reduced, resulting in changes in other variables. From the second plot, it can be seen that after the fault was corrected, the contributions of the other variables reduced significantly while that of u_2 remains. This is in accordance with the previous analysis that the operators reduced the temperature of blast to avoid a further fault. The results of the second faulty case again confirm the performance of the proposed method.

Conclusions
This paper introduces a fault detection and identification method for the blast furnace ironmaking process based on the GRU network and SVDD. The GRU model is capable of handling multi-dimensional inputs and making predictions for future observations. The residuals between the actual observations and the predictions are then monitored using SVDD. A fault identification method is further developed by inspecting the accumulated normalized residuals. The proposed method is tested on faults observed in a real blast furnace in China. Application results show that the proposed GRU-SVDD model can successfully detect the hanging fault and that, compared with the PCA-SVDD model, GRU-SVDD has a higher detection rate. The proposed method is well suited for monitoring systems with strong dynamics and non-Gaussianity.

Conflicts of Interest:
The authors declare no conflicts of interest.