Internal Crack Prediction of Continuous Casting Billet Based on Principal Component Analysis and Deep Neural Network

Abstract: The accurate prediction of internal cracks in steel billets is of great importance for the stable production of continuous casting. However, it is challenging owing to the strong nonlinearity and coupling among continuous casting process parameters. In this study, an internal crack prediction model based on principal component analysis (PCA) and a deep neural network (DNN) was proposed on the basis of sufficient industrial data. PCA was used to reduce the dimensionality of the factors influencing the internal cracks, and the obtained principal components were used as the DNN input variables. The 5-fold cross-validation results demonstrate that the prediction accuracy of the DNN model is 92.2%, which is higher than those of the decision tree (DT), extreme learning machine (ELM), and backpropagation (BP) neural network models. Moreover, the variance analysis showed that the prediction results of the DNN model were more stable. The PCA-DNN model can provide a useful reference for real production, owing to its strong learning ability and fault tolerance.


Introduction
For a long time, the quality of continuous casting products has been a core issue for steel plants, engineering design institutes, and scientific research institutes. To improve yield and reduce production cost, it is necessary to accurately evaluate each continuous casting product to determine whether it is suitable for hot delivery or requires offline cleaning. Thus, the online quality prediction of continuous casting products is of great importance. However, continuous casting is a complex production system, and the strong nonlinearity and coupling among production parameters make the accurate prediction of typical defects extremely difficult. With the development of big data and artificial intelligence (AI), machine learning (ML) algorithms have been widely used in complex production systems because of their excellent nonlinear approximation capability and their ability to handle ill-defined problems. Studies have shown that ML, when integrated with mechanism analysis, is an effective method for prediction research in the metallurgy and materials fields [1][2][3][4].
Internal cracks in steel billets are difficult to detect during the continuous casting process, and the cracks may be passed on to the rolled materials. Brimacombe's team has contributed a significant amount of work on crack formation in continuous casting [5][6][7][8][9]. They concluded that internal cracks are related not only to the chemical composition of the steel but also to the reheating and bulging of the continuous casting billets [5,7,9], which provides an important theoretical basis for internal crack prediction. Currently, the prediction methods for internal cracks are primarily based on either metallurgical mechanisms or ML. Metallurgical mechanism methods assess the severity of internal cracks by establishing a functional relationship between production parameters and internal cracks. Previous studies have shown that internal cracks may occur when the stress acting on the solidification front exceeds the critical stress of the steel [10,11]. Based on this theory, Won et al. [12] proposed a critical strain equation for the prediction of internal cracks, using the brittle temperature range and strain rate as criteria. Dou et al. [13] established a micro-segregation model to study the effect of cooling rate on the zero-strength temperature (ZST), zero-ductility temperature (ZDT), and liquid impenetrable temperature (LIT) of steel in the solid-liquid two-phase region, and proposed a prediction model for internal crack sensitivity in continuous casting blooms using the least squares regression method. Han [14] proposed an analytical model to calculate the strain acting on the solidification front caused by bulging, bending, straightening, and roll misalignment, and used the experimentally obtained critical strain as the criterion to predict cracks in continuous casting slabs. Poltarak et al. [15] established a thermo-mechanical finite element model for the continuous casting of round billets to calculate the stress acting on the solidification front and thereby evaluate the internal crack risk under different casting conditions. Other scholars have also conducted similar studies to evaluate crack risk [16,17].
However, it is difficult to clarify the nonlinear coupling relationships among process parameters based on metallurgical mechanisms, and the prediction accuracy of such methods is unsatisfactory, which restricts their online application. With the rapid development of informatization and automation, scholars have developed a variety of continuous casting expert systems, including VAI's slab quality control system (VAI-Q) [18], Demag's quality assessment expert system (XQE) [19], Danieli's real-time quality assessment system (QUART) [20], and British Steel's mold thermal monitoring system (MTM) [21]. Most of these expert systems are composed of sensor detection, database queries, data analysis, and expert knowledge. However, inflexible expert knowledge leads to unsatisfactory results in application [16].
Neural networks have been widely used in continuous casting production, including breakout prediction, steel defect prediction, nozzle clogging detection, steel temperature prediction, and mold level fluctuation detection [22]. Normanton et al. [23] introduced the work conducted by the European Coal and Steel Community (ECSC) on the surface and internal quality prediction of continuous casting products. An online quality prediction system was developed using AI algorithms, such as the MLP network, fuzzy logic, decision tree (DT), and self-organizing map. Zhao et al. [24] established a quality prediction model based on a least-squares support vector machine (SVM) and optimized the model using an improved particle swarm optimization algorithm. Hore et al. [25] established a prediction model based on a multi-layer perceptron to predict the oscillation mark depth, mold powder consumption rate, metallurgical length, and cracks in the cast products. Varfolomeev et al. [26] and Ye et al. [27] used the random forest algorithm to predict crack occurrence in continuous casting billets. Furthermore, researchers, including the authors, have also applied AI algorithms to predict central carbon segregation in continuous casting billets [28][29][30]. However, the previous studies only assessed whether cracks would occur (a two-category problem) or determined the probability of cracks. In actual production, the prediction of the crack grade is more meaningful because it can help technicians to make important decisions. In addition, detailed research methods for internal crack prediction are rarely reported. The purpose of this study is to find an appropriate combination of algorithms to accurately predict the crack grade of steel billets and realize the online quality management of the casting products.
To realize the real-time and accurate prediction of internal cracks, the factors affecting internal cracks were first summarized comprehensively, and continuous casting production data were then collected from the manufacturing execution system (MES) of a steel plant. Subsequently, a data-driven model based on principal component analysis (PCA) and DNN was proposed for internal crack prediction. PCA was used to reduce the dimension of the factors affecting the internal crack and eliminate the correlation among the factors, after which the obtained principal components were used as the input variables of the DNN model. In particular, the number of principal components and the model hyperparameters were optimized to ensure that the performance of the data-driven models was optimal. Furthermore, the k-fold cross-validation method was used to evaluate the performance of the models.

Factors Affecting Internal Cracks
ML algorithms predict internal cracks by establishing a nonlinear mapping relationship between the continuous casting production parameters and the crack level. Thus, it is essential to identify the factors that have a strong correlation with internal cracks and use them as the inputs of the data-driven models. It is well known that the formation of cracks in steel billets is a complex metallurgical process involving the coupling action of solidification, heat transfer, flow, and external force. The crack sensitivity of steel is an internal factor of crack formation, whereas the force acting on the shell is an external factor. Therefore, in this study, the internal and external factors affecting internal cracks were analyzed.

Internal Factor Analysis
This study primarily focused on the internal crack prediction of ML40Cr steel billets. Chemical composition is the main factor affecting the crack sensitivity of the steel billets. The chemical composition of ML40Cr steel is listed in Table 1. Generally, a higher content of segregation-prone elements increases the risk of cracking in steel. The relationship between the chemical composition and crack sensitivity was analyzed as follows.
• Impurity elements
The formation of internal cracks depends on the mechanical properties of the solidifying front, which are closely related to microsegregation. The enrichment of S and P in the interdendritic region decreases the freezing temperature of the liquid and increases the crack tendency [5,14]. Therefore, the impurity element content is a key factor affecting the internal cracks.
• Carbon (C) and manganese-sulfur ratio (Mn/S)
The crack sensitivity is closely related to the critical strain at the solidification front. According to Brimacombe et al. [5,9] and Matsumiya et al. [31], the critical strain depends predominantly on the C content and Mn/S in steel. A low C content and a high Mn/S are beneficial for increasing the critical strain. Hence, the C content and Mn/S ratio are also key factors affecting the internal cracks.

External Factor Analysis
The external forces acting on the solidification shell during the continuous casting process primarily include thermal and mechanical stresses. When the total stress acting on the solidification front exceeds the critical stress, an internal crack may occur. In addition, electromagnetic stirring and casting parameters have significant effects on the internal cracks.

• Casting parameters
Casting parameters, such as casting speed and superheat, mainly affect the solidification behavior of continuous casting billets. When the casting parameters fluctuate, the temperature and solidification structure of the continuous casting billet change, which may lead to internal defects in the steel billets.

• Cooling parameters
The thermal stress caused by secondary cooling is closely related to internal crack formation. The secondary cooling water flow inevitably fluctuates widely when the casting speed varies frequently. This easily induces excessive thermal stress at the billet solidification front and causes internal cracks [32]. Hence, the specific water and the water fluxes of each secondary cooling zone were considered as input variables for the data-driven models.
For continuous casting billets, few internal cracks are caused by bulging under normal production conditions. However, when the production parameters fluctuate abnormally, for example, with lower mold cooling intensity, unstable casting speed, or higher superheat, bulging may occur after the billets leave the mold. Therefore, the water flux of mold cooling and the mold water temperature difference are also important factors.

• Electromagnetic stirring parameters
Previous studies have clearly shown that internal cracks can be reduced by decreasing the center macrosegregation of carbon and sulfur. Electromagnetic stirring (EMS) is an important method for reducing the center macrosegregation of continuous casting strands. Generally, most casters are equipped with mold EMS (M-EMS) and final EMS (F-EMS). To better control the internal quality of the continuous casting strands, some casters are also equipped with strand EMS (S-EMS). In brief, electromagnetic stirring parameters, such as current and frequency, are also important variables for internal crack prediction.

• Other parameters
The strain acting on the billet shell includes bending/straightening strain and misalignment strain. In this study, the continuous casting billets solidify completely at the straightening point, and the bending/straightening strain mainly affects surface cracks [33]. In addition, owing to the regular maintenance of the continuous casting equipment, misalignment strains rarely occur.
In summary, the formation of internal cracks can be attributed to the concomitance of multiple intricate mechanisms. This study primarily predicted internal cracks from the perspective of the production process. The input variables of the data-driven models are shown in Figure 1.

Data Collection and Pre-Processing
Mining the nonlinear mapping relationships within datasets is the essence of ML prediction. A flowchart of the internal crack prediction during the continuous casting process is shown in Figure 2. Data collection, data pre-processing, and ML modeling are the three main tasks: data collection and pre-processing form the basis of internal crack prediction, whereas ML modeling and optimization are its key steps. There are two challenges in internal crack prediction.

• High dimensions and nonlinearity of the production data
There are many noisy data and redundant variables in continuous casting production data, which greatly reduce the generalization ability of the ML prediction model. Hence, sufficient attention should be paid to data cleaning and dimension reduction [34].

• Selection and optimization of prediction models
Improving the fault tolerance and generalization ability of data-driven models and avoiding overfitting are also challenges that need to be addressed. Parameter optimization is essential for improving the model performance.

Data Collection
According to the variables affecting the internal crack illustrated in Figure 1, the continuous casting production data of ML40Cr steel were collected from the MES of one steel plant. The descriptive statistics of the collected parameters are presented in Table 2. The caster is a four-strand circular arc caster that mainly produces billets with a section size of 150 mm × 150 mm. The dataset contained 1600 samples after eliminating the null and abnormal data caused by the instability of the detection equipment, and each sample consisted of 19 industrial parameters. The data are shown in Table 3. In production practice, samples with serious cracks are relatively rare, which hinders the training of the data-driven model on these samples. Therefore, the proportion of data for serious cracks was appropriately increased in this study. The assessment of internal cracks is in accordance with the metallurgical industry standard (YB/T 4002-2013). Internal cracks were classified into four levels according to their severity. Level 1 indicates no internal cracks, and levels 2 to 4 indicate increasingly severe internal cracks.
Chemical composition (impurity elements [11,14,17]; critical strain [31,35]):
X2 Manganese content (wt.%)
X3 Phosphorus content (wt.%)
X4 Sulfur content (wt.%)
X5 Manganese sulfur ratio
X6 Sum of impurity elements (wt.%)
Cooling parameters (bulging stress [36,37]; thermal stress [5,32]):
X7 Water flux of mold cooling (L/min)
X8 Mold water temperature difference (°C)
X9 Specific water (L/kg)
X10 Water flow rate in zone 1 (L/min)
X11 Water flow rate in zone 2 (L/min)
X12 Water flow rate in zone 3 (L/min)
Casting parameters (solidification structure [38][39][40]):
X13 Casting speed (m/min)
X14 Superheat (°C)
Electromagnetic stirring parameters (element segregation [11,41]):
X15 Frequency of F-EMS (Hz)
X16 Current of F-EMS (A)
X17 Frequency of M-EMS (Hz)
X18 Current of M-EMS (A)
Output:
Y Internal crack grade

Dimension Reduction
Owing to the complexity of the continuous casting process, there is some correlation among the process parameters. The information reflected by the various parameters is repetitive to a great extent, which leads to multicollinearity problems, makes the training of ML networks prone to falling into a local minimum, and eventually reduces their generalization ability. Therefore, dimension reduction is required to eliminate the redundant information. PCA is a data dimension reduction technique that replaces a large number of correlated original variables with a small number of uncorrelated principal components [42] while retaining as much information as possible from the original data. The main steps of PCA are as follows [42,43]:
(1) To weaken the influence of numerical differences on the convergence speed of data-driven models, it is necessary to standardize the original variables, that is, to subtract the mean of each variable and divide by its standard deviation.
(2) Calculate the correlation coefficient matrix R.
(3) Calculate the eigenvalues and eigenvectors of matrix R. First, solve the characteristic equation |λE − R| = 0, and obtain the characteristic roots λ i , then arrange them in order of size. Afterward, find the feature vector e i corresponding to the eigenvalue λ i .
(4) Obtain the contribution rate and cumulative contribution rate of the principal components, as shown in Equations (4) and (5), respectively. The contribution rate of each principal component is arranged in order of size. Generally, the cumulative contribution rate of principal components should be between 85% and 95%.
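For reference, the steps above can be written in their standard textbook form. This is a sketch only: the symbols n (number of samples), p (number of original variables), x_{ij}, \bar{x}_j, s_j, \eta_i and m (number of retained components) are notational assumptions, and the equation numbering of the original is not reproduced.

```latex
% Step (1): standardization of variable j for sample i
x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^{2}}

% Step (2): correlation coefficient matrix
R = \left(r_{jk}\right)_{p \times p}, \qquad
r_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} x_{ij}^{*}\, x_{ik}^{*}

% Step (3): characteristic equation and ordered eigenvalues
\left|\lambda E - R\right| = 0, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

% Step (4): contribution rate of component i and cumulative rate of the first m components
\eta_i = \frac{\lambda_i}{\sum_{k=1}^{p}\lambda_k}, \qquad
\eta_{\Sigma}(m) = \frac{\sum_{i=1}^{m}\lambda_i}{\sum_{k=1}^{p}\lambda_k}
```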
PCA was used to reduce the dimension of the historical continuous casting data, and the original variables were transformed into 18 mutually uncorrelated principal components. In particular, there is no one-to-one correspondence between the principal components and the original variables, because each principal component is a linear combination of all the original variables. Table 4 shows the statistical results of the PCA. The results indicate that the cumulative contribution rate of the first 16 principal components reaches 100%; that is, these components contain all the information of the original data.
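The PCA procedure described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in the study (which was carried out in MATLAB); the function names `pca_correlation` and `n_components_for` and the synthetic data are assumptions made for illustration. The synthetic matrix contains one column that is an exact sum of two others, which shows how a redundant variable yields a trailing principal component with a contribution rate of essentially zero.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows: samples, columns: variables)."""
    n = X.shape[0]
    # Step 1: standardize each variable (zero mean, unit sample variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation coefficient matrix R.
    R = (Z.T @ Z) / (n - 1)
    # Step 3: eigen-decomposition of the symmetric matrix R, sorted in descending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: contribution rate and cumulative contribution rate.
    contribution = eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    return Z, eigvals, eigvecs, contribution, cumulative

def n_components_for(cumulative, threshold=0.85):
    """Smallest number of components whose cumulative rate reaches the threshold."""
    m = int(np.searchsorted(cumulative, threshold)) + 1
    return min(m, len(cumulative))

# Synthetic example (hypothetical data): the fifth column is redundant.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 4))
X_demo = np.column_stack([A, A[:, 0] + A[:, 1]])

Z, eigvals, eigvecs, contribution, cumulative = pca_correlation(X_demo)
scores = Z @ eigvecs                      # principal component scores
m = n_components_for(cumulative, 0.85)    # components kept at the 85% level
reduced_inputs = scores[:, :m]            # would serve as the model inputs
```

The retained columns of `scores` are uncorrelated by construction, which is the property the text relies on when the principal components replace the original variables as network inputs.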

Establishment of DNN Prediction Model
The neural network was first proposed in the 1940s as a type of machine that simulates human perception, that is, a "perceptron." In the 1980s, single-hidden-layer neural networks based on back propagation became popular. In 2006, Hinton et al. [44] overcame the local optimal solution problem by using a pre-training method, increased the number of hidden layers to seven, and thus proposed the DNN algorithm. Because a DNN has more hidden layers, it can extract more data features and has a better learning effect than a single-hidden-layer neural network; DNNs have therefore become a research hotspot in recent years. The DNN consists of an input layer, hidden layers, and an output layer. The layers are fully connected; that is, every neuron in the ith layer is connected to every neuron in the next layer. The network topology of the DNN is illustrated in Figure 3.
In view of the strong nonlinear mapping ability of DNNs, an internal crack prediction model based on the DNN algorithm was developed in this study. The modeling process mainly includes three steps: dataset partitioning, model construction and optimization, and model validation. Using the random sampling method, the 1600 samples in the dataset were divided into a training dataset and a test dataset: 1500 samples were used to establish the DNN prediction model, and the remaining 100 samples were used to verify the model. The entire process was conducted using the Matrix Laboratory software (MATLAB® version R2014a, MathWorks, Natick, MA, USA).
Internal crack prediction is a classification problem. The category encoding marks the output of class i as 1 and the rest as 0. For example, the number "1" in the code (0,0,1,0) is the third digit, indicating that the crack belongs to level 3. With this encoding strategy, each time the neural network is invoked, its outputs give the confidence level of the sample belonging to each class i.
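The category encoding and the random 1500/100 partition described above can be sketched in a few lines of Python. This is an illustrative sketch, not the MATLAB code used in the study; the helper names `encode_level`, `decode_confidences`, and `random_split` and the random seed are assumptions.

```python
import random

N_LEVELS = 4  # crack levels 1 to 4

def encode_level(level, n_levels=N_LEVELS):
    """Mark the position of the given crack level with 1 and all others with 0."""
    if not 1 <= level <= n_levels:
        raise ValueError("crack level out of range")
    return tuple(1 if i == level - 1 else 0 for i in range(n_levels))

def decode_confidences(confidences):
    """Return the crack level whose output confidence is the largest."""
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return best + 1

def random_split(n_samples, n_train, seed=0):
    """Randomly partition sample indices into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

code = encode_level(3)                                  # (0, 0, 1, 0)
level = decode_confidences([0.05, 0.15, 0.70, 0.10])    # 3
train_idx, test_idx = random_split(1600, 1500)          # 1500 training, 100 test
```

Decoding by the largest output is one common convention for turning per-class confidences into a single predicted level; the text does not state which rule was applied.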
The number of hidden layers in the DNN has an important influence on prediction accuracy. In this study, the hit ratio is used to evaluate the performance of the data-driven models. The hit ratio is the proportion of samples whose predicted value is the same as the actual value among all samples. By increasing the number of hidden layers one at a time, the influence of the hidden layers on the hit ratio of the DNN model was obtained, as shown in Figure 4. A network with fewer than three hidden layers is usually called a shallow neural network (SNN). Because SNNs have fewer hidden layers and neurons, it is difficult for them to fully capture the nonlinear relationships among continuous casting data, and their hit ratio is not satisfactory. In contrast, the hit ratio of the DNN model is the highest when the number of hidden layers is three. As the number of hidden layers increases further, the network structure of the DNN becomes more complex and overfitting occurs, which leads to a lower prediction accuracy and a longer operation time. Finally, the best prediction results were obtained when the network structure was 18-(8-6-4)-4. The optimal DNN hyperparameters were determined using the trial-and-error method, as shown in Table 5.
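To make the 18-(8-6-4)-4 structure and the hit ratio concrete, the following NumPy sketch runs a forward pass through a fully connected network of that shape and scores predictions. It is a sketch under stated assumptions: the tanh hidden activation, the softmax output, and the random initial weights are illustrative choices that are not taken from Table 5, and no training is performed.

```python
import numpy as np

LAYER_SIZES = [18, 8, 6, 4, 4]  # input, three hidden layers, output

def init_params(layer_sizes, seed=0):
    """Create one (weight matrix, bias vector) pair per pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, X):
    """Forward pass: tanh hidden layers (assumed) and a softmax output layer (assumed)."""
    a = X
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)
    W, b = params[-1]
    logits = a @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def hit_ratio(y_true, y_pred):
    """Proportion of samples whose predicted level equals the actual level."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

params = init_params(LAYER_SIZES)
X_demo = np.random.default_rng(1).normal(size=(5, 18))   # five hypothetical samples
confidences = forward(params, X_demo)                     # shape (5, 4)
predicted_levels = confidences.argmax(axis=1) + 1         # levels 1 to 4
```

A network of this shape has 254 trainable parameters (weights plus biases), which gives a sense of how compact the selected structure is relative to the 1500 training samples.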

Establishment of Comparison Models
In this study, the backpropagation (BP) neural network, extreme learning machine (ELM), and decision tree (DT) prediction models were also established for comparison with the DNN model. The above algorithms belong to supervised learning; that is, the classification prediction of unknown data is achieved through the learning of historical data. The following is a brief introduction of the three algorithms.

• BP neural network
The BP neural network is a type of feedforward neural network based on error backward propagation and has been widely used in many fields. It takes the error square as the objective function and uses the gradient descent method to calculate the minimum value of the objective function. Its network structure includes the input layer, hidden layer and output layer, and each neuron between adjacent layers is fully connected for information transmission. The BP neural network model can deal with complex nonlinear problems and has a better fault-tolerant ability.
• ELM
Similar to the BP neural network, the ELM is also a single-hidden-layer feedforward neural network. In the ELM, however, the input weights are generated randomly without iterative solution, and only the output weights are calculated using the least-squares method. Therefore, the training of an ELM is much faster than that of traditional feedforward network algorithms.

• DT
The DT algorithm relies on a tree-like branch structure to realize classification prediction, and its structure consists of a root node, decision nodes, and leaf nodes. The entire decision-making process starts at the root node. Each decision node represents a variable attribute to be classified, whereas each leaf node represents a category. Building a DT consists of two primary steps: decision tree construction and decision tree pruning. DT is widely used owing to its simple structure and high classification efficiency.
The prediction models of the BP neural network, ELM, and DT were established with the same dataset as the DNN, and their optimal configuration parameters were determined using a trial-and-error method, as shown in Table 6.
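The ELM training principle described above (random, untrained input weights and output weights obtained by least squares) can be sketched as follows. This is an illustrative NumPy sketch rather than the configuration in Table 6; the sigmoid activation, the uniform weight range, the hidden-layer size, and the function names are assumptions, and the data are random placeholders.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden-layer output matrix H with a sigmoid activation (assumed)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, n_hidden, seed=0):
    """Train an ELM: random input weights, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # never updated
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # never updated
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T   # minimum-norm least-squares solution of H @ beta = T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predict the crack level (1-based) as the output with the largest value."""
    return (hidden_output(X, W, b) @ beta).argmax(axis=1) + 1

# Hypothetical data: 40 samples, 3 inputs, 4 crack levels encoded as 0/1 codes.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 3))
levels_demo = rng.integers(1, 5, size=40)
T_demo = np.eye(4)[levels_demo - 1]

W, b, beta = elm_train(X_demo, T_demo, n_hidden=10)
predicted = elm_predict(X_demo, W, b, beta)
```

Because the only fitted quantity is `beta`, training reduces to a single linear solve, which is consistent with the very short computation time reported for the ELM model.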

Model Improvement by PCA
To further improve the model performance, the original data were transformed into principal components for internal crack prediction. The number of principal components has an important influence on the prediction accuracy of neural network models, so the optimum number of principal components was studied for each model. Generally, it is not necessary to conduct data pre-processing for the DT algorithm, and this study also shows that PCA has a negative effect on the DT algorithm. The test results of the DNN, BP neural network, and ELM models with different numbers of principal components are shown in Figure 5. For the DNN and BP neural network models, the prediction accuracy was highest when the number of principal components was 15, whereas the optimal number of principal components was 8 for the ELM model.
The minimum number of samples per leaf node ("min_sample_leaf") is a key parameter for improving the performance of the DT prediction model. Figure 6 shows the effect of different "min_sample_leaf" values on the model performance. The accuracy of the model is highest when "min_sample_leaf" is between 15 and 20.
Figure 8. Clearly, the hit ratio of the PCA-BP neural network model was only 76%, which was significantly lower than that of the PCA-DNN model. The excellent performance of the DNN can be attributed to the following:
(1) Stronger learning ability. The network structure with multiple hidden layers makes its learning ability stronger than that of the BP neural network, so it can fully capture the nonlinear characteristics of the data.
(2) Stronger fault tolerance. In the case of a large amount of data and a complex data structure, DNNs have a stronger fault-tolerant ability, and their global training results are hardly affected by a few damaged neurons.
To weaken the influence of data division on the prediction accuracy of the data-driven models, the k-fold cross-validation method was used to evaluate the model performance. Figure 9 shows a schematic of k-fold cross-validation. The k-fold cross-validation is a sampling-without-replacement technique in which each sample is included in the test set exactly once [45]. In this study, the dataset was randomly divided into five equal parts for 5-fold cross-validation, and the mean values of the five prediction results were used to evaluate the prediction models.
The simulation results of the data-driven models based on 5-fold cross-validation are presented in Table 7. The prediction accuracy of the DNN, BP neural network, and ELM models increased significantly after PCA optimization. The DNN model had the highest prediction accuracy of 92.2%, followed by the DT model with a prediction accuracy of 84.8%. The prediction accuracy of the BP neural network model and ELM model was unsatisfactory at 73.2% and 69.8%, respectively. Owing to its complex network structure, the computation time of the DNN model was 1.468 s, which is longer than that of the single-hidden-layer BP and ELM models. Because the input weights and thresholds of the ELM are randomly generated rather than trained iteratively, the computation time of the ELM model is only 0.004 s.
Variance is an important index for analyzing the dispersion of data, and it is used here to evaluate the 5-fold cross-validation results; the smaller the variance, the more stable the output of the prediction model. As expected, the DNN model had the minimum variance in prediction accuracy, indicating that its stability was the best among all prediction models. In summary, the internal crack prediction model based on PCA and DNN has higher accuracy and stronger generalization ability, which helps to realize the online management of continuous casting billets and assists technicians in making important decisions.
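The evaluation procedure described above can be sketched with the Python standard library. The sketch shows how 1600 samples can be split into five disjoint folds and how the mean and variance of the fold accuracies are summarized. The function names are assumptions, the population variance is an assumed choice (the text does not say which variance estimator was used), and the accuracy values are made-up placeholders, not results from Table 7.

```python
import random
from statistics import mean, pvariance

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k disjoint, near-equal parts
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def summarize(accuracies):
    """Mean and population variance of the per-fold accuracies."""
    return mean(accuracies), pvariance(accuracies)

splits = list(k_fold_indices(1600, k=5))        # five (train, test) pairs

# Placeholder accuracies for illustration only (not measured values).
placeholder_accuracies = [0.90, 0.92, 0.94]
avg, var = summarize(placeholder_accuracies)
```

In use, a model would be trained on each `train` list and scored on the matching `test` list, and the resulting accuracies would be passed to `summarize`; a smaller variance then indicates a more stable model.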

Results and Discussion
Intelligent iron/steel manufacturing has drawn increasing attention in recent years. An important aspect of future work is to realize the online quality control of continuous casting billets. The development trend of the online quality control of steel billets is to integrate the prediction model with a defect knowledge base [46]. That is, by monitoring the continuous casting production parameters online, the disturbances during the continuous casting process are analyzed, and the cause of internal cracks is traced by searching the knowledge base of steel billet defects [47]. Finally, according to the prediction results of internal cracks, the process parameters are adjusted in time to avoid the generation of cracks. The flowchart of the online control of steel billet quality is shown in Figure 10. It is believed that, with further understanding of metallurgical phenomena and the development of intelligent algorithms, AI has the potential to find wider application in continuous casting in the future.

Conclusions
In view of the difficulty in accurately predicting internal cracks, a prediction model based on PCA and DNN was proposed. The results are summarized as follows:
(1) PCA not only simplifies the structure of data-driven models but also improves their generalization ability. The DNN, BP neural network, and ELM prediction models perform best when the numbers of principal components are 15, 15, and 8, respectively. The prediction accuracy of the models is improved by using PCA.
(2) Compared with the BP neural network model, the DNN model has stronger learning and fault-tolerant abilities. The highest hit ratio of the PCA-DNN model was 94%, which was higher than that of the PCA-BP neural network model.
(3) The 5-fold cross-validation showed that the PCA-DNN model had the highest prediction accuracy of 92.2%, with a computation time of 1.468 s. Variance analysis indicated that the DNN model had the best output stability. In contrast, the prediction accuracies of the DT, PCA-BP, and PCA-ELM models were 84.8%, 73.2%, and 69.8%, respectively.
(4) The internal crack prediction model based on PCA and DNN can assist technicians in making important decisions about the offline or hot-delivery treatment of steel billets. In addition, the PCA-DNN model is helpful in the online management of continuous casting billets.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing not applicable. No new data were created or analyzed in this study.