Computational Modeling and Prediction on Viscosity of Slags by Big Data Mining

Abstract: The viscosity of slag is a key factor affecting metallurgical efficiency and recycling, such as metal-slag reaction and separation, as well as slag wool processing. In order to comprehensively clarify the variation of the slag viscosity, various data mining methods have been employed to predict the viscosity of the slag. In this study, a more advanced dual-stage predictive modeling approach is proposed in order to accurately analyze and predict the viscosity of slag. Compared with the traditional single data mining approach, the proposed method performs better, with a higher recall rate and a lower misclassification rate. The simulation results show that temperature, SiO2, Al2O3, P2O5, and CaO have greater influences on the slag's viscosity. The critical temperature for onset of the important influence of slag composition is 980 °C. Furthermore, it is found that SiO2 and P2O5 have positive correlations with slag's viscosity, while temperature, Al2O3, and CaO have negative correlations. A two-equation model of a six-degree polynomial combined with the Arrhenius formula is also established for the purpose of providing theoretical guidance for industrial application and reutilization of slag.


Introduction
The viscosity of slag has a significant effect on metallurgical efficiency and slag recycling, including metal-slag reaction and separation, mass and heat transfer in the melts, furnace lining corrosion, and slag wool processing, and therefore inevitably affects the metal quality and sustainable development of high temperature industries [1][2][3]. In the smelting process, if the viscosity of molten slag is too low, heat preservation and reoxidation resistance are weakened because of the higher slag fluidity and the large exposed area of molten metal. In addition, the refractory materials are seriously corroded, which reduces the lining life and the cleanliness of the metal. Conversely, molten slag with high viscosity shows lower fluidity and leads to inactivation of the molten pool, which makes it difficult for the melting process to proceed smoothly. In general, an appropriate viscosity is also conducive to fiber formation in slag wool processing. Slag with suitable viscosity is thus beneficial for improving the quality of molten metal, reducing the consumption of refractories, and slag reutilization. Therefore, the viscosity of slag has been studied extensively through experimental measurements and modeling predictions [4][5][6][7][8][9][10][11].
The flow of slag is closely related to molecular structure and rheology, so both the composition of slag and temperature have significant and complex effects on viscosity. For slag containing silicate, the network former can increase the viscosity while the network modifier can reduce viscosity. The amphoteric body can exhibit the characteristics of either network former or network modifier [12]. Although the slag is a Newtonian fluid at high temperatures, the change of viscosity with temperature at low temperatures exhibits the properties of a non-Newtonian fluid [13,14]. Therefore, great research efforts have been devoted to the establishment of mathematical models in order to automatically predict the viscosity values of different slag systems.
There are three types of mathematical models for predicting viscosity: the theoretical method, the empirical method, and the semiempirical method. The theoretical model is based on the material structure: the properties of the melt are deduced according to the basic principles of quantum mechanics and statistical mechanics, and finally the viscosity expressions of various melts are obtained. Although this method is based on a clear physical theory, it has two obvious defects, namely poor accuracy and a limited application scope. The empirical method, which combines the theory with data gathered from experiments to establish the prediction model, can provide more accurate results and has been popularly used in the viscosity calculation of metallurgical melts. The viscosity prediction model proposed by Urbain is one of the empirical models. The viscosity expression in Urbain's model is obtained based on the Weymann-Frenkel equation. In addition, the components of slag are separated into network formers, network modifiers, and amphoterics in Urbain's model for the purpose of calculating the final viscosity values [7]. This model achieves satisfactory predictive performance on the SiO2-Al2O3-CaO-MgO system and its subsystems. The semiempirical method, which aims to build a prediction model based on the theoretical correction of specific experimental experience, can provide more accurate predictions and has been widely applied [8,9]. Chou found that geometric models, as one type of semiempirical model, could be used to predict viscosity [10,11], and they have been widely used in binary or ternary systems. However, these models cannot be generalized to multicomponent systems.
In summary, the existing research gaps, including limited number of samples, a large number of reference experience parameters, and the limited application scopes, make it difficult for any existing models to be applied to multiple slag systems to achieve the automatic prediction of slag's viscosity.
In recent years, with the rapid development of information technology, the world has entered the era of big data. Namely, the data scale is growing explosively and the data forms are getting more and more complex. Raw data is meaningless unless it is properly mined to extract potentially useful and hidden information that can provide wisdom for related stakeholders [15]. This process is called data mining, which usually uses statistics, data visualization, machine learning, text mining, and even deep learning methods to detect trends or patterns without prior knowledge of the data [16,17]. Therefore, in the big data era, employing data mining methods to support related decision-making is preferable to relying on expert experience or intuition [18]. The concept of data-driven decision-making has penetrated into various fields such as government management [19], economics [20], medical treatment [21], education [22], and the manufacturing industry [23]. Researchers have used several data mining methods to construct models for predicting the conductivity of metallurgical melts and reported that the Gradient Boosting Decision Tree (GBDT) model achieved the highest prediction performance [24]. Such methods may also provide a new perspective for investigating slag's viscosity.
In order to address the above research gaps, this study proposes a novel dual-stage approach based on data mining methods for automatically predicting slag's viscosity and providing decision-making support for related stakeholders. Therefore, this study aims to answer the following research questions: (1) Could the data mining methods be employed for analyzing and predicting slag's viscosity? (2) Can the proposed dual-stage predictive modeling approach provide better prediction outcomes than baseline models? (3) What are the important factors for predicting and adjusting viscosity in practice?

Data Collection and Preprocessing
Data for a total of 1459 slags (i.e., samples), with one variable denoting the slag's viscosity and 25 common variables, were collected for this study [25][26][27][28][29]. These 25 common variables, shown in Table 1, include 24 variables for the proportions of the slag's components and 1 temperature variable, and are adopted as input variables for building prediction models. However, the ranges of the variables in the dataset differ significantly, so a data transformation procedure is necessary in order to speed up the convergence of model training. This study transforms all input variables into a range of 0-1 based on Equation (1). Slag's viscosity is usually recorded and stored in numeric form. In order to model the slag's viscosity performance, the viscosity value is transformed into a binary variable that serves as the prediction target, marking every sample as high or low viscosity. Different practitioners follow different guidelines for the high or low criterion. In this study, slag with a viscosity of no less than 10 P is considered a high viscosity sample and the remaining samples are labeled as low. After data cleaning, the dataset contains 28.17% high viscosity samples and 71.83% low viscosity samples. Table 1 lists all the variables for the following modeling and comparisons.
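The preprocessing described above can be sketched as follows. Equation (1) is not reproduced in this text, so the sketch assumes the common min-max form x' = (x - min) / (max - min). The function names and the toy values are illustrative and are not taken from the collected dataset.

```python
def min_max_scale(column):
    """Scale a list of numbers into [0, 1] (assumed form of Equation (1))."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def label_viscosity(viscosity_poise, threshold=10.0):
    """Return 1 (high) if the viscosity is no less than 10 P, else 0 (low)."""
    return 1 if viscosity_poise >= threshold else 0


# Toy values for illustration only (not from the collected dataset).
temperatures_c = [800.0, 980.0, 1400.0, 1600.0]
viscosities_p = [250.0, 10.0, 3.2, 0.8]

scaled_t = min_max_scale(temperatures_c)
labels = [label_viscosity(v) for v in viscosities_p]
print(scaled_t, labels)
```

Note that a sample at exactly 10 P is labeled high, matching the "no less than 10 P" criterion.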
After data preprocessing, cross validation [30] is applied to split the dataset into two subsets for model training and validation. In this study, 70% of the samples are used for training and the remaining 30% for validation. Stratified sampling [31] based on the minority category (i.e., high viscosity) is also followed in order to ensure that the training and validation datasets have the same class distribution. All prediction models are optimized against the validation results to avoid overfitting.
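A minimal sketch of a stratified 70/30 split is given below. The paper does not state which tool was used, so this is a plain Python illustration; the function name, the seed, and the toy labels are assumptions.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx), splitting each class at train_fraction."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)


# Toy labels: 30 high viscosity (1) and 70 low viscosity (0) samples.
labels = [1] * 30 + [0] * 70
train_idx, valid_idx = stratified_split(labels)
train_high_share = sum(labels[i] for i in train_idx) / len(train_idx)
valid_high_share = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), train_high_share, valid_high_share)
```

Because each class is split separately, the share of high viscosity samples is the same in both subsets.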

The Proposed Dual-Stage Predictive Modeling Approach
It is clearly shown that the collected slag dataset is imbalanced. Generally, there are two strategies, including adjusting the imbalanced status from the data-level and designing new predictive approaches, for dealing with imbalanced classification problems [32].
It is known that many typical data mining methods, including Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), and Deep Neural Network (DNN), have been widely used for building prediction models [24]. In this study, the prediction of slag's viscosity can be considered as a classification task, which means the prediction models can be constructed based on the above data mining methods. However, each prediction model has its own advantages, especially for an imbalanced dataset. It indicates that one model may accurately identify one type of sample and another model can achieve better prediction for another class of samples. Based on this concern, this study proposes a dual-stage predictive modeling approach in order to make more accurate predictions based on the results of these two models. Figure 1 shows the logic flow of the proposed dual-stage predictive modeling approach. It indicates that the proposed approach consists of two stages and the prediction outcomes of the first stage are considered as inputs of the second stage model for final prediction.


Specifically, two predictive models need to be trained in the first stage. One model aims to predict high viscosity samples (MH hereafter) and the other aims to identify low viscosity samples (ML hereafter). These two models have the same input variables but opposite target variables. MH generates one probability value for each slag sample, called PH in this study, indicating the probability of high viscosity. Similarly, ML produces one probability value for each slag sample, called PL, denoting the probability of low viscosity. Ideally, the sum of PH and PL (called Pt) should be equal to 1 when using algorithms like Logistic Regression, and Pt should be close to 1 when using ensemble models. That is, a sample with a high PH should obtain a low PL and vice versa. However, the optimal parameters of a model based on GBDT or DNN are determined by the target variable, so the parameters of MH and ML will differ in order to obtain their individual best prediction accuracy. It is therefore not surprising that some slag samples may obtain both high or both low probabilities from MH and ML. This may result in misclassification and further inappropriate decision-making. This study takes this hidden concern into account and proposes to build a second stage predictive model for performing final predictions. Therefore, the first stage should employ ensemble algorithms such as GBDT and DNN rather than simple classifiers to ensure nonlinear relationships among PH, PL, and Pt.
Then the three probability outcomes (i.e., PH, PL, and Pt) are fed to the second stage model as inputs. The decision tree algorithm is employed for building the second stage model because of its great advantages in visualizing the decision process, and the prediction target is the same as that of MH. The second stage model can also be considered as a coordination model. Therefore, the dual-stage predictive modeling approach makes the final predictions based on the prediction results of the MH and ML models, which can be considered as a novel ensemble method using a stacking strategy [33]. It is expected that the proposed dual-stage predictive modeling approach can make more accurate predictions for high viscosity samples than the MH model alone.
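The second stage can be illustrated with a small sketch. It assumes the first stage probabilities are already available and replaces the full decision tree with a depth-1 tree (a single threshold rule) to keep the example short. The probabilities, function names, and targets are hypothetical and are not results from this study.

```python
def second_stage_features(ph, pl):
    """Build (PH, PL, Pt) rows from first stage probabilities."""
    return [(h, l, h + l) for h, l in zip(ph, pl)]


def fit_stump(rows, targets):
    """Find the single-feature rule `row[j] > t` with the fewest errors.

    This is a depth-1 stand-in for the coordination decision tree.
    """
    best = None
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            errors = sum((row[j] > t) != bool(y) for row, y in zip(rows, targets))
            if best is None or errors < best[0]:
                best = (errors, j, t)
    return best  # (training errors, feature index, threshold)


# Hypothetical first stage outputs for six samples (not from the paper).
ph = [0.10, 0.30, 0.52, 0.58, 0.70, 0.90]  # MH: probability of high viscosity
pl = [0.92, 0.75, 0.50, 0.45, 0.28, 0.12]  # ML: probability of low viscosity
is_high = [0, 0, 0, 1, 1, 1]

rows = second_stage_features(ph, pl)
errors, feature, threshold = fit_stump(rows, is_high)
print(errors, feature, threshold)
```

In this toy case the learned rule is a threshold on PH of about 0.55, which classifies the third sample (PH = 0.52) correctly, whereas the default 0.5 threshold would not.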

Metrics for Performance Evaluation
The overall prediction accuracy (i.e., the proportion of correctly identified samples) and the misclassification rate (i.e., the proportion of misclassified samples) are commonly used measures in data mining tasks [24]. However, due to the imbalanced characteristics of the dataset (the percentage of samples with high viscosity is less than 30%), it is critical to correctly identify the minority category (i.e., to achieve a high recall rate). In addition, an increase in false positive cases also needs to be avoided (i.e., a high precision rate is required). Therefore, the F1 score, the harmonic mean of precision and recall, is also selected. In general, the higher the F1 score, the better the prediction performance of a model. Finally, four metrics as shown in Equations (2)-(5), including overall accuracy, misclassification, recall, and F1 score, are chosen for measuring and comparing the models' prediction performance.
True positive (TP) denotes a slag sample whose status is positive and that the model correctly predicts as positive. True negative (TN) indicates a slag sample whose status is negative and that the model also predicts as negative. False positive (FP) means a sample whose status is negative but that the model misclassifies as positive, and false negative (FN) is the opposite of FP. In this study, positive means high viscosity samples in MH and low viscosity samples in ML, while negative denotes low viscosity samples in MH and high viscosity samples in ML. The accuracy, misclassification, and F1 score are indicators of the model's overall prediction performance.
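Equations (2)-(5) are not reproduced in this text. The sketch below uses the standard definitions of the four metrics in terms of TP, TN, FP, and FN, which are assumed to match the intended equations. The confusion-matrix counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, misclassification, recall, precision, and F1."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    misclassification = (fp + fn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "misclassification": misclassification,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
m = classification_metrics(tp=80, tn=300, fp=10, fn=20)
print(m)
```

Accuracy and misclassification always sum to 1, so reporting both is a convenience rather than independent evidence.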

The Identification of Significant Factors
In the field of data mining, data scientists need to not only focus on how to improve the prediction accuracy but also discover the significant factors from the large amounts of data. The former can inform related stakeholders about the results at which slag could have a high possibility of being a high viscosity sample, but the researchers and practitioners still do not know how to effectively optimize the components or conditions to improve or change viscosity unless they have been informed which factors can significantly affect slag's viscosity.
In this study, the surrogate modeling method (i.e., using another model to explain a complex model) [34] is employed to interpret the results generated by the best prediction model. The surrogate model has the same set of input variables, but the target variable is replaced with the values predicted by the best model. The decision tree algorithm is able to simulate the best model with 100% accuracy, and the obtained decision tree can be visualized to show the decision process. Therefore, using a decision tree method to build the surrogate model can help us find the significant factors and provide related stakeholders with insights and guidelines.
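The surrogate idea can be sketched in a few lines: the surrogate is trained on the predictions of the complex model rather than on the true labels, and its agreement with the complex model (fidelity) is measured. The `black_box` function, the two input variables, and the samples below are hypothetical, and a one-split tree stands in for the full decision tree.

```python
def black_box(sample):
    """Hypothetical stand-in for the best first stage model (not the paper's)."""
    temperature, sio2 = sample
    score = 0.004 * (1200.0 - temperature) + 0.02 * (sio2 - 40.0)
    return 1 if score > 0 else 0


def fit_surrogate_stump(samples, predicted):
    """Pick the one-variable split that best reproduces the black-box labels."""
    best = None
    for j in range(len(samples[0])):
        for s in samples:
            t = s[j]
            for high_side in (True, False):  # which side of the split is labeled 1
                agree = sum(((x[j] > t) == high_side) == bool(p)
                            for x, p in zip(samples, predicted))
                if best is None or agree > best[0]:
                    best = (agree, j, t, high_side)
    agree, j, t, high_side = best
    return {"feature": j, "threshold": t, "high_if_greater": high_side,
            "fidelity": agree / len(samples)}


# Hypothetical (temperature in deg C, SiO2 in %) samples.
samples = [(900, 45), (950, 35), (1000, 50), (1100, 40),
           (1300, 45), (1400, 60), (1500, 30), (1600, 50)]
predicted = [black_box(s) for s in samples]  # surrogate targets, not true labels
surrogate = fit_surrogate_stump(samples, predicted)
print(surrogate)
```

Here the surrogate recovers a single temperature split that reproduces every black-box prediction on these samples, which is the sense in which a surrogate exposes the variables a complex model relies on.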

Prediction Performance of the Baseline Methods
Firstly, several commonly used data mining methods, including LR, NB, SVM, KNN, GBDT, and DNN, are adopted as baseline methods in this study. These baseline models have the same inputs and prediction target as the MH model. The validation results of these baseline methods are listed in Table 2. These results not only answer the first research question but can also be compared with those of the proposed dual-stage predictive modeling approach. Table 2 shows that data mining methods can be used for investigating slag's viscosity. It also indicates that KNN, GBDT, and DNN achieve both higher accuracy rates and better F1 scores, while the remaining baseline methods have relatively poor performance, especially the NB method. Although the NB classifier could capture all the high viscosity slag samples in the validation dataset, it also results in a high misclassification rate, which makes the reliability of its prediction results quite low in practice. Therefore, these commonly used data mining methods are able to predict slag's viscosity, but whether they can outperform the proposed dual-stage approach needs to be further investigated.

Prediction Performance of the Proposed Dual-Stage Predictive Modeling Approach
As mentioned earlier, two probabilities (PH and PL) are generated by the first stage models in the proposed dual-stage predictive modeling approach. Furthermore, this approach needs to use ensemble methods for building the first stage models so that the sum of PH and PL (i.e., Pt) spans a range of values rather than always being equal to 1. Therefore, the GBDT and DNN methods are used for constructing the first stage models (i.e., MHs and MLs), and the validation results of the first stage are listed in Table 3 for further comparison with those of the second stage. Meanwhile, each slag sample obtains three probabilities (i.e., PH, PL, and Pt) from the first stage models. These three probabilities are considered as inputs of the second stage decision tree model. In order to visually understand the input distributions of the second stage training set, the three probabilities are plotted in ascending order of PH in Figure 2. Figure 2 shows that Pt is not always equal to 1 for either GBDT or DNN. The first subfigure, representing the results of MH and ML based on GBDT, has a Pt range from 0.844 to 1.305, whereas the subfigure for the DNN models has a wider Pt range from 0.109 to 1.715. In both subfigures, more than 82% of the Pt values lie between 0.90 and 1.10.
It is known that 0.5 is traditionally adopted as the default threshold for predictive modeling, which means that a sample is classified as positive if its probability is greater than 0.5. The second stage aims to build the coordination model to monitor and decide a more optimized threshold. Specifically, the second stage adopts PH, PL, and Pt as inputs and the high viscosity label as target to train a decision tree model. Table 4 shows the results of the coordination model. The numbers in parentheses in Table 4 are the results of the MHs at the first stage. The results indicate that the coordination model can further improve the recall rates without degrading the overall accuracy rates, misclassification rates, and F1 scores. Therefore, it can be concluded that the proposed dual-stage predictive modeling approach performs better than the baseline models (i.e., just MH).
In addition, the dynamic high viscosity rules can be obtained by observing the decision process of the coordination models. One rule obtained from the coordination model based on GBDT is PH > 0.655. Another rule, from the model based on DNN, is PL ≤ 0.906 and PH > 0.556. They indicate that the proposed dual-stage predictive modeling approach can identify a more suitable threshold for classification than the default threshold of 0.5.
Furthermore, considering that underfitting and overfitting are two common issues in data mining research [24], it is necessary to further verify the robustness of the proposed dual-stage predictive modeling approach. Building and validating models with training datasets of different sizes is a feasible way to investigate the robustness. The validation results of models built on training datasets of different sizes are shown in Figure 3. The results show that at least 918 samples are needed in the training dataset to ensure the reliability of the proposed prediction model.
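The two high viscosity rules reported above for the coordination models can be written as simple predicates and compared with the default 0.5 threshold. Only the thresholds come from the text; the function names and the sample probabilities are hypothetical, and each function encodes only the single reported rule rather than the full coordination tree.

```python
def default_rule(ph):
    """Conventional decision: high viscosity if PH exceeds 0.5."""
    return ph > 0.5


def gbdt_coordination_rule(ph, pl):
    """Rule reported for the GBDT-based coordination model: PH > 0.655."""
    return ph > 0.655


def dnn_coordination_rule(ph, pl):
    """Rule reported for the DNN-based model: PL <= 0.906 and PH > 0.556."""
    return pl <= 0.906 and ph > 0.556


# Hypothetical first stage probabilities for one sample.
sample_ph, sample_pl = 0.60, 0.45
decisions = {
    "default": default_rule(sample_ph),
    "gbdt": gbdt_coordination_rule(sample_ph, sample_pl),
    "dnn": dnn_coordination_rule(sample_ph, sample_pl),
}
print(decisions)
```

For this sample the default threshold and the DNN rule flag high viscosity, while the stricter GBDT rule does not, which shows how the learned thresholds shift the decision boundary.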

Significant Factors of the Slag's Viscosity
The surrogate modeling method can be used to interpret results generated from the prediction model (such as the GBDT model) in the first stage in order to reveal significant factors. Considering that ensemble models are recommended in the first stage of the proposed dual-stage predictive modeling approach, the surrogate model is also complex. The top five layers of the surrogate model are shown in Figure 4 in order to clearly present the most significant division variables. Figure 4 shows that the most important factors affecting slag's viscosity are temperature, SiO2, Al2O3, P2O5, CaO, and B2O3. The "0" in Figure 4 denotes low viscosity slags whose viscosity values are less than 10 P, while the "1" denotes high viscosity slags. Paths for identifying high viscosity slag samples are marked with an asterisk in order to enhance readability.
There are four paths that lead to a higher chance of the slag being identified as high viscosity: (rules 1-2 and 2-4), (rules 1-1 and 2-2 and 3-4), (rules 1-1 and 2-1 and 3-2 and 4-3), and (rules 1-2 and 2-3 and 3-6 and 4-9). Rule 1-2 means that if the temperature is higher than 980 °C, the probability of a high viscosity sample is 0.124. When rule 1-2 is satisfied and the slag's SiO2 content is greater than 91.3%, the probability of high viscosity increases from 0.124 to 1. This path indicates that the higher the SiO2 content, the higher the viscosity. The second path denotes that if the temperature is less than 980 °C, the slag's Al2O3 content is greater than 2.3%, and the SiO2 content is also greater than 26.8%, the slag's viscosity value must be greater than 10 P. These two paths indicate that a higher SiO2 content increases the slag's viscosity.
The third path shows that if the temperature is less than 980 °C and the slag's components satisfy Al2O3 ≤ 2.3%, P2O5 > 72.4%, and CaO ≤ 24%, the probability of high viscosity is equal to 1. The fourth path denotes that if the temperature is greater than 980 °C, the slag's SiO2 content is no more than 91.3%, and the B2O3 content is greater than 14.3% and less than 88%, it can be inferred that the slag's viscosity is greater than 10 P. The last two paths may indicate that adding P2O5 or B2O3 can increase the slag's viscosity.
The remaining factors that are not listed in Figure 4 (such as MgO, Fe2O3, and Li2O) have little effect on slag's viscosity. The above paths show that the most important factors related to slag's viscosity are temperature, SiO2, Al2O3, P2O5, CaO, and B2O3. Therefore, researchers and practitioners could pay more attention to these significant factors to adjust the slag's viscosity based on their individual requirements. In addition, the paths generated by the surrogate model as shown in Figure 4 can be utilized for guiding decision-making in practice.
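As a hedged illustration of how the four paths could guide decisions, they can be encoded as one function. The thresholds come from the text above. The function name, the argument names, the treatment of exactly 980 °C as the lower branch, and returning False for samples that fall outside the four paths are assumptions of this sketch, since only the top five layers of the tree are described.

```python
def matches_high_viscosity_path(temperature_c, sio2=0.0, al2o3=0.0,
                                p2o5=0.0, cao=0.0, b2o3=0.0):
    """Return True if a sample follows one of the four asterisked paths.

    Compositions are percentages; temperature is in degrees Celsius.
    """
    if temperature_c > 980:
        if sio2 > 91.3:              # path 1: rules 1-2 and 2-4
            return True
        return 14.3 < b2o3 < 88      # path 4: rules 1-2, 2-3, 3-6, and 4-9
    if al2o3 > 2.3:
        return sio2 > 26.8           # path 2: rules 1-1, 2-2, and 3-4
    return p2o5 > 72.4 and cao <= 24  # path 3: rules 1-1, 2-1, 3-2, and 4-3


# Hypothetical compositions for illustration only.
print(matches_high_viscosity_path(1200, sio2=95.0))
print(matches_high_viscosity_path(900, al2o3=10.0, sio2=40.0))
print(matches_high_viscosity_path(1500, sio2=50.0, cao=30.0))
```

A False result here only means that the sample is not on one of the four listed paths; it is not a statement about the deeper layers of the surrogate tree.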

Discussion
This study aims to investigate whether data mining methods can be used for analyzing slag's viscosity, whether the proposed dual-stage predictive modeling approach can further improve the prediction performance, and which factors are more important for guiding related stakeholders to make decisions.

High Recall Rate and Low Misclassification Rate of the Proposed Approach
Firstly, the comparison results of the six data mining methods show that although NB can capture 95% of the high viscosity samples, it also misclassifies a great number of low viscosity samples as high viscosity. This method results in a misclassification rate of 0.45, which indicates that the NB classifier cannot be used in practice. This finding is consistent with the previous study [24]. A possible reason is that the NB classifier is built on the assumption of attribute independence and thus simplifies the real classification task [35], whereas the input variables in this study are continuous and not independent of each other. The SVM and LR classifiers also achieve poor overall accuracy and recall rates. KNN, GBDT, and DNN achieve both high accuracy rates and F1 scores, but their recall rates are no more than 0.82, which means that at least 18% of high viscosity slag samples cannot be correctly identified. Therefore, there is still room for improving the prediction performance.
Considering that each prediction model has its own advantages in identifying some types of samples, this study proposes a dual-stage predictive modeling approach in order to improve the recall rate without increasing the overall misclassification rate. The experimental results show that the proposed dual-stage modeling approach achieves a higher recall rate and F1 score as well as a lower misclassification rate than a single prediction model (MH). The new thresholds of the proposed approach indicate that combining the results of two models for making final predictions is akin to threshold moving. Threshold moving is a common approach in the field of machine learning for changing a model's recall and precision rates [36,37]. Typically, lowering the model's threshold increases the recall rate, but it also increases the number of false positive cases at the same time. This means that traditional threshold moving can increase the misclassification rate. However, the proposed dual-stage modeling approach can increase the recall rate and decrease the misclassification rate by optimizing the two models' thresholds with the coordination model. The proposed dual-stage approach can also be considered as an ensemble model that is based on the stacking strategy [33]. Therefore, it is not surprising that the proposed approach can further improve the prediction performance.
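The trade-off of traditional threshold moving described above can be seen in a toy example. The probabilities and labels are hypothetical and only illustrate the mechanism: lowering the threshold raises recall but also adds false positives.

```python
def recall_and_false_positives(probs, labels, threshold):
    """Return (recall, number of false positives) at a given threshold."""
    predicted = [p > threshold for p in probs]
    tp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 1)
    fn = sum(1 for pred, y in zip(predicted, labels) if not pred and y == 1)
    fp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 0)
    return tp / (tp + fn), fp


# Hypothetical model outputs: five high (1) and five low (0) viscosity samples.
probs = [0.95, 0.80, 0.62, 0.45, 0.35, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

at_default = recall_and_false_positives(probs, labels, 0.5)
at_lowered = recall_and_false_positives(probs, labels, 0.3)
print(at_default, at_lowered)
```

In this toy data, moving the threshold from 0.5 to 0.3 raises recall from 0.6 to 1.0 while the number of false positives grows from 1 to 2.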

The Significant Factors and Prediction for Slag's Viscosity
Changes in slag composition and smelting temperature greatly affect the viscosity of slag during the smelting process. Among the 25 gathered variables, temperature, SiO₂, Al₂O₃, P₂O₅, and CaO were found to have a greater effect on viscosity than the other variables. The relationship between these variables and viscosity is discussed below.
Temperature has the most significant effect on the viscosity of the slag, and the viscosity tends to decrease with increasing temperature. As shown in Figure 3, the critical temperature for the transition from Newtonian to non-Newtonian fluid is 980 °C. When the temperature is lower than 980 °C, the slag shows high viscosity, and a nonlinear polynomial model can fit the relationship between viscosity and temperature with low error (R² = 0.97). Therefore, temperature is the only major factor in this range [38,39]. When the temperature is higher than 980 °C, the Arrhenius formula is applied to express the viscosity of slag [40]. Therefore, a model combining a six-degree polynomial with the Arrhenius formula has been established, and the expression is as follows:

$$
\eta =
\begin{cases}
\ldots 10T \times 10^{7} + 8.79 \times 10^{9}, & 500 < T \le 980 \\
A \exp\left(-\dfrac{E_\eta}{R(T + 273)}\right), & T > 980
\end{cases}
\tag{6}
$$

where η is the slag viscosity, P; T is the temperature, °C; A is a constant; Eη is the activation energy; and R is the gas constant. The values of A and Eη are mostly related to the composition of the slag, as shown in Table 5.

It is well known that slag composition is another major factor affecting slag viscosity. According to the above results, both SiO₂ and P₂O₅ increase the viscosity of the slag, while slag containing CaO shows lower viscosity. According to the ionization theory of slag, both SiO₂ and P₂O₅ are acidic oxides in the oxide slag and act as network formers. These two oxides have a strong ability to compete for oxygen in the slag and have a large electrostatic potential. Therefore, they can form complex network structures with the bridging oxygen in the slag, which enhances the degree of polymerization of the slag network and increases the flow resistance. It is therefore not surprising that the viscosity of the slag increases with the concentration of these two oxides.
However, alkaline oxides are mostly network modifiers in the slag, and their ability to compete for oxygen is weak. The complex slag structure is depolymerized during the smelting process, which simplifies the flow structure in the slag and reduces the flow resistance. CaO is an alkaline oxide. Therefore, the slag shows lower viscosity as the CaO content increases.
As an amphoteric oxide, Al₂O₃ has an effect that changes with the slag composition. In basic slag, as the Al₂O₃ content increases, the number of (AlO₄)⁵⁻ anion groups in the slag increases, and the structural units inside the slag become more complex. In addition, Al₂O₃ tends to combine with basic oxides to form complex compounds with high melting points, such as spinel (MgO·Al₂O₃), which increases the viscosity of slag containing Al₂O₃. In this path, Al₂O₃ is in acidic slag, where it provides O²⁻ and forms a six-coordination or higher-coordination structure with nonbridging oxygen or free oxygen. This can greatly reduce the degree of polymerization, so the slag presents a lower viscosity.

Table 5 also shows that the values of A and Eη vary among different slag systems, such as Al₂O₃-Gd₂O₃ slag, Al₂O₃-La₂O₃ slag, and Al₂O₃-Nd₂O₃ slag. For the same slag system, a difference in component content also causes a difference in the Arrhenius formula, as shown by the SiO₂-B₂O₃ system, the P₂O₅-CaO system, and other slag systems. From the fourth path of the surrogate results, for the SiO₂-B₂O₃ slag, when the temperature is higher than 980 °C, the SiO₂ content is no more than 91.3%, and the B₂O₃ content is greater than 14.3% and less than 88%, the slag viscosity is greater than 10 P. Compared with B₂O₃, SiO₂ has a more significant effect on the viscosity of this binary slag, so this type of slag shows higher viscosity. However, as shown in Table 5, an increase in the B₂O₃ content is accompanied by a decrease in the absolute value of the activation energy. Thus, the viscosity of slag containing B₂O₃ can decrease as the B₂O₃ content increases.
Because of the low melting point of B₂O₃, low-melting-point substances form easily in the slag and reduce its melting temperature [41]. In addition, some boron-oxygen tetrahedra [BO₄]⁵⁻ change to [BO₃]³⁻ at high temperature, and the slag structure becomes looser, which further reduces the viscosity [42]. Slag with a high TiO₂ content, or a medium TiO₂ content with a little V₂O₅, shows a higher absolute value of the activation energy and a lower value of A. Furthermore, as the number of components in the slag system increases, the changes in the values of A and Eη become more and more complicated. Therefore, different slag system components and contents eventually lead to differences in slag viscosity.
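To make the role of A and Eη more concrete, the following Python sketch evaluates the Arrhenius branch of Equation (6) and recovers the two parameters from viscosity data by a straight-line fit of ln η against 1/(T + 273). The parameter values are hypothetical and are not taken from Table 5. The sketch assumes R = 8.314 J/(mol·K) and a negative Eη, so that viscosity falls with temperature under the sign convention of Equation (6). The function names are illustrative only.

```python
import math

R = 8.314  # gas constant in J/(mol*K); units are an assumption of this sketch


def arrhenius_viscosity(temp_c, a, e_eta):
    """Arrhenius branch of Equation (6): eta = A * exp(-E_eta / (R * (T + 273)))."""
    if temp_c <= 980:
        raise ValueError("the Arrhenius branch applies only for T > 980 deg C")
    return a * math.exp(-e_eta / (R * (temp_c + 273)))


def fit_arrhenius(temps_c, viscosities):
    """Least-squares line through ln(eta) versus 1/(T + 273); returns (A, E_eta)."""
    xs = [1.0 / (t + 273) for t in temps_c]
    ys = [math.log(v) for v in viscosities]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # ln(eta) = ln(A) - (E_eta / R) * x, so slope = -E_eta / R
    return math.exp(intercept), -slope * R


# Hypothetical parameters for one slag system (not values from Table 5).
A_TRUE = 1.0e-3      # pre-exponential constant, in P
E_TRUE = -1.5e5      # activation energy in J/mol, negative by assumption

temps = [1000, 1100, 1200, 1300, 1400]
etas = [arrhenius_viscosity(t, A_TRUE, E_TRUE) for t in temps]
a_fit, e_fit = fit_arrhenius(temps, etas)

for t, eta in zip(temps, etas):
    print("T = %d deg C, eta = %.1f P" % (t, eta))
print("fitted A = %.3e, fitted E_eta = %.3e" % (a_fit, e_fit))
```

Because the fit is linear in ln η, two slag systems with different compositions appear as lines with different slopes (Eη) and intercepts (ln A), which is one simple way to compare entries such as those in Table 5.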
Overall, compared with traditional viscosity prediction methods, this general-purpose data-driven predictive modeling approach first classifies the data into high-viscosity and low-viscosity categories and then predicts the viscosity value of the slag with the piecewise function. It achieves a higher recall rate and a lower misclassification rate and distinguishes the ranges of the significant factors, including temperature and slag composition. Moreover, a large amount of the data was collected from experimental values that are the source of empirical or semiempirical models, and the prediction results are consistent with those of the Arrhenius formula. The approach overcomes the limitation of theoretical and empirical methods, which predict slag viscosity only under specific conditions. It also indicates that the proposed dual-stage predictive modeling approach is promising for application to various slag systems with higher efficiency.

Conclusions
This study has proposed an innovative dual-stage predictive modeling approach for automatically predicting the viscosity of slag and demonstrated its effectiveness on a collected imbalanced dataset. The proposed approach, which resembles an ensemble method, achieves a higher recall rate and fewer misclassified cases than the baseline methods. Several important factors, including temperature, SiO₂, Al₂O₃, P₂O₅, and CaO, have also been identified by employing the surrogate modeling approach. Among them, the viscosity increases with increasing SiO₂ and P₂O₅ content, while it decreases with increasing Al₂O₃ and CaO content. Finally, a two-equation model combining a six-degree polynomial with the Arrhenius formula was established. The effects of B₂O₃, FeO, TiO₂, and V₂O₅ on the viscosity of some slag systems were revealed and discussed in order to provide theoretical guidance for industrial application and reutilization of slag.