Prediction of Slag Characteristics Based on Artificial Neural Network for Molten Gasification of Hazardous Wastes
Energies 2020, 13, 5115

Abstract: Molten gasification is considered a promising technology for the processing and safe disposal of hazardous wastes. During this process, the organic components are completely converted while the hazardous materials are safely embedded in slag via the fusion-solidification-vitrification transformation. Ideally, the slag should be glassy with low viscosity to ensure the effective immobilization and steady discharge of hazardous materials. However, it is very difficult to predict the characteristics of slag using existing empirical equations or conventional mathematical methods, due to the complex non-linear relationship among the phase transformation, vitrification transition and chemical composition of slag. Equipped with a strong nonlinear mapping ability, an artificial neural network may be able to predict the properties of slags if a large amount of data is available for training. In this work, over 10,000 experimental data points were used to train and develop a slag classification model (glassy vs. non-glassy) based on a neural network. The optimal structure of the neural network was determined and validated. The results suggest that the classification accuracy for the independent test samples reached 93.3%. Using 1 and 0 as model inputs to represent mildly reducing and inert atmospheres, a double hidden layer structure in the neural network enabled the accurate classification of slags under various atmospheres. Furthermore, the neural network for the prediction of glassy slag viscosity was optimized; it featured a double hidden layer structure. Under a mildly reducing atmosphere, the absolute error from the independent test data was generally within 4 Pa·s. By adding the gas atmosphere to the input of the neural network using a simple normalization method, a multi-atmosphere slag viscosity prediction model was developed. This model is much more accurate than its counterpart that does not consider the effect of the atmosphere. In summary, the artificial neural network proved to be an effective approach to predicting the slag properties under different atmospheres. The data-driven models developed in this work are expected to facilitate the commercial deployment of molten gasification technology.


Introduction
Molten gasification combining chemical conversion and fusion of inorganic species is a promising technology for the clean processing and safe disposal of hazardous wastes. During this process, the organic species are completely decomposed while the hazardous materials partition into molten slag to be immobilized via the fusion-solidification-vitrification transformation [1][2][3][4]. The inorganic matter in hazardous wastes is turned into molten slag at elevated temperatures, which needs to be discharged in a liquid state. The continuous and smooth discharge of slag is crucial to the long-term operation of the molten gasification furnace. In order to discharge slag stably and reliably, it is necessary to ensure that the properties of the slag, especially its viscosity, meet the requirements of the molten furnace. In addition, the slag should be readily transformed into the glassy form (vitrified) with a minimal leaching propensity so that the hazardous elements contained within will not be leaked into the environment. Practically, the slag's viscosity-temperature relationship is a key factor in determining whether a certain hazardous waste is suitable for molten gasification [1,4,5]. In summary, to ensure the effective immobilization of hazardous elements from waste material through the fusion-solidification-vitrification process, it is very important to figure out the type and viscosity of molten slag, which is the issue to be tackled in this work.
The slags derived from hazardous wastes respond to the change of temperature differently in the molten state. When the viscosity changes rapidly with temperature, it may cause slagging and clogging of the gasifier. Slags can be classified into glassy, plastic and crystalline (non-Newtonian) according to the viscosity-temperature curve [1,5,6,7]. The viscosity of glassy slag decreases with increasing temperature in a gradual manner without abrupt changes. On the other hand, the viscosity of a crystalline slag drastically increases due to a slight decrease in temperature, which is detrimental to molten gasification. The slags with glassy characteristics facilitate the discharge through the liquid state. When the fuel generates a plastic slag, the operation temperature needs to be controlled in a very narrow range to avoid the rapid increase in slag viscosity. In the worst-case scenario, the intense precipitation of solids from the slag may cause unplanned shutdown of molten gasification [1,5,8,9].
There are many factors that affect the slag viscosity at different temperatures, and the governing factor is the chemical composition of ash. The compositions of hazardous wastes vary in a wide range [10][11][12][13][14]. For the raw materials with high SiO2 content, the polymeric network structure composed of long silicate chains formed during the ash transformation process is dominant so that the flow resistance of the slag should increase at high temperature [15,16]. Alkali species (K and Na) are network modifiers that can shorten the lengths of silicate chains, enhancing the fluidity of slag. Al2O3 is an amphoteric component, which can behave as both a silicate network former and a modifier. The Al2O3 entering the pure SiO2 melt will weaken the network strength and reduce the viscosity. The role of Ti4+ in slag varies with temperature and chemical composition. The alkaline earth metal oxides CaO and MgO can decompose the three-dimensional structure of SiO2, increasing the molar volume and reducing the viscosity of slag [17,18]. Fe has different valence states in different atmospheres. Fe2+ acts as a silicate network modifier, while Fe3+ plays an amphoteric role. A decrease in the Fe3+/Fe ratio will reduce the viscosity of slag rapidly. When the mole fraction of FeOx in the slag exceeds 10%, the effects of Fe3+ and Fe2+ are the same [19]. The TiO2 entering the SiO2 melt containing oxides (CaO and MgO) with high bond energy will increase the viscosity [8,20,21].
The type, viscosity and other properties of slag can be measured at different temperatures, but the experiments are costly and time-consuming. Obtaining accurate data requires substantial professional knowledge and advanced instruments [5][6][7]. Therefore, the development of a universal model to predict slag viscosity would greatly promote the design of gasifiers and the selection of raw materials. Predicting the viscosity of slag remains challenging due to the extensive differences in the type of slag and the complex non-linear correlation between slag viscosity and ash composition determined by the raw materials analysis [4,9]. Although empirical equations are available, they are typically only accurate for certain types of ash and not universally applicable [8,15,22]. The ash of hazardous wastes contains many components with different chemical characters. Some chemical components cannot be used as input variables for the model to calculate viscosity [2,3]. Molten slag will be converted into non-Newtonian fluid in the cooling process, which reduces the prediction accuracy of the model [5][6][7]. By comparison, the empirical formula S2 is applicable when the silicon percentage is less than 55% and the iron oxide content is less than 5%; the Watt-Fereday formula is more accurate when the silicon percentage is greater than 80% or the iron oxide content is greater than 15% [15]. For one model, the data used correspond to silica contents higher than 0.5 mole fraction and to equal proportions of alumina and modifier ions, which constitutes an important limitation of the model for feldspar melts [8]. Kondratiev et al. compared four models with different prediction biases. The iron content of the samples used during model development has a great influence on the prediction results [22]. A higher Na2O content was found to significantly change the prediction errors of different models [23,24].
Due to insufficient samples and a small chemical component range, many models cannot be applied to all compositions and temperatures [7][8][9]. In addition, empirical equations cannot explain factors that interfere with viscosity, such as atmospheres [25]. Fortunately, the neural network models that were trained with a large amount of real experimental data have the potential to improve the overall prediction efficiency [26]. Given strong nonlinear mapping capabilities, artificial neural networks can effectively address the variations in the composition of ash and thus be used for a wide range of fuels. Since a neural network does not require any explicit mathematical processing based on theoretical assumptions, it is not limited by the correctness of the relevant theories [27][28][29][30].
Most of the existing neural network models for predicting ash viscosity did not employ a wide range of data from various compositions for training. It has been revealed that the measurement of slag viscosity under different atmospheres will give different results, leading to difficulties in the accurate prediction of the viscosity [25,31,32]. Moreover, the existing neural network models only consider the influences of ash composition and temperature on viscosity, without taking other operating parameters such as the gas atmosphere into account [8,25,31,32,33]. Consequently, to improve the performance and applicability of the slag viscosity and type prediction model based on a neural network, it is imperative to use a larger amount of data from a wider range of hazardous wastes under various atmospheres to train the neural network.
A neural network can identify the underlying patterns and characteristics of a multivariable system, serving as a powerful tool in the era of big data [26,34,35]. The need for theoretical knowledge of modeling phenomena is no longer critical, and it is relatively easy to develop and update models. Moreover, the neural network can be easily and quickly trained to adjust the weights and thresholds to accommodate varying types of hazardous wastes. In contrast, correcting an empirical formula is known to be a troublesome process [8,15,36]. In addition, the versatile model may be applied to other materials, such as steel slag [21,37]. In the future, with an increasing amount of experimental data becoming available, a wider range of slag compositions and more influencing factors can be included to establish a more reliable and efficient model for engineering applications [22,31,38].
The objective of this study was to develop and validate a neural network model that can predict the type (glassy/non-glassy) and viscosity of slag under different atmospheres. Over 10,000 experimental data points were used to train the model. The developed model was then tested by a series of independent data to evaluate its accuracy, revealing the optimal neural network structure, activation functions and algorithm.

Topological Structure of Neural Network
Originally, the idea of the artificial neural network was inspired by the way in which the human brain works. With the rapid development of artificial intelligence, many complex algorithms can be implemented on computers. For practical applications, the error back propagation (BP) neural network or one of its variants is generally used; it is the most successful neural network learning algorithm [36,39]. The BP neural network is a typical multi-layer feedforward neural network, which consists of an input layer, several hidden layers and an output layer [27,28,40]. As shown in Figure 1, the neural network consists of a large number of nodes. Each node can be regarded as analogous to a neuron of the brain and independently performs a certain operation. The connections between the nodes are represented by the weights between them. The neural network calculates the error between the output and the target, adjusts the weights and thresholds continuously, and improves the accuracy of the model until the convergence condition is satisfied. Neural networks can have single or multiple inputs and outputs. The specific structure of the neural network needs to be adjusted according to the problem being solved [36,39,40].
The activation function between the input, hidden and output layers determines whether a network model can converge faster and is the key to accurate prediction [2,30,41]. The linear activation function in Matlab is purelin ($f(x) = k x$). A linear activation function is a relatively simple function, which scales the input value up or down linearly. The nonlinear activation functions include tansig ($\tanh(x) = (1 - e^{-2x})/(1 + e^{-2x})$), logsig ($\mathrm{sigmoid}(x) = 1/(1 + e^{-x})$) and poslin ($\mathrm{ReLU}(x) = \max(0, x)$). The use of a nonlinear activation function usually leads to better fitting and improved compatibility. The ReLU function is generally limited to use in the hidden layer. Leaky ReLU ($f(x) = \max(\alpha x, x)$) is a variant of ReLU that can restore the transmission of values in the negative region. The training process of a BP neural network is realized by two processes: forward transfer of input variables and back propagation of errors [25,39]. The model continuously adjusts the weights and thresholds through experimental data to reduce the error. These training steps need to be repeated to obtain the desired results [27,36,42,43].
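The activation functions named above can be written out in a few lines. The following is a minimal Python sketch, not the Matlab implementation used in this work; the function names mirror the Matlab ones, while the default slope `k = 1` and the leak factor `alpha = 0.01` are assumptions chosen only for illustration.

```python
import math


def purelin(x, k=1.0):
    """Linear activation f(x) = k * x (k = 1 is an assumed default)."""
    return k * x


def tansig(x):
    """Hyperbolic tangent sigmoid; output lies in (-1, 1)."""
    return math.tanh(x)


def logsig(x):
    """Logistic sigmoid; output lies in (0, 1).

    Two algebraically equal branches avoid overflow for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def poslin(x):
    """ReLU: passes positive values and clips negative values to zero."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: keeps a small slope alpha (assumed 0 < alpha < 1) for x < 0."""
    return max(alpha * x, x)
```

Since tansig maps into (-1, 1) and logsig into (0, 1), the choice of output activation determines which target coding a network can actually reach.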

Model Development for the Classification of Slags
The schematic of neural network model development is provided in Figure 2. In addition to defining the research problem and selecting samples, the most important step is to determine the neural network structure, which requires the determination of the activation function, hidden neuron numbers, algorithm and allowable error range to be used in the calculation process [36,40].
(1) Data acquisition and processing
In order to establish a model with extensive applicability, a large amount of slag viscosity data was collected from the literature [1,5,6,7,9,12,13,14,15,16,18,19,23,25,31,32,33,38,…]. It is widely accepted that the gas atmosphere affects the viscosity of hazardous waste slag at elevated temperatures. The viscosity data were obtained under the premise that the measurement method was as consistent as possible, which promoted the successful establishment of the model. The data were carefully screened to ensure that there was no contradiction between them. The atmosphere in gasifiers is typically reducing with high concentrations of CO and H2, so the viscosity is usually measured in a mildly reducing atmosphere (60CO/40CO2) [24,25]. Chinese standard DL/T660-2007 was used to divide slags into glassy and non-glassy. The criteria can also be seen in the literature [24,31,49]. As shown in Figure 3a, the viscosity of glassy slags decreases with increasing temperature in a gradual manner without abrupt changes (curve A). The slags with glassy characteristics facilitate the discharge through the liquid state. As the temperature decreases, the amorphous slag exhibits a solid-liquid mixed portion where solid crystals precipitate (curve B). When the system generates a plastic slag, the operation temperature needs to be controlled in a very narrow range to avoid the rapid increase of slag viscosity. On the other hand, the viscosity of crystalline slags drastically increases due to a slight decrease of temperature, which is detrimental to molten gasification (curve C). In practice, however, the viscosity of a slag shows various changing trends, which may differ from the ideal situation. As shown in Figure 3b, it is difficult to accurately distinguish the structural characteristics of each slag, especially between the crystalline and amorphous slag. The viscosity of the glassy slag without rapid crystallization would change relatively smoothly. Therefore, the structural characteristics of the slag were classified into glassy/non-glassy.

First, data from 272 ash samples in the mildly reducing atmosphere were used for modeling. There were 124 samples of glassy and 148 samples of non-glassy slags. The weight percentage of each chemical component of ash was the input vector of the neural network. Table 1 illustrates the data range of ash. It should be noted that some of the ash samples were artificially prepared, i.e., not every sample contained all the components listed in Table 1. The samples used in numerous viscosity measurements were different. If the ash was naturally formed, it contained all the components listed in the table. The artificially prepared samples were meant to match the compounds of hazardous waste or to simulate slags with additions of fluxing agents [7,21,44]. In order to investigate the influence of one or several components on the viscosity, and to make the measurement easier, samples with fewer components were often artificially prepared to simplify the ash system [18,25,49]. Artificial preparation of samples with corresponding components is more conducive to the smooth progress of the verification experiment [49,51,54]. It can be seen from Table 1 that a wide range of experimental data was used, which should help the model perform well. As the magnitudes of the input values vary greatly, they need to be processed to reduce the prediction error. The input values were normalized to between −1 and 1 by the mapminmax function in Matlab. This method rescales each input value using the maximum and minimum values among the inputs [28,40]. After transformation, the nonlinear activation function was expected to work well.
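The min-max normalization described above maps each input linearly from its observed range onto [−1, 1]. The sketch below reproduces that linear mapping in Python for a single input variable; it is an illustration rather than Matlab's mapminmax itself, the helper names are hypothetical, and returning the mid-point for a constant input is an assumption of this sketch.

```python
def fit_minmax(values):
    """Return the (minimum, maximum) of one input variable in the training data."""
    return min(values), max(values)


def apply_minmax(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Map x linearly from [x_min, x_max] onto [y_min, y_max]."""
    if x_max == x_min:
        # Assumption of this sketch: a constant input maps to the mid-point.
        return (y_min + y_max) / 2.0
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min


def reverse_minmax(y, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Invert apply_minmax to recover the value in original units."""
    return (y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min


# Illustrative values only (not taken from Table 1): SiO2 contents in wt%.
sio2 = [20.0, 45.0, 70.0]
lo, hi = fit_minmax(sio2)
scaled = [apply_minmax(v, lo, hi) for v in sio2]
```

The minimum and maximum should be taken from the training data and reused unchanged for the test data, so that the test set stays independent of the fitting step.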
(2) The design of BP neural network
(a) Model input and output
In total, data from 272 different ash samples in the mildly reducing atmosphere were used for modeling. The data were divided into a training set and a test set. The training set included 242 kinds of ash data, and was used in the model building process. The model continuously adjusted the weights and thresholds through these data to reduce the error. The test data did not participate in the model development process, making the set equivalent to a series of unknown data to reflect the ability of the model. Therefore, the test set was used to verify the performance of the developed neural network. It included 30 kinds of ash data, comprising 15 kinds of glassy and 15 kinds of non-glassy slag. Model inputs encompassed the mass percentages of SiO2, Al2O3, CaO, Fe2O3, SO3, TiO2, MgO, Na2O and K2O. The acid-base ratio (A/B) and silicon-aluminum ratio (S/A) of ash components were also used as inputs. These components affect the viscosity of the slag. The input values and ranges of the model are shown in Table 1. They are sufficient to classify the slag into two types, so the output of the model should be relatively simple. In order to reduce the complexity of the model, the model has two output neurons. When the output values are 1 and 0, the characteristic of slag is glassy. When the output values are 0 and 1, it indicates that the slag is non-glassy.
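As a rough illustration of the 11 inputs and the two-neuron output coding, the following Python sketch assembles an input vector and encodes or decodes the slag type. Several details are assumptions, since the text does not define them: A/B is taken as (SiO2 + Al2O3 + TiO2)/(Fe2O3 + CaO + MgO + Na2O + K2O), S/A as SiO2/Al2O3, a ratio with a zero denominator is set to 0, and the decoded class is the output neuron with the larger value. All function names are hypothetical.

```python
OXIDES = ("SiO2", "Al2O3", "CaO", "Fe2O3", "SO3", "TiO2", "MgO", "Na2O", "K2O")


def safe_ratio(numerator, denominator):
    """Ratio with an assumed fallback of 0.0 when the denominator is zero."""
    return numerator / denominator if denominator > 0 else 0.0


def build_inputs(ash):
    """Build the 11 inputs: nine oxide mass percentages plus A/B and S/A.

    `ash` maps oxide names to wt%; missing oxides count as 0, which matches
    artificially prepared samples that lack some components.
    """
    w = {name: float(ash.get(name, 0.0)) for name in OXIDES}
    acid = w["SiO2"] + w["Al2O3"] + w["TiO2"]  # assumed definition
    base = w["Fe2O3"] + w["CaO"] + w["MgO"] + w["Na2O"] + w["K2O"]  # assumed
    return [w[name] for name in OXIDES] + [
        safe_ratio(acid, base),
        safe_ratio(w["SiO2"], w["Al2O3"]),
    ]


def encode_target(is_glassy):
    """Two-neuron target: (1, 0) for glassy, (0, 1) for non-glassy."""
    return [1.0, 0.0] if is_glassy else [0.0, 1.0]


def decode_output(output):
    """Assumed decoding rule: the larger of the two outputs decides the class."""
    return "glassy" if output[0] >= output[1] else "non-glassy"
```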
(b) Determination of the activation function
Initially, a single hidden layer neural network was established with a maximum epoch of 500, a momentum factor of 0.9, a learning rate of 0.01, a minimum gradient of 10^−25 and a target mean square error of 0.01. Because the initial weights and thresholds were randomly generated during neural network training, the results obtained were different each time. The initial value of the random number generator was therefore set to π so that each test could be repeated exactly. The initial number of neuron nodes of the neural network was 8, and the starting algorithm was trainlm (Levenberg-Marquardt).
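The role of the fixed initial value can be illustrated with a small Python sketch: seeding the random number generator makes the randomly generated initial weights and thresholds identical from run to run. The settings dictionary repeats the values quoted above; the key names, the uniform range [−1, 1] for the initial weights and the helper names are assumptions, not the Matlab defaults.

```python
import math
import random

# Values quoted in the text; the key names are hypothetical.
TRAINING_SETTINGS = {
    "max_epochs": 500,
    "momentum": 0.9,
    "learning_rate": 0.01,
    "min_gradient": 1e-25,
    "mse_goal": 0.01,
    "hidden_neurons": 8,
}


def init_layer(n_out, n_in, rng):
    """Random weight matrix with entries drawn from an assumed range [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def init_network(n_inputs=11, n_hidden=8, n_outputs=2, seed=math.pi):
    """Create reproducible initial weights and thresholds for one hidden layer."""
    rng = random.Random(seed)  # same seed, same initial network
    return {
        "w1": init_layer(n_hidden, n_inputs, rng),
        "b1": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "w2": init_layer(n_outputs, n_hidden, rng),
        "b2": [rng.uniform(-1.0, 1.0) for _ in range(n_outputs)],
    }
```

Calling `init_network()` twice returns identical weights, so differences between two runs can be attributed to the setting being varied rather than to the random start.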

(c) Determination of hidden neuron numbers
Currently, there is no mature method to determine the hidden neuron numbers, so many trials are needed. The empirical equations can only be used as a starting point and are not authoritative. In the trial-and-error process, the number of hidden layer neurons was varied from 4 to 20. Generally speaking, as the number of neurons increases, the model becomes more complex and the error gradually decreases. The number of hidden neurons was determined by striking a reasonable balance between the complexity and accuracy of the model.
As the system's default algorithm, trainlm is particularly suitable for finding the least square error and making convergence faster (reducing epochs), and it is a commonly used algorithm in neural network training. Other common algorithms in Matlab include traingda, trainrp, trainscg, etc. [40]. Various algorithms were tested to determine the one that delivered the highest prediction accuracy.
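The trial-and-error search over hidden neuron numbers can be organized as a simple loop. The Python sketch below is only one possible way to express it: `evaluate` is a hypothetical callback that trains a network with `n` hidden neurons and returns `(mse, accuracy)`, and the selection rule (prefer networks that reach the MSE goal, then the highest accuracy, then the fewest neurons) is an assumption intended to reflect the balance between complexity and accuracy described above.

```python
def select_hidden_neurons(evaluate, candidates=range(4, 21), goal=0.01):
    """Pick a hidden neuron number from trial runs.

    evaluate(n) must return a tuple (mse, accuracy) for n hidden neurons.
    """
    results = {n: evaluate(n) for n in candidates}

    # Keep only the sizes whose training reached the target MSE.
    converged = {n: r for n, r in results.items() if r[0] <= goal}
    if converged:
        # Highest accuracy first; ties go to the smaller (simpler) network.
        return max(converged, key=lambda n: (converged[n][1], -n))

    # Fallback (assumption): if nothing converged, take the lowest MSE.
    return min(results, key=lambda n: results[n][0])
```

The same loop structure could be reused to compare training algorithms by iterating over algorithm names instead of neuron counts.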

Slag Classification Model for Multiple Atmospheres
Most of the slag viscosity measurements were carried out under a mildly reducing atmosphere. However, many viscosity measurements were also performed under an inert atmosphere. The previous model may not be accurate for the classification of slag in other atmospheres. To make the model more general, the atmosphere, which is usually difficult to quantify, was encoded numerically to develop a model that can accommodate multiple atmospheres. This encoding was expected to enhance the prediction capability of the model, as the viscosity of a slag depends on the surrounding atmosphere. Few tests measured the slag viscosity under an oxidizing atmosphere, so only reducing and inert atmospheres were considered in the model. The model consisted of 12 inputs, one more than its counterpart developed for a single atmosphere. An input value of 1 represents a mildly reducing atmosphere (60CO/40CO2), and 0 represents an inert atmosphere (Ar). Because the two atmospheres were represented by different numerical values, the atmosphere could enter the mathematical operations of the model, and the numerical input carried a physical meaning (the atmosphere). Similar methods can also be seen in the research on boiler slagging, where different numbers represent the degree of slagging [69][70][71]. Compared with the input values in Table 1, the ranges only have a small change in the average value. This also shows that the number of samples in a mildly reducing atmosphere is large and the range is wide.
The modeling process was similar to that described above. In total, 379 kinds of ash were used, and their viscosity measurements were carried out in a mildly reducing or inert atmosphere by an almost uniform method. The training set had 329 kinds of ash data, which were used in the training process of the neural network. The test set had 50 kinds of ash data, including 25 glassy and 25 non-glassy ashes; it was equivalent to a series of unknown data, so it can reflect the generalization performance of the model. After adding the atmosphere factor as an input, the complexity of the model increased. In order to improve the performance of the model, a double hidden layer structure was adopted.
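The atmosphere encoding amounts to appending one extra number to the 11 composition inputs. A minimal Python sketch is given below; the dictionary keys and the function name are hypothetical, and rejecting any other atmosphere reflects the statement that only mildly reducing and inert atmospheres were considered.

```python
# 1 = mildly reducing (60CO/40CO2), 0 = inert (Ar), as stated in the text.
ATMOSPHERE_CODES = {"mildly_reducing": 1.0, "inert": 0.0}


def add_atmosphere_input(composition_inputs, atmosphere):
    """Extend the 11 composition inputs with the atmosphere code (12 inputs)."""
    if len(composition_inputs) != 11:
        raise ValueError("expected 11 composition inputs")
    if atmosphere not in ATMOSPHERE_CODES:
        # Oxidizing atmospheres are outside the scope of the model.
        raise ValueError("unsupported atmosphere: " + str(atmosphere))
    return list(composition_inputs) + [ATMOSPHERE_CODES[atmosphere]]
```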

Slag Viscosity Prediction Model for Mildly Reducing Atmosphere
After the slag was classified with high accuracy, the viscosity was expected to be accurately predicted. The change of viscosity of a glassy slag with temperature should be relatively smooth without the abrupt viscosity surge caused by rapid crystallization. It is therefore an ideal type of slag for establishing a slag viscosity prediction model. In the database, a total of 124 samples behaved as glassy slag under a mildly reducing atmosphere. The composition range of the sample data is shown in Table 2, which is also the range of ash chemical components that can be used to predict viscosity. The neural network model was developed using 7797 viscosity data points. The data were normalized to between −1 and 1 by Matlab's mapminmax function to eliminate the influences of different dimensions on the model. When the training error decreases while the test error increases, the model is overfitted, which impairs its generalization ability. Matlab's dividevec function was employed to randomly assign 500 data points as test data and another 500 as the validation set to prevent the model from overfitting. The remaining 6797 data points were used for training.
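The data division and the overfitting check can be sketched as follows. This Python example is not Matlab's dividevec: it simply shuffles the sample indices with a fixed seed and cuts off 500 test and 500 validation points, leaving 6797 for training. The stopping rule, which halts training once the validation error has not improved for `patience` consecutive epochs, is an assumed illustration of how a validation set can guard against overfitting.

```python
import random


def split_indices(n_total, n_validation, n_test, seed=0):
    """Randomly split sample indices into training, validation and test sets."""
    if n_validation + n_test >= n_total:
        raise ValueError("validation and test sets leave no training data")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    test = indices[:n_test]
    validation = indices[n_test:n_test + n_validation]
    train = indices[n_test + n_validation:]
    return train, validation, test


def should_stop(validation_errors, patience=6):
    """Return True if the last `patience` epochs brought no improvement."""
    if len(validation_errors) <= patience:
        return False
    best_before = min(validation_errors[:-patience])
    return all(e >= best_before for e in validation_errors[-patience:])


train, validation, test = split_indices(7797, 500, 500)
```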

Slag Viscosity Prediction Model for Multiple Atmospheres
To predict the slag viscosity in different atmospheres, the atmospheric conditions were described by numeric values to be set as inputs for the multi-atmosphere model. The mildly reducing atmosphere (60CO/40CO2) and the inert atmosphere (Ar) were represented by the numbers 1 and 0, respectively. There were 10,364 viscosity data points from the two atmospheres, which was 2567 more than for the mildly reducing atmosphere alone. Fewer data were obtained under an inert atmosphere, where the intermittent method was used to measure viscosity. Following a method similar to the prior one, 800 data points were randomly assigned as the validation set and 800 data points were used as the test set. The remaining 8764 data points were used as the training set. Each part used more data than in the case of a single reducing atmosphere. The parameters of the neural network were determined in turn, including the activation function, the number of hidden layer neurons and the training function. Compared with the input values in Table 2, the ranges only had a small change in the average value.

Optimized Activation Functions, Model Parameters and Algorithm
(1) Slag classification results under a mildly reducing atmosphere (model 1)
For the different activation functions used, the training mean square errors (MSE) of the model are shown in Table 3. When the hidden layer activation function was purelin or the output layer activation function was logsig, the MSE was high. Since the data were normalized between −1 and 1, poslin was not an ideal activation function. When the activation functions were logsig and tansig, the MSE was the lowest. Therefore, the activation functions of the hidden layer and output layer were determined to be logsig and tansig, respectively. According to Figure 4, when the number of neurons was small, the number of epochs required for network training and the final MSE were large. When the number of neurons was varied, the model had the same MSE, but the number of epochs differed. When the numbers of neurons were 9, 12 and 19, the network could easily reach the convergence goal. Too many neurons will increase the calculation time. When the number of neurons was nine, the model reached the convergence target with the highest classification accuracy. The hidden neuron number was finally determined to be nine [27,28,40].
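The two quantities used to compare model variants, the mean square error and the classification accuracy, can be computed as in the Python sketch below. It assumes the two-neuron output coding described earlier and decides the class by the larger output, which is an assumption of this sketch; the function names are hypothetical.

```python
def mean_squared_error(outputs, targets):
    """Average squared difference over all samples and output neurons."""
    total, count = 0.0, 0
    for out, tgt in zip(outputs, targets):
        for o, t in zip(out, tgt):
            total += (o - t) ** 2
            count += 1
    return total / count


def classification_accuracy(outputs, targets):
    """Fraction of samples whose predicted class matches the target class."""
    correct = sum(
        1
        for out, tgt in zip(outputs, targets)
        if (out[0] >= out[1]) == (tgt[0] >= tgt[1])
    )
    return correct / len(targets)
```

As a purely arithmetic illustration, 28 correct classifications out of a 30-sample test set would correspond to an accuracy of about 93.3%.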
The errors of models derived from different training functions were compared to select the appropriate algorithm. With traingda and traingdx, the model exhibited obvious oscillations and significant errors, indicating that these algorithms are not suitable. The MSE was significantly reduced with traincgb, trainscg and trainbr. Only with trainlm did the model converge to the target value within 500 epochs. In addition to error analysis, the model should be evaluated based on the classification accuracy. The accuracy obtained with trainlm was also the highest. Therefore, trainlm was considered the optimal algorithm for this model and was adopted in the following studies.
Continuing to increase the number of hidden layers cannot effectively improve model performance any further. After the above steps, a BP neural network for classifying the type of slag was obtained. The model was a single hidden layer BP neural network with 11 inputs and 2 outputs. The activation functions of the hidden layer and output layer of the neural network were logsig and tansig respectively, and the number of neurons in the hidden layer was nine.
(2) Slag classification results under multiple atmospheres (model 2)
When the activation functions of the first hidden layer, second hidden layer and output layer of the model were logsig, tansig and tansig, respectively, the error was minimal. With 23 neurons in the first hidden layer and 11 neurons in the second hidden layer, the model could reach the convergence goal and displayed the best performance. The error was the lowest and the prediction accuracy was the highest when the algorithm was trainlm. The structure of the neural network was thus determined.
(3) Viscosity prediction results under mildly reducing atmosphere (model 3)
The activation function of the first hidden layer was initially set to tansig. The output function was tansig and the number of neurons in the first hidden layer was 22. The validation data can effectively prevent overfitting, so the error of the model should be reduced significantly. To reduce the error further, the number of hidden layers in the BP network was increased. Following a similar method, the performance of the model was optimized when the activation function of the second hidden layer was logsig and the number of neurons was 15. Adding further hidden layers did not effectively improve model performance. Therefore, the double hidden layer structure was optimal.

(4) Viscosity prediction results under multiple atmospheres (model 4)
The activation function of the first hidden layer of the neural network was determined to be tansig, and the number of neurons was 27. The activation function of the second hidden layer was logsig and the number of neurons was 13. The activation functions of the output layer and training function were tansig and trainlm, respectively.
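As an illustration of the model 4 structure described above (tansig, 27 neurons; logsig, 13 neurons; tansig output), the following is a minimal forward-pass sketch in plain Python. The weights are random placeholders rather than trained values, the input count of 12 in the usage line is an assumption (the number of viscosity-model inputs is not stated in this section), and the helper names are hypothetical. The tansig and logsig definitions follow their standard mathematical forms.

```python
import math
import random


def tansig(x):
    """Hyperbolic tangent sigmoid, output in (-1, 1)."""
    return math.tanh(x)


def logsig(x):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random placeholder weights and biases (NOT trained values)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def build_model4(n_inputs, seed=0):
    """Layer sizes and activations as reported for model 4."""
    rng = random.Random(seed)
    sizes = [n_inputs, 27, 13, 1]
    activations = [tansig, logsig, tansig]
    return [(init_layer(a, b, rng), f)
            for a, b, f in zip(sizes[:-1], sizes[1:], activations)]


def forward(model, x):
    """Propagate one normalized input vector through all layers."""
    for (weights, biases), activation in model:
        x = dense(x, weights, biases, activation)
    return x


# Assumed input layout: 11 normalized features plus the atmosphere code (1 or 0).
model = build_model4(n_inputs=12)
output = forward(model, [0.0] * 11 + [1.0])
print(output)
```

Because the output layer uses tansig, the raw output lies in (−1, 1) and would have to be mapped back to a physical viscosity by inverting the input/output normalization.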

Slag Classification Results for Mild Reducing Atmosphere (Model 1)
In total, data for 242 ash samples were used as the training set for establishing the neural network model. The model continuously adjusted its weights and thresholds on these data to reduce the error. The neural network displayed strong nonlinear fitting ability and can approach the target values very closely. To demonstrate the performance of the established model, the training set was first used for simulation. The result is shown in the confusion matrix in Figure 5a. Only two samples were classified wrongly, and the remaining 240 samples were correctly classified. For the two misclassified samples, the SiO2 and Al2O3 contents were higher than 45 wt% and 16 wt%, respectively, and the S/A ratio was greater than 2. In addition, the CaO content of one sample was 24.8 wt%, which is relatively high. The trained BP neural network classified the training samples accurately, with an accuracy of 99.2%.
The data of the test set did not participate in the training process and can be regarded as unknown samples. The test data included 15 non-glassy and 15 glassy slags. The slag was classified by the established model and the results are shown in the confusion matrix in Figure 5b. In both types of slag, only two samples were wrongly classified. Again, the SiO2 and Al2O3 contents of these two kinds of ash were higher than 40 wt% and 17 wt%, and the S/A was greater than 2. In addition, the A/B of one sample was greater than 9, which is much higher than the average of all samples. It can be seen that the classification error of the test sample was greater than that of the training sample. Nevertheless, the trained neural network can accurately classify the slag, and the classification accuracy of 30 test samples reached 93.3%. Therefore, this model can be used to predict the slag type, providing guidance for hazardous wastes treatment and industrial operation.
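The accuracy figures quoted from the confusion matrices can be reproduced with a short sketch. The helper names are hypothetical, and the example labels assume one misclassified sample in each class; the text reports two errors among the 30 test samples but does not unambiguously give their split, and the overall accuracy does not depend on it.

```python
def confusion_counts(y_true, y_pred, positive="glassy"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# 15 glassy and 15 non-glassy test slags, two errors in total (from the text).
# Assumption: one error per class.
y_true = ["glassy"] * 15 + ["non-glassy"] * 15
y_pred = ["glassy"] * 14 + ["non-glassy"] + ["glassy"] + ["non-glassy"] * 14

counts = confusion_counts(y_true, y_pred)
print(counts, round(100 * accuracy(counts), 1))  # accuracy: 93.3
```

The same calculation applied to the training set (240 correct out of 242) gives 99.2%.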

Slag Classification Results under Multiple Atmospheres (Model 2)
First, the training data were used to demonstrate the consistency of the model. The confusion matrix representing the results is shown in Figure 5c. Of the 329 samples used in the training process, two non-glassy slags were incorrectly classified as glassy, and one glassy slag was incorrectly classified as non-glassy. Owing to the large number of samples under the mildly reducing atmosphere, only one of them was misclassified. The SiO2 and Al2O3 contents of all three misclassified ashes were over 37 wt% and 15 wt%, respectively, with an S/A ratio greater than 2. Two of the samples contained more than 30 wt% CaO. The accuracy of the model in classifying the training data reached 99.1%.
Classification results for the test data under multiple atmospheres are shown in Figure 5d. In the test set, the ratio between the number of data under reducing and inert atmospheres was 3:2. Because a large amount of sample data came from mildly reducing atmospheres, their proportion in the test data was also high. Among the test samples, five non-glassy slags were incorrectly classified as glassy. Two of them were under an inert atmosphere, and the SiO2 and Al2O3 contents of these two samples exceeded 38 wt% and 30 wt%, respectively. The Fe2O3 content of one sample exceeded 15 wt%. In addition, the S/A values of the other three samples, which were under a mildly reducing atmosphere, were all greater than 2, and their Al2O3 content was greater than 14 wt%. One of the samples had a CaO content of 24.8 wt%. There was also a sample whose combined SiO2 and Al2O3 content exceeded 80 wt%. Although the error rate went up, the accuracy remained at an acceptable level. If the atmosphere was not considered, the classification accuracy of the previous model for the 20 slags under an inert atmosphere was merely 65%, and the classification accuracy for the 50 test samples was reduced to 82%.
Therefore, it can be concluded that the gas atmosphere is an essential input for the prediction of slag type based on neural network model.
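The accuracy figures in this comparison can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text (50 test samples in a 3:2 ratio, five errors with the atmosphere input, 65% and 82% without it); the number of correctly classified reducing-atmosphere samples for the atmosphere-blind model is derived here and is not quoted in the source.

```python
n_test = 50
n_reducing = n_test * 3 // 5      # 3:2 ratio -> 30 reducing-atmosphere samples
n_inert = n_test - n_reducing     # 20 inert-atmosphere samples

# Model 2 (atmosphere as an input): five misclassified test samples.
acc_with_atmosphere = (n_test - 5) / n_test

# Atmosphere-blind model: 65% on the inert subset, 82% overall (from the text).
correct_inert = 65 * n_inert // 100               # 13 samples
correct_total = 82 * n_test // 100                # 41 samples
correct_reducing = correct_total - correct_inert  # derived, not quoted

print(n_reducing, n_inert)
print(acc_with_atmosphere)
print(correct_inert, correct_total, correct_reducing)
print(round(100 * correct_reducing / n_reducing, 1))
```

Under these figures, ignoring the atmosphere costs seven of the 20 inert-atmosphere samples, while the derived accuracy on the reducing-atmosphere subset (28 of 30) matches the 93.3% reported for model 1.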

Viscosity Prediction Results for a Mild Reducing Atmosphere (Model 3)
The test data did not participate in the model development process; they were equivalent to a series of unknown data and can reflect the generalization performance of the model. These data were randomly assigned using the dividevec function and were therefore sufficiently representative. Viscosity data were taken at the same temperature interval, which also limited their quantity. Moreover, further increasing the amount of test data would hardly reduce the performance of the model. Therefore, the quality and quantity of the data were sufficient to demonstrate the performance of the developed networks. All viscosity values in the test data were positive and below 200 Pa·s. Figure 6a shows that the viscosity predicted by the neural network was very close to the experimentally measured value, with a fitting coefficient (R²) as high as 0.9992. This shows that using a neural network to predict the viscosity of glassy slag is reliable. The absolute error between the simulated and experimental values was generally less than 4 Pa·s, and the maximum error was below 9 Pa·s. Therefore, it is reasonable to state that the BP neural network developed in this work can accurately predict the viscosity of glassy slag under a mildly reducing atmosphere.

Viscosity Prediction Results under Multiple Atmospheres (Model 4)
The comparison between the experimental and simulated results in Figure 6b shows that, although several data points deviated from the predicted values, the degree of linear fitting was still very high (R² = 0.9968). Overall, the vast majority of simulation results were still very accurate. The absolute error for most samples was less than 10 Pa·s. The simulation error of a small number of samples was rather high but still below 25 Pa·s. The simulation error of the multi-atmosphere model was higher than that of the single-atmosphere model.
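The error measures used to evaluate models 3 and 4 (fitting coefficient, typical and maximum absolute error) can be computed as in the sketch below. It assumes that R² denotes the coefficient of determination; the source may instead report the squared correlation of a linear fit between predicted and measured values. The sample viscosities are made-up placeholders, not data from this work.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def absolute_errors(y_true, y_pred):
    """Per-sample absolute error, in the units of the inputs (here Pa·s)."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


# Placeholder values for illustration only (Pa·s).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]

errors = absolute_errors(measured, predicted)
mean_abs_error = sum(errors) / len(errors)
max_abs_error = max(errors)
r2 = r_squared(measured, predicted)
print(mean_abs_error, max_abs_error, round(r2, 4))
```

Reporting the maximum alongside the mean absolute error matters here, because a high R² over a 0 to 200 Pa·s range can coexist with individual errors of several Pa·s.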

Technological Usage of Slag Characteristics Prediction Models
Ensuring the effective immobilization of hazardous elements from waste material through the fusion-solidification-vitrification process is of great practical importance. The simulation shows that digitizing the atmosphere factor is an effective method. By increasing the number of inputs, complex influencing factors can be incorporated into the model development process, so that the advantages of the neural network are fully exploited. The neural network can be used to classify unknown hazardous waste slags in different atmospheres, with an expected accuracy of about 90%. Overall, the model's ability to classify glassy slag and predict viscosity is satisfactory. When using this model, the chemical composition should, as far as possible, lie within the range listed in Table 1 for high-accuracy prediction. First, the classification model under mildly reducing/multiple atmospheres is used to classify slags as glassy or non-glassy, with an accuracy of at least 90% on the test data. If the classification result is non-glassy, it is necessary to change the chemical composition or add fluxing agents to make the slag glassy. In this way, the slag classification model can also guide the adjustment of the chemical composition with high accuracy. Then, the viscosity of glassy slags is obtained by using the prediction model under mildly reducing/multiple atmospheres. Similarly, a strategy for changing the chemical composition can be derived from the simulated viscosity. The overall application method of the neural networks is shown in Figure 7.
Experimental data were used to verify the prediction results. The viscosity of as-prepared slags was evaluated using a self-manufactured, high-temperature rotary viscometer under an inert atmosphere. The measurement details can be found in previous works published in the literature [48,59,72]. The classification model showed that the prepared slags were all glassy.
The correlation between experiments and predictions can be seen in Figure 8. The prediction error for the SiO2-Al2O3-Na2O/CaO/MgO slag systems was relatively large. The oxide ratios were prepared artificially to give typical contents and allow easy analysis [48]. In experiment 3 in particular (40SiO2-10Al2O3-50CaO slag system), the maximum simulation error, which occurred at low temperature, reached 13.33 Pa·s. As in experiment 3, the simulated viscosity of experiment 1 (60SiO2-10Al2O3-30Na2O slag system) at low temperature was greater than the experimental value. In contrast, the simulated viscosities of experiment 2 (60SiO2-10Al2O3-30CaO slag system) and experiment 4 (60SiO2-10Al2O3-30MgO slag system) were lower than the experimental values. The maximum absolute errors between the simulated and the experimental values were 11.51 and 11.83 Pa·s, respectively.
The results for the multi-component slags are shown in Figure 8b. The overall trend of simulated and experimental viscosity was in good agreement. However, the simulated viscosity of experiment 6 was obviously higher at low temperatures. The maximum error of experiment 6 was 1.59 Pa·s. The maximum simulation errors of the other three experiments were 1.41, 2.16 and 1.40 Pa·s, respectively. The simulated viscosity of experiment 7, which had the largest error (2.16 Pa·s), was higher at high temperatures. The relatively large simulation errors were prone to occur at high temperatures. The simulated viscosity error of multi-component slag was within 15%.
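The two-stage use of the models described in this section (classification first, viscosity prediction only for glassy slags) can be sketched as follows. The function `assess_slag`, the returned dictionary keys and the stand-in callables are hypothetical names introduced for illustration; in practice the two callables would wrap the trained classification and viscosity networks.

```python
from typing import Callable, Dict, Sequence


def assess_slag(
    features: Sequence[float],
    classify: Callable[[Sequence[float]], bool],
    predict_viscosity: Callable[[Sequence[float]], float],
) -> Dict[str, object]:
    """Stage 1: glassy/non-glassy classification; stage 2: viscosity if glassy."""
    if not classify(features):
        # Non-glassy: the text recommends changing the composition or adding flux.
        return {
            "glassy": False,
            "viscosity_pa_s": None,
            "action": "adjust composition or add fluxing agent, then re-check",
        }
    return {
        "glassy": True,
        "viscosity_pa_s": predict_viscosity(features),
        "action": "evaluate viscosity against discharge requirements",
    }


# Stand-in models for demonstration only; they are NOT the trained networks.
always_glassy = lambda x: True
never_glassy = lambda x: False
constant_viscosity = lambda x: 12.5

glassy_result = assess_slag([0.1, 0.2], always_glassy, constant_viscosity)
non_glassy_result = assess_slag([0.1, 0.2], never_glassy, constant_viscosity)
print(glassy_result)
print(non_glassy_result)
```

Keeping the viscosity model behind the classification step reflects the fact that the viscosity networks were trained on glassy slags only.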

Limitations and Improvement Methods of the Model
This data-driven model is expected to be used to predict the viscosities of more kinds of slag. The chemical composition ranges of ash are listed in Table 1/Table 2. Before the neural network training, the input data were normalized to the range −1 to 1 (Equation (1)). These ranges therefore also define the ash chemical compositions for which viscosity can be predicted, and the neural networks have high predictive performance for such slags. The model can only predict viscosities below 200 Pa·s, which means that accurate prediction of higher viscosities still requires a lot of work. Each molten gasifier works with more or less stable raw materials, with a much narrower range of chemical composition. In the case of high-iron-bearing slags, the iron valency state makes hundreds of degrees of difference in their viscosities. Therefore, it is necessary to narrow the range of chemical composition to obtain a specialized neural network for predicting the viscosities of these slags. In practical applications, the Na2O content of some hazardous wastes is higher than 20 wt%, and the viscosity prediction for these slags is not accurate enough. Moreover, there are few experimental viscosity measurements for hazardous wastes with high salt content, which is a potential challenge.
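The link between normalization and the model's range of validity can be made concrete with a short sketch. Equation (1) is not reproduced in this section, so the code assumes the usual min-max mapping to [−1, 1]; the function names and the composition ranges in the example are placeholders, not the values of Table 1 or Table 2.

```python
def normalize(x, x_min, x_max):
    """Assumed form of Equation (1): linear min-max mapping onto [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def denormalize(y, x_min, x_max):
    """Inverse mapping from [-1, 1] back to physical units."""
    return (y + 1.0) * (x_max - x_min) / 2.0 + x_min


def in_training_range(sample, ranges):
    """True if every input lies inside the range seen during training."""
    return all(ranges[name][0] <= value <= ranges[name][1]
               for name, value in sample.items())


# Placeholder ranges in wt% (hypothetical; NOT the values of Table 1/Table 2).
ranges = {"SiO2": (10.0, 70.0), "Na2O": (0.0, 20.0)}

inside = in_training_range({"SiO2": 40.0, "Na2O": 5.0}, ranges)
outside = in_training_range({"SiO2": 40.0, "Na2O": 25.0}, ranges)
print(normalize(40.0, *ranges["SiO2"]), inside, outside)
```

A sample outside the training range maps to a normalized value beyond ±1, where the network extrapolates; this is one reason the text cautions against applying the model to high-Na2O wastes.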
With the gradual increase of these viscosity data, the performance of the model is expected to be further improved. On the other hand, the classification of different slags according to the component ranges is expected to improve the prediction accuracy and reduce the maximum error. Furthermore, it is hoped that other influencing factors such as heating/cooling rate can be used as input factors of the model, so that more viscosity data can be used, which may also provide guidance for experimental methods. The versatile model may be applied to other materials, such as steel slag. Models including a wide range of slag components and more influencing factors can be effectively applied in engineering. Later on, more advanced algorithms are expected to further improve the prediction accuracy of hazardous wastes' viscosities. Advanced artificial intelligence such as deep learning needs to be better applied to the viscosity prediction of slag.

Conclusions
The type (glassy/non-glassy) and viscosity of slag are important process parameters for the design and operation of molten gasification of hazardous wastes. Given the complex non-linear relationship between the phase transformation, vitrification transition and chemical composition of hazardous wastes, the existing empirical models and traditional mathematical methods cannot accurately predict the characteristics of slags. In addition, the gas atmosphere is one of the factors influencing slag viscosity, and it was not properly incorporated into the existing models. To address these research gaps, novel data-driven models based on artificial neural networks were developed in this work to classify slags and predict their viscosities. Over 10,000 experimental data points were used to train and optimize the structure of the neural networks. The effect of the gas atmosphere on slag characteristics was integrated into the models, allowing for an enhanced applicability to slags under various atmospheres. To sum up, this work represents a crucial step towards minimizing the technical risks associated with the unusual phase transition and rheological behavior of certain slags, providing reliable guidance for the selection of operating conditions. Additional significant conclusions drawn from this work are summarized as follows:
(1) The slag classification accuracy of the single-atmosphere model is over 93%. Using 1 and 0 as model inputs to represent mildly reducing (60CO/40CO2) and inert (Ar) atmospheres, an effective slag classification neural network model that accounts for different atmospheres was developed. The slag classification accuracy of the multiple-atmosphere model reached 90%. If the influence of the atmosphere is not considered, the classification accuracy based on the same model set-up will be compromised.
(2) A double hidden layer structure enables the neural network to predict the viscosity of glassy slag under a mildly reducing atmosphere. The quality and quantity of data were sufficient to demonstrate the performance of the developed networks. The absolute error of the simulation with 500 test data was generally lower than 4 Pa·s, the maximum absolute error was lower than 9 Pa·s and the average error was 0.86 Pa·s. Using the atmospheric factor as an input of the neural network, a viscosity prediction model under multiple atmospheres was developed. The absolute error of the simulation with 800 test data was generally within 10 Pa·s and the average error was 1.60 Pa·s.
(3) Experimental data were used to verify the prediction results. The classification model showed that the prepared slags were all glassy. Then, the viscosities of the glassy slags were obtained by using the prediction model. The relatively large simulation errors were prone to occur at high temperatures. The model was better at predicting the viscosity of multi-component slag, for which the simulated viscosity error was within 15%.
(4) It is necessary to narrow the range of chemical composition to obtain a specialized neural network for predicting the viscosity of slags from a particular molten gasifier or of iron-rich slags. There are few experimental viscosity measurements for hazardous wastes with high salt content (Na2O > 20 wt%), which is a potential challenge. Advanced artificial intelligence, such as deep learning, with more experimental influence factors, needs to be better applied to the viscosity prediction of slag.