Threshold Value Determination Using Machine Learning Algorithms for Ba Interference with Eu in Coal and Coal Combustion Products by ICP-MS

Interference of Ba-based ions with Eu during quadrupole-based inductively coupled plasma mass spectrometry (ICP-Q-MS) analysis of coal and coal combustion products is a known problem. Thus, this paper proposes machine-learning-based prediction models for determining the threshold value of Ba interference with Eu, which can be used to predict such interference in coal. The models are trained on Eu, Ba, Ba/Eu, and Ba-interference-with-Eu data. Different user-defined parameters yield different model-tree-based prediction models for Ba interference with Eu. We experimentally show the effectiveness of these prediction models and find that, when the Ba/Eu value is less than 2950, the Ba-Eu interference prediction model is y = −0.18419411 + 0.00050737 × x, 0 < x < 2950. Further, when the Ba/Eu value is between 2950 and 189,523, the Ba-Eu interference prediction model y = 0.293982186 + 0.00000181729975 × x, 2950 < x < 189,523, yields the best result. Based on the optimal model, a threshold value of 363 is proposed; i.e., when the Ba/Eu value is less than 363, Ba interference with Eu can be neglected during Eu data interpretation. Comparison of this threshold value with a value proposed in earlier works shows that the proposed prediction model better determines the threshold value for Ba interference with Eu.


Introduction
Rare earth elements and yttrium (REY, or REE if Y is excluded) in coal and coal combustion products (CCPs), e.g., fly and bottom ash, have attracted much attention in recent years, not only because of the high international demand for these technologically important elements but also because of China's export restrictions [1,2]. Seredin and Dai [3] and Dai et al. [4] have shown that coal has high potential as a REY source: the average concentration of REY oxides (REO) in world coal ash is 485 µg/g, about half the cut-off grade of REO in CCPs (1000 µg/g). In some cases, CCPs contain >1000 µg/g REO and could therefore constitute an economically viable source for REY extraction. Previous investigations have shown that some coals from China [5][6][7], Russia [3,8], and the USA [9][10][11] contain high concentrations of REY, comparable to or even higher than those of conventional REY deposits [3]. Other studies concerning REY resources [12,13], modes of REY occurrence in coal and CCPs [14,15], and extraction technology [16] have also suggested the great potential of coal as a REY source.
The REY (including Eu) concentration in coal and CCPs can be determined via several methods, including X-ray fluorescence spectrometry (XRF) [17,18], instrumental neutron activation analysis …
Classification and regression are two typical machine learning approaches [57]; they differ in that their target variables are discrete and continuous, respectively [58].
Here, we employ a model tree [57], based on linear regression and a regression tree, to construct prediction models for this interference. Analysis of the Ba, Eu, and Ba/Eu data shows that the target variable, Ba interference with Eu, is continuous; thus, we adopt regression for predicting Ba interference with Eu. Empirically, we found it difficult to construct an accurate global prediction model using linear regression because of the complexity of the element data. To overcome this problem, the element data are split into multiple partitions, and the regression tree is built with the classification and regression tree (CART) algorithm [59].

Proposed Machine Learning Models for Prediction of Ba Interference with Eu
In this study, machine learning algorithms were used to develop models of Ba interference with Eu in coal. When a pair of element datasets is established (e.g., $x^{\mathrm{Ba}}_1, \ldots, x^{\mathrm{Ba}}_i$ and $x^{\mathrm{Eu}}_1, \ldots, x^{\mathrm{Eu}}_i$ for Ba and Eu, respectively), the interference between them is difficult to determine directly. Effective Ba-Eu interference prediction depends on various factors, including the element concentrations, the element interference, and the samples analyzed.
Here, three machine learning models, namely linear regression, regression trees, and model trees, were used to predict Ba interference with Eu. The problem of threshold value identification can be represented as the problem of constructing a prediction model between the Ba/Eu ratio and Ba interference with Eu:
$$y^{\mathrm{Ba/Eu}}_{i,\mathrm{prediction}} = \left(x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i\right)^{T} w_{i,\mathrm{prediction}} \tag{1}$$
In Equation (1), the vector $w_{i,\mathrm{prediction}}$ is the regression weight. Regression is used to find $w_{i,\mathrm{prediction}}$, from which the Ba-Eu interference values are predicted.

Ba-Eu Interference Prediction Error
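The error is defined as the difference between the actual Ba interference with Eu, $y^{\mathrm{Ba/Eu}}_i$, and the predicted value $y^{\mathrm{Ba/Eu}}_{i,\mathrm{prediction}}$; i.e.,
$$e_i = y^{\mathrm{Ba/Eu}}_i - y^{\mathrm{Ba/Eu}}_{i,\mathrm{prediction}}.$$
The total squared interference error over all samples can also be expressed in matrix notation as
$$\sum_i e_i^2 = (\mathbf{y} - \mathbf{X}w)^{T}(\mathbf{y} - \mathbf{X}w),$$
where each row of $\mathbf{X}$ is a feature vector $(x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i)^{T}$, $\mathbf{y}$ collects the measured interference values $y^{\mathrm{Ba/Eu}}_i$, and $w$ is the weight vector of Equation (1). This expression is minimized using the ordinary least squares method, which gives $\hat{w} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$; hence, $\hat{w}$ is the best estimate of the regression weight based on the training element data.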
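As a minimal sketch of the ordinary least squares step, assuming a single feature (the Ba/Eu ratio) plus an intercept, which is the form of every fitted model reported in Table 4, the weights can be computed in closed form. The function names `fit_ols` and `sse` are illustrative, not taken from the authors' code.

```python
def fit_ols(x, y):
    """Ordinary least squares fit of y = w0 + w1 * x.

    Uses the centred closed form, which solves the normal equations
    (X^T X) w = X^T y for a design matrix with rows [1, x_i].
    Returns the weights (w0, w1).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


def sse(x, y, w0, w1):
    """Sum of squared errors, i.e. the sum of e_i^2 with e_i = y_i - (w0 + w1 * x_i)."""
    return sum((yi - (w0 + w1 * xi)) ** 2 for xi, yi in zip(x, y))


# Usage with illustrative (not measured) points lying on y = 1 + 2x.
w0, w1 = fit_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(w0, w1, sse([1, 2, 3, 4], [3, 5, 7, 9], w0, w1))
```

The closed form avoids an explicit matrix inverse, which is sufficient for one feature; with more features a linear-algebra routine would be the usual choice.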
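Prediction: Based on the training process described above, the Ba-Eu interference is predicted from the estimated weight as
$$y^{\mathrm{Ba/Eu}}_{i,\mathrm{prediction}} = \left(x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i\right)^{T}\hat{w}.$$
Regression Tree
The training data in Figure 1 are scattered rather than linearly distributed; thus, a nonlinear model is needed. The nonlinear model partitions the element concentrations and the Ba-Eu interference data, and every partition can be fitted with a linear regression model. Note that in Figure 1, the x-axis is the Ba/Eu ratio in digested solutions derived from solid samples before Ba is separated from Eu in the solutions, and the y-axis is the Eu concentration contributed by Ba ions. For a candidate split value s, the data are divided into two parts, $R_1(s) = \{i : x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i \le s\}$ and $R_2(s) = \{i : x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i > s\}$, and the best split minimizes
$$\min_{s}\left[\sum_{i \in R_1(s)} \left(y^{\mathrm{Ba/Eu}}_i - c_1\right)^2 + \sum_{i \in R_2(s)} \left(y^{\mathrm{Ba/Eu}}_i - c_2\right)^2\right], \quad c_1 = \frac{1}{N}\sum_{i \in R_1(s)} y^{\mathrm{Ba/Eu}}_i, \quad c_2 = \frac{1}{M}\sum_{i \in R_2(s)} y^{\mathrm{Ba/Eu}}_i, \tag{5}$$
where N and M are the numbers of feature values in the two parts.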
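The split criterion in Equation (5) can be sketched for a single feature (the Ba/Eu ratio) by trying every observed ratio as the candidate split value s. This is a minimal illustration under that one-feature assumption; `split_error` and `best_split` are hypothetical names.

```python
def split_error(x, y, s):
    """Error of split value s in Eq. (5).

    Part R1 holds samples with x <= s and R2 those with x > s; each part is
    approximated by its mean (c1 over N samples, c2 over M samples).
    """
    r1 = [yi for xi, yi in zip(x, y) if xi <= s]
    r2 = [yi for xi, yi in zip(x, y) if xi > s]
    if not r1 or not r2:
        return float("inf")  # s does not divide the data into two parts
    c1 = sum(r1) / len(r1)  # N = len(r1)
    c2 = sum(r2) / len(r2)  # M = len(r2)
    return sum((yi - c1) ** 2 for yi in r1) + sum((yi - c2) ** 2 for yi in r2)


def best_split(x, y):
    """Try every observed feature value as s; return (s, error) with minimum error."""
    candidates = ((s, split_error(x, y, s)) for s in sorted(set(x)))
    return min(candidates, key=lambda item: item[1])


# Illustrative Ba/Eu ratios and interference values (not measured data).
print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 5, 5, 5]))
```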
Based on the binary split process described above, for every feature $x^{\mathrm{Ba}}_1/x^{\mathrm{Eu}}_1, \ldots, x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i$, if the feature value $y^{\mathrm{Ba/Eu}}_{1,\mathrm{prediction}}, \ldots, y^{\mathrm{Ba/Eu}}_{i,\mathrm{prediction}}$ is greater than the best split value, we traverse the left side of the regression tree, i.e., the left subtree $Tree_{left}$. If the feature value is lower than the best split value, we traverse the right side, i.e., $Tree_{right}$. For $Tree_{left}$ and $Tree_{right}$, we examine every feature and feature value to find the best split until the minimum error is achieved (cf. Equation (5)). The binary split is applied recursively until a node can no longer be split; the value at that node then becomes a leaf node. In this way, the regression tree for Ba interference with Eu is determined.
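A recursive sketch of this procedure, restricted to the single Ba/Eu feature, is given below. The stopping rule borrows the two tolerances that the Model Tree experiments call tolS (minimum error reduction) and tolN (minimum number of samples per part); the parameter names `tol_s` and `tol_n` and the dict-based tree layout are our own assumptions. Following the text, ratios greater than the split value go to the left subtree.

```python
def _sq_err(ys):
    """Squared deviation of ys from their mean; the mean is the leaf constant c_p."""
    c = sum(ys) / len(ys)
    return sum((y - c) ** 2 for y in ys)


def build_regression_tree(x, y, tol_s=1.0, tol_n=4):
    """Grow a regression tree on one feature by recursive binary splits.

    tol_s: minimum reduction of the squared error required to accept a split.
    tol_n: minimum number of samples allowed in each part.
    Returns a float (leaf) or {"split": s, "left": ..., "right": ...},
    where "left" holds samples with x > s and "right" those with x <= s.
    """
    tol_n = max(1, tol_n)
    pairs = sorted(zip(x, y))
    base = _sq_err([p[1] for p in pairs])
    best = None  # (k, error): split between pairs[k - 1] and pairs[k]
    for k in range(tol_n, len(pairs) - tol_n + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # equal ratios cannot be separated
        err = _sq_err([p[1] for p in pairs[:k]]) + _sq_err([p[1] for p in pairs[k:]])
        if best is None or err < best[1]:
            best = (k, err)
    # No admissible split, or too little error reduction: make a leaf.
    if best is None or base - best[1] < tol_s:
        return sum(p[1] for p in pairs) / len(pairs)
    k = best[0]
    xs, ys = zip(*pairs)
    return {
        "split": xs[k - 1],
        "left": build_regression_tree(xs[k:], ys[k:], tol_s, tol_n),
        "right": build_regression_tree(xs[:k], ys[:k], tol_s, tol_n),
    }


def predict(tree, ratio):
    """Descend until a leaf constant is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if ratio > tree["split"] else tree["right"]
    return tree


# Illustrative data: two flat groups of interference values.
tree = build_regression_tree([1, 2, 3, 4, 10, 11, 12, 13],
                             [0, 0, 0, 0, 5, 5, 5, 5], tol_s=1.0, tol_n=2)
print(tree, predict(tree, 12), predict(tree, 2))
```

Raising `tol_s` or `tol_n` prunes splits, which mirrors the sensitivity to tolS and tolN reported for the experiments.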

Model Tree
The model tree for prediction of Ba interference with Eu is based on the linear regression and regression tree models described above.The steps of the CART algorithm for this model tree are similar to those for the regression tree.
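The features and feature values extracted here are the Ba/Eu ratios $x^{\mathrm{Ba}}_1/x^{\mathrm{Eu}}_1, \ldots, x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i$ and the corresponding Ba interference with Eu, $y^{\mathrm{Ba/Eu}}_1, \ldots, y^{\mathrm{Ba/Eu}}_i$. Unlike in the regression tree, each part produced by a split is fitted with its own linear model, so the error of a candidate split value s, which divides the data into parts $R_1(s)$ and $R_2(s)$, is
$$\min_{s}\left[\sum_{i \in R_1(s)} \left(y^{\mathrm{Ba/Eu}}_i - (x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i)^{T} w_1\right)^2 + \sum_{i \in R_2(s)} \left(y^{\mathrm{Ba/Eu}}_i - (x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i)^{T} w_2\right)^2\right], \tag{6}$$
where $w_1$ and $w_2$ are the least-squares weights fitted within each part. We repeat the process for every feature and every value to find the best split that minimizes this error. Based on the binary split process above, for every feature, if the feature value $\{(x^{\mathrm{Ba}}_1/x^{\mathrm{Eu}}_1)^{T} w_{1,\mathrm{prediction}}, \ldots, (x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i)^{T} w_{i,\mathrm{prediction}}\}$ is greater than the best split value, we traverse $Tree_{left}$. If the feature value is lower than the best split value, we traverse $Tree_{right}$. For $Tree_{left}$ and $Tree_{right}$, we examine every feature and feature value to find the best split until the minimum error is achieved (cf. Equation (6)). The binary split is applied recursively until a node can no longer be split; the value at that node then becomes a leaf node. In this way, model trees for Ba interference with Eu are formed. The difference between the regression tree and the model tree is that the leaf nodes of the regression tree are constants, $y^{\mathrm{Ba/Eu}}_{1,\mathrm{prediction}}, \ldots, y^{\mathrm{Ba/Eu}}_{i,\mathrm{prediction}}$, whereas the leaf nodes of the model tree are linear models, $\{(x^{\mathrm{Ba}}_1/x^{\mathrm{Eu}}_1)^{T} w_{1,\mathrm{prediction}}, \ldots, (x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i)^{T} w_{i,\mathrm{prediction}}\}$.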
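For the model tree, only the per-part error changes: each candidate part is scored by the residual of its own least-squares line, as in Equation (6). The sketch below assumes the single Ba/Eu feature with an intercept, as in the reported models; the helper names and the `min_leaf` parameter (playing a role similar to tolN) are illustrative assumptions.

```python
def fit_line(x, y):
    """OLS fit y = w0 + w1 * x; falls back to a constant if x has no spread."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return my, 0.0
    w1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - w1 * mx, w1


def line_error(x, y):
    """Squared residual of the part's own linear model (one term of Eq. (6))."""
    w0, w1 = fit_line(x, y)
    return sum((yi - (w0 + w1 * xi)) ** 2 for xi, yi in zip(x, y))


def best_model_split(x, y, min_leaf=2):
    """Scan split values s; return (s, error), or (None, error) if no split helps."""
    if min_leaf < 1:
        raise ValueError("min_leaf must be at least 1")
    pairs = sorted(zip(x, y))
    best = (None, line_error(x, y))
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # equal ratios cannot be separated
        lx, ly = zip(*pairs[:k])
        rx, ry = zip(*pairs[k:])
        err = line_error(lx, ly) + line_error(rx, ry)
        if err < best[1]:
            best = (pairs[k - 1][0], err)
    return best


# Illustrative piecewise-linear data: y = x up to 4, then y = 5 + 3x.
print(best_model_split([1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 20, 23, 26, 29]))
```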

Machine Learning Process for Ba-Eu Interference Prediction
Based on the constructed regression tree and model tree for Ba-Eu interference prediction, the proposed machine learning process is implemented as follows.
Training: …
Model tree for prediction: After training the Ba-Eu interference model tree, we perform binary splits recursively to obtain P parts and obtain the prediction $y^{\mathrm{Ba/Eu}}_{i,\mathrm{prediction}} = \sum_{p=1}^{P} c_p\, I\left(x^{\mathrm{Ba/Eu}}_i \in R_p\right)$, where $R_p$ is the p-th part, based on the best split that minimizes the error (cf. Equation (6)). From the above analysis, the model tree was selected for prediction of Ba interference with Eu.
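Prediction with a trained model tree amounts to descending from the root to a leaf and evaluating that leaf's linear model. In this sketch a tree is a nested dict with hypothetical keys `split`, `left`, and `right`, and each leaf is an `(intercept, slope)` pair; following the traversal rule in the text, ratios greater than the split value go left. The toy tree in the usage lines is made up for illustration.

```python
def model_tree_predict(tree, ratio):
    """Predict Ba interference with Eu for one Ba/Eu ratio.

    Internal nodes: {"split": s, "left": ..., "right": ...}; ratios
    greater than s go left, the others go right.
    Leaves: (w0, w1), evaluated as the linear model w0 + w1 * ratio.
    """
    node = tree
    while isinstance(node, dict):
        node = node["left"] if ratio > node["split"] else node["right"]
    w0, w1 = node
    return w0 + w1 * ratio


# Hypothetical two-level tree (coefficients invented for illustration).
toy = {"split": 10.0,
       "left": (1.0, 0.5),
       "right": {"split": 2.0, "left": (0.0, 2.0), "right": (3.0, 0.0)}}
print(model_tree_predict(toy, 20.0), model_tree_predict(toy, 5.0), model_tree_predict(toy, 1.0))
```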

Simulation Setup
To implement the Ba-Eu interference prediction models and calculate the Ba/Eu threshold value, the Python programming language [60] was used. The prediction models for Ba interference with Eu were constructed as follows:
(1) All relevant element data were collected, as detailed in Tables 1 and 2.
(2) All input element data were prepared. All Ba, Eu, and Ba/Eu concentrations were stored as standard Python lists.
(3) The element data were analyzed for feature selection. The candidate features were $x^{\mathrm{Ba}}_i$, $x^{\mathrm{Eu}}_i$, $x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i$, $y^{\mathrm{Ba/Eu}}_i$, $C^{\mathrm{Ba}}_i$, and $C^{\mathrm{Eu}}_i$.
(4) The algorithm was trained. To obtain the target variable (Ba interference with Eu) and the threshold points of the Ba/Eu ratio, we implemented the model tree, based on linear regression and the regression tree, in Python.
(5) The algorithm was tested; i.e., the performance of the interference prediction model obtained in step (4) was evaluated.
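Steps (2), (3), and (5) can be sketched with plain Python lists, matching the statement that the data were stored as standard Python lists. The function names are hypothetical, and the text does not say which error metric was used in step (5); root-mean-square error is our assumption.

```python
import math


def ba_eu_ratios(ba, eu):
    """Steps (2)-(3): build the Ba/Eu feature from parallel lists of concentrations."""
    if len(ba) != len(eu):
        raise ValueError("Ba and Eu lists must have the same length")
    return [b / e for b, e in zip(ba, eu)]


def rmse(predict, x_test, y_test):
    """Step (5): root-mean-square error of a fitted predictor on held-out samples."""
    if not x_test or len(x_test) != len(y_test):
        raise ValueError("need non-empty, equal-length test data")
    sq = [(predict(xi) - yi) ** 2 for xi, yi in zip(x_test, y_test)]
    return math.sqrt(sum(sq) / len(sq))


# Illustrative numbers only (not from Tables 1-3).
ratios = ba_eu_ratios([300.0, 2000.0], [1.0, 0.5])
print(ratios)
print(rmse(lambda r: 0.0, [1.0, 2.0], [3.0, 4.0]))
```

Any of the tree predictors sketched elsewhere in this section could be passed as `predict`, since the function only needs a callable mapping a ratio to a predicted interference.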

Model Tree for Prediction of Ba Interference with Eu
The input element data sets were prepared as detailed in Table 3. Executing the model tree for element interference prediction in Python required two parameters: tolS, the tolerance on the reduction of the Ba-Eu interference error, and tolN, the minimum number of Ba-Eu data instances in a split. The model tree was sensitive to the tolS and tolN settings, and different settings yielded different prediction models. We performed model tree experiments for element interference prediction using the rare earth element data sets; all resulting prediction models are detailed in Table 4. In the models below, x is the Ba/Eu ratio and y is the predicted Ba interference with Eu.
For (tolS, tolN) = (0,1) and (0,2), the prediction model for Ba interference with Eu is shown in Figure 2a. It has five split values, giving six linear regression segments:
(1) x > 40,726.5: y = 0.389451044 + 0.00000121646954 × x;
(2) 17,344.79 < x < 40,726.5: y = 0.759759680 − 0.0000125166582 × x;
(3) 2950 < x < 17,344.79: y = −0.0364849362 + 0.0000332367781 × x;
(4) 2247.06 < x < 2950: y = 1.29 + 0.0002 × x;
(5) 10.88 < x < 2247.06: y = 0.0131201121 + 0.0000291815483 × x;
(6) x < 10.88: y = −0.06066667 + 0.00833333 × x.
For (tolS, tolN) = (0,3), the model tree for prediction of Ba interference with Eu is shown in Figure 2b. It has three split values, giving four linear regression segments:
(1) x > 27,144.6: y = 0.259761293 + 0.00000193096467 × x;
(2) 2950 < x < 27,144.6: y = 0.0966488103 + 0.0000156280402 × x;
(3) 1938.06 < x < 2950: y = −5.88574265 + 0.00265247237 × x;
(4) x < 1938.06: y = 0.0169820389 + 0.0000320448913 × x.
For (tolS, tolN) = (1,3) and (2,3), the model tree for prediction of Ba interference with Eu is shown in Figure 2f. It has two split values, giving three linear regression segments:
(1) x > 2950: y = 0.293982186 + 0.00000181729975 × x;
(2) 1938.06 < x < 2950: y = −5.88574265 + 0.00265247237 × x;
(3) x < 1938.06: y = 0.0169820389 + 0.0000320448913 × x.
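The three-segment model reported for (tolS, tolN) = (1,3) and (2,3) can be evaluated as a piecewise lookup. The coefficients and split values below are copied from the text; how a ratio lying exactly on a split value should be treated is not stated, so assigning it to the upper segment is an assumption.

```python
from bisect import bisect_right

# (lower bound of the Ba/Eu segment, intercept w0, slope w1), as reported for
# (tolS, tolN) = (1,3) and (2,3); lower bounds are in ascending order.
SEGMENTS = [
    (float("-inf"), 0.0169820389, 0.0000320448913),  # x < 1938.06
    (1938.06, -5.88574265, 0.00265247237),           # 1938.06 < x < 2950
    (2950.0, 0.293982186, 0.00000181729975),         # x > 2950
]
BOUNDS = [lower for lower, _, _ in SEGMENTS]


def interference(ratio):
    """Predicted Ba interference with Eu for one Ba/Eu ratio.

    A ratio exactly equal to a split value falls into the upper segment
    (an assumption; the text only gives strict inequalities).
    """
    _, w0, w1 = SEGMENTS[bisect_right(BOUNDS, ratio) - 1]
    return w0 + w1 * ratio


for r in (1000.0, 2500.0, 100000.0):
    print(r, interference(r))
```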

Results
For all the model trees of the Ba-Eu interference prediction model illustrated in Figure 2, the training data sets of the Ba/Eu ratio and the Ba interference with Eu were scattered, as shown in Figure 1. The points (2900, 1.87) and (2950, 1.88) were outliers relative to the other Ba/Eu ratio and Ba interference with Eu data points.
The optimal (tolS, tolN) settings for the prediction model of Ba interference with Eu were found to be (0,4), (0,5), (0,6), (1,4), (1,5), (1,6), (2,4), (2,5), and (2,6). With these settings, when the Ba/Eu value was less than 2950, the interference prediction model was a linear regression with y = −0.18419411 + 0.00050737 × x, 0 < x < 2950. Further, when the Ba/Eu value was greater than 2950 and less than 189,523, the interference prediction model was a linear regression with y = 0.293982186 + 0.00000181729975 × x, 2950 < x < 189,523. From the optimal models, a threshold value of 363.0370538 was determined. When the Ba/Eu value is below 363.0370538, Ba interference with Eu need not be considered; thus, the Eu values of the investigated samples can be interpreted directly from the data.
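The reported threshold coincides with the root of the lower-segment model: setting y = 0 in y = −0.18419411 + 0.00050737 × x gives x = 0.18419411/0.00050737 ≈ 363.037, which matches the 363.0370538 quoted above. The sketch below reproduces that arithmetic; reading the threshold as the zero-interference root is our interpretation of the numbers, not a procedure the text spells out, and `zero_interference_ratio` is a hypothetical name.

```python
# Lower segment of the optimal model tree (0 < x < 2950), coefficients as reported.
W0, W1 = -0.18419411, 0.00050737


def zero_interference_ratio(w0, w1):
    """Ba/Eu ratio at which the line w0 + w1 * x predicts zero interference."""
    if w1 == 0:
        raise ValueError("a flat line has no unique root")
    return -w0 / w1


threshold = zero_interference_ratio(W0, W1)
print(f"threshold Ba/Eu = {threshold:.4f}")
```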

Performance Evaluation
To verify the threshold value for Ba interference with Eu proposed in this paper, a wide dataset of Ba/Eu values from previously published literature, spanning the ranges 2-361, 379-938, and 1042-3305, was used (Tables 5 and 6) [62,64,65]. The test data were selected from Dai et al. [62,64] and Duan et al. [65] because these data were all obtained via ICP-Q-MS. Thus, Ba was expected to interfere with the Eu concentrations in the samples if the Ba/Eu values exceeded the threshold value, either 1000 (as proposed in previous works) or 363 (as proposed in this study). A total of 41 coal bench samples from a boehmite-rich, 36.37-m-thick Pennsylvanian coal seam in Inner Mongolia, northern China, reported by Dai et al. [64], were considered. A total of 60 coal bench samples from three Ge-rich Neogene coals from Lincang, Yunnan Province, southwestern China, reported by Dai et al. [62], were also considered. Further, 27 coal bench samples from Reshuihe, Zhenxiong, Yunnan Province, China, reported by Duan et al. [65], were considered. The test datasets presented in Tables 5 and 6 could be classified into three groups: Ba/Eu < 363, Ba/Eu = 363-1000, and Ba/Eu > 1000. We compared the threshold value of 363 determined from our proposed model with the value of 1000 proposed by others (e.g., [4,27,35]). For the data in Table 5, the correlation coefficient between Ba and Eu was 0.1326 for Ba/Eu < 363 and 0.9659 for Ba/Eu > 1000; for Ba/Eu between 363 and 1000, it remained as high as 0.9545 (Figure 3A). For the data in Table 6, the corresponding coefficients were 0.231 (Ba/Eu < 363) and 0.9318 (Ba/Eu > 1000), and the coefficient for Ba/Eu between 363 and 1000 remained as high as 0.9317 (Figure 3B). Because Ba-derived ions add to the apparent Eu signal, a strong Ba-Eu correlation suggests that the measured Eu is affected by Ba. The weak correlation below 363, together with the strong correlation already present between 363 and 1000, therefore indicates that the threshold value of 363 identifies Ba interference with Eu more accurately than the previously proposed value of 1000.
Minerals 2019, 9, x FOR PEER REVIEW
Figure 3. (A) Data from Table 5; (B) data from Table 6.

Conclusions
In conclusion, to determine the threshold value of Ba interference with Eu in ICP-Q-MS data analysis, three machine learning techniques, namely linear regression, regression trees, and model trees, were used to construct prediction models of Ba interference with Eu in coal and coal-related samples. The CART algorithm was used to build the trees. To apply the models, all related data, including Ba, Eu, Ba/Eu, and Ba interference with Eu, were collected and prepared. The linear regression model, regression tree, and model tree for Ba-Eu interference were implemented in Python. The results showed that the model tree is superior to the regression tree for determining Ba/Eu threshold points. The extracted feature was Ba/Eu, and the extracted feature value was the interference of Ba with Eu. From all the obtained prediction models, an optimal threshold value of 363 was determined. This indicates that, when the Ba/Eu value is <363, Ba interference with Eu can be neglected; thus, the Eu concentrations in samples can be determined from ICP-Q-MS data. A comparison of the threshold value of 363 proposed in this study with the value of 1000 proposed in other works (e.g., [4,27,35]), using published ICP-Q-MS data, showed that the former is more accurate for determining whether Ba interferes with Eu in the investigated samples. In future work, we will use deep learning techniques [66][67][68] to determine the threshold value of Ba interference with Eu.

Figure 1. Training data (from Yan et al. [27]): interfered Eu (ppm) versus Ba/Eu.
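Regression tree for prediction: After regression tree training, we perform binary splits recursively to obtain P parts and obtain the Ba-Eu interference prediction $y^{\mathrm{Ba/Eu}}_{i,\mathrm{prediction}} = \sum_{p=1}^{P} c_p\, I\left(x^{\mathrm{Ba/Eu}}_i \in R_p\right)$, where $R_p$ is the p-th part and $c_p$ is its constant, based on the best split that minimizes the error (cf. Equation (5)). Training inputs: all $x^{\mathrm{Ba}}_1, \ldots, x^{\mathrm{Ba}}_i$, $x^{\mathrm{Eu}}_1, \ldots, x^{\mathrm{Eu}}_i$, $x^{\mathrm{Ba}}_1/x^{\mathrm{Eu}}_1, \ldots, x^{\mathrm{Ba}}_i/x^{\mathrm{Eu}}_i$, and $y^{\mathrm{Ba/Eu}}_1, \ldots, y^{\mathrm{Ba/Eu}}_i$ are entered.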

Table 4. Prediction models for Ba interference with Eu based on the model tree.