Pulp Chemistry Variables for Gaussian Process Prediction of Rougher Copper Recovery

Insight into the operation of froth flotation through modelling has existed since the early 1930s. Despite the numerous industrial models that have been developed over the years, modelling of the metallurgical outputs of froth flotation often does not involve pulp chemistry variables. As such, this work investigated the influence of pulp chemistry variables (pH, Eh, dissolved oxygen and temperature) on the performance of rougher copper recovery prediction using a Gaussian process regression algorithm. Model performance, assessed with the linear correlation coefficient (r), root mean square error (RMSE), mean absolute percentage error (MAPE) and scatter index (SI), indicated that pulp chemistry variables are essential in predicting rougher copper recovery, with the best model obtaining an r value > 0.98, RMSE < 0.32, MAPE < 0.20 and SI < 0.0034. Regularised neighbourhood component analysis (RNCA) feature weights reveal the relevance of the pulp chemistry variables in the order dissolved oxygen > pH > Eh > temperature.


Introduction
High-grade ores worldwide are being depleted due to the increasing demand for valuable metals (e.g., copper, gold, rare earth elements) to satisfy technological advancement and applications in automobiles (e.g., electric vehicles), construction and architecture [1][2][3][4]. This has warranted the treatment of low-grade ores, which are often difficult to process due to their complex association with gangue minerals [5][6][7][8][9]. For more than a hundred years, froth flotation has been the main separation technique for the treatment of low-grade copper-bearing ores around the globe [6,10]. In froth flotation, valuable minerals are separated from their associated gangue minerals based on differences in surface wettability, to achieve the highest recovery at or above the target grade for sale or subsequent metal extraction processes [11,12]. The performance of froth flotation is known to be affected by several interconnected variables, which can broadly be grouped into ore-related variables (feed grade, feed particle size and liberation), hydrodynamic variables (airflow rate, bubble size, froth depth, impeller speed) and chemical variables (pulp electrochemistry and reagent concentrations), each of which plays a critical role in ensuring valuable mineral selectivity [13][14][15][16][17][18][19][20].
Minerals 2023, 13, 731
Flotation recovery and concentrate grade are the main performance indicators of the process; as such, continuous research efforts in process automation and optimisation have been directed toward maximising these two key performance indicators [21]. Such efforts include the development of advanced instruments such as the Multi-Stream Slurry XRF Analyser for online measurement of slurry chemical composition [22] and the Magotteaux Pulp Chemistry Monitor (PCM®), which can measure pulp chemistry variables continuously in real time. These advanced instruments, together with long-established instruments that measure variables such as froth depth, bubble size, slurry density, agitator motor power, feed particle size distribution and airflow rate, help ensure the smooth running of the process [23]. With large volumes of real-time and historical data collected daily on flotation plants around the globe, the strategy has been to predict the overall output of the process, in terms of recovery and concentrate grade, using data-driven predictive models.
Data-driven models have proven to be more potent than knowledge-based models, particularly for complex nonlinear processes, as detailed knowledge of operating mechanisms and prior knowledge of the research object are not required [21,24,25,26]. Machine learning algorithms can capture complex nonlinear relationships among flotation variables and can also handle more variables and observations [27][28][29]. Since the early 1990s, several machine learning algorithms, including decision trees, Gaussian process regression (GPR), support vector machines, artificial neural networks and random forests, have been applied in the field of minerals engineering [13,26,28,30,31,32,33,34,35]. For instance, in Shahbazi, Chehreh Chelgani [13], a random forest algorithm and its associated variable importance measure were applied to investigate the effect of particle characteristics and hydrodynamic conditions on the flotation rate constant, K, and recovery, R. The predictive models developed yielded satisfactory results, with R² values of 0.96 and 0.97 for K and R, respectively. A GPR model was also applied in Patel, Gorai [34] for the prediction of iron ore grade, yielding an R² value of 0.9569, which indicates very good prediction accuracy. It should, however, be noted that, despite their higher predictive accuracy relative to knowledge-based models, machine learning models require retraining when there is a significant drift in the correlation structure between the input and output variables.
Going through the literature, it was observed that the various machine learning models developed for predicting the metallurgical outputs of froth flotation do not include comprehensive pulp chemistry data, and, even when they do, it is only pH, which is just one component of pulp chemistry (Table 1). Pulp chemical conditions, especially those pertaining to electrochemistry, are known to significantly affect froth flotation owing to their role in mineral-collector interactions [36][37][38][39][40]. Studies have shown that pulp redox potential and pulp oxygen content strongly affect overall flotation performance [15,17,19,41,42]. For instance, Plaksin and Bessonov [42] established the relationship between oxygen content and floatability by demonstrating that the interaction of xanthate with sulphide minerals increases with increasing pulp oxygen content. Furthermore, flotation experiments on chalcopyrite and galena ore from Black Mountain revealed that increasing the oxygen level in the pulp enhances copper recovery significantly [43,44]. In terms of temperature, Lin [45] found that annual low temperatures (12 °C) had a negative impact on floatability compared with ambient temperatures (28 °C). O'Connor and Mills [46] found that both recovery and grade increased with increasing temperature during pyrite flotation test work. Foroutan, Abbas Zadeh Haji Abadi [47], Nasirimoghaddam, Mohebbi [48] and Azizi, Masdarian [49] have also carried out studies demonstrating the impact of pH on flotation recovery and grade. Whilst changing ore characteristics and mineral liberation during grinding affect flotation recovery and concentrate grade, these variables are not currently monitored online during mineral processing. Changes in ore characteristics are, however, largely reflected in the pulp chemistry because of the mineral electrochemical reactions (e.g., galvanic interactions) that occur during processing (e.g., grinding) [40].
In our previous work [26], a regularised neighbourhood component analysis (RNCA) algorithm was used to establish relevant rougher flotation variables that could predict rougher copper recovery using a GPR algorithm. Building on that article, the main goal of this work is to ascertain the predictive influence of pulp chemistry variables, in terms of pH, pulp potential (Eh), dissolved oxygen and temperature, on rougher copper recovery. The main research questions addressed in this work are:
1. How does adding pulp chemistry variables to the existing model-selected variables affect the prediction performance for rougher copper recovery?
2. Does the addition of pulp chemistry variables during input variable selection lead to the elimination of some of the originally model-selected flotation process variables in predicting rougher copper recovery?
3. What is the predictive accuracy of a GPR algorithm in predicting rougher copper recovery with and without pulp chemistry variables?

Methodology
This section describes the methodologies used in this work, covering data collection and pre-processing, model development and model performance assessment. Variable selection by the RNCA algorithm and the theoretical overview of the GPR algorithm are not covered here, as a detailed explanation has already been given in Amankwaa-Kyeremeh, Zhang [26]. MATLAB R2020b (64-bit) software was used to run all the algorithms.

Data Collection and Pre-Processing
Data for this work was collected from the rougher bank of BHP Olympic Dam, South Australia. Details of the mineralisation of the ore treated at the mine and of the rougher bank under consideration are given in Ehrig, McPhie [72] and Amankwaa-Kyeremeh, Zhang [26], respectively. While plant process variables are extensively monitored with sensors and automatic samplers, this is not the case for pulp chemistry variables. As such, a PCM® was installed on the rougher bank to collect pulp chemistry data on the rougher flotation feed for three continuous weeks. The operating principle of the PCM® is as follows:
a. A sample from a chosen slurry stream is collected into the PCM® sample vessel;
b. The pulp chemistry sensors (e.g., pH, Eh and dissolved oxygen) are in contact with the sample for 2 min in the PCM® sample vessel. A time of 2 min was selected as it allows stable sensor readings for each batch slurry sample;
c. The measured data is logged and time-stamped;
d. The PCM® sample vessel is then flushed clean for new sample collection. This process is repeated every 3-5 min.
While readings are taken, an impeller mixes the sample to keep the solids in suspension. Figure 1 shows an image of an installed PCM® at BHP Olympic Dam.
To begin the data collection, three weeks of data on the 9 established rougher flotation variables, together with their corresponding recovery values, were downloaded from the data historian of BHP Olympic Dam. A pseudo-steady state was ensured for the flotation plant, in collaboration with the plant operators, prior to obtaining the measurements. Due to a confidentiality agreement with BHP Olympic Dam, direct measurements of the investigated variables cannot be disclosed; therefore, only standardized data are shared, together with their distributions. Each variable consisted of 1727 time-stamped observations and had a confidence of 100%. These were then matched with the corresponding time-stamped rougher flotation feed pulp chemistry data collected by the PCM®. The extracted plant process variables and the pulp chemistry variables considered in this work are shown in Table 2. Indexes have been assigned to all variables for easy identification during variable selection by the RNCA algorithm. The rougher copper recovery (output variable) was determined from different process streams (rougher feed, cleaner concentrate and scavenger tails) using Online Stream Analyser (OSA) results; for more information on this method, refer to our previous publication [26]. Note that feed grade was not added as an input variable in Table 2. This is consistent with our previous work, where RNCA eliminated feed grade as an input variable. The significance of feed grade for flotation behaviour and copper recovery is critical and well known; its implications have nevertheless been captured in model development through the rougher copper recovery, since the rougher feed stream enters the recovery calculation.
Following this, the data was cleaned to remove outliers arising from the transient operation of some data collection instruments, which sometimes occurs on the plant. The outliers were removed based on domain knowledge of the acceptable operating setpoint range of each variable, as established by the metallurgical team at BHP Olympic Dam. To keep the same dataset size for all variables, whenever an outlier was detected in one variable, the values of all other variables at that timestamp were deleted as well. The entire data cleaning resulted in a dataset of 1660 useful observations. Data standardization was also carried out as part of the pre-processing stage using Equation (1).
$$ s_i' = \frac{s_i - \bar{s}}{\sigma_s} \qquad (1) $$
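The pre-processing steps described above (matching PCM® readings to historian records, list-wise removal of out-of-range observations, and standardization with Equation (1)) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' MATLAB code. The nearest-timestamp matching rule, the 5-min tolerance, the variable names, the operating limits and the use of the sample standard deviation (n − 1 denominator) are all assumptions made for illustration.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from statistics import mean, stdev

def match_nearest(plant_rows, pcm_rows, tolerance=timedelta(minutes=5)):
    """Attach the nearest-in-time PCM reading to each historian row.
    Both inputs are lists of (timestamp, dict_of_values) sorted by time.
    Rows with no PCM reading within `tolerance` (an assumed value) are dropped."""
    pcm_times = [t for t, _ in pcm_rows]
    matched = []
    for t, plant_vals in plant_rows:
        i = bisect_left(pcm_times, t)
        # Candidate neighbours: the PCM readings just before and just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pcm_times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(pcm_times[k] - t))
        if abs(pcm_times[j] - t) <= tolerance:
            matched.append((t, {**plant_vals, **pcm_rows[j][1]}))
    return matched

def drop_out_of_range(rows, limits):
    """List-wise deletion: discard the whole time-stamped row if ANY variable
    lies outside its acceptable operating range (lo, hi)."""
    return [(t, vals) for t, vals in rows
            if all(lo <= vals[name] <= hi for name, (lo, hi) in limits.items())]

def standardize(values):
    """Equation (1): subtract the sample mean and divide by the sample
    standard deviation (n - 1 denominator, an assumption)."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

# Toy example with hypothetical variable names and limits.
t0 = datetime(2021, 1, 1)
plant = [(t0 + timedelta(minutes=m), {"throughput": v})
         for m, v in [(0, 100.0), (10, 250.0), (20, 105.0), (30, 120.0)]]
pcm = [(t0 + timedelta(minutes=m), {"pH": p})
       for m, p in [(1, 10.5), (11, 10.4), (31, 10.9), (40, 10.6)]]
rows = drop_out_of_range(match_nearest(plant, pcm),
                         {"throughput": (50.0, 200.0), "pH": (9.0, 12.0)})
throughput_z = standardize([vals["throughput"] for _, vals in rows])
```

In this toy run, the 10-min row is removed because its throughput is out of range, and the 20-min row is removed because no PCM reading lies within the tolerance, so two rows survive.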

Model Development
A GPR algorithm with an exponential covariance function was used to establish the relationship between the input and output variables outlined in Table 2. For this work, three modelling scenarios were considered, using the same output variable in each case but different input variables. In the first scenario, only the established rougher flotation variables were used as input variables. The second scenario combined the established rougher flotation variables and the pulp chemistry variables as input variables. In the last scenario, only the variables sub-selected by the RNCA algorithm were used as input variables. Table 3 summarises the modelling scenarios considered in this work.
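To make the exponential covariance function concrete, the sketch below computes a GPR posterior mean with k(x, x') = σf² exp(−r/σl), where r is the Euclidean distance between input vectors (the form MATLAB documents for the 'exponential' kernel of fitrgp). It is a teaching sketch under stated assumptions, not the model trained in this work: the hyperparameters σf, σl and the noise level σn are fixed by hand instead of being estimated by maximising the marginal likelihood, and a constant prior mean equal to the training-target mean is assumed.

```python
import math

def exp_kernel(a, b, sigma_f=1.0, sigma_l=1.0):
    """Exponential covariance: sigma_f^2 * exp(-r / sigma_l), r = Euclidean distance."""
    return sigma_f ** 2 * math.exp(-math.dist(a, b) / sigma_l)

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def cho_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gpr_predict(X_train, y_train, X_new, sigma_f=1.0, sigma_l=1.0, sigma_n=0.1):
    """Posterior mean of a GP with exponential kernel and Gaussian noise sigma_n."""
    y_mean = sum(y_train) / len(y_train)  # constant prior mean (assumption)
    n = len(X_train)
    K = [[exp_kernel(X_train[i], X_train[j], sigma_f, sigma_l)
          + (sigma_n ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = cho_solve(cholesky(K), [y - y_mean for y in y_train])
    return [y_mean + sum(exp_kernel(x, X_train[i], sigma_f, sigma_l) * alpha[i]
                         for i in range(n)) for x in X_new]

# Toy usage with made-up one-dimensional data.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 0.5, -0.5]
fitted = gpr_predict(X, y, X, sigma_n=1e-3)  # near-interpolation of training data
far = gpr_predict(X, y, [[100.0]])[0]        # reverts to the prior mean far away
```

With a small noise term the model almost reproduces its training targets, which mirrors the near-perfect training-set scores discussed in the results, while far from the data the prediction falls back to the prior mean.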

To assess how well the models generalise to unseen data and to detect overfitting, the pre-processed dataset was randomly divided into an 80% training dataset (1328 observations) and a 20% testing dataset (332 observations) using the common hold-out validation approach. There is no general rule for the partition ratio; however, common practice is for the training dataset to be significantly larger than the testing dataset so that it captures the full characteristics of the entire dataset. The models were trained on the training dataset and then used to make predictions on both the training and testing datasets.

Model Performance Assessment Criteria
This work made use of the linear correlation coefficient (r), root mean square error (RMSE), mean absolute percentage error (MAPE) and scatter index (SI) as model performance assessment criteria. Mathematically, these criteria are expressed in Equations (2)-(5) [73,74]. A well-performing model is expected to obtain r values close to 1, with RMSE, MAPE and SI values as close to zero as possible.

$$ r = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2 \, \sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2}} \qquad (2) $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \qquad (3) $$

$$ \mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \qquad (4) $$

$$ \mathrm{SI} = \frac{\mathrm{RMSE}}{\bar{y}} \qquad (5) $$

where $y_i$ = ith true rougher copper recovery value, $\bar{y}$ = mean of true rougher copper recovery, $\hat{y}_i$ = ith predicted rougher copper recovery value, $\bar{\hat{y}}$ = mean of predicted rougher copper recovery and $n$ = total number of observations.
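The four criteria can be computed directly from paired true and predicted values. This sketch assumes MAPE is expressed in percent and that SI is the RMSE divided by the mean true recovery; both are common definitions and are consistent with the ratios between the RMSE and SI values reported in Table 5, but the exact formulas used in this work are an assumption here.

```python
import math

def performance(y_true, y_pred):
    """Return r, RMSE, MAPE (%) and SI for paired true/predicted recoveries."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    p_bar = sum(y_pred) / n
    cov = sum((y - y_bar) * (p - p_bar) for y, p in zip(y_true, y_pred))
    r = cov / math.sqrt(sum((y - y_bar) ** 2 for y in y_true)
                        * sum((p - p_bar) ** 2 for p in y_pred))
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)
    mape = 100.0 / n * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))
    si = rmse / y_bar  # RMSE relative to the mean true value (assumed definition)
    return {"r": r, "RMSE": rmse, "MAPE": mape, "SI": si}

# Made-up recoveries (%) for illustration only.
m = performance([90.0, 92.0, 94.0, 96.0], [90.5, 91.5, 94.5, 95.5])
```

Note that r rewards a consistent linear trend even if predictions are biased, which is why it is paired with the error-based RMSE, MAPE and SI.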

Results and Discussion
The results of this work are presented in this section. Section 3.1 presents the results of variable selection by the RNCA algorithm, and Section 3.2 presents the detailed model performance results.

Variable Selection by RNCA Algorithm
An RNCA algorithm was used to select relevant variables for the prediction of rougher copper recovery from the list of variables in Table 2. Variables were selected if they obtained a feature weight greater than zero. To enhance the performance of the algorithm, the regularization parameter lambda and the number of cross-validation folds were tuned simultaneously, as shown in Table 4. For each number of folds, the best lambda value was taken as the one giving the minimum cross-validation loss; the regularization term helps prevent RNCA from overfitting. Additional information on the RNCA algorithm can be found in our previous publication [26]. Table 4 shows that, whereas at least one of the established rougher flotation variables (x1, x2, x3, x4, x5, x6, x7, x8, x9) was dropped in each instance, the four pulp chemistry variables (x10, x11, x12, x13) were selected as relevant variables for the prediction of rougher copper recovery in all instances. With the goal of using the fewest variables to predict rougher copper recovery, Table 4 further indicates that a best lambda value of 0.0304 with 6-fold cross validation selected the fewest variables. The variables selected with this combination were feed particle size, throughput, xanthate to tank cell 1, frother to tank cell 1, frother to tank cell 4, froth depth of tank cell 4/5, pH, Eh, dissolved oxygen and temperature. These selected variables were used as input variables for model development in scenario 3.
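The tuning and selection logic can be sketched as a k-fold loop over candidate lambda values followed by a "weight > 0" filter. The callback `fit_and_loss` is a hypothetical placeholder: in practice it would fit RNCA (for example, via a routine such as MATLAB's fsrnca) with regularization lam on the training folds and return the loss on the held-out fold. The toy loss, lambda grid and feature weights below are made up for illustration and are not the values from Table 4.

```python
import random

def kfold_indices(n_obs, k, seed=0):
    """Shuffle indices and deal them into k roughly equal folds."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def tune_lambda(n_obs, lambdas, k, fit_and_loss, seed=0):
    """Return (best_lambda, mean_loss) minimising the mean validation loss.
    fit_and_loss(train_idx, val_idx, lam) is a HYPOTHETICAL callback."""
    folds = kfold_indices(n_obs, k, seed)
    best_lam, best_loss = None, float("inf")
    for lam in lambdas:
        losses = []
        for f, val_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            losses.append(fit_and_loss(train_idx, val_idx, lam))
        mean_loss = sum(losses) / k
        if mean_loss < best_loss:
            best_lam, best_loss = lam, mean_loss
    return best_lam, best_loss

def select_features(weights, tol=0.0):
    """Keep variables whose feature weight exceeds tol (weight > 0 by default)."""
    return [name for name, w in weights.items() if w > tol]

def toy_loss(train_idx, val_idx, lam):
    """Made-up stand-in whose minimum lies at lam = 0.03."""
    return (lam - 0.03) ** 2

best_lam, best_loss = tune_lambda(100, [0.0, 0.01, 0.02, 0.03, 0.04], 6, toy_loss)
kept = select_features({"x1": 0.8, "x2": 0.0, "x10": 0.3})
```

Choosing the fold count and lambda jointly, as in Table 4, simply wraps this loop in an outer loop over k.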
The tuning of the best lambda value using 6-fold cross validation is visualised in Figure 5. The feed particle size (% passing 75 µm) showed the highest RNCA feature weight, which is consistent with our previous study [26]. Among the pulp chemistry variables, the RNCA feature weights indicate relevance for the prediction of rougher copper recovery in the order dissolved oxygen > pH > Eh > temperature.

Model Performance Assessment
The robustness of the developed predictive models was assessed by computing the difference between true and predicted rougher copper recovery values using r, RMSE, MAPE and SI criteria, as shown in Table 5.

From Table 5, the results show nearly perfect model performance for all scenarios when predictions were made with the training dataset, with r values > 0.999, RMSE values < 0.0006, MAPE values < 0.0004 and SI values < 0.0006. This is expected, because the training dataset was already seen by the algorithm during the training phase, so high performance is anticipated when the same dataset is fed to the model for prediction. However, the models' performances diverged when predictions were made with the testing dataset, which was entirely new to the trained models. For this reason, this discussion focuses only on the testing dataset performance of the models.
Considering the r criterion, as shown in Table 5, values of 0.9528, 0.9589 and 0.9806 were recorded by scenarios 1, 2 and 3, respectively, when their trained models made predictions on the testing dataset. This implies that scenario 3 had the strongest linear relationship between true and predicted rougher copper recovery values, followed by scenario 2, with scenario 1 having the weakest. For the RMSE criterion, which measures the typical magnitude of the differences between true and predicted values, scenario 3 recorded the lowest RMSE value of 0.3122, against 0.4897 and 0.4496 for scenarios 1 and 2, respectively, on the testing dataset. This shows that scenario 3 produced the smallest deviations between true and predicted rougher copper recovery values; in other words, its predictions were the most tightly clustered around the line of perfect agreement (y = x) in comparison with scenarios 1 and 2. This effect is visualised in Figure 6 using parity plots.
To further assess the performance of the models, their MAPE values were computed. From Table 5, MAPE values of 0.2761, 0.2332 and 0.1948 were obtained by scenarios 1, 2 and 3, respectively, when their trained models were used to make predictions on the testing dataset. These results show that, in percentage terms, scenario 3 had the smallest difference between true and predicted rougher copper recovery values, followed by scenario 2, with scenario 1 having the largest. Finally, the SI criterion was used to evaluate model performance. The results in Table 5 indicate SI values of 0.0052 for scenario 1, 0.0048 for scenario 2 and 0.0033 for scenario 3. Since SI expresses the RMSE relative to the mean of the true values, scenario 3 produced the lowest relative error, making it a better model than scenarios 1 and 2.
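To put the testing-set differences on a common footing, the relative reduction of each error criterion against the scenario 1 baseline can be computed from the values quoted above. This is only arithmetic on the reported Table 5 figures; because the SI values are given to four decimal places, the SI percentages are approximate.

```python
# Testing-set values quoted from Table 5 in the text above.
test_metrics = {
    "RMSE": {1: 0.4897, 2: 0.4496, 3: 0.3122},
    "MAPE": {1: 0.2761, 2: 0.2332, 3: 0.1948},
    "SI": {1: 0.0052, 2: 0.0048, 3: 0.0033},
}

def pct_reduction(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline

for name, vals in test_metrics.items():
    print(f"{name}: scenario 2 is {pct_reduction(vals[1], vals[2]):.1f}% lower, "
          f"scenario 3 is {pct_reduction(vals[1], vals[3]):.1f}% lower than scenario 1")
```

Relative to scenario 1, scenario 3 lowers RMSE by 36.2%, MAPE by 29.4% and SI by about 36.5%, whereas scenario 2 lowers them by 8.2%, 15.5% and about 7.7%, respectively.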
In general, although satisfactory performance was obtained using only the established rougher flotation variables as input variables (scenario 1), the predictive accuracy of the GPR algorithm improved with the addition of pulp chemistry variables to the established rougher flotation variables (scenario 2). However, the best predictive performance was obtained when only the variables selected by the RNCA algorithm were used as input variables (scenario 3). While scenario 1 serves as the baseline, the improved performance in scenario 2 can be attributed to the inclusion of the pulp chemistry variables among the inputs, which likely increased the proportion of variance in the rougher copper recovery data that the model could explain.
The superior performance in scenario 3 could be linked to the benefit of variable selection, after three of the established rougher flotation variables (xanthate to tank cell 4, froth depth of tank cell 1 and froth depth of tank cell 2/3) were rendered irrelevant upon the introduction of the pulp chemistry variables. Variable selection by the RNCA algorithm helped to remove these irrelevant variables and their associated ambiguous data, which make a model unnecessarily complex without improving its predictions.
It should be noted that other latent variables may affect flotation copper recovery; however, the results obtained with the addition of the pulp chemistry variables indicate the significance of integrating critical process variables. Furthermore, more input variables should be added with caution: without sufficient theoretical justification, an overfitting model (one that performs well on the training dataset but not on an evaluation or testing dataset) may be produced. Further study, including examination of input variable collinearity and sensitivity analysis with Sobol indices, is recommended to arrive at a simpler model.
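As a first step toward the recommended collinearity examination, pairwise Pearson correlations between input variables can be screened against a threshold. This is a minimal sketch: the 0.8 threshold and the variable names and values are assumptions for illustration, and a fuller study would also consider multivariate measures and Sobol sensitivity indices.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every input pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs

# Hypothetical standardized inputs; x1 and x2 move almost in lockstep.
columns = {"x1": [1.0, 2.0, 3.0, 4.0],
           "x2": [2.0, 4.0, 6.0, 8.5],
           "x3": [1.0, -1.0, 1.0, -1.0]}
flagged = collinear_pairs(columns)
```

Flagged pairs are candidates for removal or combination, which would reduce model complexity in the same spirit as the RNCA-based selection.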

Conclusions
This work investigated the impact of pulp chemistry variables in predicting rougher copper recovery. Following variable selection by a regularised neighbourhood component

Figure 1. An installed PCM® at BHP Olympic Dam. The PCM® is installed on the rougher flotation circuit of the concentrator.
In Equation (1), $s_i'$ is the ith standardized observation, $s_i$ is the ith observation of the sample, $\bar{s}$ is the mean of the sample and $\sigma_s$ is the standard deviation of the sample. The data was standardized so that all variables were on the same scale for the analysis, as variables on very different scales affect the model prediction outcome. Results are presented as normalised data throughout this work for the purpose of data confidentiality. Figures 2-4 visualise the variation in the data used for this work.

Figure 3. Visualisation of variation in (a) frother to tank cell 1, (b) frother to tank cell 4, (c) froth depth of tank cell 1, (d) froth depth of tank cell 2/3, (e) froth depth of tank cell 4/5, (f) pH. *Froth depth of tank cells 2 and 4 also represent tank cells 3 and 5, respectively, as they are kept at the same level.

Figure 5. Optimum conditions for the RNCA algorithm: (a) estimation of the lambda value with the minimum loss using 6-fold cross validation; (b) feature weights of all variables using a best lambda value of 0.0304.

Figure 6. Parity plots of true and predicted rougher copper recovery values using the testing dataset for (a) scenario 1, (b) scenario 2 and (c) scenario 3.

Table 1. Comparison between this work and some froth flotation machine learning models.

Table 2. Summary of considered variables.
* Froth depth of tank cells 2 and 4 also represent tank cells 3 and 5, respectively, as they are kept at the same level.

Table 3. Summary of modelling scenarios.

Table 4. Results for various numbers of cross-validation folds (k) and best lambda values in selecting relevant variables.

Table 5. Results of model performance assessment.