Article

Advancing Flotation Process Modeling: Bayesian vs. Sklearn Approaches for Gold Grade Prediction

by
Sheila Devasahayam
WASM: Minerals, Energy and Chemical Engineering, Curtin University, Kalgoorlie, WA 6430, Australia
Minerals 2025, 15(6), 591; https://doi.org/10.3390/min15060591
Submission received: 15 April 2025 / Revised: 17 May 2025 / Accepted: 29 May 2025 / Published: 31 May 2025
(This article belongs to the Section Mineral Processing and Extractive Metallurgy)

Abstract

This study explores Bayesian Ridge Regression and PyMC-based probabilistic modelling to predict the cumulative grade of gold based on key operational variables in gold flotation. By integrating prior knowledge and quantifying uncertainty, the Bayesian approach enhances both interpretability and predictive accuracy. The dataset includes variables such as crusher type, particle size, power, time, head grade, and collector type. Comparative analysis reveals that PyMC outperforms traditional Sklearn models, achieving an R2 of 0.92 and an MSE of 102.37. These findings highlight the potential of Bayesian models for robust, data-driven process optimization in mineral processing. The higher cumulative gold grade observed for VSI products and PAX collector usage may be attributed to the superior liberation efficiency of VSI, which produces more angular and cleanly fractured particles, enhancing collector attachment. PAX, being a strong xanthate, shows high affinity for sulphide mineral surfaces, particularly under the flotation conditions used, thereby improving selectivity and recovery.

1. Introduction

Gold recovery in flotation circuits is governed by complex interactions between mineralogical, chemical, and hydrodynamic variables. Traditional flotation modelling spans several domains: kinetic models, which simulate mineral recovery rates through differential equations based on reaction order and rate constants [1]; multiphase models, which analyze interactions among gas, liquid, and solid phases to simulate particle–bubble dynamics; and mechanistic models, which incorporate fluid dynamics and mineral surface chemistry for predictive simulation of flotation systems [2]. These approaches offer theoretical insights but often require extensive calibration, detailed mineralogical data, and high computational effort, limiting their adaptability to dynamic ore conditions and real-time decision support.
Recent years have seen increased adoption of data-driven approaches in flotation modelling. Machine learning and regression methods—such as Random Forests, Support Vector Machines, and Neural Networks—have been applied to predict flotation outcomes [3,4,5]. While these models can capture complex non-linear patterns, they generally lack interpretability and do not provide uncertainty quantification or credible intervals for predictions, which are critical for risk-aware operational decisions.
Bayesian Ridge Regression (BRR), with its probabilistic foundation, offers a promising alternative, combining the flexibility of machine learning with the interpretability and uncertainty estimates of statistical modelling. Bayesian methods are well suited for small datasets. BRR enables the integration of prior domain knowledge, improves generalization in data-sparse environments, and quantifies uncertainty in both predictions and model parameters [6,7,8]. These features are particularly valuable in flotation, where variability in ore characteristics and processing conditions is common.
This study introduces a novel comparative framework applying both scikit-learn’s BRR and PyMC-based Bayesian models with domain-informed priors to experimental gold flotation data from the Ballarat gold mine. By quantifying uncertainty and incorporating expert knowledge, the Bayesian approach improves cumulative gold grade prediction while offering the interpretability crucial for real-time flotation circuit optimization. This research addresses key gaps in the current literature, particularly the lack of robust probabilistic models tailored for metallurgical processes [6,9,10].

Research Gap

While numerous studies have applied traditional regression and machine learning techniques to model gold flotation performance, these approaches typically lack the ability to quantify prediction uncertainty, which is critical in process decision-making [3]. Bayesian models have been explored in some cases [6], but often without fully leveraging domain expertise through the use of informed priors or validating generalization across varying ore types and processing conditions. Moreover, non-linear interactions among process variables and time-dependent fluctuations in flotation performance remain underexplored [4,11]. This study addresses these gaps by applying Bayesian Ridge Regression with domain-informed priors and polynomial feature expansion to improve both predictive accuracy and uncertainty quantification in cumulative gold grade estimation.
The objective of this study is to evaluate the use of Bayesian models—specifically, implementations using Scikit-learn and PyMC—for predicting gold flotation performance across various crusher types and particle size classes [12]. The Bayesian models are applied to the experimental data reported by Thatipamula in 2024 [13], which examined the effects of particle size, power, grinding time, head grade, collector type, and crusher type (liberation technique) on gold recovery by flotation, but without the application of machine learning predictive models. The present study addresses these gaps by applying Bayesian Ridge Regression with domain-informed priors to enhance predictive reliability and uncertainty quantification in gold grade estimation.
The analysis aims to identify the conditions that maximize cumulative gold grade (y2), quantify the influence of critical process variables, and reduce overfitting through regularization. Bayesian Ridge Regression parameters are optimized, and the key likelihood components (β, μ, and σ) are estimated using PyMC’s probabilistic programming framework. Model reliability is assessed through prior and posterior predictive checks to ensure robust inference and generalizability.

2. Materials and Methods

2.1. Data Description

This study utilizes experimental data obtained from gold ore samples (Table 1 and Table 2) sourced from the Ballarat gold mine [13]. For flotation testing, the ground ore was mixed with water to form a slurry at 34% solids and fed into laboratory flotation cells under aerated conditions. The flotation sequence included the following:
Activator: 50 g/t of copper sulphate (CuSO4) was added and aerated for 30 s.
Collector: 100 g/t of collector reagent was introduced.
Frother: 3 drops of DSF002A (approximately 17 g/t), supplied by IXOM, Melbourne, Australia, were added.
After aeration, hydrophobic mineral particles adhered to rising air bubbles, forming a froth that was collected every 10 s to generate the concentrate (“con”) fraction.
To investigate the effect of collector chemistry on flotation performance, two collectors were tested:
PAX (Potassium Amyl Xanthate, C5H11OCSSK; Lianyungang Huaihua International Trade Co., Ltd., Lianyungang, China): a strong xanthate widely used for sulphide mineral flotation.
DSP002 (Sodium Dibutyl Dithiophosphate; Shark Chemical Global, Johannesburg, South Africa): an organophosphorus compound known for its selectivity toward sulphide minerals.
Both collectors were used at a dosage of 100 g/t. PAX was applied across all crushing products from Vertical Shaft Impactor (VSI) and High-Pressure Grinding Roll (HPGR) crushers. For the −300 µm size fraction, both PAX and DSP002 were compared on VSI and High-Speed HPGR products to assess performance variations. This size was selected based on observed recoveries and liberation trends at the Ballarat mine. Coarser flotation has been explored in recent research [2], particularly when advanced collectors and high-energy crushers (VSI) are used.
Gold recovery and cumulative gold grade were measured for flotation products from three top size classes: −600 µm, −425 µm, and −300 µm. The target variable in the analysis is cumulative gold grade (y2), predicted using features such as crusher type, size fraction, power consumption, processing time, head grade, and collector type.
Table 3 presents descriptive statistics for six variables: Size, Power, Time, Head_grade, Cum_recovery, and Cum_grade, covering central tendency (mean, median), dispersion (standard deviation, range), and distribution (minimum, maximum, quartiles). These statistics characterize the variability of the key process variables and inform subsequent analysis and decision-making. Table 4 summarizes the frequencies of the categories within the Crusher and Collector variables. Categorical variables were encoded as dummy variables (Table 5), and pandas was used for feature and target variable assignment in datasets containing both numerical and categorical data related to crushers and collectors [14].
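As a minimal illustration of this encoding step (a sketch, not the study’s exact script), the dummy-variable layout of Table 5 can be reproduced with pandas; the values below are a subset of Table 2.

```python
import pandas as pd

# Illustrative subset of Table 2; Cum_grade is the target (y2).
df = pd.DataFrame({
    "Size": [600, 600, 300, 300],
    "Power": [7.97, 5.42, 5.78, 7.29],
    "Time": [30, 28, 50, 36],
    "Head_grade": [2.32, 2.70, 2.46, 2.02],
    "Crusher": ["HPGRHS", "VSI", "VSI", "HPGRHS"],
    "Collector": ["PAX", "PAX", "DSP", "DSP"],
    "Cum_grade": [74.32, 132.92, 95.80, 36.17],
})

# One-hot encode the categorical variables (cf. Table 5) and split
# the features from the target variable.
X = pd.get_dummies(df.drop(columns="Cum_grade"),
                   columns=["Crusher", "Collector"])
y = df["Cum_grade"]
```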

2.2. Size Distribution of Crushed Product

To provide context on the physical characteristics of the flotation feed, the cumulative particle size distributions for each crusher type and size class are listed in Table 6.

2.3. Data Preprocessing

Data were standardized using z-score normalization (Equation (1)) to ensure variables shared a common scale (mean = 0, standard deviation = 1).
Z = (X − μ)/σ
where X is the original data point, μ is the mean of the feature, and σ is the standard deviation of the feature. This transformation enabled effective model training.
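A minimal sketch of this standardization step, assuming scikit-learn’s StandardScaler (which implements Equation (1) per feature); the values are illustrative rows from Table 2:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Numerical features: Size, Power, Time, Head_grade (values from Table 2).
X = np.array([
    [600, 7.97, 30.0, 2.32],
    [600, 5.42, 28.0, 2.70],
    [300, 5.78, 50.0, 2.85],
])

scaler = StandardScaler()      # estimates mu and sigma for each column
Z = scaler.fit_transform(X)    # Z = (X - mu) / sigma, as in Equation (1)
```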

2.4. Bayesian Ridge Regression Models: Scikit-Learn vs. PyMC

In this study, the scikit-learn implementation is referred to as “Bayesian Ridge Regression”, while the PyMC implementation is distinguished as “Bayesian linear regression with custom priors” or simply the “PyMC model”. Bayesian Ridge Regression was used to predict gold flotation performance (y2) based on features from VSI and HPGR products.

2.4.1. Scikit-Learn Bayesian Ridge Regression

The Bayesian Ridge class from scikit-learn [8,15] applies Bayesian inference with conjugate priors in closed form. Regularization hyperparameters alpha_1, alpha_2, lambda_1, and lambda_2, which control the prior distributions on the noise and coefficients, were optimized using GridSearchCV (Appendix A). This approach allows fast estimation and mitigates multicollinearity and overfitting. However, it exhibited limited flexibility in defining custom priors and handling complex uncertainty structures.
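A sketch of this tuning step is given below; the grid values are illustrative rather than the exact grid used in the study, and the synthetic data stand in for the standardized feature matrix (11 observations, 9 encoded features).

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the standardized design matrix and response.
rng = np.random.default_rng(0)
X = rng.normal(size=(11, 9))
y = rng.normal(loc=79.2, scale=37.0, size=11)

# Search over the four Gamma-prior hyperparameters (see Appendix A).
param_grid = {
    "alpha_1": [1e-6, 1e-4, 1e-2],
    "alpha_2": [1e-6, 1e-4, 1e-2],
    "lambda_1": [1e-6, 1e-4, 1e-2],
    "lambda_2": [1e-6, 1e-4, 1e-2],
}
search = GridSearchCV(BayesianRidge(), param_grid,
                      scoring="neg_mean_squared_error", cv=3)
search.fit(X, y)
print(search.best_params_)
```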

2.4.2. PyMC Probabilistic Programming

To enable greater control and deeper uncertainty analysis, a fully probabilistic Bayesian Ridge Regression model was also implemented using PyMC, Equation (2) [7,16]. The model structure was as follows:
Y = Xβ + ε
where
  • Y is the vector of observed responses (e.g., y2 in flotation).
  • X is the design matrix of input features (e.g., crusher type, particle size, power, collector type).
  • β ~ N(0, λ⁻¹I) is the vector of regression coefficients, representing the influence of each feature on the response; β has a zero-mean multivariate normal prior with isotropic covariance, with the identity matrix I ensuring that the priors are independent across coefficients.
  • ε ~ N(0, α⁻¹) is the random error term, following a zero-mean Gaussian distribution that accounts for noise and model uncertainty.
  • Optimization: the key hyperparameters (λ, α) were optimized through cross-validation, balancing model complexity against predictive accuracy.
  • Pros: custom modelling, richer uncertainty quantification, and posterior inference via MCMC.
Priors were informed by domain knowledge derived from historical flotation performance, with PyMC enabling full posterior inference via Markov Chain Monte Carlo (MCMC) sampling using the No-U-Turn Sampler (NUTS). Definitions are provided in Appendix B. The PyMC workflow produces prior predictive checks, posterior distributions, and parameter uncertainty estimates, generated with the pm.sample() and pm.plot_posterior() functions as part of the Bayesian inference workflow.
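A minimal PyMC sketch of the model in Equation (2) follows; the prior scales are illustrative placeholders for the domain-informed values, and the synthetic data stand in for the standardized flotation features.

```python
import numpy as np
import pymc as pm

# Synthetic stand-in for the standardized design matrix and response.
rng = np.random.default_rng(0)
X = rng.normal(size=(11, 9))
y = rng.normal(loc=79.2, scale=37.0, size=11)

with pm.Model() as model:
    # Zero-mean Gaussian priors on the coefficients (beta ~ N(0, lambda^-1 I)).
    beta = pm.Normal("beta", mu=0.0, sigma=10.0, shape=X.shape[1])
    intercept = pm.Normal("intercept", mu=0.0, sigma=10.0)
    # Half-normal prior on the noise scale (epsilon ~ N(0, alpha^-1)).
    sigma = pm.HalfNormal("sigma", sigma=10.0)
    mu = intercept + pm.math.dot(X, beta)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)   # Gaussian likelihood
    # Full posterior inference via MCMC; NUTS is the default sampler.
    idata = pm.sample(2000, tune=1000, target_accept=0.9, random_seed=0)
```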

2.4.3. Comparison and Justification

While scikit-learn offers fast prototyping, PyMC enables custom modelling with domain-informed priors and richer uncertainty quantification through posterior checks and credible intervals.

2.5. Feature and Target Variables

To analyze the experimental dataset (Table 2), the feature and target variables are specified along with interaction terms. Feature variables include Size, Power, Time, and Head_grade (together with the encoded Crusher and Collector categories), representing the primary conditions influencing flotation performance. The target variable is Cum_grade, which quantifies the flotation outcome.

2.6. Feature Transformation to Enhance Model Performance

While ensemble methods like Random Forests and Gradient Boosting capture non-linear relationships inherently, polynomial features explicitly model interactions and non-linearity, benefiting linear regression and neural networks. Polynomial feature expansion captures non-linear interactions between variables by transforming the original features into higher-order and interaction terms. This method enriches the feature space and enables models such as linear regression to approximate complex functional relationships that are otherwise inaccessible. In the context of flotation, variables often interact in non-linear ways—such as the combined effect of particle size and reagent concentration on surface chemistry or kinetics—making polynomial terms particularly valuable for modelling material behaviour and process kinetics [11,17]. These expanded features facilitate the incorporation of domain knowledge, improve model expressiveness, and support hypothesis-driven exploration of variable relationships.
In this study, the following original features were used: Size, Power, Time, and Head_grade. Categorical variables such as Crusher type (with levels: HPGRHS, HPLS, VSI) and Collector type (DSP, PAX) were encoded using one-hot encoding. Second-degree polynomial expansion (including interaction terms) resulted in features such as the following:
Size, Power, Time, Head_grade, Crusher_HPGRHS, Crusher_HPLS, Crusher_VSI, Collector_DSP, Collector_PAX, Size2, Size × Power, Size × Time, …, Collector_PAX2.
These transformations capture combined effects, offering a more comprehensive view of reaction dynamics.
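A short sketch of this expansion, assuming scikit-learn’s PolynomialFeatures applied to the encoded feature frame (the columns shown are an illustrative subset of Table 5):

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Encoded features (illustrative subset; cf. Table 5).
X = pd.DataFrame({
    "Size": [600, 425, 300],
    "Power": [7.97, 5.52, 5.78],
    "Head_grade": [2.32, 2.76, 2.85],
    "Collector_PAX": [1, 1, 1],
})

# Second-degree expansion with interaction terms, producing features
# such as 'Size Power', 'Size^2', and 'Collector_PAX^2'.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(poly.get_feature_names_out(X.columns))
```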

2.7. Cross-Validation

To assess generalizability, the study applied 3-fold and 5-fold cross-validation along with GridSearchCV for hyperparameter tuning. Model accuracy was evaluated using Mean Squared Error (MSE) and R2 metrics [3,15,17,18,19].
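A sketch of this evaluation protocol, using scikit-learn’s cross_val_score on synthetic stand-in data:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 11 observations, 9 encoded features (as in this study).
rng = np.random.default_rng(1)
X = rng.normal(size=(11, 9))
y = rng.normal(loc=79.2, scale=37.0, size=11)

for k in (3, 5):
    # Negated because scikit-learn maximizes scores.
    mse = -cross_val_score(BayesianRidge(), X, y,
                           scoring="neg_mean_squared_error", cv=k)
    print(f"{k}-fold CV MSE: {mse.mean():.2f} (+/- {mse.std():.2f})")
```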

3. Results and Discussion

3.1. Cum_Grade Sampling Results

Bayesian regression models were analyzed using PyMC with both original and polynomially expanded features, with MCMC sampling via NUTS generating posterior distributions for prediction and uncertainty analysis. The original feature model achieved stable convergence (no divergences, step sizes: 0.24–0.25, 31 gradient evaluations/draw, 193.34–365.16 draws/s), effectively capturing uncertainty across 11 observations (Figure 1). Conversely, the polynomial model encountered 797 divergences, reduced step sizes (0.01–0.02), slower sampling (15.78–35.68 draws/s), and up to 255 gradient evaluations/draw, indicating poor convergence (Figure 2). Despite generating posterior samples, its pathological posterior geometry complicated interpretation. This highlights the trade-off between capturing non-linearity and inference stability: polynomial features increase model dimensionality and convergence issues, whereas original features ensure more reliable posterior estimates. The models’ performance is discussed below.
Interpretation of Figures 1 and 2:
  • Intercept distribution (second row, left):
    • The bell-shaped curve labelled “alpha” represents the posterior distribution of the “Intercept”.
    • The peak (around 40 for original and 20 for polynomial features) indicates the most likely value.
    • The spread reflects the uncertainty or variability in the estimate.
    • This curve combines prior information with the data likelihood to estimate the parameter.
  • Intercept trace plot (second row, right):
    • The trace plot for “Intercept” shows parameter values sampled during MCMC simulation.
    • Each line represents a different chain, with fluctuations around the mean value (around 40 for original and 20 for polynomial features) indicating convergence and stability.
    • Well-mixed and converged chains suggest a reliable estimate for “alpha”.
  • Beta1 distributions (top left):
    • Multiple overlapping curves labelled “beta” represent the posterior distributions for the different features or groups.
    • Variations in peak and spread indicate differences between these groups.
  • Beta1 trace plot (top right):
    • The trace plot for “beta1” shows sampled values during MCMC.
    • Different colours represent different chains, with convergence and stability essential for reliable parameter estimates.
  • Sigma1 distribution (bottom row, left):
    • The bell-shaped curve labelled “sigma1” represents the posterior distribution of “sigma1”.
    • The peak (around 5.6 for original and ~1 for polynomial) indicates the most likely value.
    • The spread reflects uncertainty.
  • Sigma1 trace plot (bottom row, right):
    • The trace plot for “sigma1” shows sampled values during MCMC.
    • Stability and convergence are crucial for reliable estimates.
Overall interpretation:
These figures provide insights into the uncertainty associated with the model parameters (alpha, beta1, sigma1) based on the observed data.
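For reference, diagnostics of this kind can be generated from the sampled trace with ArviZ (a sketch, assuming idata from the PyMC model sketch in Section 2.4.2 is in scope):

```python
import arviz as az

# Paired posterior density and trace panels, as in Figures 1 and 2.
az.plot_trace(idata, var_names=["beta", "intercept", "sigma"])

# Tabulated Mean, SD, HDI 3%/97%, MCSE, ESS, and R-hat (cf. Appendix C).
print(az.summary(idata))
```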

3.2. Model Performance (Table 7 (Original Features) and Table 8 (Polynomial Features))

The Bayesian regression coefficients (Table 7) for original features (see Appendix C for detailed definitions) revealed statistically robust parameter estimates, with high effective sample sizes (ESSs > 2000 for most parameters), negligible Monte Carlo errors, and R-hat values of 1.0 indicating excellent convergence. Head grade (β = 20.66, HDI: [15.53, 25.46]) and Collector_PAX (β = 27.34, HDI: [15.02, 38.80]) emerged as the most influential predictors of cumulative gold grade. Crusher_VSI (β = 24.57, HDI: [10.63, 38.04]) also demonstrated a significant positive impact, reinforcing the role of the crushing method in flotation efficiency. Parameters such as Power and Time exhibited wide credible intervals overlapping zero, indicating weaker or uncertain effects.
For the polynomial model (Table 8), the point estimates of several interaction terms (e.g., Size × Time, β = 8.82) appeared promising; however, their wide credible intervals and elevated MCSEs indicated insufficient precision. Head grade retained a positive influence (β = 13.63, HDI: [−2.92, 29.74]), albeit with reduced certainty. Low ESSs and R-hat values exceeding 1.01 for certain terms (e.g., Crusher_HPLS) signal sampling inefficiency and model instability. The posterior standard deviation of the residuals (σ = 1.18) was markedly lower than in the original model (σ = 5.65), consistent with polynomial overfitting.
While polynomial expansion theoretically enriches the model’s expressiveness, the empirical results underscore the superiority of the original feature model in terms of convergence, interpretability, and inferential clarity. For operational deployment in flotation circuits, simpler, well-regularized Bayesian models appear more robust and actionable than their polynomial counterparts.

3.3. Comparison of Models

Original feature models offer better convergence, interpretability, and generalization, while polynomial models add complexity without proportional benefit (Table 9, Figure 3).
Both models highlight Head_grade and Collector_PAX as dominant predictors, with large posterior means and moderately wide HDIs (e.g., β = 13.63, HDI: [−2.92, 29.74]). Polynomial expansion introduces higher-order terms such as Power2 (β = 15.51) and Collector_PAX2 (β = 14.30), indicating non-linear effects, albeit with wide HDIs and low tail ESSs, suggesting uncertainty and potential overfitting.
The original model provides stable and interpretable estimates, while the polynomial model offers a marginally better fit but suffers from convergence and interpretability issues. For instance, Crusher_VSI gains prominence in the polynomial form (β increases, HDI narrows), and interactions like Size × Head_grade appear relevant. However, weak signals in terms like Size × Crusher_HPGRHS (β = −1.41, HDI = [−14.37, 10.59]) and low ESS values (<200) call for cautious interpretation.
The original model balances interpretability and convergence. The polynomial model captures complexity with greater uncertainty, highlighting the trade-off in Bayesian modelling between stability and flexibility.
Figure 4 illustrates the predicted versus observed cumulative gold grade for both Bayesian Ridge Regression (Scikit-learn) and probabilistic PyMC models, with the following metrics: posterior predictive check (PyMC, original features), R2 = 0.92, MSE = 102.37; posterior predictive check (PyMC, polynomial features), R2 = 1.00, MSE = 0.02; Bayesian Ridge Regression (original features), R2 = 0.91, MSE = 107.73; Bayesian Ridge Regression (polynomial features), R2 = 1.00, MSE = 0.00. The predictions using polynomial features—represented by green circles (Scikit-learn) and magenta stars (PyMC)—align almost perfectly with the y = x line, confirming an exceptionally high level of apparent predictive accuracy (R2 = 1.00, MSE < 0.02 for both models). The near-complete overlap between the PyMC posterior predictions and the Scikit-learn outputs indicates that both models are producing functionally identical results.
This agreement is expected, as both models are implementations of Bayesian Ridge Regression using the same polynomial feature set. In small datasets, the posterior predictive mean from PyMC often closely resembles the point estimates from deterministic models such as Scikit-learn, especially under conjugate Gaussian priors. However, the perfect alignment also reflects a clear overfitting tendency, where both models interpolate the training data due to the high model complexity and limited sample size (n = 11).
In contrast, the original feature models—shown as blue crosses (Scikit-learn) and light blue squares (PyMC)—exhibit slightly more deviation from the y = x line, corresponding to lower R2 values (0.91 and 0.92, respectively). These models offer more robust generalizability, with fewer convergence issues and better interpretability.
The comparison emphasizes the trade-off between model flexibility and overfitting, highlighting the importance of choosing feature transformations that balance accuracy with stability, particularly in small-sample process modelling contexts. GridSearchCV selected the same best hyperparameters for both feature sets: alpha_1 = 1 × 10−6 and alpha_2 = 1 × 10−6.
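For completeness, the reported R2 and MSE values can be computed as follows (a sketch; y_pred would come from BayesianRidge.predict or the PyMC posterior predictive mean, and the arrays here are illustrative stand-ins):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative observed vs. predicted cumulative gold grades (g/t).
y_obs = np.array([74.32, 38.51, 132.92, 83.99, 99.77])
y_pred = np.array([70.5, 45.2, 125.8, 90.1, 96.3])

print(f"R2  = {r2_score(y_obs, y_pred):.2f}")
print(f"MSE = {mean_squared_error(y_obs, y_pred):.2f}")
```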

4. Scientific Significance of Bayesian Models

Bayesian models support the optimization of gold flotation by incorporating uncertainty and prior knowledge. PyMC achieves high accuracy (R2 = 0.92, MSE = 102.37) and improves further with polynomial terms (R2 = 1.00, MSE = 0.02). Bayesian Ridge Regression with polynomial features likewise achieves an apparently perfect fit (R2 = 1.00, MSE = 0.00), demonstrating the flexibility of Bayesian approaches in handling non-linearities, although, as discussed in Section 3.3, these perfect scores reflect interpolation of the small training set rather than genuine generalization.
PyMC’s trace summaries confirm MCMC convergence and model reliability. Sklearn’s Bayesian Ridge, though simpler, provides stable results with minimal overhead. Bayesian models support real-time decision-making in flotation circuits by accurately forecasting the grade under variable conditions, affirming their relevance in mineral processing.

5. Conclusions

This study demonstrates the utility of Bayesian modelling—particularly Bayesian Ridge Regression implemented via PyMC—for predicting cumulative gold grade in flotation circuits, while addressing model uncertainty and leveraging prior domain knowledge. Through comparative analysis with traditional scikit-learn implementations, the PyMC approach was shown to achieve superior performance, with high R2 scores and robust parameter convergence, especially when modelling was performed with the original input features.
The variables selected for modelling—particle size, power, time, head grade, crusher type, and collector type—were chosen based on their mechanistic relevance to flotation efficiency. Particle size directly influences mineral liberation and bubble–particle interaction; crusher type affects surface morphology and breakage behaviour; collector chemistry governs surface hydrophobicity; and head grade provides a baseline for recovery potential. Power and Time were included to reflect energy input and residence time, respectively, as these factors indirectly impact surface exposure and reagent interaction.
Posterior inference revealed that head grade, PAX collector, and VSI crushing had the strongest positive influence on cumulative gold grade, confirming the model’s alignment with mineral processing fundamentals. For example, VSI crushers produced more angular, cleanly fractured particles that enhanced collector adsorption. PAX, a strong xanthate, exhibited high affinity for sulphide surfaces under the test conditions, improving both selectivity and recovery. These findings affirm the interpretability and physical relevance of the Bayesian model outputs.
The Bayesian models utilized in this study improve the prediction of cumulative gold grade, providing a more interpretable and uncertainty-aware framework that can support more informed process control and optimization decisions in flotation operations. Beyond predictive accuracy, the Bayesian framework provides credible intervals and trace diagnostics, which offer transparent insights into model confidence and parameter stability. This interpretability makes Bayesian models particularly suitable for industrial flotation optimization, enabling process engineers to balance prediction with decision-making confidence.
Overall, the study confirms that Bayesian modelling is not only effective in predicting flotation performance but is also instrumental in revealing meaningful correlations grounded in process mechanisms. It lays the foundation for broader deployment of probabilistic tools in mineral processing, especially under data-limited or variable operating conditions.
Future research could extend Bayesian models to other minerals or processes like grinding and leaching. Incorporating additional variables or more complex scenarios would test model adaptability. Integrating real-time data streams could enable adaptive control, while hybridizing Bayesian methods with other ML techniques may enhance predictive accuracy and interpretability.

Funding

This research received no external funding.

Data Availability Statement

The author declares that the data supporting the findings of this study are available within the paper.

Acknowledgments

During the preparation of this work, the author used ChatGPT to check the Python ML code. After using this tool, the author reviewed and edited the content as needed and takes full responsibility for the content of the publication.

Conflicts of Interest

The work described has not been published previously, is not under consideration for publication elsewhere, and its publication is approved by all authors and by the responsible authorities where the work was carried out. If accepted, it will not be published elsewhere in the same form, in English or in any other language, including electronically, without the written consent of the copyright holder. The author has no competing financial or non-financial interests relevant to the content of this article, and no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Appendix A

Significant Model Factors or Hyperparameters of Bayesian Linear Ridge Regression

Alpha parameters:
alpha_1: Shape parameter for the Gamma distribution prior over the alpha parameter.
alpha_2: Inverse scale (rate) parameter for the Gamma distribution prior over the alpha parameter.
Function: Control the precision of the noise (the inverse variance of the target variable) in the regression model.
Impact: Together with the lambda parameters, they shape the strength of regularization; stronger priors penalize large coefficients and reduce overfitting, whereas weaker priors allow more flexibility.
Lambda parameters:
lambda_1: Shape parameter for the Gamma distribution prior over the lambda parameter.
lambda_2: Inverse scale (rate) parameter for the Gamma distribution prior over the lambda parameter.
Function: Control the precision of the regression coefficients, acting as an L2 (ridge-type) shrinkage term.
Impact: Higher coefficient precision increases the penalty on large coefficients, shrinking them toward zero; lower precision implies a weaker penalty, allowing the model to retain larger weights.
Overall role:
These hyperparameters balance fitting the data against maintaining model simplicity, tailoring the model to the specific characteristics of the process parameters and supporting optimal performance in predicting outcomes.

Appendix B

Significant model factors used for likelihood estimation in probabilistic modelling using Bayesian PyMC:
Beta (β) denotes the regression coefficients that scale each feature’s contribution to the mean of the likelihood; their Gaussian priors (mean and standard deviation) encode the expected magnitude of, and uncertainty in, each effect.
Mu (μ) denotes the mean parameter of the likelihood distribution, representing the expected value or central tendency of the observed data (here, μ = intercept + Xβ).
Sigma (σ) quantifies the dispersion of the observed data around the mean (μ). Smaller values indicate less variability, while larger values result in a wider likelihood distribution, reflecting higher uncertainty.

Appendix C

Detailed definitions of Bayesian regression coefficients:
  • Mean: Average estimated value of the coefficient.
  • SD: Standard deviation, measuring variability of coefficient estimates.
  • HDI 3%: Lower bound of the 94% probability interval for the true value.
  • HDI 97%: Upper bound of the 94% probability interval for the true value.
  • MCSE Mean: Monte Carlo standard error of the posterior mean estimate.
  • MCSE SD: Monte Carlo standard error of the posterior standard deviation estimate.
  • ESS Bulk: Effective sample size of the bulk of the distribution.
  • ESS Tail: Effective sample size of the tails of the distribution.
  • R-hat: Convergence diagnostic, with values near 1 indicating good convergence.
  • Beta (β): Coefficients showing the relationship between predictors and the dependent variable.
  • Sigma (σ): Standard deviation of residuals, indicating data variability around the regression line.

References

  1. Arbiter, N.; Harris, C. Flotation Kinetics. In Froth Flotation; AIME: Vancouver, BC, Canada, 1962; pp. 215–246. [Google Scholar]
  2. Kohmuench, J.; Thanasekaran, H.; Seaman, B. Advances in Coarse Particle Flotation-Copper and Gold. In Proceedings of the MetPlant 2013, Perth, Australia, 15–17 July 2013; The Australasian Institute of Mining and Metallurgy: Carlton, Australia, 2013; pp. 378–386. [Google Scholar]
  3. Estay, H.; Lois-Morales, P.; Montes-Atenas, G.; del Solar, J.R. On the Challenges of Applying Machine Learning in Mineral Processing and Extractive Metallurgy. Minerals 2023, 13, 788. [Google Scholar] [CrossRef]
  4. Jovanović, I.; Nakhaei, F.; Kržanović, D.; Conić, V.; Urošević, D. Comparison of Fuzzy and Neural Network Computing Techniques for Performance Prediction of an Industrial Copper Flotation Circuit. Minerals 2022, 12, 1493. [Google Scholar] [CrossRef]
  5. Nakhaei, F.; Rahimi, S.; Fathi, M. Prediction of Sulfur Removal from Iron Concentrate Using Column Flotation Froth Features: Comparison of k-Means Clustering, Regression, Backpropagation Neural Network, and Convolutional Neural Network. Minerals 2022, 12, 1434. [Google Scholar] [CrossRef]
  6. Yan, H.; Zhu, J.; Wang, F.; He, D.; Wang, Q. Bayesian Network-based Technical Index Estimation for Industrial Flotation Process under Incomplete Data. In Proceedings of the Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020. [Google Scholar]
  7. Kruschke, J.K. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan; Academic Press: Cambridge, MA, USA, 2014. [Google Scholar]
  8. Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2013. [Google Scholar]
  9. Yan, H.; Wang, F.; He, D.; Zhao, L.; Wang, Q. Bayesian Network-Based Modeling and Operational Adjustment of Plantwide Flotation Industrial Process. Ind. Eng. Chem. Res. 2020, 59, 2025–2035. [Google Scholar] [CrossRef]
  10. Yan, H.; Song, S.; Wang, F.; He, D.; Zhao, J. Operational adjustment modeling approach based on Bayesian network transfer learning for new flotation process under scarce data. J. Process Control 2023, 128, 103000. [Google Scholar] [CrossRef]
  11. Fu, H.; Wang, Z.; Nichani, E.; Lee, J.D. Learning Hierarchical Polynomials of Multiple Nonlinear Features with Three-Layer Networks. arXiv 2024, arXiv:2411.17201. [Google Scholar]
  12. PyMC. 2021. Available online: https://www.pymc.io/projects/docs/en/stable/learn.html (accessed on 1 January 2021).
  13. Thatipamula, S.; Devasahayam, S. Study of Coarse and Fine Gold Flotation on the Products from Vertical Shaft Impactor and High-Pressure Grinding Roll Crushers. SSRN preprint 4727563. Available online: https://dx.doi.org/10.2139/ssrn.4727563 (accessed on 14 January 2024).
  14. Albon, C. Machine Learning with Python Cookbook, 1st ed.; Bleiel, R.R.A.J., Ed.; O’Reilly Media: Sebastopol, CA, USA, 2018; p. 366. [Google Scholar]
  15. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. sklearn.linear_model.BayesianRidge. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  16. Salvatier, J.; Wiecki, T.V.; Fonnesbeck, C. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2016, 2, e55. [Google Scholar] [CrossRef]
  17. Brownlee, J. Blending Ensemble Machine Learning with Python. 2021. Available online: https://machinelearningmastery.com/blending-ensemble-machine-learning-with-python/ (accessed on 11 May 2022).
  18. Devasahayam, S. Deep learning models in Python for predicting hydrogen production: A comparative study. Energy 2023, 280, 128088. [Google Scholar] [CrossRef]
  19. Devasahayam, S.; Albijanic, B. Predicting hydrogen production from co-gasification of biomass and plastics using tree based machine learning algorithms. Renew. Energy 2024, 222, 119883. [Google Scholar] [CrossRef]
Figure 1. Simple posterior plot for cumulative grade (original standard data).
Figure 2. Simple posterior plot for cumulative grade (polynomial standard data).
Figure 3. Comparison of original vs. polynomial model coefficients.
Figure 4. Bayesian regression: Cum_grade (original data and polynomial data).
Table 1. Ore composition wt.% (semiquantitative).

Phase | Weight %
Quartz | 50
Muscovite | 18
Fe-Dolomite/ankerite | 11.5
Chlorite | 7.9
Siderite | 4.2
Kaolinite | 3.4
Albite | 3.3
Rutile | 0.8
Pyrite | 0.9
Table 2. Gold crushing and flotation data.

Crusher | Size, µm | Power (Crusher), kW | Time for Producing Required Size, s | Head Grade, g/t | Cum_Recovery, % | Cumulative Gold Grade, g/t | Collector
HS-HPGR (600 µm) | 600 | 7.97 | 30 | 2.32 | 80.67 | 74.32 | PAX
LS-HPGR (600 µm) | 600 | 13.55 | 41.5 | 1.09 | 85.45 | 38.51 | PAX
VSI (600 µm) | 600 | 5.42 | 28 | 2.70 | 86.15 | 132.92 | PAX
HS-HPGR (425 µm) | 425 | 7.80 | 33 | 2.34 | 87.09 | 83.99 | PAX
LS-HPGR (425 µm) | 425 | 13.83 | 95 | 2.12 | 83.80 | 99.77 | PAX
VSI (425 µm) | 425 | 5.52 | 40 | 2.76 | 90.78 | 126.91 | PAX
HS-HPGR (300 µm) | 300 | 7.29 | 36 | 0.84 | 79.98 | 19.15 | PAX
LS-HPGR (300 µm) | 300 | 14.48 | 122 | 2.00 | 93.93 | 62.90 | PAX
VSI (300 µm) | 300 | 5.78 | 50 | 2.85 | 93.69 | 100.85 | PAX
VSI (300 µm) | 300 | 5.78 | 50 | 2.46 | 89.26 | 95.80 | DSP
HS-HPGR (300 µm) | 300 | 7.29 | 36 | 2.02 | 88.72 | 36.17 | DSP
Table 3. Numerical descriptive statistics.

Statistic | Size | Power | Time | Head_grade | Cum_recovery | Cum_grade
count | 11.00 | 11.00 | 11.00 | 11.00 | 11.00 | 11.00
mean | 415.91 | 8.61 | 52.82 | 2.14 | 87.23 | 79.21
std | 130.03 | 3.55 | 29.42 | 0.65 | 4.66 | 37.09
min | 300.00 | 5.42 | 30.00 | 0.84 | 79.98 | 19.15
25% | 300.00 | 5.78 | 34.50 | 2.01 | 84.63 | 50.71
50% | 425.00 | 7.29 | 41.50 | 2.32 | 87.09 | 83.99
75% | 512.50 | 10.76 | 55.00 | 2.58 | 90.02 | 100.31
max | 600.00 | 14.48 | 122.00 | 2.85 | 93.93 | 132.92
Table 4. Categorical frequency counts.

Category | Crusher | Collector
DSP | – | 2
HPGRHS | 4 | –
HPLS | 3 | –
PAX | – | 9
VSI | 4 | –
Table 5. Gold crushing and flotation data (features or X values) with dummy variables and collectors.

Size, µm | Power (Crusher), kW | Time for Producing Required Size, s | Head Grade, g/t | Crusher HS_HPGR | Crusher LS_HPGR | Crusher VSI | Collector DSP | Collector PAX
600 | 7.97 | 30 | 2.32 | 1 | 0 | 0 | 0 | 1
600 | 13.55 | 41.5 | 1.09 | 0 | 1 | 0 | 0 | 1
600 | 5.42 | 28 | 2.70 | 0 | 0 | 1 | 0 | 1
425 | 7.80 | 33 | 2.34 | 1 | 0 | 0 | 0 | 1
425 | 13.83 | 95 | 2.12 | 0 | 1 | 0 | 0 | 1
425 | 5.52 | 40 | 2.76 | 0 | 0 | 1 | 0 | 1
300 | 7.29 | 36 | 0.84 | 1 | 0 | 0 | 0 | 1
300 | 14.48 | 122 | 2.00 | 0 | 1 | 0 | 0 | 1
300 | 5.78 | 50 | 2.85 | 0 | 0 | 1 | 0 | 1
300 | 5.78 | 50 | 2.46 | 0 | 0 | 1 | 1 | 0
300 | 7.29 | 36 | 2.02 | 1 | 0 | 0 | 1 | 0
Table 6. Cumulative % passing for various size fractions across crushers.

Size (µm) | VSI | HPGR-High | HPGR-Low
Top size = −300 µm
212 | 67.59 | 86.52 | 88.33
150 | 48.41 | 73.95 | 79.70
106 | 42.42 | 65.46 | 69.60
75 | 32.10 | 60.61 | 65.34
53 | 27.52 | 54.66 | 59.22
45 | 26.09 | 50.70 | 56.05
38 | 23.42 | 48.50 | 54.16
Top size = −425 µm
300 | 59.55 | 84.70 | 78.07
212 | 39.47 | 73.22 | 67.95
150 | 34.33 | 65.06 | 62.45
106 | 29.19 | 58.91 | 56.79
75 | 24.00 | 53.49 | 50.54
53 | 19.89 | 48.94 | 44.11
45 | 18.44 | 45.87 | 42.92
38 | 15.07 | 43.73 | 41.22
Top size = −600 µm
425 | 72.13 | 75.78 | 86.69
300 | 54.31 | 60.27 | 70.74
212 | 40.29 | 52.24 | 61.52
150 | 32.11 | 46.86 | 51.61
106 | 26.13 | 41.15 | 42.42
75 | 21.68 | 35.21 | 37.60
53 | 18.11 | 31.16 | 31.93
45 | 16.53 | 28.87 | 27.92
38 | 14.17 | 27.14 | 23.86
Table 7. Bayesian regression coefficients for Cum_grade in flotation process.

Parameter | Mean * | SD | HDI 3% | HDI 97% | MCSE Mean | MCSE SD | ESS Bulk | ESS Tail | R-hat
beta[Size] | 9.42 | 2.95 | 3.67 | 14.80 | 0.06 | 0.05 | 2289.0 | 2370.0 | 1.0
beta[Power] | −8.83 | 5.96 | −20.28 | 2.32 | 0.12 | 0.09 | 2524.0 | 2772.0 | 1.0
beta[Time] | 7.93 | 4.47 | −0.23 | 16.80 | 0.10 | 0.08 | 1987.0 | 2171.0 | 1.0
beta[Head_grade] | 20.66 | 2.69 | 15.53 | 25.46 | 0.05 | 0.04 | 2694.0 | 2929.0 | 1.0
beta[Crusher_HPGRHS] | 1.75 | 6.12 | −9.85 | 13.23 | 0.12 | 0.10 | 2581.0 | 2242.0 | 1.0
beta[Crusher_HPLS] | 14.10 | 8.93 | −2.94 | 30.60 | 0.16 | 0.15 | 3292.0 | 2587.0 | 1.0
beta[Crusher_VSI] | 24.57 | 7.36 | 10.63 | 38.04 | 0.14 | 0.12 | 2758.0 | 2479.0 | 1.0
beta[Collector_DSP] | 12.87 | 6.73 | −0.14 | 25.53 | 0.14 | 0.10 | 2467.0 | 2773.0 | 1.0
beta[Collector_PAX] | 27.34 | 6.33 | 15.02 | 38.80 | 0.13 | 0.10 | 2415.0 | 2335.0 | 1.0
intercept | 39.92 | 6.81 | 28.29 | 53.56 | 0.14 | 0.10 | 2403.0 | 2643.0 | 1.0
sigma | 5.65 | 0.48 | 4.78 | 6.55 | 0.01 | 0.01 | 3815.0 | 2892.0 | 1.0
* Mean: average estimated value of the coefficient; see Appendix C for full column definitions.
Table 8. Bayesian regression coefficients for Cum_grade in flotation process (polynomial features).

Parameter | Mean | SD | HDI 3% | HDI 97% | MCSE Mean | MCSE SD | ESS Bulk | ESS Tail | R-hat
β[Size] | 2.14 | 7.79 | −11.75 | 17.16 | 0.30 | 0.16 | 563.0 | 1984.0 | 1.01
β[Power] | −5.09 | 8.40 | −21.36 | 10.17 | 0.19 | 0.20 | 1912.0 | 2226.0 | 1.00
β[Time] | −1.20 | 8.80 | −19.70 | 13.19 | 0.38 | 0.19 | 510.0 | 1070.0 | 1.00
β[Head_grade] | 13.63 | 8.68 | −2.92 | 29.74 | 0.24 | 0.21 | 1353.0 | 1094.0 | 1.01
β[Head_grade2] | −1.04 | 6.67 | −13.27 | 10.03 | 0.72 | 0.23 | 826.0 | 1040.0 | 1.02
β[Crusher_HPGRHS] | 3.98 | 8.84 | −13.32 | 19.26 | 0.22 | 0.20 | 1665.0 | 1217.0 | 1.00
β[Crusher_HPLS] | 2.72 | 10.42 | −15.04 | 21.31 | 1.20 | 0.53 | 84.0 | 152.0 | 1.02
β[Crusher_VSI] | 10.88 | 8.82 | −5.16 | 27.84 | 0.25 | 0.19 | 1312.0 | 2147.0 | 1.00
β[Collector_DSP] | 4.32 | 9.04 | −12.90 | 21.14 | 0.18 | 0.22 | 2463.0 | 2149.0 | 1.02
β[Collector_PAX] | 14.02 | 9.10 | −1.69 | 31.59 | 0.68 | 0.20 | 178.0 | 1951.0 | 1.02
β[Power2] | 15.51 | 9.40 | −0.40 | 32.15 | 1.09 | 0.60 | 164.0 | 1777.0 | 1.02
β[Collector_PAX2] | 14.30 | 8.74 | −1.58 | 31.44 | 0.22 | 0.18 | 1822.0 | 2171.0 | 1.00
β[Crusher_VSI2] | 11.39 | 8.96 | −6.03 | 27.78 | 0.21 | 0.24 | 1856.0 | 1580.0 | 1.02
β[Size2] | −4.53 | 2.66 | −9.26 | 0.78 | 0.06 | 0.06 | 1731.0 | 2011.0 | 1.01
β[Time2] | −2.67 | 7.18 | −15.78 | 10.71 | 0.26 | 0.22 | 768.0 | 387.0 | 1.02
β[Size × Head_grade] | 5.40 | 6.83 | −7.31 | 18.62 | 0.16 | 0.19 | 1860.0 | 1145.0 | 1.03
β[Time × Crusher_HPLS] | 3.75 | 9.89 | −14.39 | 21.75 | 0.57 | 0.19 | 364.0 | 1031.0 | 1.01
β[Crusher_HPGRHS × Collector_PAX] | 10.58 | 8.63 | −6.27 | 25.66 | 0.23 | 0.21 | 1447.0 | 980.0 | 1.02
β[Crusher_HPGRHS × Collector_DSP] | −7.31 | 9.12 | −24.20 | 7.90 | 1.15 | 0.60 | 68.0 | 69.0 | 1.02
beta_poly[Size Power] | 3.48 | 8.00 | −12.54 | 17.45 | 0.28 | 0.17 | 790.0 | 1849.0 | 1.00
beta_poly[Size Time] | 8.82 | 7.59 | −7.17 | 21.88 | 0.28 | 0.28 | 763.0 | 176.0 | 1.02
beta_poly[Size Crusher_HPGRHS] | −1.41 | 6.65 | −14.37 | 10.59 | 0.22 | 0.16 | 1043.0 | 710.0 | 1.00
beta_poly[Size Crusher_HPLS] | 3.04 | 9.43 | −15.77 | 20.33 | 0.18 | 0.28 | 2817.0 | 1610.0 | 1.02
beta_poly[Size Crusher_VSI] | 2.02 | 8.02 | −12.57 | 17.60 | 0.31 | 0.20 | 664.0 | 633.0 | 1.00
beta_poly[Size Collector_DSP] | −3.96 | 9.27 | −21.87 | 13.65 | 0.18 | 0.24 | 2777.0 | 1474.0 | 1.01
beta_poly[Size Collector_PAX] | 6.39 | 7.45 | −7.50 | 21.28 | 0.15 | 0.17 | 2512.0 | 2083.0 | 1.01
intercept_poly | 18.44 | 8.63 | 2.83 | 34.64 | 0.36 | 0.17 | 654.0 | 2307.0 | 1.01
sigma_poly | 1.18 | 0.56 | 0.44 | 2.24 | 0.05 | 0.02 | 40.0 | 11.0 | 1.00
Table 9. Comparison of original vs. polynomial model coefficients.

Variable | Original Model | Polynomial Model
Head Grade (g/t) | Strong positive coefficient, narrow HDI, high ESS, R-hat ≈ 1 | Still dominant; Head_grade2 introduces curvature with wider uncertainty
Size (µm) | Moderate negative coefficient, stable estimate | Size2 amplifies non-linearity; higher variance suggests overfitting
Power (kW) | Moderate positive influence | Power2 term significant; more flexible but greater uncertainty
Crusher_VSI | Marginal impact; HDI includes zero | Crusher_VSI2 shows impact, but interpretability suffers due to variance
Collector_PAX | Positive effect, reasonable confidence bounds | Collector_PAX2 inflates variance, suggesting instability