Comprehensive Comparison of Machine Learning Approaches—Deterministic and Stochastic—In Modeling the Production and Power of an SAG Mill: A Case Study of the Chilean Copper Mining Industry

Saldana, Manuel; Gálvez, Edelmira; Sales-Cruz, Mauricio; Salinas-Rodríguez, Eleazar; Salinas-Maldonado, Ramon G.; Castillo, Jonathan; Toro, Norman; Arias, Dayana; Cisternas, Luis A.

doi:10.3390/min16040412

by Manuel Saldana ¹,²,*, Edelmira Gálvez ³,*, Mauricio Sales-Cruz ⁴, Eleazar Salinas-Rodríguez ⁵, Ramon G. Salinas-Maldonado ⁵, Jonathan Castillo ⁶, Norman Toro ¹, Dayana Arias ⁷ and Luis A. Cisternas ²

¹ Faculty of Engineering and Architecture, Universidad Arturo Prat, Iquique 1110939, Chile
² Departamento de Ingeniería Química y Procesos de Minerales, Universidad de Antofagasta, Antofagasta 1271155, Chile
³ Departamento de Ingeniería Metalúrgica y Minas, Universidad Católica del Norte, Antofagasta 1270709, Chile
⁴ Departamento de Procesos y Tecnología, Universidad Autónoma Metropolitana-Cuajimalpa, México City 05348, Mexico
⁵ Academic Area of Earth Sciences and Materials, Institute of Basic Sciences and Engineering, Autonomous University of the State of Hidalgo, Pachuca 42184, Mexico
⁶ Departamento de Ingeniería en Metalurgia, Universidad de Atacama, Copiapó 1531772, Chile
⁷ Laboratory of Molecular Biology and Applied Microbiology, Centro de Investigación en Fisiología y Medicina de Altura (FIMEDALT), Departamento Biomédico, Facultad de Ciencias de la Salud, Universidad de Antofagasta, Antofagasta 1240000, Chile
* Authors to whom correspondence should be addressed.

Minerals 2026, 16(4), 412; https://doi.org/10.3390/min16040412

Submission received: 16 March 2026 / Revised: 11 April 2026 / Accepted: 15 April 2026 / Published: 16 April 2026

(This article belongs to the Section Mineral Processing and Extractive Metallurgy)

Abstract

SAG grinding mills are critical, energy-intensive operations in copper concentrators, accounting for 30%–50% of total plant energy consumption. The accurate prediction of mill power draw and production rate under varying operational conditions is essential for real-time control, production planning, and energy management. This study presents a comprehensive comparison of machine learning (ML) algorithms for modeling the Production and Power of an SAG mill in the Chilean copper mining industry. Deterministic and stochastic models were fitted and validated using industrial data from a Chilean copper operation. The most representative models were re-estimated and subsequently evaluated under different operating regimes to examine their predictive performance under aggregated conditions of the feed variables. This procedure allowed for the identification of the modeling approaches that provide the most robust performance across varying operational regimes. The results show that eXtreme Gradient Boosting (XGB) achieved the best predictive performance, with test RMSE and R² values of 87.98 and 97.35% for SAG Production, and 431.11 and 95.11% for SAG Power, respectively. Stochastic approaches provided complementary uncertainty quantification, supporting risk-informed decision making under variable operating conditions. The analysis by operational regime indicates that XGB fits best in the Thick hydraulic regime for both response variables, which could be explained by the more predictable grinding dynamics of dense-pulp operation. The comparative analysis reveals trade-offs between model complexity, interpretability, computational requirements, and predictive performance, offering practical guidance for selecting appropriate modeling frameworks based on specific operational objectives and data availability in mineral processing applications.

Keywords: comminution; grinding circuit optimization; predictive analytics; Bayesian inference; ensemble modeling

1. Introduction

Semi-Autogenous Grinding (SAG) mills are critical operations in mining circuits. In copper concentrators, grinding/comminution commonly accounts for about 30%–50% of the plant’s total energy consumption and constitutes one of the main bottlenecks limiting production, yield, and process efficiency [1,2,3]. In Chile, the world’s leading copper producer, the modeling, simulation, and optimization of SAG mills have the potential to directly impact the industry’s economic competitiveness, as well as its objectives of energy efficiency, operational sustainability, green mining, sustainable development, and circular economy, among others. Therefore, the capacity to predict energy consumption and production levels with a certain degree of accuracy, considering feed variations such as ore characteristics, operating parameters and/or variables, and equipment conditions, is essential for the development and implementation of real-time control systems, such as digital shadows or even digital twins [4,5]. These, in turn, allow for the continuous optimization and improvement of production planning and energy management strategies. However, the inherent complexity of comminution systems, and particularly of SAG milling systems, which are characterized by nonlinear and non-monotonic interactions between ore breakage mechanisms, load movement patterns within the mill (fall shape and dynamics), liner wear evolution, and various process disturbances, poses significant challenges for modeling, simulation, optimization, and control initiatives [6].

Although SAG mill power is commonly available from online plant instrumentation, modeling this variable remains relevant from an operational and methodological standpoint. In this work, SAG Power was not treated as an unmeasurable variable, but as a critical measured response whose prediction is useful for anticipatory decision making. A predictive model of SAG Power could support short-term forecasting under changing ore and operating conditions, provide a soft-sensing alternative during temporary sensor failures or missing data, enable what-if evaluation of operating scenarios, and facilitate potential integration into digital twins, supervisory control, and energy management strategies. Therefore, the objective is not to replace direct measurement, but to complement it with predictive capability and uncertainty-aware decision support.

Traditional approaches to SAG mill modeling have evolved according to three paradigms, each with specific limitations and advantages depending on the industrial and/or operational conditions. First, there are deterministic or phenomenological models, including Population Balance Models (PBMs) [2], Discrete Element Method (DEM) simulations [7], and Computational Fluid Dynamics (CFD) [8]. These models provide physically interpretable representations of grinding mechanisms and have proven valuable in areas such as equipment design, scale-up studies, and fundamental process understanding. They also incorporate the fundamentals of particle breakage, load trajectory dynamics, and energy transfer mechanisms, allowing for the exploration of design modifications and/or alternative operating configurations. On the other hand, deterministic models require a large amount of data for calibration, significant computing capacity, and in-depth knowledge of the breakage parameters (which are difficult to maintain and update in industrial contexts), among other factors. Additionally, the inherent uncertainty in the feed dynamics or process parameters tends to reduce predictive accuracy when the mineral characteristics or equipment conditions deviate from the calibrated states.

Machine learning (ML) approaches have emerged as a complementary paradigm to traditional (deterministic) approaches, leveraging the information available in Supervisory Control And Data Acquisition (SCADA) systems in modern processing plants. Techniques such as Artificial Neural Networks (ANNs), Random Forests (RFs), Gradient Boosting Machines (GBMs), and ensemble methods have demonstrated superior abstraction of system dynamics and predictive accuracy for real-time applications [6,9]. These approaches can capture nonlinear and/or non-monotonic relationships without an explicit mechanistic formulation, and they are highly adaptable, coping with varying operating conditions through retraining and updating. Recent studies have reported considerable improvements in predicting indicators such as power or production using ensemble methods such as eXtreme Gradient Boosting (XGBoost, XGB) or deep learning [10,11], which have been applied to the identification of operating regimes [9], energy consumption prediction [12], and throughput optimization [13], among others. Nevertheless, although ML algorithms have demonstrated their strength and versatility in SAG milling modeling, critical challenges remain regarding interpretability, quantification of the uncertainty of the parameters involved, and validation of physical consistency. These challenges stem from the black-box nature of these models, which hinders a detailed understanding of the inherent process dynamics. Additionally, modeling performance depends heavily on the quality of the training data and on how well the sampled historical conditions represent the projected operating scenarios.

The third modeling paradigm comprises stochastic and probabilistic frameworks, which address the propagation of uncertainty and decision making under partial knowledge of input variables. Bayesian Networks (BNs), along with Monte Carlo-based surrogate models, provide formal mechanisms for quantifying uncertainty and prediction sensitivity by incorporating expert knowledge and supporting operational decision making under risk [14]. This paradigm is particularly useful in mineral processing applications, where factors such as mineralogical variability in the feed or measurement noise introduce substantial uncertainty into the model outputs. Probabilistic models like BNs allow for the estimation of confidence intervals for the explained variables in order to generate production forecasts, identify risky operating regimes, and optimize control strategies [15]. However, despite their theoretical appeal, probabilistic approaches have not been sufficiently explored in industrial SAG mill applications.

The literature review reveals some critical gaps that motivate this research. First, there is a scarcity of studies that comprehensively compare deterministic, stochastic, and ML paradigms using the same robust and consistent dataset. Comparative analyses are essential to determine in which contexts each model is most suitable, and to identify the limitations and strengths of each one. Second, most ML studies focus on evaluating the fit metrics of individual models, rather than on a comparative analysis of a variety of models, physical consistency, sensitivity to operating variables, or analysis of different operational regimes (directly aligned with process engineering knowledge) [16]. Third, there is a lack of uncertainty quantification and probabilistic prediction in SAG milling systems, despite its importance in control systems and decision support. Finally, analysis based on production regimes (such as hydraulic regimes) has not been incorporated into comparative modeling frameworks, even though SAG mills operate under dynamic production regimes [9].

This study addresses these gaps through a comparative analysis of ML approaches (deterministic and stochastic) for modeling SAG mill power and production rate using industrial SCADA data from a Chilean copper concentrator. The specific objectives are as follows:

(1): To evaluate the predictive performance of multiple modeling approaches (RF, GBM, XGB, ANN, etc.).
(2): To fit probabilistic models that have the potential to assess uncertainty and allow for estimating responses with partial knowledge of the input variables.
(3): To analyze and compare the behavior of the models under different production regimes (e.g., dilute, optimum, and thick hydraulic regimes).
(4): To evaluate the interpretability of predictions through Sensitivity Analysis and comparison with process engineering knowledge.
(5): To analyze and discuss the implications of selecting approaches differentiated by explained variables and operating regimes in the development of digital twins or real-time control systems in industrial concentrator plants.

This study provides a comparative framework for evaluating the performance of different ML algorithms in modeling the SAG milling process, conducting a comparative analysis of their predictive capacity, and assessing their performance under different operating regimes. This work is intended as a case study based on one industrial SAG mill, and therefore the comparative results should be interpreted as operation-specific rather than as a universal hierarchy of model performance for the mining industry.

2. Background

2.1. Model Comparison Matrix

Table 1 summarizes the strengths and limitations of the three model families for SAG milling under operational uncertainty. Deterministic (phenomenological) models refer to first-principles, population balance, DEM, and state-space simulators. ML models cover classical regressors, trees/boosting, Gaussian Processes, and deep neural networks. Stochastic models cover Bayesian Networks, Bayesian hierarchical approaches, and surrogate-based probabilistic analyses.

The models cited in Table 1 include population balance and dynamic simulation [17], energy consumption studies using K-Nearest Neighbors Regressor (KNN-reg), Polynomial Regression (PR), Support Vector Regression (SVR) and Long Short-Term Memory (LSTM) strategies [12], ML optimization and liner age inclusion [18], comparative ML throughput studies with time delays [19], Genetic Programming (GP) and Multi-Equation GP (MEGP) [23], Sobol/GSA control design and pairings [20], surrogate-accelerated GSA [16], probabilistic graphical models for studying the fresh mineral feed rate and the power consumed by the mill [14], BN for Remaining Useful Life (RUL) in mining equipment [15], state-space SAG dynamic models [22], and DEM studies for lifter/grate and discharge [7], among others.

Additionally, hybrid models represent a logical progression in comminution modeling: they combine a physics-based structural framework (such as mass and energy balances, charge dynamics, population balance, or DEM formulations) [24,25,26] with data-driven elements aimed at capturing residual nonlinearities, high-order interactions, or unaccounted operational effects [27]. Within this framework, critical parameters can be calibrated or refined using ML methods [27,28], while the overall architecture maintains physical coherence, thereby enabling a physics-informed representation of uncertainty complemented by data-driven calibration [29,30,31]. Computational efficiency depends on the complexity of the hybrid architecture (for instance, ML surrogates integrated with process simulators or digital twin frameworks) [25,32]; nevertheless, industrial implementation is increasingly regarded as a strategic avenue because it combines physical integrity with adaptive learning [24]. Common applications include advanced process control, industrial digital twins, and model-based optimization under uncertainty [24,33,34], in which predictive accuracy, causal interpretability, and adaptive updating are required simultaneously.

2.2. Reference Metrics and Validation Practices

The typical metrics used in the literature for regression and classification tasks, along with reported validation strategies, are presented in Table 2. In the reviewed corpus, the supervised SAG tasks are predominantly regression tasks (energy, yield, power, and fresh feed). The reported metrics and validation strategies include the following:

Regression Metrics: Root Mean Square Error (RMSE), normalized RMSE, Mean Absolute Percentage Error (MAPE) [12], Mean Absolute Error (MAE) and R² (widely used, although not all articles cite numerical values) [6,19]. Saldaña et al. [18] report simulated production increases and energy savings as operational performance metrics (+4.42% production; −7.62% energy) derived from their ML-driven scenarios.
Model Ranking and Comparison: Studies compare models by RMSE/R² and rank them (e.g., Recurrent Neural Network (RNN) > GP > SVR in throughput prediction) using time-aware holdout and preprocessing procedures [19,23].
Probabilistic Classification/Metrics: The Receiver Operating Characteristic Area Under the Curve (ROC-AUC), the Matthews Correlation Coefficient (MCC), Cohen’s κ, the Brier score, the Expected Calibration Error (ECE), and reliability curves are conceptually relevant for miss/detect or binarized regimes but are rarely reported in the reviewed SAG regression articles; BN/RUL work assesses probabilistic accuracy and inference performance rather than ROC-specific details [14,15].
Validation Strategies: Splits that account for time series, training/validation/testing splits (e.g., 70/30 used in this industrial study), hyperparameter tuning, and explicit handling of measurement delays and outliers [18,19,23].
Recommendation: Use RMSE/MAE/R² for continuous targets; add calibration and reliability analysis for probabilistic outcomes and classification tasks. Report absolute operational KPIs (energy error %, throughput increase) when available, as in Saldaña et al. [18] and López et al. [12].
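
As a minimal sketch of how the regression metrics listed above can be computed, the following Python snippet evaluates RMSE, MAE, MAPE, and R² for a pair of observed and predicted series. The function name `regression_metrics` and the numerical values are illustrative assumptions, not data from the reviewed studies.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (%) and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined if any observed value is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical throughput values (tph), for illustration only.
observed = [1000.0, 1100.0, 1200.0, 1300.0]
predicted = [1010.0, 1090.0, 1220.0, 1280.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are expressed in the units of the response, whereas MAPE and R² are dimensionless, which is why reporting both kinds of metric makes results easier to compare across plants.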

2.3. Sensitivity and Uncertainty Analysis Methods

This section summarizes the Sensitivity Analysis (SA) methods applied to SAG models and how they have been used in the literature. The literature often combines classical SA (Sobol) with Monte Carlo propagation and surrogate acceleration to make Global Sensitivity Analysis (GSA) manageable for complex models. Key methods and documented applications are presented below:

Sobol Global Sensitivity Analysis (GSA): This is used to calculate global sensitivity indices for input–output pairing and control structure design in SAG mills; the Sobol–Jansen indices informed the selection of manipulated variables (fresh ore flow and mill filling fraction; energy consumption and % critical speed) in a Multiple-Input–Multiple-Output (MIMO) control design study [20].
Monte Carlo Analysis/Scenarios: Full propagation of uncertainty through Monte Carlo sampling over parameter ranges and plant simulators, used to identify where accurate information or online measurement is needed in flowsheet design [21].
Surrogate-accelerated GSA: Supervised ML surrogates (batch and online Extreme Learning Machine (ELM)) dramatically reduce the computational cost of Sobol-based GSA and enable online sensitivity estimation for mineral processing equipment [16].
Bootstrap/Monte Carlo for Model Uncertainty: Surrogate + Monte Carlo approaches are recommended to propagate epistemic and aleatory uncertainty when simulators are expensive; the reported frameworks perform sampling, surrogate fitting, and GSA [16,21].
Stochastic Population Balance: Stochastic formulations of breakage functions and probabilistic load/breakage models are used to capture the evolution of the Particle Size Distribution (PSD) under an uncertain load [40].
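
To illustrate the Monte Carlo estimation behind Sobol-based GSA, the sketch below computes first-order and total-order indices with the Jansen estimators on two independent sample matrices. The function `sobol_jansen`, the additive toy response, and the uniform input ranges are assumptions chosen for illustration; they do not reproduce the SAG models or variable ranges of the cited studies.

```python
import random


def sobol_jansen(model, bounds, n_samples=8192, seed=0):
    """Estimate first-order and total-order Sobol indices (Jansen estimators).

    model  : callable taking a list of inputs and returning a float
    bounds : list of (low, high) tuples, inputs assumed independent and uniform
    """
    rng = random.Random(seed)

    def draw():
        return [[rng.uniform(lo, hi) for lo, hi in bounds]
                for _ in range(n_samples)]

    A, B = draw(), draw()                 # two independent sample matrices
    y_a = [model(x) for x in A]
    y_b = [model(x) for x in B]

    y_all = y_a + y_b
    mean = sum(y_all) / len(y_all)
    var = sum((y - mean) ** 2 for y in y_all) / (len(y_all) - 1)

    first, total = [], []
    for i in range(len(bounds)):
        # A with its i-th column replaced by the i-th column of B.
        y_abi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # f(B) and f(AB_i) share only x_i, so half their mean squared
        # difference estimates Var(Y) - V_i.
        v_first = var - sum((yb - yab) ** 2
                            for yb, yab in zip(y_b, y_abi)) / (2 * n_samples)
        # f(A) and f(AB_i) differ only in x_i, which gives the total effect.
        v_total = sum((ya - yab) ** 2
                      for ya, yab in zip(y_a, y_abi)) / (2 * n_samples)
        first.append(v_first / var)
        total.append(v_total / var)
    return first, total


def toy_model(x):
    """Hypothetical additive response, used only to check the estimator."""
    return 2.0 * x[0] + x[1]


first, total = sobol_jansen(toy_model, [(0.0, 1.0), (0.0, 1.0)])
print("first-order:", first)
print("total-order:", total)
```

For this additive toy function the analytical first-order indices are 0.8 and 0.2, and the total-order indices coincide with them, which offers a quick sanity check of the estimator. Each index costs one extra batch of model evaluations, which is why the literature replaces expensive simulators with surrogates.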

2.4. Suitability Conditions and Operational Contexts

The choice of model depends on the operating regime, the available sensors, and the decision horizon (control vs. design vs. maintenance). The conditions under which each model family is suitable, supported by evidence, are as follows.

Deterministic/population balance/state space/DEM:

These are suitable when a mechanistic perspective is needed on PSD changes, the effects of liner geometry, and hypothetical scenarios at the design level (liner wear stages, grate/lifter geometry, and cyclone cut size designs) [17,22].
Examples: Population balance simulators are used to predict the temporal evolution of product flow rate, mill load level and energy consumption for flowsheet or circuit redesigns [17]; the DEM is used to quantify material transport, drag and collision energy and to evaluate lifter/grate modifications (impacts on discharge flow and collision energy) [7].
These are less suitable when real-time, plant-scale closed-loop control is required without online parameter estimation, due to their runtime.

Machine learning (linear, RF/GBM/XGB, GP, MLP, RNN/LSTM, SVR, PLS):

This is suitable when there is abundant SCADA history and the goal is to forecast real-time performance or energy under operational variability (RPM, % solids variations, changes in feed water/inlet water, and PSD/drift) [6,12,18,19].
Handling drift/liner wear/pebble load: ML models can incorporate features such as liner age and capture changes if they are retrained or adapted online; Saldaña et al. [18] explicitly included liner age in the models used for scenario simulation.
Operating regimes: RNN/LSTM architectures achieved the best performance in energy/throughput forecasting across several studies that used bearing pressure, spindle speed, feed tonnage, and inlet water as input variables [6,12,19].

Stochastic surrogates/Bayesian Networks/hierarchical Bayesian models/Gaussian Processes:

These are suitable when decision support, alarm threshold definition, Remaining Useful Life (RUL), and probabilistic reasoning about uncertain events (e.g., sump level regimes, risk of clustering/torque events) are the primary objectives; BNs encode causal relationships and support evidence updating under changing operating conditions [14,15].
Control design: GSA + surrogate + probabilistic models have been used to design decentralized control pairings and to benchmark against Model Predictive Control (MPC) in a MIMO SAG case study [16,20].
Limitations: Fully probabilistic MPC/GP-MPC and Dynamic Bayesian Networks (DBNs) for online control are not demonstrated within this corpus (see Gaps).

Table 3 summarizes the operational variables considered in SAG mill performance modeling, grouped by functional category and describing their data treatment, modeling applicability (deterministic, ML, and stochastic), and qualitative sensitivity relevance reported in the literature.

2.5. Implementation Notes and Implementation Considerations

Practical implementation requires the careful engineering of data pipelines, sensor selection, validation protocols, and model explainability. The core sensor/SCADA tags used in the literature are listed below:

Feed and Throughput: Feed tonnage/fresh ore flow rate [12,19].
Mechanical Variables: Mill power draw, spindle/rotational speed, bearing pressure, percentage of critical speed [12,19,20].
Hydraulic/Flow Variables: Inlet water flow, solids percentage/pulp flow, and sump levels (when available) [12,18].
Material Descriptors: Feed PSD/P₈₀ and feed particle size, as well as liner age as an operational covariate [18,19].

Sampling frequency and time resolution: Specific sampling rates are not reported in the reviewed articles; the studies treat the data as time series and explicitly address measurement delays and temporal alignment during preprocessing [11,19,23]. There is insufficient evidence in the corpus to prescribe exact sampling intervals applicable to all installations.

Missing data and outlier handling strategies: Clearly erroneous measurements (e.g., negative speeds and percentages >100%) should be removed, and domain expertise should be applied to determine whether outliers correspond to legitimate but infrequent operating modes [23]. Several studies recommend domain-aware data cleaning and the preservation of representative “rare” events when appropriate [23].

Discretization/priors for BN construction: Bayesian Networks are commonly developed from prior knowledge and subsequently calibrated using training subsets. Practitioners should define variable discretization schemes and white-list/black-list edge constraints based on causal knowledge prior to automated structure learning [14,15].
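
As one possible illustration of a discretization scheme for BN construction, the sketch below applies equal-frequency (quantile) binning to a continuous variable. The choice of three states, the labels, the helper names `equal_frequency_edges` and `discretize`, and the sample values are all assumptions for illustration; the cited BN studies may use different schemes, including expert-defined thresholds.

```python
import bisect
import statistics


def equal_frequency_edges(values, n_bins=3):
    """Interior cut points that split the sample into n_bins groups of similar size."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def discretize(value, edges, labels):
    """Map a continuous value to the label of the bin it falls into."""
    return labels[bisect.bisect_right(edges, value)]


# Hypothetical solids percentages in the feed, for illustration only.
solids_pct = [60 + 2 * i for i in range(10)]      # 60, 62, ..., 78
labels = ["low", "medium", "high"]

edges = equal_frequency_edges(solids_pct, n_bins=len(labels))
states = [discretize(v, edges, labels) for v in (61, 66, 75)]
print(edges, states)
```

Equal-frequency bins keep every state populated, which helps when conditional probability tables are estimated from limited training subsets; expert-defined thresholds may be preferable when the states must match known operating limits.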

Priors and hierarchical modeling: Hierarchical Bayesian formulations facilitate information pooling across operating regimes (e.g., liner wear stages). However, explicit hierarchical SAG models are not detailed in the reviewed corpus, and there is insufficient evidence to recommend specific hierarchical architectures in this context.

Explainability and root cause analysis: GP or symbolic regression should be considered when interpretability or closed-form expressions are required. Otherwise, eXplainable Artificial Intelligence (XAI) tools (e.g., SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), and feature importance) can be applied to deep and tree-based models to support root cause analysis and operator trust [9,23,41].

Retraining and online calibration: Surrogate-based GSA and sequential online ELM have been shown to enable near-online sensitivity updates [16]. Several ML studies emphasize the need for periodic retraining and hyperparameter tuning; one industrial workflow adopted a 70/30 Train–validation split for offline evaluation prior to deployment [18,19].

Implementation stack: ML models (RNN/LSTM) are considered suitable for integration into real-time infrastructures. Their practical deployment requires ensuring latency budget compliance, robust feature engineering, drift detection mechanisms, and fallback logic to deterministic models or operator rules under anomalous regimes [12,19].

2.6. Industrial Evidence, Performance Results, and Observed Robustness

The evidence comprises a combination of industrial data studies, controlled simulation experiments, and plant-scale case studies. The following summarizes reported performance results, implementation lessons, and robustness observations.

Throughput and energy results from ML-driven studies: Saldaña et al. [18] used statistical and ML models combined with simulations to estimate a potential production increase of 4.42% through fragmentation/operational adjustments and an energy reduction of 7.62% by decreasing rotational speed under specific liner age configurations. Additional studies report that RNN/LSTM architectures outperform classical regressors in energy and throughput forecasting; one LSTM study reported energy prediction errors below 4% RMSE [6,12].
Real-plant datasets and model comparisons: A comparative study using an industrial time series of 20,161 records found that time-aware RNN models were the most accurate for throughput prediction, followed by GP and SVR. Sensitivity Analysis identified rotational speed and inlet water as dominant factors [19,23]. Furthermore, GP-based approaches provide closed-form expressions valued by domain experts for their interpretability [23].
Control and sensitivity evidence: Sobol-based GSA-driven decentralized controller pairing demonstrated performance comparable to MPC in case studies. The suggested pairings were fresh ore feed—fractional mill filling and power—percentage of critical speed for SAG control design [20]. Surrogate-based GSA reduced computational cost and enabled near-online sensitivity indices, supporting robustness under drift conditions [16].
Design engineering and DEM evidence: DEM studies quantified how lifter geometry and pulp lifter design affect transport velocities, discharge flow, and collision energy. Significant increases in discharge flow and collision energy were reported when modifying lifter geometry, supporting liner/grate redesign strategies [7].
Stochastic/probabilistic decision support: BN implementations demonstrated high accuracy in predicting the fresh feed rate and mill power in industrial proof-of-concept datasets and were proposed for decision support under feed uncertainty [14]. BN approaches have also been successfully applied to predict RUL/reliability for heavy mining motors [15].

Robustness and drift considerations indicate that ML models require periodic recalibration or online adaptation to handle changes in PSD, liner wear, pebble load, and cyclone performance shifts. Surrogate-based and online sensitivity methods provide a viable pathway, but field implementations require structured maintenance plans [16,18,19]. In contrast, deterministic simulators remain robust for design purposes but may fail to capture slow operational drift without online parameter estimation [17,22].

3. Materials and Methods

3.1. General Design of the Study

This case study focuses on the grinding process of a copper concentrator plant located in the Antofagasta region, northern Chile, which is part of an open-pit mining operation extracting copper sulfides. The plant’s primary grinding line (see Figure 1) consists of an SAG mill, which is represented here through the calibration of deterministic, ML, and Bayesian stochastic models. The comminution circuit comprises a 12.2 m × 7.3 m SAG mill, followed by two parallel ball mills for secondary grinding. In addition, the pebbles generated by the SAG mill are recirculated back to the mill.

The main research hypothesis is that different modeling approaches exhibit distinct strengths in representing both SAG Power and SAG Production. It is evaluated through a comparative framework that identifies the modeling paradigm (deterministic or stochastic) that provides the best goodness-of-fit. Model performance was first compared to determine the relative predictive capacity of each approach. It was then evaluated under different hydraulic operating regimes, allowing the assessment of how deterministic and stochastic models (the best model in each category) behave under varying process conditions.

3.2. Data, Operational Variables and Data Preprocessing

Historical operational data were collected over an extended period of 27 months and comprised 12 operating variables/parameters used as input features to formulate a representative SAG mill model. The independent variables examined included the following: P₈₀, inlet water flow rate, mill rotational speed, mill pressure, stockpile level, sump level, ore hardness, solids percentage in the feed, pebble load, granulometry above 100 mm, granulometry below 30 mm, and liner age (wear condition).

The response variables are the production rate, measured in tons per hour (tph), and the power consumption, quantified in megawatts (MW). SAG Power was measured directly in the plant and used as a supervised response variable: the purpose of modeling it was not to substitute its physical measurement, but to develop predictive models able to anticipate power demand under varying operating conditions, support scenario analysis, and provide complementary decision support for control and energy management. The explanatory variables/parameters selected for modeling are detailed below.

P₈₀: Size of the mesh opening through which 80% of the material passes.
SAG water feeding (m³/h): Water flow fed to the SAG mill.
SAG rotational speed (RPM): Mill rotational speed.
SAG pressure (PSI): Bearing pressure signal used as an operational proxy to estimate the internal mill load (mill fill level).
Stockpile level (m): Level of the feed stockpile.
Sump level (m): Level of the discharge sump at the SAG mill.
Hardness: Resistance offered by the mineral to abrasion or scraping.
Solids in the feeding (%): Percentage of solids in the feed pulp.
Pebbles (tph): Flow of pebbles (chunks or small stones) produced during grinding. These are hard materials that are difficult to reduce to a smaller size in the SAG mill.
Granulometry > 100 mm (%): Percentage of the ore feed whose granulometry is greater than 100 mm.
Granulometry < 30 mm (%): Percentage of the ore feed whose granulometry is less than 30 mm.
Liner age (months): Age of the mill liners. Liners are integral components of the mill and function as protective shells for the internal casing (SAG mill shell); they are subject to progressive wear due to the intense and continuous impact generated by interactions between the ore charge and the steel grinding media.

Prior to model fitting, the raw dataset was screened for duplicated records, missing values, and physically inconsistent observations. Records with clearly impossible values were removed according to engineering constraints, including negative flow, negative rotational speed, negative throughput or power, percentages outside the 0%–100% range, and a liner age lower than zero. For the remaining variables, univariate outliers were identified using the interquartile range criterion, and observations outside the interval [Q₁ − 1.5 × IQR; Q₃ + 1.5 × IQR] were flagged. These flagged observations were then reviewed under the process’s operating criteria, and only those considered non-representative or instrument-related were excluded. After data cleaning, the full dataset was partitioned using a 70/30 split, where 70% of the observations were used for training and 30% were reserved as an independent holdout test set.
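The screening and partitioning steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the sample readings, and the use of a random shuffle for the 70/30 split are assumptions. Note that in the study flagged observations were reviewed against operating criteria before exclusion, so the flags below mark candidates only.

```python
import random
import statistics

def iqr_flags(values, k=1.5):
    """Return True for each value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

def train_test_split(records, test_fraction=0.30, seed=42):
    """Shuffle a copy of the records and cut it into train/test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical mill-speed readings (RPM) with one suspicious spike.
speed = [9.1, 9.3, 9.2, 9.4, 9.0, 9.5, 9.3, 9.2, 14.8, 9.1]
flags = iqr_flags(speed)          # candidates for engineering review

# Hypothetical record identifiers split 70/30.
train, test = train_test_split(range(100))
```

Only the spike at 14.8 RPM falls outside the IQR fence in this toy series; whether such a point is dropped would still depend on the process review described above.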

3.3. Comparative Models

Table 4 presents a synthesis of the main models considered in the comparison, grouped according to their nature: deterministic or stochastic. It provides an overview of their essential characteristics, advantages, limitations, and requirements, in relation to their applicability to SAG process modeling. The comparison is not limited to contrasting predictive performance; it also analyzes interpretability, robustness under uncertainty, computational complexity, and the suitability of each approach across different operating regimes, such as hydraulic conditions. The information synthesized in Table 4 supports the methodological selection adopted in this study and establishes the foundation for the comparative, sensitivity, and uncertainty analyses developed in the following sections.

3.3.1. Multiple Regressions—MRs

For each response variable, a second-order multiple linear regression model with interaction terms (see Equation (1)) was fitted by Ordinary Least Squares (OLS) using Python (version 3.11.14) and the Statsmodels library (version 0.14.6) [42]. The model includes all significant linear (x_{i}), quadratic (x_{i}^{2}), and first-order interaction (x_{i} x_{j}) terms among the explanatory variables, corresponding to a quadratic response surface specification [43].

y = β_{0} + \sum_{\forall i \in I} β_{i} x_{i} + \sum_{\forall i \in I} \sum_{\forall j \in I} β_{i j} x_{i} x_{j} + ε

(1)
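To make the size of the specification in Equation (1) concrete, the sketch below enumerates the columns of a full second-order design matrix. It is an illustration under assumptions: the helper names are hypothetical, each unordered pair (i, j) is merged into a single column, and no significance filtering is applied, whereas the fitted model retained only significant terms.

```python
from itertools import combinations_with_replacement

def quadratic_terms(names):
    """List the column labels of a full second-order response surface."""
    linear = list(names)
    # Pairs with i <= j: (i, i) gives x_i^2, (i, j) with i < j gives x_i * x_j.
    second_order = [f"{a}*{b}" for a, b in combinations_with_replacement(names, 2)]
    return ["1"] + linear + second_order

def quadratic_row(x):
    """Expand one observation x into its response-surface design row."""
    products = [a * b for a, b in combinations_with_replacement(x, 2)]
    return [1.0] + list(x) + products

labels = quadratic_terms([f"x{i}" for i in range(1, 13)])  # 12 predictors
row = quadratic_row([2.0, 3.0])   # [1, x1, x2, x1^2, x1*x2, x2^2]
```

With 12 predictors the full surface has 1 + 12 + 78 = 91 candidate columns, which is why term selection and multicollinearity diagnostics matter for this model class.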

3.3.2. Random Forest—RF

RF is an ensemble learning algorithm based on decision trees, where multiple independent trees are trained and subsequently aggregated to produce an averaged prediction (see Equation (2)) or a voting-based output (for classification: hard voting in Equation (3) and soft voting in Equation (4)) [44,45]. An RF typically builds hundreds of trees, each trained on a bootstrap sample of the dataset. At each split, only a random subset of features is considered [44]. The averaging mechanism reduces model variance and prevents any single tree from dominating the final predictions [45,46].

\hat{y} = \frac{1}{N_{trees}} \sum_{t = 1}^{N_{trees}} f_{t} (x)

(2)

\hat{y} (x) = \arg \max_{k} (\sum_{b = 1}^{B} I (T_{b} (x) = k))

(3)

\hat{y} (x) = \arg \max_{k} (\frac{1}{B} \sum_{b = 1}^{B} p_{b k} (x))

(4)
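The three aggregation rules in Equations (2) to (4) can be illustrated with a few lines of code. The per-tree outputs and class labels below are hypothetical; the sketch only shows how already-trained trees are combined, not how they are grown.

```python
from collections import Counter

def average_prediction(tree_outputs):
    """Equation (2): mean of the per-tree regression outputs."""
    return sum(tree_outputs) / len(tree_outputs)

def hard_vote(tree_labels):
    """Equation (3): the class predicted by the most trees."""
    return Counter(tree_labels).most_common(1)[0][0]

def soft_vote(tree_probabilities):
    """Equation (4): the class with the highest mean probability."""
    classes = tree_probabilities[0].keys()
    n = len(tree_probabilities)
    mean_p = {k: sum(p[k] for p in tree_probabilities) / n for k in classes}
    return max(mean_p, key=mean_p.get)

# Hypothetical outputs of three trees for one observation.
power_mw = average_prediction([11.8, 12.4, 12.1])
label = hard_vote(["Thick", "Optimal", "Thick"])
soft = soft_vote([
    {"Optimal": 0.45, "Thick": 0.55},
    {"Optimal": 0.90, "Thick": 0.10},
    {"Optimal": 0.40, "Thick": 0.60},
])
```

In this toy case hard voting returns "Thick" while soft voting returns "Optimal", because one tree is far more confident than the other two. For the regression targets of this study only the averaging rule applies.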

RF has the capability to capture complex interactions among process variables, is robust to the inherent noise present in plant data and generally mitigates overfitting without requiring aggressive hyperparameter tuning [46,47].

3.3.3. eXtreme Gradient Boosting—XGBoost—XGB

XGBoost is a Gradient Boosting-based algorithm optimized for high performance [48,49]. Each new tree is trained to correct the errors of the previous one, forming a sequential ensemble of models. It is one of the most powerful and widely used algorithms in both competitive benchmarks and real-world applications. Whereas RF trains trees in parallel, XGBoost builds trees sequentially, with each tree focusing on the most difficult observations (i.e., those with larger residuals). The general update rule is shown in Equation (5), where f_{t} (x) represents the new tree fitted on the residuals or gradients, and η denotes the learning rate controlling the contribution of each added tree.

{\hat{y}}^{(t)} = {\hat{y}}^{(t - 1)} + η f_{t} (x)

(5)
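A minimal sketch of the update rule in Equation (5) is given below, assuming squared-error loss (so the negative gradient equals the residual) and one-split regression stumps as the trees. It deliberately omits the Hessian information, regularization, and subsampling that distinguish XGBoost; all names and the toy data are illustrative.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump under squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=50, eta=0.1):
    """Apply y_hat(t) = y_hat(t-1) + eta * f_t(x) for n_rounds trees."""
    y_hat = [sum(y) / len(y)] * len(y)       # start from the mean response
    history = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, y_hat)]
        f_t = fit_stump(x, residuals)         # new tree fitted on residuals
        y_hat = [pi + eta * f_t(xi) for xi, pi in zip(x, y_hat)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / len(y))
    return y_hat, history

x = [i / 10 for i in range(20)]
y = [xi ** 2 for xi in x]                     # toy nonlinear target
y_hat, mse_history = boost(x, y)
```

The training error recorded in `mse_history` shrinks with every round; a smaller η slows this decrease, which is the usual trade-off between the learning rate and the number of trees.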

XGBoost employs second-order optimization (gradient + Hessian), which enhances numerical stability and accelerates convergence [48]. It incorporates L1/L2 regularization on leaf weights, minimum gain-based pruning, row and column subsampling, optimized handling of missing values, and parallelized split construction [50].

3.3.4. Gradient Boosting Machine—GBM

GBM is a tree-based ensemble method that constructs a strong model by combining many weak learners (typically shallow decision trees) [49], as shown in Equation (6). Unlike RF, which builds trees in parallel, GBM constructs them sequentially, with each new tree attempting to correct the errors of the previously built ensemble. It is a supervised algorithm capable of performing both regression and classification [49]. While highly flexible and powerful, it requires careful hyperparameter tuning to prevent overfitting and ensure stable performance [50,51].

\hat{y} (x) = F_{M} (x) = \sum_{m = 1}^{M} ν γ_{m} f_{m} (x); γ_{m} = \arg \min_{γ} \sum_{i = 1}^{n} L (y_{i}, {\hat{y}}_{i}); {\hat{y}}_{i} = F_{m - 1} (x_{i}) + γ f_{m} (x_{i})

(6)

where ν denotes the learning rate, f_{m} (x) represents the weak learner at iteration m, L (y_{i}, {\hat{y}}_{i}) is the loss function (measuring the discrepancy between predictions and true values), and {\hat{y}}_{i} corresponds to the model prediction.
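As a worked case of the line search in Equation (6), the step size has a closed form if one assumes a squared-error loss; the loss actually used in the study is not stated here, so this is an illustration only. Writing r_{i} for the current residual:

```latex
L(y_i, \hat{y}_i) = \tfrac{1}{2} (y_i - \hat{y}_i)^2, \qquad r_i = y_i - F_{m-1}(x_i)

\frac{\partial}{\partial \gamma} \sum_{i=1}^{n} \tfrac{1}{2} \bigl( r_i - \gamma f_m(x_i) \bigr)^2
  = - \sum_{i=1}^{n} f_m(x_i) \bigl( r_i - \gamma f_m(x_i) \bigr) = 0
\;\Longrightarrow\;
\gamma_m = \frac{\sum_{i=1}^{n} r_i \, f_m(x_i)}{\sum_{i=1}^{n} f_m(x_i)^2}
```

When f_{m} is a regression tree whose leaf values are the mean residuals of each leaf, the numerator and denominator coincide and γ_{m} = 1, so the shrinkage is governed entirely by ν.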

3.3.5. Artificial Neural Networks—ANNs

ANNs are mathematical models inspired by the functioning of the human brain [52,53]. They operate through a set of nodes (neurons) organized in layers, where each neuron transforms input signals into output signals through weights, biases, and an activation function. They are nonlinear, flexible, and universal algorithms that, if given sufficient neurons and layers, can approximate any continuous function (Universal Approximation Theorem) [52]. An ANN may contain one or multiple hidden layers, depending on the problem complexity. Each neuron in the first hidden layer performs the forward propagation described in Equation (7), and the overall response estimation is given in Equation (8) [54].

h_{j} = ϕ (\sum_{i = 1}^{p} w_{j i}^{(1)} X_{i} + b_{j}^{(1)})

(7)

\hat{y} = \sum_{j = 1}^{H} w_{j}^{(2)} h_{j} + b^{(2)}

(8)

where p denotes the number of predictor variables, H is the number of neurons in the hidden layer, and ϕ (\cdot) represents the activation function (e.g., ReLU, tanh, etc.).
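Equations (7) and (8) amount to a single forward pass, sketched below for one hidden layer with a linear output unit. The weights, layer sizes, and the choice of tanh are hypothetical values chosen so the result is easy to check by hand.

```python
import math

def forward(x, w1, b1, w2, b2, phi=math.tanh):
    """One-hidden-layer forward pass following Equations (7) and (8)."""
    # Equation (7): h_j = phi(sum_i w1[j][i] * x[i] + b1[j])
    hidden = [phi(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    # Equation (8): y_hat = sum_j w2[j] * h_j + b2 (linear output unit)
    return sum(w * h for w, h in zip(w2, hidden)) + b2

# Hypothetical weights: p = 2 inputs, H = 2 hidden neurons.
w1 = [[0.5, -0.5], [1.0, 1.0]]
b1 = [0.0, -2.0]
w2 = [2.0, 3.0]
b2 = 1.0
y_hat = forward([1.0, 1.0], w1, b1, w2, b2)
```

For the input (1, 1) both hidden pre-activations are zero, so the prediction reduces to the output bias b2.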

3.3.6. Bayesian Linear Regression—BLR

BLR was used as a probabilistic baseline to model continuous SAG responses (SAG Power and SAG Production) with explicit uncertainty quantification [55,56]. For observation i, with response y_{i} and predictors x_{i} \in R^{p}, the model assumes a Gaussian likelihood y_{i} | β, σ^{2} \sim N (x_{i}^{⊺} β, σ^{2}) [57], where β are the regression coefficients and σ^{2} is the residual variance. To enhance flexibility while preserving interpretability, the linear predictor included main effects, quadratic terms, and first-order interactions, implemented via a Patsy-compatible formula with robust handling of non-standard variable names. The specification is equivalent to Ordinary Least Squares (OLS) on an expanded basis but interpreted under Bayesian assumptions, enabling predictive intervals from the fitted distribution [55].

Model training used a fixed 70/30 Train/Test split. Performance was evaluated using RMSE, MAE, MAPE (with a stabilized formulation), and R² and adjusted R² on the training data. Internal 10-fold cross-validation was conducted within each split, reporting fold-wise RMSE, Coefficient of Variation (CV), and Out-Of-Fold (OOF) RMSE/MAE with bootstrap confidence intervals. Predictive uncertainty was assessed through 90% prediction intervals, deriving 90% coverage, Mean Prediction Interval Width at 90% (MPIW90), and average predictive entropy. Standard diagnostics included multicollinearity (Variance Inflation Factor, VIF; condition number), heteroscedasticity (Breusch–Pagan; White), residual normality (Jarque–Bera), and influence measures (Cook’s distance; leverage) [58]. Local interpretability was examined via sensitivity and elasticity analyses at the mean operating point, summarizing linear, quadratic, and interaction effects through tornado plots.

3.3.7. Bayesian Additive Regression Tree—BART

BART was used as a flexible Bayesian nonparametric model to capture nonlinear effects and high-order interactions in the SAG responses (SAG Power and SAG Production) without prescribing a parametric functional form [57,59]. For each target y, BART models the conditional mean as a sum of m regression trees, y_{i} = f (x_{i}) + ε_{i}, f (x_{i}) = \sum_{j = 1}^{m} g (x_{i}; T_{j}, M_{j}), where each g (\cdot) is a piecewise-constant tree with structure T_{j} and leaf parameters M_{j}, and ε_{i} \sim N (0, σ^{2}). The model was implemented in the PyMC library (version 5.28.0) and estimated jointly with a residual scale parameter, σ, which was assigned a robust prior σ \sim HalfStudentT (ν = 4, scale = σ_{scale}). To improve Markov Chain Monte Carlo (MCMC) stability, responses were optionally standardized using training set statistics only, y^{(z)} = (y - {\bar{y}}_{train}) / s_{train}, and posterior predictive draws were subsequently back-transformed to original units for all reported metrics. Predictor inputs were optionally z-scored using training set means and standard deviations, and the same transformation was applied to the test set. Model inference was performed via MCMC with a tuning phase and a high acceptance threshold, while the number of trees m controlled model capacity (e.g., m = 400 in the final configuration). Predictive distributions were obtained through posterior predictive sampling, using a noise-augmented predictive node y_{hat} \sim N (μ, σ) to ensure coherent uncertainty estimates [56,60].

3.3.8. Gaussian Bayesian Network—GBN

In this study, a node-wise GBN was implemented to model the conditional relationships between process variables [61,62]. The network structure was predefined rather than learned from data; that is, for each response variable, a fixed set of parent variables was specified a priori, and no structure learning algorithm (e.g., score-based or constraint-based DAG search) was applied. Consequently, the Directed Acyclic Graph (DAG) was treated as known and constant across model estimation. For each target node Y, the Conditional Probability Distribution (CPD) was defined as Y | Pa (Y) \sim N (μ (Pa (Y)), σ^{2}), where Pa(Y) denotes the predefined parent set and the conditional mean function μ (\cdot) was estimated using BART [57].

This formulation allows flexible, nonlinear and interaction-rich dependencies between predictors and the response while retaining a Gaussian likelihood assumption for the residual term [57]. The posterior distribution of σ was simultaneously inferred, enabling full predictive uncertainty quantification. Thus, the proposed approach can be interpreted as a GBN with nonlinear conditional mean functions (BART-GBN), where network structure is fixed and conditional distributions are learned through Bayesian nonparametric regression. Predictive inference was performed using posterior predictive sampling, from which point estimates, credible intervals, coverage probabilities, and predictive log-density metrics were computed [56].

3.3.9. Gaussian Process Regression—GPR

GPR was implemented as a probabilistic, nonparametric surrogate to capture nonlinear relationships between process covariates and the response variables [63,64]. For each target (e.g., SAG Power and SAG Production), the input matrix X was standardized using a StandardScaler fitted on the training set and applied to the test set to ensure a comparable scale for kernel hyperparameters. The response y was also standardized (Train statistics only) to stabilize inference; predictive means and variances were later mapped back to the nominal scale by reversing the scaler transformation (μ = μ_{s} \cdot scale + mean; Var = Var_{s} \cdot {scale}^{2}) [63].
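The back-transformation quoted above can be checked with a short sketch. The training values are hypothetical, and the scale is taken as the population standard deviation (the convention of scikit-learn's StandardScaler); the helper names are not from the study.

```python
import statistics

def fit_scaler(y_train):
    """Mean and population standard deviation from training data only."""
    return statistics.fmean(y_train), statistics.pstdev(y_train)

def to_nominal(mu_s, var_s, mean, scale):
    """Map a standardized predictive mean/variance to original units."""
    return mu_s * scale + mean, var_s * scale ** 2

y_train = [10.0, 12.0, 14.0]   # hypothetical SAG Power values (MW)
mean, scale = fit_scaler(y_train)
mu, var = to_nominal(mu_s=0.5, var_s=0.04, mean=mean, scale=scale)
```

The variance is multiplied by the squared scale because it carries squared units; applying the linear map of the mean to the variance would understate or overstate interval widths.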

To reduce the computational cost in moderately large datasets, a sparse GP approximation was adopted through a Fully Independent Training Conditional (FITC) scheme using a fixed subset of inducing points X_{u} sampled without replacement from the standardized training inputs [65,66]. The covariance function was defined with Bayesian priors on the kernel hyperparameters: an amplitude parameter η and a length-scale ℓ (optionally with Automatic Relevance Determination, ARD) were assigned Half-Normal priors, and the observation noise σ was modeled with a Half-Normal prior. The main experiments used a Matérn 5/2 kernel, which provides sufficient smoothness while allowing moderate local variability, suitable for operational process data. Inference was performed via Automatic Differentiation Variational Inference (ADVI) with an Adam optimizer. A lightweight posterior summary was obtained by drawing a limited number of variational samples and using the posterior mean of the hyperparameters as a point estimate for fast analytical prediction, returning predictive mean and variance [67].
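For reference, the Matérn 5/2 covariance can be written in a few lines. The sketch assumes an isotropic length-scale and an amplitude that enters as η²; both are common conventions but are assumptions about the configuration used in the study.

```python
import math

def matern52(x1, x2, eta=1.0, length_scale=1.0):
    """Matérn 5/2 covariance between two input vectors."""
    r = math.dist(x1, x2)                      # Euclidean distance
    a = math.sqrt(5.0) * r / length_scale
    return eta ** 2 * (1.0 + a + a ** 2 / 3.0) * math.exp(-a)

k_same = matern52([0.0, 0.0], [0.0, 0.0], eta=2.0)   # equals eta^2
k_near = matern52([0.0, 0.0], [0.5, 0.0])
k_far = matern52([0.0, 0.0], [3.0, 0.0])
```

The covariance equals η² at zero distance and decays smoothly as points move apart, with the length-scale setting how quickly operating points stop informing one another.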

3.3.10. Bayesian Neural Network—BNN

BNN was implemented to model nonlinear relationships while explicitly propagating epistemic and aleatoric uncertainty through a probabilistic weight space [68]. For each response variable (e.g., SAG Power and SAG Production), the predictor matrix X was standardized on the training subset and applied to the test subset. The response y was also standardized using training statistics only to improve numerical stability during variational inference [69,70]; all posterior predictive samples were subsequently transformed back to the nominal scale using the inverse of the fitted scaler (sample_{nom} = sample_{s} \cdot scale + mean).

The BNN architecture consisted of fully connected layers (baseline: one hidden layer with eight units) and a nonlinear activation function (tanh; ReLU available). Network parameters were treated as random variables: each weight matrix W and bias vector b was assigned independent Gaussian priors with hierarchical scale parameters, σ_{w} and σ_{b}, each modeled with Half-Normal hyperpriors [70]. The output layer produced a latent mean function μ (x), and observation noise was represented by a Half-Normal prior on σ. Conditional on μ and σ, the likelihood followed a Gaussian model y \sim N (μ, σ). Posterior inference was performed using ADVI with the Adam optimizer [69]. After convergence, draws from the variational approximation were used to generate posterior predictive distributions. For the training set, posterior predictive samples were obtained for both the observation node (capturing parameter uncertainty and observation noise) and the latent mean μ. For the test set, samples of μ were generated and optionally converted to full predictive samples by injecting Gaussian noise using posterior draws of σ, yielding \hat{y} = μ + σ \cdot ε [71].
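The noise-injection step for test predictions can be sketched as follows. The posterior draws are simulated here rather than taken from a fitted network, and the scaler values are hypothetical; the point is only to show how latent-mean draws become full predictive samples in nominal units.

```python
import random
import statistics

def predictive_samples(mu_draws, sigma_draws, mean, scale, seed=42):
    """Inject noise into latent-mean draws, then undo the y-scaling."""
    rng = random.Random(seed)
    samples = []
    for mu, sigma in zip(mu_draws, sigma_draws):
        y_s = mu + sigma * rng.gauss(0.0, 1.0)   # y_hat = mu + sigma * eps
        samples.append(y_s * scale + mean)        # back to nominal units
    return samples

# Simulated posterior draws for one test observation (standardized scale).
rng = random.Random(0)
mu_draws = [rng.gauss(0.3, 0.05) for _ in range(5000)]
sigma_draws = [abs(rng.gauss(0.2, 0.01)) for _ in range(5000)]

draws = predictive_samples(mu_draws, sigma_draws, mean=12.0, scale=1.5)
point = statistics.fmean(draws)     # predictive mean in nominal units
spread = statistics.stdev(draws)    # epistemic + aleatoric spread
```

The spread of `draws` exceeds what the latent-mean draws alone would give, since the injected noise adds the aleatoric component on top of parameter uncertainty.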

3.4. Comparative Strategy Between Models

Since the comparison covers both deterministic and stochastic ML models, the evaluation framework must assess predictive accuracy, robustness, probabilistic calibration, explanatory capacity, and uncertainty quantification using a set of statistical indicators and metrics that are comparable across both paradigms. All models were trained using a fixed random seed of 42 and were assessed using a fixed 70/30 Train–Test split with the following goodness-of-fit indicators: RMSE, MAE, MAPE, and R². The holdout test data were not used for model fitting. Additional cross-validation (internal 10-fold cross-validation performed on the training subset), OOF, and bootstrap analyses were used only as complementary robustness assessments and are reported separately from the main holdout test results. Additionally, all preprocessing decisions, hyperparameter tuning procedures, and scaling transformations were defined using the training subset only and then applied to the independent holdout test subset; Appendix A Table A1 summarizes the principal model-specific hyperparameters and structural settings used in the final implementations.
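The four goodness-of-fit indicators can be computed as in the sketch below. It uses the textbook definitions; in particular, the MAPE here is the plain form and not the stabilized formulation mentioned for the Bayesian models, whose exact expression is not given. The sample values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (%) and R2 for one set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical production values (tph) and model predictions.
m = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

RMSE and MAE are in the units of the response (tph or MW), whereas MAPE and R² are dimensionless, which is why the latter two are the natural choice when comparing across the two targets.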

Uncertainty quality was evaluated using Gaussian predictive intervals (90% PI) to compute 90% coverage and MPIW90, and predictive dispersion was summarized through the average entropy 0.5 \cdot \ln (2 π e Var). Probabilistic accuracy on the test set was further quantified using Negative Log Predictive Density (NLPD) under a normal predictive distribution and the mean predictive standard deviation ({\bar{σ}}_{pred}) from posterior predictive samples. Robustness was examined with internal K-fold CV (light configuration) reporting fold-wise RMSE/MAE, OOF aggregates, and t-based 95% confidence intervals for mean CV errors, complemented by a bootstrap RMSE estimate on the holdout test set. For GPR, local sensitivity and elasticity were approximated via finite differences in the GP predictive mean at the average operating point and summarized using tornado charts for interpretability.

Finally, model interpretability was supported through a local sensitivity/elasticity analysis computed by central finite differences on the latent mean μ. Perturbations were applied in standardized input space and then converted to nominal units using the ratio of output and input scaler factors, producing sensitivities \partial \hat{y} / \partial x and dimensionless elasticities. The results were summarized using tornado charts, while the Model Comparison Framework and Cross-Model Comparison are presented in Table 5 and Table 6, respectively.

3.5. Sensitivity Analysis—SA

SA is a tool used to understand the impact of uncertain parameters on processes and systems. It contributes to operational optimization, safety improvement, and enhanced decision making by identifying critical variables that influence outcomes. In the mining sector, the application of Sensitivity Analysis spans multiple domains, including resource characterization [72], transportation systems [73], equipment reliability [74], ventilation networks [75], waste management [76], and process design [77], among others. In mineral processing, SA helps clarify how fluctuations in input variables influence process efficiency [78]. Although it provides valuable insights into a wide range of mining processes, its limitations should be recognized: its accuracy depends on the quality of the input data and on the modeling assumptions adopted, and the complexity of mining systems can make it difficult to capture all significant variables. Despite these challenges, SA remains a powerful tool for improving the robustness and reliability of mining operations.

A posterior-based local SA was developed for all fitted models. For deterministic models, sensitivities were obtained analytically (when closed-form expressions were available) or via central finite differences, whereas for probabilistic models, sensitivities were computed across posterior samples to propagate parametric uncertainty. This unified strategy enables a coherent comparison of variable influence under a common perturbation scheme. Elasticities were computed at nominal mean operating conditions, and uncertainty was summarized via percentile-based credible intervals across posterior draws.
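A minimal sketch of the central-difference scheme is shown below for a generic prediction function. The toy response surface, its variable names, and the step size are assumptions; for a probabilistic model the same routine would be repeated for each posterior draw and summarized by percentiles.

```python
def local_sensitivity(model, x0, rel_step=1e-4):
    """Central-difference sensitivities and elasticities at point x0."""
    y0 = model(x0)
    results = []
    for i, xi in enumerate(x0):
        h = rel_step * max(abs(xi), 1.0)
        up = list(x0)
        up[i] = xi + h
        down = list(x0)
        down[i] = xi - h
        slope = (model(up) - model(down)) / (2.0 * h)   # dy/dx_i
        results.append((slope, slope * xi / y0))         # (sensitivity, elasticity)
    return results

# Hypothetical response surface standing in for a fitted model.
def toy_model(x):
    water, speed = x
    return 2.0 * water + 0.5 * speed ** 2 + 10.0

sens = local_sensitivity(toy_model, [100.0, 10.0])
```

At this operating point the speed term has the larger raw sensitivity (10 versus 2), yet the water term has the larger elasticity (about 0.77 versus 0.38), which illustrates why dimensionless elasticities are preferred when ranking variables with different units.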

3.6. Suitability Comparison by Operating Regime

The suitability analysis by operating regime aims to determine under which process conditions each model presents the best performance, considering operational variability (hydraulic, mineralogical, and mechanical). The objective is not merely to identify which model performs best overall, but rather to establish which approach is more reliable under specific operating scenarios. To operationalize this comparison, the process conditions were categorized into representative regimes based on physically and operationally meaningful criteria. Table 7 summarizes some possible factors, the associated process variables, and the proposed threshold definitions used to segment the data. These regimes enable a structured evaluation of model performance across distinct hydraulic states (e.g., Diluted, Optimal, and Thick), mineral hardness levels, liner age conditions, and internal transport/load behaviors. By defining consistent cut-off criteria (based on percentile thresholds and engineering criteria), the framework ensures comparability across models and provides a systematic basis for performance assessment in each regime.

Hydraulic regimes, which are modeled in the following sections, were defined using the solids percentage in the feed as the primary classification variable, while SAG water feed and SAG pressure were used as supporting indicators of hydraulic condition. Observations were classified as Diluted (D) when the solids percentage was below the 33rd percentile, typically with water flow rate above the 67th percentile and/or bearing pressure below the 33rd percentile; as Optimal (O) when the solids percentage was between the 33rd and 67th percentiles, together with intermediate water flow and bearing pressure conditions; and as Thick (T) when the solids percentage was above the 67th percentile, generally associated with water flow rate below the 33rd percentile and/or bearing pressure above the 67th percentile. This classification strategy was selected because it provides a reproducible framework, avoids the use of arbitrary absolute thresholds that may not be transferable across plants, and remains consistent with the concept of operation-specific operating regions derived from the actual behavior of the process.
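The percentile-based classification can be sketched as below. For brevity the sketch uses only the primary variable (solids percentage) and ignores the supporting water-flow and bearing-pressure indicators; the sample values and function names are hypothetical.

```python
import statistics

def percentile_cutoffs(values, lower=33, upper=67):
    """Return the lower and upper percentile thresholds of a variable."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower - 1], cuts[upper - 1]

def hydraulic_regime(solids_pct, p33, p67):
    """Classify one observation from its solids percentage alone."""
    if solids_pct < p33:
        return "D"   # Diluted
    if solids_pct > p67:
        return "T"   # Thick
    return "O"       # Optimal

# Hypothetical solids percentages in the feed.
solids = [68.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 78.0, 80.0]
p33, p67 = percentile_cutoffs(solids)
regimes = [hydraulic_regime(s, p33, p67) for s in solids]
```

Because the cut-offs are computed from the plant's own data, the same code yields different absolute thresholds at another site, which is the transferability argument made above.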

4. Results

4.1. Exploratory Analysis

The Spearman correlation analysis shown in Figure 2 illustrates the rank-based correlation among the main sampled variables, thereby validating logical assumptions regarding the operating dynamics of the SAG grinding process. For example, a negative correlation is observed between P₈₀ and the percentage of granulometry below 30 mm, while a positive correlation is identified with the percentage of granulometry above 100 mm. The correlation study also reveals a strong positive association between inlet water flow and production rate (tph). Furthermore, the absence of a Spearman correlation does not imply the absence of a relationship between variables; rather, it indicates that the relationship is not monotonic [79].
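The closing remark on non-monotonic relationships can be illustrated with a small self-contained Spearman implementation (Pearson correlation on average ranks). The data are synthetic and chosen only to contrast a monotonic with a U-shaped dependence.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

x = [-3, -2, -1, 0, 1, 2, 3]
monotonic = spearman(x, [v ** 3 for v in x])   # strictly increasing in x
u_shaped = spearman(x, [v ** 2 for v in x])    # exact but non-monotonic
```

The cubic relation gives a coefficient of one, while the quadratic relation gives zero even though y is fully determined by x, so a weak Spearman coefficient alone does not justify discarding a variable.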

The analysis of the SAG grinding process involves a complex and multidimensional system operating under persistent uncertainty and subject to various risk factors inherent to the problem. The best-fitting distributions for the model variables are shown in Figure 3 and Figure 4 for the independent and dependent variables, respectively. Some of the distributions providing the best fit include Student’s t distribution for P₈₀, the generalized normal distribution for SAG water feeding, and the generalized hyperbolic distribution for sump level, hardness, percentage of solids in the feeding, granulometry > 100 mm and SAG Power.

4.2. Analysis of the Comparative Experiment

Table 8 summarizes the results of the comparative experiment conducted for the two target responses—SAG Production and SAG Power—under both the training and holdout test partitions. The comparison includes deterministic (MR), ensemble tree-based (RF, XGBoost, and GBM), neural (ANN and BNN), and Bayesian probabilistic approaches (BLR, BART, GBN, and GPR). The models’ performance is evaluated using complementary accuracy indicators (RMSE, MAE, R², and MAPE) and the scale-independent metric CV_RMSE, allowing cross-model and cross-response comparability. The results are reported separately for the training and test sets to provide a transparent view of in-sample fit and out-of-sample generalization behavior across modeling paradigms.

Table 9 reports the robustness analysis based on OOF resampling, including CI for RMSE and MAE during training, alongside OOF point estimates and holdout test errors. This structure allows for a direct comparison between cross-validated stability and external generalization performance across all ML models. Table 10 complements this evaluation by focusing on probabilistic calibration and predictive dispersion for the stochastic approaches. The metrics include the 90% predictive interval coverage, mean prediction interval width (MPIW), negative log predictive density (NLPD), average predictive standard deviation (Train and Test), and differential entropy.

4.2.1. Non-Stochastic Model Analysis

Of the non-stochastic models, the MR model exhibits the worst performance. This lower performance is explained by structural limitations related to the model’s monotonic behavior when applied to complex industrial processes such as SAG milling. The operation of an SAG mill is influenced by strong coupling effects between variables and by behaviors that depend on the operating regime, which may be associated with hydraulic and/or mechanical operating conditions. MR models approximate relationships using additive terms and limited interaction effects, which typically results in underfitting when the underlying process has complex response surfaces. Therefore, MR models tend to show higher prediction errors and lower goodness-of-fit indicators compared to ML models. However, the fit indicators for MR are not poor: MR shows R² values of 92.66% and 94.83% for power and production, respectively. Additionally, the RMSE and MAPE indicators are 528.29 and 1.9480% for SAG Power, and 122.97 and 2.35% for SAG Production, respectively. Regarding the robustness analysis, the MR model shows a stable OOF performance (RMSE OOF = 487.8; 95% CI [473.9, 504.2]) with a moderate degradation on the holdout test set (RMSE_Test = 528.3, +8.3% relative to RMSE OOF). The increase in MAE is smaller (+4.5%), indicating that the loss of accuracy is mostly driven by a limited number of high-error cases in the test set rather than by a systematic bias.
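The relative degradation quoted for MR follows directly from the two RMSE values reported above, as the short check below shows; the helper name is illustrative.

```python
def relative_gap(oof_error, test_error):
    """Percentage change of the holdout error relative to the OOF error."""
    return 100.0 * (test_error - oof_error) / oof_error

# RMSE values reported in the text for the MR model.
mr_rmse_gap = relative_gap(oof_error=487.8, test_error=528.3)
```

The result rounds to the +8.3% stated in the text, and the same ratio underlies the generalization gaps reported for the other models.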

Along with MR, RF also exhibits decent performance in modeling power and production, with R² values of 91.94% and 96.30% for power and production, respectively, and RMSE and MAPE values of 553.51 and 2.01% for Power and 104.02 and 1.82% for Production. Unlike MR, RF constructs a set of decision trees trained on bootstrapped samples of the dataset, where each tree recursively partitions the feature space to represent conditional relationships between operating variables and the response. This ensemble learning strategy reduces model variance and allows the algorithm to capture higher-order interactions between variables without requiring explicit specification in the model structure. Although the training errors are substantially lower than the holdout test errors, this gap is expected for RF models with high capacity and does not necessarily imply overfitting. The proposed configuration improves both OOF and holdout test RMSE for both targets, while maintaining similar cross-fold variability, indicating better generalization rather than increased overfitting. The remaining gap between OOF and test errors is attributed to operational heterogeneity and potential distribution shift between training and test subsets.

ANN demonstrates quite competitive performance, outperforming MR and RF for both production and power, reflecting its capacity to abstract complex, nonlinear, and non-monotonic relationships between operating variables and process responses. For production, the network achieved a test RMSE of 93.09 and an R² of 97.04%, indicating a substantial improvement over the MR model and performance comparable to other ML approaches such as GBM and RF. The low MAPE value (1.43%) suggests that ANN provides stable predictions under different operating conditions, meaning that the network can capture complex interactions between variables such as PSD, solids concentration, and SAG operating conditions, which jointly influence production. A similar trend is observed in Power prediction, where the ANN achieved a test RMSE of 474.10 and an R² of 94.09%. While this performance represents a clear improvement over MR, it is still slightly lower than that of boosting-based approaches such as XGBoost and GBM. One possible explanation is that ANNs require the careful tuning of hyperparameters and larger datasets to fully leverage their representational capabilities. In industrial datasets with moderate sample sizes and heterogeneous operating regimes, optimizing the network architecture and training parameters can influence the final predictive performance. Additionally, K-fold cross-validation further supports generalization capacity, as the out-of-fold RMSE aligns closely with holdout and bootstrap-based test RMSE estimates.

The GBM model demonstrates strong predictive capability for both response variables, outperforming MR, RF and ANN. This highlights the capacity of boosting algorithms to capture complex relationships inherent in comminution processes. Unlike RF, GBM builds predictive models sequentially, with each new tree trained to minimize residual errors from the previous set. This iterative learning process allows the algorithm to progressively refine the approximation of the response surface, enabling a more accurate representation of the interactions between the operating variables of the SAG milling process. In production, GBM achieved a test RMSE of 91.73 and an R² of 97.12%, representing a substantial improvement over MR and a modest improvement over RF. The MAPE of approximately 1.57% further confirms the model’s ability to provide stable predictions under different operating conditions. Furthermore, the similarity between training and test errors suggests that the model maintains a good balance between predictive accuracy and generalizability. When modeling power, the GBM model achieved a test RMSE of 470.39 and an R² of 94.18%. This improvement highlights the ability of Gradient Boosting algorithms to capture complex response surfaces associated with mill energy consumption. The generalization gap between the cross-validated OOF error and test error remained moderate (≈ 8%–9%), indicating that model complexity was adequately controlled and that no severe overfitting effects were observed. The cross-validation results further demonstrated low dispersion across folds (CV_RMSE ≈ 0.05–0.07), supporting the robustness of the model. Additionally, the 90% coverage in the test set remained close to the nominal level, suggesting acceptable uncertainty calibration under the homoscedastic error assumption.

XGBoost exhibits the best predictive performance among the deterministic ML approaches evaluated in this study, demonstrating a superior capacity to capture the complex relationships governing SAG mill operation, such as ore properties, hydraulic conditions, and mill operating parameters. XGBoost extends the Gradient Boosting framework by incorporating several algorithmic enhancements, including regularization mechanisms, optimized tree construction, and efficient feature interaction management, enabling the model to achieve high predictive accuracy while maintaining a balance between model complexity and generalizability. For production, XGBoost achieved the lowest prediction errors among the evaluated models, with a test RMSE of 87.98 and an R² of 97.35%. A MAPE of ≈1.55% indicates that the model provides stable predictions under different operating conditions, suggesting that the boosting strategy effectively captures the nonlinear dependencies between feed characteristics and the operating variables that control milling performance. In particular, the sequential correction of residuals allows the model to progressively refine the representation of the response surface, enabling a more accurate approximation of the relationships between the explanatory variables and SAG production. A similar pattern is observed in the prediction of SAG Power, where the model achieved a test RMSE of 431.11 and an R² of 95.11%, outperforming all the other deterministic models evaluated in this study.

Additionally, XGBoost was configured to balance predictive capacity and generalization through controlled complexity and early stopping. Although in-sample training errors were substantially lower than cross-validated and holdout errors (an expected behavior for high-capacity Gradient Boosting models), the model demonstrated strong generalization performance, ranking among the best of the deterministic models. RMSE OOF values were consistently lower than those obtained with Random Forest, and holdout test errors remained within a moderate range (≈6%–10% increase relative to OOF), indicating a controlled generalization gap rather than severe overfitting. Overall, the results indicate that XGBoost provides the most accurate deterministic representation of both SAG production and power in the analyzed dataset.
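For reference, the point-accuracy indicators quoted throughout this section (RMSE, MAE, MAPE, and R²) can be computed as in the sketch below. It uses the standard textbook definitions, which are assumed to match those adopted in the study. The function name `regression_metrics` and the toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Standard point-prediction metrics; MAPE is returned in percent."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE assumes no observed value is zero.
        "mape_pct": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy observations and predictions (assumed, not plant data).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(y_true, y_pred)
```

RMSE and MAE are expressed in the units of the response, whereas MAPE and R² are dimensionless. This is why RMSE values for Power and Production are not directly comparable with each other.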

4.2.2. Stochastic Model Analysis

Among the stochastic models, BLR presents the lowest predictive performance for SAG Production among the Bayesian approaches evaluated in this study, reflecting the structural limitations of linear modeling frameworks applied to complex industrial processes. For SAG production, BLR produced a test RMSE of 124.09, a MAPE of 2.43%, and an R² of 0.9473, representing the highest prediction error among the evaluated Bayesian models. The relatively high MAE value (76.54) further indicates that the model struggles to accurately reproduce the variability observed in the operational data, which suggests that the linear structure assumed by BLR is insufficient to represent the complex interactions of the modeled process. A similar pattern is observed in SAG Power, where BLR achieved a test RMSE of 534.91, a test MAPE of 1.97% and an R² of 0.9248. Although the probabilistic formulation allows the model to quantify uncertainty in the predictions, its predictive accuracy remains limited when compared with more flexible Bayesian approaches. Despite these error measures, the BLR fit is quite robust. K-fold cross-validation reinforced this behavior, as the RMSE OOF closely matched the holdout and bootstrap-based test RMSE estimates, while the CV_RMSE remained low, confirming its limited sensitivity to data partitioning and the absence of structural overfitting. The 90% coverage of the prediction intervals on the test sets (≈0.91 for Power and ≈0.94 for Production) was close to or slightly above the nominal level, indicating reliable interval estimation without systematic undercoverage. Additionally, $\bar{\sigma}_{pred}$ showed negligible variation across datasets, supporting homoscedastic behavior under the Gaussian assumption.
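The interval-based indicators used for the stochastic models (90% coverage, MPIW90, and NLPD) can be sketched as follows, assuming a Gaussian predictive distribution with mean `mu` and standard deviation `sigma` for each observation. The definitions are the standard ones and are assumed to correspond to those used in the study. The function name and the toy values are hypothetical.

```python
import math
from statistics import NormalDist


def gaussian_interval_metrics(y_true, mu, sigma, level=0.90):
    """Coverage, mean interval width, and NLPD for Gaussian predictions."""
    # Two-sided quantile: about 1.645 for a 90% central interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    n = len(y_true)
    covered, width_sum, nlpd_sum = 0, 0.0, 0.0
    for y, m, s in zip(y_true, mu, sigma):
        lower, upper = m - z * s, m + z * s
        covered += lower <= y <= upper
        width_sum += upper - lower
        # Negative log of the Gaussian density evaluated at y.
        nlpd_sum += 0.5 * math.log(2 * math.pi * s * s) + 0.5 * ((y - m) / s) ** 2
    return {"coverage": covered / n, "mpiw": width_sum / n, "nlpd": nlpd_sum / n}


# Toy values (assumed): the last observation falls outside its interval.
y_true = [10.0, 12.0, 9.0, 15.0]
mu = [10.5, 11.0, 9.5, 11.0]
sigma = [1.0, 1.0, 1.0, 1.0]
result = gaussian_interval_metrics(y_true, mu, sigma)
```

Coverage close to the nominal 0.90 with a small MPIW90 indicates intervals that are both reliable and sharp. NLPD penalizes overconfident and overly wide predictive distributions alike, so lower values are better.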

BART, in turn, presents the lowest predictive performance for SAG Power among the stochastic approaches evaluated in this study, with a test RMSE of 590.52 and an R² of 90.83%. This reduction in performance may be associated with the higher variability and dynamic nature of energy consumption in SAG milling operations, where the interaction between mineral characteristics and mill operating conditions introduces complex response patterns that are more difficult to capture with additive tree structures. For SAG Production, by contrast, the BART model achieved a test RMSE of 89.78 and an R² of 97.24%, indicating a strong predictive capacity comparable to several deterministic ML models. The relatively low prediction errors suggest that the additive tree structure effectively captures the complex interactions between the feed characteristics and operational parameters that influence grinding performance. Taken together, these results suggest that while BART can capture complex interactions in certain operational contexts, additional modeling flexibility may be required to fully represent the variability associated with energy consumption in SAG milling processes. The generalization gap between the training and test errors remained moderate and stable, while K-fold cross-validation confirmed robustness to data partitioning, with CV_RMSE values around 0.05–0.07 and RMSE OOF closely aligned with holdout and bootstrap-based test estimates, which supports the absence of overfitting and reinforces the stability of the fitted model.

GBN presents moderate performance for both SAG Power and SAG Production. For SAG production, the GBN model achieved a test RMSE of 107.35 and an R² of 0.9606, indicating moderate predictive performance compared to other Bayesian models. This could be explained by the structural constraints imposed by the graphical model, where conditional dependencies between variables are defined through the network structure. While this representation provides valuable information about the probabilistic relationships between process variables, it may limit the model’s ability to fully capture the complex interactions present in the process. A similar pattern is observed for power prediction, where GBN obtained a test RMSE of 568.83 and an R² of 0.9149, placing it among the lowest-performing Bayesian approaches in terms of predictive accuracy. The relatively larger prediction error suggests that the Gaussian assumptions and conditional relationships embedded in the network structure may not fully represent the variability associated with energy consumption in SAG milling operations, diminishing its predictive capacity compared to more flexible ML approaches. The fact that $\bar{\sigma}_{pred}^{Train} \approx \bar{\sigma}_{pred}^{Test}$ indicates consistent uncertainty quantification outside the training data. Overall, while GBN provides a transparent probabilistic framework capable of representing conditional dependencies and uncertainty propagation within the grinding circuit, its predictive performance is generally lower than that of more flexible Bayesian models.

The GPR model shows one of the strongest predictive performances among the Bayesian approaches evaluated in this study, reflecting its capability to flexibly model complex and nonlinear relationships while explicitly accounting for predictive uncertainty. For SAG production, the GPR model achieved a test RMSE of 92.14 and an R² of 97.1%, indicating a strong predictive capability comparable to other advanced ML models. A similar behavior is observed in the prediction of Power, where GPR achieved a test RMSE of 473.83 and an R² of 94.1%. In this case, GPR represents the best-performing Bayesian model for Power prediction, outperforming BLR, BART, GBN, and BNN. The flexibility of the kernel function allows the model to represent complex interactions between ore properties and mill operating conditions that affect energy consumption. The 90% coverage remained close to the nominal level (0.924/0.888 for SAG Power and 0.948/0.938 for SAG Production), with comparable MPIW90 (≈1331–1352 for Power and ≈248–253 for Production) and consistent predictive dispersion ($\bar{\sigma}_{pred} \approx 0.9854$ for Power and $\approx 0.9740$ for Production). However, training cross-validation revealed substantially higher error than the holdout evaluation (RMSE OOF = 535.70 for Power and 135.39 for Production), suggesting sensitivity to data partitioning and possible heterogeneity across operating regimes.

Finally, the BNN fitted using ADVI demonstrates strong and consistent predictive performance across both targets. For SAG Power, the model achieves an R² of 94.3% in training and 92.8% in testing, with a moderate increase in error (RMSE: 464.3 to 527.0), indicating expected out-of-sample degradation rather than substantial overfitting. Although this performance represents an improvement relative to simpler probabilistic models such as BLR, it remains slightly inferior to more flexible approaches such as GPR. One possible explanation is that the performance of the BNN is strongly influenced by the selection of prior distributions, network architecture, and training procedure. In industrial datasets with heterogeneous operating regimes, the optimization of these elements plays a critical role in achieving the full predictive potential of the model. This interpretation is supported by cross-validation results, where the OOF performance remains stable (RMSE OOF ≈ 487.0) and exhibits low relative dispersion (CV_RMSE = 0.0468). The 90% prediction intervals are closely aligned with the nominal level (test 90% coverage = 0.906), with nearly identical interval widths between Train and Test (MPIW90 ≈ 1537), suggesting coherent uncertainty quantification (NLPD_Test = 7.699). For SAG Production, the BNN shows highly robust behavior, with nearly identical training and testing metrics (RMSE: 89.1 vs. 88.5; R² ≈ 0.972), effectively ruling out meaningful overfitting. The relatively low MAE value (46.89) suggests that the model accurately represents the variability observed in the operational data. These results demonstrate that the BNN effectively captures the system dynamics.

To summarize, the BNN provides a powerful probabilistic modeling framework capable of representing the nonlinear dynamics and complex dependence relationships of SAG milling processes while incorporating uncertainty in the predictions.

4.3. Sensitivity and Elasticity Analysis

The tornado diagrams shown in Figures 5–14 present the relative sensitivity and elasticity of each response variable to perturbations in all predictors, providing a structured representation of local robustness and parameter-driven variability.

Some key points of the Sensitivity Analysis are listed below:

Sensitivity and Elasticity Structure Across Models: Tornado diagrams reveal a clear and recurrent hierarchical structure in the influence of input variables on both SAG Power and SAG Production. Across nearly all the modeling approaches, the dominant drivers are SAG rotational speed, solids in the feed, SAG pressure, and P₈₀ in the feed. These variables consistently exhibit the largest absolute sensitivities $(\frac{\partial \hat{y}}{\partial x})$ and the highest elasticities, indicating that marginal perturbations in these inputs produce the greatest impact on system response. For SAG Power, rotational speed emerges as the most structurally stable driver, displaying large positive sensitivities across deterministic and probabilistic models. This behavior is physically consistent with SAG dynamics, where increased rotational speed elevates energy transfer and, consequently, SAG Power. The solids percentage in the feed also shows strong positive elasticities, reflecting the increase in effective load and grinding resistance as pulp density rises. Mill pressure has a significant influence on the fitted models, which can be explained by its function as an indicator of the internal load status and mill filling dynamics. For SAG Production, on the other hand, mill rotation remains significant, but the response is considerably more sensitive to feed characteristics, including solids concentration and P₈₀. This aligns with the fundamentals of SAG dynamics, where production rate is more controllable through feed adjustments, while SAG Power reflects the internal energy dissipation dynamics.
Physical Consistency and Model Behavior: Observed elasticities are physically coherent. Rotational speed maintains a positive sign in nearly all models, while solids concentration generally increases both power and production. SAG Pressure also exhibits positive elasticity in the Power models, consistent with higher internal loading conditions leading to higher energy demand. Granulometric variables (>100 mm and <30 mm fractions) display moderate sensitivities but lower elasticities, indicating a secondary yet non-negligible influence. Their impact appears model-dependent, which is expected due to interaction effects with P₈₀ and hardness. Notably, the best-performing predictive models (XGB and ANN in SAG Power; XGB and BNN in SAG Production) exhibit smoother and more structurally coherent tornado patterns: the sensitivity rankings are stable, signs are consistent, and there are no erratic oscillations. In contrast, models with weaker predictive performance (BART in SAG Power or baseline MR and BLR) show either attenuated or exaggerated elasticities, suggesting reduced structural fidelity.
Relationship with Predictive Performance and Robustness: A strong relationship emerges between structural sensitivity stability and predictive robustness. The models with the lowest test RMSE (e.g., XGB for SAG Power, with RMSE ≈ 431 and R² ≈ 0.95) present well-ordered and physically interpretable tornado diagrams. Their OOF-to-test gap remains moderate, and their sensitivity structure remains consistent between Train and Test, whereas models with larger OOF-to-test discrepancies tend to display more variable or less hierarchically organized elasticities. This suggests that sensitivity coherence may serve as an indirect indicator of generalization capability and that such models extrapolate more reliably outside the calibration sample.
Integration with Uncertainty and Calibration Metrics: The sensitivity analysis indicates a direct relationship between the variables with the highest sensitivity and the regions with the greatest predictive dispersion. For SAG Power, GPR produces relatively sharp predictive intervals (MPIW ≈ 1352) but slightly undercovers (90% coverage ≈ 0.888), indicating narrower yet more informative intervals. In contrast, BART and GBN achieve higher coverage (≈0.94–0.95) at the expense of substantially wider intervals, implying conservative uncertainty estimation. The structural ranking of dominant variables nevertheless remains consistent across probabilistic and deterministic approaches. Importantly, the inclusion of uncertainty does not alter the hierarchy of influence but quantifies the propagation of variability through the most sensitive drivers. This supports the view that the uncertainty structure is aligned with process physics rather than model instability.
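The local sensitivities and elasticities summarized in the points above can be approximated numerically for any fitted model. The sketch below uses central finite differences around a reference operating point, with the elasticity taken as $\frac{\partial \hat{y}}{\partial x}\cdot\frac{x}{\hat{y}}$. It is only an illustration: the perturbation scheme actually used for the tornado diagrams is not restated here, and `toy_power_model`, its exponents, and the operating point are hypothetical stand-ins rather than a fitted model from this study.

```python
def sensitivity_and_elasticity(predict, x, rel_step=1e-4):
    """Central-difference sensitivity and elasticity for each input in x."""
    base = predict(x)
    results = {}
    for name, value in x.items():
        step = rel_step * abs(value) if value != 0 else rel_step
        up = dict(x)
        up[name] = value + step
        down = dict(x)
        down[name] = value - step
        slope = (predict(up) - predict(down)) / (2 * step)
        results[name] = {
            "sensitivity": slope,                # d y_hat / d x
            "elasticity": slope * value / base,  # % change in y per % change in x
        }
    return results


def toy_power_model(x):
    # Hypothetical stand-in for a fitted model; exponents are assumed.
    return 2.0 * x["speed"] ** 1.5 * x["solids"] ** 0.5


operating_point = {"speed": 9.0, "solids": 70.0}
table = sensitivity_and_elasticity(toy_power_model, operating_point)
# Tornado-style ordering: largest absolute elasticity first.
ranking = sorted(table, key=lambda k: abs(table[k]["elasticity"]), reverse=True)
```

For this power-law toy model the elasticities equal the exponents (1.5 for speed and 0.5 for solids), which makes the result easy to check by hand. Elasticities are dimensionless, so they allow inputs with different units to be ranked on a common scale.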

Comparative analysis indicates that SAG Power is primarily governed by dynamic operational variables, such as SAG rotational speed, pressure, and solids concentration, while SAG Production shows stronger dependence on feed granulometry and solids percentage in the feed. Power is energy-driven, whereas production is mass flow-driven. Across all models, the tornado diagrams suggest that the system behaves as a load-sensitive nonlinear dynamic system, where marginal increases in rotational speed and solids in the feed amplify energy consumption more strongly than incremental changes in the feed granulometry.

4.4. Suitability Indicators by Hydraulic Regime

The analysis by hydraulic regime is restricted to two representative models: XGB as the best-performing non-Bayesian approach, and BNN as the strongest Bayesian model identified in the previous evaluation stage. Table 11 and Table 12 examine how predictive behavior and uncertainty characteristics vary across operational hydraulic regimes (D: Diluted, O: Optimal, and T: Thick), enabling a controlled comparison between Bayesian and non-Bayesian paradigms under different hydraulic working conditions.

4.4.1. SAG Power—Suitability by Hydraulic Regime

XGB is the stronger model across regimes. In Train, XGB achieves very low errors (RMSE ≈ 217–233) and extremely high fit (R² ≈ 0.981–0.990), while the BNN shows substantially larger errors (RMSE ≈ 490–561) and lower R² (≈0.907–0.941). The regime ranking for XGB in Train suggests Thick (T) is the easiest regime to predict (lowest RMSE 217.1, lowest NLPD 6.97, tightest $\sigma_{pred}$ 353), followed by Optimal (O), then Diluted (D). In Test, XGB remains robust: Thick again performs best (RMSE 360.1, R² 0.959, and 90% coverage 0.928), Diluted is close (RMSE 425.9 and R² 0.965), and Optimal is clearly the most challenging (RMSE 495.2, R² 0.912, and highest NLPD 7.72). That pattern hints that “Optimal” is the regime with either higher functional nonlinearity, overlap with neighboring regimes, or weaker signal-to-noise for SAG Power.

From the uncertainty/suitability side, XGBoost exhibits regime-dependent sharpness: MPIW90 varies materially (≈1161–1452) and $\sigma_{pred}$ varies (≈353–441) across regimes, as would be expected if the model’s residual dispersion changes with the operating state. The BNN, in contrast, has nearly constant MPIW90 (~1715) and $\sigma_{pred}$ (~522) across D/O/T, in both Train and Test. This indicates that the BNN uncertainty behaves like a global noise scale (or a regime-insensitive predictive variance), so its prediction intervals do not adapt to the regime difficulty; consequently, it pays a penalty in NLPD (≈7.62–7.95) and yields weaker point accuracy. In summary, Thick is the most predictable regime for XGB, Optimal is the most difficult (especially for XGB in Test), and XGB dominates BNN in both accuracy and calibrated suitability.
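A regime-stratified evaluation of this kind amounts to grouping the predictions by regime label and recomputing the indicators within each group. The sketch below assumes each record carries a regime label, an observed value `y`, a predictive mean `mu`, and a predictive standard deviation `sigma`. These field names, the toy records, and the rounded quantile `z = 1.645` are assumptions for illustration, not the study's data pipeline.

```python
import math
from collections import defaultdict


def metrics_by_regime(records, z=1.645):
    """RMSE, 90% coverage, and mean predictive sigma per regime label."""
    groups = defaultdict(list)
    for row in records:
        groups[row["regime"]].append(row)
    summary = {}
    for regime, rows in groups.items():
        n = len(rows)
        rmse = math.sqrt(sum((r["y"] - r["mu"]) ** 2 for r in rows) / n)
        # An observation is covered if it lies within mu +/- z * sigma.
        coverage = sum(abs(r["y"] - r["mu"]) <= z * r["sigma"] for r in rows) / n
        mean_sigma = sum(r["sigma"] for r in rows) / n
        summary[regime] = {"n": n, "rmse": rmse,
                           "coverage90": coverage, "mean_sigma": mean_sigma}
    return summary


# Toy records (assumed): D = Diluted, T = Thick.
records = [
    {"regime": "D", "y": 100.0, "mu": 104.0, "sigma": 3.0},
    {"regime": "D", "y": 110.0, "mu": 104.0, "sigma": 3.0},
    {"regime": "T", "y": 200.0, "mu": 201.0, "sigma": 1.0},
    {"regime": "T", "y": 205.0, "mu": 204.0, "sigma": 1.0},
]
summary = metrics_by_regime(records)
```

A model whose `mean_sigma` changes across regimes while coverage stays near 0.90 is adapting its interval width to regime difficulty. A model with constant `mean_sigma` relies on a single global noise scale.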

4.4.2. SAG Production—Suitability by Hydraulic Regime

For SAG Production, XGBoost continues to exhibit the strongest overall performance, and the regime structure remains clearly expressed. In Train, error decreases monotonically from Diluted → Optimal → Thick (RMSE 51.68 → 41.21 → 27.23) while the fit stays very high across regimes (R² 0.9930–0.9968). The suitability/uncertainty indicators are fully consistent with this ranking: NLPD improves from 5.76 → 5.36 → 4.95, and the predictive dispersion $\sigma_{pred}$ contracts sharply as the regime thickens (113.60 → 71.70 → 47.84), with corresponding reductions in MPIW90 (373.72 → 235.87 → 157.39) and average entropy (6.15 → 5.69 → 5.29). This confirms that the Thick regime is the easiest operating state to model, showing the smallest residual uncertainty and the sharpest predictive distributions, whereas Diluted is the most challenging within Train.
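As a plausibility check, the Train values of $\sigma_{pred}$, MPIW90, and average entropy quoted above appear mutually consistent with a Gaussian predictive distribution, for which MPIW90 = 2·z₀.₉₅·σ and the differential entropy is ½·ln(2πeσ²). The sketch below reproduces the reported figures from $\sigma_{pred}$ alone. It is an observation about the numbers under that Gaussian assumption, not a description of how the authors computed them.

```python
import math
from statistics import NormalDist

# Two-sided 90% interval uses the 95th percentile of the standard normal.
Z90 = NormalDist().inv_cdf(0.95)


def mpiw90(sigma):
    """Width of the central 90% interval of a Gaussian with std sigma."""
    return 2.0 * Z90 * sigma


def gaussian_entropy(sigma):
    """Differential entropy (in nats) of a Gaussian with std sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


# regime: (sigma_pred, reported MPIW90, reported entropy), as quoted in the text.
reported = {
    "Diluted": (113.60, 373.72, 6.15),
    "Optimal": (71.70, 235.87, 5.69),
    "Thick": (47.84, 157.39, 5.29),
}

check = {
    regime: {
        "mpiw90": mpiw90(sigma),
        "entropy": gaussian_entropy(sigma),
        "mpiw90_reported": width,
        "entropy_reported": entropy,
    }
    for regime, (sigma, width, entropy) in reported.items()
}
```

Under this assumption, interval width and entropy are both monotone functions of $\sigma_{pred}$. The three indicators therefore necessarily rank the regimes in the same order.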

In Test, the same qualitative regime ordering is preserved, but with the expected generalization gap, especially for Diluted, where performance degrades the most (RMSE 122.11 and R² 0.9617) compared to Optimal (RMSE 69.07 and R² 0.9815) and Thick (RMSE 57.68 and R² 0.9852). Suitability metrics remain regime-dependent and stable in interpretation: Diluted shows the highest NLPD (6.23) and the lowest sharpness (largest MPIW90 and $\sigma_{pred}$), while Thick retains the lowest NLPD (5.51) and the tightest uncertainty. The 90% coverage remains high and relatively consistent across regimes (~0.919–0.932), suggesting that the prediction intervals are broadly well-calibrated even as sharpness varies with operating state.

In summary, the Thick regime is the most predictable and least uncertain (lowest RMSE/NLPD and tightest $\sigma_{pred}$/MPIW90), Diluted is the hardest and exhibits the largest generalization loss, and XGBoost’s uncertainty proxies behave in a regime-adaptive manner, which is the desired behavior for suitability assessment by operational state.

5. Discussions

5.1. Machine Learning Performance in Industrial Context

ML approaches achieve substantially superior predictive performance for both SAG Power and SAG Production modeling (see Table 8). For SAG production, the XGB model emerged as the best-performing model, representing a marked improvement over other algorithms such as MR or RF [18]. On the testing data, XGB maintained robust generalization, demonstrating minimal overfitting despite the model’s complexity. RF achieved competitive performance, while GBM and BART showed intermediate accuracy, aligning with recent findings demonstrating XGBoost’s effectiveness for SAG mill throughput prediction [23]. The superior performance of ensemble tree-based methods (XGB) in SAG Power is consistent with previous works on SAG mill energy consumption prediction using ML, where ANN models achieved a similar fit [18]. The XGB model fitted in the present study represents a significant advancement, reducing the RMSE by approximately 34%.

Although probabilistic models demonstrated moderate predictive accuracy, these algorithms provide explicit uncertainty quantification and allow responses to be estimated under partial knowledge of the feed variables. For both response variables, GPR and the BNN achieved the best performances among the probabilistic models, and, although these models sacrifice some point prediction accuracy compared to XGB, they provide calibrated uncertainty estimates essential for risk-informed decision making in mineral processing operations [20]. The trade-off between predictive accuracy and uncertainty quantification therefore represents a fundamental consideration in model selection for industrial applications. Cross-validation analysis revealed that model robustness varies across the fitted algorithms, with CV_RMSE indicating that ensemble methods (XGB) exhibit greater stability across different data partitions. These findings emphasize the importance of evaluating model stability alongside the goodness-of-fit indicators, particularly for industrial applications where operational conditions may deviate from training and/or testing distributions [80].

Regarding uncertainty quantification (see Table 9), XGB achieved, for Power prediction, an OOF RMSE of 391.89 with 95% CI [378.65, 404.38], compared to a test RMSE of 431.11. This relatively narrow CI and modest OOF-to-test degradation (9.1%) indicate robust generalization. The superior OOF-to-test consistency of probabilistic models suggests that explicit uncertainty modeling may enhance robustness to distributional shift, a critical consideration for industrial applications where mineral feed characteristics and operational conditions change continuously [21]. For SAG production, the robustness patterns differed notably: XGB had an OOF RMSE of 75.05 [72.18, 77.92] and a test RMSE of 87.98 (17.2% degradation), showing greater sensitivity to test set conditions than that observed for Power prediction. The BNN exhibited an OOF RMSE of 77.13 [74.49, 79.77] with a test RMSE of 88.50 (14.7% degradation). The larger OOF-to-test gaps for production prediction suggest that throughput is more sensitive to operational regime variations or ore characteristic changes not fully captured in cross-validation. This observation aligns with fundamental grinding theory, where the production rate depends critically on ore breakage characteristics that may exhibit greater temporal variability than the power-load relations [81].
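The OOF-to-test degradation percentages quoted above can be reproduced from the reported RMSE values. The sketch below computes the gap relative to both possible denominators, since the text does not state which one is used. With the reported figures, the 17.2% and 14.7% values for Production match the gap divided by the OOF RMSE, whereas the 9.1% value for Power matches the gap divided by the test RMSE (dividing by the OOF RMSE gives about 10.0%). The function name `relative_gap` is hypothetical.

```python
def relative_gap(oof_rmse, test_rmse):
    """OOF-to-test RMSE gap in percent, relative to each denominator."""
    diff = test_rmse - oof_rmse
    return {
        "vs_oof_pct": 100.0 * diff / oof_rmse,
        "vs_test_pct": 100.0 * diff / test_rmse,
    }


# (OOF RMSE, test RMSE) pairs as reported in the text.
cases = {
    "XGB Power": (391.89, 431.11),
    "XGB Production": (75.05, 87.98),
    "BNN Production": (77.13, 88.50),
}
gaps = {name: relative_gap(oof, test) for name, (oof, test) in cases.items()}
```

When comparing degradation across models or responses, the same denominator should be used throughout. The two conventions differ by about one to two percentage points at these gap sizes.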

The suitability analysis by hydraulic regime (see Table 10) reveals distinct calibration characteristics across operational regimes. For SAG Power prediction in the test set, XGB achieved 90% coverage of 0.8951, 0.9058 and 0.9278 in D, O and T regimes, respectively, with a corresponding MPIW90 of 1451.56, 1231.49, and 1161.48. This decrease in the interval width from the Diluted to Thick regimes indicates that prediction uncertainty varies with operational conditions, with the Thick regime exhibiting the most predictable behavior. On the other hand, the BNN showed more consistent 90% coverage across regimes (≈0.9) but with substantially wider intervals (MPIW90 ≈ 1713–1720), suggesting that BNN uncertainty estimates are less adaptive to regime-specific characteristics. Additionally, NLPD values for XGB were consistently lower than those of the BNN, confirming superior probabilistic calibration despite XGB not being explicitly designed for uncertainty quantification [82].

For SAG production prediction, the regime-dependent uncertainty patterns were even more pronounced. XGB achieved a test 90% coverage of 0.9188 (D), 0.9316 (O), and 0.9304 (T), with MPIW90 decreasing from 373.72 to 235.87 to 157.39 across regimes. This reduction in interval width from the Diluted to Thick regimes indicates that production prediction uncertainty is strongly regime-dependent, with the Thick regime offering greater predictability. The BNN showed a 90% coverage of 0.8727, 0.9715, and 0.9563 for the D/O/T regimes, respectively, and a nearly constant MPIW90 (≈300), confirming that its uncertainty estimates do not adapt to regime-specific heteroscedasticity. NLPD values for XGB (6.23, 5.66, 5.51) decreased systematically from Diluted to Thick, while the BNN maintained relatively constant values, further supporting the conclusion that XGB captures regime-dependent uncertainty structure more effectively than the BNN in this application [83,84].

5.2. Regime-Based Suitability Interpretation

The hydraulic regime-stratified analysis shown in Table 11 and Table 12 reveals fundamental differences in the models’ performance across operational conditions defined by hydraulic regime classification. For SAG Power prediction, XGB demonstrated regime-dependent accuracy with a test RMSE of 425.93 (D), 495.20 (O), and 360.06 (T), and an R² of 0.9650, 0.9123, and 0.9594, respectively. The Optimal regime exhibited the poorest predictive performance, which suggests that this regime could be characterized by transitional dynamics, charge motion instabilities, or greater sensitivity to mineral characteristics in the feed. The Thick regime demonstrated the best predictive performance across both responses, which indicates that dense pulp conditions provide more stable and predictable milling dynamics [9]. In the case of Production, the regime hierarchy was more systematic, showing a monotonic improvement from the Diluted to Thick regimes. This suggests that production rate prediction benefits from the reduced slurry transport resistance and enhanced discharge efficiency characteristic of thicker pulp conditions. Such behavior is consistent with fundamental comminution theory, where thick pulp operation enhances particle–particle interactions and reduces the influence of water rheology on discharge dynamics [22].

Finally, the BNN exhibited contrasting regime-dependent behavior. In Power prediction, the BNN’s test RMSE varied from 583.94 (D) to 530.64 (O) to 646.43 (T), with R² values of 0.9321, 0.9167, and 0.8554, respectively. The degradation in the Thick regime performance is counterintuitive given that this regime showed the best performance for XGB, suggesting that the BNN may struggle to capture the specific nonlinear relationships governing Power under dense pulp conditions. However, in the case of SAG Production, the BNN showed more consistent performance across regimes (test RMSE of 124.83, 52.43, and 65.30 for D/O/T), but with R² values (0.9591, 0.9872, and 0.9825) that did not exhibit the systematic improvement observed for XGB. Therefore, while the BNN provides valuable uncertainty quantification, its predictive accuracy may be less robust to regime-specific dynamics than Gradient Boosting approaches [85].

5.3. Sensitivity Analysis and Physical Consistency

Sensitivity and elasticity analysis indicates that the models’ predictions exhibit physically coherent responses to operational variables, validating the fitted models against fundamental grinding mechanics. For SAG Power, all ML models demonstrated positive elasticity with respect to mill load, rotational speed, and feed rate, consistent with established Power draw relationships. XGB showed the strongest sensitivity to SAG pressure, followed by rotational speed and feed rate [16]. In the case of production, the sensitivity structure differed notably. Feed rate emerged as the dominant influence, followed by mill speed and hardness. The negative elasticity with respect to hardness indices confirms that the models correctly capture the inverse relationship between ore competence and throughput, a fundamental principle of comminution theory. The positive elasticity of water addition is consistent with the role of pulp density in facilitating mechanisms such as discharge and transport through the mill grates. These sensitivity patterns indicate that ML models recover known process relationships without explicit mechanistic formulation, supporting their use for process optimization and control applications [20]. Tornado diagrams (see Figures 5–14) demonstrate that sensitivity rankings are consistent across model types, with dynamic load variables (bearing pressure and mill load) and pulp density variables (water addition and solids percentage) dominating the predictions of both responses [16].

5.4. Limitations and Future Research Directions

Some limitations of the present study are enumerated as follows:

The analysis is based on data from a single industrial SAG mill, which limits the generalizability of the performance metrics and optimal model architectures to other sites with different ore types, mill geometries, or operational practices; the work should therefore be interpreted as a case study. The comparative performance observed among the evaluated models reflects the data structure, instrumentation, ore variability, and operating conditions of the analyzed plant. Accordingly, the results are not intended to establish a universal hierarchy, but rather to provide operation-specific evidence and a comparative framework that can be further tested in other SAG milling contexts. The development of transfer learning approaches that adapt models trained on one site to new locations represents a promising research direction [86].
The ML algorithms and hyperparameters evaluated represent only a subset of the available methods; emerging techniques such as physics-informed ANNs and hybrid models that combine phenomenological and data-driven components warrant investigation. Deep Learning architectures designed for time series prediction may offer advantages for capturing long-term dependencies and improving predictions during transient conditions [36].
The regime classification employed was based on hydraulic characteristics derived from pulp density and discharge flow measurements. Alternative regime definitions based on charge motion patterns may provide different insights into model performance and operational suitability [9].
This study focused on steady-state or quasi-steady-state prediction without explicitly addressing transient dynamics, startup/shutdown procedures, or fault detection and diagnosis. Extending the modeling framework to handle dynamic transitions, predict mill overload events, and detect abnormal conditions therefore represents an interesting direction [87].

Both the literature and the results of the present work indicate that no single modeling approach satisfies all the requirements for digital twin functionality: predictive accuracy, computational efficiency, uncertainty quantification, physical interpretability, and robustness to variations in feed parameters. Instead, a hybrid architecture integrating multiple modeling paradigms (some of them fitted in this work) offers the most promising pathway toward operational digital twins [88]. For real-time prediction and monitoring applications, XGB emerges as the optimal choice in this work, providing superior accuracy with minimal computational overhead. The model’s robustness to missing data and measurement noise, combined with its ability to handle high-dimensional feature spaces without extensive preprocessing, makes it well-suited for integration with existing SCADA infrastructure. Additionally, CV_RMSE indicates that XGB maintains performance across diverse operational conditions, a critical requirement for continuous industrial deployment. On the other hand, while Bayesian algorithms sacrifice some predictive accuracy compared to XGB, their graphical structure provides transparency regarding assumed process relationships and enables probabilistic inference across multiple variables simultaneously [89]. The integration of Bayesian approaches with data-driven models provides a framework that allows for the incorporation of expert knowledge, the validation of learned relationships against the fundamentals of process theory, and the identification of conditions where model predictions could be unreliable [85].

The regime-stratified performance analysis suggests that regime-adaptive strategies may offer improved performance compared to single global controllers. Recent works in milling circuits have also demonstrated the value of explicit uncertainty consideration in maintaining setpoint tracking under hardness disturbances, a capability that could be enhanced through integration with the probabilistic modeling approaches evaluated in this study [88].

6. Conclusions

SAG modeling has been evolving toward hybrid approaches that combine the interpretability of physical models with the predictive performance of ML algorithms and the uncertainty management capabilities of stochastic methods. The results of this case study suggest that ML-based approaches can provide relevant operational benefits, while hybrid and probabilistic approaches may offer additional value for uncertainty-aware decision making and longer-term digitalization strategies. Successful deployment therefore requires a robust data infrastructure, the involvement of domain experts, and systematic validation frameworks, and the most successful implementations will be those that leverage the strengths of each modeling paradigm while addressing their individual limitations through integration. This study presented a comparison of several ML approaches for modeling SAG Power and Production, showing that, within the conditions of this case study, XGB achieved the best predictive performance for both response variables, outperforming traditional MR and other ML algorithms including RF, GBMs, and ANNs in the analyzed dataset. These findings should be interpreted as results derived from a specific case study based on a single industrial SAG mill, rather than as a universal hierarchy of model performance for the mining industry.

The robustness analysis indicated that XGB maintained favorable generalization performance under the analyzed training and testing partitions, showing only modest degradation from training to test performance, while cross-validation indicated that XGB exhibits consistent performance across diverse operational conditions. The inclusion of probabilistic frameworks helps address an important gap in SAG modeling by incorporating explicit uncertainty quantification alongside point predictions, and although these approaches sacrifice some predictive accuracy, they provide calibrated uncertainty estimates essential for risk-informed decision-making. In addition, uncertainty analysis revealed that prediction intervals vary with operational conditions, with the interval widths decreasing from the Diluted to Thick regimes, which could enable adaptive control strategies that adjust intervention aggressiveness based on prediction confidence.
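To make the regime dependence of prediction intervals concrete, the sketch below summarizes empirical coverage and mean interval width per operating regime. It assumes that lower and upper interval bounds are already available from a probabilistic model; the function name `interval_summary_by_regime` and the toy numbers are hypothetical and are not results from this study.

```python
from collections import defaultdict


def interval_summary_by_regime(y_true, lower, upper, regimes):
    """Return {regime: (empirical coverage, mean interval width)}."""
    hits = defaultdict(list)
    widths = defaultdict(list)
    for t, lo, hi, r in zip(y_true, lower, upper, regimes):
        # A hit means the observed value falls inside its prediction interval.
        hits[r].append(1.0 if lo <= t <= hi else 0.0)
        widths[r].append(hi - lo)
    return {
        r: (sum(hits[r]) / len(hits[r]), sum(widths[r]) / len(widths[r]))
        for r in hits
    }


# Toy values for illustration only (hypothetical labels "D", "O", "T").
y_true = [100.0, 105.0, 110.0, 115.0, 120.0, 125.0]
lower = [90.0, 95.0, 104.0, 116.0, 117.0, 123.0]
upper = [110.0, 115.0, 114.0, 120.0, 123.0, 127.0]
regimes = ["D", "D", "O", "O", "T", "T"]

summary = interval_summary_by_regime(y_true, lower, upper, regimes)
for regime, (coverage, width) in summary.items():
    print(regime, "coverage:", coverage, "mean width:", width)
```

Narrower intervals are only informative when coverage stays close to the nominal level, so both quantities are reported together.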

The analysis by operating regime reveals differences in predictive performance. For SAG Power, XGB obtained test RMSE values of 426 (D), 495 (O), and 360 (T), showing that the Optimal regime had the worst performance despite operating under intermediate hydraulic conditions. Conversely, SAG Production showed monotonically improving performance, with RMSE decreasing from Dilute (122) to Optimal (68) and Thick (58), indicating that production with denser pulp exhibits more predictable grinding dynamics. This indicates that operating regimes have a considerable impact on the abstraction capacity of each algorithm, which in turn supports the value of adaptive modeling strategies. Subsequently, the physical consistency of the fitted models and their predictions was validated through SA, with elasticity parameters aligned with the fundamentals of milling. For SAG Power prediction, pressure, rotational speed, and feed rate emerged as dominant influences, while for SAG Production, feed rate, mill speed, and hardness showed the strongest effects, confirming that the models accurately capture the inverse relationship between system variables/parameters and performance.
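One common way to obtain elasticity-type sensitivities from a fitted model is a finite-difference approximation of (∂y/∂x_i)(x_i/y) around a reference operating point. The sketch below illustrates that idea under stated assumptions: the exact SA procedure used in this study is not restated here, and `elasticity` and the power-law stand-in `toy_power` are hypothetical names used only for illustration.

```python
def elasticity(predict, x0, index, rel_step=0.01):
    """Approximate (dy/dx_i) * (x_i / y) at x0 with a central difference."""
    h = rel_step * abs(x0[index])
    if h == 0.0:
        raise ValueError("elasticity is undefined at a zero input value")
    x_up = list(x0)
    x_dn = list(x0)
    x_up[index] += h
    x_dn[index] -= h
    slope = (predict(x_up) - predict(x_dn)) / (2.0 * h)
    return slope * x0[index] / predict(x0)


# Hypothetical stand-in for a fitted model: y = 3 * speed^2 * pressure^0.5.
# For a power law, the elasticity of each input equals its exponent.
def toy_power(x):
    speed, pressure = x
    return 3.0 * speed ** 2 * pressure ** 0.5


x_ref = [9.0, 4.0]  # arbitrary reference point, not plant data
e_speed = elasticity(toy_power, x_ref, 0)
e_pressure = elasticity(toy_power, x_ref, 1)
print(round(e_speed, 4), round(e_pressure, 4))
```

An elasticity of 2 means that a 1% increase in the input changes the response by roughly 2%, which makes inputs with different units directly comparable.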

The accurate prediction of SAG Power enables real-time energy management, load balancing between different circuit settings, and the optimization of grinding circuits to minimize energy consumption, which could have important implications for energy efficiency in the mining industry. Production rate forecasting could support integration, production planning, blending strategies, and coordination with downstream flotation circuits. Therefore, for practitioners selecting modeling approaches, this study provides case-based evidence that may support model selection under conditions similar to those analyzed here.

Finally, the development of functional digital twins requires the integration of hybrid architectures that consider multiple algorithms, balancing precision, interpretability, and the quantification and incorporation of uncertainty arising from partial knowledge of the input variables. Along these lines, future research should focus on validating the models in other mining operations, or, failing that, validating the modeling strategies, considering variables such as diverse feed characteristics, plant configurations, and the integration of advanced detection technologies to improve feed information. These advances would support the transition of operations from a reactive to a predictive stage and, ultimately, toward the autonomy of mineral processing plants, a matter that could be framed within the smart mining paradigm. The incorporation of technologies such as digital twins in mineral concentration plants may contribute to improved productivity, more informed operational optimization, enhanced sustainability, and progressive advances toward higher levels of autonomy.

Author Contributions

Conceptualization, M.S., N.T. and L.A.C.; methodology, M.S. and L.A.C.; software, M.S.; validation, E.G., M.S.-C., E.S.-R., J.C., D.A. and L.A.C.; formal analysis, M.S., R.G.S.-M. and L.A.C.; investigation, M.S., E.G., M.S.-C., E.S.-R., R.G.S.-M., J.C., D.A. and L.A.C.; resources, E.G., M.S.-C., E.S.-R. and J.C.; data curation, M.S. and R.G.S.-M.; writing—original draft preparation, M.S.; writing—review and editing, M.S.; visualization, M.S.; supervision, N.T., D.A. and L.A.C.; project administration, N.T. and L.A.C.; and funding acquisition, M.S. and E.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by ANID-Chile, ANID/Fondecyt 1240182.

Data Availability Statement

Data available on request from the authors.

Acknowledgments

Manuel Saldana acknowledges the infrastructure and support of the Doctorado en Ingeniería de Procesos de Minerales of the Universidad de Antofagasta.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Hyperparameter tuning.

Model	Final Hyperparameters/Structural Settings
MR	OLS with a second-order response surface formulation, including main effects, quadratic terms, and first-order interactions.
RFR	n_estimators = 1000; max_depth = None; min_samples_split = 4; min_samples_leaf = 3; max_features = 0.6; bootstrap = True; max_samples = 0.8.
XGB	n_estimators = 4000 (the maximum number of trees was high only as a ceiling, while the effective adjustment was controlled by early stopping); learning_rate = 0.03; max_depth = 4; min_child_weight = 10; subsample = 0.75; colsample_bytree = 0.65; reg_lambda = 6.0; reg_alpha = 0.2; gamma = 0.2; objective = “reg:squarederror”; tree_method = “hist”; early stopping with es_rounds = 100 and es_val_size = 0.15.
GBM	n_estimators = 1200; learning_rate = 0.03; max_depth = 4; min_samples_split = 80; min_samples_leaf = 40; subsample = 0.7; max_features = 0.6; loss = “squared_error”.
ANN–MLP	SAG Power: hidden_layer_sizes = (128, 32); activation = “relu”; solver = “lbfgs”; alpha = 2 × 10⁻¹; early_stopping = False; tol = 1 × 10⁻⁶; max_iter = 2000. SAG Production: hidden_layer_sizes = (128, 64); activation = “relu”; solver = “adam”; learning_rate_init = 1 × 10⁻³; alpha = 5 × 10⁻³; batch_size = 128; early_stopping = True; validation_fraction = 0.15; n_iter_no_change = 30; tol = 1 × 10⁻⁴; max_iter = 3500. The final architectures were selected empirically for each response. The goal was to optimize stability and generalization for each target, not to impose a single architecture.
BLR	BayesianRidge with fit_intercept = False; second-order formulation with main effects, quadratic terms, and first-order interactions; max_iter = 4000; tol = 1 × 10⁻⁵; alpha_1 = alpha_2 = 1 × 10⁻⁶; lambda_1 = lambda_2 = 1 × 10⁻⁶; compute_score = True. Posterior pruning: PRUNE_ALPHA = 0.10; PRUNE_DRAWS = 2000; KEEP_HIERARCHY = True; MIN_ABS_MEAN = 1 × 10⁻³; PRUNE_SPLIT = 0.10.
BART	m_trees = 400; draws = 2000; tune = 2000; target_accept = 0.95; chains = 4; cores = 4; standardize_X = True; standardize_y = True; add_noise_ppc = True. Lighter CV setup: cv_draws = 200; cv_tune = 100.
GBN	Predefined network structure with BART-based conditional mean functions; m_trees = 400; draws = 1000; tune = 1000; target_accept = 0.95; chains = 4; standardize_X = True; k_params_mode = m_trees + 1. Lighter CV setup: cv_m_trees = 100; cv_draws = 150; cv_tune = 150; cv_target_accept = 0.90.
GPR	Sparse variational GP with FITC approximation; kernel = “matern52”; ard = True; sigma_prior = 2.0; jitter = 1 × 10⁻⁶; n_inducing = 400; advi_iters = 10000; advi_lr = 1 × 10⁻²; draws_post = 200. Lighter CV setup: cv_n_inducing = 60; cv_advi_iters = 800; cv_advi_lr = 5 × 10⁻³; cv_draws_post = 50.
BNN	Single hidden layer with hidden_layers = (8,); activation = “tanh”; advi_iters = 12000; advi_lr = 5 × 10⁻³; draws_post = 800; add_noise_test = True. Priors: sigma_w = sigma_b ~ HalfNormal(0.5); sigma~HalfNormal(1.0). Sensitivity settings: sensitivity_ci_alpha = 0.10; sensitivity_draws = 500.
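The second-order response surface formulation listed for MR and BLR in Table A1 (main effects, quadratic terms, and interactions) can be illustrated with a small feature-expansion sketch. It assumes that "first-order interactions" means pairwise products of distinct inputs; the function names and the example variable names are hypothetical, and the intercept handling of the original models is not reproduced.

```python
from itertools import combinations


def second_order_terms(names):
    """List term names: main effects, squares, and pairwise interactions."""
    main = list(names)
    quad = [f"{n}^2" for n in names]
    inter = [f"{a}*{b}" for a, b in combinations(names, 2)]
    return main + quad + inter


def second_order_row(x):
    """Expand one observation into the matching second-order feature row."""
    main = list(x)
    quad = [v * v for v in x]
    inter = [a * b for a, b in combinations(x, 2)]
    return main + quad + inter


# Hypothetical input names and values, for illustration only.
names = ["speed", "pressure", "feed"]
terms = second_order_terms(names)
row = second_order_row([2.0, 3.0, 5.0])
for term, value in zip(terms, row):
    print(term, value)
```

For p inputs the expansion yields 2p + p(p − 1)/2 columns, so the design matrix grows quadratically with the number of operating variables.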

References

1. Baawuah, E.; Kelsey, C.; Addai-Mensah, J.; Skinner, W. Comparison of the Performance of Different Comminution Technologies in Terms of Energy Efficiency and Mineral Liberation. Miner. Eng. 2020, 156, 106454.
2. Salazar, J.L.; Magne, L.; Acuña, G.; Cubillos, F. Dynamic Modelling and Simulation of Semi-Autogenous Mills. Miner. Eng. 2009, 22, 70–77.
3. Rybinski, E.; Ghersi, J.; Davila, F.; Linares, J.; Valery, W.; Jankovic, A.; Valle, R.; Dikmen, S. Optimisation and Continuous Improvement of Antamina Comminution Circuit. In Proceedings of the SAG Conference, Vancouver, BC, Canada, 25 November 2011; pp. 1–19.
4. Beloglazov, I.I.; Petrov, P.A.; Bazhin, V.Y. The Concept of Digital Twins for Tech Operator Training Simulator Design for Mining and Processing Industry. Eurasian Min. 2020, 2020, 50–54.
5. Hasidi, O.; Abdelwahed, E.H.; Qazdar, A.; Boulaamail, A.; Krafi, M.; Benzakour, I.; Bourzeix, F.; Baïna, S.; Baïna, K.; Cherkaoui, M.; et al. Digital Twins-Based Smart Monitoring and Optimisation of Mineral Processing Industry. Commun. Comput. Inf. Sci. CCIS 2022, 1677, 411–424.
6. Avalos, S.; Kracht, W.; Ortiz, J.M. Machine Learning and Deep Learning Methods in Mining Operations: A Data-Driven SAG Mill Energy Consumption Prediction Application. Min. Metall. Explor. 2020, 37, 1197–1212.
7. Gutiérrez, A.; Ahues, D.; González, F.; Merino, P. Simulation of Material Transport in a SAG Mill with Different Geometric Lifter and Pulp Lifter Attributes Using DEM. Min. Metall. Explor. 2019, 36, 431–440.
8. Weerasekara, N.S.; Powell, M.S. Performance Characterisation of AG/SAG Mill Pulp Lifters Using CFD Techniques. Miner. Eng. 2014, 63, 118–124.
9. Lopez, P.; Reyes, I.; Risso, N.; Momayez, M.; Zhang, J. Machine Learning Algorithms for Semi-Autogenous Grinding Mill Operational Regions’ Identification. Minerals 2023, 13, 1360.
10. Feng, Y.; Wang, X.; Zou, H.; Yan, L. A Composite Power Prediction Model for Semi-autogenous Grinding Mill Based on Mechanistic Approach and XGBoost. In Proceedings of the 37th Chinese Control and Decision Conference, CCDC, Xiamen, China, 16–19 May 2025; pp. 557–564.
11. Ghasemi, Z.; Neshat, M.; Aldrich, C.; Karageorgos, J.; Zanin, M.; Neumann, F.; Chen, L. An Integrated Intelligent Framework for Maximising SAG Mill Throughput: Incorporating Expert Knowledge, Machine Learning and Evolutionary Algorithms for Parameter Optimisation. Miner. Eng. 2024, 212, 108733.
12. Lopez, P.; Reyes, I.; Risso, N.; Aguilera, C.; Campos, P.G.; Momayez, M.; Contreras, D. Assessing Machine Learning and Deep Learning-Based Approaches for SAG Mill Energy Consumption. In 2021 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2021; IEEE: Piscataway, NJ, USA, 2021.
13. Liao, Z.; Xu, C.; Chen, W.; Chen, Q.; Wang, F.; She, J. Effective Throughput Optimization of SAG Milling Process Based on BPNN and Genetic Algorithm. In Proceedings—2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems, ICPS 2023; IEEE: Piscataway, NJ, USA, 2023.
14. Valencia, J.V.; Vargas, F. A Probabilistic Graphical Model for Semi-Autogenous Grinding Processes. In Proceedings—IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, ChileCon; IEEE: Piscataway, NJ, USA, 2023.
15. Jana, D.; Kumar, D.; Gupta, S.; Pal, S.; Ghosh, S. Bayesian Network Approach for Studying the Operational Reliability and Remaining Useful Life. J. Reliab. Stat. Stud. 2023, 16, 373–392.
16. Lucay, F.A. Accelerating Global Sensitivity Analysis via Supervised Machine Learning Tools: Case Studies for Mineral Processing Models. Minerals 2022, 12, 750.
17. Salazar, J.L.; Valdés-González, H.; Cubillos, F. Advanced Simulation for Semi-Autogenous Mill Systems: A Simplified Models Approach. In Dynamic Modelling; InTech: London, UK, 2010.
18. Saldaña, M.; Gálvez, E.; Navarra, A.; Toro, N.; Cisternas, L.A. Optimization of the SAG Grinding Process Using Statistical Analysis and Machine Learning: A Case Study of the Chilean Copper Mining Industry. Materials 2023, 16, 3220.
19. Ghasemi, Z.; Neumann, F.; Zanin, M.; Karageorgos, J.; Chen, L. A Comparative Study of Prediction Methods for Semi-Autogenous Grinding Mill Throughput. Miner. Eng. 2024, 205, 108458.
20. Mamani-Quiñonez, O.; Cisternas, L.A.; Lopez-Arenas, T.; Lucay, F.A. Control Structure Design Using Global Sensitivity Analysis for Mineral Processes under Uncertainties. Minerals 2022, 12, 736.
21. Välikangas, H.; Ohenoja, M.; Brochot, S.; Fernández, M.G.; Ruuska, J.; Ruusunen, M. Evaluation of Model Uncertainty Propagation in Mineral Process Flowsheet Designs. Scand. Simul. Soc. 2025, 211, 456–463.
22. Yuwen, C.; Sun, B.; Liu, S. A Dynamic Model for a Class of Semi-Autogenous Mill Systems. IEEE Access 2020, 8, 98460–98470.
23. Ghasemi, Z.; Neshat, M.; Aldrich, C.; Karageorgos, J.; Zanin, M.; Neumann, F.; Chen, L. Enhanced Genetic Programming Models with Multiple Equations for Accurate Semi-Autogenous Grinding Mill Throughput Prediction. In 2024 IEEE Congress on Evolutionary Computation, CEC 2024—Proceedings; IEEE: Piscataway, NJ, USA, 2024.
24. Jayasundara, C.T.; Zhu, H.P. Predicting Liner Wear of Ball Mills Using Discrete Element Method and Artificial Neural Network. Chem. Eng. Res. Des. 2022, 182, 438–447.
25. Gupta, A.; Mishra, B.K. Multi-Head Neural Networks for Simulating Particle Breakage Dynamics. Theor. Appl. Mech. Lett. 2024, 14, 100515.
26. Lu, S.; Zhou, P.; Chai, T.; Dai, W. Modeling and Simulation of Whole Ball Mill Grinding Plant for Integrated Control. IEEE Trans. Autom. Sci. Eng. 2014, 11, 1004–1019.
27. Dai, W.; Liu, Q.; Chai, T. Particle Size Estimate of Grinding Processes Using Random Vector Functional Link Networks with Improved Robustness. Neurocomputing 2015, 169, 361–372.
28. Li, Y. Modelling Tumbling Ball Milling Based on DEM Simulation and Machine Learning. Ph.D. Thesis, The University of New South Wales, Sydney, Australia, 2023.
29. Jayasundara, C.T.; Zhu, H.P. Impact Energy of Particles in Ball Mills Based on DEM Simulations and Data-Driven Approach. Powder Technol. 2022, 395, 226–234.
30. Doroszuk, B. Data-Driven Insight into Ball Mill Scaling Unveiling Differences Across Scales Through Computer Vision, Numerical Simulations, and Design of Experiments. Ph.D. Thesis, Wroclaw University of Science and Technology, Wroclaw, Poland, 2024.
31. Rhein, F.; Hibbe, L.; Nirschl, H. Hybrid Modeling of Hetero-Agglomeration Processes: A Framework for Model Selection and Arrangement. Eng. Comput. 2023, 40, 583–604.
32. Lu, M.; Xia, Y.; Bhattacharjee, T.; Klinger, J.; Li, Z. Predicting Biomass Comminution: Physical Experiment, Population Balance Model, and Deep Learning. Powder Technol. 2024, 441, 119830.
33. Yang, J.; Zou, G.; Zhou, J.; Wang, Q.; Song, T.; Li, K. Hybrid Modeling and Simulation of the Grinding and Classification Process Driven by Multi-Source Compensation. Minerals 2024, 14, 1019.
34. Metta, N.; Ramachandran, R.; Ierapetritou, M. A Computationally Efficient Surrogate-Based Reduction of a Multiscale Comill Process Model. J. Pharm. Innov. 2020, 15, 424–444.
35. Ruiz, M.A.V.; Gonzales, J.A.V.; Villalba, F.J.B. Multivariable Predictive Models for the Estimation of Power Consumption (KW) of a Semi-Autogenous Mill Applying Machine Learning Algorithms. J. Energy Environ. Sci. 2024, 8, 14–31.
36. Zhang, D.; Xiong, X.; Shao, C.; Zeng, Y.; Ma, J. Semi-Autogenous Mill Power Consumption Prediction Based on CACN-LSTM. Appl. Sci. 2024, 15, 2.
37. Pural, Y.E.; Ledezma, T.; Hilden, M.; Forbes, G.; Boylu, F.; Yahyaei, M. Application of Machine Learning for Generic Mill Liner Wear Prediction in Semi-Autogenous Grinding (SAG) Mills. Minerals 2024, 14, 1200.
38. Harenberg, D.; Marelli, S.; Sudret, B.; Winschel, V. Uncertainty Quantification and Global Sensitivity Analysis for Economic Models. Quant. Econom. 2019, 10, 1–41.
39. Gao, H.; Huo, X.; Zhu, C. An Input Module of Deep Learning for the Analysis of Time Series with Unequal Length. In Proceedings—2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems, ICPS 2023; IEEE: Piscataway, NJ, USA, 2023.
40. Guillermo, A. Mineral Movement Simulation through the Grates and Pulp Lifter in a SAG Mill and Evaluation for a New Grate Design Using DEM. Physicochem. Probl. Miner. Process. 2019, 55, 617–630.
41. Kuk, M.; Bobek, S.; Veloso, B.; Rajaoarisoa, L.; Nalepa, G.J. Feature Importances as a Tool for Root Cause Analysis in Time-Series Events. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), LNCS; Springer: Cham, Switzerland, 2023; Volume 14077, pp. 408–416.
42. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer Texts in Statistics; Springer US: New York, NY, USA, 2021.
43. Kuhn, M.; Johnson, K. Feature Engineering and Selection: A Practical Approach for Predictive Models; Chapman and Hall/CRC: New York, NY, USA, 2019.
44. Biau, G.; Scornet, E. A Random Forest Guided Tour. TEST 2016, 25, 197–227.
45. Scornet, E.; Biau, G.; Vert, J.-P. Consistency of Random Forests. Ann. Stat. 2015, 43, 1716–1741.
46. Belgiu, M.; Drăguţ, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
47. Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and Tuning Strategies for Random Forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1301.
48. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
49. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; von Luxburg, U., Guyon, I., Bengio, S., Wallach, H., Fergus, R., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 3149–3157.
50. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Bengio, S., Wallach, H.M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 6639–6649.
51. Natekin, A.; Knoll, A. Gradient Boosting Machines, a Tutorial. Front. Neurorobot. 2013, 7, 63623.
52. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016.
53. Hornik, K.; Stinchcombe, M.; White, H. Multilayer Feedforward Networks Are Universal Approximators. Neural Netw. 1989, 2, 359–366.
54. Wu, Y.C.; Feng, J.W. Development and Application of Artificial Neural Network. Wirel. Pers. Commun. 2018, 102, 1645–1656.
55. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
56. Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis; Chapman and Hall/CRC: New York, NY, USA, 2013.
57. Chipman, H.A.; George, E.I.; McCulloch, R.E. BART: Bayesian Additive Regression Trees. Ann. Appl. Stat. 2010, 4, 266–298.
58. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis, 6th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2021.
59. Hill, J.; Linero, A.; Murray, J. Bayesian Additive Regression Trees: A Review and Look Forward. Annu. Rev. Stat. Appl. 2020, 7, 251–278.
60. Kapelner, A.; Bleich, J. BartMachine: Machine Learning with Bayesian Additive Regression Trees. J. Stat. Softw. 2016, 70, 1–40.
61. Scutari, M.; Denis, J.-B. Bayesian Networks; Chapman and Hall/CRC: Boca Raton, FL, USA, 2021.
62. Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; MIT Press: Cambridge, MA, USA, 2009.
63. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, USA, 2005.
64. Lloyd, C.; Gunter, T.; Osborne, M.; Roberts, S. Variational Inference for Gaussian Process Modulated Poisson Processes. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Bach, F., Blei, D., Eds.; JMLR: Norfolk, MA, USA, 2015; pp. 1814–1822.
65. Hensman, J.; Matthews, A.G.; Ghahramani, Z. Scalable Variational Gaussian Process Classification. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, San Francisco, CA, USA, 9–12 May 2015; JMLR: Norfolk, MA, USA, 2015; Volume 38, pp. 351–360.
66. Burt, D.R.; Rasmussen, C.E.; Van Der Wilk, M. Convergence of Sparse Variational Inference in Gaussian Processes Regression. J. Mach. Learn. Res. 2020, 21, 1–63.
67. Zhang, C.; Butepage, J.; Kjellstrom, H.; Mandt, S. Advances in Variational Inference. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2008–2026.
68. Neal, R.M. Bayesian Learning for Neural Networks; Lecture Notes in Statistics; Springer New York: New York, NY, USA, 1996; Volume 118.
69. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114.
70. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight Uncertainty in Neural Networks. arXiv 2015, arXiv:1505.05424.
71. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. arXiv 2016, arXiv:1506.02142.
72. Zagayevskiy, Y.; Deutsch, C.V. A Methodology for Sensitivity Analysis Based on Regression: Applications to Handle Uncertainty in Natural Resources Characterization. Nat. Resour. Res. 2015, 24, 239–274.
73. Pandey, B.P.; Mishra, D.P. Developing an Alternate Mineral Transportation System by Evaluating Risk of Truck Accidents in the Mining Industry—A Critical Fuzzy DEMATEL Approach. Sustainability 2023, 15, 6409.
74. Lu, H.; Peng, Y.; Cao, S.; Zhu, Z. Parameter Sensitivity Analysis and Probabilistic Optimal Design for the Main-Shaft Device of a Mine Hoist. Arab. J. Sci. Eng. 2019, 44, 971–979.
75. Hou, J.; Nie, G.; Li, G.; Zhao, W.; Sheng, B. Optimization of Branch Airflow Volume for Mine Ventilation Network Based on Sensitivity Matrix. Sustainability 2023, 15, 12427.
76. Bakhtavar, E.; Saberi, S.; Hu, G.; Sadiq, R.; Hewage, K. Fuzzy Cognitive-Based Goal Programming for Waste Rock Management with in-Pit Dumping Priority: Towards Sustainable Mining. Resour. Policy 2023, 86, 104095.
77. Lucay, F.; Cisternas, L.A.; Gálvez, E.D. Global Sensitivity Analysis for Identifying Critical Process Design Decisions. Chem. Eng. Res. Des. 2015, 103, 74–83.
78. Ghaffari, A.; Hayati, M.; Shekholeslami, A. Probability and Sensitivity Analysis in Flotation Circuit of Bama Lead and Zinc Processing Plant Using Monte Carlo Simulation Method. Miner. Process. Extr. Metall. Rev. 2012, 33, 416–426.
79. Montgomery, D.C.; Runger, G.C. Applied Statistics and Probability for Engineers, 6th ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2014.
80. Kazemi, P.; Khalid, M.H.; Szlek, J.; Mirtič, A.; Reynolds, G.K.; Jachowicz, R.; Mendyk, A. Computational Intelligence Modeling of Granule Size Distribution for Oscillating Milling. Powder Technol. 2016, 301, 1252–1258.
81. Silva, M.; Casali, A. Modelling SAG Milling Power and Specific Energy Consumption Including the Feed Percentage of Intermediate Size Particles. Miner. Eng. 2015, 70, 156–161.
82. Li, L.; Chang, J.; Vakanski, A.; Wang, Y.; Yao, T.; Xian, M. Uncertainty Quantification in Multivariable Regression for Material Property Prediction with Bayesian Neural Networks. Sci. Rep. 2024, 14, 10543.
83. Wani, O.; Beckers, J.V.L.; Weerts, A.H.; Solomatine, D.P. Residual Uncertainty Estimation Using Instance-Based Learning with Applications to Hydrologic Forecasting. Hydrol. Earth Syst. Sci. 2017, 21, 4021–4036.
84. Garcia-Cardona, C.; Scheinker, A. Machine Learning Surrogate for Charged Particle Beam Dynamics with Space Charge Based on a Recurrent Neural Network with Aleatoric Uncertainty. Phys. Rev. Accel. Beams 2024, 27, 024601.
85. Collis, J.; Connor, A.J.; Paczkowski, M.; Kannan, P.; Pitt-Francis, J.; Byrne, H.M.; Hubbard, M.E. Bayesian Calibration, Validation and Uncertainty Quantification for Predictive Modelling of Tumour Growth: A Tutorial. Bull. Math. Biol. 2017, 79, 939–974.
86. Li, Y.; Bao, J.; Chen, T.; Yu, A.; Yang, R. Prediction of Ball Milling Performance by a Convolutional Neural Network Model and Transfer Learning. Powder Technol. 2022, 403, 117409.
87. Hermosilla, R.; Valle, C.; Allende, H.; Aguilar, C.; Lucic, E. SAG’s Overload Forecasting Using a CNN Physical Informed Approach. Appl. Sci. 2024, 14, 11686.
88. Coetzee, L.C.; Craig, I.K.; Kerrigan, E.C. Robust Nonlinear Model Predictive Control of a Run-of-Mine Ore Milling Circuit. IEEE Trans. Control Syst. Technol. 2010, 18, 222–229.
89. Saldana, M.; Gálvez, E.; Sales-Cruz, M.; Salinas-Rodríguez, E.; Castillo, J.; Navarra, A.; Toro, N.; Arias, D.; Cisternas, L.A. A Stochastic Model Approach for Modeling SAG Mill Production and Power Through Bayesian Networks: A Case Study of the Chilean Copper Mining Industry. Minerals 2026, 16, 60.

Figure 1. Layout of the primary grinding circuit.

Figure 2. Spearman correlation of the SAG mill operating variables.

Figure 3. Distribution adjustments of the independent variables.

Figure 4. Distribution adjustments of the dependent variables.

Figure 5. SA of MR models for SAG Power (a) and SAG Production (b).

Figure 6. SA of RF models for SAG Power (a) and SAG Production (b).

Figure 7. SA of XGBoost models for SAG Power (a) and SAG Production (b).

Figure 8. SA of GBM models for SAG Power (a) and SAG Production (b).

Figure 9. SA of ANN models for SAG Power (a) and SAG Production (b).

Figure 10. SA of BLR models for SAG Power (a) and SAG Production (b).

Figure 11. SA of BART models for SAG Power (a) and SAG Production (b).

Figure 12. SA of GBN models for SAG Power (a) and SAG Production (b).

Figure 13. SA of GPR models for SAG Power (a) and SAG Production (b).

Figure 14. SA of BNN models for SAG Power (a) and SAG Production (b).

Table 1. Model comparison matrix by dimension.

Dimension	Deterministic/Phenomenological	ML Models	Stochastic/Probabilistic Models
Typical data needs	Mechanistic parameters, breakage/selection kernels, PSDs, feed fractions, mill geometry; calibration with sampling campaigns and NorBal/MODSIM style data is common [17]	Large, labeled SCADA/plant time series (feed tonnage, power, speed, inlet water, bearing pressure, PSD); studies used 20k+ records and explicit feature selection/delays [9,12,18,19]	Moderate data + expert priors; can combine sparse SCADA data and domain priors to estimate conditional probabilities [14,15]
Uncertainty handling	Uncertainty via Monte Carlo on parameters and UA/GSA using Sobol, with surrogates for speed up [16,20,21]	Usually point predictions; uncertainty via ensembles, quantile models or probabilistic wrappers (not widespread in reviewed SAG ML papers) [12,19]	Native probabilistic outputs, causal queries and RUL/reliability estimation; supports decision thresholds and scenario evidence updates [14,15]
Interpretability	High: physics-based parameters, PSD evolution and mechanistic insight; useful for design and “what-if” simulations [17,22]	Variable: GP and linear models give formulas and feature importances; deep nets require XAI (SHAP/LIME) for explanations [9,23]	High for structure (graph edges) and conditional relationships; enables causal reasoning if graph is well specified [14,15]
Runtime (relative)	Slow to very slow: DEM/DEM–CFD high computational cost; population balance dynamic simulators moderate but heavier than ML in closed-loop use [7,17]	Low-to-moderate inference times: LSTM/RNN suitable for real-time energy/throughput prediction; training costs higher but amortized [6,12,19]	Low for BN inference once learned; Monte Carlo propagation with detailed plant simulators is expensive but surrogates speed it up [14,16,21]
Deployment considerations	Best for control design, digital twins, “offline” optimization and engineering studies; needs online parameter estimation or periodic re-calibration for drift [17,22]	Proven deployable in real plants for energy/throughput forecasting (RNN/LSTM); require robust feature pipelines, delay handling and drift detection [12,18,19]	Well-suited for decision support, alarms and maintenance (RUL), and probabilistic supervisory controllers; BN requires whitelist/blacklist and priors during construction [14,15]
Typical applications	Design studies, what-if simulations, engineering analysis	ML static: real-time energy/throughput prediction, process optimization; ML temporal: time series forecasting, dynamic process control	Decision support, reliability analysis, probabilistic control

Table 2. Benchmark metrics (UQ: uncertainty quantification).

Metric Category	Metric Name	Typical Range (SAG)	Model Types Used	Interpretation	Ref.
Regression Accuracy	RMSE	5%–15% of mean throughput	All model types	Lower is better—measures average prediction error magnitude	[6,12,19]
	MAE	3%–12% of mean throughput	All model types	Lower is better—measures average absolute error	[18,23,35]
	R²	0.75–0.95 for good models	All model types	Higher is better—proportion of variance explained	[6,35,36]
	MAPE	5%–20% typical range	ML and stochastic	Lower is better—percentage error measure	[18,19,37]
Classification Performance	ROC-AUC	0.8–0.95 for operational states	ML classification	Higher is better—discrimination ability	[9,36]
	MCC	−1 to +1 range	Classification models	Higher is better—balanced classification measure	[9,23]
	Cohen Kappa	0.6–0.9 for good agreement	Classification models	Higher is better—agreement beyond chance	[9,35]
Probabilistic Calibration	Brier Score	0.1–0.3 typical	Probabilistic models	Lower is better—measures calibration quality	[14,15]
	ECE	<0.1 for well-calibrated	Probabilistic models	Lower is better—expected calibration error	[14,21]
	Reliability Curves	Visual assessment	Probabilistic models	Diagonal line indicates perfect calibration	[14,21]
UQ	Coverage Probability	0.9–0.95 for 95% intervals	Stochastic models	Should match nominal level	[20,21,38]
	Interval Width	Context dependent	Stochastic models	Narrower intervals preferred if coverage maintained	[16,20,21]
Sensitivity Analysis	Sobol First Order	0–1 for each variable	All with SA	Higher indicates more important variable	[16,20,21]
	Sobol Total Order	0–1 for each variable	All with SA	Includes interaction effects	[16,20,21]
	Morris Mu Star	Variable dependent	All with SA	Screening measure for variable importance	[16,20]
	Morris Sigma	Variable dependent	All with SA	Measures interaction/nonlinearity effects	[16,20]
Operational Performance	Energy Savings	5%–15% typical improvements	All model types	Higher is better—kWh/t reduction	[12,18,39]
	Throughput Uplift	3%–10% typical improvements	All model types	Higher is better—increased production	[18,39]
	Availability Improvement	2%–8% typical	Predictive models	Higher is better—reduced downtime	[14,15]
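
The two scalar calibration metrics in Table 2 can be illustrated with a short sketch. It is not taken from the reviewed studies: the equal-width binning used for ECE, the function names, and the six forecast/outcome pairs are assumptions chosen for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """ECE with equal-width probability bins (the binning scheme is an assumption)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        # Weight each bin by its share of the samples.
        ece += len(members) / len(probs) * abs(mean_conf - event_rate)
    return ece


# Hypothetical forecasts of an operational state and the observed 0/1 outcomes.
probs = [0.95, 0.85, 0.75, 0.35, 0.25, 0.15]
outcomes = [1, 1, 0, 0, 0, 0]
brier = brier_score(probs, outcomes)
ece = expected_calibration_error(probs, outcomes, n_bins=5)
```

A reliability curve plots the same per-bin pairs (mean forecast probability against observed event rate), so the ECE is the sample-weighted mean distance of that curve from the diagonal.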

Table 3. Operational variables grouped by category, data treatment, model usage (deterministic, ML, and stochastic), and qualitative sensitivity importance in SAG mill performance modeling.

Variable Category	Variable Name	Data Treatment	Model Usage	Sensitivity Ranking
Feed Characteristics	Ore Hardness	Laboratory correlation with geology	All	High—primary driver
	Feed Rate	Moving averages to smooth	All	High—direct throughput relationship
	P₈₀ Feed	Size analysis correlation	All	High—affects grinding efficiency
	PSD Fractions	Discrete bins for modeling	Det/Stoch	Medium—detailed breakage modeling
Operational Parameters	Mill Speed	Critical speed percentage	All	High—affects grinding action
	Water Addition	Flow control validation	All	Medium—affects pulp density
	Percent Solids	Density meter correlation	All	High—critical for transport
	Mill Power	Power meter readings	All	High—energy efficiency target
Mill Condition	Liner Age	Maintenance tracking	ML/Stoch	Medium—wear progression
	Liner Profile	Survey measurements	Det	Medium—affects charge motion
	Bearing Pressure	Pressure transducers	ML/Stoch	Low—condition monitoring
	Vibration Level	Accelerometer data	ML/Stoch	Low—condition monitoring
Product Characteristics	Product P₈₀	Cyclone overflow sizing	All	High—product quality target
	Cyclone Pressure	Pressure measurement	All	Medium—classification efficiency
	Sump Level	Level transmitter	All	Medium—inventory control
Pebble Handling	Pebble Rate	Conveyor scales	All	Medium—circuit balance
Pebble Handling	Pebble Size	Size analysis	Det	Low—detailed modeling
Environmental	Ambient T	Weather station	ML	Low—seasonal effects
Environmental	Ore Moisture	Laboratory analysis	All	Low—feed preparation
Derived Variables	Specific Energy	Power/throughput ratio	All	High—efficiency metric
	Mill Filling	Load cell or model-based	Det/ML	Medium—charge dynamics
	Charge Motion	DEM/video analysis	Det	Medium—grinding mechanism
Uncertainty Factors	Feed Variability	Statistical process control	Stoch	High—uncertainty source
	Equipment Drift	Condition monitoring	ML/Stoch	Medium—model degradation
	Measurement Error	Calibration records	All	Low—data quality
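
Two of the data treatments listed in Table 3, the moving average applied to feed rate and specific energy as a power/throughput ratio, are simple enough to sketch. The trailing window, the kW and t/h units, and the sample values below are assumptions; the table does not specify them.

```python
def moving_average(values, window):
    """Trailing moving average; early points average the samples available so far."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def specific_energy(power_kw, throughput_tph):
    """Specific energy in kWh/t, assuming power in kW and throughput in t/h."""
    return power_kw / throughput_tph


# Hypothetical feed-rate readings (t/h).
feed = [3000.0, 3100.0, 2900.0, 3200.0, 3000.0]
smoothed = moving_average(feed, window=3)
energy = specific_energy(20000.0, 3000.0)
```

A trailing window uses only past samples, which keeps the smoothed signal usable for forecasting; a centered window would leak future information into the features.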

Table 4. Comparative overview of ML and probabilistic models for SAG mill process modeling.

Model	Class	Advantages	Limitations	Data/Requirements	Additional Relevant Information
OLS (Ordinary Least Squares)	Multivariate ML	Handles collinearity; useful with many predictors and few responses	Moderate interpretability; linear by default	Process signal matrix X; standardization recommended	Useful for inferring latent variables (e.g., pulp density and cut size)
Random Forest	ML (tree ensemble)	Robust to noise/outliers; provides variable importance; limited tuning required	Point predictions (not natively probabilistic)	Large, labeled dataset (TpH/MW); basic missing data handling	Good balance between accuracy and interpretability; SHAP enhances explainability
XGBoost/GBM	ML (boosting)	High accuracy; strong handling of interactions and nonlinearity; fast inference	Risk of overfitting without validation; sensitive hyperparameter tuning	Large datasets; well-defined features; stratified/temporal validation	Often state-of-the-art for tabular data; combine with calibration if probabilities are used
ANN (MLP)	ML (neural network)	Approximates highly nonlinear functions; strong predictive performance	Black box behavior; requires more data; tuning and regularization needed	Abundant, normalized data; temporal splits; early stopping	Achieved highest R²; complement with SHAP/permutation importance for interpretability
Bayesian Linear Regression (BLR)	Probabilistic parametric ML	Incorporates uncertainty and regularization via priors; interpretable	Assumes linearity; requires appropriate prior specification	Standardized matrix X; defined priors and likelihood	Provides credibility intervals and probabilistic metrics
Bayesian Additive Regression Tree (BART)	Probabilistic nonparametric ML	Captures nonlinearities and interactions; posterior uncertainty	Lower global interpretability; higher computational cost	Sufficient data; hyperparameter tuning; optional standardization	Delivers credible intervals and strong predictive robustness
Gaussian Bayesian Network (GBN)	Continuous probabilistic graphical model	Represents causal dependencies; quantifies joint uncertainty	Assumes Gaussian and linear relationships; structure sensitive to data	Continuous variables; structure learning or expert-defined graph; approximate normality	Enables conditional inference and probabilistic causal analysis
Gaussian Process Regression (GPR)	Probabilistic (nonparametric)	Prediction with uncertainty bands; suitable for probabilistic MPC	O(n³) training cost; challenging with large datasets	Subsampling or approximations; relevant feature selection	Promising for predictive control with confidence intervals
Bayesian Neural Network (BNN)	Stochastic (probabilistic graphical model)	Handles incomplete evidence; quantifies uncertainty; interpretable DAG	Requires discretization; may blur intermediate classes; needs priors/whitelists	Discretized variables using physically meaningful thresholds; ESS/prior specification	Well suited for “what-if” diagnostics and decision support under uncertainty

Table 5. Model Comparison Framework (deterministic ML vs. probabilistic models).

Model	Comparison Strategies (What to Evaluate)	Key Metrics/Statistics	Analysis Outputs
MR	Baseline linear performance; bias–variance trade-off; multicollinearity effects	RMSE, MAE, and R², Adjusted R²; AIC/BIC; VIF; residual diagnostics	Parity plot; residual plots; coefficient table; influence diagnostics
RF	Accuracy vs. boosting; robustness to noise; variance reduction	RMSE, MAE, R²; OOB error; permutation importance; SHAP	SHAP
XGBoost; GBM	State-of-the-art tabular accuracy; overfitting risk; learning dynamics	RMSE, MAE, and R²; cross-val error; SHAP; learning curves; early stopping metrics	Parity plot; SHAP importance; learning curves; residual distribution
ANN (MLP)	Nonlinearity gain vs. interpretability; regularization effectiveness	RMSE, MAE, and R²; validation loss; early stopping; SHAP (Kernel/Deep)	Train/validation loss curves; SHAP summary; parity plot
BLR	Parametric uncertainty quantification; comparison vs. OLS	RMSE, MAE, and R²; posterior intervals; coverage probability; MPIW; NLPD; WAIC/LOO	Predictive intervals; coverage vs. nominal plot; posterior coefficient distributions
BART	Nonlinear probabilistic performance; uncertainty calibration	RMSE, MAE, and R²; Coverage; MPIW; NLPD; posterior inclusion proportions	Prediction with credible bands; calibration plot; variable inclusion summary
GBN	Conditional dependency modeling; joint uncertainty propagation	RMSE, MAE, R² (continuous nodes); log-likelihood; entropy; KL divergence	Graph structure; conditional probability flows; sensitivity maps
GPR	Predictive uncertainty vs. computational cost; kernel sensitivity	RMSE, MAE, and R²; Coverage probability; average band width; log-marginal likelihood; NLPD	Prediction with confidence bands; calibration curve; kernel diagnostics
BNN	Deep nonlinear uncertainty modeling; robustness under regime shifts	RMSE, MAE, and R²; Coverage; MPIW; predictive entropy; ELBO; NLPD	Predictive distribution plots; uncertainty vs. error analysis; calibration curves

Table 6. Cross-Model Comparison (all models).

Scope	What to Evaluate	Metrics	Outputs
Global robustness and stability	Sensitivity to data splits, noise, and hydraulic regimes	$\sigma_{RMSE}^{2}$, $\sigma_{R^{2}}^{2}$, and $CV_{Error}$; bootstrap stability; Sobol/Morris (ML); MC simulations (Bayesian models)	Stability heatmap; radar chart (accuracy–robustness–calibration–interpretability); suitability matrix by operating regime

Table 7. Operational regime classification criteria for SAG process modeling.

Factor	Variable(s)	Suggested Range	Cut-Off Criterion
Mill Hydraulics	% solids, Water flow rate, Pressure	Diluted	% solids < P33, with water flow rate > P67 and/or bearing pressure < P33
		Optimal	P33 ≤ % solids ≤ P67, with P33 ≤ water flow rate ≤ P67 and P33 ≤ bearing pressure ≤ P67
		Thick	% solids > P67, with water flow rate < P33 and/or bearing pressure > P67
Ore Hardness	Hardness (A × b or proxy)	Low	<P33
		Medium	P33–P67
		High	>P67
Mechanical Condition	Liner age (months)	New	[0, 3)
		Intermediate	[3, 6)
		Worn	[6, +∞)
Internal Load/Transport	Pebbles, Sump level	Low	both variables < P33
		Medium	intermediate combinations/at least one variable between P33 and P67
		High	both variables > P67
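
The cut-off criteria in Table 7 can be sketched as follows. This is one possible reading rather than the study's implementation: "and/or" is treated as a logical OR, samples that satisfy none of the hydraulic rules are labeled "Unclassified", percentiles use linear interpolation, and the history values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def tercile_label(x, p33, p67, labels=("Low", "Medium", "High")):
    """Single-variable rule, as used for ore hardness in Table 7."""
    if x < p33:
        return labels[0]
    if x > p67:
        return labels[2]
    return labels[1]


def liner_label(age_months):
    """Fixed liner-age bins from Table 7: [0, 3), [3, 6), [6, +inf)."""
    if age_months < 3:
        return "New"
    if age_months < 6:
        return "Intermediate"
    return "Worn"


def hydraulic_regime(solids, water, pressure, cuts):
    """Hydraulic rule; cuts maps each variable name to its (P33, P67) pair."""
    s33, s67 = cuts["solids"]
    w33, w67 = cuts["water"]
    b33, b67 = cuts["pressure"]
    if solids < s33 and (water > w67 or pressure < b33):
        return "Diluted"
    if solids > s67 and (water < w33 or pressure > b67):
        return "Thick"
    if s33 <= solids <= s67 and w33 <= water <= w67 and b33 <= pressure <= b67:
        return "Optimal"
    return "Unclassified"  # assumption: Table 7 does not cover this case


# Hypothetical history used only to derive the percentile cut-offs.
history = {
    "solids": [68, 70, 72, 74, 76, 78, 80],
    "water": [300, 320, 340, 360, 380, 400, 420],
    "pressure": [50, 52, 54, 56, 58, 60, 62],
}
cuts = {name: (percentile(v, 33), percentile(v, 67)) for name, v in history.items()}
regime = hydraulic_regime(solids=69, water=410, pressure=51, cuts=cuts)
```

Deriving P33 and P67 from the training period only, and reusing them on the test period, avoids leaking test information into the regime labels.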

Table 8. Goodness-of-fit Indicators of SAG Production and SAG Power [*: measurement of variability between folds (n folds = 10)].

Split	Resp.	Model	RMSE	MAE	R² (%)	MAPE (%)	$CV_{RMSE}$ *
Train	SAG Production	MR	108.87	68.67	95.85	2.18	0.08
		RF	58.94	30.31	98.78	1.01	0.06
		XGBoost	41.39	24.13	99.40	0.76	0.08
		GBM	59.35	33.92	98.77	1.10	0.07
		ANN	75.56	37.55	98.00	1.21	0.05
		BLR	111.10	70.47	95.68	2.24	0.08
		BART	62.42	37.01	98.64	1.18	0.07
		GBN	94.93	50.20	96.85	1.68	0.01
		GPR	78.11	37.72	97.87	1.25	0.14
		BNN	89.13	45.65	97.25	1.48	0.12
	SAG Power	MR	473.73	369.19	94.12	1.82	0.04
		RF	309.03	226.66	97.50	1.13	0.07
		XGBoost	226.44	162.73	98.66	0.80	0.05
		GBM	327.59	250.78	97.19	1.24	0.05
		ANN	311.84	242.33	97.45	1.18	0.04
		BLR	487.08	375.85	93.78	1.85	0.05
		BART	540.61	408.48	92.34	2.08	0.05
		GBN	511.46	385.10	93.14	1.96	0.02
		GPR	381.77	279.60	96.18	1.40	0.08
		BNN	464.29	360.76	94.33	1.78	0.05
Test	SAG Production	MR	122.97	74.04	94.83	2.35	0.08
		RF	104.02	54.20	96.30	1.82	0.05
		XGBoost	87.98	47.38	97.35	1.55	0.06
		GBM	91.73	48.14	97.12	1.57	0.05
		ANN	93.09	43.88	97.04	1.43	0.06
		BLR	124.09	76.54	94.73	2.24	0.04
		BART	89.78	48.03	97.24	1.55	0.05
		GBN	107.35	54.18	96.06	1.80	0.05
		GPR	92.14	44.08	97.10	1.47	0.06
		BNN	88.50	46.89	97.22	1.53	0.06
	SAG Power	MR	528.29	392.27	92.66	1.95	0.06
		RF	553.51	401.57	91.94	2.01	0.05
		XGBoost	431.11	298.48	95.11	1.49	0.08
		GBM	470.39	335.29	94.18	1.68	0.07
		ANN	474.10	325.25	94.09	1.62	0.07
		BLR	534.91	398.49	92.48	1.97	0.05
		BART	590.52	432.44	90.83	2.18	0.05
		GBN	568.83	412.45	91.49	2.08	0.05
		GPR	473.83	326.25	94.10	1.64	0.07
		BNN	526.97	379.95	92.79	1.89	0.08
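
The indicators in Table 8 follow standard definitions, sketched below under stated assumptions: R² and MAPE are scaled to percent to match the magnitude of the tabulated values, and the between-fold variability is taken as the sample standard deviation of the per-fold RMSE divided by its mean. The data values are hypothetical.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, R² (in %) and MAPE (in %) for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 100.0 * (1.0 - ss_res / ss_tot),
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


def cv_between_folds(fold_rmse):
    """Coefficient of variation of per-fold RMSE (sample std / mean)."""
    return statistics.stdev(fold_rmse) / statistics.mean(fold_rmse)


# Hypothetical throughput values and predictions, for illustration only.
y_true = [3000.0, 3200.0, 3100.0, 2900.0]
y_pred = [3050.0, 3150.0, 3100.0, 2950.0]
metrics = regression_metrics(y_true, y_pred)

# Hypothetical RMSE values from five folds (the study reports ten).
fold_rmse = [100.0, 110.0, 90.0, 105.0, 95.0]
cv_rmse = cv_between_folds(fold_rmse)
```

A low coefficient of variation indicates that the error is stable across folds; it says nothing about the size of the error itself, so it should be read together with the RMSE column.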

Table 9. Robustness analysis.

	Model	$\mathrm{CI\,RMSE}_{OOF}^{Train}$	$RMSE_{OOF}^{Train}$	$RMSE^{Test}$	$\mathrm{CI\,MAE}_{OOF}^{Train}$	$MAE_{OOF}^{Train}$	$MAE^{Test}$
SAG Power	MR	[473.90, 504.23]	487.81	528.29	[367.24, 383.85]	375.23	392.27
	RF	[505.43, 532.48]	518.69	553.51	[383.01, 401.14]	391.55	401.57
	XGB	[378.65, 404.38]	391.89	431.11	[285.80, 296.08]	290.94	298.48
	GBM	[414.71, 445.36]	430.52	470.39	[316.83, 329.60]	323.21	335.29
	ANN	[417.67, 466.32]	442.34	474.10	[313.37, 325.96]	319.66	325.25
	BLR	[484.80, 518.65]	502.22	534.91	[371.91, 392.05]	381.98	398.49
	BART	[545.90, 589.76]	568.58	590.52	[414.30, 431.63]	422.97	432.44
	GBN	[642.69, 700.96]	671.89	568.83	[487.44, 499.10]	493.27	412.45
	GPR	[503.74, 564.63]	535.70	473.83	[387.75, 411.36]	399.55	326.25
	BNN	[470.27, 502.82]	487.03	526.97	[362.91, 392.23]	377.58	379.95
SAG Production	MR	[105.18, 120.63]	112.25	122.97	[67.51, 72.24]	69.67	74.04
	RF	[90.69, 101.44]	95.75	104.02	[50.09, 54.36]	52.18	54.20
	XGB	[77.53, 87.36]	82.70	87.98	[45.41, 49.14]	47.28	47.38
	GBM	[80.22, 88.80]	84.70	91.73	[45.69, 48.47]	47.08	48.14
	ANN	[82.55, 93.27]	87.99	93.09	[42.85, 46.20]	44.53	43.88
	BLR	[108.23, 121.47]	115.19	124.09	[69.45, 73.91]	71.67	76.54
	BART	[82.04, 91.11]	86.78	89.78	[48.34, 52.83]	50.58	48.03
	GBN	[123.94, 131.71]	127.83	107.35	[76.59, 79.14]	77.87	54.18
	GPR	[120.44, 147.89]	135.39	92.14	[62.81, 70.63]	66.72	44.08
	BNN	[83.06, 99.34]	91.83	88.50	[42.68, 48.43]	45.56	46.89
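
One way to read Table 9 is to check whether each test error falls inside the out-of-fold (OOF) interval estimated on the training set. The sketch below does this for a subset of the SAG Power rows, with values copied from the table; the helper names and the relative-gap measure are illustrative choices, not part of the study.

```python
# (model, CI low, CI high, test RMSE) copied from the SAG Power rows of Table 9.
rows = [
    ("MR", 473.90, 504.23, 528.29),
    ("XGB", 378.65, 404.38, 431.11),
    ("GBN", 642.69, 700.96, 568.83),
    ("GPR", 503.74, 564.63, 473.83),
    ("BNN", 470.27, 502.82, 526.97),
]


def position_vs_interval(low, high, value):
    """Return 'below', 'inside' or 'above' for value relative to [low, high]."""
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "inside"


def relative_gap(oof, test):
    """Signed relative change from the OOF estimate to the test value."""
    return (test - oof) / oof


status = {name: position_vs_interval(lo, hi, test) for name, lo, hi, test in rows}

# XGB, SAG Power: OOF RMSE 391.89 versus test RMSE 431.11 (both from Table 9).
xgb_gap = relative_gap(391.89, 431.11)
```

For these rows, the test RMSE of MR, XGB and BNN lies above the OOF interval, while that of GBN and GPR lies below it.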

Table 10. Uncertainty analysis calibration—probabilistic models.

	Model	90% Coverage^Test	MPIW^Test	NLPD^Test	$\bar{\sigma}_{pred}^{Train}$	$\bar{\sigma}_{pred}^{Test}$	Differential Entropy^Test
SAG Power	BLR	0.91	1617.74	7.70	491.74	491.76	7.62
	BART	0.94	2025.94	7.79	614.81	615.97	7.84
	GBN	0.95	1993.87	7.75	604.86	606.33	7.83
	GPR	0.89	1352.40	7.58	404.59	411.10	7.40
	BNN	0.91	1537.32	7.70	467.30	467.31	7.57
SAG Production	BLR	0.94	368.19	6.24	111.93	111.92	6.14
	BART	0.94	295.56	5.88	86.77	89.93	5.91
	GBN	0.95	383.24	6.09	116.24	116.58	6.17
	GPR	0.94	253.43	5.87	75.46	77.04	5.68
	BNN	0.93	297.19	5.90	90.33	90.34	5.92
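
The columns of Table 10 can be related to one another under a Gaussian predictive distribution. The sketch below is an assumption-based illustration, not the study's code: it takes each prediction as a mean and a standard deviation, builds the central 90% interval, and evaluates coverage, MPIW, NLPD and differential entropy. With the reported $\bar{\sigma}_{pred}^{Test}$ = 491.76 for BLR (SAG Power), the Gaussian formulas give an interval width of about 1617.7 and an entropy of about 7.62, in line with that row; this is consistent with, but does not confirm, a Gaussian predictive form.

```python
import math
from statistics import NormalDist


def central_interval(mu, sigma, level=0.90):
    """Central prediction interval of a Gaussian predictive distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mu - z * sigma, mu + z * sigma


def coverage_and_mpiw(y_true, mus, sigmas, level=0.90):
    """Empirical coverage and mean prediction interval width (MPIW)."""
    hits, widths = 0, []
    for y, mu, sigma in zip(y_true, mus, sigmas):
        low, high = central_interval(mu, sigma, level)
        hits += low <= y <= high
        widths.append(high - low)
    return hits / len(y_true), sum(widths) / len(widths)


def gaussian_nlpd(y_true, mus, sigmas):
    """Mean negative log predictive density under Gaussian predictions."""
    total = 0.0
    for y, mu, sigma in zip(y_true, mus, sigmas):
        total += 0.5 * math.log(2 * math.pi * sigma**2)
        total += (y - mu) ** 2 / (2 * sigma**2)
    return total / len(y_true)


def gaussian_entropy(sigma):
    """Differential entropy (nats) of a Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)


# Consistency check against the BLR / SAG Power row (sigma taken from Table 10).
low, high = central_interval(0.0, 491.76, level=0.90)
width_blr = high - low
entropy_blr = gaussian_entropy(491.76)
```

Because MPIW and entropy depend only on the predictive spread, they measure sharpness; coverage and NLPD additionally depend on the observed values, which is why both kinds of indicator are needed to judge calibration.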

Table 11. Suitability indicators by hydraulic regimes—SAG Power.

Split	Model	Reg.	RMSE	MAE	MAPE (%)	R² (%)	Cov90	MPIW90	NLPD	$\sigma_{pred}$
Train	XGB	D	229.17	158.65	0.78	99.00	0.98	1451.56	7.14	441.24
		O	232.63	168.08	0.81	98.09	0.98	1231.49	7.04	374.35
		T	217.14	161.57	0.80	98.47	0.99	1161.48	6.97	353.07
	BNN	D	560.90	442.13	2.21	94.05	0.90	1717.10	7.76	521.96
		O	489.80	387.90	1.93	92.14	0.93	1715.94	7.62	521.61
		T	510.49	398.57	1.93	90.68	0.91	1715.44	7.66	521.46
Test	XGB	D	425.93	312.63	1.57	96.50	0.90	1451.56	7.48	441.24
		O	495.20	306.70	1.52	91.23	0.91	1231.49	7.72	374.35
		T	360.06	275.10	1.37	95.94	0.93	1161.48	7.30	353.07
	BNN	D	583.94	450.76	2.24	93.21	0.92	1719.93	7.80	522.82
		O	530.64	422.83	2.14	91.67	0.89	1712.98	7.70	520.71
		T	646.43	432.00	2.13	85.54	0.89	1716.44	7.95	521.76

Table 12. Suitability indicators by hydraulic regimes—SAG Production.

Split	Model	Reg.	RMSE	MAE	MAPE (%)	R² (%)	Cov90	MPIW90	NLPD	$\sigma_{pred}$
Train	XGB	D	51.68	30.86	0.99	99.30	0.99	373.72	5.76	113.60
		O	41.21	23.95	0.74	99.30	0.98	235.87	5.36	71.70
		T	27.23	17.34	0.54	99.68	0.99	157.39	4.95	47.84
	BNN	D	125.93	66.94	2.27	95.86	0.88	300.22	6.38	91.26
		O	47.66	27.24	0.88	99.03	0.98	299.89	5.57	91.16
		T	73.91	41.06	1.29	97.78	0.95	299.93	5.76	91.17
Test	XGB	D	21.88	69.14	2.32	96.19	0.92	373.72	6.21	113.60
		O	68.44	40.86	1.30	98.17	0.93	235.87	5.66	71.70
		T	57.95	31.07	1.00	98.50	0.93	157.39	5.51	47.84
	BNN	D	24.83	69.29	2.38	95.91	0.87	300.13	6.36	91.23
		O	52.43	29.32	0.91	98.72	0.97	299.75	5.60	91.12
		T	65.30	40.18	1.28	98.25	0.96	300.58	5.69	91.37

