Abstract
Ore hardness plays a critical role in comminution circuits. It is usually characterized at sample support in order to populate geometallurgical block models. However, the required attributes are not always available and suffer from a lack of temporal resolution. We propose an operational relative-hardness definition and the use of real-time operational data to train a Long Short-Term Memory network, a deep neural network architecture, to forecast the upcoming operational relative-hardness. We applied the proposed methodology on two SAG mill datasets, each covering a one-year period. Results show accuracies above 80% on both SAG mills over short upcoming periods of time and around 1% of misclassifications between soft and hard characterizations. The proposed application can be extended to any crushing and grinding equipment to forecast categorical attributes that are relevant to downstream processes.
1. Introduction
In mining operations, the primary energy consumer is the comminution system, responsible for more than half of the entire mine consumption [1]. Of all the equipment that makes up the comminution circuit, the semi-autogenous grinding (SAG) mill is perhaps the most important in the system. With an aspect ratio of 2:1 (diameter to length), these mills combine impact, attrition and abrasion to reduce the ore size. SAG mills are located at the beginning of the comminution circuit, after a primary crushing stage. Although there are small SAG mills, their size usually ranges from 9.8 × 4.3 to 12.8 × 7.6 m, with nominal energy demands of 8.2 and 26 MW, respectively [2], which makes SAG mills the most relevant energy consumer within the concentrator. Modelling their consumption behaviour supports operational control and energy demand-side management [3].
Most theoretical and empirical models [4,5,6] demand input feed characteristics, such as hardness, size distribution and inflow rate; SAG characteristics, such as sizing and product size distribution; and operational variables, such as bearing pressure, water addition and grinding charge level. Although they are suitable for providing adequate design guidelines, they lack accurate in-situ inference since most assume steady state and isolation from upstream and downstream processes. In response, model predictive control (SAG MPC) [7] combines those methods with real-time operational information. However, expert knowledge is required to model the SAG mill dynamics properly.
From a geometallurgical perspective, the integration of new predictive methods that account for space and time relationships over real-time attributes has been defined as a fundamental challenge [8,9] in mining operations, particularly in an integrated system such as comminution. In response, data-driven approaches have been proposed ranging from support vector machines [10] and gene expression programming [11] to hybrid models that combine genetic algorithms and neural networks [12] and recurrent neural networks [13]. As data-driven methods are sensitive to the context (available information) and representation (information workflow), the authors have studied the use of several machine learning and deep learning methods in modelling the SAG energy consumption behaviour based only on operational variables [14].
The energy consumed by a SAG mill is related to several factors, such as expert operator decisions, charge volume, charge specific gravity and the hardness of the feed material. Knowing the hardness of the processed material becomes relevant for the downstream stage in the primary grinding circuit. Ore hardness can be characterized at sample support by combining logged geological properties and the results of standardized comminution tests, which can be used to predict the hardness of each block sent to the process. However, these attributes are not always available. In response, a qualitative characterization of the hardness of the ore processed at time t, relative to that of the ore processed at time t − 1, can be made using only operational variables rather than a set of mineralogical characterizations. This qualitative characterization is referred to here as operational relative-hardness (ORH).
We take advantage of previous works [14] by knowing that the Long Short-Term Memory (LSTM) [15] outperforms other machine learning and deep learning techniques on inferring the SAG mill energy consumption. Therefore, Section 2 presents the ORH and LSTM models, Section 3 establishes the SAG mill experimental framework, the results of which are presented in Section 4, and conclusions are drawn in Section 5.
2. Model
2.1. Operational Relative-Hardness Criteria
From the several operational parameters that can be captured and associated to SAG mill operations, we consider the energy consumption (EC) and feed tonnage (FT) to build our operational relative-hardness criteria.
The pair {EC_t, FT_t} is collected over a period of time T using a constant discretization Δt. By considering the one-step forward time differences of energy consumption (ΔEC_t = EC_t − EC_{t−1}) and feed tonnage (ΔFT_t = FT_t − FT_{t−1}), a qualitative assessment of the operational relative-hardness can be made. For instance, if the energy consumption is increasing and the feed tonnage is constant, it can be interpreted as an increase in ore hardness relative to the previous period. Similarly, if the feed tonnage is constant and the energy decreases, a decrease in ore hardness relative to the previous period can be assumed. In particular, when both ΔEC and ΔFT show the same behaviour, the SAG can either be processing ore with medium operational relative-hardness or being filled up or emptied. To avoid misclassification in this last case, the operational relative-hardness is labelled as undefined. Table 1 summarizes the nine combinations of states and the associated operational relative-hardness.
Table 1.
Operational relative-hardness criteria based on one time-step difference of energy consumption and feed tonnage.
The qualitative labelling of ΔEC_t and ΔFT_t as increasing, constant or decreasing can be established based on their global distribution over the period T as:

label(ΔX_t) = increasing, if ΔX_t > λ·σ_ΔX; constant, if −λ·σ_ΔX ≤ ΔX_t ≤ λ·σ_ΔX; decreasing, if ΔX_t < −λ·σ_ΔX, with X ∈ {EC, FT}, (1)

where σ_ΔEC and σ_ΔFT represent the standard deviations over the period T of ΔEC and ΔFT, respectively, and λ is a scalar value that modulates the labelling distribution. Note that (i) a λ value above 1.5 would make the entire definition meaningless since most values would remain labelled as constant, and (ii) the λ value is an external model parameter and can be chosen either subjectively or via statistical criteria.
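As an illustration, the labelling rule and the Table 1 mapping can be sketched as follows. This is a minimal sketch: the function and variable names are ours, and the hard/soft assignments encode our reading of the stated logic (energy rising relative to tonnage implies harder ore, the reverse implies softer ore, matching behaviour is undefined), since the individual cells of Table 1 are not reproduced in the text.

```python
def label_delta(delta, sigma, lam):
    """Label a one-step difference as +1 (increasing), 0 (constant)
    or -1 (decreasing) relative to the band [-lam*sigma, +lam*sigma]."""
    if delta > lam * sigma:
        return 1
    if delta < -lam * sigma:
        return -1
    return 0

def orh_label(d_ec, d_ft, sigma_ec, sigma_ft, lam):
    """Map the (energy, tonnage) difference labels to an operational
    relative-hardness category: energy label above tonnage label -> hard;
    below -> soft; identical behaviour -> undefined."""
    e = label_delta(d_ec, sigma_ec, lam)
    f = label_delta(d_ft, sigma_ft, lam)
    if e == f:
        return "undefined"
    return "hard" if e > f else "soft"
```

For example, a large positive ΔEC with a flat ΔFT yields "hard", while the symmetric case yields "soft".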
2.2. Long Short-Term Memory
The Long Short-Term Memory (LSTM) [15] neural network architecture belongs to the family of recurrent neural networks in Deep Learning [16]. They are suitable to capture short and long term relationships in temporal datasets. Internally, LSTM applies several combinations of affine transformations, element-wise multiplications and non-linear transfer functions, for which the building blocks are:
- x_t: input vector at time t. Dimension m × 1.
- W_f, W_i, W_o, W_c: weight matrices for x_t. Dimensions n × m.
- h_t: hidden state at time t. Dimension n × 1.
- U_f, U_i, U_o, U_c: weight matrices for h_t. Dimensions n × n.
- b_f, b_i, b_o, b_c: bias vectors. Dimensions n × 1.
- V: weight matrix for h_{T_w} as output. Dimension K × n.
- b_V: bias vector for the output. Dimension K × 1.
where m is the number of input variables, K is the number of output variables, and n is the number of hidden units. Let T_w be a temporal window. At each time t ∈ {1, …, T_w}, the LSTM receives the input x_t, the previous hidden state h_{t−1} and the previous memory cell c_{t−1}. The forget gate f_t = σ(W_f x_t + U_f h_{t−1} + b_f) is the permissive barrier for the information carried by c_{t−1}. The input gate i_t = σ(W_i x_t + U_i h_{t−1} + b_i) decides the relevance of the new information carried by x_t. Note that both f_t and i_t use the sigmoid as activation function over a linear combination of x_t and h_{t−1}.
By passing a combination of x_t and h_{t−1} through a Tanh function, a candidate memory cell c̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c) is computed. The final memory cell is computed as c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t, the sum of (i) what to forget from the past memory cell, via the element-wise multiplication (⊙) between f_t and c_{t−1}, and (ii) what to learn from the candidate memory cell, via the element-wise multiplication between i_t and c̃_t.
Similar to f_t and i_t, the output gate o_t = σ(W_o x_t + U_o h_{t−1} + b_o) passes a linear combination of x_t and h_{t−1} through a sigmoid function. It controls the information passing from the current memory cell to the final hidden state via the element-wise multiplication h_t = o_t ⊙ tanh(c_t). At the final step T_w, the output is computed as y = V h_{T_w} + b_V. When dealing with more than one categorical prediction (K > 1), as in the present work for ORH forecasting, a softmax function is applied over y to obtain a normalized probability distribution, and category k has a probability of p_k = exp(y_k) / Σ_{j=1}^{K} exp(y_j).
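The gate equations above can be sketched in NumPy as follows. This is an illustrative re-implementation of a standard LSTM forward pass, not the authors' code; the parameter dictionary keys are our own naming.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, P):
    """One LSTM recurrence. P holds the weight matrices W*, U* and biases b*."""
    f = sigmoid(P["Wf"] @ x + P["Uf"] @ h_prev + P["bf"])        # forget gate
    i = sigmoid(P["Wi"] @ x + P["Ui"] @ h_prev + P["bi"])        # input gate
    c_tilde = np.tanh(P["Wc"] @ x + P["Uc"] @ h_prev + P["bc"])  # candidate cell
    c = f * c_prev + i * c_tilde                                 # new memory cell
    o = sigmoid(P["Wo"] @ x + P["Uo"] @ h_prev + P["bo"])        # output gate
    h = o * np.tanh(c)                                           # new hidden state
    return h, c

def lstm_forward(X, P, n):
    """Run the cell over a window X (T_w x m) and return the softmax
    class probabilities computed from the last hidden state."""
    h = np.zeros(n)
    c = np.zeros(n)
    for x in X:
        h, c = lstm_step(x, h, c, P)
    y = P["V"] @ h + P["bV"]
    e = np.exp(y - y.max())  # numerically stable softmax
    return e / e.sum()
```

The returned vector is the normalized probability distribution over the K output categories described in the text.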
An illustrative scheme of the internal connections at time step t inside an LSTM is shown in Figure 1 (left). The ORH prediction has three categories (hard, soft and undefined), and the probability is computed at the last unit, at time step T_w, as shown in the unrolled LSTM in Figure 1 (right).
Figure 1.
Schemes. Information flow inside Long Short-Term Memory (LSTM) (left) and unrolled LSTM where the output is computed at the last recurrence (right).
3. Experiment
3.1. Dataset
We used two datasets containing operational data for two independent SAG mills, recorded every half hour over a total of 340 and 331 days, respectively. Each SAG mill receives fresh feed and is connected in an open-circuit configuration (SABC-B) in which the pebble crusher product is sent to the ball mills. At each time t, the dataset contains feed tonnage (FT) (ton/h), energy consumption (EC) (kWh), bearing pressure (BPr) (psi) and spindle speed (SSp) (rpm). The data are split into two main subsets, training and testing (Table 2); a validation dataset is not considered since the optimum LSTM architecture to train is drawn from previous work [14]. This division is arbitrary, and we seek a proportion of roughly 50/50.
Table 2.
Summary statistics over training testing dataset on semi-autogenous grinding mill (SAG) mills.
As can be seen in Table 2, the predictive methods are trained with the first 50% of the data and tested with the upcoming 50%, without being fed the previous 50% of historical data during testing.
Note that the comminution properties of the ore, such as A×b or BWi, are not included in the datasets; therefore, the relationship between the forecasted ORH and comminution properties is not explored in this work. The results presented herein, however, serve as a basis to examine such a relationship if those properties were known.
3.2. Assumptions
SAG mills are fundamental pieces of comminution circuits. As no information regarding downstream/upstream processes is available, recognizing bottlenecks in the dataset becomes subjective. We assume that the SAG mills may show changes from steady state to under-capacity and vice versa along the dataset. Thus, stationarity of all operational variable distributions is assumed throughout this work, including the ore grindability; that is, the entire dataset is assumed to belong to a known and planned combination of ore characteristics (geometallurgical units). By doing so, we limit the applicability of the present models beyond this temporal dataset unless a proper re-training process is carried out.
As explained in the problem statement section, we make use of the temporal average over energy consumption and feed tonnage as input for operational hardness prediction. Thus, we assume an additivity property over those variables: as their units are kWh and ton/h, respectively, over a constant temporal discretization, averaging adjacent data points is mathematically consistent.
In the operation from which the datasets were obtained, the SAG mill liners are replaced every 5–7 months. Since the datasets cover almost a year, the liners were replaced in each SAG mill at least once during the studied period, which may alter the relationship between energy consumption and the other operational variables, inducing a discontinuity in the temporal plots. However, since in this work the longest temporal window for ORH evaluation is eight hours, the local discontinuity associated with liner replacement is not expected to affect the forecast at that time frame: the ORH is related to what happened in the corresponding mill within the last few hours, not to the mill behaviour prior to the last liner replacement.
3.3. Problem Statement
The aim is to forecast the operational relative-hardness. To do so, we need to label the datasets with the associated ORH category at each data point. We know from Equation (1) that the ORH labelling process requires as input (i) the one-step forward differences of energy consumption (ΔEC) and feed tonnage (ΔFT), and (ii) a lambda (λ) value. In addition, we are interested in forecasting the ORH at different time supports.
Since the information is collected every 30 min, the upcoming energy consumption and feed tonnage at 0.5 h support are denoted simply as EC_{t+1} and FT_{t+1}. The upcoming EC and FT at 1 h support, EC¹_{t+1} and FT¹_{t+1}, are computed by averaging the next two energy consumption values, EC_{t+1} and EC_{t+2}, and the next two feed tonnage values, FT_{t+1} and FT_{t+2}. Similarly, by averaging further upcoming ECs and FTs, different supports can be computed. Let Δs be the time support in hours, representing the average over a temporal interval of that duration; then EC^Δs_{t+1} and FT^Δs_{t+1} are calculated as:

EC^Δs_{t+1} = (1/(2Δs)) Σ_{i=1}^{2Δs} EC_{t+i},   FT^Δs_{t+1} = (1/(2Δs)) Σ_{i=1}^{2Δs} FT_{t+i}. (2)
In this experiment, three different supports (Δs) are considered: 0.5, 2 and 8 h.
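Assuming half-hourly records, the support averaging described above can be sketched as follows (illustrative; the function name is ours):

```python
import numpy as np

def upcoming_support(series, t, support_h, dt_h=0.5):
    """Average the next support_h hours of a regularly sampled series,
    starting at index t + 1. With dt_h = 0.5 (half-hourly data), this
    averages 2 * support_h consecutive data points."""
    k = int(round(support_h / dt_h))
    window = series[t + 1 : t + 1 + k]
    return float(np.mean(window))
```

For instance, with a 2 h support the function averages the next four half-hourly values of the series.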
Figure 2 illustrates the ORH criteria using a half-hour time support on the SAG mill 1 dataset. From the daily graph of EC and FT at the top, the graphs of ΔEC and ΔFT are extracted and presented at the centre and bottom, respectively. Three different bands, corresponding to λ: 0.5, 1.0 and 1.5, are shown. Values above the band are considered increasing, values below it decreasing, and values inside it relatively constant. The corresponding categories for ΔEC and ΔFT are used to define the operational relative-hardness (as in Table 1). It can be seen that, when λ increases, the proportions of hard and soft instances decrease. Since λ is an arbitrary parameter, a sensitivity analysis is performed in the range λ ∈ [0.5, 1.5] to capture its influence on the resulting LSTM accuracy in learning to predict the ORH at the different time supports.
Figure 2.
SAG mill 1. Graphic representation of the relative-hardness inference criteria at 0.5 h time support. Daily graphs of energy consumption and feed tonnage (top), delta of energy consumption (centre), and delta of feed tonnage (bottom).
At each time t, the input variables considered to predict ORH are FT_t, BPr_t and SSp_t. To account for trends, and since FT and SSp are operational decisions, the differences FT_t − FT_{t−1} and SSp_t − SSp_{t−1} are also considered as inputs. Therefore, the dataset of predictors and outputs, at each time support Δs, has samples made up of FT_t, BPr_t, SSp_t, FT_t − FT_{t−1}, SSp_t − SSp_{t−1} and ORH^Δs_{t+1}. We also tried several other combinations of input variables, but all led to results of lower quality. A temporal window covering the previous four hours (eight consecutive data points) is used as input for training and testing the LSTM models.
3.4. Preprocessing Dataset
A preprocessing step is performed over the raw datasets to make them suitable for the deep neural network training and inference processes. The aim is to make all input attributes fall into the sensitive regions of the non-linear transfer functions via normalization, and to code the categories properly via one-hot encoding. Thus, we normalize the entire raw dataset with the mean and standard deviation of the training dataset.
Let v_t be one of the five input variables at time t; its normalized expression is computed as (v_t − μ_v)/σ_v, where μ_v and σ_v represent the mean and standard deviation of v in the training dataset. We normalize the first three attributes, FT_t, BPr_t and SSp_t, directly, while the last two attributes, the differences between original values FT_t − FT_{t−1} and SSp_t − SSp_{t−1}, are replaced by the differences between the normalized values of FT and SSp.
The known operational relative-hardness at time t (ORH_t) is one-hot encoded such that soft, undefined and hard are encoded as (1, 0, 0), (0, 1, 0) and (0, 0, 1), respectively.
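The two preprocessing steps can be sketched as follows (illustrative; the function name and encoding dictionary are ours, and the ordering of the one-hot vectors follows the text above):

```python
import numpy as np

def zscore(train, full):
    """Normalize the full series using the training mean and standard
    deviation only, so no test-set statistics leak into preprocessing."""
    mu = np.mean(train)
    sd = np.std(train)
    return (np.asarray(full, dtype=float) - mu) / sd

# One-hot encoding of the three ORH categories.
ONE_HOT = {
    "soft":      [1, 0, 0],
    "undefined": [0, 1, 0],
    "hard":      [0, 0, 1],
}
```

Normalizing the test portion with training-set statistics is the standard way to keep the evaluation honest when the split is temporal, as it is here.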
3.5. Optimal LSTM Architecture
From the training dataset, sequences of length T_w are extracted to train the LSTM model to forecast the operational relative-hardness at the next time step t + 1, at different time supports. The chosen length is four hours (T_w = 8).
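The extraction of overlapping training sequences can be sketched as follows (illustrative; the function name is ours, and the pairing of each window with the label of the step that follows it reflects our reading of the setup):

```python
import numpy as np

def make_sequences(features, labels, n_w=8):
    """Slice a feature matrix (N x m) into overlapping windows of n_w
    consecutive time steps; each window is paired with the label of
    the time step immediately after the window."""
    X, y = [], []
    for t in range(n_w, len(features)):
        X.append(features[t - n_w : t])  # previous n_w steps
        y.append(labels[t])              # target at the next step
    return np.stack(X), np.array(y)
```

With half-hourly data, n_w = 8 corresponds to the four-hour window used in the paper.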
The external hyper-parameter to be optimized in any LSTM architecture is the number of hidden units, n. Based on previous work [14], the optimum numbers of hidden units were found and are used here. They are displayed in Table 3.
Table 3.
Optimal number of hidden units in the LSTM architecture at different time supports [14].
The Adam optimizer is used to train the LSTM with hyper-parameters β1 = 0.9, β2 = 0.999 and ε = 10⁻⁸, as recommended by [17].
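A single Adam update with these hyper-parameters can be sketched in NumPy as follows (an illustrative re-implementation of the update rule from [17]; the learning rate shown is the common default and is our assumption, as the text does not state it):

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam parameter update. state = (m, v, t) carries the biased
    first and second moment estimates and the step counter."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad            # first moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)
```

On the first step the bias-corrected update reduces to roughly lr times the sign of the gradient, which is one reason Adam is robust to gradient scale.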
4. Results
Directly from the datasets, the real operational relative-hardness, ORH_real, is calculated from Equation (1), varying λ in the set {0.5, 1.0, 1.5}, at each time t and for each time support. On the other hand, a probability vector over the soft, undefined and hard ORH states is predicted; by taking the highest probability, the predicted ORH_pred is obtained. Then, a confusion matrix, filled with the number of instances of pairs (ORH_real, ORH_pred), is built for each time support and each λ value. Table 4 presents the cases of λ: 0.5, 1.0 and 1.5 and supports 0.5, 2 and 8 h for SAG mill 1, while Table 5 summarizes the same results for SAG mill 2.
Table 4.
SAG mill 1. Confusion matrices (number of instances) of operational relative-hardness (ORH) predictions using : 0.5, 1.0 and 1.5 at 0.5, 2 and 8 h time supports.
Table 5.
SAG mill 2. Confusion matrices (number of instances) of ORH predictions using : 0.5, 1.0 and 1.5 at 0.5, 2 and 8 h time supports.
The accuracy of the model prediction, defined as the percentage of right predictions, is computed as:

Accuracy (%) = 100 × (number of correct predictions) / (total number of predictions),

and it represents the percentage of elements on the confusion matrix diagonal. The relative percentage of predictions in each class (rows) is shown in Table 6 for SAG mill 1 and in Table 7 for SAG mill 2.
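The confusion-matrix construction and the accuracy computation can be sketched as follows (illustrative; the function names and class ordering are ours):

```python
import numpy as np

CLASSES = ("soft", "undefined", "hard")

def confusion_matrix(real, pred):
    """Count the instances of each (real, predicted) ORH pair;
    rows index the real class, columns the predicted class."""
    idx = {c: k for k, c in enumerate(CLASSES)}
    M = np.zeros((3, 3), dtype=int)
    for r, p in zip(real, pred):
        M[idx[r], idx[p]] += 1
    return M

def accuracy_pct(M):
    """Percentage of instances on the confusion-matrix diagonal."""
    return 100.0 * np.trace(M) / M.sum()
```

The off-diagonal corners M[soft, hard] and M[hard, soft] are the "extreme" misclassifications discussed below.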
Table 6.
SAG mill 1. Confusion matrices (percentage) of ORH prediction using : 0.5, 1.0 and 1.5 at 0.5, 2 and 8 h time supports.
Table 7.
SAG mill 2. Confusion matrices (percentage) of ORH prediction using : 0.5, 1.0 and 1.5 at 0.5, 2 and 8 h time supports.
As shown in Table 6 and Table 7, at 0.5 h time support the LSTM is able to predict the ORH with enough confidence regardless of the value of λ. Nevertheless, as λ increases, the number of instances of soft and hard ORH decreases, improving the final accuracy, since the higher the value of λ, the more data points are classified as undefined. In particular, for 0.5 h time support, increasing λ from 0.5 to 1.5 makes the real undefined points increase from 4325 to 6577 (from 53.0% to 80.7%) in SAG mill 1 and from 3600 to 6469 (from 45.3% to 81.4%) in SAG mill 2. Therefore, increasing λ improves accuracy, but the price is resolution. On the other hand, the number of extreme cases (real soft predicted as hard, and real hard predicted as soft) is close to zero. This is a valuable result, since predicting soft hardness when the ore is actually hard (or vice versa) may induce bad short-term decisions on how to operate the SAG mill, along with other downstream decisions.
The percentage of extreme cases (soft predicted as hard, and hard predicted as soft) using λ: 0.5 increases when moving from 0.5 to 8 h time support, on both SAG mills. However, it decreases to a value close to zero when λ increases from 0.5 to 1.5, at all time supports. At the same time, the LSTM loses accuracy in predicting the relevant cases (soft as soft, and hard as hard) as the time support increases, on both SAG mills.
The accuracy graph (Figure 3) shows the λ sensitivity at all time supports on both SAG mills. The lowest accuracy, 51%, is obtained at 2 h time support with λ: 0.5 on SAG mill 1; it increases to 66% with λ: 1.0 and 81% with λ: 1.5. The best results are achieved at 0.5 h time support (the same support as the original data), where accuracies of 77%, 88% and 93% are obtained with λ: 0.5, 1.0 and 1.5, respectively, on SAG mill 1, and of 79%, 85% and 90% with λ: 0.5, 1.0 and 1.5 on SAG mill 2.
Figure 3.
Accuracy of operational relative-hardness prediction at different time supports as a function of lambda (λ) on both SAG mills.
5. Conclusions
This work proposes the use of Long Short-Term Memory networks to forecast the operational relative-hardness in two SAG mills using operational data. We have presented the internal architecture of the deep networks, how to deal with raw operational datasets, and qualitative criteria to estimate the operational hardness of the material being processed inside the SAG mill based on the consumed energy, the feed tonnage and a statistical labelling modulated by a lambda value. In particular, Long Short-Term Memory models have been trained to predict the operational relative-hardness based only on low-cost and rapidly acquired operational information (feed tonnage, spindle speed and bearing pressure).
The LSTM network shows strong results in predicting the operational relative-hardness at 30 min time support. On SAG mill 1, using a lambda value of 0.5, the obtained accuracy was 77.3%, while increasing lambda to 1.5 raised the accuracy to 93.1%. Similar results were found on the second SAG mill. As the time support increases to two and eight hours, the accuracy drops to around 52% using a lambda value of 0.5 and 78% with a lambda value of 1.5, on both SAG mills.
The LSTM error on extreme cases, such as predicting soft hardness when the ore is hard and vice versa, is low: extreme misclassification is close to 1% at 0.5 h time support on both SAGs, regardless of the lambda value. Although it increases to around 20% at longer time supports using a lambda value of 0.5, it rapidly decreases to around 1% as lambda increases.
Lastly, the proposed application can be extended to any crushing and grinding equipment, under a similar context of real-data acquisition in order to forecast categorical attributes that are relevant to downstream processes.
Author Contributions
Conceptualization, S.A. and W.K.; methodology, S.A.; codes, S.A.; validation, S.A., W.K. and J.M.O.; formal analysis, S.A.; investigation, S.A.; resources, W.K.; data curation, S.A.; writing—original draft preparation, S.A.; visualization, S.A.; supervision, W.K. and J.M.O.; project administration, W.K.; funding acquisition, W.K. and J.M.O. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Natural Sciences and Engineering Council of Canada (NSERC) grant number RGPIN-2017-04200 and RGPAS-2017-507956, and by the Chilean National Commission for Scientific and Technological Research (CONICYT), through CONICYT/PIA Project AFB180004, and the CONICYT/FONDAP Project 15110019.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| LSTM | Long Short-Term Memory |
| ORH | Operational relative-hardness |
| FT | Feed tonnage |
| BPr | Bearing pressure |
| SSp | Spindle speed |
| SAG | Semi-autogenous grinding |
References
- Cochilco. Actualización de Información sobre el Consumo de Energía asociado a la Minería del Cobre al año 2012; COCHILCO: Santiago, Chile, 2013.
- Jones, S.M.; Fresko, M. Autogenous and semiautogenous mills 2010 update. In Proceedings of the Fifth International Conference on Autogenous and Semiautogenous Grinding Technology, Vancouver, BC, Canada, 25–28 September 2011.
- Ortiz, J.M.; Kracht, W.; Pamparana, G.; Haas, J. Optimization of a SAG mill energy system: Integrating rock hardness, solar irradiation, climate change, and demand-side management. Math. Geosci. 2020, 52, 355–379.
- Jnr, W.V.; Morrell, S. The development of a dynamic model for autogenous and semi-autogenous grinding. Miner. Eng. 1995, 8, 1285–1297.
- Morrell, S. A new autogenous and semi-autogenous mill model for scale-up, design and optimisation. Miner. Eng. 2004, 17, 437–445.
- Silva, M.; Casali, A. Modelling SAG milling power and specific energy consumption including the feed percentage of intermediate size particles. Miner. Eng. 2015, 70, 156–161.
- Salazar, J.L.; Valdés-González, H.; Vyhmesiter, E.; Cubillos, F. Model predictive control of semiautogenous mills (sag). Miner. Eng. 2014, 64, 92–96.
- Ortiz, J.; Kracht, W.; Townley, B.; Lois, P.; Cardenas, E.; Miranda, R.; Alvarez, M. Workflows in geometallurgical prediction: Challenges and outlook. In Proceedings of the 17th Annual Conference of the International Association for Mathematical Geosciences IAMG, Freiberg, Germany, 5–13 September 2015.
- Van den Boogaart, K.; Tolosana-Delgado, R. Predictive Geometallurgy: An Interdisciplinary Key Challenge for Mathematical Geosciences. In Handbook of Mathematical Geosciences; Springer: Berlin, Germany, 2018; pp. 673–686.
- Curilem, M.; Acuña, G.; Cubillos, F.; Vyhmeister, E. Neural networks and support vector machine models applied to energy consumption optimization in semiautogeneous grinding. Chem. Eng. Trans. 2011, 25, 761–766.
- Hoseinian, F.S.; Faradonbeh, R.S.; Abdollahzadeh, A.; Rezai, B.; Soltani-Mohammadi, S. Semi-autogenous mill power model development using gene expression programming. Powder Technol. 2017, 308, 61–69.
- Hoseinian, F.S.; Abdollahzadeh, A.; Rezai, B. Semi-autogenous mill power prediction by a hybrid neural genetic algorithm. J. Cent. South Univ. 2018, 25, 151–158.
- Inapakurthi, R.K.; Miriyala, S.S.; Mitra, K. Recurrent Neural Networks based Modelling of Industrial Grinding Operation. Chem. Eng. Sci. 2020, 219, 115585.
- Avalos, S.; Kracht, W.; Ortiz, J.M. Machine learning and deep learning methods in mining operations: A data-driven SAG mill energy consumption prediction application. Min. Metall. Explor. 2020, 37, 1–16.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Volume 1.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


