A New Ore Grade Estimation Using Combined Machine Learning Algorithms

Accurate prediction of mineral grades is a fundamental step in mineral exploration and resource estimation, and it plays a significant role in the economic evaluation of mining projects. Currently available methods are based either on geometrical approaches or on geostatistical techniques that often consider the grade as a regionalised variable. In this paper, we propose a grade estimation technique that combines multilayer feed-forward neural network (NN) and k-nearest neighbour (kNN) models to estimate the grade distribution within a mineral deposit. The models were created using the available geological information (lithology and alteration) as well as the sample locations (easting, northing, and altitude) obtained from the drill hole data. The proposed approach explicitly performs pattern recognition on both the geological features and the chemical composition (mineral grade) of the data. Prior to grade estimation, rock types and alterations were predicted at unsampled locations using the kNN algorithm. The case study presented demonstrates that the proposed approach can predict grades on a test dataset with a mean absolute error (MAE) of 0.507 and R² = 0.528, whereas the traditional model, which uses only the coordinates of the sample points as input, yielded an MAE of 0.862 and R² = 0.112. The proposed approach is promising and could be an alternative way to estimate grades in similar modelling tasks.


Introduction
One of the critical tasks in the mining value chain is to accurately estimate the grades of interest within a mineral deposit. Grades may be observed or estimated, and they are used at several stages of mining, ranging from exploration to exploitation. Resource estimation, one of the initial stages of a mining activity, is an essential step in feasibility studies as well as in mine planning [1][2][3][4]. Although the state-of-the-art methodologies used for resource estimation in the mining industry are reasonably advanced, they may not be suitable for every complex geological environment [5][6][7][8][9]. Researchers have therefore proposed numerous grade prediction models based on diverse techniques such as inverse distance weighting (IDW), kriging and its variants (simple kriging, ordinary kriging, lognormal kriging, indicator kriging, co-kriging, etc.), and stochastic simulations (sequential Gaussian simulation and sequential indicator simulation) [10][11][12][13][14][15][16]; however, these methods require assumptions about the spatial correlation between samples in order to estimate values at non-sampled locations [17,18]. In some cases, owing to the complex relationships between grade distribution and spatial pattern variability, geostatistical methods may not give the best estimation results [19]. These limitations and complexities have inspired researchers to investigate alternative approaches that can overcome such obstacles. Over the past few decades, several researchers have focused on various […] learn the relationship between the input variables (rock type, alteration level, and geographic position) and the output variable, which is the grade.
This article is organised as follows: Section 2 briefly describes the dataset used to demonstrate the methodology and the data pre-processing. Section 3 presents the model development and its implementation in a real case study. Section 4 analyses the results and discusses the findings. Finally, Section 5 presents the conclusions.

Data Attributes and Preparation
In this study, data were collected from a gold deposit during the second phase of a resource drilling program. Most of the study area is underlain by mafic, intermediate, and, more rarely, felsic volcanic rocks and associated volcano-sedimentary formations that are cross-cut by post-tectonic granitoids. Owing to a confidentiality agreement, the authors are not permitted to disclose the name or location of the deposit, or any other mineralization characteristics that might identify the deposit and/or the mining company. The study area covers 2 × 3 km², and the data contain gold grade values (ppm) from 123 drill holes. The average distance between drill holes is approximately 25 m (Figure 1), and the average borehole length is about 100 m. Samples were collected from the boreholes at 1 m intervals. The ore is extracted from ten different lithologies. For the present study, the raw borehole data were composited based on lithology, producing 4721 composited intervals in total. Raw drill hole data are unsuitable for neural network training because of (a) intense spatial grade variability, (b) unequal sample lengths, (c) noise from outliers that differ significantly from the other observations, and (d) the differing ranges of numerical values across variables. Several preparation steps were therefore required before the raw data could be used in the ANN model. These steps are as follows:

• Step 1. Creating composite data: Raw data from the 123 drill holes were composited to 1 m lengths, which equals the average sample length in one run. Table 1 presents the descriptive statistics of the composite dataset.

• Step 2. Data pre-processing: Each sample is described by five attributes: coordinates (easting, northing, and altitude), rock type, and alteration level. The original raw data contain 25 different rock types and 5 different alteration types. To generalise the geological distribution, we combined similar lithologies to reduce the 25 original rock types to 10, and used only the level of argillic alteration (a0, a1, ..., a4), which shows a high correlation with gold grade.

• Step 3. Transformation of categorical data into numerical values: The neural network (NN) algorithm performs numerical calculations and can therefore operate only on numerical values; however, the lithology code and alteration level in the dataset are categorical values denoted by characters (GG, AC, a1, a4, etc.). To feed the ANN with numerical information, the original categorical values were transformed into Boolean variables through one-hot encoding, as shown in Table 2. As a result, each rock type and alteration level is represented by a series of 0 or 1 values indicating the absence or presence of that category.

• Step 4. Data normalisation: To handle the different data types and value ranges in the dataset (e.g., geographical coordinates and grades), the values of each feature were standardised using that feature's mean and standard deviation, bringing all features to a common scale [54]. Each data point was recalculated by subtracting the population mean of its feature and dividing the difference by the population standard deviation [55]. Each instance x_{i,n} of the data is transformed into x'_{i,n} as follows:

$$x'_{i,n} = \frac{x_{i,n} - \mu_i}{\sigma_i}$$

where μ_i and σ_i denote the mean and standard deviation of the ith feature, respectively [56].

• Step 5. Splitting the dataset into training/test sets: To evaluate model performance, the dataset was randomly divided into a training set (80% of the data) and a test set (20% of the data). The model was trained only on the training data, and its performance was validated on the test set. Note that the two sets have similar statistical attributes.
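Steps 3 to 5 can be sketched in a few lines of Python with pandas and scikit-learn. This is a minimal illustration, not the authors' code: the column names (`easting`, `northing`, `altitude`, `rock_type`, `alteration`, `au_ppm`) and the helper `prepare_features` are hypothetical, only the three coordinates are standardised (an assumption, since the text does not say whether the one-hot columns were also scaled), and the population standard deviation (`ddof=0`) is used to match Step 4.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

COORDS = ["easting", "northing", "altitude"]  # hypothetical column names

def prepare_features(df, seed=0):
    """One-hot encode geology, standardise coordinates, split 80/20 (Steps 3-5)."""
    # Step 3: one 0/1 indicator column per rock type and per alteration level.
    onehot = pd.get_dummies(df[["rock_type", "alteration"]], dtype=int)
    # Step 4: z-score each coordinate with its population mean and std.
    z = (df[COORDS] - df[COORDS].mean()) / df[COORDS].std(ddof=0)
    X = pd.concat([z, onehot], axis=1).to_numpy(dtype=float)
    y = df["au_ppm"].to_numpy(dtype=float)
    # Step 5: random split into 80% training and 20% test data.
    return train_test_split(X, y, test_size=0.2, random_state=seed)

# Toy data for illustration only (not from the study).
demo = pd.DataFrame({
    "easting": [100.0, 125.0, 150.0, 175.0, 200.0],
    "northing": [10.0, 35.0, 60.0, 85.0, 110.0],
    "altitude": [300.0, 299.0, 298.0, 297.0, 296.0],
    "rock_type": ["AC", "QV", "AC", "ACC", "QV"],
    "alteration": ["a0", "a1", "a4", "a1", "a0"],
    "au_ppm": [0.2, 1.5, 3.1, 0.7, 0.9],
})
X_train, X_test, y_train, y_test = prepare_features(demo)
```

With the ten merged rock types and five alteration levels of this study, the feature matrix would have 3 + 10 + 5 = 18 columns, which matches the 18-neuron input layer of the NN model. Following the order of the steps above, the statistics are computed on the full dataset before splitting; fitting them on the training subset only is a common alternative that keeps test-set information out of the preprocessing.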

Model Development and Implementations
To construct the NN model in this study, spatial positions (easting, northing, and altitude), rock types, and alteration levels were used as inputs to predict the Au grade [57]. To make a prediction at any point, the rock type and alteration information must therefore be provided. However, this information is only available at the borehole locations. Hence, to use the NN model, rock type and alteration must first be estimated at each non-sampled location where grade values are to be predicted. This was achieved by using a trained kNN model to make predictions at the non-sampled locations.
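The two-stage idea, predicting the geology first and then assembling the NN inputs, might look like the sketch below. It assumes scikit-learn's `KNeighborsClassifier` with its default uniform weights on raw coordinates and an illustrative `k`; the helper `geology_at` is hypothetical, and in practice the tuned k from Table 3 would replace the default.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def geology_at(coords_train, litho_train, alter_train, coords_query,
               litho_classes, alter_classes, k=5):
    """Return [coords | one-hot rock type | one-hot alteration] for query points.

    Rock type and alteration are predicted at the query coordinates with two
    independent kNN classifiers trained on the logged borehole samples.
    """
    blocks = [np.asarray(coords_query, dtype=float)]
    for labels, classes in ((litho_train, litho_classes),
                            (alter_train, alter_classes)):
        knn = KNeighborsClassifier(n_neighbors=k).fit(coords_train, labels)
        pred = knn.predict(coords_query)
        # Compare each prediction with a fixed class list -> 0/1 indicator columns.
        onehot = pred[:, None] == np.asarray(classes)[None, :]
        blocks.append(onehot.astype(float))
    return np.hstack(blocks)

# Toy example: two well-separated clusters of logged samples.
coords = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                   [10, 10, 0], [11, 10, 0], [10, 11, 0]], dtype=float)
litho = np.array(["AC", "AC", "AC", "QV", "QV", "QV"])
alter = np.array(["a0", "a0", "a1", "a4", "a4", "a4"])
query = np.array([[0.5, 0.5, 0.0], [10.5, 10.5, 0.0]])
features = geology_at(coords, litho, alter, query,
                      ["AC", "QV"], ["a0", "a1", "a4"], k=3)
```

Passing fixed class lists keeps the column order identical at training and prediction time, even when some class happens not to be predicted at the query points.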

Lithology and Alteration Prediction
Since its introduction by Thomas Cover [58], kNN has been widely used for a broad range of regression and classification problems. For a given query, the kNN algorithm finds the k closest points in feature space and makes a prediction based on the values at those k nearest neighbours. Because it stores the training instances and defers all computation until a query is made, repeating the neighbour search for every test point, it is often classified as "instance-based learning" or "lazy learning". Detailed descriptions of the kNN algorithm can be found in a number of textbooks [59][60][61][62]. In this study, a kNN model was developed to predict rock types and alterations at unknown locations. The Python 3.6.9 [63] programming language and the Scikit-learn [64] machine learning library were used to create the kNN model. The model requires the number of neighbours, k, to be specified as a hyper-parameter; because the best value is highly dataset-specific, the optimal k must be chosen before the model is created. A small k-value makes the predictions sensitive to noise and prone to over-fitting, whereas a large k-value over-smooths the predictions and can lead to under-fitting. In this research, a grid search with five-fold cross-validation (K = 5) was used to determine the optimal hyper-parameters of the kNN algorithm. Within an acceptable search range, the grid search evaluated every candidate value and selected the "best" parameter setting, i.e., the one minimising the loss. Once k was specified, the kNN model was created and its performance was evaluated on the test dataset. Note that the training dataset used to create the kNN model was also used to develop the ANN model. The chosen hyper-parameters of the kNN model are listed in Table 3.

Table 3. Hyper-parameters for the k-nearest neighbours (kNN) model.

Hyper-Parameters | Search Range | Model Parameters (Lithology/Alteration)
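A grid search over k with five-fold cross-validation, as described above, could be written with scikit-learn's `GridSearchCV`. The search range of 1 to 30 and the accuracy scoring are assumptions (the values in Table 3 are not reproduced here), and `tune_k` is a hypothetical helper name. For a classifier, passing `cv=5` makes scikit-learn use stratified folds.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def tune_k(coords, labels, k_values=range(1, 31)):
    """Return the best n_neighbors and its mean 5-fold CV accuracy."""
    search = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={"n_neighbors": list(k_values)},
        cv=5,                # K = 5 folds (stratified for classifiers)
        scoring="accuracy",  # assumed criterion; the study's loss is not stated
    )
    search.fit(coords, labels)
    return search.best_params_["n_neighbors"], search.best_score_

# Toy data: the label depends on easting only.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 3))
y = np.where(X[:, 0] > 0.5, "QV", "AC")
best_k, cv_accuracy = tune_k(X, y)
```

Maximising accuracy is equivalent to minimising the misclassification rate, so this matches the "minimising the losses" criterion under the assumed scoring.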

Neural Network Model for Grade Estimation
Since the chemical composition of a mineral deposit is highly correlated with alteration and lithology, building a three-dimensional geologic model is a crucial step towards predicting an accurate grade distribution. After trying a variety of NN configurations, the best results (minimum error) were obtained with an architecture comprising an input layer of 18 neurons, two hidden layers of 64 neurons each, and a single-neuron output layer (Figure 2). The model was built using the Python 3.6.9 programming language and the Keras 2.2.5 [65] deep learning library. To compare the model's performance against the traditional NN modelling approach in the literature, we first created a separate NN model that uses only the sample locations (coordinates) as input variables to predict the Au grade; this model is referred to as the traditional NN. We then created the proposed model, which uses not only the sample locations but also the rock type and alteration levels predicted by the kNN algorithm to model the grade distribution; this model is referred to as Model NN. Finally, to benchmark the network's prediction performance, an NN was fed with the locations together with the actual rock type and alteration values, which are already known at the sample points; this model is referred to as Real NN. Once the three models were created, they were run on the independent test set (data not seen or used during training) to evaluate their accuracy.
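To make the architecture concrete, the sketch below builds an 18-64-64-1 network in plain NumPy and counts its parameters; it is not the authors' Keras model. The ReLU hidden activations, linear output, and Glorot-style uniform initialisation are assumptions, since the text does not specify them, and `init_params`/`forward` are hypothetical helpers. One plausible reading of the 18 inputs is 3 coordinates + 10 one-hot rock types + 5 one-hot alteration levels.

```python
import numpy as np

LAYER_SIZES = [18, 64, 64, 1]  # input, hidden, hidden, output (from the text)

def init_params(sizes, seed=0):
    """Glorot-style uniform weights and zero biases (assumed initialisation)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        limit = np.sqrt(6.0 / (n_in + n_out))
        params.append((rng.uniform(-limit, limit, size=(n_in, n_out)),
                       np.zeros(n_out)))
    return params

def forward(params, X):
    """ReLU on the hidden layers (assumed), linear output for the grade."""
    h = X
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

params = init_params(LAYER_SIZES)
n_trainable = sum(W.size + b.size for W, b in params)
grades = forward(params, np.zeros((5, 18)))  # five dummy input rows
```

The parameter count is (18·64 + 64) + (64·64 + 64) + (64·1 + 1) = 5441; a Keras model with the same three fully connected layers would report the same number of trainable weights and biases.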
The back-propagation algorithm was used to train the neural network. It is a widely used iterative, gradient-based algorithm designed to minimise the error between the true and predicted model outputs [66,67]. RMSProp was used as the optimisation algorithm to update the weight tensors (w_i) during training. Rather than applying a single fixed step size, RMSProp adapts the effective learning rate of each weight: it divides each gradient by the square root of a moving average of recent squared gradients [68]. A user-defined loss function compares the predictions with the targets and produces an error value, which is used during training to update the weights (Figure 3). To determine the hyper-parameters of the NN model, k-fold cross-validation [69] was used with K = 5: the training set was split into five equally sized partitions, so that in each fold 80% of the training data were used to fit the model and the remaining 20% were used for validation. For each fold, a network was trained on the other K − 1 partitions and its accuracy was evaluated on the held-out partition (Figure 4). The losses obtained from the folds were averaged to give the training loss. Training hyper-parameters such as the number of epochs, the optimiser, and the learning rate were determined by trial and error.
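The RMSProp rule summarised above can be written as a short NumPy function. The decay rate `rho=0.9`, `eps=1e-7`, and the learning rates are illustrative defaults chosen for this sketch, not values reported in the study, and `rmsprop_step` is a hypothetical helper applied here to a toy quadratic rather than to the NN loss.

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=1e-3, rho=0.9, eps=1e-7):
    """Apply one RMSProp update to weights w given their gradient.

    v is the running average of squared gradients. Dividing by sqrt(v) rescales
    each component, so weights with consistently large gradients take smaller
    effective steps and weights with small gradients take larger ones.
    """
    v = rho * v + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(v) + eps)
    return w, v

# Toy problem: minimise ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([3.0, -1.0])
w, v = np.zeros(2), np.zeros(2)
for _ in range(3000):
    w, v = rmsprop_step(w, 2.0 * (w - target), v, lr=0.01)
```

Because each step is normalised by the gradient's running magnitude, the update size stays roughly on the order of `lr` whatever the raw scale of the gradient, which is what makes the effective learning rate adaptive.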
In this paper, three performance metrics, mean absolute error (MAE), mean squared error (MSE), and the coefficient of determination (R²), were used to evaluate prediction performance. Figure 5 shows the improvement in MAE over the epochs on the training dataset. To avoid over-fitting, a typical issue in NN applications, the network was trained for 400 epochs.
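For reference, the three metrics can be computed directly with NumPy as below. This is a sketch with a hypothetical helper name; scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` return the same quantities.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, MSE and R^2 for a set of grade predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    # R^2 = 1 - SS_res / SS_tot; it is 0 for a model that always predicts the mean.
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MSE": mse, "R2": r2}

scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
# MAE = 0.25, MSE = 0.125, R2 = 1 - 0.5 / 5.0 = 0.9
```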

Results and Discussion
To assess the model prediction performance, the dataset was randomly split into training and test subsets 30 times, following the procedure described in Section 2. Table 4 and Figure 6 summarise the Au grade prediction results for every simulation. In each simulation, the test subset was used to evaluate the performance of the kNN and NN models. As the results of Simulation 23 in Tables 5-10 show, the kNN model achieved comparable accuracy for rock type and alteration predictions (80% and 74%, respectively). Although the kNN model appears to estimate rock types with reasonable accuracy, it did not make any QVS lithology predictions. This is likely due to the small number of QVS samples in the training set; the actual QVS lithologies were confused with QV lithologies.
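The repeated random splitting behind the 30 simulations can be expressed with scikit-learn's `ShuffleSplit`. The sketch below is illustrative only: `KNeighborsRegressor` is a stand-in for the NN model, the data are synthetic, and `repeated_split_mae` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import ShuffleSplit
from sklearn.neighbors import KNeighborsRegressor

def repeated_split_mae(X, y, make_model, n_splits=30, seed=0):
    """Test-set MAE of a freshly fitted model on each random 80/20 split."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.2, random_state=seed)
    maes = []
    for train_idx, test_idx in splitter.split(X):
        model = make_model().fit(X[train_idx], y[train_idx])
        maes.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))
    return np.array(maes)

# Synthetic stand-in data: grade rises with easting, plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(0.0, 0.1, size=200)
maes = repeated_split_mae(X, y, lambda: KNeighborsRegressor(n_neighbors=5))
print(f"MAE over 30 splits: {maes.mean():.3f} +/- {maes.std():.3f}")
```

Reporting the spread across splits, alongside a single split such as Simulation 23, shows how sensitive the metrics are to which samples fall into the test set.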
For each test point, the sample locations (easting, northing, and altitude) and lithological features were used as inputs to obtain the grade estimates. The proposed NN model (Model NN) yielded an MAE of 0.507 and an R² of 0.529 in Simulation 23. When the model used only the sample locations as input variables (traditional ANN model), an MAE of 0.862 and an R² of 0.112 were obtained (Table 9). This indicates that grades can be estimated more accurately when lithological features such as rock type and alteration level are also used (Figures 7-9).
As Figure 10 shows, the Model NN and Real NN models reproduce the actual grade distribution more closely than the traditional NN.
The results show that the proposed model underestimated grades in the 15-20 ppm range. Close examination of the sample points shows that where mineralization is structurally controlled and a test point is located near a fault, the network was unable to overcome the effect of the lithological discontinuity. The proposed model relies on the lithological control of mineralization and therefore does not capture structurally controlled systems well. Another notable outcome is that the NN model failed to predict some sample points even though a large number of similar samples were present in the training set. This may be partly because, although neural networks can successfully approximate probabilistic functions [70], no amount of training enables a neural network model to predict independent stochastic events.

Table 5. kNN model lithology predictions (rows: actual lithology; columns: predicted lithology).

Actual \ Predicted      GG    AC  ACBX   ACC    LT    BX   GML   QST    QV   All
GG                       7     0     0     0     0     0     0     0     0     7
AC                       1   450     0     1     3     0     0    20    24   499
ACBX                     0     0    12     2     2     0     0     0     0    16
ACC                      0     2     0   159     4     0     0     6     3   174
LT                       0     0     0     1    21     0     0     0     0    22
BX                       0     1     0     0     0     3     0     0     0     4
GML                      0     0     0     0     0     0     2     0     0     2
QST                      0    41     0     1     1     0     0    39     5    87
QV                       0    27     1    12     0     1     0     7    77   125
QVS                      0     0     0     0     0     0     0     1     8     9
All                      8   521    13   176    31     4     2    73   117   945
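The Table 5 counts can be summarised programmatically. The sketch below recomputes the overall accuracy (diagonal over total), per-class recall (row-wise) and per-class precision (column-wise). It assumes, from the table totals, that rows are actual and columns are predicted lithologies, and it appends a zero QVS column because the model made no QVS predictions; setting precision to 0 for a class that is never predicted is a convention chosen for this sketch.

```python
import numpy as np

LABELS = ["GG", "AC", "ACBX", "ACC", "LT", "BX", "GML", "QST", "QV", "QVS"]
# Table 5 counts: rows = actual lithology, columns = predicted lithology.
# The last (QVS) column is all zeros because the model never predicted QVS.
CM = np.array([
    [7,   0,  0,   0,  0, 0, 0,  0,  0, 0],
    [1, 450,  0,   1,  3, 0, 0, 20, 24, 0],
    [0,   0, 12,   2,  2, 0, 0,  0,  0, 0],
    [0,   2,  0, 159,  4, 0, 0,  6,  3, 0],
    [0,   0,  0,   1, 21, 0, 0,  0,  0, 0],
    [0,   1,  0,   0,  0, 3, 0,  0,  0, 0],
    [0,   0,  0,   0,  0, 0, 2,  0,  0, 0],
    [0,  41,  0,   1,  1, 0, 0, 39,  5, 0],
    [0,  27,  1,  12,  0, 1, 0,  7, 77, 0],
    [0,   0,  0,   0,  0, 0, 0,  1,  8, 0],
])

overall_accuracy = np.trace(CM) / CM.sum()
recall = np.diag(CM) / CM.sum(axis=1)  # share of each actual class recovered
predicted = CM.sum(axis=0)
# Precision per predicted class; classes never predicted get 0 instead of 0/0.
precision = np.divide(np.diag(CM), predicted,
                      out=np.zeros(len(LABELS)), where=predicted > 0)
```

The QVS recall is 0 (eight of its nine samples are predicted as QV), and the largest off-diagonal counts sit in the AC, QST, and QV rows and columns, which is where most of the lithology confusion occurs.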

Conclusions
Accurate grade estimation is a significant part of a mining project, as it strongly influences decisions made in both the exploration and exploitation stages of the mining process. In this paper, we proposed a grade estimation methodology that combines kNN with a multilayer feed-forward neural network and incorporates sample locations, lithological features, and alteration levels. This paper has shown that a hybrid kNN-NN model can be successfully used for grade estimation. The proposed model does not require complicated mathematical knowledge or strong assumptions; it is a data-driven method that exploits the relationship between input and output values. Since the model incorporates geological features, alteration levels, and sample coordinates in the learning process, it can be tailored to other grade modelling tasks. Although the model successfully captured the relationship between the input and output variables, it ignored the structural influence on mineralization, which is expected because such influence is generally difficult to recognise. The proposed approach primarily provides the following advantages:

• Grades can be modelled more accurately by adding an intermediate modelling step that predicts rock types and alteration features as inputs to the subsequent NN model.
• Unlike geostatistical methods, the suggested model does not require any assumptions about the input variables.
• Compared with classical resource estimation techniques, the method can be more easily modified and applied to other mineral resources.
While the proposed method has significant advantages, the following drawbacks require further attention when it is applied to other cases.

• Limited data combined with a deep network can easily cause over-fitting in complex NN structures. The network is highly sensitive to the following hyper-parameters: number of hidden layers, activation function type, number of epochs, and learning rate.
• The suggested model is based on the lithological control of mineralization and is highly sensitive to the presence of geological discontinuities.
It would be worthwhile to compare the proposed method with traditional geostatistical methods in a case study. Furthermore, log-transformed grade values could be used instead of the raw grades when building the ANN model; comparing the predictions obtained from raw versus log-transformed grades is a topic for future research.
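As a starting point for the log-grade comparison suggested above, scikit-learn's `TransformedTargetRegressor` can train any regressor on transformed grades and back-transform its predictions automatically. This is a sketch under assumptions: `log1p`/`expm1` are used so that zero grades remain finite, `KNeighborsRegressor` stands in for the NN, `log_grade_model` is a hypothetical helper, and the lognormal toy grades are synthetic.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.neighbors import KNeighborsRegressor

def log_grade_model(base_regressor):
    """Fit on log(1 + grade); predictions are mapped back with exp(x) - 1."""
    return TransformedTargetRegressor(
        regressor=base_regressor, func=np.log1p, inverse_func=np.expm1
    )

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(200, 3))
grades = np.exp(rng.normal(0.0, 1.0, size=200))  # skewed, lognormal-like toy grades

raw_model = KNeighborsRegressor(n_neighbors=5).fit(X, grades)
log_model = log_grade_model(KNeighborsRegressor(n_neighbors=5)).fit(X, grades)
raw_pred, log_pred = raw_model.predict(X), log_model.predict(X)
```

Because the regressor averages in log space, each back-transformed prediction here is no larger than the arithmetic mean of the same neighbours' grades (a consequence of Jensen's inequality), so a comparison of raw and log-based models should check for this downward shift.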