A Case Study of Rock Type Prediction Using Random Forests: Erdenet Copper Mine, Mongolia

Abstract: In a mine, knowledge of rock types is often desired as they are important indicators of grade, mineral processing complications, or geotechnical attributes. It is common to model the rock types with visual graphics tools using geologist-generated rock type information in exploration drillhole databases. Instead of this manual approach, this paper used random forest (RF), a machine learning (ML) algorithm, to model the rock type at Erdenet Copper Mine, Mongolia. Exploration drillhole data was used to develop the RF models and predict the rock type based on the coordinates of locations. Data selection and model evaluation methods were designed to ensure applicability for real life scenarios. In the scenario where rock type is predicted close to locations where information is available (such as in blocks being blasted), RF did very well with an overall success rate (OSR) of 89%. In the scenario where rock type was predicted for two future benches (i.e., 30 m below known locations), the best OSR was 86%. When an exploration program was simulated, performance was poor with an OSR of 59%. The results indicate that EMC can leverage RF models for short-term and long-term planning by predicting rock types within drilling blocks or future blocks quite accurately.


Introduction
Machine learning (ML) has been applied to mining and geology problems for at least two decades now [1][2][3][4][5][6]. On the mining geology side, grade estimation has been a major area of focus [7][8][9][10][11]. The machine learning techniques most commonly applied were neural networks (NN) and support vector machines. Many also tried hybrid approaches [12]. In order to estimate iron ore grades at a mine, researchers [6] used an "extreme learning machine" (a feed-forward NN) algorithm in combination with a "particle swarm optimization" approach. To fill the data gaps for geochemical element grades in a porphyry copper deposit, a multi-layer NN was used [13] along with a Gustafson-Kessel clustering algorithm. In a case study to generalize assay values for known and unknown sampled locations of a mineral sand deposit, a hybrid NN was deployed. The combination included a trained, tested, and validated feed-forward NN along with a geostatistics model [14]. In another instance, a genetic algorithm (GA) was used to train a NN [11] for predicting iron grades.
Researchers investigated methods for generalization, considering the complications typical in earth science data [2,15,16]. Addressing these issues, some researchers have used GA to split datasets properly into training and testing subsets [17,18]. To be method agnostic, recommendations were made on how data should be split to ensure proper evaluation of artificial intelligence models [19].
Some recent examples used ML to identify rock types based on machine operation data from drills (such as drill penetration rate) or other sensor data. Logistic regression, neural networks and gradient boosting were used by [20] to identify rock types based on sensor data in oil well directional drilling. Clustering and other techniques were applied to

Data Selection Approaches
This paper uses two approaches for selecting data for training and testing subsets, segment-based (SB) and hole-based (HB). The reasoning for the two approaches is explained in a subsequent section.
SB and HB approaches are demonstrated using Figure 2. The figure shows a dataset consisting of four holes, H1-H4. Each hole contains several lithological segments. Segments are 5 m in thickness, except when the lithological segment is less than 5 m in thickness or not a perfect multiple of 5 m. For example, consider a granodiorite intersection of 23 m, followed by 3.5 m of diorite. The granodiorite intersection will be split into five segments.
In Figure 2, there are a total of 28 segments between the four holes. The figure also shows two lines that indicate two arbitrary elevations (1020 and 1000). These lines will be used later to explain additional concepts.
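The 5 m segmentation rule from the Data Selection Approaches section can be illustrated with a minimal Python sketch. It assumes that an intersection is cut into full 5 m pieces from the top and that any remainder is kept as a shorter final segment; the text does not state how the remainder is handled, and the function name split_intersection is hypothetical.

```python
def split_intersection(length_m, segment_m=5.0):
    """Split one lithological intersection into segments of at most segment_m.

    Assumption: full-length pieces first, remainder as a shorter last segment.
    """
    segments = []
    remaining = length_m
    while remaining > 1e-9:
        piece = min(segment_m, remaining)
        segments.append(piece)
        remaining -= piece
    return segments


# The example from the text: 23 m of granodiorite followed by 3.5 m of diorite.
granodiorite_segments = split_intersection(23.0)
diorite_segments = split_intersection(3.5)
print(granodiorite_segments, diorite_segments)
```

Under this assumption, the 23 m granodiorite intersection yields five segments and the 3.5 m diorite intersection yields a single segment.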
Assume that it is determined that 75% of the data will be selected for training. In the SB method, 21 segments are selected for training. Of course, segments are selected so that the training and testing subsets are similar in their distribution of rock types [19] or meet the real life considerations. In the SB method, each hole will likely contribute to both training and testing subsets. In the HB method, selection is made by holes and not by segments. Therefore, 75% of the holes are selected for the training subset. Each segment in the selected hole contributes only to the training subset. Segments in the other holes are all in the testing subset.
Note that regardless of method, there would be exactly 28 rows of total data in the data set. However, while the number of rows in the training subset will be 21 in the SB approach, this will be different for the HB approach. It depends on which holes are selected for training and testing subsets. For example, if H1 is sent to the testing set, the training subset would have 22 rows.
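The difference between the two selection approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-hole segment counts are hypothetical (only the total of 28 and the 6 segments implied for H1 follow from the text), the function names are invented for this sketch, and the balancing of rock type distributions mentioned above is omitted.

```python
import random

# Hypothetical segment counts per hole; the total (28) and H1 = 6 follow the text.
SEGMENTS_PER_HOLE = {"H1": 6, "H2": 7, "H3": 7, "H4": 8}
rows = [(hole, i) for hole, n in SEGMENTS_PER_HOLE.items() for i in range(1, n + 1)]


def segment_based_split(rows, train_fraction, rng):
    """SB: shuffle individual segments, so a hole can feed both subsets."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def split_by_test_holes(rows, test_holes):
    """HB: every segment of a test hole goes to the testing subset."""
    train = [r for r in rows if r[0] not in test_holes]
    test = [r for r in rows if r[0] in test_holes]
    return train, test


def hole_based_split(rows, train_fraction, rng):
    """HB with randomly chosen holes (75% of 4 holes = 3 training holes)."""
    holes = sorted({hole for hole, _ in rows})
    rng.shuffle(holes)
    n_train_holes = round(train_fraction * len(holes))
    return split_by_test_holes(rows, set(holes[n_train_holes:]))


sb_train, sb_test = segment_based_split(rows, 0.75, random.Random(0))
hb_train, hb_test = hole_based_split(rows, 0.75, random.Random(0))
h1_train, h1_test = split_by_test_holes(rows, {"H1"})
print(len(sb_train), len(hb_train), len(h1_train))
```

The SB training subset always has 21 rows, whereas the HB training subset size depends on which hole is held out; holding out H1 leaves 22 rows, as in the example above.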

Operational Situations and Their Relationship to Evaluation Methods
In a mine, there is information about rock type in areas that are drilled. However, information is often preferred at a more granular level for operational reasons. Many times, in this scenario, there is information available close to and surrounding the non-drilled location. This operational situation is reflected in the SB strategy, where rock types are predicted at locations close to where information is available. For example, if segments 3 and 5 in Figure 2 are in the test set, they are locations close to where information is available (segments 1, 2, 4, 6). Segments are about 5 m apart. Therefore, this is similar to desiring to know the rock type in a particular production blast, since drillhole spacing in a typical blast is 5-by-5 m at EMC. Knowing the rock type has immediate operational value as it can help predict grades or mineral processing complexities.
Another situation that occurs at a mine is when information is needed for areas where hole density is sparse. This scenario is captured by the HB method. Since the holes in the test set are not known to the model, this method simulates predicting an entire drillhole between known drillholes. The difference with SB is that the distance of testing segments from training segments is much larger in HB. In HB, when a prediction is made for a test segment, it is made based on segments (training data) that are in other holes. Since holes are 50 m or more apart, predictions are essentially for locations 50 m or more away from known data. In SB, however, predictions are made based on segments, some of which are in the same hole (perhaps as close as 5 m away). SB is thus a scenario where predictions are for locations that are near to locations with known data. Hence, SB-versus-HB is also a near-versus-far comparison.
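The near-versus-far contrast can be made concrete by measuring how far each test segment is from its closest training segment. The sketch below uses a hypothetical layout (four vertical holes in a line, eight segment centroids per hole) and a deterministic hold-out pattern as a stand-in for random selection; only the 50 m hole spacing and the 5 m segment spacing come from the text.

```python
import math

HOLE_SPACING_M = 50.0     # from the text: holes are 50 m or more apart
SEGMENT_SPACING_M = 5.0   # from the text: segments are about 5 m apart

# Hypothetical layout: 4 vertical holes in a line, 8 segment centroids per hole.
segments = [
    {"hole": h, "k": k, "xyz": (h * HOLE_SPACING_M, 0.0, 1000.0 - k * SEGMENT_SPACING_M)}
    for h in range(4)
    for k in range(8)
]


def nearest_training_distance(test, train):
    """Distance from each test centroid to its closest training centroid."""
    return [min(math.dist(t["xyz"], s["xyz"]) for s in train) for t in test]


# SB-like split: hold out two segments per hole (stand-in for random selection).
sb_test = [s for s in segments if s["k"] % 4 == 2]
sb_train = [s for s in segments if s["k"] % 4 != 2]

# HB-like split: hold out one entire hole.
hb_test = [s for s in segments if s["hole"] == 3]
hb_train = [s for s in segments if s["hole"] != 3]

sb_dist = nearest_training_distance(sb_test, sb_train)
hb_dist = nearest_training_distance(hb_test, hb_train)
print(max(sb_dist), min(hb_dist))
```

In this layout every SB test segment has a training neighbor 5 m away in the same hole, while every HB test segment is 50 m from the nearest known data.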
A variant of the above scenario is when information is required at depths beyond the current drilling depth. In this situation, named "SB specific to elevation" (SBE), information is available up to a given elevation, while there is interest in knowing the rock types below this elevation. Therefore, using information up to this elevation, rock type has to be predicted for deeper locations (future benches) for short-term or long-term planning purposes. In this method, all segments above the specific elevation are in the training subset, while locations deeper than that are in the test subset. To define terminology, SBE-1600-1300-30 indicates the SB evaluation method where segments between 1600 m and 1300 m elevations are part of the training subset. The "30" refers to the segments in the next 30 m of depth (1270-1300 m elevation). This 30 m forms the test set. Thus, the evaluation is occurring at 1300 m elevation, with 1600-1300 m being the training set and 1270-1300 m being the test set.
In the label SBE-1600-1300-30, 1600-1300 is referred to as the training interval (TI) with a training width (TW) of 300 (1600-1300 = 300), while 30 is the evaluation width. Incidentally, the highest collar elevation is 1600 m and, therefore, when the training interval starts at 1600 m, it implies all segments up to a certain depth are included in the training subset.
One may also use Figure 2 to understand this method. When applied to Figure 2, SBE-1020-1000-5 would imply that all segments of the dataset between the thick blue line and the dashed blue line would be used in the training set. Predictions will be made for 5 m below this line, i.e., one segment below the dashed line. Note that in the dataset a segment is represented by the coordinates of its centroid. Therefore, unlike Figure 2, it is always clear whether a segment is above or below a line.
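The SBE labeling convention can be expressed as a short Python sketch. The function names and the example hole are hypothetical, and the treatment of a centroid lying exactly on a boundary elevation (assigned here to the training subset) is an assumption not stated in the text.

```python
def parse_sbe_label(label):
    """Parse a label such as 'SBE-1600-1300-30' into (top, bottom, evaluation width)."""
    prefix, top, bottom, width = label.split("-")
    if prefix != "SBE":
        raise ValueError(f"not an SBE label: {label}")
    return float(top), float(bottom), float(width)


def sbe_split(centroid_elevations, label):
    """Training = centroids inside the training interval; test = next `width` m below."""
    top, bottom, width = parse_sbe_label(label)
    train = [z for z in centroid_elevations if bottom <= z <= top]
    test = [z for z in centroid_elevations if bottom - width <= z < bottom]
    return train, test


top, bottom, width = parse_sbe_label("SBE-1600-1300-30")
training_width = top - bottom  # 1600 - 1300 = 300

# Hypothetical hole with its collar at 1030 m and 5 m segments (centroid elevations).
centroids = [1027.5 - 5.0 * k for k in range(8)]
train, test = sbe_split(centroids, "SBE-1020-1000-5")
print(training_width, train, test)
```

Because each segment is represented by its centroid, the split is unambiguous: in this hypothetical hole, four segments fall in the 1020-1000 training interval and one segment falls in the 5 m evaluation width below it.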
In the SB and HB strategies, training and testing subsets are selected by randomly splitting the datasets [25]. In the results section, it is shown that despite the random shuffling, the characterization of the subsets is almost identical in both strategies. In the SBE strategy, training data is everything within a particular training interval, while testing data is everything within a particular evaluation width that is just outside the training interval. Since the two subsets represent different 3D spaces, there is no reason for them to be similarly characterized. Normally, this would be an improper modeling approach. However, that concern does not apply here as the intention is to test if ML can predict just outside its training area.
The ML method used in the paper is random forest (RF). RF were used for two major reasons [26]. One, unlike geostatistics, RF do not require any assumptions on the distribution of data. Two, as explained in the section below, RF tend to generalize well. RF are not new to mining geology [27,28], but since they are not a common technique in mining they are briefly presented next.

Random Forest: Background
This paper is not intended to be a manual on random forest (RF). Those seeking a deeper understanding are referred to [29], the source for this introduction. First, a note on terminology. In machine learning terminology, 'feature' refers to a database field. A drillhole database that contains the coordinates (northing, easting, elevation) and the rock type code has four features. An RF developed to determine the rock type will then be based on three features (northing, easting, elevation).
Minerals 2021, 11, 1059
To understand random forests, one must first understand decision trees. A decision tree is a series of yes/no questions that are used to subdivide the samples in the training set. A question applied to a group of data acts like a boundary, as it splits the parent group into two. The child groups can then be further split using boundaries of their own. The application of decision trees is explained through an example.
Consider the training set in Figure 3, where each sample consists of an x-coordinate, a y-coordinate, and a binary class indicator (1 or 0). In this example, the goal of the decision tree is to determine the class for a given (x, y) location. Assume that the tree starts with the blue boundary (Y > 36), splitting the data into two. The two resultant groups are further split using the red (bottom group) and yellow (top group) boundaries. The four subgroups are numbered I-IV to assist in the description. Assume that the above was the extent of the tree, and the modeler wishes to know the class for the test point (20,5). When the decision tree is applied to the point, it lands in Group III. Therefore, the class assigned to (20,5) is the class implied by the samples in Group III. Since 1's form the majority in Group III, the class assigned to (20,5) is 1.
In a regression decision tree, the assigned value can be the mean or median (or any other appropriate statistic) of the group into which the point lands. In this example, any point being evaluated will face at most two boundaries. Therefore, the depth of the tree is 2. Figure 4 shows a representation of the decision tree, with the "yes" branch progressing to the left. The location at which a boundary exists is called a node, i.e., a group of data points is a node. The final nodes are also shown (I, II, III, and IV).
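A depth-2 tree of the kind described above can be written down directly as nested yes/no questions. The following Python sketch is illustrative only: the first boundary (Y > 36), the majority class of Group III, and the all-zero Group IV follow the text, while the features and thresholds of the red and yellow boundaries and the sample labels in each group are hypothetical.

```python
TREE = {
    "feature": "y", "threshold": 36.0,          # blue boundary from the text: Y > 36
    "yes": {                                    # top group, yellow boundary (hypothetical)
        "feature": "x", "threshold": 30.0,
        "yes": {"leaf": "I", "labels": [1, 1, 0]},
        "no": {"leaf": "II", "labels": [0, 0, 1]},
    },
    "no": {                                     # bottom group, red boundary (hypothetical)
        "feature": "x", "threshold": 50.0,
        "yes": {"leaf": "IV", "labels": [0, 0, 0]},    # homogeneous, only 0
        "no": {"leaf": "III", "labels": [1, 1, 0, 1]},  # majority is 1
    },
}


def classify(node, point):
    """Walk the tree; return (final group, majority class, boundaries faced)."""
    depth = 0
    while "leaf" not in node:
        branch = "yes" if point[node["feature"]] > node["threshold"] else "no"
        node = node[branch]
        depth += 1
    labels = node["labels"]
    majority = max(set(labels), key=labels.count)
    return node["leaf"], majority, depth


group, predicted_class, boundaries_faced = classify(TREE, {"x": 20, "y": 5})
print(group, predicted_class, boundaries_faced)
```

With these assumed boundaries, the test point (20,5) fails the Y > 36 question, falls in the bottom group, lands in Group III, and is assigned class 1 after facing two boundaries.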
When a node is to be divided, one must first decide which feature to use for the boundary. In this example, two features are available as a basis for the boundary. The first boundary in the above example could have been on the X-axis instead of the Y-axis. The next design choice is where to locate the boundary on the selected feature. In this example, the choice was to locate the first boundary at 36 (i.e., Y > 36). Most decision tree algorithms make both choices at once. If the number of features is low, one could systematically apply boundaries in all the features, and then pick the one where the resultant child groups have the least error (i.e., each node is homogeneous and contains only or mostly samples from the same category). Notice how group IV contains only 0. This node can no longer be divided as it is fully homogeneous.
The process of dividing nodes can continue until the final nodes are all homogeneous or have at least one sample. One may also choose to limit the depth of the tree. Usually, a tree that is too deep may not generalize well. When the number of features is large, to reduce computations, the algorithm may randomly choose a set of features to be used as a basis for the boundary. Different features are then considered for different boundaries.
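The systematic search over features and boundary locations can be sketched as follows. This is a simplified illustration, not the algorithm used by any particular library: it measures error as the number of samples that disagree with the majority class of their child group, tries the midpoints between consecutive distinct values as candidate boundaries, and uses a small hypothetical dataset.

```python
def node_error(labels):
    """Number of samples that disagree with the node's majority class (binary labels)."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)


def best_split(samples):
    """Try every feature and candidate boundary; return (error, feature index, threshold)."""
    best = None
    n_features = len(samples[0][0])
    for feature in range(n_features):
        values = sorted({point[feature] for point, _ in samples})
        # Candidate boundaries: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            above = [label for point, label in samples if point[feature] > threshold]
            below = [label for point, label in samples if point[feature] <= threshold]
            error = node_error(above) + node_error(below)
            if best is None or error < best[0]:
                best = (error, feature, threshold)
    return best


# Hypothetical samples: ((x, y), class). The classes separate cleanly on y.
samples = [
    ((10, 10), 1), ((40, 20), 1), ((70, 30), 1),
    ((20, 50), 0), ((50, 60), 0), ((80, 40), 0),
]
error, feature, threshold = best_split(samples)
print(error, feature, threshold)
```

For this dataset no boundary on x separates the classes, so the search settles on the second feature (index 1, the y-coordinate) at 35, which produces two homogeneous child groups.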
In a decision tree, algorithms will generally yield the same set of boundaries for a given training set if all the features are considered for every boundary. In a random forest with N training data points, decision trees are formed by randomly selecting (with replacement) N of the training data points. Thus, the same data point may be selected many times for modeling a tree, at the cost of other data points that are not selected. Multiple trees are formed this way to make the forest. When the forest is applied to determine the category for a given test point, the decisions of the various trees in the forest are combined to form the final decision. One may use different strategies to combine the decisions. Random forests have been found to be superior to a single decision tree, with generalization not being an issue [26].
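The two ingredients described above, sampling with replacement and combining the decisions of the trees, can be sketched in Python. The tree votes below are hypothetical placeholders rather than outputs of trained trees, and majority voting is only one of the possible combination strategies.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement from a training set of n rows."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(votes):
    """Combine the class decisions of the individual trees by majority."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(42)
n_rows = 10
n_trees = 5

# One bootstrap sample per tree; some rows repeat and others are left out.
samples_per_tree = [bootstrap_indices(n_rows, rng) for _ in range(n_trees)]
left_out_per_tree = [sorted(set(range(n_rows)) - set(idx)) for idx in samples_per_tree]

# Hypothetical decisions of the five trees for one test point.
tree_votes = [1, 0, 1, 1, 0]
final_class = majority_vote(tree_votes)
print(final_class)
```

Each tree sees a training set of the same size as the original, but with a different mix of repeated and omitted rows, which is what makes the trees in the forest differ from one another.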

RF Modeling and Results
RF models were developed using the RandomForestClassifier() tool in scikit-learn [30]. Only one hyperparameter was set: maximum tree depth (MTD). It was set using trial-and-error runs, in which tree depth was increased until performance stopped improving. In other words, the shallowest tree depth that gave the highest performance was used as the setting. The task of the RF was to predict the rock class, GDIR (1) or not (0). Table 1 shows the distribution of GDIR rock type in the training and testing subsets for the various strategies. Table 2 shows the performance of the RF models for the various strategies.
Table abbreviations: MTD = Maximum Tree Depth; NTrain = Total rows in training subset; GDIR_Train = Number of rows in training set with GDIR; GDIR_Train_Prop = Proportion of GDIR in training subset; NTest = Total rows in testing subset; GDIR_Test = Number of rows in testing set with GDIR; GDIR_Test_Prop = Proportion of GDIR in testing subset; nonGDIR_Test = Number of rows in testing set with non-GDIR.
The results demonstrate the following:
• The proportion of GDIR in the training and testing subsets depends on the evaluation strategy. In SB and HB, despite random shuffling, GDIR is split about evenly between training and testing subsets. This similarity between training and testing subsets is appropriate as both represent the same 3D space. In the SBE strategies, the training subsets are much larger than the testing subsets, since the training interval (e.g., 1600-1300 implies a 300 m training interval) is much larger than the evaluation widths (e.g., 30 m). Since the two subsets represent completely different 3D spaces, the proportion of GDIR and non-GDIR in the two subsets can be quite different.
• SBE models were developed for elevations of 1300 and 1200 m, as the mine is currently operating approximately between those levels.
• RF performs quite well in the SB strategy. In the test subset, 81% of GDIR is detected, while 90% of non-GDIR is detected. The overall success rate (OSR) was 87%, i.e., 87% of the rocks are recognized correctly as GDIR or non-GDIR.
• In the SBE strategy (also see Figure 5): Notice how the performance lines in Figure 5 are inclined downwards to the right. In each scenario, the performance falls as the evaluation width increases from 30 m to 60 m. This is not surprising, as a larger evaluation width tests space farther away from the modeling space. The overall accuracy is higher for larger training intervals (Figure 6). Thus, at 1300 m, 1600-1300 (training interval = 300) outperforms 1400-1300 (training interval = 100). Similarly, at 1200 m, 1500-1200 outperforms 1300-1200. The effect is more pronounced at 1200 m elevation. The seemingly flawless performance for SBE-1300-1200 is misleading (Table 2, column GDIR_success_prop). The ability to classify 95% of the GDIR rock type as GDIR is paired with a 71% false positive rate. In other words, the classification of rock as GDIR is unreliable. This strategy classifies most segments as GDIR. Though that results in capturing nearly all the GDIR, it also ends up classifying non-GDIR as GDIR. This is seen in the low success rate for classifying non-GDIR.
• The false positive rate of 9-15% (for most cases) is decent. This means that when a rock is classified as GDIR, it is most likely to be GDIR.
• The HB strategy showed that predicting entire holes is difficult. When a hole is hidden in its entirety, only 42% of the GDIR rock segments in the hole are classified accurately. This is accompanied by a 29% false positive rate, which is not good.
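The performance measures quoted in the results can be computed from the actual and predicted classes of the test subset, as in the sketch below. The column name GDIR_success_prop comes from Table 2; the other key names, the toy data, and in particular the definition of the false positive rate (non-GDIR rows predicted as GDIR, divided by all non-GDIR rows) are assumptions, since the formula is not spelled out in this section.

```python
def evaluate(actual, predicted):
    """Binary performance measures with GDIR = 1 and non-GDIR = 0."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return {
        "GDIR_success_prop": tp / (tp + fn),     # share of GDIR rows detected
        "nonGDIR_success_prop": tn / (tn + fp),  # share of non-GDIR rows detected
        "false_positive_rate": fp / (fp + tn),   # assumed definition, see lead-in
        "OSR": (tp + tn) / len(pairs),           # overall success rate
    }


# Hypothetical test subset: 4 GDIR rows and 6 non-GDIR rows.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
balanced_model = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
always_gdir_model = [1] * 10

balanced = evaluate(actual, balanced_model)
always_gdir = evaluate(actual, always_gdir_model)
print(balanced["OSR"], always_gdir["GDIR_success_prop"], always_gdir["false_positive_rate"])
```

The second model illustrates the caution raised for SBE-1300-1200: labeling every segment as GDIR captures all of the GDIR, yet it does so with a very high false positive rate and a low overall success rate.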

Discussion
Most mining operations either use the manually developed rock type models or sensor technologies to make assumptions on the rock types contained within a drill block, or in future benches/blocks. This paper tested ML algorithms as an alternative to both approaches.
The SB strategy demonstrated that given a good density of information, the gaps can be predicted with high accuracy. This would suggest that ML of existing information may be a good substitute for using technologies to detect rock types, when information is available for nearby locations.
The SBE strategies demonstrated that mine planning can benefit from ML. Erdenet Copper Mine, with a bench height of 15 m, can predict rock type two to three benches below the current depth with significant reliability.
The HB strategy demonstrated that RF machine learning cannot yet replace a drilling campaign. The HB strategy simulated data sparsity. Without data density, ML can have problems. A research team [31] cited inadequate data as the reason for overfitting when applying neural networks to estimate grades based on sample locations, lithological features and alteration levels. Another team [28] cited data density as a concern when applying RF for mineral prospectivity mapping.
Despite the mixed results, there are advantages to using RF. Unlike geostatistics, no assumptions are made about the statistical characterization of drillhole data. Even so, RF performs about as well as geostatistics [32]. Performance aside, geostatistical methods take advantage of spatial relationships as defined by variograms. RF does not explicitly take advantage of spatial relationships. The K-nearest neighbor machine learning technique [33], which is a version of the common inverse distance squared technique in geostatistics, does take distances into consideration. However, it is not a sophisticated algorithm. It is possible that by incorporating spatial relationships such as variograms, RF or other machine learning techniques may perform better. This would be an excellent topic for future research and would be in line with approaches being attempted in recent times [18].

Conclusions
The machine learning technique random forest was applied to the exploration drillhole database at Erdenet Copper Mine in Mongolia to predict the presence of rock type granodiorite. Granodiorite is an important rock type at the mine as it contains 43% of the copper. The data consisted of 90,033 drillhole segments from 2823 drillholes. Most segments were 5 m in thickness. Two data selection approaches, segment-based and hole-based, were utilized to ensure that models could be tested to align with real life needs. Models were developed to test for three operational scenarios. The base SB method tested for the scenario when rock type is predicted at locations close to where rock types are known. This simulates the typical block that is blasted as part of day-to-day operation, where rock type is known in a relatively dense grid. The base HB method tested for the scenario where rock type is unknown for the entire length of a drillhole in between other drill holes. The SBE method tested for the scenario where rock type is known up to a given elevation but is unknown beyond that elevation. In the SBE method, rock types were predicted for 30, 45 and 60 m (evaluation width) beyond a specific elevation. The information made available to the models in the SBE method, or the training interval, varied from 100 m to 300 m. Given the 15 m benches at the mine, the 30, 45 and 60 m evaluation widths implied predictions to 2, 3 and 4 benches below where rock types were known.
The models performed very well in the SB scenario, with 86% of granodiorite being predicted accurately and a false positive rate of 9%, resulting in an overall accuracy level of 89%. In the SBE method, the overall accuracy varied from 52% to 86%. Performance was better for larger training intervals and for shorter evaluation widths. Performance was best in the SBE method at 1200 m, i.e., rock type was predicted better at 1200 m than at other elevations. The highest performance was achieved at 1200 m elevation with a training interval of 300 m and evaluation width of 30 m. The performance in the HB method was not encouraging, with an overall success rate of 59%.
This paper demonstrated that random forest-based machine learning can be very effective for predicting rock types at short distances from known data. Predicting the entire length of a missing drillhole is, however, another story. The good performance of near-distance predictions should prompt mines to consider switching to machine learning over traditional manual modeling (or imperfect sensor technologies) to predict rock types in ore blocks blasted for production.