An Application of High-Dimensional Statistics to Predictive Modeling of Grade Variability

Abstract: The economic viability of a mining project depends on efficient exploration, which requires predicting where worthwhile ore occurs within a deposit. In this work, we apply the so-called LASSO methodology to estimate mineral concentration within unexplored areas. Our methodology outperforms traditional techniques not only in terms of logical consistency, but potentially also in cost reduction. The approach is illustrated by a full source code listing and a detailed discussion of its advantages and limitations.


Introduction
Any mining project begins with the exploration stage, which includes estimation of the expected economic returns; such an estimate can be made only after significant mineralization has been encountered. Naturally, each exploration stage is designed to proceed to the next stage only if the results of the current stage justify this continuation. Therefore, the key goal remains the identification of high-priority targets and the description of a conceptual model upon which the mine design and the project's net present value can be established.
There are notable commercial interests in developing methods to accelerate exploration and to identify the most realistic economic scenario more precisely. The focus is thus on making predictions for unexplored areas based on limited drilling data, and on using these predictions to estimate anticipated costs and potential revenues. Our method can be very useful in this context: in situations where the data are insufficient to perform any reasonable resource estimation, an Exploration Target Statement can still be obtained. Moreover, early drilling results may delineate the Exploration Target and lead to an Inferred Mineral Resource.
In this work, we utilize recent developments in high-dimensional statistics which provide control over prediction accuracy by reducing the so-called "generalization error". This error is a crucial notion in modern statistical analysis and represents the cumulative deterioration of results (due to uncertainty in model specification and its estimation errors) when predictions are applied to new observations. The reduction of the generalization error is achieved using appropriate penalization and regularization techniques, which are usually combined with some parameter selection method. In our approach, such parameter selection is based on in-sample estimation of predictive performance, performed via so-called "cross-validation". This approach has been successful (see [1][2][3]) in the estimation of very weak dependencies in large data sets. This work presents a statistically grounded decision support methodology targeted at:
• improving the prediction accuracy of mineral presence, which may lead to more cost-efficient exploration;
• assessment of the spatial mineralization characteristics, which may provide an alternative viewpoint that contrasts with existing methodologies;
• introduction of advanced statistical/artificial intelligence methods into the traditional area of geostatistics.
In this work, we demonstrate how drilling data can be combined with penalized regression to establish a statistical model focused on predictive performance. The results of this approach are compared to a traditional technique.

Background
In the literature, the term spatial prediction refers to an interpolation by weighted averages of the observed data [4]. Moreover, any existing method used in resource definition has its own drawbacks. Prediction of grade variability is often viewed as an interpolation methodology, which has been criticized for a number of restrictive assumptions [5][6][7]. Traditional geostatistical methods described in [7][8][9][10][11] presume a certain probabilistic structure obtained from the variogram. In some sense, it is supposed that the observation of a mineral concentration at location S ∈ O follows a random field (Z(S))_{S∈O} indexed by the points of a given geological structure O ⊂ ℝ³. Having adopted appropriate assumptions on this random field, such as stationarity and its covariance structure, and given measurements (Z(S_i))_{i=1}^{I} at locations S_i ∈ O, i = 1, ..., I, the estimation of Z(S_0) at an unobserved point S_0 ∉ {S_1, ..., S_I} is defined in terms of the conditional expectation

E(Z(S_0) | Z(S_1), ..., Z(S_I)). (1)

While the probabilistic framework and the estimation (1) provide the foundation for this work, the proposed approach is different. Instead of relying on stationarity, with the inconvenient need to specify a certain dependence type, it is supposed that the dependence between spatial observations is modeled naturally, in terms of the so-called ore body: a hidden structure which contains a certain concentration of a mineral. The shape, size, and content of the ore body are random and are determined by a superposition of so-called prime ore deposits, which are modeled by ball-shaped structures with uncertain radius, location, and rock mass content. Having assumed that the ore deposits may occur at virtually any location of the domain, we obtain an over-determined model which can be estimated using modern techniques of high-dimensional statistics.
More importantly, the proposed estimation is performed by selecting the most appropriate model from the viewpoint of its predictive accuracy.

Remark 1.
Here, we use estimation in the statistical sense of likelihood estimation for a parameterized family of densities. In this context, the notion over-determined stands for a situation where different parameter sets represent the same likelihood-maximizing density. These concepts are introduced and explained in the following sections.

High-Dimensional Statistics: LASSO and Ridging
A linear regression model is characterized by the relation

Y = Xβ* + ε, (2)

where ε = (ε_i)_{i=1}^{n} are unobserved random variables (zero-mean, with unknown variance σ²), n is the sample size, and p denotes the number of predictors.
Note that the matrix entries X = (X_{i,j})_{i=1,...,n; j=1,...,p} are non-random by assumption. This situation is referred to as fixed design. The columns X_{·,j} for j = 1, ..., p are called predictor realizations. The model (2) is linear because its dependence on the coefficients β* = (β*_j)_{j=1}^{p} is linear. As mentioned above, the p predictors can be obtained via non-linear transformations of some (multivariate) explanatory factors. Similarly, the response variable can also be described by some non-linear transformation of the actual observations, for instance by their measurement on a different (logarithmic) scale. On this account, linear models are able to capture non-linear dependencies between random phenomena. In applications of linear models, the estimation of β* from observations of X and Y is typically unstable when the columns of X are correlated. In this case, the least-squares estimation may be ill-posed (if eigenvalues of the matrix XᵀX are small). Furthermore, there are applications with more potential explanatory variables than observations (n < p). In such situations, ridge regression [12] (p. 215), which penalizes the size of the regression coefficients, suggests solving the penalized least-squares problem

β̂(λ) = argmin_{β ∈ ℝᵖ} ( ||Y − Xβ||² + λ ||β||² ) (3)

for the Euclidean norm ||·|| and an appropriate penalization parameter λ > 0. Note that even in the case that X does not have full rank, this problem possesses a unique solution, which is obtained from the equation

β̂(λ) = (XᵀX + λ1)⁻¹ XᵀY,

where 1 stands for the identity matrix. The ridge penalty λ causes a shrinkage of the coefficients β̂(λ). This usually yields a variance reduction Var(β̂(λ)) < Var(β̂(λ′)) for λ > λ′ > 0 but introduces a systematic error (bias). The LASSO was introduced in [1] as a regularization method for linear regression which uses an ℓ₁ penalty to achieve a sparse solution; the name "LASSO" is an acronym for Least Absolute Shrinkage and Selection Operator.
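The closed-form ridge solution and its shrinkage effect can be checked on synthetic data. The following is a minimal NumPy sketch for illustration only (the paper's actual computations use R; all data and values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]                  # only a few active coefficients
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

def ridge(X, Y, lam):
    """Closed-form ridge solution (X'X + lam*I)^(-1) X'Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# Larger penalties shrink the coefficient vector toward zero.
norms = [np.linalg.norm(ridge(X, Y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
assert all(a > b for a, b in zip(norms, norms[1:]))
```

Note that, unlike the LASSO discussed next, the ridge estimate shrinks all coefficients but typically sets none of them exactly to zero.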
With the goal of variable selection in mind, the LASSO brings a major improvement from the viewpoint of generalization error reduction, since the LASSO determines β̂(λ) as the minimizer

β̂(λ) = argmin_{β ∈ ℝᵖ} ( ||Y − Xβ||² + λ ||β||₁ ), (4)

with the ℓ₁-norm (instead of the Euclidean norm) on the coefficients,

||β||₁ = Σ_{j=1}^{p} |β_j|. (5)

Note that, in contrast to ridge regression, this minimization is not solvable by simple linear algebra but needs quadratic programming or related algorithms. More importantly, such a solution is appealing because usually many coefficients of β̂(λ) vanish (which yields a kind of model selection by variable reduction). Let us explain why β̂(λ) is usually a sparse vector. Finding the unconstrained minimum (4) is equivalent to finding the constrained minimum of β ↦ ||Y − Xβ||² subject to ||β||₁ ≤ R for an appropriate boundary parameter R = R(λ). Since the ball ||β||₁ ≤ R has sharp edges, the solution β̂(λ) is typically a sparse vector.
Using advanced numerical schemes, it is possible to determine β̂(λ) for many (all relevant) penalization parameters λ ∈ ℝ₊; thus, one can choose the most appropriate LASSO penalization λ through so-called cross-validation (CV). Typical steps for the LASSO are:

1. Randomly split the data into (say, 10) disjoint train/test samples.
2. Determine β̂(λ) on the train data, and with this β̂(λ) calculate the averaged mean-square error on the test data. Average these errors over all 10 samples. Do this for (all relevant) λ ∈ ℝ₊.
3. Determine a penalty parameter λ which yields the best averaged result on the test data.
4. Refit the model on the full data set with the selected λ.
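The cross-validation steps above can be sketched in Python. This is an illustrative NumPy implementation (the LASSO is solved here by cyclic coordinate descent; the paper's actual computations use the R package glmnet, and the data and parameter values below are synthetic and hypothetical):

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, Y, lam, n_iter=100):
    """LASSO via cyclic coordinate descent on (1/2n)||Y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = Y - X @ beta                        # running residual
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ (r + X[:, j] * beta[j]) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def cv_lasso(X, Y, lambdas, k=10, seed=0):
    """Steps 1-3: pick lambda by k-fold cross-validated mean-square error."""
    folds = np.random.default_rng(seed).permutation(len(Y)) % k
    cv_err = []
    for lam in lambdas:
        errs = [np.mean((Y[folds == f] - X[folds == f]
                         @ lasso_cd(X[folds != f], Y[folds != f], lam)) ** 2)
                for f in range(k)]
        cv_err.append(np.mean(errs))
    return lambdas[int(np.argmin(cv_err))]

rng = np.random.default_rng(1)
n, p = 80, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[[0, 5]] = [3.0, -2.0]             # sparse ground truth
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

best_lam = cv_lasso(X, Y, np.array([0.01, 0.05, 0.1, 0.5, 1.0]))
beta_hat = lasso_cd(X, Y, best_lam)         # step 4: refit on the full data
```

Because the soft-threshold operator sets small coefficients exactly to zero, the fitted vector is sparse, in line with the geometric argument above.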
The literature suggests that various types of optimization problems can be solved with the ℓ₁-norm as a regularizer. An overview of some basic optimization algorithms for convex problems, with an emphasis on aspects of particular relevance to regularized estimators such as the LASSO, can be found in [3].

Proposed Design
Suppose that within a predetermined mineralized rock mass O ⊂ ℝ³ there are sparsely distributed observations at locations based on samples obtained from drill-core assay. Then the distribution is to be estimated at each spatial point within the targeted range. To address such a problem, a number of techniques have been suggested which are usually based on interpolation, such as inverse distance weighting, radial basis functions, and global and local polynomial interpolation (see [13]). However, our approach is different and originates from the observation that the occurrence of worthwhile ores is associated with an ore body, which is a geological structure containing a distinct (compared to the surrounding area) concentration of metals. Specifically, it is proposed to estimate ore reserves through the reconstruction of such an ore body, whose shape and distribution should give the best possible explanation of the taken samples. The core idea is that, having determined a notably high concentration y ∈ ℝ₊ at a location x ∈ O ⊂ ℝ³, the conditional probability density for the mineral concentration at any other point x′ ∈ O depends on the Euclidean distance ||x − x′|| to that location and on the concentration measured there. Although this dependence is difficult to quantify, it is assumed that the higher y and the smaller the distance ||x − x′||, the more likely it is to find the mineral (in high concentration) at the point x′. This idea is represented in terms of a kernel density with the assumption that, given a location l ∈ O, the function x ↦ k(x − l) represents a spherical ore body concentrated around the point l ∈ O.
This statement can be quantified as follows: given the occurrence of a prime ore deposit located at point l ∈ O, the (averaged) gold concentration within each rock mass volume V ⊂ O is represented by a random variable whose conditional expectation is given by

∫_V k(x − l) dx.

To reflect the expected concentration decay depending only on the distance, without taking into account any particular direction, it is assumed that k(x) depends on the Euclidean norm ||x|| only and decreases with ||x||.
Suppose further that such mineral deposits can be located at p ∈ ℕ different points (l_j)_{j=1}^{p} ⊂ O and have different intensities (β_j)_{j=1}^{p}. Note that the intensity coefficient β_j describes the ore concentration and, according to our modeling, the dependence of the concentration on the coefficients (β_j)_{j=1}^{p} is linear. To resemble an ore body by a superposition of such deposits, we suppose that the expectation of the (averaged) mineral concentration within each rock mass volume V ⊂ O is represented by a random variable whose conditional expectation is given by

Σ_{j=1}^{p} β_j ∫_V k(x − l_j) dx. (6)

Finally, we agree to explicitly construct such random variables for each rock mass volume V ⊂ O in terms of

Y(V) = Σ_{j=1}^{p} β_j ∫_V k(x − l_j) dx + σ ε(V), V ∈ B(O), (7)

indexed by all Borel sets B(O) of O. Here, σ ∈ ℝ₊ is a pre-specified parameter and the family {ε(V) : V ∈ B(O)} represents a so-called Gaussian white noise. This is a family of Gaussian random variables which satisfy

ε(V) ∼ N(0, η(V)), Cov(ε(V), ε(V′)) = η(V ∩ V′),

where η denotes the Lebesgue measure and N stands for the normal distribution.
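The defining property of Gaussian white noise, that the covariance of ε(V) and ε(V′) equals the measure of the overlap V ∩ V′, can be verified by simulation. The following is a one-dimensional discretized sketch for illustration (the model above lives in ℝ³; all values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
m, dt = 50_000, 0.01
# Discretized white noise on [0, 1): independent N(0, dt) cell increments,
# so eps(V) for an interval V is the sum of the increments it covers.
cells = rng.normal(scale=np.sqrt(dt), size=(m, 100))

def eps(a, b):
    """Monte Carlo draws of eps(V) for the interval V = [a, b)."""
    i, j = int(a / dt), int(b / dt)
    return cells[:, i:j].sum(axis=1)

e1, e2 = eps(0.0, 0.6), eps(0.4, 1.0)       # overlap [0.4, 0.6), length 0.2
assert abs(e1.var() - 0.6) < 0.05           # Var eps(V) = eta(V)
assert abs(np.mean(e1 * e2) - 0.2) < 0.05   # Cov = eta(V ∩ V')
```

In particular, ε(V) and ε(V′) are independent whenever the volumes V and V′ are disjoint, which is what makes the observation model (7) tractable.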
Having agreed on the assumptions (6) and (7), determining the shape of the ore body boils down to the estimation of the parameters l = (l_j)_{j=1}^{p} and β = (β_j)_{j=1}^{p}, given the observations y = (y_i)_{i=1}^{n} of the mineral concentration within a number n ∈ ℕ of volume samples (V_i)_{i=1}^{n}. A generic method for this is the maximization of the log-likelihood

(β, l) ↦ −(1/2) Σ_{i=1}^{n} (y_i − μ_i(β, l))² / (σ² η(V_i)) + const,

where the numbers (μ_i(β, l))_{i=1}^{n} stand for the expectations of the mineral concentration within the sample volumes V_i under the assumption that the deposit locations are l = (l_j)_{j=1}^{p} and their intensities are β = (β_j)_{j=1}^{p}:

μ_i(β, l) = Σ_{j=1}^{p} β_j ∫_{V_i} k(x − l_j) dx, i = 1, ..., n.

Following the above maximum-likelihood principle, a maximum of this function must be determined. However, a straightforward application of such a concept has a number of drawbacks: • The log-likelihood function is not convex, and the dimension 4p of its argument is in general very high.
• There are numerical issues in determining its global maximum.
• The choice of the number p ∈ N of prime ore deposits is arbitrary and there is no obvious way to determine this number. Choosing p too low results in a poor model, whereas if p is too high, then a large number of parameters may cause over-fitting and deteriorate the predictive performance of the model.
In what follows, a different approach to prediction is suggested. The idea is based on advances in high-dimensional statistics; utilizing these tools, the obstacles described above can be overcome. From an abstract perspective, a moderate-dimensional non-convex problem is reformulated as an extremely high-dimensional but convex optimization problem. Furthermore, a built-in penalization of parameters, focused on minimizing the generalization error, leads to dimension reduction while maximizing the model's predictive performance.
Let us explain our approach. Having supposed that the volume samples (V_i)_{i=1}^{n} have equal sizes η(V_i) = ν₀ ∈ ℝ₊, i = 1, ..., n, and assuming that the locations (l_j)_{j=1}^{p} of all prime ore deposits are fixed, we observe that the log-likelihood function must be maximized with respect to the intensity parameters β = (β_j)_{j=1}^{p} only. This problem can be formulated in the standard framework of linear models, where the unknown coefficients β ∈ ℝᵖ are estimated under the assumption that y = (y_i)_{i=1}^{n} are observations of the response variable Y = Xβ + ε, with independent identically normally distributed random variables ε = (ε_i)_{i=1}^{n} and the non-random design matrix

X_{i,j} = ∫_{V_i} k(x − l_j) dx, i = 1, ..., n, j = 1, ..., p.
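The construction of such a design matrix can be sketched as follows. This is an illustrative NumPy fragment assuming a Gaussian kernel and a point-sample approximation of the volume integrals; the locations, counts, and bandwidth are hypothetical:

```python
import numpy as np

def gaussian_kernel(d, h):
    """Isotropic kernel decreasing with the Euclidean distance d (bandwidth h)."""
    return np.exp(-0.5 * (d / h) ** 2)

def design_matrix(sample_locs, centers, h):
    """X[i, j] = k(||s_i - l_j||): response of sample i to a candidate
    prime ore deposit centered at l_j (point-sample approximation of
    the volume integral in the text)."""
    d = np.linalg.norm(sample_locs[:, None, :] - centers[None, :, :], axis=2)
    return gaussian_kernel(d, h)

rng = np.random.default_rng(2)
samples = rng.uniform(0, 100, size=(60, 3))      # n = 60 assay locations (m)
centers = rng.uniform(0, 100, size=(700, 3))     # p = 700 candidate centers
X = design_matrix(samples, centers, h=10.0)
assert X.shape == (60, 700)
assert np.all((X >= 0) & (X <= 1))               # kernel values lie in (0, 1]
```

With n = 60 and p = 700, the matrix has far more columns than rows, which is exactly the n < p regime that the penalized regression below is designed for.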
Note, however, that the centers l = (l_j)_{j=1}^{p} of the prime ore deposits were assumed to be fixed, while in reality they are unknown. To address this issue, we propose to choose their total number p sufficiently high, which allows a potential prime ore deposit to exist at virtually any location. At the same time, we suggest using a penalized regression: instead of solving the ordinary least-squares regression, we address an adequate formulation in the spirit of the LASSO (4), with the ℓ₁-norm (5) on the coefficients. As mentioned above, the penalized regression has several advantages. It allows a view from the perspective of high-dimensional statistics, whose methods are designed to solve linear models under the assumption that the number of explanatory variables exceeds (by orders of magnitude) the number of observations, having in mind that only a few (not knowing which) of the explanatory variables have non-zero coefficients. Furthermore, the penalized regression in the LASSO formulation performs an automated variable selection, by determining those non-zero coefficients which maximize the predictive performance through an estimation of the generalization error via cross-validation. Note that this method is appropriate for our statistical estimation, since we suppose that at virtually each location there could be a prime ore deposit of unknown intensity. Realistically, almost all intensities shall vanish, but it is not known which deposits contribute with non-zero intensity.
Finally, let us emphasize that the goal of a geostatistical model is not to fit/smooth a given sample of observations, but to infer a mineral concentration in the entire domain with enhanced predictive ability. Of course, all relevant physical and geological considerations (for instance, the mineral clustering described in terms of the ore body concept) must be incorporated in some way into such a model. These goals are approached by our methodology.

Experiment
We illustrate our techniques on a data set which, in a real-world situation, would represent commercial-in-confidence information about the geological situation. This is to avoid problems with information disclosure, which we outline next. Typically, in early stages of exploration, investors have limited access to the data necessary to make an investment decision. This means that any leak, reference, or inaccurate statement (pessimistic or over-optimistic) may lead to a change in share prices and/or give rise to insider trading allegations if new information was not properly disclosed. Consequently, exploration companies tend to operate privately and competitively, which explains the difficulty of obtaining, analyzing, and presenting large and interesting real-world data sets to the public. For this reason, the following study may appear limited from many perspectives, as it does not use additional geological information which would normally be available in a real-world exploration project. The reader should consider the following section as a conceptual illustration of the methodology, rather than a real-world geological investigation.

Experimental Data Presentation
A dataset was obtained from a mine site in the production phase during the grade control program. Geologically, it relates to porphyry deposits, which represent an important resource and the dominant source of copper and gold. The site is located approximately 100 km north of Orange, NSW, and contains two major elements as sources of revenue: copper (Cu) and gold (Au), with triuranium octoxide (U₃O₈) as a by-product.

Remark 2.
Note that since only gold data were available, for the purpose of demonstrating the procedures described in Section 4 it is assumed that only gold is of primary interest. Lithology, stratigraphy, and lithochemistry were not available and hence have not been considered.
Assume the pit carried out a grade control exercise using samples obtained from vertical drilling, in order to delineate the gold-bearing ore for the next four benches. Thereby, 119 vertical blastholes, horizontally spaced at 5 × 2.5 m, were developed from the surface by rotary drilling practice, as shown in Figure 1, and sampled in 2.5 m vertical intervals. Basic geometrical and statistical descriptions of the experimental data can be found in Tables 1 and 2; the geometrical limits of the blasthole drilling grid can be found in Table 1. On-site analysis revealed the presence of multiple clusters of gold close to the vertical blastholes (10 m each). No accurate assumptions regarding the gold distribution were made at the time of the grade control program, due to the lack of reliable data between the 2.5 m intervals. The existence of gold within the bench (the area is 245 m × 33 m, or 8085 m²) provided an opportunity to test the proposed methodology for estimation of the distribution at un-sampled locations. Using the actual sampling data (Figure 1) and Tables 1 and 2, Cartesian coordinates and grades were assigned to 5975 valid Au samples. Considering that exploratory drilling cannot meaningfully take place, it was revealed that:
1. Gold content is represented by several, potentially narrow structures.
2. The edges of the structures are poorly defined, making their delineation at the time of extraction difficult.
3. The economic cut-off grade is unknown. Lowering the cut-off grade may result in a steady increase in the tonnage to be extracted and milled; if the cut-off grade is increased, the impact of dilution may be large.
Knowledge of the content between the sampled locations is highly desirable because it helps in making gold projections from one horizontal section to another, and thereby in identifying high-priority structures (extraction targets) before the actual exploration commences. A typical dilemma thus lies in predicting the distribution between sampled locations. The area has been split into four horizontal sections to depict the third dimension (depth) in order to explore the distribution. By placing so-called isograms of content intervals, the gold grades (g/tonne) can be interpolated to arbitrary (documented) points. Figures A1-A4 illustrate the results obtained from kriging.

Method Implementation
The input data for modelling are as follows: 5975 valid samples; blasthole ID; X, Y, Z coordinates; the positions of the drilling collars on the surface; the length of the sampled section of each blasthole (in metres); the true depth (in metres); and the gold grade (in g/tonne). The estimation of model parameters through an iterative application of the LASSO methodology has been implemented (see Appendix B) as follows:
(a) At the beginning, a number (p = 700) of deposit centers is randomly scattered around the area to be investigated. Each kernel is represented by a normal density whose mean equals the corresponding deposit center, with the same variance.
(b) The coefficients are estimated using the standard LASSO method with 10-fold cross-validation for penalization parameter selection.
(c) Each deposit center which has a non-zero coefficient is disturbed (its kernel variance and kernel center are changed randomly). The deposits with zero coefficients are removed, and those obtained via disturbance are added to the set of potential deposits.
(d) Steps (b) and (c) are repeated (five to seven times).

Results
The distribution of gold has been successfully estimated by solving a high-dimensional convex optimization problem using standard techniques, returning results which are potentially applicable for evaluation. Both procedures (LASSO and kriging) indicate a trend of increasing grade with depth in the NE direction and the existence of internal dilution in the SE direction. Both methods suggest localized occurrences of middle/high grades within the underlying subsection of the mining bench (flitch), the development of which may be economically profitable. Our methodology uses standard open-source implementations, is stable, and requires low computing resources. More importantly, it is fast: the average run time on a standard PC ranges from 110 s to 160 s. One crucial aspect of our technique is that the algorithm determines virtually the same solution for a wide range of different starting values and model parameters, which is in line with our expectations. Note that, in contrast with traditional interpolation, the estimated mineral concentration is smooth and flat (in the sense that it exhibits moderate spatial changes and no extreme values). A comparison of the LASSO computations with those obtained from kriging provides clear evidence that both methodologies yield similar results. However, the models are conceptually different: unlike kriging, the LASSO does not require variography and model fitting. Since both methods are based on different concepts, the analysis of prediction disagreements may be helpful in uncovering potential shortcomings of both methods.
We have concluded that the generalization error control avoids undesirable concentration spikes and turbulence inherent to interpolation methods.

Interpretation
Kriging: To explore and compare the distributions, four gold favourability maps of 2.5 m thick layers were created using kriging. A cut-off grade of 0.5 g/t has been artificially applied to all maps to display the contrast between low and higher grades. The realizations (Appendix A) shown in Figures A2 and A3 indicate that, depending on the depth, the 10 m bench (flitch) may contain gold concentrations ≥19.00 ppm, or 19 g/t Au. Low Au grades are present at the locality 5470 N/10,230 E. Relatively high grades may occur at 5485 N/10,210 E. The cross-sections (Figures A3 and A4) display a developing trend of increasing grades with depth. The extraction and processing of the rock mass of the underlying flitch may be economic. However, the eastern region shows a very low reading. From an engineering point of view, the depth below RL445 warrants close attention because the underlying horizontal levels are likely to contain a larger amount of gold. Comparing the four cross-sections side-by-side (Figures A1-A4), one may conclude that the distribution is not gradational. Evidently, the 2.5 m sampling intervals are too large to allow an accurate assessment of the gold potential.
LASSO: At 1.875 m depth, the model predicts a developing trend towards the north-east, with an overall increase of grade, and the existence of internal dilution in the south-east and east directions (Figure 2). At 3.75 m depth, a stable trend is identified, and the model indicates significant geometrical changes accompanied by displacements of the mean and median (Figure 3). At 5.625 m, the model renders a consistent increase of the apparent grade with depth. The distribution reflects insignificant reshaping and suggests a displacement of the maxima (Figure 4). Finally, at 7.5 m depth (Figure 5), the apparent grade increases considerably with depth. Contours illustrate a preferential trend in the north-easterly direction. The extraction of the underlying level, i.e., below RL445, can be economically profitable. The adjacent flitch at the easterly locality requires careful grade control.
Comparison: Note that the above methods are not competing, as they investigate the same problem from different perspectives using different concepts and metrics. However, both methods indicate an increasing trend in grades with depth. From an engineering perspective, the rate of change of gold grades in the "kriged" models is too abrupt, in particular between RL455 and RL447.5 (as depicted in Figures A2-A4). Furthermore, the borders of the extraction targets are not well defined. The shape of the dilution (waste or barren rock) is more critical during plant operation than the location of erratic and insignificant concentrations of gold. Cu/Au deposits such as Marvin are usually extracted in blocks, in a 10 m lift, by a shovel with a bucket capacity >20 m³. This means the impact of an excessive amount of dilution may be extremely large.
Although little evidence is available from the data to suggest that the LASSO prevails over kriging in terms of predictive accuracy, note that the LASSO technique is designed to select the model with the strongest predictive performance: a quality target that is absent from the kriging approach.

Advantages and limitations:
Being designed to deal with very large data sets, our methodology may be applicable to huge amounts of geological data: in a test study, the present code was able to handle more than 50,000 samples. Furthermore, for Reverse Circulation (RC) samples (which are usually large, often longer than 3.0 m, and much less variable), the analysis by kriging is complex, so that the LASSO provides solutions significantly faster. Moreover, variography and model fitting are not required for the LASSO, unlike for kriging, which is based on anisotropy depicted by variograms in different directions. That is, in cases when an ore body is unstructured and the shapes of the low-grade boundaries are gradational, the LASSO can assist in fast decision-making on equipment. Finally, let us mention the implementation advantages: the LASSO implementation in the R package glmnet is part of an open-source environment and benefits from numerous data analysis and visualization techniques which are free, well-maintained, and easily accessible. However, this advantage comes at the cost of dealing with advanced scientific computing methods which require appropriate skills.
Outlook: The authors conclude that further development of their methodology can yield a user-friendly implementation within the R environment, similar to the code listed in Appendix B, whose use merely requires regular drilling data updates. Such an implementation can easily be integrated into any on-board drilling software. Using this, exploration and extraction crews can observe in real time the effects of their decisions in the search for high-priority mineralization.