Spatial Component Analysis to Improve Mineral Estimation Using Sentinel-2 Band Ratio: Application to a Greek Bauxite Residue

: Remote sensing can be fruitfully used in the characterization of metals within stockpiles and tailings, produced from mining activities. Satellite information, in the form of band ratio, can act as an auxiliary variable, with a certain correlation with the ground primary data. In the presence of this auxiliary variable, modeled with nested structures, the spatial components without correlation can be ﬁltered out, so that the useful correlation with ground data grows. This paper investigates the possibility to substitute in a co-kriging system, the whole band ratio information, with only the correlated components. The method has been applied over a bauxite residues case study and presents three estimation alternatives: ordinary kriging, co-kriging, component co-kriging. Results have shown how using the most correlated component reduces the estimation variance and improves the estimation results. In general terms, when a good correlation with ground samples exists, co-kriging of the satellite band-ratio Component improves the reconstruction of mineral grade distribution, thus affecting the selectivity. On the other hand, the use of the components approach exalts the distance variability.


Recovery of Minerals from Stockpiles and Tailings
Raw material and metal extractions have been conducted since pre-historic times. Mining has been present everywhere in Europe, although nowadays the majority of sites are closed. This does not mean that resources have been completely depleted. Ancient mining could not benefit from the most modern extraction and processing techniques and has left significant amounts of mining residue (including tailings and stockpiles) currently present in the territory, in the forms of semi-artificial hills, lakes, and ponds. Some of them were completely stable and never reacted with the environment, while others (especially those coming from metal mining) significantly modified the environment where they were stocked. According to "Mining and Metal in a Sustainable World 2050" [1], a major gap exists in effective retreatment technology (reuse, resize, or remove) of mining residues to meet the sustainability objectives of United Nations Development Program (UNDP) in 2030 [2].
Moreover, the depletion of the in-situ reserves, the increasing need of using lower grade materials, and advances in recovery and processing technologies are the main reasons why mining wastes are considered as recoverable resources. Moreover, the environmental aspects have caused a strong push for more effective management of mining residuals in many mining sites [3,4]. As a first step, the raw material concentrations in tailings must be quantified and classified and a reliable expected revenue model should be developed to assess the feasibility of production, by giving particular attention to the presence of critical raw materials (CRMs) for the European Union (EU), with high impact on the economy and, at the same time, high risk of supply shortage [5,6].
A quantitative evaluation of the available resources in mining residues requires an exhaustive sampling that must be justified, so that a preliminary characterization phase is necessary for deciding if it is justified, proceeding with a full resource evaluation. Earth observation (EO) data can be useful for the abovementioned preliminary mapping and quantification of these mining residues, usually abandoned, in harsh environment, and with limited possibilities for full and fast sampling campaigns [7]. The main potentials of EO include the large number of easy to access data over large areas and their continuous acquisition over time, which may allow continuous land monitoring.

Exploitation of Remote Sensing Information
The most popular applications of satellite imagery refer to mapping problems, where the spectral content of images is used to recognize and characterize the observed surface, for instance, in terms of land cover or physical and chemical properties, among others. For these purposes, satellite images often require some kinds of geometrical and spectral calibration [8]. Before calibration, indeed, images are affected by artefacts, which depend on the sensor characteristics and the conditions at the time of acquisition, and that is to be removed to enhance the information considered useful. The general problem for mineral exploration and reserve characterization is the spatial distribution of the target variable, because, in most practical cases, in situ information is limited and sparse. Satellite images, instead, can provide dense additional information over the full area of interest, which can be used, especially when correlation is found with the in situ data.
Because of the fast and accessible information, many EO analyses have been used in mining areas and abandoned tailings, mainly for mapping pollution and environmental variables, beginning many years ago [9][10][11][12][13]. In the presented case studies, authors used remote sensing data and tools (such as imaging spectrometer data, and/or hyperspectral imagery combined with in situ data) to improve map accuracy of environmental pollutants affected by mining activities and their abandoned residue. In many applications, to improve map accuracy and to validate results, geostatistical approaches were used with integration of remote sensing [14][15][16]. The base of the geostatistical theory is the spatial correlation among georeferenced data, correlations exploitable for a correct and optimal numerical modeling of the regionalized variable (RV). Besides mapping the RV with high accuracy, as an important advantage, using geostatistical approaches provides an extra tool to improve the quality of the estimation, measured by the estimation variance maps [17]. In addition, multivariate geostatistical approaches provide an extra possibility to use more than one RV and model the correlations between RVs within so called co-regionalization modeling [18]. This analysis is able to not only verify some global results as the statistical correlations, but also to identify possible correlations between spatial components of the main variable with extra variables (the so-called secondary or auxiliary variables), including spatial anisotropies. Therefore, independent of the variables at hand (temperature, concentration, discovery probability, etc.) and of the spatial distribution model (estimation, simulation), geostatistics allow tackling the central problem: finding meaningful correlations and spatial modeling the unknown surface distribution of the interest variable, by extra information (which can be satellite data for example) [19,20].
As mentioned above, in the mining sector, while direct information is expensive, a good opportunity can be to exploit indirect information, much denser, but with good correlation with the direct variable. This is the case of the low-cost data provided by satellite images. There are many examples of mineral characterizations, where, by knowing the spectral properties of a surface feature, simple mathematical operations among spectral bands (called band ratio) have contributed to localize outcrops and surface deposits [21][22][23][24][25][26]. To detect a specific feature or mineral, usually at least two spectral bands are necessary, one band with higher reflectance features of the given material and another one with strong absorption features for the same material [23]. There are many studies on spectral analysis of minerals based on different satellite images and specific band ratios for different minerals in a target area mainly in geological mapping [27,28]. Iron oxides are among the most studied materials using band ratio techniques [25], because of their selective absorption of light in the visible and near-infrared range caused by transitions in the electron shell [29]. For example, for detection and mapping of an iron ore formation, ferric (Fe 3+ ) and ferrous (Fe 2+ ) iron oxide specific band ratios are suggested, and thanks to their correlation with iron samples, some iron ore grades maps can be obtained [26]. Hence, the band ratio, as an appropriate index, can be considered as additional information while mapping a specific mineral of a geological feature on the surface. This paper proposes to map the iron concentration as the strategic metal within a bauxite residue in Greece. In the first step, only direct samples from the site were used (performing ordinary kriging-OK estimation). Then, the band ratio identified for iron detection was used as additional information to map the iron variability within the bauxite residues (performing co-kriging-CK estimation). To improve the map accuracies, a new method (component co-kriging-CCK estimation) was proposed to re-construct the co-regionalization model between the sample data and the band ratio information, by exploiting the possibility of extracting a specific component from the satellite data and using it in the co-regionalization models. Finally, all three models and their results were compared to check the improvement given by the proposed model in iron estimation maps.

Co-Regionalization Model and Application: Current Practice
The most classical estimation method to map the spatial variability of a RV variable is OK, which uses available samples to predict in "not-available" points [17]. To add an additional variable in the kriging system, and moving into multivariate geostatistics, which in many cases improves the target variable prediction, there is a need of, at least, a second variable, with the spatial correlation among them [18]. This property can be calculated within the cross-covariance. Given two stationary random functions, Z 1 (x), Z 2 (x), with the means of m 1 and m 2 , the spatial cross-covariance C 12 (h) = C 21 (h), are defined in Equation (1): The secondary variable can be known in all points of the domain and displaced in a regular grid. If remote sensing data are used as a secondary variable, the space-time concept should be considered because ground samples are taken in a certain time, while satellite information is repeated over time; this produces photographs of "different" stockpiles.
In this study, an iso-time framework was adopted. An image of the stockpile at the time zero (t 0 ) was considered, as well as all the ground samples referred to the same surface remote-sensed. Therefore, the 3D reconstruction of the distribution of concentrations is relevant to a constant time.
The objective was to map the distribution of the target variable (iron concentration as the strategic metal), using Sentinel-2 satellite data as the secondary variable. Sentinel-2 is a European satellite mission for Earth observation, which is part of the Copernicus program. It provides global coverage of multispectral imagery, composed by 13 bands in the spectral range between the visible and the short-wave infrared, with a revisiting time of five days at the equator [30]. It was selected for the present study because of the availability of the images at the date of sampling, and the good spatial resolution (pixel size from 10 to 60 m, depending on the band). Moreover, the spectral bands available in Sentinel-2 data allow the computation of different band ratios useful for mineral exploration. For iron deposits, in particular, different band ratios were proposed in the literature for other multispectral sensors [21][22][23][24][25][26], which can also apply to Sentinel-2 images [31].
To map the iron variability, the classical steps are: • Using one variable: iron samples, spatial variability analysis of target variable (sample variogram and its model), and finally using the OK estimation method; • Adding extra information (as an example band ratio of iron as the secondary variable): spatial variability analysis of target variable (sample variogram and its model) and the secondary variable, the cross-correlation analysis between the target variable and the secondary variable, and finally using the CK estimation method.

•
At the end, to compare the map accuracies results, cross-validation should be performed to check if adding information can improve the results.
The first approach for mapping the mineral concentration is using just the in situ samples, which in this case are represented by the mineral concentrations from the mining residues sampling. The ordinary kriging (OK) method can be used and the estimated values in all nodes of the grid can be found by using Equation (2): where: A(x α ) is the variable known in the points x a (mineral grade from samples); λ α OK are the weights calculated with ordinary kriging method; A OK (x i ) is the estimate of the main variable in the points x i (grid nodes).
Regarding the OK model definition, its spatial variability is object of the variogram analysis. The standard model of a stationary random function with nested structures is presented in Equation (3): where: γ A (h) is the variogram model of the main variable (mineral concentration from sampling); a nug is the nugget effect; γ u (h) are the models of different nested structures (spatial components); a u are the sills of each model component.
The second approach attempts to improve the estimation of the main variable, using the secondary (auxiliary) variable. The prerequisite is to verify if a correlation exists between two variables. The value of the correlation coefficient is defined by Equation (4): where: ρ AB is the correlation coefficient between the primary (mineral's grade from sampling) and secondary variable; σ AB is the covariance between the primary and secondary variable; σ 2 A is the variance of the primary variable; σ 2 B is the variance of the secondary variable.
The CK variance allows theoretically verifying the effect of the secondary variable on reducing the estimation smoothing. The estimates by CK can be found by applying Equation (5): where: B(x i ) is the auxiliary variable known in the points x i (satellite grid nodes); λ α CK and ν CK i are the weights for the primary and secondary variables calculated by CK; A CK (x i0 ) is the estimate of the main (primary) variable in the points x io one of the grid nodes.
Moreover, in the case of CK, often a linear co-regionalization model is expected (Equation (6)): where: Usually, when the secondary variables are dense, namely available at more points than the main variable, and sufficiently correlated with the main variable, CK is typically of advantage [18].

New Perspective: Use of Spatial Components
In the case of multivariate geostatistics and in presence of a linear co-regionalization model, the selected variables (A as the target variable and B as the secondary variable) can be considered as a linear combination of independent random variables (factors) Y nug, Y i monostructure, called scale components, in addition to the mean (Equation (7)): where m A , m B are the means of the variables A and B; Y A nug , Y A u , Y B nug , Y B u are the structural components of the main and auxiliary variables, each of them being a {0,1} standard variable with a specific variogram structure.
The correlation coefficient between the iso-structure components of the main and auxiliary variable are presented in Equation (8): Given the independence of factors, the total variance of the variables is just the sum of the sills of each component. Variances and covariances are presented in Equation (9): The correlation coefficient between the main (primary) and the secondary variable is presented in Equation (10): In the case of having several scale components, there could be an advantage of using as auxiliary variable just one component instead of the whole initial variable. The justification for that derives by the observation that the correlation between the main and the auxiliary variables exists only at one scale (Equation (11)): Such observation derives by the co-regionalization model, showing only one structure in the cross variogram, or the independence of any other structure common to the main and auxiliary variables (more generally, this approach allows also the filtering the effect of a noise [18]).
In this paper, the methodology is performed to check the improvements of the iron concentration maps, due to the increase of correlation when the auxiliary variable is just the correlated component, as shown in the following relationships (Equation (12)): In terms of variograms (Equation (13)): In terms of CK, using one of the components, Y u c , the estimation results are different with respect to using the original auxiliary variable B, since the new secondary variable and the weights differ from the original one. In fact, the component co-kriging system (CCK) is similar to the original CK, but with different coefficient matrix, since the submatrix of variogram of secondary variable changes (Equation (14)): where: Y u c (x i ) is the structural component of the auxiliary variable known in the points x i (satellite grid nodes); λ α CKY and ν i CCK are the weights for the primary and auxiliary variables calculated by CCK; A CCK (x i ) is the estimation of the main (primary) variable in the points x i (grid nodes).
Note that Y B (x i ) is the true component that we do not know, so that we can implement the CCK if we estimate it by factorial kriging (FK) which respects the actual data [18] in Equation (15): The second part of Equation (15) can be checked to control the estimated value of components Y FK (x i ). We can consider that, in case of an image band, the information is dense and the estimation quality is satisfying, so that it looks justified to use the estimated value of components in Equation (15).
The estimation variances σ 2CK and σ 2CCK allow comparing the precision of the estimations. The two estimation variances are presented in Equation (16): Moreover, to check the adopted variogram models and to check if CCK could improve the estimation results, cross-validation can be performed on data. The principle of crossvalidation is to remove the target variable at each sample point x α and then predict by kriging with the proposed model. Therefore, since the true values are available, it is possible to compute the kriging error [19].

Case Study: The Bauxite Residuals of Greece
Bauxite residues (BR) remained from Bayer processing of bauxite (also commonly known as "red mud") represents important strategic wastes from mining and processing activities and they were inserted in 2020 in the list of critical raw materials for the European Union [5]. The significant amount of raw materials within these types of residues can be used as a new source of materials, specifically critical metals and rare earth elements [32]. Due to the analysis done on the BR, it has potential as a secondary resource for REE extraction [33] and TiO 2 , The case study used in this research is the bauxite residue from the alumina refinery of Mytilineos S.A. in Greece, located on the Gulf of Corinth, 136 km from Athens ( Figure 1). The exact location is at latitude 38.354177 • and longitude 22.704671 • , CRS WGS84, and its dimension is around 700 m × 600 m. Since 2006, four filter presses have been used to dewater the BR, and since 2012, all BR produced has been filter-pressed and stored as a "dry" (water content < 26%) by-product in an appropriate industrial landfill [31]. Producing the dry BR is currently known as the best technique for BR materials piling, because a lower volume of deposits is stocked, with a subsequent decrease of the risk of dam failures. The samples used were collected from daily accumulated materials (daily data) including the tonnage of materials with their mean concentration value, and the area in which they were piled within the BR areas from June to the end of July. The samples exact locations (coordinates) were assigned where trucks discharged their daily load of materials. Figure 2 shows the daily data during two months (June and July 2019). Therefore, since, during the two months, materials were not over-accumulating, the Sentinel 2 image selected at the end of July (date: 30 July 2019) is representative of materials during June and July (accumulated in the area from the first of June until the end of July). The target RV selected is iron concentration Fe 2 O 3 (%) as a strategic metal from samples obtained from BR of Greece. The daily samples were analyzed through X-ray fluorescence analysis-XRF, to obtain the iron concentration (Fe 2 O 3 %) information at alumina refinery of Mytilineos S.A. To select the secondary variable, a preliminary test has been done to choose the most relevant band ratio (with highest correlation coefficient) for iron mapping (Table 1). Since the Sentinel-2 data are used in this study, the most common iron band ratios for identifying iron [30] have been applied. It is worthwhile to note that these ratios involve only bands at the highest spatial resolution (pixel size 10 m). From the presented correlation coefficients between iron concentration and bandratios (Table 1), the one with the highest correlation (ferrous iron oxides: 4/11) is chosen as the secondary variable to map the iron concentration variability within the BR.
The histogram of iron concentration samples and the correlation between the selected band ratios (ferrous iron oxides (4/11) band ratio) is exposed in Figure 3.
Band data are extracted from the Sentinel-2 image (see Figure 2) only inside the boundaries of the BR area. The base map of the extracted band ratio values and the histogram of data are shown in Figure 4.
Considering the iron concentration as the target variable and the ferrous iron oxides band ratio as the secondary variable, it is possible, first, to map the iron variability in BR using only iron samples. Secondly, it is possible to check if map accuracies can be improved by adding the band ratio variable. Finally, by decomposition of the band ratio variable, in the case of higher correlation, using one component can improve the iron estimation maps.

Results
In the first step, using only iron samples, to perform OK, the sample variogram and variogram model is shown in Figure 5, including two spherical structures. The red bars below the variogram show the frequency of sample pairs used in variogram calculations. The variogram model parameters are presented in Table 2. Using the variogram model of Figure 5, it is possible to perform OK. Maps of Fe 2 O 3 (%) concentration variability and estimation standard deviation are presented in Figure 6.
To improve the iron estimation results, in the second step, the presented secondary variable (the ferrous iron oxides (4/11) band ratio) can be added to map the iron concentration variability.  To perform co-kriging, the sample variograms and variogram models considering both variables (iron as main variable and ferrous iron oxides band ratio as the secondary variable) and cross variogram are calculated and shown in Figure 7. The fitted model for iron concentration is equal to the model used in OK. Using the same model makes the comparisons more logical between the OK and CK. The structure and model details are presented in Table 3.  Using the presented variogram models of Figure 7, it is possible to perform CK. Maps of Fe 2 O 3 (%) concentration variability and estimation variance are presented in Figure 8. Finally, the new approach was performed by decomposition of the secondary variable (the ferrous iron oxide band ratio).
In the first step, to choose the appropriate component, the correlation coefficients are calculated, using the variogram models' structures, and due to Equation (9): Hence, the correlation coefficient between components of ferrous iron oxides band ratio and iron concentration can be calculated as bellow: Two components are related to the small range (R = 70 m) and the large range (R = 180). Therefore, it is possible to calculate the correlation coefficients: Since the first component has the highest correlation coefficient (negative correlation) with iron concentration, it is selected as the appropriate component to test the CCK.
In this step, to use the selected component in Equation (13), there is a need of estimating the component in all points of the grid. To do it, the CK is performed on band-ratio data, using only the first structure of the variogram model. To check the coherency of results from Equation (15), the estimated maps of both components (small range, R = 70 m, and large range, R = 180 m), plus the estimated mean are shown in Figure 9.
The sum of three maps (estimated components and mean estimated) is equal to the original values of ferrous iron oxides band ratio as Equation (15).
The small range component (R = 70 m) as having the higher coefficient correlation can be used as the secondary variable to perform CCK.
The parameters of variogram models are shown in Table 4. The same as CK, the fitted model on target variable (iron concentration) is equal to the previous models. Maps of Fe 2 O 3 (%) concentration using CCK method and its estimation variances are presented in Figure 10.  Finally, Figure 11 shows the main statistics of the cross-validation for three solutions, OK, CK, and CCK. Cross-validation performed by removing sample values (one-by-one) and estimating them using the selected model and neighborhood for three methods (OK, CK, and CCK). The scatter plots between estimated and true values of Fe 2 O 3 (%) at 60 sample points, standardized estimation error, and scatter plot between error and estimated values are compared for all three solutions.

Discussion
Mapping a metal distribution within an artificial resource, such as a mining waste area is quite challenging and complex. Therefore, there are not many examples of using geostatistical methods for tailings characterization. Some researchers tried to map metal variability within mine tailings using in field samples and performed ordinary kriging.
However, they faced with the challenge of a small number of samples while performing geostatistical modeling [34]. Another example is characterizing the mining residues using geostatistical co-kriging estimation [35,36]. In both examples, the traditional co-kriging method is used. However, an efficient estimation of metal variability is essential, since all economic evaluations are based on metals variability maps and estimation. Therefore, the higher accuracy of the map can make the difference, when deciding whether the exploitation of a strategical resource is economically feasible and sustainable. Iron maps variability as the main target in this work is focused within a bauxite residue in Greece. Most classical estimation method of OK is performed and estimation map has shown mainly three high grade parts (more than 50% of Fe 2 O 3 ) in north east and south east of BR. The estimation standard deviation map identified the lowest estimation variance at the samples points and a high variation where the number of samples are low (east and middle part of the BR).
By adding EO data and specifically Sentinel-2 image (a free and easily accessible image) at the date of sampling, the improvement of iron mapping was tested. The ferrous iron oxides band ratio was selected as the secondary variable to see if additional information (in a regular grid at all estimation points) can help the iron estimation mapping. Results of CK has shown a higher variability with more anomaly points (with Fe 2 O 3 concentration of more than 50%) in the estimation map. Moreover, adding the band ratio data could decrease the variability and the values of the estimation variance map. Finally, the new hypothesis of using the most correlated component was tested. Due to the co-regionalization structures, it was possible to decompose the band ratio values into mean, nugget effect, and two different range components: a small range component (70 m) and a large range component (180 m).
The correlation coefficient between each component and iron was calculated and the first component with the small range (70 m) was selected due to its higher correlation with iron. To use this component in the CK system, there was a need to estimate it for all grid points, and then the estimated component was used to perform CCK. The appropriate check was done to control the equality of Equation (13). In Figure 9, by mathematically summing at each grid node, the values of all three maps (estimated component 1 and 2 and the estimated mean), the original map of the ferrous iron oxides band ratio is obtained. This mathematical check confirms that the estimated component 1 can be used in the CCK estimation based on Equation (15).
To do the comparisons among the three utilized methods (OK, CK, and CCK), in all three estimations, the iron variogram model is the same. Moreover, the neighborhoods used for estimations were equal, to have the same condition, while mapping the iron variability.
At the end, the cross-validation was performed to check the efficiency of three methods. For the scatter plots between the true and estimated values of iron (Fe 2 O 3 ) at sample points, the higher correlation between the estimated values and true values is related to the CCK, with ρ = 0.68. The histogram of the standardized estimation errors provides an idea about the unbiasedness and also the quality of the estimate. It also helps locate the outliers, which are outside the two vertical lines corresponding to the threshold value. In all three methods, the mean of the histogram is close to zero and shows the acceptable estimation results. Then the scatter plot of the standardized estimation errors versus the estimated values is calculated, which should be with no preferential shape. The reason is the independency of the standardized error with the estimated values. Since the two variables are theoretically independent, this cloud should have no preferential shape, and this was confirmed by the low correlation coefficients calculated.

Conclusions
Mapping the strategic metals is one of the most delicate phases, and by using satellite images, an important improvement in the model quality of surface distribution can be performed. Geostatistical models offer a wide variety of powerful tools for a deep study of metals mapping and estimations. A strategic case study (a bauxite mining residue) is reported as an example to check the best method for mapping the iron concentration (Fe 2 O 3 %). The proposed method of component co-kriging highlights not only the best secondary variable for iron estimation (with higher correlation coefficient), but also improves the classical ordinary-kriging and co-kriging estimation maps. The cross-validation results confirm the improvements of the results. Hence, to sum up:

•
Remote sensing data are essential when mapping a surface feature, such as mapping the iron concentration variability; • Band ratio can be considered an important auxiliary variable in geostatistical modeling, when there is correlation between in field samples and band ratios; • Component co-kriging is an efficient method and, in case of high correlation coefficient between one component of the auxiliary variable and the main variable (in this work, the iron concentration), it can substantially improve the mapping results.