Visual Interpretation of Machine Learning: Genetical Classification of Apatite from Various Ore Sources

Abstract: Machine learning provides solutions to a diverse range of problems in high-dimensional datasets in geosciences. However, machine learning is generally criticized as an enigmatic black box because it focuses on results but ignores the process. To address this issue, we used supervised decision boundary maps (SDBM) to visually illustrate and interpret the machine learning process. We constructed an SDBM to classify ore genetic types from 1551 apatite trace element analyses from various types of deposits. Attribute-based visual explanation of multidimensional projections (A-MPs) was introduced to SDBM to further demonstrate the correlation between features and the machine learning process. Our results show that SDBM explores the interpretability of the machine learning process and that the A-MPs approach reveals the role of trace elements in machine learning classification. Combining the SDBM and A-MPs methods, we propose intuitive and accurate discrimination diagrams and the most indicative elements for ore genetic types. Our work provides novel insights for the visualization application of geo-machine learning, which is expected to be a powerful tool for high-dimensional geochemical data analysis and mineral deposit exploration.


Introduction
Machine learning has become an increasingly important interdisciplinary tool in several fields of science, including geoscience [1,2]. In particular, supervised classification is one of the tasks most frequently applied in geoscience [3,4]. After data collection, these studies usually use a training set to build models with suitable algorithms and a testing set to evaluate model performance, rapidly producing a final classifier with sufficient accuracy [5–10]. However, machine learning approaches are often referred to as a black box that provides no transparent working process between the data input and output [11]. Because most machine learning algorithms lack interpretable decision functions, scholars have difficulty understanding, customizing, and trusting these methods [12], which has caused skepticism about the reasons behind the predictions. Obtaining results with both high accuracy and strong interpretability is still a problem in the application of machine learning in earth science. Although some approaches have tried to explain machine learning models by using feature importance, decision maps, or the SHAP (SHapley Additive exPlanations) tool to select the indicative features of a classification, the way machine learning models produce their results remains vague [13–17].
Supervised decision boundary maps (SDBM) is an advanced method for producing classifier decision boundary maps [18]. Attribute-based visual explanation of multidimensional projections (A-MPs) is a new visual approach for exploring the potential relationship between classification and data labels [19]. Both of these methods provide the possibility to explain the machine learning process. In this study, we introduce a novel visualization technique that combines SDBM and A-MPs to investigate the genesis classification of apatite.
Apatite is a ubiquitous accessory mineral in igneous, metamorphic, and clastic sedimentary rocks [20–23]. The composition of apatite varies with the tectonic environment, host-rock composition, and texture. Thus, apatite is considered an ideal indicator mineral for tracing the origin and evolution of geological systems and plays a key role in indicating petrogenesis and the genesis of ore deposits [24–29]. Based on data analysis or machine learning, previous studies have constructed a series of binary discrimination diagrams to distinguish apatite provenance and ore genetic types [30–33]. A single apatite trace element analysis can yield the abundances of tens of trace elements, whereas discrimination diagrams typically use information from only two or three variables [34,35]. Because of the complex chemistry of apatite and the inherent limitations of two-dimensional diagrams, traditional methods, such as binary or ternary discrimination diagrams, are limited in distinguishing the genetic types of apatite [5,36–39]. Previous studies have applied machine learning methods to apatite classification problems and achieved accurate results [31,40]. However, the explanation of the machine learning process remains unclear.
This study aims to shed new light on the role of trace element features in the projection and in the machine learning classification process by building A-MPs on an SDBM that encapsulates different ore genetic types. The approach proposed in this study provides a novel and more intuitive way of interpreting machine learning, through which users can more conveniently obtain the predicted results and understand the prediction process.

Apatite Trace Element Dataset
We collected 1551 LA-ICP-MS analyses of mineralized apatite from an open-access apatite trace element dataset that compiles published data from previous studies (https://doi.org/10.5281/zenodo.7648664, accessed on 17 February 2023). The dataset covers five common ore deposit types located worldwide: porphyry, skarn, orogenic Au, iron-oxide copper gold (IOCG), and iron-oxide apatite (IOA or Kiruna type) deposits. Table 1 summarizes the collated apatite data. From the dataset, the 14 most commonly analyzed trace elements (La, Ce, Pr, Nd, Sm, Eu, Gd, Dy, Yb, Lu, Sr, Y, Th, and U) were selected as features to provide a consistent and optimized dataset for the subsequent work.

Data Pre-Processing
The original dataset included zero and null values caused by values below the detection limit (bdl) or values that were not reported. Null values were therefore excluded from the dataset. An approximately normal distribution of the data is assumed by many machine learning methods [51]. We therefore applied a logarithmic transformation, $x_{\mathrm{transformed}} = \lg(x + 1)$, to bring the data closer to a Gaussian distribution [14]. This transformation also handles zero values, since $\lg(0 + 1) = 0$. Stratified sampling is a common sampling method that divides the dataset into several layers (the five genetic types in this study) and then samples randomly from each layer while maintaining the proportions of each class. The selected dataset was randomly divided into a training dataset (80%) and a testing dataset (20%) using stratified sampling [52].
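The transformation and the stratified 80/20 split described above can be illustrated with a minimal, standard-library sketch. The function names, the toy labels, and the random seed are assumptions made for illustration only; they are not taken from the study, and a library routine would normally be used in practice.

```python
import math
import random
from collections import defaultdict


def log_transform(value):
    """Apply lg(x + 1); zero maps to zero, so zeros need no special casing."""
    return math.log10(value + 1.0)


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately so that
    class proportions are approximately preserved in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical toy labels: 10 "skarn" and 5 "IOCG" analyses.
toy_labels = ["skarn"] * 10 + ["IOCG"] * 5
train_idx, test_idx = stratified_split(toy_labels)

# Hypothetical concentrations, including a zero value.
transformed = [log_transform(v) for v in [0.0, 9.0, 99.0]]
```

Splitting per class rather than over the whole dataset is what keeps a small class such as IOCG represented in the testing set.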
The apatite trace element dataset contains 534 analyses from skarn deposits but only 78 from IOCG deposits. We applied the synthetic minority oversampling technique (SMOTE) to reduce the effects of this class imbalance [53,54]. Because only the training set was oversampled, the test results were not inflated by synthetic samples. The workflow is shown in Figure 1a.
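The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbors, can be sketched as follows. This is a simplified illustration and not the implementation used in the study: the function name, the value of `k`, and the toy samples are assumptions, and a library implementation (for example, `SMOTE` in imbalanced-learn) would typically be preferred.

```python
import math
import random


def smote_oversample(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Nearest neighbours of `base` within the minority class (excluding itself).
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical, already log-transformed minority-class TRAINING samples.
# The testing set is never passed to this function.
minority_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority_train, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by that class.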
Figure 1. [...] [18]). y denotes the labels of the dataset (the genetic types of apatite); f(x) denotes the evaluation metrics (accuracy and F1 score). The generated colored diagram is the SDBM (shown clearly in Figure 2). The remaining parameters are described in the text. (c) A-MPs pipeline. The generated colored diagram is the visual encoding of A-MPs (shown clearly in Figure 3).

SDBM Visualization
Visualizing the decision boundaries of modern machine learning classifiers can notably help in classifier design, testing, and fine-tuning [55,56]. Most visualization methods are essentially dimensionality reduction (DR) methods: they visualize the boundaries and/or zones by projecting a high-dimensional dataset (D) to a two-dimensional scatterplot (P(D)) using a projection method (P) [12]. Based on the trained classifier (f), similar samples (x) are grouped into the same cluster in the scatterplot. Points P(x) with the same color can be considered to belong to the same group, and points with different colors to different groups. However, these 2D scatterplots have a limitation: it is not clear what the blank space between the points represents.
Recently, a novel approach called decision boundary maps (DBM) was developed to address this limitation [12,57]. The DBM method projects $D$ to the scatterplot $P(D)$ and then inversely projects all pixels $P(x)$ in the 2D bounding box of $P(D)$ to create synthetic high-dimensional data points $P^{-1}(x)$. The points $P^{-1}(x)$ are classified by the classifier $f$, and their corresponding pixels $P(x)$ are then colored by the assigned class labels $f(P^{-1}(x))$. DBM extends classical multidimensional projections by filling in the gaps between the projected points of a labeled dataset used to train a classifier [18].
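The DBM construction described above (project, inversely project every pixel, classify, color) can be sketched with toy stand-ins. Everything in this block is a hypothetical illustration: the projection simply drops a coordinate, the inverse projection is a crude guess, and the classifier is a nearest-neighbor rule. None of these is the learned projection or the classifier used in the study.

```python
import math

# Toy 3D training data with class labels (hypothetical values).
train = [([0.0, 0.0, 0.0], "A"), ([0.2, 0.1, 0.0], "A"),
         ([1.0, 1.0, 1.0], "B"), ([0.9, 0.8, 1.0], "B")]


def project(x):
    """Toy direct projection P: keep the first two coordinates."""
    return (x[0], x[1])


def inverse_project(q):
    """Toy inverse projection P^-1: lift a 2D point back to 3D.
    The third coordinate is a crude guess (mean of the other two)."""
    return [q[0], q[1], (q[0] + q[1]) / 2.0]


def classify(x):
    """Stand-in classifier f: label of the nearest training sample."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]


def decision_boundary_map(resolution=5):
    """Label every pixel of the 2D bounding box of P(D) by f(P^-1(pixel))."""
    points = [project(x) for x, _ in train]
    x_min, x_max = min(p[0] for p in points), max(p[0] for p in points)
    y_min, y_max = min(p[1] for p in points), max(p[1] for p in points)
    grid = []
    for row in range(resolution):
        y = y_min + (y_max - y_min) * row / (resolution - 1)
        grid.append([])
        for col in range(resolution):
            x = x_min + (x_max - x_min) * col / (resolution - 1)
            grid[-1].append(classify(inverse_project((x, y))))
    return grid


dbm = decision_boundary_map()
```

The resulting grid of labels is what gets rendered as colored decision zones; the blank space of an ordinary scatterplot is filled because every pixel, not only every sample, receives a class.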
More recently, a deep learning DR method called self-supervised neural projection (SSNP) was proposed. SSNP is a fast and simple method that reduces dimensions by replacing the true labels with pseudo-labels assigned by a clustering algorithm [58], exploiting its clustering and inverse-projection capabilities. Combining DBM with SSNP yields the improved SDBM. Compared with DBM, SDBM produces results that are easier to interpret and use, while remaining versatile.
As an extension of machine learning classification algorithms, SDBM provides an advanced visualization technique that depicts the high-dimensional decision space in a 2D visual space. In this study, we used the support vector machine (SVM) [59] to train the classifier on the apatite trace element dataset and then built the SDBM to generate the decision boundaries and zones. The workflow is shown in Figure 1b.

Attribute-Based Visual Explanation of Multidimensional Projections
We applied attribute-based visual explanation of multidimensional projections (A-MPs) to correlate the features (apatite trace elements in our study) and decision boundaries/zones [19].
The dataset $D$ contains $N$ $n$-dimensional elements $p_i = (p_i^1, \dots, p_i^n)$, where $N$ is the number of samples and $n$ is the dimension of the dataset. The projected dataset is $D^P = \{q_i = P(p_i) \mid p_i \in D\}$. For each 2D projected point $q_i$, we first defined its 2D neighborhood $\nu_i^P = \{q \in D^P \mid \lVert q - q_i \rVert \le \rho\}$ as all projected points closer to $q_i$ than a given radius $\rho$. The corresponding $n$D neighborhood of point $p_i$ is defined as $\nu_i = \{p \in D \mid P(p) \in \nu_i^P\}$. Then, we computed the global variance $GV = (\mathrm{var}(p^1), \dots, \mathrm{var}(p^n))$ of all dimensions over all points in $D$ ($n = 14$ in our study), and the local variance $LV_i$ over $\nu_i$ for each point $i$. Next, to indicate the relative importance of the dimensions, we computed the ratio between the local variance and the global variance and normalized this ratio. Finally, we generated the rank $\mu_i^j$ of dimension $j$ for point $i$ as follows:
$$\mu_i^j = \frac{LV_i^j / GV^j}{\sum_{k=1}^{n} LV_i^k / GV^k}$$
Lower values of the rank indicate a higher interpretability of the dimension and a higher homogeneity [19]. For example, if $\mu_i^{\mathrm{Sr}}$ is the lowest, dimension Sr best explains the local neighborhood $\nu_i$: these points cluster together in the 2D scatterplot because their Sr values are highly similar within $\nu_i$.
For each point, we took its top-ranked dimension and kept the C dimensions (C = 8 in this study) that were top-ranked for the most points. Each of these dimensions (elements) was mapped to a distinct color through the color map, and all points were colored accordingly. Dimensions that were top-ranked for only a small number of points were not given their own color and were summarized as "others". The workflow is shown in Figure 1c.
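The ranking procedure of this section can be sketched in a few lines of standard-library Python. The function names, the toy data, and the radius are assumptions made for illustration, and the sketch assumes that every dimension has non-zero variance over the full dataset; it is not the code used in the study.

```python
import math
from collections import Counter


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def amps_ranks(data, projected, radius):
    """Per point, the normalized local/global variance ratio of every dimension."""
    n_dims = len(data[0])
    # Assumes each dimension varies over the whole dataset (no zero global variance).
    global_var = [variance([p[j] for p in data]) for j in range(n_dims)]
    ranks = []
    for q_i in projected:
        # nD neighbourhood: points whose 2D projection lies within `radius` of q_i.
        hood = [p for p, q in zip(data, projected) if math.dist(q, q_i) <= radius]
        ratios = [variance([p[j] for p in hood]) / global_var[j] for j in range(n_dims)]
        total = sum(ratios)
        # Fall back to uniform ranks if the neighbourhood shows no variance at all.
        ranks.append([r / total for r in ratios] if total > 0 else [1.0 / n_dims] * n_dims)
    return ranks


def top_dimensions(ranks, names, c):
    """Lowest-rank dimension per point; dimensions outside the c most frequent become 'others'."""
    best = [names[min(range(len(r)), key=r.__getitem__)] for r in ranks]
    keep = {name for name, _ in Counter(best).most_common(c)}
    return [name if name in keep else "others" for name in best]


# Hypothetical toy data: the first cluster is homogeneous in Sr, the second in La.
data = [[1.0, 0.0], [1.0, 4.0], [1.0, 8.0], [5.0, 2.0], [7.0, 2.0], [9.0, 2.0]]
projected = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
ranks = amps_ranks(data, projected, radius=1.0)
labels = top_dimensions(ranks, ["Sr", "La"], c=2)
```

In this toy case the first three points are explained by Sr and the last three by La, mirroring the statement that the dimension with the lowest rank is the one whose values are most similar within the local neighborhood.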

Evaluation Metrics
Macro F1 score and accuracy were used to quantitatively evaluate the classifier and the SDBM in this study. The calculation of the evaluation metrics is shown in Table 2 and Equations (1)–(4).
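As an illustration, accuracy and macro F1 can be computed from true and predicted labels as sketched below, assuming the standard definitions (per-class precision, recall, and F1, with macro F1 as the unweighted mean of the per-class F1 scores). The toy labels are hypothetical and do not reproduce any result of this study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical toy predictions: one IOCG analysis is misclassified as skarn.
y_true = ["IOA", "IOA", "IOCG", "IOCG", "skarn", "skarn"]
y_pred = ["IOA", "IOA", "IOCG", "skarn", "skarn", "skarn"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Because macro F1 weights every class equally, a poorly classified small class lowers it more than it lowers the overall accuracy.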

Results
We established the optimal classifier and SDBM for the genesis classification task constructed on our dataset. The classifier effectively distinguished the genesis of apatite, with a cross-validation accuracy of 94% and a test accuracy of 89% (Table 3). The IOA deposit yielded the highest F1 score, and all of its analyses were predicted correctly. The IOCG deposit had the lowest score (F1 score = 69%). The SDBM was built via the SVM classifier. The overall accuracy of the SDBM was ~86%, slightly lower than that of the classifier. The five genetic types of apatite were visually well distinguished, and most of the analyses matched their corresponding zones. An exception was apatite from the IOCG and orogenic Au deposits, which overlapped slightly (Figure 2a). A similar phenomenon was seen in the testing set (Figure 2b). Based on the A-MPs approach, we computed the ranks of all dimensions for all points and generated a visual interpretation of the SDBM diagram. All samples were colored according to the dimension (element) that contributed the most (Figure 3). The top seven dimensions that affected the clustering performance were U, Lu, Pr, Nd, Sm, Ce, and La. Most analyses clustered well on different dimensions. Eu contributed the most to the good clustering of the porphyry deposit samples after projection, and the Eu concentrations of apatite from the porphyry deposit were highly similar. The IOA deposit samples, which performed well in the SDBM diagram, were not controlled by a single element; instead, Lu, Sm, and U simultaneously contributed the most to distinguishing

Visualization in High-Dimensional Space
Machine learning approaches are often referred to as a black box, in which the process between the input and output is invisible and unexplained [11]. SDBM offers the possibility of understanding how machine learning models work and enhances traditional machine learning models. The shape of the decision zones and the distance between clusters indicate the difficulty of the classification; e.g., smooth decision boundaries represent easier classification, especially for the IOA and skarn deposits [56]. Although apatite samples from the skarn deposit fall into two decision zones, they have little overlap with other classes, and the two parts are each well clustered. The performance in distinguishing apatite from the IOCG and orogenic Au deposits is relatively poor, with a large number of apatite samples overlapping. This is consistent with the result of the classifier, for which the F1 score of the IOCG deposit is the lowest (Table 3).
The distance of a sample to the closest decision boundary reflects the confidence of its classification: the closer a sample lies to a boundary, the higher the uncertainty. Apatite samples from the porphyry deposit plot near the boundary between the IOCG and porphyry deposits. Therefore, the confidence of the predicted porphyry label was relatively low, although the samples were well clustered (Figure 2a). Figure 2b shows that most apatite samples from the porphyry deposit in the testing set also fell near the boundary. The low confidence may be mainly attributed to the complexity of porphyry mineralization processes. Because porphyry mineralization spans a broad temperature range, from 250 to 1000 °C, apatite crystallized at different stages of a porphyry deposit may have quite different trace element signatures [60–62]. Apatite from the IOCG deposit also plots near the boundary between the IOCG and orogenic Au deposits, which also explains its low accuracy.
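One simple way to make this confidence reading quantitative is to measure, on the rasterized decision map, how far a sample's pixel is from the nearest pixel of a different class. The sketch below is an assumption-laden illustration (brute-force search, a hypothetical 4 x 6 label map with "P" for porphyry and "I" for IOCG); it is not a procedure reported in the study.

```python
import math


def distance_to_boundary(label_map, row, col):
    """Distance (in pixels) from pixel (row, col) to the nearest pixel of another class."""
    own = label_map[row][col]
    best = math.inf
    for r, line in enumerate(label_map):
        for c, label in enumerate(line):
            if label != own:
                best = min(best, math.hypot(r - row, c - col))
    return best


# Hypothetical decision map: left half porphyry ("P"), right half IOCG ("I").
toy_map = [list("PPPIII") for _ in range(4)]

far_from_boundary = distance_to_boundary(toy_map, 0, 0)   # deep inside the "P" zone
near_boundary = distance_to_boundary(toy_map, 0, 2)       # adjacent to the "I" zone
```

A sample with a small distance sits close to a decision boundary and would be read as a low-confidence prediction, as for the porphyry samples discussed above.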
Despite its accurate results, SDBM (like DR methods in general) has some limitations. Decision zones were drawn on a specific projection plane, not on an individual dimension, and the coordinates did not represent specific features (trace elements). SDBM diagrams were therefore not plotted as traditional two- or three-variable scatterplots. Despite this, SDBM still provided a novel way to gain insight into how machine learning works. The degree of clustering and the distance between clusters explained the predictive score of the machine learning model. Decision zones matched the known properties of the training samples for the classifier [63]. A small "island" of one color embedded in a large zone of a different color suggested misclassifications or training problems [55].

Explanation of Multidimensional Projections
SDBM visualized the machine learning process in the high-dimensional space and addressed the "black box" problem to a certain extent. However, further study is still needed to address the roles of the features with respect to the training process and the shapes of the data clusters. Based on the A-MPs approach, we described the most decisive dimensions in the multidimensional projection and explained the machine learning classification [19]. SDBM showed that apatite samples from the skarn deposit fell into two decision zones (Figure 2a). Correspondingly, A-MPs also showed these two parts of the skarn deposit. Skarn samples clustered in the upper decision zone were mainly controlled by La, whereas samples plotted in the lower right decision zone were mainly controlled by Ce. Some skarn samples plotted in the lower right decision zone were also controlled by La; however, these samples were located near the decision boundary and were lower than the pixel limits (Figure 4). Ce and La contributed the most to identifying apatite from the skarn deposit.

An overlap between samples from the IOCG and orogenic Au deposits was recognized (Figure 4). The SVM classifier showed that the test score of the IOCG deposit was the lowest, and the A-MPs approach showed the reason for this poor performance. In the A-MPs diagram, the samples were not divided into IOCG and orogenic Au clusters as in the SDBM; instead, the classifications of both the IOCG and orogenic Au deposits were mainly controlled by Pr, Nd, and Sm. In the dimensions Pr, Nd, and Sm, samples from the IOCG and orogenic Au deposits formed six clusters (IOCG: A1, B1, and C1; orogenic Au: A2, B2, and C2; Figure 4). The clusters in the IOCG zone were in close proximity to the clusters in the orogenic Au zone, which is the reason for the overlap between the IOCG and orogenic Au deposits in the SDBM diagram.
In addition, the main clusters (B1 and C1) in the IOCG zone were located near the decision boundary, while the main clusters (B2 and C2) in the orogenic Au zone were located in the middle of their decision zone. This explains why, although the samples of both types overlapped, the testing score of IOCG was only 69% whereas that of orogenic Au was 89% (Table 3). Samples from the IOCG deposit were simultaneously controlled by six elements (U, Lu, Pr, Nd, Sm, and La; Figure 4), suggesting training problems. This is plausible, because there may have been issues with the apatite trace element data collected from the IOCG deposit owing to limitations in how laboratories record and publish the data.
The A-MPs approach effectively explained the role of the features (trace elements) in the machine learning process and in the layout of the projection. The dimensions (elements) that were decisive for the multidimensional projection were labeled with different colors, which demonstrated the correlation between the apatite trace element data and the genetic types. Combined with the SDBM method, the A-MPs approach visualized the results of the machine learning model and explained the overlap between the IOCG and orogenic Au deposits.

Other Interpretation Approaches
Feature importance is a model inspection technique. After a single feature in the test dataset was shuffled, the test data were reclassified. If the test score dropped, the model depended on this feature to a great extent. Depending on how much the model performance declined, the features were ranked from highest to lowest to find the most effective features for the classification [15]. Figure 5a shows that Th, U, Eu, Sr, and Lu were the most effective elements for distinguishing the ore genetic type. However, it is not clear how these five elements affected the classification or whether they had a positive or negative impact on it. SHAP (SHapley Additive exPlanations) is a game-theoretic method that can explain the outputs of machine learning models [64]. Based on the machine learning model, an interpretable model was generated. For each test sample, the interpretable model generated a predicted value and assigned a numerical value (SHAP value) to each feature of the sample. Subsequently, the SHAP values were visualized and sorted in a summary plot to improve the transparency and interpretation of the machine learning models [16,65]. The indicative elements differed between ore sources, and the concentration of the elements also had an impact on the classification (Figure 5b–f). The ability to correlate element concentration with its contribution to classification was a significant advantage of SHAP [66]. Nevertheless, although SHAP displays the contribution of each sample to the classification in the different classes well, it was still unclear how multiple features simultaneously controlled the classification results. For example, Figure 5b shows that both low U and high Th were helpful for identifying the IOA deposit. However, it is unknown which genetic type apatite with both low Th and low U should be assigned to.
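The shuffle-and-rescore procedure described for feature importance can be sketched as follows. This is a minimal illustration with a hypothetical stand-in classifier and toy test data; the function names, the number of repeats, and the seed are assumptions, and the study's own implementation is not specified here.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the test rows with only feature j shuffled.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += baseline - accuracy(model, shuffled, labels)
        drops.append(total / n_repeats)
    return drops


def toy_model(row):
    """Hypothetical stand-in classifier that only looks at feature 0."""
    return "IOA" if row[0] > 0.5 else "skarn"


# Hypothetical test data with two features.
test_rows = [[0.9, 0.3], [0.8, 0.7], [0.1, 0.2], [0.2, 0.9]]
test_labels = ["IOA", "IOA", "skarn", "skarn"]
importances = permutation_importance(toy_model, test_rows, test_labels)
```

The feature the toy model ignores receives an importance of zero, while shuffling the feature it relies on lowers the score. As noted in the text, such a ranking shows which features matter but not in which direction they push the classification.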
In addition, SDBM is an advanced multidimensional projection method, while SHAP is a game theoretic method. These two methods do not work well together.
In summary, neither feature importance nor SHAP provided a transparent working process. SDBM generated an intuitive discrimination diagram and revealed the classification process. The proximity of the samples to the closest decision boundaries and the shape of the decision zones explain why samples were assigned to a specific class. On the basis of SDBM, A-MPs displayed the control of the features over the classification, including how the features controlled the identification and what roles they played in the classification. Even in the IOCG region, which is controlled by multiple elements, A-MPs exhibited the different roles played by different elements. However, SDBM also has the limitation that the correlation between element concentration and classification could not be observed intuitively. Furthermore, the indicative features obtained by the different calculation methods were not identical (A-MPs: U, Lu, and Pr; SHAP: Th, U, and Sr). The combination of SDBM with A-MPs and SHAP may have the potential to provide a more effective approach to interpretation and visualization.

Future Work
Some phenomena in the SDBM and A-MPs diagrams remain unexplained. Samples in the skarn zones showed a trend from the lower right to the upper left, and a small number of skarn samples were also located inside the IOCG field (Figure 2a). The IOA samples fell well within their assigned region (Figure 2a), yet these points were divided into three clusters by U, Sm, and Lu (Figure 3). In future research, SDBM in combination with A-MPs has the potential to explore the underlying correlation between the trace elements and the ore genetic type.

Conclusions
In this study, by combining the SDBM and A-MPs approaches, we provide a novel machine learning visualization method with high accuracy and strong interpretability. SDBM offers the possibility of understanding how machine learning models work and of intuitively and accurately distinguishing apatite genetic types. A-MPs describes the dimensions that contribute the most to the post-projection clustering and demonstrates strong correlations between the high-dimensional trace element geochemical data of apatite and ore genetic types. Under the control of La and Ce, the skarn deposit is separated from the other types in two parts (one mainly controlled by La and one mainly controlled by Ce). The IOCG and orogenic Au deposits are both controlled by Pr, Nd, and Sm; thus, they partly overlap. Our method provides a novel insight for the visualization application of geo-machine learning and is expected to be a powerful tool for high-dimensional geochemical data analysis.