Rock Classification in a Vanadiferous Titanomagnetite Deposit Based on Supervised Machine Learning

As the potential locations of undiscovered ore deposits become deeper, a technique for predicting promising areas in the subsurface media has become necessary. Geoscience data on a wide range of underground media can be obtained through geophysical field exploration, but integration and interpretation of multi-geophysical data are difficult because of differences in spatial resolution. We developed a rock classifier that can predict promising vanadiferous titanomagnetite deposits from multi-geophysical data using supervised machine learning. Vanadiferous titanomagnetite ores are the main source of vanadium, which can be used in large-scale energy storage systems. Model training was conducted using rock samples from drilling cores, and the density of rock samples was used as a criterion for data labeling. We employed the support vector machine, random forest, extreme gradient boosting, LightGBM, and deep neural network for supervised learning, and the accuracy of all methods was 0.95 or greater. We applied trained models to three-dimensional geophysical field data to predict ore body locations. These candidate regions were distributed in the northeast of the geophysical survey area, and some classified areas were verified using a geological map.


Introduction
Mineral exploration is the first step in mine development. As the number of discovered deposits has decreased, effective exploration methods are becoming increasingly important for further development [1][2][3]. Comprehensive geoscience approaches, including geophysical, geological, and geochemical surveys, as well as drilling, contribute to the identification of ore deposits. Surveys are conducted depending on the type and location of the ore deposit, and potential ore zones are identified through the analysis of integrated data [4].
Geophysical surveys provide a wide range of information about the subsurface media. These surveys aim to detect anomalous signals generated by geophysical sources and include seismic, magnetic, electromagnetic, electrical resistivity, and induced polarization surveys [5]. To increase the success of exploration, multi-geophysical surveys, which include two or more survey types, are conducted with consideration of the physical properties of the target mineral. In the case of magnetite ore, airborne magnetic surveys are useful due to the strong magnetic force involved, and electrical resistivity and induced polarization surveys can also be performed to investigate structures in detail in the local area. However, the interpretation of multi-geophysical data is challenging. Signals recorded from each geophysical source have different spatial resolutions, complicating quantitative analysis. Furthermore, joint inversion of multiple geophysical data types requires optimization of the objective function and regularization, and initial model selection based on various geophysical properties may be difficult [6,7].

Study Area
The study area is the Gonamsan intrusion (851-873 Ma) in the northeastern region of South Korea. The main target was the Yeoncheon vanadiferous titanomagnetite (TM) deposits, located within the Gonamsan intrusion. These TM deposits have been mined since 1934 and are currently being exploited by Samyang Resources in the Gwanin magnetite mine. Although investigations using geoscience approaches have been performed over several decades [15,16], the subsurface structures have not been fully delineated due to the limitations of exploration technology and high cost. Therefore, an additional survey will contribute to the detection of unidentified ore deposits. Figure 1 shows a geological map of the Gonamsan intrusion. The intrusion extends about 3 km in the north-south direction and 1.5 km in the east-west direction. The rocks in the intrusion include vanadiferous titanomagnetite ore (Fe-Ti-V), oxide gabbro, monzogabbro-monzodiorite, and quartz-monzodiorite [17]. Our target is an orthomagmatic deposit differentiated from alkali gabbro magmas in the middle Proterozoic. Because mafic rocks are readily distinguished from surrounding rocks based on their geophysical properties, a rock classifier can help to determine the locations of ore bodies in the subsurface media from geophysical field data.
Figure 1. Geological map of the Gonamsan intrusion [16]. Numbers below the well symbols indicate the installation year and the number of wells. The red dashed lines represent the three parallel lines of the electrical resistivity and induced polarization survey.

Rock Samples from Drilling Cores and Magnetite Mine
In the Gonamsan intrusion, the Korea Institute of Geoscience and Mineral Resources (KIGAM) conducted a total of seven drilling operations in 2019 and 2020. The locations of the wells are shown in Figure 1; they are near the Gwanin magnetite mine. The depth of the drillings was about 300 m. We obtained 541 rock samples at different depths from drilling cores and classified their lithofacies into low-grade ore, gabbro, quartz-monzodiorite, metamorphic, and dyke. Because drilling cores do not contain high-grade ore (HO), we added 55 ore rock samples from the Gwanin magnetite mine.
We used density as the criterion for labeling data in the multi-classification problem, as magnetite has greater density than surrounding rocks. We measured the dry density of rock samples by the buoyancy method using water-saturated and dehydrated grain mass [18]. The first class consists of rock samples containing HO, with a density above 4.40 g/cm³. The average density of HO from the Gwanin magnetite mine was 4.57 g/cm³, with a small standard deviation (0.06). Thus, we selected 4.40 g/cm³, which is around the minimum density of HO, as the criterion for the first class. The second class contains candidate ore (CO), with a density ranging from 3.50 to 4.40 g/cm³. The specific value of 3.50 g/cm³ is the upper boundary of the density range of gabbro [19]. Because our targets are differentiated from alkali gabbro magmas, the potential ore zone is expected to have greater density than gabbro. The third class, host rock (HOST), has a density of 3.50 g/cm³ or less, and its physical properties are most distinct from those of ore deposits. The number of rock samples analyzed in the HO, CO, and HOST classes was 56, 126, and 415, respectively. Figure 2 shows classified rock samples in each density class.
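The density-based labeling rule described above can be written as a short function. The following is a minimal sketch, assuming that the boundary values are handled exactly as the text reads (4.40 g/cm³ falls in CO, and 3.50 g/cm³ falls in HOST); the function name and the example densities are hypothetical and are not measured values from this study.

```python
from collections import Counter

# Thresholds in g/cm^3, taken from the labeling criteria in the text.
HO_MIN_DENSITY = 4.40    # densities above this value are high-grade ore (HO)
HOST_MAX_DENSITY = 3.50  # densities at or below this value are host rock (HOST)


def label_by_density(density: float) -> str:
    """Return the class label for a dry density given in g/cm^3."""
    if density > HO_MIN_DENSITY:
        return "HO"
    if density > HOST_MAX_DENSITY:
        return "CO"
    return "HOST"


# Hypothetical densities, for illustration only (not measured values).
example_densities = [4.57, 4.40, 3.80, 3.50, 2.90]
labels = [label_by_density(d) for d in example_densities]
class_counts = Counter(labels)
```

Counting the labels in this way also exposes the class imbalance that has to be handled before training.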

Laboratory Experiment
KIGAM conducted geophysical field exploration, including electrical resistivity, induced polarization, and airborne magnetic surveys, from 2019 to 2021. Through inversion of the geophysical data resulting from such surveys, electrical resistivity, chargeability, and magnetic susceptibility can be calculated.
To use the geophysical properties of drilling cores as features for training, we measured the electrical resistivity, chargeability, and magnetic susceptibility of all 596 rock samples [17]. Resistivity (ρ) is the proportionality coefficient in Ohm's law that quantifies how strongly a material opposes direct current, and it is calculated from the measured resistance together with the cross-sectional area and length of the rock sample. As metals generally have high electrical conductivity, i.e., low resistivity, the resistivity distribution is useful for the exploration of magnetite deposits [20]. Chargeability (mV/V) represents the overvoltage effect of induced polarization. When the current between the electrodes is switched off, the voltage does not immediately drop to zero, and an overvoltage remains for a short time due to the effect of polarization [21]. In the time domain, the chargeability is theoretically defined as the ratio of the overvoltage immediately after current cut-off to the voltage measured while the current is flowing, but obtaining the overvoltage at the moment at which the current is cut off is difficult. Therefore, the integral of the decaying voltage over a specific time window is generally used, which is known as the apparent chargeability (ms). Magnetic susceptibility (dimensionless) is a physical property defined as the ratio of the magnetization of the rock to the magnetic field strength and is an important property for the exploration of magnetite, as basic and ultra-basic rocks have high magnetic susceptibility [17]. Figure 3 shows the distribution of geophysical properties using histograms of logarithmic values. The geophysical properties span a wide range even after log transformation, especially electrical resistivity and magnetic susceptibility.
The geophysical properties of the HO, CO, and HOST classes are shown in Figure 4. Higher density classes have lower electrical resistivity, along with higher chargeability and magnetic susceptibility. These trends are consistent with the characteristics of magnetite ore outlined in the previous paragraph. Therefore, the measured geophysical properties can be used as features for supervised learning to classify rock samples.
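The two electrical quantities defined above can be illustrated numerically. The sketch below assumes the textbook relations ρ = R·A/L for a sample of uniform cross-section and an apparent chargeability obtained by integrating the decay voltage, normalized by the voltage measured while the current was on, over a time window with the trapezoidal rule. The function names and all numerical values are hypothetical, and the actual laboratory procedure may differ.

```python
def resistivity(resistance_ohm: float, area_m2: float, length_m: float) -> float:
    """Resistivity in ohm-m from rho = R * A / L for a uniform sample."""
    return resistance_ohm * area_m2 / length_m


def apparent_chargeability(times_ms, decay_mv, primary_mv):
    """Trapezoidal integral of the decay voltage divided by the primary voltage.

    times_ms   -- sample times after current cut-off, in ms
    decay_mv   -- decay (overvoltage) readings at those times, in mV
    primary_mv -- voltage measured while the current was flowing, in mV
    The result has units of ms.
    """
    total = 0.0
    for i in range(1, len(times_ms)):
        dt = times_ms[i] - times_ms[i - 1]
        total += 0.5 * (decay_mv[i] + decay_mv[i - 1]) * dt
    return total / primary_mv


# Hypothetical sample: R = 1000 ohm, A = 5 cm^2, L = 5 cm.
rho = resistivity(1000.0, 0.0005, 0.05)

# Hypothetical decay curve sampled at three times after cut-off.
m_apparent = apparent_chargeability([0.0, 10.0, 20.0], [20.0, 10.0, 5.0], 1000.0)
```

With these illustrative inputs, the sample resistivity is about 10 Ω·m and the apparent chargeability is about 0.225 ms.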

Data Preprocessing
To ensure that training is unbiased, the skewness and kurtosis of the data must be checked [22]. Table 1 shows the skewness and kurtosis of the raw and transformed training data. As the raw data for electrical resistivity and magnetic susceptibility are quite skewed, we applied log transformation to these parameters, while square root transformation was applied to chargeability to remove its relatively mild skewness. The absolute skewness of all transformed data was reduced after each transformation, and kurtosis was acceptable. We split the full dataset into a training dataset (80%) and a test dataset (20%). The number of rock samples for the HO, CO, and HOST classes of the training dataset was 45, 100, and 331, respectively. To compensate for the imbalance of data, we oversampled HO and CO data and undersampled HOST data, which made the number of samples in each class equal to 150. The Synthetic Minority Oversampling Technique (SMOTE) based on the k-nearest neighbor algorithm was used to create synthetic HO and CO data [23].
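The effect of the log and square root transformations on skewness can be checked with a few lines of code. The following is a minimal sketch on synthetic log-normal data, which stands in for a strongly right-skewed property; the data, the seed, and the use of the population (biased) skewness estimator are assumptions made for illustration and do not reproduce Table 1.

```python
import math
import random


def skewness(values):
    """Population (biased) skewness: the third standardized moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(0)
# Synthetic right-skewed values (not measured data).
raw = [random.lognormvariate(3.0, 1.0) for _ in range(500)]

log_transformed = [math.log10(v) for v in raw]   # stronger correction
sqrt_transformed = [math.sqrt(v) for v in raw]   # milder correction

skew_raw = skewness(raw)
skew_log = skewness(log_transformed)
skew_sqrt = skewness(sqrt_transformed)
```

Because both transformations are concave, each lowers the skewness of right-skewed data, and the log transformation lowers it more than the square root, which is in line with applying the milder square root to the less skewed chargeability.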

ML for Rock Classification
To generate a rock classifier using machine learning (ML) methods, we devised support vector machine (SVM), random forest (RF), extreme gradient boosting (XGB), and LightGBM (LGBM) models using the Scikit-learn library [24], and a deep neural network (DNN) model with the Keras package [25]. Figure 5 shows schematic diagrams of each method. SVM solves classification and regression problems using hyperplanes determined from the maximum margin between groups [26]. Soft margins were set to tolerate samples that fall within or beyond the margin, and kernel tricks were used for nonlinear classification. These tricks replace the inner product with kernel functions to reduce the computational cost incurred when mapping low-dimensional spaces into high-dimensional spaces. The kernel functions include linear, polynomial, and Gaussian radial basis functions. RF is an ensemble learning method based on a decision tree model [27]. The decision tree model has a layered structure of nodes and edges and performs classification or regression by breaking a dataset down into smaller subsets (Figure 5b). However, a model trained as a single decision tree is vulnerable to overfitting, complicating its application to other datasets. In contrast, RF can derive a generalized solution through voting on results from multiple tree models. To reduce the correlations between tree models, bagging (bootstrap aggregating) and randomized node optimization are used. XGB is a representative gradient boosting method (GBM) that provides parallel computation [28]. While the bagging method collects values separately from each tree model, GBM uses the residuals of previous models and reduces errors therein using gradient descent algorithms.
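The kernel trick mentioned for SVM can be made concrete with a small numerical check. The sketch below uses a homogeneous polynomial kernel of degree 2 on two-dimensional inputs and compares it with the inner product in the explicitly mapped three-dimensional space; a Gaussian radial basis function kernel is included for reference. The function names, the value of gamma, and the input vectors are hypothetical and are not the settings used in this study.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def poly2_feature_map(x):
    """Explicit map for 2-D input such that phi(x) . phi(y) = (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(x, y)                                # computed in 2-D
explicit = dot(poly2_feature_map(x), poly2_feature_map(y))   # computed in 3-D
```

Both routes give the same value, but the kernel evaluates it without constructing the higher-dimensional vectors, which is the source of the computational saving.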
LGBM is an improved GBM algorithm that employs leaf-wise (vertical) growth in the tree model [29]. This algorithm can reduce the calculation time and memory requirements while retaining good accuracy. The overfitting issue occurring with leaf-wise growth can be alleviated through the selection of appropriate hyperparameters. DNN is a type of artificial neural network that contains two or more hidden layers [30]. Figure 5c illustrates the basic structure of the DNN, which consists of input, hidden, and output layers. The nodes of each layer are connected to those of the adjacent layer through weights and biases with activation functions. The activation function types include sigmoid, hyperbolic tangent, and rectified linear unit (ReLU); the function is selected according to the type of problem and training data. The network is trained by updating each weight and bias through back-propagation so as to reduce the error between the target and the output. The structure with multiple hidden layers contributes to the solution of complex non-linear problems.
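The layer structure described above can be illustrated with a forward pass through a very small network. The following is a minimal sketch, assuming three input features, two hidden layers with ReLU activation, and a three-node softmax output; the layer sizes, weights, and input values are hypothetical and do not correspond to the trained model.

```python
import math


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: out_j = sum_i inputs_i * weights[i][j] + biases_j."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def forward(features, layers):
    """Apply ReLU after every hidden layer and softmax after the last layer."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


# Hypothetical weights: 3 inputs -> 2 hidden -> 2 hidden -> 3 outputs.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]], [0.0, 0.1]),
    ([[0.3, -0.1], [0.2, 0.5]], [0.05, -0.05]),
    ([[0.6, -0.4, 0.1], [-0.2, 0.3, 0.5]], [0.0, 0.0, 0.0]),
]
probabilities = forward([1.0, 2.0, 0.5], layers)
```

The three output values are positive and sum to one, so they can be read as class probabilities; training would adjust the weights and biases by back-propagation, which is not shown here.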

Optimizing the Hyperparameters of ML Methods
Each ML method has hyperparameters, which are set before training and remain fixed during it. Because the performance of the model is affected by the values of these hyperparameters, they should be optimized according to the dataset. Table 2 shows the optimal hyperparameters for each ML method obtained from the grid search algorithm with five-fold cross-validation based on accuracy. The grid search algorithm compares the scores of all combinations defined by the user and identifies the optimal hyperparameter set. Other hyperparameters were set to the default values of the software packages. For DNN, the number of nodes and the activation function in the last layer were fixed to 3 and the softmax function, respectively.
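The grid search with five-fold cross-validation can be sketched in a self-contained form. In the example below, a simple k-nearest-neighbor classifier stands in for the ML models of this study so that the code needs no external packages; the classifier, the hyperparameter names (k and p), the fold assignment, and the synthetic three-class data are all assumptions made for illustration.

```python
import itertools
import random
from collections import Counter


def knn_predict(train_x, train_y, point, k, p):
    """Majority vote among the k nearest neighbors (Minkowski distance of order p)."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(x, point)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(xs, ys, n_folds, **params):
    """Mean accuracy over n_folds folds; fold i holds out samples i, i + n_folds, ..."""
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(xs), n_folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        hits = sum(
            knn_predict(train_x, train_y, xs[i], **params) == ys[i]
            for i in test_idx
        )
        scores.append(hits / len(test_idx))
    return sum(scores) / n_folds


def grid_search(xs, ys, grid, n_folds=5):
    """Score every combination in the grid and return the best (params, score)."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cross_val_accuracy(xs, ys, n_folds, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


random.seed(1)
# Synthetic data: three well-separated clusters standing in for three classes.
centers = {0: (0.0, 0.0), 1: (3.0, 3.0), 2: (6.0, 0.0)}
data = [
    ((cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5)), label)
    for label, (cx, cy) in centers.items()
    for _ in range(30)
]
random.shuffle(data)
xs, ys = [d[0] for d in data], [d[1] for d in data]

best_params, best_score = grid_search(xs, ys, {"k": [1, 3, 5], "p": [1, 2]})
```

The search evaluates all six combinations and keeps the first one with the highest mean accuracy; the cost grows with the product of the grid sizes, which is why only a few hyperparameters per model are usually searched.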

Table 2. Optimal hyperparameter set for each model (columns: Model, Optimal Hyperparameter Set; table body not recovered).

Validation of ML Methods
Using the optimal hyperparameters listed in Table 2, we evaluated the ML methods using accuracy, recall, precision, and F1 score, as shown in Table 3. For all metrics, DNN had higher scores than the other tested methods, but most other methods also had scores greater than 0.95. Therefore, the rock samples could be classified with supervised ML based on the geophysical properties obtained through the laboratory experiment. The limitations of this evaluation include the small number of samples and the imbalance among classes. The test dataset contained 120 samples, including 11 for HO, 25 for CO, and 84 for HOST. The HO rock samples in the test dataset were properly classified by all ML methods, indicating that they are readily distinguished from other classes.
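The four metrics can be computed directly from the true and predicted labels. The sketch below assumes macro averaging over the three classes, which is one common choice for multi-class problems; the averaging scheme behind Table 3 is not stated here, and the labels in the example are hypothetical.

```python
def classification_scores(y_true, y_pred, classes):
    """Return accuracy plus macro-averaged precision, recall, and F1 score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels: one CO sample is misclassified as HOST.
y_true = ["HO", "HO", "CO", "CO", "CO", "HOST", "HOST", "HOST", "HOST", "HOST"]
y_pred = ["HO", "HO", "CO", "CO", "HOST", "HOST", "HOST", "HOST", "HOST", "HOST"]
scores = classification_scores(y_true, y_pred, ["HO", "CO", "HOST"])
```

In this example a single error in a small class lowers the macro recall more than the accuracy, which illustrates why class imbalance matters when interpreting the scores.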

Application to Geophysical Field Data
The trained models were applied to the geophysical field data collected by KIGAM in 2019. An extensive airborne magnetic survey was conducted first, which covered a rectangular area (3 km × 5 km) including the Gonamsan intrusion. From the magnetic anomalies, we determined the most appropriate area for electrical resistivity and induced polarization surveys in consideration of accessibility; survey locations are represented by red dashed lines in Figure 1. The electrical resistivity and induced polarization surveys were conducted on three parallel profile lines using a SuperSting R8/IP (Advanced Geosciences, Cedar Park, TX, USA) instrument, and three-dimensional (3D) inversion was performed to obtain geophysical properties. It is beyond the scope of this paper to provide the details of each inversion process. Figure 6 illustrates the 3D inverted electrical resistivity, chargeability, and magnetic susceptibility data obtained from each survey. Originally, magnetic inversion covered a larger region, which was downsized to the local area to match the scales of the other two surveys. The anomalous area, which is common among the three sets of inversion results, is located northeast of the exploration area and has relatively low electrical resistivity, but high chargeability and magnetic susceptibility.
Before we apply the trained models to field data, it is necessary to examine the correlations between geophysical properties obtained by inversion of field data. As described in Section 3.2, in rock samples, higher density classes have lower electrical resistivity, and higher chargeability and magnetic susceptibility. However, because the resolution of the inversion results differs among the surveys, the correlations between the inverted geophysical properties may not be consistent with those between the geophysical properties of rock samples.
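The correlation check described above can be sketched as a comparison of Pearson correlation coefficients computed separately for the rock samples and for the inverted field values. The function names, the property keys, and the numbers below are hypothetical and only illustrate the procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_correlations(lab, field, names):
    """For each property pair, return (r_lab, r_field, same_sign)."""
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r_lab = pearson(lab[a], lab[b])
            r_field = pearson(field[a], field[b])
            result[(a, b)] = (r_lab, r_field, r_lab * r_field > 0)
    return result


names = ["log_res", "sqrt_chg", "log_sus"]
# Hypothetical transformed values (not measured or inverted data).
lab = {
    "log_res": [3.0, 2.5, 2.0, 1.0],
    "sqrt_chg": [1.0, 1.5, 2.0, 3.0],
    "log_sus": [-3.0, -2.0, -1.5, -0.5],
}
field = {
    "log_res": [2.8, 2.6, 2.4, 2.1],
    "sqrt_chg": [1.2, 1.3, 1.6, 1.9],
    "log_sus": [-2.5, -2.4, -2.0, -1.8],
}
comparison = compare_correlations(lab, field, names)
```

A matching sign for every property pair indicates that the relationships learned from the rock samples point in the same direction in the field data, although the magnitudes may still differ.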
Table 4 shows the correlations between the transformed geophysical properties of rock samples and field data; mean and standard deviation values are also provided. The statistics for rock samples were obtained from the 596 samples, which were used in training, and those for field data were obtained from 382,084 points of the inversion results. The correlations between geophysical properties for the rock samples and the field data in Table 4 are not identical, but they have the same sign and do not differ greatly.
Table 4. Correlation, mean, and standard deviation (SD) of transformed electrical resistivity, chargeability, and magnetic susceptibility for drilling cores and field observations.
Figure 7 shows the classification results for the survey area, which were obtained using five trained models with inverted geophysical properties as input data. No areas were classified as HO in any of the analyses, and few areas were predicted to be CO. Most of the areas classified as CO were located in the northeast, corresponding to the location of an anomaly in the inversion results shown in Figure 6. The area classified as CO varied among the five ML models, with the largest area being obtained by RF and the smallest by SVM. The classification results can serve as reference information for selecting locations for further drilling or exploration.
To verify the ML results, we compared the top view of the classification results with the geological map shown in Figure 8. The exploration area where the electrical resistivity and induced polarization surveys were conducted is located on the border between quartz-monzodiorite and monzogabbro-monzodiorite, and the eastern area is generally more mafic than the western one. Although verification using surface data does not guarantee the distribution in the subsurface, most areas predicted to be CO at the surface are located in the monzogabbro-monzodiorite, which is consistent with the distribution shown on the geological map.

Difference between Laboratory Experiment and Inversion Results
We generated a rock classifier using ML methods with the geophysical properties of drilling cores obtained by a laboratory experiment, and predicted promising areas for ore within the survey area using inverted geophysical properties as input data. However, predictions in the survey area are limited because the geophysical properties of the two sample groups are obtained in fundamentally different ways. The geophysical properties of drilling cores are measured directly in the laboratory, whereas the inverted properties are estimated from signals recorded at the surface, resulting in differences in resolution and accuracy. Nevertheless, the correlations between the features of the two sample types are similar, as shown in Table 4, and the scaler used during training was also applied to the field data, so that the classified areas correspond to the anomaly observed in the geophysical field data. In future research, to compensate for the problems caused by the differences between these two groups, we will adopt a deep learning technique such as domain adaptation [31], which can handle data with differing domains.
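The two checks mentioned above, reusing one scaler for both sample groups and comparing the sign of feature correlations, can be sketched as follows. This is an illustration under stated assumptions: the text does not specify the scaler, so standardization (zero mean, unit SD) is assumed, and the helper names and the four-sample datasets are hypothetical.

```python
import math
import statistics


def fit_scaler(rows):
    """Return per-feature (mean, SD) pairs computed from the training rows."""
    params = []
    for column in zip(*rows):
        sd = statistics.pstdev(column)
        if sd == 0:
            raise ValueError("feature has zero variance and cannot be standardized")
        params.append((statistics.mean(column), sd))
    return params


def apply_scaler(rows, params):
    """Standardize rows using previously fitted (mean, SD) pairs."""
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical, already transformed features: [resistivity, chargeability].
lab = [[3.0, 10.0], [2.0, 30.0], [2.5, 20.0], [1.5, 40.0]]
field = [[2.8, 12.0], [2.6, 15.0], [2.2, 22.0], [2.4, 18.0]]

params = fit_scaler(lab)                    # fitted on laboratory data only
lab_scaled = apply_scaler(lab, params)
field_scaled = apply_scaler(field, params)  # same parameters reused for field data

r_lab = pearson([r[0] for r in lab_scaled], [r[1] for r in lab_scaled])
r_field = pearson([r[0] for r in field_scaled], [r[1] for r in field_scaled])
print(r_lab, r_field)
```

Fitting the scaler on the laboratory data alone keeps the field inputs on the scale the models were trained on; a matching correlation sign is only a coarse consistency check and does not remove the domain gap discussed above.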

Accuracy of Inversion Results for Field Exploration
The inverted geophysical properties illustrated in Figure 6 are not unique solutions. These values were obtained by non-linear optimization of field data acquired through electrical resistivity, induced polarization, and airborne magnetic surveys [32]. The quality of the inversion results depends on various factors, including the number of data samples acquired, the location of the survey line, the performance of the equipment, the inversion algorithm, and the exploration environment. Thus, for the inverted values to serve as representative geophysical properties in the survey area, improving the accuracy and resolution of the inversion results is essential.
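Non-uniqueness can be illustrated with a toy problem. The sketch below is a deliberately simplified linear example, not the non-linear inversion of [32]: the forward operator, the two candidate models, and the smallest-model penalty with its weight are all assumptions made for illustration.

```python
def forward(model):
    """Toy forward operator: one surface reading equals the sum of two cell values."""
    return model[0] + model[1]


def misfit(model, observed):
    """Squared difference between the predicted and the observed reading."""
    return (forward(model) - observed) ** 2


def regularized_objective(model, observed, weight):
    """Data misfit plus a smallest-model (squared L2 norm) penalty."""
    return misfit(model, observed) + weight * sum(v * v for v in model)


observed = 4.0
model_a = (1.0, 3.0)
model_b = (2.0, 2.0)

# Both models reproduce the single observation exactly.
print(misfit(model_a, observed), misfit(model_b, observed))

# A regularization term is one common way to prefer one model over the other.
print(regularized_objective(model_a, observed, weight=0.1))
print(regularized_objective(model_b, observed, weight=0.1))
```

Because the data alone cannot distinguish the two models, the recovered property values depend on choices such as the regularization, which is one reason the inverted values should be treated as estimates.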

Training Materials from Drilling Cores
The total number of rock samples used for training is 596, which is insufficient to cover areas around the Gonamsan intrusion. The reliability of training can be improved by investigating diverse rock lithologies with large numbers of rock samples. As KIGAM plans drilling operations near the Gonamsan intrusion, we will obtain additional data in the future.
We labeled our data based on density. As our ultimate aim is the identification of areas that may contain ore, ore grade would be a reasonable labeling criterion. However, measuring the ore grade of all rock samples was impractical because of the cost and time required. Thus, readers should interpret the results carefully, given that density was used as the criterion for labeling data.
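Density-based labeling amounts to a threshold rule. The following minimal sketch assumes that denser samples map to the higher ore classes; the two threshold values are placeholders chosen for illustration and are not the cut-offs used in the study.

```python
# Placeholder thresholds in g/cm^3 (hypothetical, not the values used in the study).
CO_MIN_DENSITY = 3.2
HO_MIN_DENSITY = 3.8


def label_by_density(density, co_min=CO_MIN_DENSITY, ho_min=HO_MIN_DENSITY):
    """Assign HO, CO, or HOST to a sample from its measured density."""
    if co_min >= ho_min:
        raise ValueError("co_min must be smaller than ho_min")
    if density >= ho_min:
        return "HO"
    if density >= co_min:
        return "CO"
    return "HOST"


# Hypothetical sample densities.
densities = [2.9, 3.3, 4.1]
labels = [label_by_density(d) for d in densities]
print(labels)
```

Any such rule inherits the limitation noted above: density is a proxy for ore grade, so samples close to a threshold may be labeled differently than a grade-based criterion would label them.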

Verification of the Classification Results from Field Data
We applied the trained ML models to geophysical field data and verified the results by comparing the top view with a geological map. However, because our target is a promising area in the subsurface, another method is needed to verify the subsurface composition associated with the classification results. The most reliable way to verify composition is drilling, but drilling in all areas is unrealistic. An alternative is to create geological models of the survey area and generate synthetic data through numerical modeling. The geological model can be determined by geoscience approaches, including geophysical, geological, and geochemical surveys with drilling. As synthetic data are labeled, the trained model can be verified using synthetic data prior to its application to field data. Verification using synthetic data is possible if the geological model is accurate, i.e., if there is only a small difference between the field and synthetic data.
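Because every synthetic cell has a known label, checking a trained model against synthetic data reduces to comparing two label lists. The sketch below assumes the model's predictions for the synthetic cells are already available; the function names and the eight labels are hypothetical.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted class matches the known class."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count (true class, predicted class) pairs."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical labels from a synthetic geological model and a trained classifier.
synthetic_true = ["HOST", "HOST", "CO", "CO", "HO", "HOST", "CO", "HOST"]
synthetic_pred = ["HOST", "HOST", "CO", "HOST", "HO", "HOST", "CO", "CO"]

acc = accuracy(synthetic_true, synthetic_pred)
pairs = confusion_counts(synthetic_true, synthetic_pred)
print(acc, pairs[("CO", "HOST")], pairs[("HOST", "CO")])
```

The pair counts show which classes are confused with each other, which is more informative than accuracy alone when ore classes are rare compared with host rock.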

Conclusions
We generated a rock classifier using supervised ML to investigate vanadiferous titanomagnetite ore deposits. The training materials were rock samples from drilling cores, and geophysical properties including electrical resistivity, chargeability, and magnetic susceptibility were used as features for training. Those properties were obtained through laboratory measurements and were labeled by density into HO, CO, and HOST classes.
We used SVM, RF, XGB, LGBM, and DNN for supervised ML, and optimized the hyperparameters of each method using a grid search algorithm. On the test dataset, DNN had the highest accuracy, at 0.97, and all methods achieved 0.95 or greater. The trained models were applied to field exploration data acquired by electrical resistivity, induced polarization, and airborne magnetic surveys. The classification results of the trained models contained no areas of HO and few of CO. We verified the classified areas through comparison with a geological map at the surface. Most areas classified as CO are located on the eastern side of the boundary of monzodiorites, which is consistent with the distribution shown on the geological map.
The rock classifier that we generated can predict the distribution of promising ore zones in subsurface media and help guide the selection of locations for future drilling or exploration. However, our method has several limitations that remain to be addressed. First, the geophysical properties of the rock samples and of the field data are obtained by different methods, which can cause differences in resolution and accuracy. To reduce the gap between the two domains, additional techniques such as domain adaptation are required. The second problem is the lack of training samples and the quality of the inversion results for the field data. Adding rock samples of various lithologies through additional drilling and improving the quality of the inversion results would increase the reliability of the trained model. The third problem is the lack of verification of the classification results in the subsurface media. Synthetic data based on a geological model can help to verify the trained model before it is applied to field data.