Machine Learning-Based Lithological Mapping from ASTER Remote-Sensing Imagery

Abstract: Accurately mapping lithological features is essential for geological surveys and the exploration of mineral resources. Remote-sensing images have been widely used to extract information about mineralized alteration zones because they are cost-effective and broadly applicable. Automated methods for lithological mapping from satellite imagery, such as machine-learning algorithms, have also received attention. This study aims to map lithologies and minerals indirectly through machine-learning algorithms using Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) remote-sensing data. The capabilities of several machine-learning (ML) algorithms were evaluated for lithological mapping, including random forest (RF), support vector machine (SVM), gradient boosting (GB), extreme gradient boosting (XGB), and a deep-learning artificial neural network (ANN). These methods were applied to ASTER imagery of the Sar-Cheshmeh copper mining region of Kerman Province, in southern Iran. First, several spectral features extracted from ASTER bands were used as input data. Second, correlation coefficients between the original spectral bands and features were extracted. The RF feature importance was subsequently computed, and features of lesser importance were removed. Finally, the remaining features were given to the models as input data in the second scenario. Accuracy assessments were performed for the lithological classes in the study region: Sar-Cheshmeh porphyry, quartz eye, late fine porphyry, hornblende dike, granodiorite, feldspar dike, biotite dike, andesite, and alluvium. The overall accuracy results showed that the ML-based algorithms without feature extraction have the highest accuracy: 84%, 85%, 80%, 82%, and 80% for RF, SVM, GB, XGB, and ANN, respectively. The results of this study should be of interest to geologists engaged in lithological mapping and mineral exploration, particularly for selecting appropriate ML-based techniques to be implemented in similar regions.


Introduction
The concepts of geological units and lithological mapping are closely related but distinct. On the one hand, geological units, including rock layers, formations, and other distinct bodies of rock, are categorized by characteristics such as composition, age, and origin. On the other hand, lithological mapping provides information about the distribution and geological history of the Earth's crust, together with its characteristics [1]. It therefore plays a significant role in bedrock surveys and mineral exploration [2]. In many cases, ore deposits were first discovered on the ground by recognizing hydrothermally altered host rocks. Regional lithological maps help to establish the distribution, properties, and characteristics of the rock types within a particular area, and are thus important for geology and mineral exploration [1]. Acquiring such maps is time-consuming, requires intensive fieldwork, and can be challenging where the study area is difficult to reach.
Remote sensing is a valuable approach to lithological mapping in the search for commercially viable mineral resources. Remote-sensing mapping techniques can locate zones with high potential for ore mineralization across a vast area by recognizing hydrothermally altered rocks, while consuming less time and money and achieving a higher accuracy than ground-based field surveys [3,4]. However, the medium and coarse spatial and spectral resolution of remote-sensing data may make the implementation of such systems difficult. The technical characteristics of multispectral and hyperspectral remote-sensing sensors are crucial for lithological mapping and mineral exploration [5][6][7]. Hyperspectral sensors can simultaneously acquire images in 100 to 200 contiguous spectral bands [8]. A substantial amount of spectral information can be derived from satellite-based hyperspectral data, allowing mineral compositions to be determined from the spectra [9]. Yet high-resolution hyperspectral images are prone to spectral interference, in addition to spectral confusion, difficulties in data processing, relatively narrow swath widths, and atmospheric interference [10]. Moreover, a single pixel can cover a large ground surface area (e.g., 1000 m²), which makes it difficult to select pure pixel spectra as training samples for a supervised classifier and potentially lowers the lithological classification accuracy [11,12]. Hyperspectral data are also often costly or not openly available.
Thanks to the availability of high-resolution multispectral data, such as SPOT and GF-2, the problem of low classification accuracy has been solved to a certain extent. Although they show structural and textural features impressively well, high-spatial-resolution multispectral satellite data have a narrow spectral range: there are only a few visible and near-infrared bands and a marked absence of other spectral bands, such as shortwave and thermal infrared. Most high-resolution images are also costly and not publicly available.
The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) sensor can identify lithological units and hydrothermal alteration mineral zones [13][14][15][16][17]. ASTER (Ministry of Economy, Trade, and Industry, Tokyo, Japan) provides worldwide coverage with a revisit time of 16 days at a relatively high spatial resolution (15-90 m). This makes it possible to select imagery that is free from cloud cover, or from a particular season, in order to minimize the effects of vegetation. It has been demonstrated that iron oxide minerals can be readily identified remotely using ASTER's visible and near-infrared (VNIR) bands [14,18,19,20]. The fundamental absorption features of Al-O-H, Mg-O-H, Si-O-H, and CO3 that identify hydrothermal alteration minerals (e.g., phyllosilicates, sorosilicates, and carbonates) can be detected using the shortwave infrared (SWIR) bands of ASTER [21][22][23][24]. Furthermore, ASTER's thermal infrared (TIR) bands can distinguish silicate lithological groups through the emissivity spectra that are derived from Si-O-Si stretching vibrations [15,25,26,27].
To map lithological units and identify alteration mineral zones, several image-processing algorithms, namely band math, minimum noise fraction, spectral angle mapper, principal component analysis, false color composite, and matched filter, have been commonly applied to ASTER data [5,8,9,11,28]. The results of these conventional algorithms have drawbacks, such as unclassified and misclassified units, which can reduce the accuracy of lithological and alteration mapping [29,30]. Recently, machine-learning (ML) algorithms have proved more effective than conventional classification methods when classifying geological targets [31,32]. ML, a sub-domain of artificial intelligence, is a data-driven technique that helps to extract useful information and recognize patterns in data with minimal human involvement [33][34][35]. ML algorithms have several advantages, especially in automatically solving complex nonlinear problems, and are more robust in handling missing data than traditional image-processing methods [33,36]. Particular attention is devoted here to supervised lithology classification, that is, the prediction of classes representing the spatial distributions of geological materials.
Some researchers, such as Bachri et al. [37] and Cracknell and Reading [33], have assessed and evaluated applications of ML algorithms in geological mapping using remote-sensing imagery. They showed considerable potential in various areas, such as mapping lithological units and identifying alteration zones that are associated with a variety of ore mineralization processes [37]. ML algorithms that have been applied extensively in geology and mineral mapping include support vector machines (SVM) [33], artificial neural networks (ANNs) [33], random forest (RF) [38], the maximum likelihood classifier (MLC) [38], k-nearest neighbors (k-NN) [33], and naïve Bayes (NB) [33]. Advances in ML algorithms for processing satellite imagery have considerably improved the detection of lithological and structural features and the identification of alteration zones for mineral exploration. Lithological mapping could be made more feasible by using state-of-the-art ML algorithms such as gradient boosting (GB), extreme gradient boosting (XGB), and artificial neural networks (ANNs).
A neural network is an artificial intelligence algorithm that is capable of analyzing patterns, learning tasks, and solving problems in a human-like manner [39]. ANNs are widely used to solve complex problems in diverse fields, including regression and classification problems [40]. The performance of an ANN depends on several key parameters, such as activation functions, loss functions, optimizers, hidden layers, the number of nodes, and regularization layers [41]. GB [42] is a sequential ensemble learning technique in which the model's performance improves over iterations [43]. The method builds the model in a stage-wise fashion and generalizes it by allowing the optimization of an arbitrary differentiable loss function [43,44]. The XGB algorithm is an extended version of the gradient boosting algorithm, designed to enhance an ML model's performance and speed. Xiong et al. [45] analyzed deep-learning algorithms and big data in skarn-type (sedimentary-igneous intrusion contacts) iron mineralization in China. Their results showed a strong spatial relationship between known mineralization areas and the areas that were mapped as prospective by a deep-learning method. Elahi et al. [46] investigated the potential of two ML algorithms, SVM and ANN, using Sentinel-2 optical data for lithological mapping in Pakistan. They reported overall accuracies of 95.78% and 95.73% for SVM and ANN, respectively. Applying XGB and ANN algorithms to ASTER data therefore has high potential for lithological mapping and mineral exploration.
The study's main objective is to identify the most efficient ML approach for lithological mapping using ASTER remote-sensing data. This research compares traditional machine-learning algorithms, such as RF and SVM, with more recent ensemble machine-learning techniques, such as GB and XGB, and a deep-learning ANN, for the spatial modeling of lithological units. It also aims to find the most relevant features and spectral regions for lithological mapping. The study thus provides an inclusive evaluation of RF, SVM, GB, XGB, and deep ANN algorithms for lithological mapping using ASTER data. The models can support geologists in accurate lithological mapping and mineral exploration, especially when applied to similar regions.

Geology of the Study Area
The current study focuses on mineral exploration in Iran. Most of the country is semiarid, with sparse, mainly herbaceous vegetation on well-exposed surfaces. This makes remote-sensing-based geological mapping an ideal method of study [47]. The Sar-Cheshmeh copper mining region in Kerman Province (southeast Iran) was selected as a case study (Figure 1A,B). The Sar-Cheshmeh porphyry copper deposit is considered the second-largest deposit of this metal in the world and the most important in Iran, and it has been exploited since ancient times. It contains roughly 1200 million tonnes of ore with an average grade of 1.2% copper, 0.03% molybdenum, 3.9 g/t Ag, and 0.11 g/t Au [48]. This is the first time that ML-based techniques (RF, SVM, GB, XGB, and deep-learning ANN) have been used for lithological mapping with ASTER remote-sensing data in the Sar-Cheshmeh copper mining region (Figure 1). The study area is 160 km southeast of Kerman City (55.865556° E, 29.946111° N) and in the south of the Urmia-Dokhtar volcanic belt (Figure 1A). It is located in an area of Eocene volcanic rock and Oligo-Miocene subvolcanic granitoid rock.
It is believed that the oldest host rocks of the Sar-Cheshmeh porphyry copper deposit are derived from the Eocene volcanogenic complex [49], which consists of pyroxene trachybasalt, potassic and shoshonitic pyroxene andesite [50], less abundant andesite, agglomerate, tuff, and tuffaceous sandstone. During the Oligocene-Miocene transition (~23 Ma), granitoid phases such as quartz diorite, quartz monzonite, and granodiorite were intruded into these rocks. These granitoid rocks are cut by intramineral porphyry dikes composed of hornblende porphyry, feldspar porphyry, and biotite porphyry. The Sar-Cheshmeh copper deposit is hosted in Eocene volcanic rocks, where a Miocene sub-volcanic granitoid unit intruded into andesitic host rocks [48]. Porphyry copper mineralization in this area is associated with well-developed zones of phyllic, argillic, propylitic, silicification, and jarositic hydrothermal alteration.
The deposit is located at an average altitude of 2620 m asl, and its highest altitude reaches 3280 m asl. The regional climate is characterized by cold, snowy, and windy winters and mild summers. The temperature ranges from −15 to +35 °C. The average rainfall is reported to be 250 mm or less per year. As a result, the surface of the earth is well exposed, with little or no vegetation cover, which makes the remote-sensing approach very suitable.

ASTER Data Characteristics and Preprocessing
ASTER is a moderate spatial and spectral resolution instrument on the Terra satellite platform, which observes the Earth's surface at wavelengths from the visible to the thermal infrared [51]. The sensor has 14 separate bands: (1) three bands in the visible and near-infrared (VNIR) (0.52 to 0.86 µm) with a spatial resolution of 15 m (Bands 1, 2, and 3); (2) six bands in the shortwave infrared (SWIR) (1.60 to 2.43 µm) with a spatial resolution of 30 m (Bands 4, 5, 6, 7, 8, and 9); and (3) five bands in the thermal infrared (TIR) with a spatial resolution of 90 m (Bands 10, 11, 12, 13, and 14) [52]. This study used the ASTER Level 2 surface reflectance VNIR and crosstalk-corrected SWIR (AST_07XT) and the surface radiance TIR (AST_09T) datasets (ASTER Level 0: raw data; Level 1A: Level 0 data calibrated and converted to units of radiance; Level 1B: radiance converted to at-sensor reflectance; Level 2: atmospheric correction applied to obtain surface reflectance). The image was acquired on 20 May 2006. AST_07XT includes two product files, for the VNIR and the SWIR, that have been atmospherically corrected and are derived from the Level 1B data. AST_09T data are atmospherically corrected and provide surface-leaving radiance at 90 m spatial resolution for the ASTER thermal bands; this radiance contains surface-emitted and surface-reflected components. These products are freely available on NASA's Earthdata website (https://earthdata.nasa.gov (accessed on 18 September 2021)). AST_07XT and AST_09T were orthorectified with ENVI's ASTER Preprocessing Toolkit using a group of ground control points. The ASTER SWIR (30 m) and TIR (90 m) bands were resampled to 15 m using the bilinear method to match the VNIR data. Finally, all bands were stacked.
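The resampling and stacking step can be sketched as follows. This is a minimal illustration on synthetic arrays, not the ENVI workflow used in the study: it assumes co-registered band arrays whose sizes differ by exact integer factors (2 for SWIR, 6 for TIR), ignores georeferencing, and uses `scipy.ndimage.zoom` with `order=1` as a stand-in for bilinear resampling. The helper name `resample_to_vnir` is hypothetical.

```python
import numpy as np
from scipy.ndimage import zoom


def resample_to_vnir(band, factor):
    """Upsample one 2-D band by an integer factor with bilinear (order=1) interpolation."""
    return zoom(band, zoom=factor, order=1)


# Synthetic stand-ins for the three ASTER subsystems (assumed, not real data).
rng = np.random.default_rng(0)
vnir = rng.random((3, 180, 180))  # Bands 1-3 at 15 m
swir = rng.random((6, 90, 90))    # Bands 4-9 at 30 m
tir = rng.random((5, 30, 30))     # Bands 10-14 at 90 m

# Bring SWIR (30 m -> 15 m) and TIR (90 m -> 15 m) onto the VNIR grid.
swir_15 = np.stack([resample_to_vnir(b, 2) for b in swir])
tir_15 = np.stack([resample_to_vnir(b, 6) for b in tir])

# Stack all 14 bands into one (bands, rows, cols) cube.
stack = np.concatenate([vnir, swir_15, tir_15], axis=0)
print(stack.shape)
```

With real scenes, a georeferenced resampler would be needed so that pixel footprints stay aligned; the sketch only shows the array-level operation.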

Mineral Spectral Characteristics
As a result of vibrational overtones, electronic transitions, charge transfer, and conduction, many minerals have diagnostic absorption features in the solar-reflected spectral region (0.3-2.5 µm) [19]. There is a prominent Al-OH absorption feature at 2.2 µm and a less intense one at 2.35 µm that are characteristic of deictically altered rocks (i.e., molten or plastic rock injected into cavities or between layers) that contain sericite. Advanced argillic alteration is characterized by kaolinite and alunite, with Al-OH absorption lines at 2.165 µm and 2.2 µm, respectively. Chlorite, epidote, and calcite are commonly present in propylitically (chemically) altered rocks, with Fe, Mg-OH, and CO3 absorption features from 2.1 to 2.3 µm (Figure 2A) [53]. Minerals containing iron oxides and hydroxides, such as limonite and hematite, tend to exhibit spectral absorption features between 0.4 and 1.1 µm (Figure 2B) [20].
For lithology and mineral mapping with ASTER SWIR bands, Yamaguchi and Naito [54] proposed several spectral indices based on linear combinations of the reflectance in each ASTER SWIR band, including the kaolinite index, alunite index, calcite index, and montmorillonite index. Considering the spectral absorption characteristics of vegetation, minerals, and rocks in the various ASTER bands, Ninomiya [55,56] proposed a vegetation index and several mineralogic indices utilizing VNIR and SWIR, as well as several lithologic indices using TIR spectra; these include the stabilized vegetation index (SVI), the OH-bearing altered minerals index (OHI), the quartz index (QI), and the carbonate index (CI). The features extracted from ASTER bands that were used in this study are summarized in Table 1.
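As an illustration of how such band-ratio indices are computed per pixel, the sketch below evaluates three of the Ninomiya indices. The formulas are the commonly cited forms (OHI = (B7/B6)·(B4/B6), QI = B11²/(B10·B12), CI = B13/B14) and are stated here as an assumption; Table 1 remains the authoritative definition for this study. The function name and the small `eps` guard against division by zero are hypothetical additions.

```python
import numpy as np


def ninomiya_indices(bands, eps=1e-12):
    """Compute OHI, QI, and CI from a dict {ASTER band number: 2-D array}.

    Formulas are the commonly cited forms and are an assumption here;
    check them against Table 1 before use.
    """
    b = bands
    return {
        "OHI": (b[7] / (b[6] + eps)) * (b[4] / (b[6] + eps)),  # OH-bearing minerals
        "QI": (b[11] * b[11]) / (b[10] * b[12] + eps),         # quartz index
        "CI": b[13] / (b[14] + eps),                           # carbonate index
    }


# Toy input: each band is a constant image equal to its band number.
bands = {k: np.full((2, 2), float(k)) for k in range(1, 15)}
indices = ninomiya_indices(bands)
print({name: float(img[0, 0]) for name, img in indices.items()})
```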

Implementation of Machine-Learning (ML) Algorithms
Spectral features extracted from the ASTER bands were used as input to the ML algorithms, and the most important features were identified using random forest (RF) feature importance (FI) together with Pearson's correlation coefficients (r) among all original spectral bands and features. The random forest (RF), support vector machine (SVM), deep-learning ANN, gradient boosting (GB), and extreme gradient boosting (XGB) methods were selected for lithological mapping. A total of 33 features and bands were considered and supplied to the ML algorithms in two scenarios. In the first scenario, all features were used as inputs to the algorithms. In the second scenario, Pearson's correlation coefficients between all original spectral bands and features were extracted, and the RF's FI was then computed. For every pair of features with an absolute correlation greater than 0.9, the feature of lesser importance was removed. The remaining 17 features were used as the models' input data. The ML algorithms were implemented using the open-source Python packages Scikit-learn (version 1.0) (https://scikit-learn.org/ (accessed on 5 April 2023)) and Keras (version 2.3.0) (https://keras.io/).
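The feature-selection rule of the second scenario can be sketched as below. The text does not specify the order in which correlated pairs are resolved, so this sketch assumes a greedy pass from the most to the least important feature; the function name `prune_correlated`, the synthetic data, and the importance values are all hypothetical. In practice the importances would come from a fitted random forest.

```python
import numpy as np


def prune_correlated(X, importances, threshold=0.9):
    """Return indices of the features kept after correlation pruning.

    For each pair with |Pearson r| > threshold, the less important
    feature is dropped. Pairs are visited greedily from the most
    important feature downwards (an assumption of this sketch).
    """
    r = np.corrcoef(X, rowvar=False)
    keep = np.ones(X.shape[1], dtype=bool)
    order = np.argsort(importances)[::-1]  # most important first
    for rank, i in enumerate(order):
        if not keep[i]:
            continue
        for j in order[rank + 1:]:
            if keep[j] and abs(r[i, j]) > threshold:
                keep[j] = False
    return np.flatnonzero(keep)


# Synthetic example: features 0/1 and 2/3 form two highly correlated pairs.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x2 = rng.normal(size=500)
X = np.column_stack([
    x0,
    x0 + 0.05 * rng.normal(size=500),
    x2,
    -x2 + 0.05 * rng.normal(size=500),
])
importances = np.array([0.4, 0.1, 0.2, 0.3])  # hypothetical RF importances
kept = prune_correlated(X, importances)
print(kept)
```

With scikit-learn, the `importances` array could be taken from the `feature_importances_` attribute of a fitted `RandomForestClassifier`.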
The samples were divided using a stratified train-test split. For each class, 35% of the samples were used as test data, and the remaining 65% were used as training data. The parameters of each machine-learning algorithm were tuned through grid search cross-validation (GridSearchCV), a function in Scikit-learn (Python) that searches a grid of candidate hyperparameter values to find the optimal values for a specific model. The accuracy assessment was carried out in two steps. First, all spectral bands and features were given as the model's input. Second, the RF feature importance was computed for all spectral bands and features. For pairs of features with an absolute correlation greater than 0.9, the feature of higher importance was preserved and the feature of lesser importance was removed. Finally, the remaining features were provided as input to the ML algorithms. The flowchart of the methodology applied in this study is illustrated in Figure 3.
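A minimal sketch of the stratified 65/35 split followed by a grid search is given below, using scikit-learn's `train_test_split` and `GridSearchCV`. The synthetic data, the small parameter grid, and the choice of 3 cross-validation folds are assumptions for illustration only; the grids actually searched are those listed in the appendix tables.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for pixel samples: 300 samples, 5 features, 3 classes.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)
X = rng.normal(size=(300, 5))
X[:, 0] += y  # make the classes partly separable

# Stratified split: 35% of each class goes to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, stratify=y, random_state=0
)

# Hypothetical grid; the real grids are given in the appendix tables.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # accuracy of the refitted best model
```

The test set is touched only once, after tuning, so the reported score is not biased by the hyperparameter search.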

Random Forest
Random forest, which was developed by Breiman [57], is an ensemble tree-based learning algorithm and a powerful non-parametric technique for solving various data-mining problems. RF is less affected by outliers than single decision trees (DTs) and can handle various input data without overfitting the dataset [58]. An RF consists of many DTs, each fitted to a randomly selected subset of the training dataset. The main problem of the DT method is that it tends to fit the training data too closely; in other words, a DT is prone to overfitting [59]. RF averages the trees' outputs to improve accuracy in regression problems and takes a majority vote in classification problems [44], and thereby mitigates the DT's tendency to overfit the training data. The parameters that were supplied to GridSearchCV are shown in Table A1.
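The majority-voting step described above can be illustrated with a few lines of NumPy. This is a sketch of hard voting over hypothetical per-tree predictions, not the internals of any particular library; the function name is an assumption.

```python
import numpy as np


def majority_vote(tree_predictions):
    """Combine per-tree class labels of shape (n_trees, n_samples) by majority vote.

    Labels are assumed to be integers 0..n_classes-1; ties go to the lowest label.
    """
    preds = np.asarray(tree_predictions)
    n_classes = preds.max() + 1
    # votes[c, s] = number of trees that predicted class c for sample s
    votes = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)


# Three hypothetical trees voting on four samples.
tree_predictions = [
    [0, 1, 2, 2],
    [0, 1, 1, 2],
    [1, 1, 2, 0],
]
forest_prediction = majority_vote(tree_predictions)
print(forest_prediction)  # [0 1 2 2]
```

Note that scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes, so its output can differ from this simplified scheme.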

Support Vector Machines
Support vector machines (SVMs), formally described by Cortes and Vapnik [60], are powerful supervised machine-learning algorithms used for classification and regression problems [61]. In the original formulation, the SVM model tries to find a hyperplane that separates the training dataset into a predefined number of classes. The decision boundary, obtained during training, is the optimal separating hyperplane that minimizes the number of misclassifications. Learning proceeds iteratively to find the optimal decision boundary that separates the training samples, possibly in a high-dimensional space [62]. The resulting hyperplane is an (n − 1)-dimensional subspace of an n-dimensional space. The decision boundary is specified by a subset of the training samples, called support vectors. In SVMs, nonlinear kernel functions are frequently used to map the input data into a high-dimensional space in which they are more separable [63]. A radial basis function (RBF) is an excellent choice for transforming the input data in nonlinear models [44]. The details of the grid search parameters are shown in Table A2.
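To make the RBF kernel concrete, the sketch below computes the kernel matrix K[i, j] = exp(−γ‖a_i − b_j‖²) with NumPy. This is the standard RBF definition; the value of γ and the sample points are arbitrary and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Return K with K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negatives from rounding


A = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
K = rbf_kernel(A, A, gamma=0.5)
print(np.round(K, 4))
```

Each sample has similarity 1 with itself, and the similarity decays towards 0 as samples move apart; a larger γ makes the decay faster and the resulting decision boundary more local.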

Deep-Learning ANN
ANN methods model problems using interconnected artificial neurons, in a manner inspired by the human brain [44]. The ANN used here is a feed-forward multilayer perceptron consisting of an input layer, one or more hidden layers, and an output layer. In neural networks, a layer's neurons can be connected to neurons in other layers, but not to other neurons within the same layer [64]. In a fully connected ANN, each neuron is connected to all neurons in the previous and following layers, and each connection has its own weight.
Two main characteristics of an ANN are its architecture and the manner in which it learns. The main issue in determining the ANN architecture is selecting the appropriate number of hidden layers and the number of neurons. Several methods have been proposed for selecting the number of hidden layers and neurons [65,66]. In this study, the number of neurons was determined using Equation (1), where Nn is the number of neurons in each layer, N is the number of input neurons, and m is the number of layers. We evaluated various activation functions for the deep ANN model, including ReLU (rectified linear unit), Sigmoid, Tanh (hyperbolic tangent function), and Linear [67]. Early stopping was used to avoid overfitting in the deep ANN model; to this end, 20% of the training data were held out as validation data during the learning process.
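The early-stopping rule can be sketched independently of any deep-learning framework: training halts once the validation loss has failed to improve for a given number of consecutive epochs. The patience value and the loss sequence below are hypothetical, since the text does not report the settings that were used.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops at the first epoch where the validation loss has not
    improved by more than min_delta for `patience` consecutive epochs.
    If that never happens, the last epoch is returned as stop_epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses measured on the held-out 20% of the training data.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.59]
print(early_stopping_epoch(losses, patience=3))  # (5, 2)
```

In Keras, the same behavior is available through the `EarlyStopping` callback combined with `validation_split=0.2` in `model.fit`; restoring the weights from the best epoch is an additional option of that callback.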

Gradient Boosting
The gradient boosting (GB) decision tree is a variant of ensemble learning [68], in which multiple weak predictive models are generated and then combined and weighted into a function that approximates or predicts the output variable from the input variables. Boosting and bagging are two prevalent types of ensemble learning. "Boosting" is the process of converting an ensemble of multiple weak learners into a few strong learners, thereby reducing model bias. "Bagging" refers to bootstrap aggregation, the process by which variance is reduced while avoiding overfitting of the final ML model. Variance reduction is accomplished by generating multiple decision trees from independent subsets randomly drawn from the data. GB is employed in classification and regression problems in the same manner as RF [69]. GB usually involves three steps: (1) establishing a loss function to be optimized; (2) generating weak learners (typically decision trees) to make predictions; and (3) creating an additive model that includes the weak learners in a manner that minimizes the loss function [44]. GB trains many models in a sequential and additive way.
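The three steps listed above can be sketched with a deliberately small example. For simplicity, the sketch uses a squared-error loss on a one-dimensional regression problem and single-split trees (stumps) as weak learners; the study itself addresses classification, for which a log-loss would typically take the place of the squared error. All names, the learning rate, and the number of rounds are assumptions.

```python
import numpy as np


def fit_stump(x, residual):
    """Weak learner: the single split on 1-D x that minimizes squared error."""
    best = None
    for t in np.unique(x)[:-1]:  # excluding the maximum keeps the right side non-empty
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # (threshold, left_value, right_value)


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Additive model: start from the mean, then add stumps fitted to residuals."""
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        # Step 1: loss L = 0.5 * (y - pred)^2, whose negative gradient is the residual.
        residual = y - pred
        # Step 2: fit a weak learner to the negative gradient.
        t, left_value, right_value = fit_stump(x, residual)
        # Step 3: add the weak learner to the model, scaled by the learning rate.
        pred += learning_rate * np.where(x <= t, left_value, right_value)
        stumps.append((t, left_value, right_value))
    return f0, stumps, pred


x = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x)
f0, stumps, pred = gradient_boost(x, y)

mse_start = np.mean((y - f0) ** 2)
mse_end = np.mean((y - pred) ** 2)
print(mse_start, mse_end)  # training error before and after boosting
```

Each round fits only what the current model still gets wrong, which is why the trees must be trained sequentially rather than in parallel as in a random forest.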
XGB was created as a highly efficient and flexible implementation of GB. Tianqi Chen (Carnegie Mellon University) devised the XGB software (in Python Scikit-learn 1.0 version) to be compatible across various platforms (C++, Python, R; http://datascience.la/xgboost-workshop-and-meetup-talk-with-tianqi-chen/ (accessed on 8 March 2022)). All codes are provided on GitHub (https://github.com/szilard/benchm-ml (accessed on 12 April 2023)). XGB is fast compared to other implementations of GB [68]. The details of the parameters that were selected for the GridSearchCV are shown in Table A3.

Accuracy Assessment
A confusion matrix is widely used in classification assessment to evaluate an algorithm's performance. The confusion matrix reports the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) (Table A4).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (5)$$
Here, TP is the number of test pixels for which the ground-truth data and the machine-learning algorithm both indicate the positive label. FP is the number of items for which the true label was negative but the algorithm incorrectly predicted a positive label. FN is the number of items for which the true label was positive but the algorithm incorrectly predicted a negative label. Finally, TN is the number of items for which the true label and the predicted label were both negative. Based on these definitions, three criteria were selected to assess each class (Equations (2)-(4)). Moreover, one criterion was chosen to assess each method's overall accuracy (Equation (5)), which is computed by dividing the number of correctly classified samples by the total number of samples [70].
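The per-class criteria and the overall accuracy can be computed directly from a multi-class confusion matrix, as sketched below. The sketch assumes that the three per-class criteria are precision, recall, and F1-score in their standard forms: precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 = 2·precision·recall/(precision + recall). The confusion matrix is a made-up three-class example, not data from the study.

```python
import numpy as np


def per_class_metrics(cm):
    """Compute precision, recall, F1 per class and overall accuracy.

    cm[i, j] is the number of samples with true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but truly another class
    fn = cm.sum(axis=1) - tp  # truly the class, but predicted as another class
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2.0 * precision * recall / (precision + recall)
    overall_accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, overall_accuracy


# Hypothetical confusion matrix for three classes (rows: true, columns: predicted).
cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
precision, recall, f1, overall_accuracy = per_class_metrics(cm)
print(np.round(precision, 3), np.round(recall, 3), np.round(f1, 3), overall_accuracy)
```

A class that is never predicted has TP + FP = 0, so its precision (and F1-score) is undefined and appears as NaN in this sketch.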

Extraction of Features from ASTER
It is necessary to evaluate the predictive capability of the lithological features in order to obtain more accurate mapping models, because some features may have a negative effect on the ML models that are generated. Moreover, a strong correlation between features will also decrease the models' performance. Figure 4A,B show the correlations between the selected features. As can be seen in Figure 4A, the intra-group correlation between VNIR bands is high (0.92-0.97). The intra-group correlations of the SWIR and TIR bands are also high (SWIR: 0.84-0.98; TIR: 0.98-1.00). The intra-group correlations for the VNIR, SWIR, and TIR bands appear respectively as 3 × 3, 6 × 6, and 5 × 5 blocks along the diagonal of the correlation matrix (Figure 4A); the inter-group correlations are the large, off-diagonal blocks. The inter-group correlations between the TIR bands (Bands 10-14) and the VNIR and SWIR bands, in particular, were mostly weak and negative (−0.38 to −0.52).
Since the number of features was comparatively large and the resulting visualization of the correlations between features was not clear, we rearranged the variables to emphasize the strongest groupings, as shown in Figure 4B. Because the correlations within the VNIR and SWIR bands were especially high, we selected representative bands from each group of original bands, namely VNIR (Band 1), SWIR (Bands 5 and 6), and TIR (Bands 10 and 11), to appear among the indices in Figure 4B.
Figure 5 shows the FI (feature importance) values ranked from least (<0.02: chlorite, ASTER Band 3) to most (close to 0.06: mafic, quartz) important. ASTER Band 10 (TIR) is the third most important feature. Many of the remaining bands cluster at the center of this ranking (FI = 0.03), namely Bands 4, 7, 8, and 9 in the SWIR and Bands 13 and 14 in the TIR. The two most important features of the random forest (Figure 5) are depicted in Figure 6A,B. The Sar-Cheshmeh porphyry and quartz eye classes have low values of the mafic index, while granodiorite and andesite have relatively high values. Granodiorite and andesite also have comparatively high values of the quartz index. Two typical image-transform algorithms, principal component analysis (PCA) [71] and independent component analysis (ICA) [72,73], were applied to the data and compared with the two most important features (mafic and quartz). In addition to enhancing images, these techniques reduce the spectral redundancy of the image [74]. Whether these transformations are suitable for lithological mapping can be further explored. Using PCA, the raw, frequently intercorrelated variables in the high-dimensional multivariate dataset were reduced to a smaller number of more easily interpretable orthogonal (uncorrelated) composite variables, or components [75].
In ICA, at most one of the sub-components that make up the mixed signal is assumed to be Gaussian, and higher-order statistics are used to separate the signals and extract features. By default, ICA yields a number of components that can (and should) equal the number of source variables. The relative importance of these components is difficult to determine, given that they do not have a ranking [75]. Yet, when compared to PCA, ICA can provide more spectral information, which could enhance lithological differentiation in the geological context of remote sensing [74]. The RGB image compositions of PCA and ICA are shown in Figure 7A,B, respectively. In Figure 7A, red, green, and blue designate PC1, PC2, and PC6, respectively. According to the analysis, PC1, PC6, and PC2 have the highest RF FI, in that order. In Figure 7B, IC1 is R, IC2 is G, and IC4 is B. While PCA does not clearly delineate the lithological contacts, ICA performs well. The results presented in Figure 7 suggest that biotite dike and andesite can be better contrasted using the band combinations that were obtained from both the ICA and PCA transformations.
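The PCA step, which turns correlated bands into uncorrelated components ordered by variance, can be sketched with an eigendecomposition of the band covariance matrix. The data below are synthetic (three strongly correlated "bands"), and the function name is hypothetical; this is a generic PCA, not the specific software routine used in the study.

```python
import numpy as np


def pca_transform(pixels):
    """Project pixels of shape (n_pixels, n_bands) onto principal components.

    Returns the component scores and the explained-variance ratio,
    both ordered from the largest to the smallest variance.
    """
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs
    return scores, eigvals / eigvals.sum()


# Three synthetic bands that share one underlying signal plus small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 1))
pixels = np.hstack([base + 0.05 * rng.normal(size=(1000, 1)) for _ in range(3)])

scores, ratio = pca_transform(pixels)
print(np.round(ratio, 4))  # PC1 carries almost all of the variance
```

For an image cube of shape (bands, rows, cols), the pixels array would be obtained by reshaping to (rows × cols, bands), and any three score columns can be reshaped back into an RGB composite.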

Training Sample Selection
The 7338 samples, representing nine lithology types, were designated as testing areas. The testing area covered about 35% of all samples. The validity and accuracy of the training samples were verified by field observation and by the analysis of microscopic thin sections. The number of test samples for each class is given in Table 2. Note that andesite and feldspar dike have the highest and lowest number of test samples, respectively. The training area was introduced to the RF, SVM, GB, XGB, and deep-learning ANN classifiers. The scenarios that are referred to in the table are the bands alone (Scenario 1 or S1) and the bands plus other features (Scenario 2 or S2). The difference between the two input datasets is less than 2% for all algorithms. Overall, RF and SVM had the highest precision, recall, and F1-score in nearly all classes under either scenario. Among the nine classes, the prediction of alluvium using the RF model with RF's FI as the input exhibits the greatest precision (Scenario 2: 0.98), despite a sample size of only 131. Furthermore, the RF model combined with RF's FI predicts alluvium better than the other models and their respective scenarios (Table 2). The worst precision is exhibited by the biotite dike (Scenario 1: 0.20) for predictions made with the ANN (Table 2, n = 87). These two classes have small sample sizes, which may account for their respective performances, but their precision, recall, and F1-scores could at least be estimated. Feldspar dike exhibited the absolute worst performance, in that these metrics could not be computed given its sample size (n = 13). Thus, variation in sample size may be a crucial determinant.
Setting aside sample size (n), and despite high average correlations among the metrics, individual measurements of precision and recall may be more informative than accuracy in determining the performance of the models and their associated scenarios. Indeed, the mapping of alluvium, granodiorite, and andesite exhibited the best precision according to their rankings (precision > 0.9 for almost all algorithms; see Table 2). Granodiorite and andesite likewise exhibited the greatest recall (Friedman test: p < 0.0001), while granodiorite had the highest overall mean rank for F1-scores (Friedman test: p < 0.0001). Feldspar dike consistently exhibited the poorest performance across model scenarios for all three metrics.
In terms of the consistency of the precision estimates within mineral classes (omitting the feldspar dike), the ranking among the model-scenarios (10 categories) was significant and moderately to strongly concordant (Kendall's W = 0.736, p < 0.0001). Indeed, model precision (mean ranks) could be ordered.
Figure 8 summarizes the results of the ML algorithms without considering RF FI. This map showed that SVM was better trained than the other ML algorithms. Figure 9 shows the evaluation criteria for the testing samples when SVM is applied with all bands and features as the input.
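Kendall's W, the concordance coefficient reported for the precision rankings, can be computed from a table of ranks as in the following minimal sketch. It assumes no tied ranks, and the rank table is hypothetical; in the setting described here, each row would correspond to a mineral class and each column to a model-scenario.

```python
def kendalls_w(rank_table):
    """Kendall's coefficient of concordance W for untied ranks.

    rank_table[i][j] is the rank (1..n) that rater i assigns to item j.
    W = 12 * S / (m^2 * (n^3 - n)), where m is the number of raters, n the
    number of items, and S the sum of squared deviations of the per-item
    rank totals from their mean.
    """
    m = len(rank_table)
    n = len(rank_table[0])
    totals = [sum(row[j] for row in rank_table) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ranks: 3 classes (rows) ranking 4 model-scenarios (columns).
ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
w = kendalls_w(ranks)

# The Friedman chi-square statistic follows from W as m * (n - 1) * W.
friedman_chi2 = len(ranks) * (len(ranks[0]) - 1) * w
```

W ranges from 0 (no agreement among the rows) to 1 (identical rankings in every row).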

Assessment of Specific Spectral Regions as an Input to ML Algorithms
The results of utilizing specific spectral regions as input data to RF and SVM can be seen in Tables 3 and 4, respectively. With RF, the ASTER VNIR bands showed a low capability for mapping lithological units, whereas the ASTER SWIR and TIR bands showed greater potential. The overall accuracy of the ASTER TIR bands is higher than that of the ASTER SWIR bands using RF (Table 3). The individual performance metrics for the RF model displayed increasing concordance across the bands based on their respective rankings in each mineral class, equal to 0.799 for precision, 0.827 for recall, and 0.901 for F1-scores. Consistent with expectations, the performance among the mineral classes also differed within each band. Precision was highest for granodiorite and lowest for hornblende dike, although this metric exhibited the lowest degree of concordance of the three. Recall was highest for andesite and lowest for biotite dike, very consistently across bands. A similar degree of concordance across bands was exhibited by F1-scores.
SVM results showed a very similar accuracy for the three ASTER spectral regions, although TIR was about 5% lower than the VNIR and SWIR estimates (Table 4). Despite having the same accuracy (0.61), VNIR was judged to be slightly better than SWIR, and both were better than TIR (0.59). For a more objective determination of accuracy, each of the three performance metrics was ranked across the three bands and over the nine separate mineral classes. The mean ranks of precision, recall, and F1-scores could not be distinguished among the VNIR, SWIR, and TIR responses (p ≥ 0.823). Because andesite and the biotite and feldspar dike classes had zero performance metrics under SVM, we tested whether significant differences among bands had remained undetected by removing these classes. The reanalysis of the remaining ranked mean metrics did not reveal any underlying differences among the bands.
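Restricting the classifier input to one spectral region amounts to selecting a subset of band columns. The sketch below assumes that each pixel is stored as a row of 14 values ordered from band 1 to band 14 and that only the nadir band 3 is used; the storage layout and the names `ASTER_REGIONS` and `select_region` are illustrative assumptions, not the authors' implementation.

```python
# ASTER band numbers per spectral region (band 3 taken as the nadir band 3N).
ASTER_REGIONS = {
    "VNIR": [1, 2, 3],
    "SWIR": [4, 5, 6, 7, 8, 9],
    "TIR": [10, 11, 12, 13, 14],
}


def select_region(pixels, region):
    """Keep only the columns of one spectral region.

    pixels: list of rows, each holding 14 band values ordered band 1..14
    (an assumed layout).
    """
    columns = [band - 1 for band in ASTER_REGIONS[region]]
    return [[row[c] for c in columns] for row in pixels]


# A dummy pixel whose value in each column equals its band number.
dummy_pixel = list(range(1, 15))
tir_only = select_region([dummy_pixel], "TIR")
vnir_only = select_region([dummy_pixel], "VNIR")
```

Each subset can then be passed to the same classifier so that the accuracies of the VNIR, SWIR, and TIR inputs are directly comparable.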

Effect of the Number of Training Samples in ML Algorithms on Overall Accuracy
The number of training samples is important in determining classification accuracy [76]. It is crucial to obtain optimum classification results using an appropriate number of training samples [77]. In most cases, ML algorithms require a suitable number of training samples, so their performance when the training dataset is reduced is worth considering. This analysis investigated the effects of the testing sample size (and hence the split between training and testing datasets) on the precision, recall, F1-score, and overall accuracy when testing includes 15%, 20%, 25%, 30%, 35%, and 40% of the dataset, as an example of ML algorithm performance (Table 5).
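The experiment described above can be sketched as a loop over testing fractions. The sketch assumes a stratified random split, so that every class keeps the same testing share; the study states only that samples were chosen randomly, so the stratification, the function name, and the class counts are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), drawing test_fraction of each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical class counts (NOT the counts of the study).
labels = ["andesite"] * 200 + ["alluvium"] * 100 + ["feldspar dike"] * 20

split_sizes = {}
for fraction in (0.15, 0.20, 0.25, 0.30, 0.35, 0.40):
    train_idx, test_idx = stratified_split(labels, fraction)
    # A classifier would be fitted on train_idx and scored on both index
    # sets here; only the resulting split sizes are recorded in this sketch.
    split_sizes[fraction] = (len(train_idx), len(test_idx))
```

With a small class, increasing the testing fraction quickly leaves very few samples for training, which is one reason why the smallest classes are the most sensitive to the split.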
The overall accuracy of RF and SVM for different testing sample sizes was calculated for both testing and training datasets (Figure 10A,B). Generally, there was a slight overall reduction in the testing accuracy, of roughly 1.2 percentage points, as the testing sample size increased (from 84.9% to 83.7%). The overall accuracy for all RF training models shows a maximum value of 1 (Figure 10A), meaning that all model predictions and original lithology classes are the same. Thus, the model has achieved a reasonably good (although not necessarily the best possible) understanding of the training dataset.
It is nevertheless essential to remember that a very high training accuracy does not necessarily mean that a model is good [78]. The test accuracy of all models differs from the training accuracy by roughly 15%, which can be a sign of overfitting in this model. Furthermore, the effects of testing sample size on the overall accuracy of both training and testing using SVM can be seen in Figure 10B, which shows that the overall accuracy of training increased as the testing sample size increased. However, the testing accuracy tends to be lower with a larger testing sample size.
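The gap between training and testing accuracy that signals overfitting can be made explicit with a short calculation. The sketch uses only figures quoted in the text (an RF training accuracy of 1 and testing accuracies of 84.9% and 83.7%); pairing the training accuracy of 1 with the 84.9% testing accuracy is an assumption made for illustration.

```python
def accuracy_gap(train_accuracy, test_accuracy):
    """Absolute difference between training and testing overall accuracy."""
    return train_accuracy - test_accuracy


def relative_change(before, after):
    """Change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Training accuracy of 1 against a testing accuracy of 84.9% (assumed pairing).
rf_gap = accuracy_gap(1.0, 0.849)

# Drop in testing accuracy from 84.9% to 83.7%.
drop_in_points = (0.849 - 0.837) * 100                     # percentage points
drop_relative = -relative_change(0.849, 0.837) * 100       # percent of start
```

A gap of about 0.15 between a perfect training score and the testing score is the kind of difference that is commonly read as a symptom of overfitting, whereas the change in testing accuracy across split sizes is an order of magnitude smaller.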

DEM Assessment as an Additional Feature to Input Data
This section assesses the effects of adding a digital elevation model (DEM) as an input to the ML algorithms. Generally, adding DEM slightly improved the overall accuracy (Figure 11). The most notable improvement was for RF, whose overall accuracy increased by about 1.6%, more than for the other algorithms.

Discussion
One factor that must be considered in lithological mapping is the specific spectral reflectance that is associated with each mineral. In other words, the spectrum of a lithological unit is a mixture of the spectral reflectances of the minerals that make up its composition [79,80]. Pixel size is another factor that must be considered when classifying lithological units. Different datasets have different pixel sizes, and even within a single sensor, different spectral regions have different resolutions. In this study, ASTER bands were resampled to the resolution of the VNIR bands (i.e., 15 m). According to the results of FI, the TIR bands and the indices that were associated with these bands have higher importance. However, the TIR resolution is coarser than that of the other spectral regions (e.g., VNIR and SWIR). Including thermal data at a lower resolution may lead to more accurate lithological mapping. Yet, it should be noted that some classes were narrow and elongated in the study area. Classifying these classes may therefore be difficult because, at low spatial resolution, two or more lithological classes mix in one pixel. Higher spatial resolution can equally pose a problem because of the large number of detailed objects [81]. Another parameter that can affect the overall accuracy of ML algorithms is how training and testing samples are selected. There is some uncertainty in selecting training samples based on the visual interpretation of geological maps [82]. In this study, however, the training and testing samples were chosen randomly. In addition, model training was repeated with different selections of training samples, and the accuracy was averaged across these runs to ensure that this study's overall accuracy is stable.
This study investigated the accuracy of five ML algorithms (RF, SVM, GB, XGB, and a deep-learning ANN) in mapping lithological units in the Sar-Cheshmeh copper mining region and compared their overall accuracies with one another. A comparison with other studies of ML algorithms in lithological mapping is made here. Shebl et al. [83] utilized Sentinel-2 multispectral data and radiometric data to assess the potential of SVM to classify 13 lithological classes in Egypt, including igneous, metamorphic, and sedimentary rocks. Their dataset contained between 955 and 3397 observations per class for training the model. They reported an overall accuracy of between 0.756 and 0.857. Nugroho et al. [84] investigated the potential of several forms of remote-sensing imagery, including Sentinel-2, ALOS PALSAR, and DEM, together with geophysical data (magnetic and electromagnetic), to map lithology in Indonesia using an RF algorithm. Their number of training samples per class was between 14 and 337. They reported an accuracy of 0.73 to 0.81 for lithology classes. Bachri et al. [82] utilized several forms of remote-sensing data, including Landsat 8 OLI, DEM, and ALOS PALSAR, to assess lithological mapping in Morocco. They reported an overall accuracy of 0.85 using the SVM ML algorithm. Manap and San [81] reported that adding SAR and DEM data improved the model's overall accuracy by roughly 10%. They also reported that SVM and ANN were more accurate than RF. In the current study, SVM outperformed the other ML algorithms with respect to overall accuracy in the Sar-Cheshmeh copper mining region. Adding DEM did not significantly improve the overall accuracy of the ML algorithms (<2%).
According to Table 2, the numbers of training samples for minor and major classes differ considerably in this study, so the imbalance ratio is quite high. The number of features also reached 33 (both bands and indices). This made the classification problem complicated, as seen from the accuracy reported for the feldspar dike class by almost all models, given that it had the lowest number of samples. Considering the above conditions, the results of all ML algorithms provided a relatively high accuracy. One way to improve the accuracy of the ML algorithms in this study is simply to add more data to train the model, although adding data is not always the best remedy in ML algorithms. Adding other kinds of data, such as geophysical measurements, may improve the accuracy, as stated by Nugroho et al. [84]. One limitation of such ML algorithms is the presence of vegetation in the area, which affects the spectral information reaching the sensor and, therefore, the classification accuracy [85]. However, the case study region in this analysis was arid and lacked extensive vegetation. Where vegetation is present in a study area, it can be addressed by utilizing synthetic aperture radar (SAR) coverage (depending on the type and height of vegetation).
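The class imbalance mentioned above can be quantified as the ratio of the largest to the smallest class size. The sketch below uses the three testing sample sizes that are quoted in the text (alluvium 131, biotite dike 87, feldspar dike 13); the remaining six classes are omitted because their counts are not given here, so the resulting ratio is illustrative only. The inverse-frequency weights follow the widely used "balanced" heuristic, total / (number of classes × class count); whether any weighting was applied in the study is not stated.

```python
def imbalance_ratio(class_counts):
    """Largest class size divided by smallest class size."""
    counts = list(class_counts.values())
    return max(counts) / min(counts)


def balanced_class_weights(class_counts):
    """Inverse-frequency weights: total / (n_classes * count) per class."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: total / (n_classes * count)
            for name, count in class_counts.items()}


# Testing sample sizes quoted in the text for three of the nine classes.
counts = {"alluvium": 131, "biotite dike": 87, "feldspar dike": 13}

ratio = imbalance_ratio(counts)
weights = balanced_class_weights(counts)
```

Under this heuristic the smallest class receives the largest weight, which is one possible way of compensating for imbalance in addition to collecting more samples.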

Conclusions
This study used five ML algorithms, namely RF, SVM, GB, XGB, and a deep-learning ANN, together with ASTER multispectral datasets to evaluate the accuracy of lithological mapping over the Sar-Cheshmeh copper mining region in southeast Iran. Two scenarios were considered. First, ASTER spectral bands and several features were provided as the input data to the models. We then applied the RF's FI to all features. Among the features with an absolute correlation greater than 0.90, those with less importance were removed, and the remaining features were provided as the models' input data. Among the selected ML algorithms, the SVM model had a higher accuracy in lithological mapping than all other classification models, with an overall accuracy of 85%. The results of RF FI showed that ASTER TIR data have greater importance than other ASTER bands. The results also showed that combining all features without considering RF's FI offered slightly better classification accuracy. Using all bands and features, SVM had the highest overall accuracy (0.85) of the ML algorithms. The results further showed that adding additional information, such as a DEM (digital elevation model), can slightly improve the overall accuracy. Increasing the testing size can also lead to a decrease in the overall accuracy on the test data. Among all classes, alluvium was detected well, while feldspar dike exhibited a lower accuracy. The results showed that ML algorithms can map lithology by utilizing ASTER data, which are significantly cost-effective, saving both time and resources in fieldwork. Nevertheless, a definite statement might require some ground observations.
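The feature-selection step of the second scenario can be sketched as follows. The sketch assumes a greedy procedure: features are visited in order of decreasing importance, and a feature is dropped when its absolute Pearson correlation with an already retained feature exceeds 0.90. The exact procedure used in the study may differ, and the feature names, values, and importances below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def prune_correlated(features, importance, threshold=0.90):
    """Greedy pruning: keep the more important feature of each correlated pair.

    features: {name: list of values}; importance: {name: float}.
    """
    kept = []
    for name in sorted(features, key=importance.get, reverse=True):
        if all(abs(pearson(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature values and importances (for illustration only).
features = {
    "band_13": [1, 2, 3, 4, 5],
    "quartz_index": [2, 4, 6, 8, 11],  # nearly collinear with band_13
    "band_3": [5, 1, 4, 2, 3],
}
importance = {"band_13": 0.30, "quartz_index": 0.10, "band_3": 0.20}

selected = prune_correlated(features, importance)
```

Here the less important member of the highly correlated pair is removed, while the weakly correlated feature is retained regardless of its rank.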

Figure 1. (A) The geographical location of the Sar-Cheshmeh area in the Urumia-Dokhtar magmatic belt of southern Iran, and (B) lithological map of the Sar-Cheshmeh copper deposit [41].

Figure 3. Flowchart of the methodology that was applied in this study.

Figure 4. (A) Pearson's correlation matrix between ASTER VNIR, SWIR, and TIR bands; (B) Pearson's correlations between features and bands (some bands were rearranged to better emphasize the correlations).

Figure 5. RF feature values and their rankings.

Figure 6. The mafic index (A) and quartz index (B) were extracted from ASTER bands; (C) Lithological map of the Sar-Cheshmeh copper deposit.

Figure 8. The geological maps of the Sar-Cheshmeh copper mining region (A), including results of lithological classification that were derived from (B) RF, (C) SVM, (D) GB, (E) XGB, and (F) deep-learning ANN.

Figure 9. The results of classification by utilizing SVM for all bands and features. Note that the acronym P in this figure refers to porphyry.

Figure 10. Effects of testing sample size on the training and testing data accuracy in (A) RF and (B) SVM.

Figure 11. Overall accuracy of various ML algorithms with and without the inclusion of DEM (digital elevation model).

Table 1. List of the spectral features that were used in this study.

Table 2. The results of accuracy assessments for the testing area. 1st scenario: using all bands and features; 2nd scenario: applying feature selection.
Classification accuracy using RF FI is slightly lower than that using all features.

Table 3. Performance of VNIR, SWIR, and TIR bands in identifying nine mineral classes using RF. The values in boldface refer to the highest value of precision, recall, and F1-score for each class.

Table 4. Performance of VNIR, SWIR, and TIR bands in identifying nine mineral classes using SVM. The values in boldface refer to the highest value of precision, recall, and F1-score for each class.

Table A4. A confusion matrix example.