A Comparative Study of Convolutional Neural Networks and Conventional Machine Learning Models for Lithological Mapping Using Remote Sensing Data

Lithological mapping is a critical aspect of geological mapping that can be useful in studying the mineralization potential of a region and has implications for mineral prospectivity mapping. This is a challenging task if performed manually, particularly in highly remote areas that require a large number of participants and resources. The combination of machine learning (ML) methods and remote sensing data can provide a quick, low-cost, and accurate approach for mapping lithological units. This study used deep learning via convolutional neural networks and conventional ML methods involving support vector machines and multilayer perceptron to map lithological units of a mineral-rich area in the southeast of Iran. Moreover, we used and compared the efficiency of three different types of multispectral remote-sensing data, including Landsat 8 operational land imager (OLI), advanced spaceborne thermal emission and reflection radiometer (ASTER), and Sentinel-2. The results show that CNNs and conventional ML methods effectively use the respective remote-sensing data in generating an accurate lithological map of the study area. However, the combination of CNNs and ASTER data provides the best performance and the highest accuracy and adaptability with field observations and laboratory analysis results so that almost all the test data are predicted correctly. The framework proposed in this study can be helpful for exploration geologists to create accurate lithological maps in other regions by using various remote-sensing data at a low cost.

components as inputs [15]. In another study, the lithological map of Cameroon's center, south, and east regions was updated by using the MLP applied to Landsat images under the ENVI (environment for visualizing images) platform [29].
On the other hand, deep learning, also known as deep neural networks, has caught the interest of Earth science researchers in recent years. Novel deep-learning-based ML methods can address some of the shortcomings of previous geological remote-sensing attempts. Deep learning is one of the fastest-growing trends in big-data analysis [10,32], and a wide variety of deep-learning approaches, such as deep belief networks [33], autoencoders [34], and convolutional neural networks (CNNs) [35], have been developed in the past decade, particularly for processing multimedia and high-dimensional datasets [36]. CNNs have a wide range of applications, such as image classification, voice recognition, traffic-sign recognition, and medical-image analysis [34,37]. CNNs have gained more attention and been used in image-processing tasks and demonstrated their superiority over other methods in a number of applications [32,38,39]. CNNs are so powerful and useful because they can generate excellent predictions with minimal image preprocessing, since neural networks do most of the heavy lifting in processing an image and extracting features. Moreover, the CNNs are immune to spatial variance and, hence, are able to detect features anywhere in input images. Recently, there has been an increase in the application of CNNs for processing remote-sensing data [40], although they have been less considered for mapping potential mineralization zones [41]. CNNs have been used to classify multi-and hyperspectral remote-sensing data at high and medium spatial resolutions [42,43]. In a study [10], CNNs were used to assist in mapping geological features by offering an objective initial layer of surface materials that experts can change to accelerate the production of maps and increase the accuracy of mapped areas. Moreover, researchers [34] introduced a new method for high-resolution geological mapping that uses unmanned aerial vehicles (UAVs) and CNN algorithms.
In this study, we aimed to provide the lithological map of a region in the southeast of Iran, which is a potential region of different metallic mineralizations, using various combinations of remote-sensing data and ML methods. We used three different types of multispectral remote-sensing data: Landsat 8 operational land imager (OLI), advanced spaceborne thermal emission and reflection radiometer (ASTER), and Sentinel-2. We applied conventional ML methods, namely SVM and MLP, and a deep learning method, i.e., CNN, to the given satellite data, a novel method in spatial data analysis. Finally, we compared the efficiency of the respective data types and methods for discriminating between lithological units and generating a reliable lithological map. In addition to proposing a framework, we provided an open-source Jupyter notebook based on the Python programming language and packages to make exploration geologists able to reproduce the results and evaluate the efficiency of the framework in other regions.

Geological Setting
The study area is located in the southeast of Iran and close to Mirjaveh, Sistan, and Baluchestan province, covering an area of about 66 square kilometers ( Figure 1). Based on field observations and the study of thin sections, rock units mainly involve Quaternary and Oligocene igneous rocks and Eocene sedimentary rocks. Igneous rocks include dacite, andesite, and quartz monzonite, and Eocene sedimentary units include shale and sandstone [44]. The flysch units that date back to the Eocene cover most of the study area and mainly consist of shale and sandstone, and, in some areas, can be seen in a metamorphic form, such as slate and phyllite [44]. The sandstones are feldspar type and show a clastic (granular) texture. They are mostly affected by a poor degree of metamorphism, and slate somewhat expands with the growth of sericite crystals in relatively parallel directions. The major components of the sandstones are quartz and feldspar grains and a matrix consisting of clay minerals (kaolin) and calcite cement. Sericite minerals have also been added to this complex during the transformation. Dacite units in the study area are thick and consist of plagioclase, amphibole, biotite, and quartz, showing a porphyry texture. These units are coarse-grained and light in color. The dominant types of alteration in these rocks are ganese-impregnated siliceous veins are observed in the margins of igneous units, and geochemical analyses confirm the gold anomaly. This type of mineralization corresponds to epithermal deposits based on field observations and sample analyses. In epithermal systems, metal precipitation from hot aqueous hydrothermal fluids can occur along lithological contacts. Planar discontinuities, such as sedimentary bedding, metamorphic foliations, planar igneous bodies, and unconformity surfaces, provide conduits for fluid flow and gold deposition [45]. In the study area, gold and other metals, such as manganese, lead, and zinc mineralizations, are found along the contacts of quartz monzonite and sedimentary units. It is noteworthy that younger alluvial units have covered quartz monzonite units in parts of the study area, and undercover exploration is required to identify potential mineralization.

Figure 1.
Simplified tectonic map of Iran on the left and a true-color image of the study area obtained by Sentinel-2 data on the right. The red points shown on the satellite image refer to the samples collected from the study area. Thin section studies were carried out on samples A-C, and samples D and E were geochemically analyzed. Simplified tectonic map of Iran on the left and a true-color image of the study area obtained by Sentinel-2 data on the right. The red points shown on the satellite image refer to the samples collected from the study area. Thin section studies were carried out on samples A-C, and samples D and E were geochemically analyzed.
Major alteration types in the study area include argillic, sericitization, chloritization, silicic, propylitic, and iron oxide. The alteration zones are scattered throughout the study area and associated with igneous rocks. The predominant trends of faults are northwest-southeast (NW-SE) and northeast-southwest (NE-SW). The alteration zones mostly emerged due to the placement and formation of igneous units within the Eocene sedimentary units. Lead, zinc, manganese mineralization, and the association of pyrite with manganese-impregnated siliceous veins are observed in the margins of igneous units, and geochemical analyses confirm the gold anomaly. This type of mineralization corresponds to epithermal deposits based on field observations and sample analyses. In epithermal systems, metal precipitation from hot aqueous hydrothermal fluids can occur along lithological contacts. Planar discontinuities, such as sedimentary bedding, metamorphic foliations, planar igneous bodies, and unconformity surfaces, provide conduits for fluid flow and gold deposition [45]. In the study area, gold and other metals, such as manganese, lead, and zinc mineralizations, are found along the contacts of quartz monzonite and sedimentary units. It is noteworthy that younger alluvial units have covered quartz monzonite units in parts of the study area, and undercover exploration is required to identify potential mineralization.

Remote-Sensing Data and Ground Truth
This study uses three different types of multispectral satellite images, including Landsat 8 OLI, ASTER, and Sentinel-2, for mapping lithological units. The technical performance and attributes of these data types are summarized in Table A1. Landsat 8 satellite was launched in 2013, carrying two sensors of OLI and thermal infrared sensor (TIRS) [46]. It provides images in 11 spectral bands with a spatial resolution ranging from 15 m (m) for a panchromatic band to 30 m in the visible and near-infrared (VNIR) and short-wave infrared (SWIR) ranges. The last two thermal bands, i.e., bands 10 and 11, have a resolution of 100 m [46]. ASTER sensor was launched on the Terra platform in 1999 [47], which significantly improved the capability of geological remote sensing for mapping purposes. ASTER has three VNIR bands with a spatial resolution of 15 m, six SWIR bands with a 30 m resolution, and five thermal infrared bands with a 90 m resolution [47]. The Sentinel-2 is a group of twin satellites in the same sun-synchronous orbit, phased at 180 degrees apart: Sentinel-2A and Sentinel-2B. The multispectral instrument onboard collects data in 13 spectral bands, ranging from VNIR to SWIR. The spatial resolution of this satellite is varied from 10 to 60 m [48]. In this study, we used those spectral bands that are of interest in geological remote sensing because they show characteristic behaviors, such as high absorption or reflectance in target lithological units. Accordingly, six bands of OLI (2, 3, 4, 5, 6, and 7), nine bands of ASTER (1, 2, 3, 4, 5, 6, 7, 8, and 9), and ten bands of Sentinel-2 (2, 3, 4, 5, 6, 7, 8, 8a, 11, and 12) are selected as the input data for mapping lithological units [7].
A cloud-free Landsat 8 scene covering the study area was acquired from the US geological survey Earth resources observation and science (EROS) center (earthexplorer. usgs.gov accessed on 20 November 2021). This level-1T (terrain corrected) image was captured on 4 June 2017. The ASTER scene used in this study was captured on 12 September 2003. It is a cloud-free level-1-precision terrain-corrected registered at-sensor radiance product (ASTER_L1T) obtained from the USGS EROS center. A cloud-free Sentinel-2A scene covering the study area was also acquired from the European space agency via the Copernicus open-access hub (scihub.copernicus.eu accessed on 20 November 2021) captured on 6 July 2017. The Sentinel-2 image used in this study is a level-1C top-ofatmosphere reflectance product, including radiometric and geometric corrections and orthorectification [48].
Ground truth refers to the information gathered on-site and represents mapped features and materials on the ground [49]. The ground-truth dataset consists of a set of images and labels, as well as an object-recognition model that involves the count, location, and relationships of key features. Depending on the location of the study area, labels are either specified by hand or automatically by image processing. Those features detected as ground-truth data are fed into a classifier at run-time to calculate the correspondence between identified and modeled features [50]. We used ground-truth data to train and test our models and create classified maps. Figure 2 represents a map showing the ground-truth dataset used in this study and consisting of nine rock types obtained by fieldwork.

Preprocessing
The satellite data used in this study were pre-georeferenced to the universal transverse Mercator (UTM) zone 40 North; hence, no geometric correction is needed. The Landsat 8 OLI and ASTER data are radiometrically corrected by the log-residual algorithm, available within the ENVI software package (l3harrisgeospatial.com/Software-Technology/ENVI, accessed on 20 November 2021), which reduces the noise from topography, instruments, and sun illumination. This method works with the visible and near-infrared to short-wave infrared (VNIR-SWIR) wavelength range and provides an atmospheric-corrected surface reflectance image of the study area. It is a quick solution for converting radiance-calibrated data to apparent reflectance. The SWIR bands of the ASTER data are resampled to the spatial resolution of the visible and near-infrared (VNIR) bands, i.e., 15 m, using the nearest neighbor technique, and a stacked data layer, using the VNIR plus the short-wave infrared Remote Sens. 2022, 14, 819 6 of 20 (SWIR) bands, is created for further processing. The atmospheric correction is included by the Sentinel-2 data type used in this study. The VNIR + SWIR bands of the Sentinel-2 data are stacked by using the nearest neighbor technique, and a ten-band dataset with a spatial resolution of 10 m is generated. Prior to processing the remote-sensing data, they are all resized to a specific frame covering the target area.
Remote Sens. 2021, 13, x FOR PEER REVIEW 6 of 20 Figure 2. Spatial distribution of nine rock types obtained by fieldwork and considered ground-truth data for training and testing the models.

Preprocessing
The satellite data used in this study were pre-georeferenced to the universal transverse Mercator (UTM) zone 40 North; hence, no geometric correction is needed. The Landsat 8 OLI and ASTER data are radiometrically corrected by the log-residual algorithm, available within the ENVI software package (l3harrisgeospatial.com/Software-Technology/ENVI, accessed on 20 November 2021), which reduces the noise from topography, instruments, and sun illumination. This method works with the visible and near-infrared to short-wave infrared (VNIR-SWIR) wavelength range and provides an atmospheric-

Machine Learning Methods
ML methods are able to discover complex relationships in high-dimensional data [51]. For instance, these methods can automatically learn the relationship between reflectance spectra and target features in a study area, such as mineral occurrences. Machine-learning Remote Sens. 2022, 14, 819 7 of 20 methods have been demonstrated to be robust against noise and uncertainties in spectral and ground-truth observations [52]. SVMs are supervised ML methods for classification and regression problems [53]. During training, SVM models create a hyperplane with the greatest maximum distances between data points in various groups, i.e., the largest margins [54]. Support vectors are the data points nearest to the hyperplane that specify the hyperplane's orientation. The dimensions of the hyperplane decision boundary are determined by the number of input features that must be classified [55]. The SVM has greatly found its application in remote sensing, particularly in geological remote sensing for classifying geological features and targeting ore deposits [1,[56][57][58]. Remote-sensing data usually involve groups that are not linearly separable, which becomes a challenge for conventional ML methods. In such instances, a nonlinear kernel function is used to transfer the data from the input feature space to a higher-dimensional feature space, spreading the data points so that they can be separated by a linear hyperplane [26]. A polynomial (homogeneous or heterogeneous) function, a Gaussian radial basis function, and a sigmoid (hyperbolic tangent) function, among others, are nonlinear kernels that are widely used in SVMs [26].
Neural networks are motivated by the learning process of biological neural systems and have been widely used to analyze remotely sensed data [7]. Over the past few decades, there has been tremendous progress in the area of neural networks; hence, canonical neural networks are known as simple neural networks, and larger and more complex architectures are known as deep neural networks or deep-learning methods [32,40]. The prominence of neural networks in remote-sensing data analysis is mainly due to their capacity to learn complex patterns while accounting for any complex nonlinear relationship in data [59]. Simple neural networks have demonstrated high efficiency in classifying geological features and discovering mineralization zones [20,54,60]. This study applies simple neural networks, also known as MLP, as a supervised learning algorithm to train our model, considering its capability to learn nonlinear relationships. MLP can learn a nonlinear function approximator for classification or regression, given a collection of features and objectives. It differs from logistic regression models since it can include one or more nonlinear layers, which are known as hidden layers between input and output layers [61].
CNNs are one of the different types of deep neural networks that feature the automatic extraction of features from image-based datasets [62]. CNNs caused advancements in image processing, target identification, and other disciplines [63]. CNNs have been widely utilized in image processing and have matured into a potent and ubiquitous deep learning model [64]. Recently, an improved CNN, with even fewer parameters, was introduced to solve problems such as overfitting and gradient vanishing [65]. Another study introduced a novel reconstruction technique based on CNNs with high speed and performance [66]. In terms of advancement in CNN approach, researchers [67] described the improvements made to the CNN in various aspects, such as layer design, activation function, loss function, regularization, optimization, and fast computation. CNNs use convolutional and pooling layers that enable automated feature extraction to draw out spatial information from image data. A typical CNN model consists of several pairs of convolutional and pooling layers, followed by a simple neural network of fully connected layers. We did not use pooling layers in our model architecture, since they aim to find objects. The convolution layer is made up of several feature maps that are generated by the convolution of the input image with a specific kernel. Each convolution kernel is a weight matrix, forming a twodimensional representation of a single channel [62]. CNNs consider neighboring pixels and rely on the pattern and texture, not just one pixel at a time. In classifying a mid-resolution satellite image similar to those used in this study, the objective is to classify each pixel based on its digital number across different bands. We investigated the role of digital numbers of neighboring pixels in determining the class of each pixel. Each set of neighboring pixels fed into the CNN algorithm is called a chip, and they are created by setting two parameters, namely the size of the moving window and stride, as illustrated in Figure 3. neighboring pixels and rely on the pattern and texture, not just one pixel at a time. In classifying a mid-resolution satellite image similar to those used in this study, the objective is to classify each pixel based on its digital number across different bands. We investigated the role of digital numbers of neighboring pixels in determining the class of each pixel. Each set of neighboring pixels fed into the CNN algorithm is called a chip, and they are created by setting two parameters, namely the size of the moving window and stride, as illustrated in Figure 3.

Framework and Experimental Setup
In Figure 4, we propose a framework in which SVMs, MLP, and CNNs are applied on three different types of multispectral remote-sensing data to investigate their efficiency in discriminating between lithological units. The initial steps involved loading and preprocessing remote-sensing and ground-truth datasets as inputs to our modeling process. We scaled the input datasets in the next step to ensure that all the features were treated equally. For instance, neural networks are sensitive to the input data distribution because they naturally tend to give more importance to features with higher values [68]. The data can either be scaled in the range of zero to one (normalized) or minus one to one (standardized). In this study, we used standardized data to train the SVM and MLP models and normalized data for the CNN model, although this cannot have a tangible effect on the modeling accuracy. We used the statistical properties of each of the input spectral bands for scaling them. The input data were split into two halves, i.e., training and test, to evaluate the performance of the model at a later stage. We define a function for this purpose and the train-test proportion considered is 75-25 percent.

Framework and Experimental Setup
In Figure 4, we propose a framework in which SVMs, MLP, and CNNs are applied on three different types of multispectral remote-sensing data to investigate their efficiency in discriminating between lithological units. The initial steps involved loading and preprocessing remote-sensing and ground-truth datasets as inputs to our modeling process. We scaled the input datasets in the next step to ensure that all the features were treated equally. For instance, neural networks are sensitive to the input data distribution because they naturally tend to give more importance to features with higher values [68]. The data can either be scaled in the range of zero to one (normalized) or minus one to one (standardized). In this study, we used standardized data to train the SVM and MLP models and normalized data for the CNN model, although this cannot have a tangible effect on the modeling accuracy. We used the statistical properties of each of the input spectral bands for scaling them. The input data were split into two halves, i.e., training and test, to evaluate the performance of the model at a later stage. We define a function for this purpose and the train-test proportion considered is 75-25 percent.
The number of pixels along the x-axis and y-axis of the OLI, ASTER, and Sentinel-2 images is 257 × 289, 513 × 577, and 770 × 866, respectively. As mentioned earlier, each pixel in these images represents a square cell covering an area of 900, 225, and 100 square meters on the ground surface. Implementing each ML method and creating models requires setting a number of hyperparameters. This study experimented with different values to create each model and selected those providing the most accurate result. In the case of SVM, we used a radial basis function (RBF) kernel and a tolerance value of 0.001 with no limit for the number of iterations. In MLP, we used 15 hidden neurons and a maximum of 400 iterations (epochs), using Adam as a solver. Moreover, we used the rectified linear unit function (ReLU) as an activation function for the hidden layer and the Softmax activation for the output layer. The number of pixels along the x-axis and y-axis of the OLI, ASTER, and Sentinel-2 images is 257 × 289, 513 × 577, and 770 × 866, respectively. As mentioned earlier, each pixel in these images represents a square cell covering an area of 900, 225, and 100 square meters on the ground surface. Implementing each ML method and creating models requires setting a number of hyperparameters. This study experimented with different values to create each model and selected those providing the most accurate result. In the case of SVM, we used a radial basis function (RBF) kernel and a tolerance value of 0.001 with no limit for the number of iterations. In MLP, we used 15 hidden neurons and a maximum of 400 iterations (epochs), using Adam as a solver. Moreover, we used the rectified linear unit function (ReLU) as an activation function for the hidden layer and the Softmax activation for the output layer.
As shown in Figure 5, our CNN model uses two convolution layers with a kernel size of 7 × 7, stride of 7, and fully linked layers. These layers are a network of serially connected dense layers used for classification. In a fully connected network, every neuron from the first layer is connected to every neuron in the second layer. We examined other kernel sizes and strides, and the best accuracy was achieved by the mentioned values. Model architectures are sensitive, and the same model architecture cannot be expected to provide similar accuracy using different kernel sizes. Therefore, a few minor modifications in the As shown in Figure 5, our CNN model uses two convolution layers with a kernel size of 7 × 7, stride of 7, and fully linked layers. These layers are a network of serially connected dense layers used for classification. In a fully connected network, every neuron from the first layer is connected to every neuron in the second layer. We examined other kernel sizes and strides, and the best accuracy was achieved by the mentioned values. Model architectures are sensitive, and the same model architecture cannot be expected to provide similar accuracy using different kernel sizes. Therefore, a few minor modifications in the architecture are expected. We used the ReLU activation function for the input layers and the Softmax activation function in the final layer for classification in the CNN model architecture. Furthermore, we used a root-mean-squared propagation optimizer with a maximum of 20 training epochs.
Remote Sens. 2021, 13, x FOR PEER REVIEW 10 of 20 architecture are expected. We used the ReLU activation function for the input layers and the Softmax activation function in the final layer for classification in the CNN model architecture. Furthermore, we used a root-mean-squared propagation optimizer with a maximum of 20 training epochs. We used the majority analysis with a three-by-three kernel as a post-classification tool available within the ENVI software package in order to change spurious pixels within a large single class to that class. The center pixel in the kernel was replaced with the class value that a majority of the pixels in the kernel have [69]. The majority analysis required a given center pixel weight to generalize each classified map, and a neighborhood was built around each pixel identified by the user-defined distance. The number of pixels belonging to various groups within this neighborhood are counted, and the center pixel is assigned to the class that corresponds to the rest of the pixels in the neighborhood [69].
Along with this study, a Jupyter notebook based on the Python programming language was released that helps exploration geologists quickly implement the proposed framework. The link to this notebook is available in the supplementary materials section. It enables the user to rapidly map lithological units by using different types of remotesensing data. A variety of packages have been used in this notebook, among which scikitlearn and TensorFlow packages are the most important. The scikit-learn package was used to create conventional models, i.e., SVM and MLP, and the TensorFlow package was applied to implement CNNs.

Receiver Operating Characteristics
The evaluation of model accuracy is a critical stage in every classification. In this study, receiver operating characteristics (ROCs) [70] were used to evaluate the classification accuracy of each output map. The ROC curves have a y-axis representing the truepositive (sensitivity) rate and an x-axis for false-positive rate (specificity), as shown in Figure A1. This means that the ideal point is in the top left corner of the chart, with a falsepositive rate of zero and a true positive rate of one. This is far from true, but it does imply that a greater area under the curve (AUC) [70] is generally preferable. The ROC curve for random chance is the 45° diagonal line linking (0,0) and (1,1). The gold-standard ROC curve is the line joining (0, 0) to (0, 1) and (0, 1) to (1, 1). In general, ROC curves fall between these two extremes. The area under the ROC curve is a summary metric that averages diagnostic precision over a range of test values ( Figure A1) [67].
The steepness of ROC curves is also significant, as it is optimal to maximize truepositive rate while also minimizing false-positive rates. In binary classification, ROC We used the majority analysis with a three-by-three kernel as a post-classification tool available within the ENVI software package in order to change spurious pixels within a large single class to that class. The center pixel in the kernel was replaced with the class value that a majority of the pixels in the kernel have [69]. The majority analysis required a given center pixel weight to generalize each classified map, and a neighborhood was built around each pixel identified by the user-defined distance. The number of pixels belonging to various groups within this neighborhood are counted, and the center pixel is assigned to the class that corresponds to the rest of the pixels in the neighborhood [69].
Along with this study, a Jupyter notebook based on the Python programming language was released that helps exploration geologists quickly implement the proposed framework. The link to this notebook is available in the supplementary materials section. It enables the user to rapidly map lithological units by using different types of remote-sensing data. A variety of packages have been used in this notebook, among which scikit-learn and TensorFlow packages are the most important. The scikit-learn package was used to create conventional models, i.e., SVM and MLP, and the TensorFlow package was applied to implement CNNs.

Receiver Operating Characteristics
The evaluation of model accuracy is a critical stage in every classification. In this study, receiver operating characteristics (ROCs) [70] were used to evaluate the classification accuracy of each output map. The ROC curves have a y-axis representing the true-positive (sensitivity) rate and an x-axis for false-positive rate (specificity), as shown in Figure A1. This means that the ideal point is in the top left corner of the chart, with a false-positive rate of zero and a true positive rate of one. This is far from true, but it does imply that a greater area under the curve (AUC) [70] is generally preferable. The ROC curve for random chance is the 45 • diagonal line linking (0,0) and (1,1). The gold-standard ROC curve is the line joining (0, 0) to (0, 1) and (0, 1) to (1,1). In general, ROC curves fall between these two extremes. The area under the ROC curve is a summary metric that averages diagnostic precision over a range of test values ( Figure A1) [67].
The steepness of ROC curves is also significant, as it is optimal to maximize truepositive rate while also minimizing false-positive rates. In binary classification, ROC curves are commonly used to investigate the performance of a classifier. The performance must be binarized to apply the ROC curve and ROC area to multi-label classification. One ROC curve can be drawn for each class, but a ROC curve can also be drawn by treating each member of the label indicator matrix as a binary predictor (micro-averaging). Another assessment criterion for multi-label classification is macro-averaging, which assigns equal weight to each label's classification [71].

Classified Lithological Maps
The proposed framework was applied to the spectral subsets of three different types of remote-sensing data that were selected based on the spectral characteristics of target geological features in the study area, i.e., lithological units. Three lithological maps for each data type were created by applying different ML methods (SVM, MLP, and CNNs), as shown in Figure 6. Each class represents a specific lithological unit, and there are nine classes on each map (Figure 2). The spatial resolution of maps is different, and it is based on the spatial resolution of the input data and varies from 10 to 30 m.
curves are commonly used to investigate the performance of a classifier. The performance must be binarized to apply the ROC curve and ROC area to multi-label classification. One ROC curve can be drawn for each class, but a ROC curve can also be drawn by treating each member of the label indicator matrix as a binary predictor (micro-averaging). Another assessment criterion for multi-label classification is macro-averaging, which assigns equal weight to each label's classification [71].

Classified Lithological Maps
The proposed framework was applied to the spectral subsets of three different types of remote-sensing data that were selected based on the spectral characteristics of target geological features in the study area, i.e., lithological units. Three lithological maps for each data type were created by applying different ML methods (SVM, MLP, and CNNs), as shown in Figure 6. Each class represents a specific lithological unit, and there are nine classes on each map (Figure 2). The spatial resolution of maps is different, and it is based on the spatial resolution of the input data and varies from 10 to 30 m. According to the classified maps, most of the study area consists of Quaternary, Oligocene igneous, and Eocene sedimentary rocks. The flysch units that date back to Eocene cover most of the area and mainly consist of shale and sandstone (class 5), and they can be seen as a light red to gray sandstone (class 9) in some areas. Igneous rocks include dacite (class 2), andesite (class 4), and quartz monzonite (class 1), which are scattered throughout the study area, mainly striking northwest-southeast. Quaternary deposits cover the rock units in some places and include colluvium scree and talus (class 3), younger composite alluvial fans and terraces (class 6), older composite alluvial fans and terraces (class 8), and riverbed and recent alluvium (class 7). Table 1 depicts the accuracy of each class, and Figure 7 demonstrates the micro and macro average ROC curve for all nine classified maps. According to the ROC curves shown in Figure 7, the lithological units were classified more accurately by using the combination of CNN and ASTER data. However, the result is almost the same as Sentinel-2 data. The AUC of micro and macro averages is close to one in these plots. Considering the AUCs, the MLP outperforms the SVM, and it is ranked second in terms of accuracy. On the other hand, after the ASTER data, the Sentinel-2 data yielded the most accurate results for lithological mapping in the study area, and the OLI is ranked last, which could be anticipated. The combination of SVM and OLI shows the lowest accuracy, where the micro average is 0.76, and the macro average is 0.84. In addition to the ROC curves, several rock samples were collected from the study area to validate the lithological maps, particularly in places with little ground-truth data. These samples were mostly collected from the places where rock types are classified as quartz monzonite, andesite, dacite, and sandstone, and three of them, along with their polished thin sections, are shown in Figure 8. The sampling locations are shown in Figure 1 and labeled as A-C. The results obtained by investigating the polished thin sections indicate that the classifiers have been successful in the sampling regions. Moreover, we analyzed three samples from the study area, as shown in Figure 1, labeled as D-F, using the inductively coupled plasma-mass spectrometry (ICP-MS) for determining the concentration values of gold, manganese, lead, and zinc. Based on the laboratory analysis, the margins of igneous units in the study area are of great importance in terms of economic mineralization, and the geochemical analyses confirm this observation (Table 2). shown in Figure 7, the lithological units were classified more accurately by using the combination of CNN and ASTER data. However, the result is almost the same as Sentinel-2 data. The AUC of micro and macro averages is close to one in these plots. Considering the AUCs, the MLP outperforms the SVM, and it is ranked second in terms of accuracy. On the other hand, after the ASTER data, the Sentinel-2 data yielded the most accurate results for lithological mapping in the study area, and the OLI is ranked last, which could be anticipated. The combination of SVM and OLI shows the lowest accuracy, where the micro average is 0.76, and the macro average is 0.84.

Discussion
Based on the prediction accuracy obtained by the ROC curves and the thin-section analysis of the rock samples collected from the study area, the combination of CNN and ASTER data provided the most accurate lithological map for our study area. Although the Sentinel-2 data provide a better spatial resolution in the VNIR bands, i.e., 10 m, and the number of bands in this range are more than the ASTER data, this data type was less successful in terms of classifying lithological units compared to the ASTER data; however, the contrast is insignificant. The ASTER data proved to be an efficient data type for alteration mapping [12,24] and, based on the results of this study, for lithological mapping. It is noteworthy that the results can probably be different in other regions depending on the extent and geological characteristics of the target area, such as dominant rocks and min- Figure 8. (a) Sample A was taken from the sandstone unit and the microscopic thin section shows angled quartz grains in a matrix involving clay minerals. (b) Sample B was taken from the dacite unit. The upper microscopic section shows biotite-shaped phenocrysts with anhydrite and iron oxide substitution and feldspar in the matrix. The lower section shows the partial replacement of plagioclase by calcite and sericite. (c) Sample C was taken from the quartz monzonite unit. In the upper microscopic section, plagioclase, hornblende, and epidote phenocrysts are obvious, and in the lower section, quartz phenocrysts, accompanied by the radial secondary growth of quartz, can be seen. Table 2. Results of the geochemical analysis of three samples taken from the study area ( Figure 2).

Discussion
Based on the prediction accuracy obtained by the ROC curves and the thin-section analysis of the rock samples collected from the study area, the combination of CNN and ASTER data provided the most accurate lithological map for our study area. Although the Sentinel-2 data provide a better spatial resolution in the VNIR bands, i.e., 10 m, and the number of bands in this range are more than the ASTER data, this data type was less successful in terms of classifying lithological units compared to the ASTER data; however, the contrast is insignificant. The ASTER data proved to be an efficient data type for alteration mapping [12,24] and, based on the results of this study, for lithological mapping. It is noteworthy that the results can probably be different in other regions depending on the extent and geological characteristics of the target area, such as dominant rocks and minerals and the spectral characteristics of the features to be mapped. The SWIR bands of ASTER data have great advantages for lithological mapping [72], particularly in an area with altered rock types, since they mostly show unique spectral behaviors in this range. However, even our most accurate computer-generated lithological map displays some inconsistencies compared to field observations. For example, sandstones are mapped inside dacite and andesite units in some areas, or andesite units are seen within quartz monzonite units. These issues might be addressed by collecting more training data or improving the classified maps through checking suspicious areas.
The SVM and MLP methods performed well in mapping lithological units in the study area and provided valuable information regarding the lithological units. However, CNNs were more successful in providing an accurate lithological map. Apart from computationally efficient characteristics of CNNs, we also had less noisy pixels in our classified maps by CNNs because of considering neighboring pixels in modeling and relying on the pattern and texture, not just one pixel at a time. The extremely high accuracy of the model we achieved by combining the CNNs and ASTER data can be considered as a consequence of the low number of mixed pixels in our testing set. A lower accuracy would likely be achieved in regions with a more complicated geological setting and mixed pixels. The model performed well in predicting test classes, because the classes were easily differentiable in the multidimensional space. Therefore, the model learned well how to distinguish between classes for pure pixels.
In MLP classifiers, the tested datasets require more hidden units, and the complexity is managed by holding the number of these units small, while the SVM complexity is independent of the dimension of the datasets. SVMs are based on structural risk minimization, while MLP classifiers implement empirical risk minimization. Another difference is the complexity of the network. MLP networks that implement global approximation strategies typically use very few hidden neurons. On the other hand, SVMs are based on a local approximation strategy and use a large number of hidden units [73]. However, the MLP network was generally more accurate than SVM for lithological classification using remotesensing data in this research. CNNs also involve some disadvantages; for example, CNNs do not encode the position and orientation of objects, they lack the ability to be spatially invariant to the input data, and a large amount of training data is needed. However, CNNs with the hyperparameters used in this study were the most successful method.
The map provided by the proposed framework can be considered as an initial lithological base map. This map can be improved by carrying out comprehensive fieldwork with predefined profiles and a reasonable spacing, along with collecting a required number of samples. The resultant map of this process can be worth interpreting and deciding whether to continue the exploration operation in a region. According to the laboratory analysis and field observations, gold, manganese, lead, and zinc mineralizations and the interaction of pyrite with manganese-impregnated siliceous veins have been found in the margins of igneous units and mapped accurately in the lithological maps. In addition, the geochemical analysis of the samples collected from potential locations supports the high probability of economic gold mineralization in the study area.
ML methods have been widely used in remote-sensing-data analysis, but there are still gaps in predictive uncertainty quantification. In general, it is difficult to assess the uncertainty of remote-sensing model applications. Model validation is usually performed by comparing it with ground truth or alternative information believed to represent ground truth. Different approaches have been developed to address validation issues, and different models are possible. Bayesian inference provides a principled approach for quantifying the uncertainty of model parameters. Recently, there have been major advances in Bayesian neural networks and Bayesian deep learning. However, these methods have not been much used for remote sensing, and their application for geological exploration is absent. In future studies, we are planning to use other deep learning methods to map geological features [34], such as Bayesian neural networks, since it provides principled uncertainty quantification via the posterior distribution [74]. These methods may provide a meaningful interpretation of remote-sensing data in the face of challenges, such as data noise, sparse datasets, and missing data. Uncertainty quantification using Bayesian inference can be used to predict the uncertainty associated with model parameters and data. As a result, applying these methods can play an essential role in the evolution of geological remote sensing and improve its efficiency in mineral exploration.

Conclusions
Nowadays, geoscientists are equipped with the latest groundbreaking and efficient ML methods to work with big data, such as remote-sensing data. In this study, three different ML methods were used to process three types of remote-sensing data by mapping lithological units of a region in the southeast of Iran. We classified the remote-sensing data and discriminated lithological units, providing valuable information for mineral exploration. We showed the efficiency of applying machine-and deep-learning techniques on remote-sensing data and observed that the CNN and ASTER data combination provides the most accurate lithological map of the study area based on the ROC curves, and the test data were successfully predicted. Such maps can be considered base maps for further geological fieldworks and a reliable factor aiding in deciding for mineral-exploration operations.
We addressed the challenges of mapping lithological units on the ground and proposed a framework to overcome them. Our framework presented in a Jupyter notebook is an opensource community tool for mapping lithological units by using multi-or hyperspectral data. This notebook can significantly enhance the ability of exploration geologists to map lithological units. It can be considered a fast, reliable, and low-cost approach for generating a remote-sensing evidential layer and delineating favorable loci for precious mineral deposits at any stage of an exploration program. The framework can be improved by optimizing SVM, MLP, and CNN hyperparameters. Moreover, other ML methods, such as random forest, naive Bayes, k-nearest neighbors, and minimum distance, can be added to this framework to compare their efficiency with other methods.

Data Availability Statement:
The datasets used in this study, including remote sensing and ground truth data, are available at https://github.com/sydney-machine-learning/deeplearning_lithology (accessed on 20 November 2021).

Acknowledgments:
The authors would like to thank L. S. Galvao for handling this paper and anonymous reviewers for providing constructive comments that improved the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A Table A1. Technical performance and attributes of Landsat 8, Sentinel-2, and ASTER data [46][47][48].  Figure A1. Three imaginary ROC curves reflecting the gold standard's diagnostic precision (lines A; AUC = 1) on the upper and left axes in the unit square, a normal ROC curve (curve B; AUC = 0.85), and a diagonal line leading to random chance (line C; AUC = 0.5) are shown. The ROC curve shifts toward A as the diagnostic-test precision increases and the AUC reaches 1 [75].