Lithological Mapping Based on Fully Convolutional Network and Multi-Source Geological Data

Abstract: Deep learning algorithms have found numerous applications in the field of geological mapping to assist in mineral exploration, benefiting from capabilities such as high-dimensional feature learning and processing through multi-layer networks. However, two challenges are associated with identifying geological features using deep learning methods. On the one hand, a single type of data resource cannot diagnose the characteristics of all geological units; on the other hand, deep learning models are commonly designed to output a single class for the whole input rather than segmenting it into several parts, which is necessary for geological mapping tasks. To address these concerns, a framework that comprises a multi-source data fusion technology and a fully convolutional network (FCN) model is proposed in this study, aiming to improve the classification accuracy for geological mapping. Multi-source data fusion technology is first applied to integrate geochemical, geophysical, and remote sensing data for comprehensive analysis. A semantic segmentation-based FCN model is then constructed to determine the lithological units per pixel by exploring the relationships among multi-source data. The FCN is trained end-to-end and performs dense pixel-wise prediction with an arbitrary input size, which is ideal for targeting geological features such as lithological units. The framework is finally validated by a comparative study in discriminating seven lithological units in the Cuonadong dome, Tibet, China. A total classification accuracy of 0.96 and a high mean intersection over union value of 0.9 were achieved, indicating that the proposed model would be an innovative alternative to traditional machine learning algorithms for geological feature mapping.


Introduction
As a fundamental task in mineral exploration, geological mapping plays a crucial role by enabling the detection of geological features that involve lithological units and alterations. Several approaches have been applied for geological mapping to assist in the discovery of mineral deposits based on geological, geophysical, geochemical, and remote sensing data. These methods can be summarized as traditional field surveys, statistical processes, and more recent machine learning technologies (e.g., random forest, support vector machine, and logistic regression) [1][2][3][4]. For example, Wang et al. [5] designed a hybrid method that comprises random forest and metric learning to delineate the spatial distribution of Himalayan leucogranites based on remote sensing images and geochemical data. This approach integrated the advantages of both geochemical and remote sensing data, which helps improve the recognition accuracy of Himalayan leucogranite.
As a branch of machine learning built on multiple hidden layers, deep learning algorithms have emerged as state-of-the-art methods behind many important breakthroughs and have garnered increasing interest in the fields of pattern recognition, computer vision, and mineral prospectivity mapping [6][7][8][9][10][11][12][13][14]. Deep learning algorithms are well suited to high-dimensional datasets for classification and prediction through multi-layer network learning [15]. A convolutional neural network (CNN), for example, is a type of feedforward neural network that contains convolution computation and a deep structure. CNN is popular in mineral exploration owing to its shared-weights architecture and translation invariance, which can extract inner relationships from complex geological features, explore hidden metallogenic information, and describe patterns that may be ignored by traditional machine learning methods [12][16][17][18].
However, several difficulties arise in the identification of geological features using deep learning models. First, a single type of data resource cannot diagnose the characteristics of all lithological units. Second, commonly used deep learning algorithms require large amounts of known labels, whereas most of the current training datasets are built for 2D images with a fixed input size (e.g., MNIST, ImageNet, and CIFAR-10). Third, lithological mapping requires segmenting the input image into several lithological units rather than outputting a single class. In contrast to traditional image-wise classification, a CNN-based solution tends to take a patch centered on each pixel, assign a label to this patch, and predict the class of the center pixel based on the information from the whole patch. However, it is difficult to optimize the size of the patch, which determines the learning ability. In addition, segmenting training images into lithological units of varying shapes, especially at overlapping geological boundaries, compromises the classification performance owing to the redundant calculations on the overlapped and neighboring patches [19].
Geodata science, which is defined by analyzing the spatial associations between geological big data and known geoknowledge, is growing in acceptance and being used in geoscience [20,21]. Geodata involves geological, geophysical, geochemical, and remote sensing data, extending from the surface to the depth of the earth to reveal the characteristics of geological features. For example, remote sensing images record electromagnetic waves, characterized by their wide view and high resolution; geochemical data indicate the enrichment and depletion of geochemical elements; and geophysical data reflect the gravity, permeability, conductivity, and radioactivity that can be assessed to explore hidden information under the earth. For the first problem, the incorporation of multi-source geodata is considered an effective and inexpensive approach for the comprehensive analysis of various geodata, especially in areas with limited geological data [5][22][23][24][25].
Mineralization is a typical rare geological event that results in far fewer labeled samples for training deep neural networks [20]. Data augmentation, by means of flipping, rotation, and clipping, is often recommended to generate additional data for model training. However, applying these transformations to geodata may change the geological meaning of the training labels [12]. The recently developed fully convolutional network (FCN) provides an alternative solution to such problems. FCN is a semantic segmentation model that trains end-to-end and performs dense per-pixel prediction on an arbitrarily sized input image or set of labeled pixels, rather than a fixed 2D/3D input image size [26][27][28][29][30]. FCN abandons the fully connected layers at the end and only performs convolution operations to predict a class at each pixel of the input image, instead of only one class for the whole input. As a result, FCN can fully utilize limited labels and avoids the heavy computational cost of traditional image-wise or patch-wise classification in CNN [27]. Furthermore, FCN is capable of learning the complex nonlinear relationships among multi-source geological data. In this case, the application of FCN improves the accuracy of lithological boundaries and is therefore considered favorable for high-precision geological feature mapping.
Accordingly, this study aims to develop a framework (Figure 1) for lithological mapping that can support decision-making in targeting mineral deposits. Multi-source data fusion technology was first adopted to achieve a comprehensive analysis of geodata by integrating ASTER remote sensing images, PALSAR DEM data, geochemical exploration data, and aeromagnetic data at various scales. An FCN model motivated by the VGG-19 network was then developed to learn the global, local, and contextual features of the fused geodata for lithological mapping. A comparative study to discriminate among several lithological units in the Cuonadong dome, Tibet, China, was conducted to demonstrate the proposed framework; researchers have discovered rare earth polymetallic mineralization (Be, Li, Ni, Ta, Bi, and Cs) in the Himalayan leucogranite within this area.

Multi-Source Data Fusion
A multispectral remote sensing image can be decomposed into high-frequency components ($MS\_b_{Hi}$) and low-frequency components ($MS\_b_{Li}$) by image filtering algorithms, representing the spatial detail and background information, respectively. The distribution of geochemical elements is linked to a series of physical processes, such as electronic transitions and atomic vibrations. These processes result in changes in spectral reflection that can be detected and recorded by remote sensing image sensors. In this regard, the mineral spectrum is the response to its chemical and physical components; hence, the low-frequency components of multispectral bands are spatially and genetically related to geochemical or geophysical information ($geo$), which can be calculated as [31]:

$$geo = \sum_{i=1}^{n} f_i \times MS\_b_{Li} \qquad (1)$$

where $f_i$ is the correlation coefficient between the multispectral bands and the geochemical or geophysical information, and $b_{Li}$ denotes the $i$-th band of the low-frequency component of the multispectral data. The low-resolution geochemical or geophysical data can be fused with the reconstructed high-frequency component using the correlation coefficient above:

$$geo_f = geo_c + \sum_{i=1}^{n} f_i \times MS\_b_{Hi} \qquad (2)$$

where $geo_f$ is the fused high-resolution image, $geo_c$ is the resampled geochemical or geophysical data with the same resolution as the high-frequency component, and $b_{Hi}$ is the $i$-th band of the high-frequency component of the multispectral data.
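The fusion step can be illustrated with a minimal sketch on flattened pixel lists. It rests on several assumptions that the text does not confirm: the low-frequency component is taken as a simple moving average, the coefficients f_i are Pearson correlations between the resampled geodata and each low-frequency band, and the fused image adds the weighted high-frequency detail to the resampled geodata. The function names `box_lowpass`, `pearson`, and `fuse`, and the sample values, are hypothetical.

```python
from statistics import mean


def box_lowpass(band, radius=1):
    """Moving-average filter on a 1-D pixel list (stand-in for an image filter)."""
    n = len(band)
    return [mean(band[max(0, k - radius):min(n, k + radius + 1)]) for k in range(n)]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def fuse(geo_c, bands, radius=1):
    """Inject correlation-weighted high-frequency detail into resampled geodata."""
    fused = list(geo_c)
    for band in bands:
        low = box_lowpass(band, radius)              # assumed low-frequency part
        high = [p - l for p, l in zip(band, low)]    # assumed high-frequency part
        f_i = pearson(geo_c, low)                    # assumed definition of f_i
        fused = [g + f_i * h for g, h in zip(fused, high)]
    return fused


# Made-up values: coarse geodata resampled to the band grid, and one band.
geo_c = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
bands = [[0.1, 0.3, 0.2, 0.6, 0.5, 0.7]]
fused = fuse(geo_c, bands)
```

A band with no spatial detail contributes nothing, so the fused result then equals the resampled geodata. This is a quick sanity check on the detail-injection idea.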

Figure 1. A brief overview of the proposed framework. Multi-source data fusion technology was first adopted to integrate remote sensing images, geochemical exploration data, and aeromagnetic data at various scales. An FCN model was then applied to learn the global, local, and contextual features of the fused geodata for lithological mapping. The training and validation databases were built using the sliding windows method based on a labeled geological map.


Fully Convolutional Network
FCN was developed on the basis of a classical CNN and was initially designed for pixel-wise image semantic segmentation [26,32,33]. Semantic segmentation, also known as dense classification, aims to classify the category of each pixel in an image according to the annotation. A simple FCN architecture is composed of convolution layers, pooling layers, and deconvolution layers that are connected by activation functions, such as the sigmoid, ReLU, and Tanh functions. Figure 2 shows the FCN architecture, where the first part is a convolution network similar to a CNN, and the second part is a deconvolution network with skip connections. In more detail, the convolution layer is designed as a feature extraction layer to learn multi-layer features and extract abstract, high-level information from the input data. The pooling layer transforms high-dimensional features into low-dimensional representative features, thereby reducing the spatial size and the number of parameters to compute. The deconvolution layer can be regarded as the reverse of the convolution and pooling layers: it increases the size of the feature maps by fusing multi-scale global and local feature information through skip connections.
Remote Sens. 2021, 13, x FOR PEER REVIEW

(1) Deconvolution network
In a CNN, the convolution and pooling operations decrease both the size and resolution of the feature map. To recover a feature map of the same size as the input, the FCN replaces the final fully connected layers of the CNN with 1 × 1 convolution layers. The deconvolution network then restores the feature map obtained by the convolution and pooling layers to the original resolution through successive upsampling operations (e.g., bilinear interpolation). As a result, each predicted value corresponds one-to-one to a pixel in the input image, which achieves end-to-end, per-pixel classification [35,36].
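To make the upsampling step concrete, the following sketch applies fixed bilinear interpolation to a small score map. It is only an illustration: in FCN implementations the deconvolution layers are usually learned transposed convolutions, and the corner-aligned sampling convention and the function name `bilinear_upsample` are assumptions made here for brevity.

```python
def bilinear_upsample(grid, factor):
    """Upsample a 2-D list by an integer factor with bilinear interpolation.

    Corner-aligned: the four input corners map exactly to the output corners.
    """
    h, w = len(grid), len(grid[0])
    out_h, out_w = h * factor, w * factor
    out = []
    for r in range(out_h):
        # Map the output row back to a fractional input row.
        y = r * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for c in range(out_w):
            x = c * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            # Interpolate along columns first, then along rows.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


scores = [[0.0, 1.0], [2.0, 3.0]]   # a toy 2 x 2 class-score map
up = bilinear_upsample(scores, 2)    # becomes 4 x 4
```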
Figure 2. Architecture of the FCN [34], where the first part is a convolution network similar to a CNN, and the second part is a deconvolution network with skip connection. The final output is obtained by a continuous upsampling operation and connecting with the corresponding pooled map.

(2) Skip connection
The FCN also takes advantage of skip connections to recover the detailed information lost during the downsampling operation [27]. Skip connections combine information from the lower and higher layers iteratively, thereby supplementing the global features of the higher layers with fine details from the lower layers. As shown in Figure 2, when the output feature map is obtained by upsampling 32 times directly after the convolution operation, the network is called FCN-32s. When the step size of the deconvolution is set to 16, the output feature map is upsampled 16 times and connected with the corresponding pooled map; this network is called FCN-16s. A further skip is added to make predictions based on the feature map after 8× upsampling, which is called FCN-8s [26].
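The order in which an FCN-8s combines coarse and fine score maps can be sketched as follows. This is a simplified illustration under stated assumptions: nearest-neighbour upsampling replaces the learned deconvolution, each map holds scores for a single class, and the helper names `upsample_nearest`, `add_maps`, and `fcn8s_fuse` are hypothetical.

```python
def upsample_nearest(grid, factor):
    """Repeat every cell `factor` times along both axes."""
    return [[v for v in row for _ in range(factor)]
            for row in grid for _ in range(factor)]


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fcn8s_fuse(score32, score16, score8):
    """Fuse score maps at 1/32, 1/16 and 1/8 of the input resolution."""
    fused16 = add_maps(upsample_nearest(score32, 2), score16)  # 1/32 -> 1/16
    fused8 = add_maps(upsample_nearest(fused16, 2), score8)    # 1/16 -> 1/8
    return upsample_nearest(fused8, 8)                         # 1/8  -> full size


# Constant toy maps for a 256 x 256 input tile.
score32 = [[1.0] * 8 for _ in range(8)]
score16 = [[2.0] * 16 for _ in range(16)]
score8 = [[4.0] * 32 for _ in range(32)]
full = fcn8s_fuse(score32, score16, score8)
```

Dropping the last skip (upsampling `fused16` by 16 directly) would correspond to the FCN-16s variant, and upsampling `score32` by 32 to FCN-32s.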

Workflow and FCN Model Architecture
An FCN-8s framework (Figure 1) was constructed for lithological mapping in this study. The implemented FCN-8s model was developed from the well-known VGG-19 network. VGG-19 is a CNN model developed by the Visual Geometry Group at the University of Oxford that includes 16 convolutional layers, 5 max-pooling layers, and 3 fully connected layers, each of which is followed by ReLU as a nonlinear activation function (Table 1) [36]. VGG-19 reduces the number of parameters and retains more subtle feature information by stacking multiple small convolution kernels (3 × 3) and max-pooling kernels (2 × 2), which helps fit the nonlinear structure of the data and makes the network more robust. Moreover, zero padding, the process of adding borders of zeros to the input feature maps, is used in VGG-19 to preserve the spatial resolution of the feature map. The FCN-8s model discards the last three fully connected layers and replaces them with convolutional layers with a 1 × 1 filter. This design ensures that the output map keeps the same size as the original input map through two 2 × 2 deconvolutional layers and one with a step size of 8. The FCN-8s model performs pixel-wise prediction using the SoftMax classifier, where the final classification map is obtained by fusing the abstract high-level output information with fine low-level information from the third and fourth pooling layers using skip connections [26].
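A short sketch can trace how the spatial size of a tile changes through the VGG-19 encoder, which explains why upsampling steps of 2, 2, and 8 restore the input size. The block layout (2, 2, 4, 4, 4 convolutions per block) is the standard VGG-19 configuration; the 256-pixel tile size is taken from the training setup, and the names `VGG19_BLOCKS` and `trace_sizes` are hypothetical.

```python
# Number of 3 x 3 convolution layers in each VGG-19 block; every block is
# followed by one 2 x 2 max-pooling layer.
VGG19_BLOCKS = [2, 2, 4, 4, 4]


def trace_sizes(size, blocks=VGG19_BLOCKS):
    """Return the feature-map side length after each pooling layer."""
    sizes = []
    for _ in blocks:
        # Zero-padded 3 x 3 convolutions keep the size; pooling halves it.
        size //= 2
        sizes.append(size)
    return sizes


sizes = trace_sizes(256)
pool3, pool4, pool5 = sizes[2], sizes[3], sizes[4]
# Decoder path: pool5 -> pool4 resolution (x2), -> pool3 resolution (x2),
# then back to the input size (x8).
restored = pool5 * 2 * 2 * 8
```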

Model Evaluation Metrics
The confusion matrix and intersection over union (IoU) values were employed to evaluate the performance of the proposed model. A confusion matrix is a summary table used to quantitatively measure the classification accuracy of each class and provide a holistic view of how well a classification model is performing [37,38]. The confusion matrix tabulates the actual labels against the predicted values, shows where the model makes misclassifications, and allows the pixel accuracy to be calculated.
IoU is a standard metric for segmentation assessment that is specially designed to evaluate how close the predicted value is to the labeled reference [39]. IoU is calculated as the ratio of the overlap between the predictions and labels (intersection) to their total surface (union) [40]. In addition, the mean IoU (mIoU), which averages the IoU over all classes, is more commonly reported. Generally, the greater the mIoU, the better the model performs.
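Both metrics can be derived from a confusion matrix, as the following minimal sketch shows. It assumes that rows hold the true classes and columns the predicted classes; the function names and the toy two-class matrix are hypothetical and are not results from this study.

```python
def per_class_iou(cm):
    """IoU per class, where cm[i][j] counts pixels of true class i predicted as j."""
    n = len(cm)
    ious = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k pixels missed
        fp = sum(cm[i][k] for i in range(n)) - tp   # other pixels labeled as k
        union = tp + fp + fn
        ious.append(tp / union if union else float("nan"))
    return ious


def mean_iou(cm):
    """Average IoU over the classes that occur in labels or predictions."""
    vals = [v for v in per_class_iou(cm) if v == v]  # drop NaN entries
    return sum(vals) / len(vals)


def overall_accuracy(cm):
    """Fraction of all pixels on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


cm = [[8, 2],
      [1, 9]]
ious = per_class_iou(cm)       # 8/11 and 9/12
miou = mean_iou(cm)
acc = overall_accuracy(cm)     # 17/20
```

Note that the IoU penalizes both false positives and false negatives of a class, so it is usually lower than the pixel accuracy of that class.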

Geological Setting
Recently, geologists have discovered excellent rare metal metallogenic potential related to leucogranite, such as Be, Ni, Ta, W, and Sn, in particular Himalayan leucogranites in the Himalayan orogenic belt [41][42][43][44][45][46][47][48]. Furthermore, a large-scale Be polymetallic deposit has been found in the Cuonadong dome, which is located in the eastern Himalaya, China. The predicted reserves of WO3 and BeO are more than 300,000 and 500,000 tons, respectively, indicating a potential prospecting indicator for rare metal deposits of Himalayan leucogranite [49].

Data and Preprocessing
Four types of geodata were collected for lithologic mapping in the study area, including an ASTER remote sensing image that covers nine visible near infrared (VNIR) and short-wave infrared (SWIR) bands with spatial resolutions of 15 and 30 m, respectively (Figure 4a), geochemical exploration data at a scale of 1:200,000 with seven major oxides (SiO2, Al2O3, CaO, MgO, Fe2O3, Na2O, and K2O) (Figure 4b), high-precision PALSAR DEM data representing the topographic surface of geological features (Figure 4c), and aeromagnetic data at a scale of 1:200,000 (Figure 4d). The ASTER and PALSAR DEM images were captured on 17 February 2002 and 4 April 2018 in Level 1T, and were provided for free by the United States Geological Survey. The geochemical data were determined using X-ray fluorescence from the Chinese National Geochemical Mapping Project [57]. The aeromagnetic data from the airborne total magnetic intensity dataset were compiled by the China Geological Survey.
The preprocessing of ASTER remote sensing data includes radiation correction, atmospheric correction, and orthophoto correction. More specifically, the SWIR bands were first resampled to 15 m to match the resolution of the VNIR bands. Then, the FLAASH module in ENVI was applied to remove the scattering and absorption effects of the atmosphere and correct image distortion caused by radiation errors. Orthophoto correction was also performed to eliminate the effect of topographic relief with the help of the RPC orthorectification module in ENVI. These corrections contribute to the identification accuracy of targets in mountainous areas. In addition, it is recommended that the aeromagnetic data be filtered by magnetization conversion and that the geochemical data be transformed by the centered log-ratio transformation to address the closure effect of compositional data [58,59]. The multi-source data fusion technology was then employed to fuse each element of the geochemical data (2 km) and aeromagnetic data (2 km) with nine VNIR-SWIR (15 m) bands in the ASTER images, according to the correlations between the spectral reflectance and geochemical element concentration or magnetic intensity. The fused image increases the spatial resolution of geochemical and aeromagnetic data from 2 km to 15 m, providing additional diagnostic information for lithological mapping. Figure 5 displays the fused nine-channel image (1611 × 1650 × 9 pixels) that merged ASTER, geochemical, aeromagnetic, and DEM data and the labeled geological map (1611 × 1650 pixels), containing seven lithologic units and an additional class covered with lakes and snow.
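The centered log-ratio (clr) transformation mentioned above can be sketched in a few lines. The transformation itself is standard for compositional data; the oxide percentages below are invented for illustration and are not values from the study area.

```python
import math


def clr(composition):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(x) for x in composition]
    center = sum(logs) / len(logs)   # log of the geometric mean
    return [v - center for v in logs]


# Hypothetical major-oxide percentages for one sample (sum to 100).
sample = [65.0, 15.0, 3.0, 2.0, 5.0, 4.0, 6.0]
transformed = clr(sample)

# Rescaling the whole composition (closure to a different total) leaves
# the clr values unchanged.
rescaled = clr([2.0 * x for x in sample])
```

Because the clr values of a sample sum to zero and do not depend on the closure constant, they can be compared across samples without the spurious correlations induced by the constant-sum constraint.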

Building of Training Datasets
The input image was split into tiles of 256 × 256 × 9 pixels, which were fed into the FCN-8s model one by one. The training database was built using the sliding windows method at a stride size of 20 (Figure 1). A total of 1000 samples were collected by traversing the entire image and the geological map synchronously, occupying less than 1% of the total available patches. Each sample contained one or more labeled lithological classes. Of these samples, 80% were randomly selected as the training samples, and the remaining 20% were used to validate the model.
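The tiling and splitting procedure can be sketched as follows. The tile size, stride, sample count, and 80/20 split come from the text; how the 1000 samples were chosen among the candidate windows is not stated, so uniform random selection is an assumption here, as are the row and column order of the image dimensions and the helper names.

```python
import random


def window_origins(height, width, tile=256, stride=20):
    """Top-left corners of all tiles that fit entirely inside the image."""
    rows = range(0, height - tile + 1, stride)
    cols = range(0, width - tile + 1, stride)
    return [(r, c) for r in rows for c in cols]


def sample_and_split(origins, n_samples=1000, train_frac=0.8, seed=0):
    """Draw tiles without replacement and split them into train/validation sets."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    chosen = rng.sample(origins, n_samples)
    n_train = int(round(n_samples * train_frac))
    return chosen[:n_train], chosen[n_train:]


# Assumed orientation: 1611 rows by 1650 columns.
origins = window_origins(1611, 1650)
train, val = sample_and_split(origins)
```

The same origins would be used to crop the nine-channel image and the label map, so that each tile and its labels stay aligned.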

Model Training
The hyperparameters of the FCN-8s model, including epoch, batch size, maximum iterations, and learning rate, were carefully optimized and set to 100, 5, 7000, and 0.0001, respectively. Stochastic gradient descent with momentum was selected as the optimizer for the model weights [60]. The cross-entropy loss continued to drop as the number of training iterations increased and finally converged, suggesting that the model was well tuned (Figure 6). Here, the loss is a summation of the errors made for each training pixel, indicating the performance of the model after each iteration of optimization [61]. It is worth mentioning that FCN enables the handling of unbalanced classes by weighting or sampling the loss [27].
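A pixel-wise cross-entropy loss with class weights, one way of handling unbalanced classes, can be sketched as follows. The normalization by the total weight and the example weights are assumptions for illustration; the text does not state whether or how the loss was weighted in this study.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pixel_cross_entropy(logits_per_pixel, labels, class_weights):
    """Weighted mean of per-pixel cross-entropy errors."""
    loss, weight_sum = 0.0, 0.0
    for logits, label in zip(logits_per_pixel, labels):
        p_true = softmax(logits)[label]   # predicted probability of the true class
        w = class_weights[label]          # larger weight for rarer classes
        loss += -w * math.log(p_true)
        weight_sum += w
    return loss / weight_sum


# Two pixels, three classes; class 2 is treated as rare (weight 5).
uniform = weighted_pixel_cross_entropy(
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0, 2], [1.0, 1.0, 5.0])
confident = weighted_pixel_cross_entropy(
    [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]], [0, 2], [1.0, 1.0, 5.0])
```

With uninformative scores the loss equals log(3), the entropy of a uniform guess over three classes, and it falls as the scores of the true classes grow.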



Lithological Mapping
The fused nine-channel image was then fed into the trained model to predict the distribution of the seven lithological classes; the model was implemented on the TensorFlow 1.2 platform in Python 3.6. Figure 7 presents the classification map obtained by the optimized FCN-8s model. Visually, almost all identified lithological units were consistent with the labeled geological map. Quantitatively, the confusion matrix (Figure 8) showed that the model achieved excellent performance in total accuracy (0.96) and in the pixel accuracy of each class, as well as a high mIoU (0.9). Partial misclassifications were observed at the edges and junctions of different lithologic units. For example, strip-shaped early Paleozoic marbles were widely misclassified as Paleozoic biotite quartz schists, with pixel accuracies of 0.83 and 0.92, respectively. One possible reason is the difficulty of learning the correlations between samples when data are insufficient because of the small outcropping areas; another is the highly similar mineral composition under strong skarnization, which results in misclassification and omission. Splitting the input image into smaller parts is a known remedy for these problems; however, such an operation expands the training set, which may increase the computational burden. Nevertheless, the above evidence demonstrates that the FCN-8s model can map geological target features by exploring the deeper hidden relationships among multi-source data. The integration of geochemical, geophysical, spatial, spectral, and topographic data in remote sensing images provides supplemental information from various perspectives and facilitates the discrimination of lithological units.
To further illustrate the effectiveness of the FCN-8s model in lithological mapping, a comparative study was implemented using the classic random forest algorithm with the fused data. Random forest is a machine learning method built on multiple decision trees using the idea of ensemble learning [62]. It is widely used in geological feature mapping because of its high classification accuracy and low bias [63,64]. The two parameters of the random forest model, the number of trees (ntree) and the number of variables at each node (mtry), were set to 200 and 3, respectively. For quantitative comparison, the training set was built by randomly selecting 10% of the samples (pixels) in each class, which is consistent with previous studies [5].
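The per-class 10% sampling described above can be sketched as follows. This is a minimal illustration only: the function name `stratified_sample`, the random seed, and the toy class sizes are assumptions and are not taken from the study. The selected pixel indices would then be used to train the random forest; in scikit-learn, for instance, ntree and mtry would correspond to `n_estimators=200` and `max_features=3`, although the software actually used is not stated here.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction=0.10, seed=0):
    """Return sorted pixel indices that hold `fraction` of every class."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        # Keep at least one pixel so that no class is left out of training.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical flattened label map: seven classes with unequal coverage.
class_sizes = [500, 300, 200, 100, 50, 30, 20]
labels = [c for c, n in enumerate(class_sizes) for _ in range(n)]
train_idx = stratified_sample(labels)
```

Sampling within each class, rather than over the whole image, keeps units with small outcrop areas represented in the training set in the same proportion as units with large coverage.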
Figure 9 shows the pixel-wise classification map obtained using the random forest model with the fused data. FCN-8s evidently outperformed the random forest model, producing little "salt-and-pepper" noise and clear geological boundaries. Moreover, the FCN model suppressed numerous misclassifications that occurred in the classification map delineated by random forest, such as those among the Paleozoic biotite quartz schist, early Paleozoic marble, and Himalayan leucogranites (Figure 10). This is because FCN extracts deeper semantic information layer by layer through the convolution network and recovers the size of the feature maps, especially the details of the boundaries, through deconvolution operations. Compared with the confusion matrix of random forest (Figure 11), the FCN-8s model improved the mean IoU from 0.85 to 0.90 and the total classification accuracy by more than 5%. For lithologic units with large coverage, such as Jurassic sandstone, slate and Quaternary strata, no significant improvements were observed, because their obvious differences in mineral composition are easily discriminated. For those with minor coverage, such as Paleozoic marble and Triassic sandstone and slate, there was a noticeable improvement in classification accuracy of up to 8%. Nevertheless, quite a few Paleozoic biotite quartz schists were wrongly assigned to early Paleozoic marbles by both the FCN-8s and the random forest model. On the one hand, these two lithologic units are mainly composed of carbonate minerals, which gives them similar geochemical and spectral characteristics that are difficult to distinguish. On the other hand, the irregular distribution of lithological units decreases the size of the receptive field in the neural network, thus limiting the learning of lithological features and impeding the improvement of classification performance.
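The quantities compared above (total accuracy, per-class pixel accuracy, and mean IoU) can all be derived from a confusion matrix. The sketch below shows one common way to do so. It assumes that rows are true classes and columns are predicted classes, and that per-class pixel accuracy means the fraction of each true class that is correctly labeled. The 3 x 3 matrix is a toy example and does not reproduce the values of Figure 8 or Figure 11.

```python
def segmentation_metrics(cm):
    """Metrics from a confusion matrix where cm[i][j] counts pixels of
    true class i predicted as class j. Every class is assumed to occur
    at least once, so no denominator is zero."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    row_sums = [sum(cm[i]) for i in range(n)]                        # true pixels per class
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted pixels per class
    overall_acc = sum(diag) / total
    class_acc = [diag[i] / row_sums[i] for i in range(n)]
    # IoU = intersection / union = TP / (TP + FN + FP)
    iou = [diag[i] / (row_sums[i] + col_sums[i] - diag[i]) for i in range(n)]
    return overall_acc, class_acc, sum(iou) / n


# Toy three-class confusion matrix (illustrative values only).
toy_cm = [[90, 10, 0],
          [5, 80, 15],
          [0, 10, 90]]
acc, per_class, miou = segmentation_metrics(toy_cm)
```

Because IoU penalizes both missed pixels and false alarms, the mean IoU is typically lower than the total accuracy, as it is in the results reported here (0.9 versus 0.96).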

Discussion
Himalayan leucogranite has received increasing attention owing to its confirmed potential to host rare earth polymetallic mineral resources. Hence, mapping the spatial distribution of Himalayan leucogranites is considered a primary task in exploring rare metal deposits. Fortunately, the distinctive geochemical, geophysical, and spectral properties of Himalayan leucogranite help discriminate it. Previous studies have delineated both the large-scale and small-scale spatial distributions of leucogranites in the Himalayan orogenic belt [65] and Cuonadong dome [5,64], respectively, achieving a high identification accuracy of 0.84 by integrating geochemical and remote sensing images, with the support of random forest [5].
However, finding an approach to better identify geological features has always been a challenge. Lithology is formed under the joint action of a variety of geological features, which have spatially complex structural characteristics in geophysical, geochemical and remote sensing spectral information. The method mentioned above is a shallow machine learning algorithm, which struggles with the complex nonlinear relationships and spatial patterns among multi-source data and thus may restrict rare metal exploration in this area. Accordingly, the results of this study improve the classification accuracy of leucogranites by 5% in the same study area after adding DEM topographic and aeromagnetic information. Remote sensing images provide more diagnostic bands and enrich the spatial details of geochemical and geophysical data; DEM data supplement the analysis of topographic characteristics related to weathering and erosion. The integration of the three types of geodata captures the complementary advantages of multi-source data. On this basis, this study introduced an FCN-8s model to promote the discrimination of highly similar lithological units. The model can provide sufficient training samples and is less expensive computationally than the patch-based approach, because no redundant operations or repeated calculations need to be performed on neighboring patches. The FCN-8s model further reaches a considerable classification accuracy of up to 0.96 by learning the deep-level context correlation between multi-source data. These results demonstrate the effectiveness of the proposed FCN-8s in mapping geological target features for mineral exploration.
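The computational advantage over the patch-based approach can be illustrated with simple arithmetic. The sketch below compares how many input pixels each strategy has to process. It assumes a patch-based classifier that cuts one window around every pixel (with padding at the image borders) and considers only the input layer. The image size and window size are hypothetical and are not taken from the study.

```python
def input_pixels_patch_based(height, width, patch):
    """One patch x patch window is cut out and processed for every pixel."""
    return height * width * patch * patch


def input_pixels_fully_convolutional(height, width):
    """The whole image passes through the network once."""
    return height * width


# Hypothetical image and window sizes, chosen only for illustration.
h, w, p = 512, 512, 32
redundancy = (input_pixels_patch_based(h, w, p)
              / input_pixels_fully_convolutional(h, w))
print(f"patch-based input volume is {redundancy:.0f} times larger")
```

Under these assumptions the overhead grows with the square of the window size, because neighboring windows overlap almost entirely and the same pixels are convolved again for each window.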

Conclusions
This study introduces an accessible and robust approach to geological feature mapping that combines a deep learning algorithm with multi-source data fusion technology, aiming to improve the classification accuracy of lithological units. Multi-source data fusion technology was first employed to provide abundant diagnostic information. An FCN classification model was then built based on the well-known VGG-19 network to identify the distribution of geological features. This joint approach was illustrated by a case study in mapping seven lithological units in the Cuonadong dome, Tibet, China, where rare earth polymetallic deposits have been discovered. Three conclusions can be drawn based on the results:
(1) The multi-source data fusion technology integrates ASTER remote sensing images, geochemical exploration data, PALSAR DEM data, and aeromagnetic data at various scales, providing a comprehensive analysis of geodata rather than relying on a single type of data resource.
(2) FCN is a specially designed semantic segmentation model that excels at end-to-end, pixel-wise prediction with an arbitrary input size. FCN retains the advantages of feature extraction in CNN and solves the problem of classifying each pixel in an image through deconvolution operations and skip connections, making it an innovative alternative for lithological mapping.
(3) A comparative study was carried out, showing both visually and quantitatively that the proposed framework is effective in geological feature mapping. The proposed FCN-8s model increased the classification accuracy of leucogranites by 9% compared to that reported in previous studies by extracting deeper-level hidden information from multi-source data.
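The way deconvolution and skip connections restore pixel-wise output can be followed by tracing feature-map sizes through a decoder of the FCN-8s type. The sketch below is a size bookkeeping exercise under the standard FCN-8s layout, in which the third, fourth, and fifth pooling stages of a VGG backbone reduce the input by factors of 8, 16, and 32. It assumes input sizes divisible by 32 and performs no convolution; the function name and the 256 x 256 tile are hypothetical.

```python
def fcn8s_size_trace(height, width):
    """Trace feature-map sizes through an FCN-8s style decoder."""
    if height % 32 or width % 32:
        raise ValueError("this sketch assumes sizes divisible by 32")
    pool3 = (height // 8, width // 8)
    pool4 = (height // 16, width // 16)
    pool5 = (height // 32, width // 32)
    up_a = (pool5[0] * 2, pool5[1] * 2)  # 2x upsampling of the pool5 scores
    if up_a != pool4:
        raise RuntimeError("cannot fuse with pool4")
    up_b = (up_a[0] * 2, up_a[1] * 2)    # 2x upsampling of the fused map
    if up_b != pool3:
        raise RuntimeError("cannot fuse with pool3")
    output = (up_b[0] * 8, up_b[1] * 8)  # final 8x upsampling
    return {"pool3": pool3, "pool4": pool4, "pool5": pool5, "output": output}


# Hypothetical 256 x 256 input tile.
trace = fcn8s_size_trace(256, 256)
```

The final upsampling factor of 8 gives FCN-8s its name, and the two fusions with the pooled maps are what carry boundary detail from the shallower layers into the prediction.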
Remote Sens. 2021, 13
data, geochemical exploration data, and aeromagnetic data at various scales. An FCN model motivated by the VGG-19 network was then developed to learn the global, local, and contextual features of the fused geodata for lithological mapping. A comparative study to discriminate among several lithological units in the Cuonadong dome, Tibet, China, was conducted to demonstrate the proposed framework, where researchers have discovered rare earth polymetallic mineralization (Be, Li, Ni, Ta, Bi, and Cs) in the Himalayan leucogranite within this area.

Figure 1. A brief overview of the proposed framework. Multi-source data fusion technology was first adopted to integrate remote sensing images, geochemical exploration data, and aeromagnetic data at various scales. An FCN model was then applied to learn the global, local, and contextual features of the fused geodata for lithological mapping. The training and validation databases were built using the sliding windows method based on a labeled geological map.


Figure 2. FCN architecture design. The model was constructed from the VGG-19 network [34], where the first part is a convolution network similar to a CNN, and the second part is a deconvolution network with skip connections. The final output is obtained by a continuous upsampling operation and connecting with the corresponding pooled map.


Figure 4. Four types of geodata: (a) false-color composites of ASTER image (bands 3, 2, and 1) with a spatial resolution of 15 m, provided by the United States Geological Survey; (b) geochemical samples and concentration distribution at a scale of 1:200,000 (taking Fe2O3 as an example), provided by the Chinese National Geochemical Mapping Project; (c) high-precision PALSAR DEM data with a spatial resolution of 12.5 m, provided by the United States Geological Survey; and (d) aeromagnetic data at a scale of 1:200,000, provided by the China Geological Survey.


Figure 5. Maps showing (a) the fused multi-source data from the four types of geodata, in which the spatial resolution of the geochemical and aeromagnetic data was increased from 2 km to 15 m, and (b) the labeled image containing seven lithologic units based on the geological map.


Figure 6. Plot of loss function versus number of iterations for training and verification sets, indicating that the model was optimally tuned.

Figure 7. Classification map obtained by the FCN-8s model and fused data, indicating that almost all identified lithological units were visually consistent with the labeled geological map.

Figure 9. Classification map obtained by the random forest model and fused data. The results show obvious "salt-and-pepper" noise.


Figure 10. Magnified boundaries of lithological units. It is evident that FCN-8s outperformed the random forest model with clear geological boundaries and few misclassifications.
