Retrieval of Soil Heavy Metal Content for Environment Monitoring in Mining Area via Transfer Learning

Abstract: Monitoring environmental pollution sources is an ongoing issue that must be addressed to reduce risks to public health, food safety, and the environment. However, retrieving topsoil heavy metal content at a low cost for environmental monitoring in mining areas is challenging. Therefore, this study proposes a network model based on transfer learning theory and a back propagation (BP) network optimized by a genetic algorithm (GA), taking the Daxigou mining area in Shaanxi Province, China, as a case study. Firstly, visible and near-infrared spectrum data from Landsat8 satellite images, digital elevation models, and geochemical data from field-collected soil samples were used to extract environmental factor candidates indicating the content and spatial distribution of certain heavy metals, including copper (Cu) and lead (Pb). Secondly, each element was correlated with environmental factors and a multicollinearity test was performed to determine the optimal factor set. Then, the BP network optimized by GA was pre-trained with sample data collected in 2017 and retrained with minimal sample data from 2019 using the parameter transfer learning method, allowing spatial distribution mapping of the Cu and Pb content in topsoil of the Daxigou mining area in 2019. In validation against field-collected data, the root mean square error (RMSE) and mean relative error (MRE) of the proposed model were reduced by 4.688 mg/kg and 1.533 mg/kg, respectively, for Cu and by 1.586 mg/kg and 1.232 mg/kg for Pb compared to the traditional GA-BP model. Thus, it can be concluded that the proposed Tr-GA-BP network performs well while requiring only 16 training samples collected in 2019. In addition, Cu has the highest content and Pb the second highest in the study area. Both were spatially distributed mainly in the exploitation area, the slag stacking area, and along roadsides, consistent with field investigation results.


Introduction
Insufficiently treated wastewater, dust, and municipal and industrial waste, especially from mining activities, have caused an increase in the content of heavy metals in soil and groundwater over the past decades. There has been a progressive degradation of the environment and a serious threat to food safety and public health [1,2]. One of the main factors of negative human impact on the natural environment is the release of heavy metals, which pose a serious threat to living organisms [3]. This is an unfavorable and dangerous phenomenon because compounds of such elements as copper, chromium, cadmium, or lead are not biodegradable and accumulate in living organisms, thus passing into the trophic chain and posing a threat to human health and even life. Therefore, efficiently investigating and monitoring the circumstances of soil threatened by heavy metal, especially in mining areas, for pollution control, ecological protection, and public health, is a key scientific problem currently faced by China [4].
To address these issues, scholars have performed many studies. For example, scholars [5,6] used visible/near-infrared data measured by a hand-held spectrometer to construct a regression model for the retrieval of soil heavy metals in a farmland area; Gu et al. [7] combined laser-induced breakdown spectroscopy (LIBS) measurements with laboratory analysis data from soil samples to map the spatial distribution of the soil heavy metal content. However, such site-by-site field measurement is costly for large-scale pollution investigations.
Hyperspectral remote sensing imagery has been proven effective for directly or indirectly reflecting the characteristics of soil- and vegetation-covered surfaces over large areas and within short periods. Tan et al. [8] proposed estimating the spatial distribution of heavy metals in agricultural soils using airborne hyperspectral imaging and random forest. Zhang et al. [9] addressed the same issue for soils of potentially polluted sites based on unmanned aerial vehicle (UAV) hyperspectral imagery. The authors of [10,11] summarized previous research on soil heavy metal content estimation using different data sources and analyzed the ongoing challenges and existing issues.
However, the cost of acquiring hyperspectral imagery with medium or high spatial resolution is usually high. Therefore, some scholars, such as Peng et al. [12], utilized Landsat8 multispectral imagery, spectral indices, and auxiliary environmental variables to model and map the spatial distribution of heavy metals in Qatari soils. In these studies, a research focus is the selection of factors that are effective for the retrieval of soil heavy metal content, so as to compensate for the limited spectral information of multispectral imagery.
As stated by Wang et al. [10], soil is a complex mixed system composed of many components and affected by a large number of environmental variables; thus, it is difficult to explicitly determine the relationship between observations and the content of soil heavy metal from physical and chemical mechanisms alone. Therefore, choosing an effective regression model is also vital to improve the precision of the retrieval of soil heavy metal content. According to previous related publications, e.g., [9,13–15], statistical and machine learning models such as partial least squares regression (PLSR), support vector regression (SVR), M5 model tree, extreme learning machines, random forest, or back propagation are popular for modeling complex quantitative retrieval problems due to their simple structure and low training cost compared with popular deep learning networks.
However, it is not enough to depend on common factors and popular estimation models; more factors reflecting the adsorption or occurrence relationships among organic matter, clay minerals, and other soil parameters should be incorporated. Several authors, e.g., [16–19], have contributed to this issue. The studies [6,8,17] analyzed the reflectance and adsorption mechanism of soil heavy metals based on spectral characteristics of multispectral or hyperspectral data. From the standpoint of the interaction or occurrence relationships, scholars [16,18,19] focused on the interaction between heavy metals and soil constituents such as organic matter, moisture content, and metallic oxides as important factors influencing the potential for soil, crop, and ground water pollution by heavy metals. Considering the interaction between heavy metals and soil constituents to indirectly deduce the content of soil heavy metals is becoming a novel path to solve the problem.
Previous studies have made substantial contributions to research on the retrieval of heavy metal content in topsoil from different viewpoints. Nevertheless, one issue remains: with limited training samples, statistical regression and machine learning models generalize poorly to other similar scenes or to the same scene at different times. Therefore, the difficulties of reducing the data collection cost and updating the operating mode for soil heavy metal content retrieval should be solved. The transfer learning framework was introduced to solve the few-shot learning problem; it does not require large-scale training data and has low learning costs, and it is widely applied in fields with limited data volume. In soil contamination monitoring, collecting a large number of training samples over complex terrain and land cover is costly; moreover, unlike in target detection and classification, training data for machine learning models are rarely publicly available. Therefore, introducing the transfer learning framework into soil heavy metal retrieval is advantageous, because it can reduce the data-collection cost by transferring sample data or prior information from one task to another.
However, only a few studies, such as [20], have investigated the transferability of quantitative retrieval models from one scene to another for soil heavy metal (Pb and Zn) pollution mapping. For quantitative inversion problems in soil contamination monitoring under the transfer learning framework, the open questions are what can be transferred from the source domain to the target domain and how data or knowledge can be transferred when training samples are insufficient. A major difficulty is avoiding negative transfer from the source domain to the target domain while maintaining high regression precision.
In this context, our study proposes an innovative quantitative retrieval method by combining a GA-BP neural network with a parameter transfer learning strategy in order to map the spatial distribution of soil heavy metal in the same area but in different periods, considering Daxigou siderite in Shaanxi in China as a case study. The purpose of this study is to evaluate soil pollution and harm to human health from heavy metal in the mining area and surroundings and to provide assistance for decision making for land degradation control, ecological environment protection, and restoration.

The Study Area
The Daxigou mining area is in Xiaoling Town, Zhashui County, Shangluo City, Shaanxi Province, China, with a designated area of 4.33 km², as shown in Figure 1. It is the largest siderite operation in Shaanxi, accounting for 47.6% of the total iron ore reserves. In 1982, the Northwest Metallurgical Geological Exploration Company found that the Daxigou-Yindongzi deposit is rich in copper, lead, zinc, silver, etc., in and around the mining area [21]. Mining in the area officially began in 1988, and open-pit mining has mainly been used since 2007.
In this study, the mining area and its surroundings, covering 39 km² where the content of copper and lead is relatively high, were taken as the study case. The area consists mainly of medium and low gullies, with large elevation differences and a complex topography, and belongs to the structural erosion landform. The main land use categories are mining area, cultivated land, forestland, grassland, industrial and mining facilities, and residential area. Mining activities have caused heavy metal pollution and ecological and geological environment damage since 1988 [22]. Therefore, it is necessary to investigate and regularly monitor the heavy metal pollution in the area.

Soil Sample Collection
According to the topography, geomorphic characteristics, and land use types in the study area, soil samples were obtained along the three main ridge lines considering their representativeness and the uniform distribution of the sample points. The sampling points were mainly distributed both in the middle of the hillsides where it was possible to reach and in sites close to the valley. The site distribution with a plum blossom shape was designed within a range of 30 m × 30 m from the sampling point center. The accumulation of heavy metals at the bottom of the slope usually was high due to scouring, where the sampling depth is approximately 20-30 cm, while the sampling depth on the middle of the slope was approximately 10-20 cm.
The soil within the 30 m × 30 m coverage was mixed equally and then 1 kg of soil was removed and placed into the sample package. Simultaneously, WGS84 coordinates of the central point of the sampling area were recorded for each sample. In addition, the soil attributes and its environment observations, including the pressure, position, and the land use category in each site, were recorded.
According to the scheme above, 44 and 43 soil samples were collected in the field by professional technicians in October 2017 and October 2019, respectively. The sampling site distribution and the soil samples to be analyzed in the laboratory are shown in Figure 2a.

Soil Sample Analysis and Preprocessing
Each sample collected in 2017 and in 2019 was crushed to remove the remains of animals and plants and then dried, followed by a screening operation with a nylon screen for laboratory analysis.
To determine the heavy metals of interest in the study area, a mixed sample was formed from the 44 samples collected in 2017, and the content of eight heavy metals commonly involved in soil pollution (Hg, As, Cd, Cu, Ni, Zn, Pb, and Cr) was analyzed using the professional instruments and detection methods specified by the China National Environmental Monitoring Center, in the laboratory of the environmental testing center of Guolian Quality Inspection Technology Co., Ltd. in Xi'an, China. Among them, Cu and Pb content were measured using flame atomic absorption spectrophotometry.
By comparing the detected values of each element with their reference values published by State Environmental Protection Administration of China [23], according to the degree to which the detected values exceeded the reference and combined with the enrichment degree of heavy metals in Daxigou-Yindongzi polymetallic ore deposit [21] and the total cost of soil samples to be analyzed in the laboratory, Cu and Pb were determined to be the elements of interest in this study.
Histogram analysis of Cu and Pb was performed separately for the soil samples collected in 2017 and in 2019. The histograms showed that the Cu and Pb content contained abnormal values, which would affect the accuracy of the estimation model; therefore, the maximum abnormal values were eliminated. Consequently, the effective number of samples of Cu and Pb was, respectively, 44 and 40. Finally, each element was examined in more detail based on the correlations between Cu and Pb from the least squares regression analysis.

Remote Sensing Data Preparation
Landsat8 images of the study area in 2017 and 2019 were collected from the U.S. Geological Survey (https://earthexplorer.usgs.gov/, accessed on 28 September 2019), and images with cloud interference were excluded. Because there is less vegetation interference from November to March every year, Landsat8 data from this period are more conducive to satellite observation of soil properties. More importantly, the images acquired during this period are closer to the collection time of the soil samples. Therefore, Landsat8 images acquired in December 2017 and in November 2019 were used and then preprocessed for atmospheric correction using the FLAASH module of the ENVI 5.0 software.
In addition, a 30 m digital elevation model (DEM) product was acquired from the geospatial data cloud website (http://www.gscloud.cn/, accessed on 4 January 2020), and then the slope and aspect data were derived using ArcGIS 10.0 software.

Spectral Factors
Previous studies, e.g., [24], have shown that the spectral curves of heavy-metal-contaminated soil and normal soil have different spectral characteristics. In Landsat8 satellite imagery, heavy-metal-contaminated soil showed strong absorption characteristics in the spectrum range of 400-500 nm, an overall upward trend in reflectance from 500 to 780 nm, a downward trend from 780 to 900 nm, and a rising trend from 1200 to 2500 nm. These results indicated that the above four spectrum ranges were diagnostic ranges to distinguish heavy-metal-contaminated soil from normal soil. In this paper, the reflectance of the B2-B7 bands displayed strong correlations with the Pb content of the soils, while that of the B2-B4 bands showed stronger correlations with the Cu content. Therefore, according to the geomorphic types of the study area, the spectral reflectance of the six Landsat8 bands B2-B7 was selected as candidate factors.
Considering that the heavy metals are often mixed with other soil components and the content of heavy metals contained in soil is usually low, the characteristics of heavy metals in soil are also very weak, especially in satellite imagery; therefore, it is difficult to use the reflectivity or absorption spectrum characteristics of heavy metals to estimate the content of heavy metals in soil. However, the content of soil heavy metals can be indicated indirectly by the adsorption or occurrence relationship among water, clay minerals contained in soil, and environmental factors such as vegetation growth circumstances, topography, and the distance to pollution sources, as referred to by [6,19,25].
Based on the above analysis, eight spectral indices were derived from the spectral reflectance of bands B2-B7 of the Landsat8 images acquired in 2017 and 2019 to reflect the soil properties related to heavy metals. Specifically, the clay mineral ratio (CMR) [26] reflects the clay mineral content in soil, which can indirectly affect the distribution of heavy metals in soil. The modified normalized difference water index (MNDWI) [27] can strengthen the characteristics of soil moisture. In vegetation-covered areas, the vegetation growth conditions reflected by the normalized difference vegetation index (NDVI), difference vegetation index (DVI), and enhanced vegetation index (EVI) can indirectly reflect the type of soil and the content of heavy metals in soil [28]. The greenness, brightness, and wetness components generated by the tasseled cap transformation can discriminate vegetation from soil information. The definition of each spectral index derived from Landsat8 imagery is given in Table 1.
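As an illustration of how five of these indices can be computed from band reflectance, the following is a minimal sketch in Python. The definitions in Table 1 are not reproduced in this text, so the formulas below are the commonly used textbook forms and may differ from those of the study; the function name `spectral_indices`, the `eps` guard, and the sample pixel values are hypothetical.

```python
import numpy as np

def spectral_indices(b2, b3, b4, b5, b6, b7, eps=1e-12):
    """Compute common spectral indices from Landsat 8 OLI reflectance.

    Assumed band roles: b2 blue, b3 green, b4 red, b5 NIR, b6 SWIR1, b7 SWIR2.
    Inputs may be scalars or arrays of identical shape.
    """
    b2, b3, b4, b5, b6, b7 = (
        np.asarray(b, dtype=float) for b in (b2, b3, b4, b5, b6, b7)
    )
    return {
        # Normalized difference vegetation index
        "NDVI": (b5 - b4) / (b5 + b4 + eps),
        # Difference vegetation index
        "DVI": b5 - b4,
        # Enhanced vegetation index (standard coefficients 2.5, 6, 7.5, 1)
        "EVI": 2.5 * (b5 - b4) / (b5 + 6.0 * b4 - 7.5 * b2 + 1.0),
        # Modified normalized difference water index (green vs. SWIR1)
        "MNDWI": (b3 - b6) / (b3 + b6 + eps),
        # Clay mineral ratio (SWIR1 / SWIR2)
        "CMR": b6 / (b7 + eps),
    }

# Hypothetical reflectance values for a single pixel (not study data)
pixel = spectral_indices(0.05, 0.08, 0.10, 0.30, 0.25, 0.20)
for name, value in pixel.items():
    print(f"{name}: {float(value):.4f}")
```

The tasseled cap components are omitted here because their coefficients are sensor-specific and are not given in the text.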

Table 1. Definition of each spectral index derived from Landsat8 imagery (columns: Type, Factors, Definition).

Terrain Factors
Previous studies, e.g., [29], have shown that auxiliary factors such as terrain have a great effect on the spatial distribution of heavy metals in soil. Considering that the study area has high mountains and medium mountains partly covered by vegetation, this study introduced three topography factors (altitude, slope, and aspect, as in Figure 3) to describe the spatial distribution of heavy metals in soil.
As shown in Figure 3, the altitude difference in the study area is large and the slope is steep, which makes the heavy metals in soil at the top of the mining area tend to migrate downward. The slope direction affects vegetation growth and can inhibit rain from washing the heavy metals in soil towards the bottom of the mountain. These auxiliary environmental variables will have a certain impact on the spatial distribution of metals in soil.

Select the Optimal Factors for Each Metal of Interest
To select the optimal factors indicating the content of the two heavy metals, a correlation analysis of the six spectral bands, eight spectral indices, and three topography indicators was performed using the least squares method. Subsequently, a collinearity test was performed. According to the detection criteria, the collinearity between one factor and the others is considered weak if the value of the variance inflation factor (VIF) is less than 10, and a tested factor with high correlation is viewed as one of the optimal indicators. As an exception, the three terrain factors showed low correlation coefficients; however, a multivariate linear regression combining some terrain factors with the chosen spectral reflectance and spectral indices showed an improvement in the coefficient of determination R² and the root mean square error (RMSE).
Therefore, aspect and altitude were added to the set of optimal factors for Cu and Pb. According to the analysis method mentioned above and previous studies (e.g., [13]), the set of optimal spectral factors of Cu and Pb was chosen for 2017 (Table 2) and for 2019 (Table 3). In addition, this study used least squares regression analysis to analyze the correlation between the two metals. The analysis results showed that the correlations between the two metals were greater; thus, the two heavy metals need to be analyzed and estimated separately in the study.
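The VIF criterion used above can be illustrated with a short sketch. For factor j, VIF_j = 1 / (1 - R_j²), where R_j² is the coefficient of determination obtained by regressing factor j on all remaining factors. The function name `variance_inflation_factors` and the synthetic data are hypothetical and are not the factors or values of this study.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_factors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Ordinary least squares of factor j on the other factors plus intercept
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

# Synthetic example: factor c is almost a copy of factor a
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))

# Factors with VIF < 10 pass the collinearity test described in the text
keep = vifs < 10
print(vifs, keep)
```

In this synthetic case only the independent factor b would pass the threshold of 10; in practice, one member of a collinear pair is usually dropped and the test is repeated.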

Construct a Pre-Trained GA-BP Model Using Samples in 2017
The quantitative retrieval tasks in remote sensing applications usually can be viewed as a statistical regression problem. Generally, the statistical learning or shallow machine learning regression models, such as PLSR, SVR, condition rule-based M5 model tree, extreme learning machine, and random forest, have shown the advantages of low training cost and better performance for a local region.
Compared with others, BP networks are popular for solving complex nonlinear regression problems. The network is characterized by signal forward transmission and an error back propagation structure. The network weights are dynamically adjusted with the estimation error by back propagation during gradient descent. However, the method that randomly initializes the weights and thresholds of the original BP network often leads to local optimization [30]. Although the distributionally robust optimization (DRO) algorithm [31] was proposed for different applications, e.g., network behavior analysis and risk management, the genetic algorithm (GA) is popularly used to seek global optimization for nonlinear problems.
In our previous work [14], the BP network optimized by GA was compared to a multivariate linear regression model and an M5 model tree for the retrieval of soil heavy metals in the study area in 2017. The results showed that the selected GA-BP approach performs well. Therefore, this study introduced GA to initialize the weights and thresholds of a three-layer BP network by optimal individual selection to improve the accuracy and stability of the approximation.
The main steps of the GA-BP model are as follows: (1) determine the structure of the BP network; (2) initialize the GA population and train the BP network with training samples; (3) train the GA-BP network. The parameters of the GA-BP network are listed in Table 4. Thus, a GA-BP network suitable for estimating the content of Cu and Pb was established, in which 80% of the soil samples acquired in 2017, selected at random, were used to train the weight parameters of the above GA-BP model.
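The three steps above can be sketched as follows: a genetic search proposes the initial weights and thresholds of a three-layer network, and gradient descent with back-propagated errors then refines them. This is only an illustrative sketch, not the MATLAB implementation of the study; the settings of Table 4 are not reproduced in this text, so the population size, number of generations, mutation rate, learning rate, hidden-layer size, and tanh activation are all assumptions, as are the function names and the synthetic data.

```python
import numpy as np

def unpack(theta, n_in, n_hid):
    """Split a flat parameter vector into weights and thresholds."""
    i = n_in * n_hid
    W1 = theta[:i].reshape(n_in, n_hid)       # input -> hidden weights
    b1 = theta[i:i + n_hid]                   # hidden thresholds
    W2 = theta[i + n_hid:i + 2 * n_hid]       # hidden -> output weights
    b2 = theta[-1]                            # output threshold
    return W1, b1, W2, b2

def forward(theta, X, n_hid):
    W1, b1, W2, b2 = unpack(theta, X.shape[1], n_hid)
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2, H

def mse(theta, X, y, n_hid):
    return float(np.mean((forward(theta, X, n_hid)[0] - y) ** 2))

def ga_init(X, y, n_hid, pop=30, gens=40, p_mut=0.1, seed=0):
    """Genetic search for a starting parameter vector (lower MSE = fitter)."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1] * n_hid + 2 * n_hid + 1
    P = rng.uniform(-1.0, 1.0, size=(pop, dim))
    for _ in range(gens):
        fit = np.array([mse(t, X, y, n_hid) for t in P])
        elite = P[np.argsort(fit)[: pop // 2]]          # selection
        n_kids = pop - len(elite)
        pa = elite[rng.integers(0, len(elite), n_kids)]
        pb = elite[rng.integers(0, len(elite), n_kids)]
        kids = np.where(rng.random(pa.shape) < 0.5, pa, pb)   # crossover
        mutate = rng.random(kids.shape) < p_mut
        kids = kids + mutate * rng.normal(0.0, 0.3, kids.shape)  # mutation
        P = np.vstack([elite, kids])
    fit = np.array([mse(t, X, y, n_hid) for t in P])
    return P[np.argmin(fit)].copy()

def bp_train(theta, X, y, n_hid, lr=0.01, epochs=2000):
    """Refine parameters by gradient descent with back-propagated errors."""
    theta = theta.copy()
    n, n_in = X.shape
    for _ in range(epochs):
        W1, b1, W2, b2 = unpack(theta, n_in, n_hid)
        pred, H = forward(theta, X, n_hid)
        err = 2.0 * (pred - y) / n                 # dMSE / dpred
        gW2 = H.T @ err
        gb2 = err.sum()
        dH = np.outer(err, W2) * (1.0 - H ** 2)    # back through tanh
        gW1 = X.T @ dH
        gb1 = dH.sum(axis=0)
        theta -= lr * np.concatenate([gW1.ravel(), gb1, gW2, [gb2]])
    return theta

# Synthetic data standing in for the optimal factors and measured content
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
theta_ga = ga_init(X, y, n_hid=5)
theta_bp = bp_train(theta_ga, X, y, n_hid=5)
print(mse(theta_ga, X, y, 5), mse(theta_bp, X, y, 5))
```

Keeping the fitter half of the population unchanged in each generation means the best individual is never lost, which is one simple way to realize the "optimal individual selection" mentioned in the text.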

Construct Our Tr-GA-BP Model for Retrieval of Heavy Metals in 2019
To reduce the soil sampling costs for heavy metals in 2019 and to avoid negative transfer from the source domain to the target domain, which would degrade the estimation precision of soil heavy metal content in the study area in 2019, this study proposes the Tr-GA-BP model, which applies a parameter transfer learning strategy to the pre-trained GA-BP network. The idea of the proposed model is that the optimal individuals of the GA-BP network are transferred to the 2019 domain through a similarity analysis between the features of the source domain (the study area in 2017) and those of the target domain (the study area in 2019). The parameters of the GA-BP model are then retrained using a few samples collected in 2019. Here, the gradient descent method is used to optimize the parameters of the pre-trained GA-BP model during the similarity analysis of the features of the source and target domains.
The proposed Tr-GA-BP model is described as follows. Let $D_S = \{(X_S^i, Y_S^i)\}_{i=1}^{M}$ denote the set of samples from the area in 2017, where $(X_S^i, Y_S^i)$ represents the $i$-th sample and $M$ is the number of samples from the source domain. $X_S^i \in \mathbb{R}^L$ is the $L$-dimensional feature vector formed by the optimal factors of the source domain sample, and $Y_S^i \in \mathbb{R}$ is the one-dimensional measured content of Cu or Pb in the soil samples acquired in 2017. Let $W_S^*$ designate the optimal parameter matrix learned from the pre-trained GA-BP model. A similarity coefficient $\beta$ is defined to measure the similarity of parameters between the source domain and the target domain. Let $W_T$ be the parameter matrix of the Tr-GA-BP network in the target domain, defined as in Equation (1), where the initial value $\beta_0$ of $\beta$ is obtained using the grid search algorithm with the range of [0, 0.1, 1].
Then, a small number of samples from the target domain were used to retrain the parameters of the pre-trained GA-BP network. Thus, the matrix $W_T$ was updated together with the similarity coefficient $\beta$, which was optimized using the gradient descent algorithm. Finally, the parameters $\beta$ and $W_T$ of our Tr-GA-BP model could reach the optimum simultaneously.
The construction steps of the Tr-GA-BP model can be described as follows:
(1) Obtain the optimal parameter matrix $W_S^*$ (including weight and threshold parameters) using the pre-trained GA-BP model from the source domain.
(2) Set the initial value of the similarity coefficient $\beta$ to $\beta_0$ within the range [0, 0.1, 1] using the grid search algorithm, and initialize the parameter matrix $W_T$ of the target domain.
(3) Retrain the GA-BP model using a few samples from the target domain and Equation (1) to update the similarity coefficient to its optimum $\beta^*$; the optimal parameter matrix $W_T^*$ is then obtained.
Thus, the Tr-GA-BP model can be formed based on the transfer learning idea for the retrieval of Cu and Pb content in 2019 in the study area.
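The construction steps can be illustrated with a deliberately simplified sketch. Equation (1) is not reproduced in this text, so the initialization $W_T = \beta_0 W_S^*$ used below is purely an assumption, as is the rule of choosing $\beta_0$ as the grid value with the lowest error on the target samples. A linear model stands in for the BP network to keep the example short, and all names and data are hypothetical.

```python
import numpy as np

def fit_gd(X, y, w0, lr=0.05, epochs=300):
    """Gradient descent on the mean squared error of a linear model X @ w."""
    w = np.array(w0, dtype=float)
    n = len(y)
    for _ in range(epochs):
        w -= lr * 2.0 * X.T @ (X @ w - y) / n
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

def transfer(X_t, y_t, w_source, betas=None, **gd_kwargs):
    """Scale the source parameters by beta_0, then fine-tune on target data.

    ASSUMPTION: W_T is initialized as beta_0 * W_S, with beta_0 picked from
    the grid 0, 0.1, ..., 1 by lowest target-sample error.
    """
    if betas is None:
        betas = np.arange(0.0, 1.01, 0.1)
    losses = [mse(X_t, y_t, b * w_source) for b in betas]
    beta0 = float(betas[int(np.argmin(losses))])
    w_target = fit_gd(X_t, y_t, beta0 * w_source, **gd_kwargs)
    return beta0, w_target

# Synthetic source (44 samples) and target (16 samples) domains
rng = np.random.default_rng(1)
w_true = np.array([1.0, 2.0, -1.0, 0.5])
X_s = np.column_stack([np.ones(44), rng.uniform(-1, 1, (44, 3))])
y_s = X_s @ w_true + 0.05 * rng.normal(size=44)
X_t = np.column_stack([np.ones(16), rng.uniform(-1, 1, (16, 3))])
y_t = X_t @ (0.8 * w_true) + 0.05 * rng.normal(size=16)

w_s = fit_gd(X_s, y_s, np.zeros(4), epochs=2000)   # "pre-training"
beta0, w_t = transfer(X_t, y_t, w_s)               # transfer + retraining
print(beta0, mse(X_t, y_t, w_s), mse(X_t, y_t, w_t))
```

Starting the retraining from scaled source parameters rather than from random values is what lets a small target set suffice in this toy setting; whether the same holds for a given pair of domains depends on how similar they are.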
According to the proposed Tr-GA-BP model, the initial value of the similarity coefficient between the source domain and the target domain is set to $\beta_0 = 0.1$, and the optimal weight matrices $W_{T,Cu}^*$ for Cu and $W_{T,Pb}^*$ for Pb are obtained using the samples acquired in 2017 and 2019: The optimal threshold matrices $T_{T,Cu}^*$ and $T_{T,Pb}^*$ for Cu and Pb, respectively, are:

Implementation
As stated in the former section, we obtained a larger set of samples, $D_S$, for Cu and Pb in soil from the source domain in 2017 and a smaller set of samples, $D_T$, from the target domain in 2019. Subsequently, the GA-BP model was pre-trained using 44 samples collected in the study area in 2017. Furthermore, the proposed Tr-GA-BP model was formed by retraining the GA-BP model with only 16 samples from the target domain in 2019. The estimation of soil heavy metal content in the study area in 2019 using the trained Tr-GA-BP model was implemented in the MATLAB programming language under Windows.

Results and Discussion
The content of Cu and Pb at each site in the study area in 2019 was estimated using our Tr-GA-BP model mentioned above; the spatial distribution of the estimated content of both elements was mapped as shown in Figure 4.   As seen from Figure 4, the higher content of Cu and Pb in 2019 mainly was present in the mining area, slag stacking area, and on both sides of the road in the study area. According to the field survey, the ore is always transported from the mining area at the top of the slope to the road in the valley. Due to the accumulation of fallen ore, the metal content of the road is high. Therefore, the spatial distribution of the estimated results of Cu and Pb in the area using our proposed Tr-GA-BP model are consistent with the field validation and the result from our previous study using a different method [13].
To quantitatively verify the effectiveness of the proposed Tr-GA-BP model, the remaining 20% of the samples were used to evaluate the estimation error for the content of Cu and Pb, taking the RMSE and mean relative error (MRE) as measures, as listed in Table 5. For Cu, the RMSE and MRE values using the proposed Tr-GA-BP model were 9.078 and 0.369, respectively, which were 4.688 and 1.533 lower than those of the GA-BP model. For Pb, the RMSE and MRE values were 2.804 and 0.521, respectively, which were 1.586 and 1.232 lower than those of the GA-BP model. Thus, the proposed Tr-GA-BP model, based on transfer learning and prior information, improves the precision of soil heavy metal content estimation when few samples are available from the target domain, and it yields lower estimation errors for Cu and Pb content in topsoil than the traditional GA-BP network.
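For reference, the two error measures can be computed as in the following Python sketch. It assumes the conventional definitions (RMSE as the square root of the mean squared difference, MRE as the mean of the absolute difference divided by the measured value); the exact formulas used for Table 5 are not reproduced in this section, and the sample values below are made up for illustration.

```python
import numpy as np


def rmse(measured, estimated):
    """Root mean square error, in the same unit as the inputs (e.g., mg/kg)."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((estimated - measured) ** 2)))


def mre(measured, estimated):
    """Mean relative error: mean of |estimated - measured| / |measured|."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.mean(np.abs(estimated - measured) / np.abs(measured)))


# Hypothetical validation samples (not data from the study).
measured = [10.0, 20.0]
estimated = [12.0, 18.0]

print(rmse(measured, estimated))  # 2.0
print(mre(measured, estimated))   # approximately 0.15
```

Under these definitions the MRE weights each error by the measured content, so the same absolute error counts more heavily at sites with low content.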
To explore the degree of soil contamination by Cu and Pb, the content of Cu and Pb in 2019 is compared in Table 6 with the reference values (i.e., the maximum and arithmetic mean values of soil element content) published for Shaanxi Province, China. Table 6 shows that both the maximum and the arithmetic mean of the Cu content estimated for 2019 are far greater than the corresponding reference values. For Pb, the estimated maximum is slightly greater than its corresponding reference value, and the mean is close to the reference value.
To further investigate the spatial distribution of both elements, a statistical analysis of the estimated content of Cu and Pb is given in Table 7. From Figure 4 and Tables 6 and 7, the following conclusions can be drawn. The estimated Cu content ranges from 0 to 110 mg/kg, and areas with a Cu content of 50 to 70 mg/kg account for 80.1% of the total study area in 2019. The Pb content in the area ranges from 10 to 50 mg/kg, and areas with a content of no more than 30 mg/kg account for 84.1% of the total area. In addition, the content of Cu is the highest and that of Pb is the second highest in the study area, which is consistent with the geochemical investigation mapping data [21]. Meanwhile, comparing the two estimated elements with the maximum and arithmetic mean reference values of soil elements in Shaanxi Province shows that the content of the two elements in some parts of the study area exceeds the average reference values, which indicates that the soil in some areas has been polluted by these two heavy metals since 1990.
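Area percentages of this kind can be derived from a gridded content map by counting cells. The following Python sketch assumes a raster of estimated content with equal-area cells and NaN marking cells outside the study area; the function names, the tiny grid, and the reference value of 70 are hypothetical and are not taken from Table 6 or Table 7.

```python
import numpy as np


def area_share(content, low, high):
    """Fraction of valid (non-NaN) cells with low <= content <= high."""
    valid = content[~np.isnan(content)]
    return float(((valid >= low) & (valid <= high)).mean())


def exceedance_share(content, reference):
    """Fraction of valid (non-NaN) cells whose content exceeds a reference value."""
    valid = content[~np.isnan(content)]
    return float((valid > reference).mean())


# Hypothetical 2 x 3 grid of estimated content in mg/kg; NaN is outside the study area.
grid = np.array([[55.0, 60.0, 72.0],
                 [48.0, 65.0, np.nan]])

print(area_share(grid, 50.0, 70.0))   # 3 of 5 valid cells
print(exceedance_share(grid, 70.0))   # 1 of 5 valid cells
```

If the cells do not have equal area (for example, in an unprojected grid), each cell would need to be weighted by its area instead of simply counted.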

Conclusions
Retrieving the content of topsoil heavy metals at a lower sample collection cost for environmental monitoring in a mining area, while keeping high estimation precision, is challenging. Taking the Daxigou mining area in Shaanxi, which is located in the Qinling Mountains and covered by vegetation, as a case study, this study introduced transfer learning to construct a Tr-GA-BP network. The network retrieves the content of two heavy metals of interest, Cu and Pb, in soil in 2019. It is based on a GA-BP network pre-trained with Landsat8 multispectral satellite images, DEM, and geochemical data, using more samples collected in 2017 and fewer samples collected in 2019.
Finally, the spatial distribution mapping and content change analysis were conducted using the proposed Tr-GA-BP network. From the validation results using field-collected data, the RMSE and MRE values using the proposed Tr-GA-BP model were, respectively, 9.078 mg/kg and 0.369 mg/kg for Cu, reduced by 4.688 mg/kg and 1.533 mg/kg compared to the GA-BP model. For Pb, the RMSE and MRE values using the proposed Tr-GA-BP model were 2.804 mg/kg and 0.521 mg/kg, respectively, reduced by 1.586 mg/kg and 1.232 mg/kg compared to the GA-BP model. Thus, the proposed Tr-GA-BP model, based on transfer learning and prior information, performs well in improving the estimation precision of Cu and Pb content in soil with only 16 training samples collected in 2019 and is superior to the traditional GA-BP network. In addition, the content of Cu is the highest and that of Pb is the second highest in the study area. Both were mainly distributed in the exploitation and slag stacking areas, along the roadsides, and at the base of the slope, which is consistent with the field investigation results and with the results of our previous study using different methods. This pollution has been endangering the soil, the water, and the health of local residents.
The proposed method is expected to perform better if more soil samples are collected. In the future, the transfer learning strategy should be optimized, and terrain illumination and shadow effects in mountainous areas should be considered, so as to further improve the estimation accuracy of the heavy metal content in soil.

Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data is unavailable due to privacy.