Multi-Scale Stereoscopic Hyperspectral Remote Sensing Estimation of Heavy Metal Contamination in Wheat Soil over a Large Area of Farmland

With the rapid industrialization and urbanization of China, heavy metal pollution in soil has become increasingly prominent and seriously threatens ecosystem safety and human health. Advances in hyperspectral remote sensing make rapid, non-destructive monitoring of soil heavy metal contents possible. This study aimed to fully explore the potential of ground and satellite image spectra for estimating soil heavy metal contents. We chose Xushe Town, Yixing City, Jiangsu Province as the research area, collected farmland soil samples in two different periods, and measured the contents of the heavy metals Cd and As in the laboratory. At the same time, we measured the spectra of wheat leaves under field conditions and obtained HuanJing-1A HyperSpectral Imager (HJ-1A HSI) satellite image data. We first applied various spectral transformation pre-processing techniques to the leaf and image spectral data. Then, we used partial least squares regression (PLSR) optimized with a genetic algorithm (GA) to establish estimation models for the soil Cd and As contents and evaluated the accuracy of the models. Finally, we obtained the best ground and satellite remote sensing estimation models and drew spatial distribution maps of the soil Cd and As contents in the study area.
The results showed the following: (1) spectral pre-processing, including mathematical transformations such as differentiation, can highlight information hidden in the spectra; (2) in both ground and satellite spectral modeling, the GA-PLSR model was more accurate than PLSR, and using a GA for spectral band selection improved the model's accuracy and stability; (3) wheat leaf spectra showed a good ability to estimate soil Cd (relative percent difference (RPD) = 2.72) and an excellent ability to estimate soil As (RPD = 3.25), whereas HJ-1A HSI image spectra could only distinguish high from low values of soil Cd and As (RPD = 1.87 and 1.91, respectively). Therefore, soil Cd and As contents can be estimated indirectly from wheat leaf hyperspectral data, and HJ-1A HSI image spectra can also identify key pollution areas.


Introduction
Soil is the loose, fertile top layer of the earth's surface that supplies water and nutrients to plants. It is not only the foundation for crop growth but also a natural resource essential for human production and life [1,2]. However, with the rapid industrialization and urbanization of China, soil environmental pollution has become increasingly prominent [3]. Among these problems, soil heavy metal pollution is characterized by slow migration, strong toxicity, and irreversibility [4]. Heavy metals gradually accumulate along the food chain and can eventually cause serious health problems when ingested and accumulated by humans [5,6]. Therefore, effective quantitative monitoring of heavy metal contents can help clarify the degree and sources of pollution and provide a theoretical basis for the remediation and management of heavy metal pollution [7-9]. Traditional chemical analysis of soil heavy metal contents is accurate but time-consuming, labor-intensive, and costly, which makes it difficult to meet the needs of large-scale, fast, real-time, and continuous monitoring. With the development of remote sensing technology, hyperspectral remote sensing has advanced rapidly owing to its high spectral resolution and its ability to capture detailed spectral information of objects [10]. It therefore makes the rapid, non-destructive monitoring of soil heavy metal contents possible [11,12].
Agronomy 2023, 13, 2396
After many years of research and development, our team has applied the source-sink theory and independently developed 5S technology (RS, GIS, GPS, ES, IDSS) [13] for the study of soil heavy metal pollution. We have also systematically reviewed the current status, problems, and prospects of soil heavy metal pollution monitoring and ecological remediation [11,14-17] and made significant progress. However, in monitoring and analyzing soil heavy metal contamination, we also encountered problems such as low accuracy, difficulties in validation, and the lack of a systematic monitoring framework [18]. Therefore, ground and aerospace data need to be fully combined to establish an effective integrated monitoring network and verification system for soil heavy metal pollution.
Hyperspectral remote sensing monitoring of soil heavy metal pollution can be divided into direct and indirect monitoring. Direct monitoring relies on the fact that heavy metal elements in the soil are adsorbed by soil components such as organic matter, clay minerals, iron-manganese oxides, and carbonate minerals. These components affect the shape and magnitude of the soil reflectance spectrum and produce specific absorption features in it [15]. Most existing studies have focused on direct monitoring. For example, Zhang et al. [19] constructed estimation models for five soil heavy metals (Cd, Cr, Pb, Cu, and Zn) using partial least squares regression (PLSR), with estimation R² values above 0.7. Qian et al. [20] constructed quantitative estimation models for eight soil heavy metals (Cd, As, Cr, Pb, Ni, Hg, Cu, and Zn) in the farmland soil of Zhangjiagang, Suzhou, China, with estimation R² values all above 0.5. Pyo et al. [21] constructed three hyperspectral quantitative estimation models for the heavy metal contents (As, Pb, and Cu) in the soil of a mining area in the Geum River Basin, South Korea, with estimation R² values all above 0.7. Lin et al. [22] studied the soil in the black soil area of northeast China and constructed models for three heavy metal elements (Cu, Zn, and Mn), with estimation R² values above 0.6 for all models. Zhou et al. [23] studied the soil in the Sanjiangyuan area of China and constructed six hyperspectral monitoring models for soil heavy metals (Cr, Pb, Ni, Cu, Zn, and Mn). They found that spectral transformation methods can improve the correlation between the spectra and the heavy metal contents and thereby improve model accuracy.
Although the direct monitoring method can achieve high model accuracy and usually yields stable models, it has some limitations. On the one hand, soil samples must be collected in the field and then processed in the laboratory before their hyperspectral data can be measured, which is cumbersome and time-consuming. On the other hand, there is still a gap between laboratory soil spectra and remote sensing images, which makes it difficult to transfer such models directly to aerospace images for monitoring soil heavy metal contamination [24,25]. Therefore, using the hyperspectral data of vegetation canopies or leaves to estimate soil heavy metal contents indirectly is more convenient and easier to promote and apply. The main mechanism of this method is that heavy metals migrate from the soil into crops and accumulate there. Under heavy metal stress, the chlorophyll and protein contents of crops are affected to some extent, which is reflected in differences in their reflectance spectra [14]. Some progress has been made in this area. For example, Zhou et al. [26] constructed an estimation model for CaCl2-extractable soil Cd using rice leaf hyperspectral data. Shi et al. [27] established a multivariate hyperspectral vegetation index from rice canopy spectra at the heading and flowering stages to estimate the As content of farmland soil. Wang et al. [28] found that, under different levels of soil Cu stress, the Cu content in the wheat canopy increased with the soil Cu content. Meanwhile, the spectral reflectance of the wheat canopy differed among stress levels, which provided a basis for indirectly estimating the soil Cu content.
As aerospace hyperspectral remote sensing technology matures, it can continuously acquire rich spectral information about the earth's surface at the regional scale. This provides effective data for the real-time, macroscopic, non-destructive, and stereoscopic monitoring of soil heavy metal pollution. Researchers have also begun to explore the potential of aerospace hyperspectral data for monitoring soil heavy metal contents [29,30]. Tan et al. [31] obtained airborne hyperspectral data of a coal mining area and established random forest estimation models for the soil heavy metals Cr, Cu, and Pb, with R² values of 0.75, 0.68, and 0.74, respectively. Yin et al. [32] used GF-5 hyperspectral data to map the spatial distribution of soil Cu contents, and the map identified regions with higher levels of Cu pollution relatively accurately. Zhang et al. [33] used GF-5 hyperspectral data from an open-pit coal mining area to map the spatial distribution of soil Zn, Ni, and Cu contents, with estimation R² values of 0.77, 0.62, and 0.56, respectively. Wang et al. [34] used airborne HyMap hyperspectral images to map the spatial distribution of the soil heavy metals Ni, Zn, and Pb in agricultural areas and simulated three types of spaceborne hyperspectral models.
Our team has carried out a series of studies on heavy metal pollution in farmland soil. (1) Through pot experiments, we studied the migration and transformation of heavy metals in the soil-crop system and analyzed the effects of heavy metal stress on crop height, biomass, chlorophyll content, and nitrogen content [35]. (2) In Zhangjiagang City, Jiangsu Province, we established hyperspectral quantitative estimation models for soil and crop heavy metals, analyzed the spatiotemporal variation of heavy metals in the city, and identified the sources and flow directions of heavy metal pollution [36]. (3) In this study, we moved the study area from Zhangjiagang City to Xushe Town, Yixing City, Jiangsu Province, aiming to fully explore the potential of ground spectra and satellite image spectra for estimating soil heavy metal contents. We collected farmland soil samples in April and May 2019, measured wheat leaf spectra in April, and obtained HuanJing-1A HyperSpectral Imager (HJ-1A HSI) satellite image data in early June. We then applied various spectral transformations to the leaf and image spectra and used a genetic algorithm (GA) to screen the characteristic spectral bands. Next, we used PLSR to establish estimation models for soil cadmium (Cd) and arsenic (As) contents from the differently pre-processed spectra and evaluated model accuracy. Finally, we obtained the best ground and satellite remote sensing estimation models and drew spatial distribution maps of soil Cd and As contents in the study area.

Study Area
The research area is located in Xushe Town, west of Yixing City, Jiangsu Province (Figure 1), at a latitude of about 31° N. The area has a subtropical monsoon climate with distinct seasons and abundant rainfall; the average annual temperature and precipitation are 16.0 °C and 1434.0 mm, respectively. The terrain is higher in the west and lower in the east and consists mainly of plains and hills. Xushe Town is the largest agricultural town in Yixing City, with a cultivated land area of 1.2 × 10⁴ hm², mainly growing rice and wheat. The main soil types are paddy soil, yellow-brown soil, and fluvo-aquic soil.

Sample Collection and Data Determination
Twenty-two sampling sites were evenly distributed across the farmland area of Xushe Town (Figure 1). In April and May 2019, we used the five-point mixing method to collect surface soil samples (0-20 cm) from these 22 sites. During sampling, we recorded the location of each sampling point with a handheld GPS device and sealed the soil samples in plastic bags for transport to the laboratory. In the laboratory, all soil samples were air-dried at 60 °C, and small stones and plant residues were removed. The samples were then ground, passed through a 100-mesh sieve (aperture of 0.15 mm), and divided evenly into two parts. One part was used to measure the pH value with the potentiometric method (NY/T 1377-2007) [37], and the other was used to measure the Cd and As contents by inductively coupled plasma mass spectrometry (ICP-MS) [38].
During the soil sampling period, between 11:00 and 14:00 Beijing time in April 2019, we collected spectral data of wheat leaves with a portable field spectrometer (UniSpec, PP Systems, Haverhill, MA, USA). The spectrometer covers 301-1145 nm with a spectral resolution of 3.3 nm. Under sunny, light-wind conditions, five wheat plants were randomly selected at each sampling point, and three fully expanded leaves were chosen from each plant for spectral measurement. White reference calibration was performed before each measurement, and each leaf was measured five times. Thus, 75 leaf spectra (5 plants × 3 leaves × 5 repetitions) were obtained at each sampling point and averaged to give the wheat leaf spectrum for that point.
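The per-site averaging of the 75 leaf spectra is a plain mean over plants, leaves, and repeated scans. The sketch below assumes, hypothetically, that the scans for one site are stored as a NumPy array of shape (5, 3, 5, n_bands); the band count and reflectance values in the usage are synthetic.

```python
import numpy as np

# Hypothetical storage layout (not the authors' format):
# plants x leaves x repeated scans x bands.
N_PLANTS, N_LEAVES, N_REPEATS = 5, 3, 5


def site_mean_spectrum(scans: np.ndarray) -> np.ndarray:
    """Average all leaf scans at one sampling point into a single spectrum."""
    if scans.shape[:3] != (N_PLANTS, N_LEAVES, N_REPEATS):
        raise ValueError(f"expected leading shape (5, 3, 5), got {scans.shape[:3]}")
    flat = scans.reshape(-1, scans.shape[-1])  # 75 spectra x n_bands
    return flat.mean(axis=0)


# Usage with synthetic reflectance values
rng = np.random.default_rng(0)
scans = rng.uniform(0.05, 0.6, size=(N_PLANTS, N_LEAVES, N_REPEATS, 230))
spectrum = site_mean_spectrum(scans)
print(scans.reshape(-1, 230).shape[0], spectrum.shape)  # 75 (230,)
```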

HJ-1A HSI Data Acquisition and Processing
HJ-1A, China's first environmental satellite, was launched in September 2008. It carries a hyperspectral imager (HSI) with a revisit cycle of 4 days, an imaging swath of 50 km, a spatial resolution of 100 m, and 115 spectral bands between 459 nm and 956 nm. In this study, an HJ-1A HSI image acquired on 3 June 2019 was obtained from the China Center for Resources Satellite Data and Application (https://data.cresda.cn/ (accessed on 3 June 2019)); this date corresponds to the beginning of the crop growth period in the study area. The image was processed with radiometric correction, stripe removal, atmospheric correction, and precise geometric correction before analysis and modeling.

Spectral Pre-Processing
First, the noisy edge bands below 380 nm were removed from the wheat leaf spectra, and the 380-1145 nm range was retained at each sampling point for processing and analysis. Likewise, the HJ-1A HSI image spectra were extracted at the sampling points, the noisy edge bands below 480 nm were removed, and the 480-950 nm range was retained. Savitzky-Golay (SG) smoothing was then applied to both the leaf spectra and the HJ-1A HSI image spectra; the smoothed spectra are referred to as the raw spectrum R. Finally, seven mathematical transformations were applied to the leaf spectrum R: first derivative (FD), second derivative (SD), absorbance transformation (AT), first derivative of absorbance (AFD), second derivative of absorbance (ASD), multiplicative scatter correction (MSC), and standard normal variate (SNV) [39,40]. Five of these transformations (FD, SD, AT, AFD, and ASD) were applied to the image spectrum R.
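The pre-processing chain above can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the authors' R implementation: the SG window length (11) and polynomial order (2) are assumed because the source does not report them, derivatives are taken numerically with respect to wavelength, AT is computed as log10(1/R), and MSC uses the mean spectrum as its reference.

```python
import numpy as np


def savgol_smooth(spectra, window=11, polyorder=2):
    """Savitzky-Golay smoothing along the band axis of a 2-D array (samples x bands).

    window and polyorder are assumed values; the source does not report them.
    Edges are padded by repeating the end values.
    """
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Least-squares polynomial fit within each window: row 0 of the pseudo-inverse
    # gives the weights that return the fitted value at the window centre.
    weights = np.linalg.pinv(np.vander(offsets, polyorder + 1, increasing=True))[0]
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    # Reversing the weights turns np.convolve into a sliding weighted sum.
    return np.stack([np.convolve(row, weights[::-1], mode="valid") for row in padded])


def derivative(spectra, wavelengths):
    """First derivative with respect to wavelength (central differences inside)."""
    return np.gradient(spectra, wavelengths, axis=1)


def msc(spectra):
    """Multiplicative scatter correction using the mean spectrum as reference."""
    ref = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def preprocess(raw, wavelengths):
    """Return R (SG-smoothed) plus the seven transformed forms used for leaf spectra."""
    R = savgol_smooth(raw)
    AT = np.log10(1.0 / R)  # absorbance transformation
    FD = derivative(R, wavelengths)
    AFD = derivative(AT, wavelengths)
    return {
        "R": R,
        "FD": FD,
        "SD": derivative(FD, wavelengths),
        "AT": AT,
        "AFD": AFD,
        "ASD": derivative(AFD, wavelengths),
        "MSC": msc(R),
        "SNV": snv(R),
    }


# Usage on synthetic spectra: 22 samples on an assumed evenly spaced 380-1145 nm grid
wavelengths = np.linspace(380, 1145, 230)
rng = np.random.default_rng(1)
raw = 0.3 + 0.1 * np.sin(wavelengths / 100.0) + rng.normal(0, 0.005, size=(22, 230))
forms = preprocess(raw, wavelengths)
```

For the image spectra, only the FD, SD, AT, AFD, and ASD entries would be used. SciPy's `scipy.signal.savgol_filter` is a library alternative to the hand-written smoother, with more edge-handling options.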

Genetic Algorithm
A GA is an evolutionary algorithm for optimization problems. It imitates natural selection and genetics: each individual is evaluated with a fitness function, and individuals with better fitness are selected, crossed over, and mutated by genetic operators to produce the next generation. This process is repeated until a convergence criterion is met, yielding an optimal or near-optimal solution [41]. Because a GA searches from a population of candidate solutions rather than from a single point, it is less likely than conventional iterative methods to become trapped in local minima.
Before modeling, a GA was used to select feature bands in order to reduce band redundancy and optimize model performance [42]. The GA treats each band as a gene and encodes a subset of bands as a chromosome. An initial population of chromosomes is then generated. Crossover and mutation simulate the genetic and evolutionary processes of natural populations, and a fitness function evaluates the predictive ability of the model built on each band subset. After testing, we set the GA parameters to a population size of 40, a crossover probability of 0.5, a mutation probability of 0.01, and 100 generations, and we ran the GA 10 times to reduce the effect of randomness. The cross-validation root mean squared error (RMSEcv) of the PLSR model was used as the fitness criterion: the lower the RMSEcv, the higher the fitness of the individual.
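The band-selection loop described above can be sketched as a binary-mask GA. This is a minimal NumPy sketch under stated assumptions, not the authors' R code: each chromosome is a boolean mask with one gene per band, parents are chosen by binary tournament, crossover is single-point, mutation flips bits, and the best individual is carried over unchanged (elitism). The parameter defaults are the values reported above. The cost function is pluggable: in the study it would be the PLSR RMSEcv on the selected bands, but the usage substitutes a hypothetical toy cost so the example stays self-contained.

```python
import numpy as np


def ga_select(cost_fn, n_bands, pop_size=40, p_cross=0.5, p_mut=0.01,
              n_gen=100, seed=0):
    """Minimise cost_fn(mask) over boolean band masks; returns (best_mask, history)."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_bands)) < 0.5        # random initial band subsets
    pop[~pop.any(axis=1), 0] = True                     # forbid empty subsets
    costs = np.array([cost_fn(ind) for ind in pop])
    history = [costs.min()]
    for _ in range(n_gen):
        elite = pop[costs.argmin()].copy()              # elitism: keep the best so far
        # Binary tournament selection of parents (lower cost wins)
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((costs[a] <= costs[b])[:, None], pop[a], pop[b])
        children = parents.copy()
        # Single-point crossover on consecutive pairs with probability p_cross
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, n_bands)
                children[i, cut:], children[i + 1, cut:] = (
                    parents[i + 1, cut:].copy(), parents[i, cut:].copy())
        # Bit-flip mutation, then repair empty subsets and reinsert the elite
        children ^= rng.random(children.shape) < p_mut
        children[~children.any(axis=1), 0] = True
        children[0] = elite
        pop = children
        costs = np.array([cost_fn(ind) for ind in pop])
        history.append(costs.min())
    return pop[costs.argmin()], history


# Usage with a toy cost: a hypothetical stand-in for the PLSR RMSEcv
target = np.zeros(30, dtype=bool)
target[[3, 4, 5, 17, 18, 25]] = True


def toy_cost(mask):
    return int(np.sum(mask != target))  # number of mismatched bands


# Ten independent runs (as in the study), keeping the run with the lowest final cost
runs = [ga_select(toy_cost, n_bands=30, seed=s) for s in range(10)]
best_mask, best_hist = min(runs, key=lambda r: r[1][-1])
```

Because the elite is always reinserted, the best cost per generation can never increase, which is a useful sanity check when swapping in a real RMSEcv fitness.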

Partial Least Squares Regression
PLSR is one of the most commonly used methods for estimating soil heavy metal contents from hyperspectral data. It projects the independent and dependent variables onto a new coordinate system, extracts as principal components the linear combinations of the independent variables that best explain the dependent variable, and builds a linear model on these components. This reduces the impact of collinearity and noise and improves the robustness of the model [43]. During PLSR modeling, cross-validation was used to determine the optimal number of principal components. An optimal model has few principal components, a high coefficient of determination, and a low cross-validation root mean squared error.
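A compact single-response PLS (PLS1) with cross-validated choice of the number of components illustrates the procedure above. This sketch assumes the NIPALS algorithm and leave-one-out cross-validation, neither of which the source specifies, and it selects components by lowest RMSEcv alone (ties go to fewer components) as a simplification of the criteria listed above.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """PLS1 regression via NIPALS; returns coefficient vector B and intercept b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)  # weight: direction of maximum covariance with y
        t = Xr @ w                 # scores of the latent (principal) component
        tt = t @ t
        p = Xr.T @ t / tt          # X loadings
        c = (yr @ t) / tt          # y loading
        Xr = Xr - np.outer(t, p)   # deflate X and y before the next component
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients in the original X space
    return B, y_mean - x_mean @ B


def rmse_cv(X, y, n_comp):
    """Leave-one-out cross-validated RMSE for a given number of components."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        B, b0 = pls1_fit(X[keep], y[keep], n_comp)
        errors.append(X[i] @ B + b0 - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))


def choose_n_comp(X, y, max_comp=8):
    """Component count with the lowest RMSEcv; ties go to fewer components."""
    max_comp = min(max_comp, X.shape[1], len(y) - 2)
    scores = {k: rmse_cv(X, y, k) for k in range(1, max_comp + 1)}
    best = min(scores, key=lambda k: (scores[k], k))
    return best, scores
```

Here X would hold the GA-selected bands of one pre-processed spectrum and y the measured Cd or As contents. For production use, scikit-learn's `PLSRegression` (in `sklearn.cross_decomposition`) is a maintained implementation.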
The GA was used to select feature bands for each pre-processed spectrum, and PLSR was used to model the heavy metal contents. The 22 samples were divided into two sets by assigning one of every four samples to validation: 17 samples were used for modeling and 5 samples for external validation of model accuracy.
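The one-in-four split can be sketched as follows. The source states only that one of every four samples went to validation; ranking the samples by measured content first (so that the validation set spans the full range) is an assumption, and the Cd values in the usage are hypothetical.

```python
def split_one_in_four(y, step=4):
    """Return (calibration, validation) index lists.

    Assumption: samples are ranked by measured content before every
    `step`-th one is taken for validation.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])
    validation = sorted(order[step - 1::step])
    chosen = set(validation)
    calibration = [i for i in range(len(y)) if i not in chosen]
    return calibration, validation


# Usage with 22 hypothetical Cd contents (mg/kg)
cd = [0.06 + 0.04 * i for i in range(22)]
cal, val = split_one_in_four(cd)
print(len(cal), len(val))  # 17 5
```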

Model Assessment
This study used the cross-validation coefficient of determination (R²cv) and the root mean squared error of cross-validation (RMSEcv) for model cross-validation. The external validation coefficient of determination (R²ev), the root mean squared error of external validation (RMSEev), and the relative percent difference (RPD) were used for external validation. The closer R²cv and R²ev are to 1, the better the model fit and stability; the lower RMSEcv and RMSEev, the higher the model accuracy. The RPD was interpreted with the five-level scheme proposed by Williams et al. [44]: RPD > 3.00 indicates excellent estimation ability; 2.50 to 3.00, good estimation ability; 2.00 to 2.50, approximate quantitative estimation; 1.50 to 2.00, the ability to distinguish high from low values; and RPD < 1.50, poor estimation ability.
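The assessment statistics can be computed as below. The RPD formula is an assumption, since the source does not define it: the sketch uses the sample standard deviation of the observed values divided by the RMSE, the ratio to which Williams' thresholds are usually applied. Values falling exactly on a threshold (for example 2.50) are assigned to the higher class, which is also an assumption because the stated ranges share their endpoints.

```python
import math


def r2(obs, pred):
    """Coefficient of determination."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rpd(obs, pred):
    """Assumed definition: sample SD of observed values divided by RMSE."""
    mean = sum(obs) / len(obs)
    sd = math.sqrt(sum((o - mean) ** 2 for o in obs) / (len(obs) - 1))
    return sd / rmse(obs, pred)


def rpd_class(value):
    """Five-level interpretation following the thresholds listed above."""
    if value > 3.00:
        return "excellent"
    if value >= 2.50:
        return "good"
    if value >= 2.00:
        return "approximately quantitative"
    if value >= 1.50:
        return "distinguishes high and low values"
    return "poor"
```

Applied to the RPD values reported in the abstract, `rpd_class` places the leaf-spectrum Cd model (2.72) in the "good" class and the As model (3.25) in the "excellent" class.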

Data Analysis Tools
Data sorting and statistical analysis were performed in Microsoft Excel 2016. GA band screening and PLSR modeling of the spectral data were carried out in R 3.5.3 using RStudio (https://posit.co/products/open-source/rstudio/ (accessed on 18 March 2023)). Spectral feature plots and estimation accuracy scatter plots were produced with Origin 2022, and the spatial distribution maps of soil Cd and As contents were created in ArcGIS 10.2 (ESRI Inc., Redlands, CA, USA).

Ground Preparation Experiments
In our previous pot experiment [35], we analyzed the migration and transformation of heavy metals in the soil-crop system and studied how different soil heavy metal contents affected the plant height, biomass, chlorophyll content, and nitrogen content of wheat. The enrichment of heavy metals in wheat followed the order root > leaf > stem > seed. Heavy metals also affected wheat plant height to some extent, and as the soil heavy metal concentration increased, the above-ground biomass, chlorophyll content, and nitrogen content of wheat all decreased. These results indicate that heavy metal pollution in farmland soil harms above-ground crops and that the harm intensifies as pollution increases. This is mainly because heavy metals inhibit photosynthesis and nutrient uptake, thereby impairing crop growth and development and reducing crop yield and quality. Such effects have major implications for food security and human health, and they also provide the basis for indirectly monitoring soil heavy metal pollution through crop spectra.
Through the ground experiment in Zhangjiagang [36], we collected soil and crop samples and measured their heavy metal contents and hyperspectral data.We performed different forms of spectral transformations on the spectral reflectance of soil and crop samples, and selected sensitive bands for 8 heavy metals in soil and 3 heavy metals in crops.Finally, we constructed the best hyperspectral direct inversion model for 8 soil heavy metals and 3 crop heavy metals in Zhangjiagang.

Sky-Air-Ground Integrated Source-Sink Theory
Our previous pot experiments [35] identified the migration and enrichment characteristics of heavy metals in the soil-crop system, and our ground experiments [36] determined the hyperspectral sensitive bands and the best direct inversion models for soil and crop heavy metals. Although direct monitoring can achieve higher model accuracy, it is difficult to apply directly to large-scale monitoring of soil heavy metal pollution with aerospace images [18]. Therefore, we introduce the sky-air-ground integrated source-sink theory and explore the potential of multiscale crop hyperspectral data for indirectly estimating soil heavy metal contents.
In the source-sink theory, the "source" is where a process originates and the "sink" is where it terminates [45]. In the study of soil heavy metal pollution, the "source" refers to the origin of soil environmental pollution and the "sink" refers to the area or ecosystem that absorbs the pollutants [46]. At the regional scale, the formation of soil heavy metal pollution is a process that moves from "source" to "sink". The sources mainly include natural, industrial, domestic, traffic, and agricultural sources. Pollutants from these sources enter the soil through different pathways, such as soil parent material, atmospheric deposition, irrigation and runoff, solid waste, and compost, and accumulate until they form heavy metal contamination.
In the soil-crop system, heavy metals migrate from the soil and accumulate in crops: the soil is the source of pollution, while the leaves, stems, and grains of crops are where the pollutants collect. The more severe the heavy metal stress on crops, the more heavily polluted the soil is likely to be, so crop stress can indirectly reflect the degree of soil heavy metal pollution. Indirect hyperspectral monitoring therefore traces pollution from the "sink" back to the "source" within the soil-crop system. Combining a sky-air-ground integrated multiscale remote sensing monitoring network with the source-sink theory can help monitor and verify soil heavy metal pollution effectively and in real time.

Statistics of Soil Samples
The pH values and soil Cd and As contents at the sampling sites are shown in Table 1. In April, soil pH ranged from 4.03 to 8.54 (mean 6.14), and 77% of the soil samples were acidic. Cd contents ranged from 0.06 to 0.91 mg kg⁻¹ (mean 0.29 mg kg⁻¹), and As contents ranged from 0.96 to 9.81 mg kg⁻¹ (mean 5.18 mg kg⁻¹); the coefficients of variation of Cd and As were 79% and 34%, respectively. In May, soil pH ranged from 4.67 to 7.54 (mean 5.92), and 73% of the soil samples were acidic. Cd contents ranged from 0.07 to 0.82 mg kg⁻¹ (mean 0.31 mg kg⁻¹), and As contents ranged from 2.84 to 8.37 mg kg⁻¹ (mean 5.24 mg kg⁻¹); the coefficients of variation of Cd and As were 58% and 23%, respectively.
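The coefficients of variation above presumably follow the conventional definition; the sketch assumes the sample standard deviation s, since the source does not state which estimator was used:

```latex
\mathrm{CV} = \frac{s}{\bar{x}} \times 100\%, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

For example, a Cd coefficient of variation of 79% with a mean of 0.29 mg kg⁻¹ in April implies a standard deviation of roughly 0.79 × 0.29 ≈ 0.23 mg kg⁻¹.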

Spectral Curve Characteristics
The wheat leaf spectra after different pre-processing methods are shown in Figure 2. The raw spectrum R (Figure 2a) has a reflection peak in the green band at 550 nm and absorption valleys in the blue-violet and red bands near 450 nm and 670 nm, respectively, and reflectance is high in the near-infrared range from 760 nm to 1120 nm. Compared with R, the FD spectrum (Figure 2b) has an absorption valley at 1129 nm and reflection peaks at 519 nm and 705 nm. The SD spectrum (Figure 2c) has an absorption valley at 1119 nm and reflection peaks at 509 nm and 776 nm. The AT spectrum (Figure 2d) has an absorption valley at 550 nm and a reflection peak at 669 nm. The AFD spectra (Figure 2e) have absorption valleys at 516 nm and 696 nm and a reflection peak at 572 nm. The ASD spectra (Figure 2f) have absorption valleys at 506 nm and 686 nm and reflection peaks at 530 nm, 556 nm, 712 nm, and 1119 nm. The MSC spectra (Figure 2g) enlarge the differences among samples in the 380-700 nm range. The SNV spectral curves (Figure 2h) follow the same trend as R but are more tightly clustered, which suggests that the SNV transformation reduces background noise to some extent.
The HJ-1A HSI spectra after different pre-processing techniques are shown in Figure 3. The raw spectrum R (Figure 3a) has a reflection peak in the green band around 550 nm and an absorption valley in the red band around 660 nm, and reflectance is high from 730 nm to 920 nm in the near-infrared. Compared with R, the FD spectrum (Figure 3b) has a reflection peak near 690 nm, and the SD spectrum (Figure 3c) has a reflection peak near 680 nm. The AT spectrum (Figure 3d) has a reflection peak at 673 nm, and its curves are more tightly clustered than those of R, which suggests that the AT transformation reduces background noise to some extent. The AFD spectrum (Figure 3e) has an absorption valley near 690 nm, and the ASD spectrum (Figure 3f) has an absorption valley near 680 nm and several reflection peaks.

Spectral Characteristic Bands
For the wheat leaf spectra, the characteristic bands of soil Cd and As were screened using the GA, as shown in Figure 4a,b. Across the different spectral pre-processing methods, the GA selected 17-25 characteristic bands for soil Cd and 16-30 for soil As from 230 bands. The characteristic bands of soil Cd were mainly concentrated at 385-415 nm, 650-970 nm, and 1030-1090 nm, whereas those of soil As were mainly concentrated at 380-400 nm, 420-430 nm, 500-570 nm, 715-800 nm, and 1000-1050 nm.
For the HJ-1A HSI spectra, the characteristic bands of soil Cd and As were screened using the GA, as shown in Figure 4c,d. Across the different spectral pre-processing methods, the GA selected 14-25 characteristic bands for soil Cd and 22-29 for soil As from 105 bands. The characteristic bands of soil Cd were mainly concentrated at 490-505 nm, 545-555 nm, 565-580 nm, 635-680 nm, and 745-790 nm, whereas those of soil As were mainly concentrated at 480-500 nm, 510-520 nm, 550-600 nm, 610-620 nm, 635-710 nm, 730-810 nm, and 900-930 nm.
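Summaries such as "concentrated at 385-415 nm" group the individually selected bands into contiguous wavelength ranges. A minimal sketch of that grouping step is given below. The function name `group_bands`, the gap tolerance `max_gap`, and the example band list are hypothetical; the study does not state how its ranges were delimited.

```python
from typing import List, Sequence, Tuple


def group_bands(selected_nm: Sequence[float], max_gap: float) -> List[Tuple[float, float]]:
    """Merge selected band centres (nm) into (start, end) ranges.

    Consecutive bands whose spacing is at most `max_gap` nm are placed in the same range.
    """
    ranges: List[Tuple[float, float]] = []
    for w in sorted(selected_nm):
        if ranges and w - ranges[-1][1] <= max_gap:
            # Extend the current range to include this band.
            ranges[-1] = (ranges[-1][0], w)
        else:
            # Gap too large: start a new range.
            ranges.append((w, w))
    return ranges


if __name__ == "__main__":
    # Hypothetical GA-selected band centres, for illustration only.
    picked = [385, 395, 405, 415, 650, 660, 1030, 1050]
    print(group_bands(picked, max_gap=20))
```

The choice of `max_gap` controls how coarse the reported ranges are; it would normally be set to a small multiple of the sensor's band spacing.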

Comparison of GA-PLSR and PLSR Modeling Results
For the wheat leaf spectra, the cross-validation results of the GA-PLSR and PLSR models are shown in Table 2. Compared with the PLSR model, the GA-PLSR model increased R²cv by 5.77% to 45.45% and decreased RMSEcv by 3.45% to 34.15% when estimating the soil Cd content. When estimating the soil As content, the GA-PLSR model increased R²cv by 7.14% to 100.00% and decreased RMSEcv by 3.96% to 69.29%. These results show that selecting spectral wavelengths with the GA before modeling soil heavy metal contents from wheat leaf spectra can improve the model's accuracy and stability.
In Table 2, PC, R²cv, and RMSEcv denote the principal components, the coefficient of determination of cross-validation, and the root mean squared error of cross-validation, respectively. R is the spectrum after Savitzky-Golay smoothing. FD, SD, AT, AFD, ASD, MSC, and SNV are the spectra obtained by applying the first-order derivative, second-order derivative, absorbance transformation, first-order derivative of absorbance, second-order derivative of absorbance, multiplicative scatter correction, and standard normal variate transformation, respectively, to the Savitzky-Golay-smoothed spectra.
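The pre-processing chain defined above (Savitzky-Golay smoothing followed by FD, SD, AT, AFD, ASD, MSC, or SNV) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The study does not report its smoothing parameters, so the 5-point quadratic Savitzky-Golay window is an assumption. Derivatives use simple finite differences, with SD and ASD taken as the derivative of the first derivative. MSC uses the mean spectrum as its reference. All function names and the band spacing `step_nm` are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Spectrum = List[float]

# Savitzky-Golay weights for a 5-point window and quadratic fit (assumed settings); they sum to 35.
SG5_WEIGHTS = (-3, 12, 17, 12, -3)


def sg_smooth(x: Sequence[float]) -> Spectrum:
    """5-point quadratic Savitzky-Golay smoothing; the two bands at each edge are left unchanged."""
    out = list(x)
    for i in range(2, len(x) - 2):
        out[i] = sum(w * x[i + k - 2] for k, w in enumerate(SG5_WEIGHTS)) / 35
    return out


def derivative(x: Sequence[float], step_nm: float) -> Spectrum:
    """Finite-difference derivative: central differences inside, one-sided at the two ends."""
    n = len(x)
    d = [0.0] * n
    d[0] = (x[1] - x[0]) / step_nm
    d[-1] = (x[-1] - x[-2]) / step_nm
    for i in range(1, n - 1):
        d[i] = (x[i + 1] - x[i - 1]) / (2 * step_nm)
    return d


def absorbance(x: Sequence[float]) -> Spectrum:
    """Absorbance transformation A = log10(1/R); reflectance must be strictly positive."""
    return [math.log10(1.0 / v) for v in x]


def snv(x: Sequence[float]) -> Spectrum:
    """Standard normal variate: centre and scale a spectrum by its own mean and standard deviation."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


def msc(spectra: Sequence[Sequence[float]]) -> List[Spectrum]:
    """Multiplicative scatter correction: fit x = a + b * ref against the mean spectrum, return (x - a) / b."""
    ref = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        b = sum((r - ref_mean) * (v - x_mean) for r, v in zip(ref, x)) / sxx
        a = x_mean - b * ref_mean
        corrected.append([(v - a) / b for v in x])
    return corrected


def preprocess(spectra: Sequence[Sequence[float]], step_nm: float) -> Dict[str, List[Spectrum]]:
    """Smooth every spectrum first, then apply each transformation to the smoothed spectra."""
    r = [sg_smooth(s) for s in spectra]
    fd = [derivative(s, step_nm) for s in r]
    at = [absorbance(s) for s in r]
    afd = [derivative(a, step_nm) for a in at]
    return {
        "R": r,
        "FD": fd,
        "SD": [derivative(d, step_nm) for d in fd],
        "AT": at,
        "AFD": afd,
        "ASD": [derivative(d, step_nm) for d in afd],
        "MSC": msc(r),
        "SNV": [snv(s) for s in r],
    }


if __name__ == "__main__":
    toy = [[0.10, 0.12, 0.15, 0.30, 0.45, 0.50, 0.52],   # hypothetical reflectance spectra
           [0.11, 0.13, 0.17, 0.33, 0.49, 0.55, 0.57]]
    for name, transformed in preprocess(toy, step_nm=10.0).items():
        print(name, [round(v, 4) for v in transformed[0]])
```

In practice, library routines such as a Savitzky-Golay filter with configurable window and polynomial order would replace the hand-written smoothing and derivatives.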

Best Estimation Model
The cross-validation and external validation results of the GA-PLSR model are shown in Table 3. For soil Cd content, all seven transformed spectra gave higher accuracy than R. The RPDs of the R, FD, SD, AT, ASD, and MSC spectra were between 1.50 and 2.00, indicating the potential to distinguish between high and low soil Cd contents. The AFD spectra gave an RPD of 2.10, indicating the ability to approximately quantify soil Cd content. The SNV spectra gave the highest model accuracy (RPD of 2.72), indicating a good ability to estimate soil Cd content.
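The accuracy statistics and RPD categories used in this section can be sketched as follows. Two points are assumptions rather than statements from the study. RPD is taken as the sample standard deviation of the measured values divided by the RMSE. The category boundaries (1.5, 2.0, 2.5, 3.0) are inferred from how the reported RPD values are described in the text. The function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def r2(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = mean(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - m) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(mean([(y - p) ** 2 for y, p in zip(measured, predicted)]))


def rpd(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Ratio of the sample standard deviation of the measured values to the RMSE (assumed definition)."""
    return stdev(measured) / rmse(measured, predicted)


def rpd_class(value: float) -> str:
    """Map an RPD value to the qualitative categories used in the text (inferred thresholds)."""
    if value < 1.5:
        return "poor"
    if value < 2.0:
        return "high/low discrimination"
    if value < 2.5:
        return "approximate quantification"
    if value < 3.0:
        return "good"
    return "excellent"


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical measured values
    y_pred = [1.5, 2.0, 3.0, 4.0, 4.5]   # hypothetical model estimates
    value = rpd(y_true, y_pred)
    print(round(r2(y_true, y_pred), 3), round(rmse(y_true, y_pred), 3), round(value, 2), rpd_class(value))
```

How values falling exactly on a boundary (for example, RPD = 2.00) are classified is a convention and may differ between studies.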
For soil As content, all transformed spectra except SD gave higher accuracy than R. The RPDs of the R, FD, SD, and ASD spectra were between 1.50 and 2.00, indicating the potential to distinguish between high and low soil As contents. The RPDs of the AT and SNV spectra were both greater than 2.00, indicating the ability to approximately estimate soil As content. The AFD spectra reached an RPD of 2.56, indicating a good ability to estimate soil As content.
The MSC spectra gave the highest model accuracy (RPD of 3.25), indicating an excellent ability to estimate soil As content.
For the HJ-1A HSI image spectra, the cross-validation results of the GA-PLSR and PLSR models are shown in Table 4. Compared with the PLSR model, the GA-PLSR model increased R²cv by 5.26% to 33.33% and decreased RMSEcv by 0.00% to 23.53% when estimating soil Cd content, and increased R²cv by 2.33% to 50.00% and decreased RMSEcv by 0.83% to 30.97% when estimating soil As content. These results show that selecting spectral wavelengths with the GA before modeling soil heavy metal contents from HJ-1A HSI image spectra can improve the model's accuracy and stability.
The cross-validation and external validation results of the GA-PLSR model are shown in Table 5. For soil Cd content, all five transformed spectra gave higher accuracy than R. The R, SD, ASD, and MSC spectra had RPDs below 1.50, indicating a poor ability to estimate soil Cd content, whereas the AT and AFD spectra had RPDs between 1.50 and 2.00, indicating the possibility of distinguishing between high and low soil Cd contents. The AT spectra gave the highest model accuracy, with an RPD of 1.87. For soil As content, all five transformed spectra also gave higher accuracy than R. The R, FD, SD, and AT spectra had RPDs below 1.50, indicating a poor ability to estimate soil As content, whereas the AFD and ASD spectra had RPDs between 1.50 and 2.00, indicating the possibility of distinguishing between high and low soil As contents. The AFD spectra gave the highest model accuracy, with an RPD of 1.91.
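The percentage improvements reported for Tables 2 and 4 are relative changes of the GA-PLSR statistic with respect to the PLSR baseline, computed for each pre-processing method and then summarized as a range. A minimal sketch of that arithmetic follows, assuming the usual relative-change definition. The function names and the example values are hypothetical, not values from the tables. A reported "decrease" in RMSEcv corresponds to a negative relative change.

```python
from typing import Iterable, Tuple


def pct_change(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline`, in percent (negative = decrease)."""
    return (new - baseline) / baseline * 100.0


def change_range(pairs: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """(min, max) relative change over (baseline, new) pairs, e.g. one pair per pre-processing method."""
    changes = [pct_change(b, n) for b, n in pairs]
    return min(changes), max(changes)


if __name__ == "__main__":
    # Hypothetical (PLSR, GA-PLSR) pairs for two pre-processing methods.
    r2cv_pairs = [(0.40, 0.50), (0.50, 0.55)]
    rmsecv_pairs = [(0.20, 0.15), (0.30, 0.27)]
    print("R2cv change (%):", [round(v, 2) for v in change_range(r2cv_pairs)])
    print("RMSEcv change (%):", [round(v, 2) for v in change_range(rmsecv_pairs)])
```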

Spatial Estimation Results and Validation
The HJ-1A HSI image spectra of the study area were transformed using AT and AFD, respectively. Estimates were then obtained with the optimal GA-PLSR models for soil Cd and As contents, respectively, and outliers were removed. Finally, spatial distribution maps of the Cd and As contents of farmland soils in the study area were produced (Figure 5). The estimated soil Cd contents in the study area range from 0.03 to 0.78 mg kg⁻¹, with a mean of 0.28 mg kg⁻¹ and a standard deviation of 0.19 mg kg⁻¹; high values are mainly distributed in the northwest and southeast. The estimated soil As contents range from 0.02 to 8.71 mg kg⁻¹, with a mean of 4.66 mg kg⁻¹ and a standard deviation of 1.26 mg kg⁻¹.
The estimated soil Cd and As contents in the study area were summarized and compared with the measured values at the sampling sites (Table 6). Compared with the measured values, the estimated Cd contents had smaller maximum, minimum, and mean values and a larger standard deviation, whereas the estimated As contents had a larger maximum and standard deviation and smaller minimum and mean values. Overall, the estimates from the HJ-1A HSI images deviated to some degree from the measured values at the sampling sites: the estimated mean Cd and As contents were 12.5% and 11.1% lower, respectively.
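The post-processing described above consists of removing outliers from the pixel estimates, summarizing the map, and comparing the summary with the sampling-site statistics. The sketch below illustrates these steps. The study does not state how outliers were identified, so the interquartile-range (IQR) rule used here is an assumption. The function names and example values are likewise hypothetical. Negative percentage differences mean that the estimate is lower than the measured value, which corresponds to the reductions reported for the mean contents.

```python
from statistics import mean, quantiles, stdev
from typing import Dict, List, Sequence


def iqr_filter(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Minimum, maximum, mean, and sample standard deviation, as in a Table 6-style comparison."""
    return {"min": min(values), "max": max(values), "mean": mean(values), "sd": stdev(values)}


def pct_difference(estimated: Dict[str, float], measured: Dict[str, float]) -> Dict[str, float]:
    """Signed percentage difference of each estimated statistic relative to the measured one."""
    return {key: (estimated[key] - measured[key]) / measured[key] * 100.0 for key in measured}


if __name__ == "__main__":
    pixels = [0.10, 0.20, 0.20, 0.30, 0.30, 0.40, 5.00]   # hypothetical pixel estimates (mg/kg)
    kept = iqr_filter(pixels)
    est = summarize(kept)
    sites = {"min": 0.12, "max": 0.45, "mean": 0.30, "sd": 0.09}  # hypothetical sampling-site statistics
    print(est)
    print({k: round(v, 1) for k, v in pct_difference(est, sites).items()})
```

The percentage difference is undefined when a measured statistic equals zero, so such cases need separate handling.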
Ground hyperspectral monitoring is the basis of hyperspectral remote sensing studies of soil heavy metals. It offers high timeliness, strong stability, and high resolution, and it provides the most direct and accurate ground observation data. Although direct ground monitoring can achieve higher model accuracy, it is difficult to apply directly to the large-scale monitoring of soil heavy metal pollution with aerospace images [24,25]. Indirect estimation from crop hyperspectral data traces pollution back from the "sink" to the "source"; the underlying mechanism is the transfer and enrichment of pollutants from "sources" to "sinks" in the soil-crop system. Indirect ground monitoring is easy to operate and promote but is suitable only for fixed-point monitoring. Inversion from aerospace hyperspectral data allows large-area, dynamic, and non-destructive monitoring; however, without validation against ground data, the reliability of its results cannot be guaranteed. We therefore adopted an integrated sky-ground, source-sink approach and conducted multi-scale hyperspectral remote sensing monitoring by combining ground and satellite observations. On the one hand, a model for indirectly estimating soil heavy metal contents was constructed from ground crop hyperspectral data. On the other hand, the spatial distribution of heavy metal contamination in farmland soil in the study area was mapped using aerospace hyperspectral data. Moreover, we compared the spatial estimates with the measured values at the sampling sites, which supports the reliability of the results.

Effect of Spectral Pre-Processing and GA-PLSR on Modeling Performance
In the modeling results, most of the pre-processed spectra gave higher accuracy than the raw spectra. This is because noise is easily introduced during spectral acquisition, and spectral pre-processing can reduce this noise and highlight spectral features [47,48]. SNV and MSC were the best transformations for estimating soil Cd and As contents, respectively, from wheat leaf spectra, mainly because both effectively reduce leaf-surface scattering and enhance spectral absorption information [49]. AT and AFD were the best transformations for estimating soil Cd and As contents, respectively, from HJ-1A HSI spectra, mainly because the AT transformation reduces the influence of multiplicative factors caused by changes in lighting conditions, and the FD transformation effectively extracts and amplifies information implicit in the spectra [50,51].
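The role of AT and AFD with respect to multiplicative lighting effects can be made explicit with a short derivation. The derivation makes a simplifying assumption not stated in the study: it models the lighting change as a single positive factor $k$ that does not depend on wavelength. Under that assumption, the logarithm in AT converts the multiplicative factor into an additive offset, and the first derivative in AFD then removes the offset.

```latex
\begin{align}
R'(\lambda) &= k\,R(\lambda), \qquad k > 0 \text{ independent of } \lambda \\
A'(\lambda) &= \log_{10}\frac{1}{R'(\lambda)}
             = \log_{10}\frac{1}{R(\lambda)} - \log_{10}k
             = A(\lambda) - \log_{10}k \\
\frac{\mathrm{d}A'(\lambda)}{\mathrm{d}\lambda} &= \frac{\mathrm{d}A(\lambda)}{\mathrm{d}\lambda}
\end{align}
```

If the lighting factor varies with wavelength, the offset is no longer constant and the derivative removes it only partially.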
Using the GA to select feature bands before PLSR modeling yields higher model accuracy. This is mainly because hyperspectral data are redundant and collinear, so PLSR applied directly to the full spectrum is easily compromised by a large amount of redundant information [52], whereas the GA effectively screens feature bands from the full spectrum, improving the model's accuracy and stability. Zhang et al. [53] and Sun et al. [54] also found that GA-PLSR outperforms PLSR when predicting heavy metal contents from soil hyperspectral data. This method is therefore suitable for the hyperspectral modeling of heavy metal contents in farmland soils.
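GA band selection searches over binary masks, one bit per band, and keeps the masks whose selected bands give the best model. A minimal GA skeleton is sketched below, with tournament selection, one-point crossover, bit-flip mutation, and elitism. The study does not report its GA settings or fitness function, so every parameter here is an assumption. In a GA-PLSR setting, the fitness would typically be a cross-validated PLSR score on the selected bands. Here it is replaced by a hypothetical toy function, `toy_fitness`, so that the example runs on its own.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = band kept, 0 = band dropped


def run_ga(n_bands: int, fitness: Callable[[Mask], float], pop_size: int = 30,
           generations: int = 40, p_cross: float = 0.8, seed: int = 0) -> Tuple[Mask, List[float]]:
    """Binary-mask GA with tournament selection, one-point crossover, bit-flip mutation, and elitism."""
    rng = random.Random(seed)
    p_mut = 1.0 / n_bands  # on average one bit flipped per child
    pop = [[rng.randint(0, 1) for _ in range(n_bands)] for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    history = [fitness(best)]

    def tournament() -> Mask:
        # Pick two random masks from the current population and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask found so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bands)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
        history.append(fitness(best))
    return best, history


# Hypothetical stand-in for a cross-validated PLSR score.
INFORMATIVE = {3, 7, 12}  # hypothetical "informative" band indices


def toy_fitness(mask: Mask) -> float:
    """Reward informative bands and penalize every other selected band."""
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.1 * extras


if __name__ == "__main__":
    best, history = run_ga(20, toy_fitness)
    print("selected bands:", [i for i, g in enumerate(best) if g], "fitness:", round(history[-1], 2))
```

The penalty on extra bands mimics the purpose of band selection, which is to favor compact band subsets that reduce redundancy and collinearity.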

Comparison of the Best Estimates of Wheat Leaf and HJ-1A HSI Spectra
The wheat leaf spectra had a good ability to estimate soil Cd contents (RPD of 2.72), which were estimated relatively accurately at most sample sites (Figure 6a,b). The wheat leaf spectra also had an excellent ability to estimate soil As contents (RPD of 3.25), with the vast majority of sample sites estimated relatively accurately (Figure 6c,d). In contrast, the HJ-1A HSI spectra could only distinguish between high and low values (RPDs below 2.00), and only a small number of sample sites were estimated accurately (Figure 6e-h). This is because the HJ-1A HSI spectra (Figure 3) fluctuated clearly more than the wheat leaf spectra (Figure 2), and their reflection peaks and absorption valleys were not obvious in some band ranges. The wheat leaf and HJ-1A HSI data also showed different characteristic bands for soil Cd and As (Figure 4). These differences arise mainly because the satellite imaging spectrometer is affected by multiple factors during data acquisition, such as the land surface, atmosphere, and environment. This weakens the spectral responses of soil and vegetation in the hyperspectral images and limits the accuracy of extracting soil heavy metal information [24,55].

Innovativeness
Traditional methods for detecting heavy metal contents in soil rely on chemical analyses, which, although accurate, are time-consuming, labor-intensive, and expensive. In comparison, remote sensing allows the large-scale, real-time, and continuous monitoring of land information. Among remote sensing techniques, hyperspectral remote sensing allows the fine characterization of land information and is an important means of monitoring soil heavy metal pollution over large areas [7]. Different remote sensing platforms monitor at different scales: the point scale for ground remote sensing, the field scale for aerial remote sensing, and the regional scale for space remote sensing. Ground remote sensing offers high monitoring accuracy and strong stability, space remote sensing offers wide coverage and real-time monitoring, and aerial remote sensing falls between the two [11,25]. Combining remote sensing methods at different scales therefore makes their advantages complementary and enables the stereoscopic remote sensing monitoring of soil heavy metal pollution. This study combined ground hyperspectral data with satellite hyperspectral images. We not only obtained the best model for indirectly estimating soil heavy metals from wheat leaf spectra but also mapped the spatial distribution of soil heavy metal contents in the study area using HJ-1A HSI image spectra. We also validated the inversion results against the ground sampling sites, and the overall results were good. The multi-scale stereoscopic remote sensing monitoring system for soil heavy metal pollution constructed in this study can therefore meet the need for rapid, real-time, and continuous monitoring over large areas.

Conclusions
Using spectral analysis techniques and modeling methods, this study constructed a multi-scale remote sensing monitoring system for soil heavy metal contamination. We explored the potential of wheat leaf spectra for indirectly estimating soil heavy metal contents and mapped the spatial distribution of soil heavy metal contamination in the study area using HJ-1A HSI spectra. The GA-PLSR model was more accurate than PLSR in both ground and satellite spectral modeling, and using a GA for spectral band selection improved the model's accuracy and stability. Wheat leaf spectra provided a good ability to estimate soil Cd (RPD = 2.72) and an excellent ability to estimate soil As (RPD = 3.25), whereas HJ-1A HSI image spectra only allowed high and low values of soil Cd and As to be distinguished. Therefore, soil Cd and As contents can be estimated indirectly from wheat leaf hyperspectral data, and HJ-1A HSI image spectra can identify areas with key contamination risks. In the future, the methods and ideas of this study could be combined with multi-source spatiotemporal data to further improve the accuracy of soil heavy metal contamination monitoring.

Figure 1. Location of the study area and distribution of sampling sites.

Figure 2. Spectral curve characteristics of wheat leaves under different pre-processing techniques.

Figure 3. Spectral curve characteristics of HJ-1A HSI under different pre-processing techniques.

Figure 5. Spatial distribution map of heavy metal contents in farmland soil in Xushe Town.

Figure 6. Comparison of measured and estimated values from the cross-validation and external validation of the best estimation models for soil Cd and As contents: (a-d) wheat leaves; (e-h) HJ-1A HSI image. The solid line is the fitted line, and the dashed line is the 1:1 line.

Table 1. Statistics of chemical components in soil samples.

Table 2. Cross-validation results of the GA-PLSR and PLSR models for estimating soil heavy metal contents from wheat leaf spectra.

Table 3. Cross-validation and external validation results of the GA-PLSR model for estimating soil heavy metal contents from wheat leaf spectra.

Table 4. Cross-validation results of the GA-PLSR and PLSR models for estimating soil heavy metal contents from HJ-1A HSI image spectra.

Table 5. Cross-validation and external validation results of the GA-PLSR model for estimating soil heavy metal contents from HJ-1A HSI image spectra.

Table 6. Comparison of measured values at sampling sites and spatially estimated values of heavy metals in agricultural soils of Xushe Town.
