Soil is an important resource in the natural environment and for agricultural production; healthy soil is a basic requirement for achieving sustainable agricultural development [1]. Where mineral resources are developed and used, heavy metal(loid) pollution of agricultural soil is one of the most serious problems that mining causes for agricultural production [3]. Heavy metal(loid)s migrate poorly within soil and readily accumulate there, forming heavy metal(loid) pollution [4]. This pollution can reduce the activity of soil microorganisms, lower the yield and quality of crops, degrade water quality, and seriously endanger human health through the food chain [5], which in turn undermines the sustainable development of agriculture. Therefore, it is necessary to monitor heavy metal(loid) pollution in agricultural soil [6].
The traditional method for estimating heavy metal(loid)s in soils is the geochemical method, which is highly accurate but inefficient, expensive, and only suitable for small-scale monitoring [7]. Conversely, remote sensing technology is highly efficient, low-cost, and suitable for macro-scale monitoring [9]. Remote sensing therefore provides an effective alternative for estimating heavy metal(loid) contents in soil.
At present, remote sensing studies mainly rely on one of three approaches: full bands, characteristic bands, or spectral indices [11]. Using the full bands preserves the integrity and continuity of the spectral information, and the resulting models are precise and objective; however, the high spectral resolution introduces data redundancy, and the results are strongly influenced by background noise [11]. Using characteristic bands means modeling the direct relationship between selected bands and the soil heavy metal(loid) content. This approach effectively reduces redundant data, the interference between spectral bands, and the computational cost, but it can easily discard some useful information [13]. Because heavy metal(loid) contents in soils are low, their spectral signals are weak and the characteristic bands are difficult to identify; the correlation between the characteristic bands and the heavy metal(loid) contents is therefore weak, and improving the estimation precision is difficult [14]. Using a spectral index means constructing an index from bands in the visible and near-infrared region and building the model on that index. This approach can effectively reduce the interference of background noise, combine complementary information from different bands, and significantly strengthen the correlation between the spectral variables and the soil heavy metal(loid) contents. It also allows the best bands for predicting soil heavy metal(loid) content to be selected, which compensates for the shortcomings of the full-band and characteristic-band approaches. However, accurate estimation can only be achieved when appropriate indices are established [15]. In summary, choosing an appropriate approach is important when estimating soil heavy metal(loid) contents with remote sensing.
The estimation of heavy metal(loid) contents in soil is also strongly influenced by the mathematical model [16]. At present, heavy metal(loid) contents in soils are usually estimated with empirical statistical methods, including linear and nonlinear models [17]. Linear models mainly include multiple linear regression (MLR) and partial least squares regression (PLSR), which have a simple structure, are fast to compute, and handle high-dimensional data well; however, they are not effective at simulating nonlinear relations [18]. In recent years, nonlinear methods such as random forest regression (RFR) and the adaptive neuro-fuzzy inference system (ANFIS) have been gradually introduced into soil spectral research because of their stability and robustness when modeling nonlinear relations. Their disadvantages include a complicated structure and a long computation time [20].
Heavy metals are elements with a density greater than 4.5 g/cm³, such as nickel, mercury, chromium, and copper. Metalloids, such as arsenic, are elements intermediate between metals and nonmetals that share some properties with metals [23]. Heavy metal(loid)s have strong capacities to migrate, enrich, and contaminate, and may enter the human body through a number of pathways, such as air, water, and the food chain [24]. Recent studies showed that excessive human intake of heavy metals could lead to a higher risk of cancer, as well as chronic adverse effects on the respiratory, circulatory, and nervous systems [25]. For example, high levels of Cu in the body can cause anemia, high cholesterol, bone changes, and damage to the capillaries, liver, kidneys, and stomach [26]. As is one of the most dangerous metalloids found in the Earth's crust and in water, and long-term drinking of As-contaminated water can cause various diseases, such as cancers of various organs, cardiovascular problems, diabetes, and neurotoxicity [27]. Therefore, it is crucial to monitor heavy metal(loid) contents in farmland soil and in water conservation areas near mining areas.
The Suzi River Basin is the main area upstream of the Dahuofang reservoir, which supplies water to cities in Liaoning Province; the basin is also rich in mineral resources. During mining, heavy metal(loid)s are gradually deposited into the soil, which leads to heavy metal(loid) pollution of farmland soil. Among the common heavy metal(loid)s that pollute soil, Ni, Hg, Cr, Cu, and As were used as examples in this study. In addition to simple spectral indices (addition, subtraction, and ratio), advanced spectral indices (normalized difference indices) were introduced, and the indices were used as the independent variables of the models. MLR, PLSR, RFR, and ANFIS were used for modeling, and the relationship between the optimal spectral indices and the heavy metal(loid) contents in soil was discussed. The aim is to provide a scientific and effective basis both for selecting a method of estimating heavy metal(loid) contents in soils and for protecting the water source in this area.
Through the analysis of the original and the mathematically transformed spectra, we found that the spectral resolution and sensitivity of the transformed spectra were significantly improved, and many weak absorption peaks were amplified. These transformations effectively eliminated the influence of the parallel background values and extracted the spectral information. Among the mathematical transformations, the first and second derivatives resulted in the highest model accuracy. In this study, Ni and Cu were estimated more accurately with the first derivative than with the second derivative, whereas Hg, Cr, and As were estimated more accurately with the second derivative. The main reason for this finding is that the first derivative can enhance the spectral information while maintaining its continuity and integrity [57]. For the second derivative, the optimal band combination algorithm was used in this study to process the spectrum, which effectively removed the additional spectral noise introduced by the higher-order derivative. According to the spectral index calculations in Table 4, Table 5, Table 6, Table 7, and Table 8, 21 wavelengths were selected from the spectra for the five heavy metal(loid)s: 468, 606, 734, 825, 1100, 1170, 1184, 1192, 1217, 1477, 1535, 1628, 1732, 2146, 2191, 2251, 2268, 2314, 2388, 2414, and 2424 nm. Some studies showed that these wavelengths are close to the absorption features of hematite, ferrihydrite, Fe³⁺, goethite, illite, and organic matter [17], and are strongly correlated with the spectral characteristics of the N-H, C-H, and O-H groups and with the -OH stretching vibration of water molecules in agricultural soil [14]. The main reason for this is the adsorption of heavy metal(loid)s by organic matter and iron/manganese minerals in soil, which is mainly related to the electronic transition of metal ions [62]. In addition, chemical bond stretching, the -OH stretching vibration of water molecules in soil silicate minerals, and absorption by water molecules also contribute [63].
Spectral transformation is a necessary and effective step in the prediction modelling of soil heavy metal(loid)s [66]. An appropriate transformation can effectively highlight the characteristic bands of the spectrum, make the reflection peaks and absorption valleys of the spectral curve more prominent, and enhance the correlation between the soil spectrum and the heavy metal(loid) content, thereby improving the prediction accuracy of the model. Savitzky-Golay (SG) smoothing as a pre-processing step can eliminate the multiplicative interference of granularity, separating and removing complex effects and leaving accurate soil spectral information [67]. The SG-smoothed spectra were transformed by different mathematical methods, which improved the correlation between the soil spectra and the heavy metal(loid) content, especially for the first and second derivatives. The results showed that the derivative transformation of the spectrum can distinguish similar spectra and effectively highlight the absorption characteristics of the spectrum. However, the integer derivative tends to ignore the progressive changes in the spectrum and curvature in the slit, resulting in the loss of subtle information [68]. Previous studies also showed that combining different spectral pre-treatment methods can improve the prediction accuracy of the model [5], which agrees with the findings of other studies [5].
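The smoothing and derivative transformations discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the processing chain used in this study: the function names `savgol_kernel` and `savgol_transform`, the 11-point window, the second-order polynomial, and the edge padding are all hypothetical choices made for the example.

```python
from math import factorial

import numpy as np


def savgol_kernel(window, polyorder, deriv=0, delta=1.0):
    """Savitzky-Golay weights for the deriv-th derivative at the window center."""
    if window % 2 == 0 or polyorder >= window or deriv > polyorder:
        raise ValueError("need odd window, polyorder < window, deriv <= polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Column j of the design matrix holds offsets**j (local polynomial basis).
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse maps the samples to the fitted
    # coefficient of offsets**deriv; deriv! converts it to a derivative, and
    # delta**deriv rescales from sample units to wavelength units.
    return np.linalg.pinv(design)[deriv] * factorial(deriv) / delta**deriv


def savgol_transform(spectrum, window=11, polyorder=2, deriv=0, delta=1.0):
    """Smooth (deriv=0) or differentiate (deriv=1, 2) a 1-D reflectance spectrum."""
    spectrum = np.asarray(spectrum, dtype=float)
    kernel = savgol_kernel(window, polyorder, deriv, delta)
    # Assumption: repeat the edge values so the output keeps the input length.
    padded = np.pad(spectrum, window // 2, mode="edge")
    # Correlation (not convolution) keeps the kernel orientation unchanged.
    return np.correlate(padded, kernel, mode="valid")


if __name__ == "__main__":
    # Synthetic spectrum sampled every 10 nm, purely for demonstration.
    wavelengths = np.arange(400.0, 2401.0, 10.0)
    reflectance = 0.3 + 0.05 * np.sin(wavelengths / 300.0)
    first = savgol_transform(reflectance, deriv=1, delta=10.0)
    second = savgol_transform(reflectance, deriv=2, delta=10.0)
    print(first.shape, second.shape)
```

Values within half a window of either end are affected by the padding and are best excluded from later analysis. Where SciPy is available, `scipy.signal.savgol_filter` offers the same operation with its own edge-handling options.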
In this study, six optimal spectral indices (DIV, NPDI, RI, DI, NDSI, and SI) were used to investigate the feasibility of using spectral indices for the prediction of soil Ni, Hg, Cr, Cu, and As contents. A correlation between the six spectral indices and soil Ni, Hg, Cr, Cu, and As contents was found. Compared with the correlation of the optimal characteristic bands (Table 2), the correlation between the spectral indices and soil Ni, Hg, Cr, Cu, and As contents was significantly stronger.
Compared with the single-band model, the model accuracy significantly increased. In the single-band model, only the Ni and Cr contents could be estimated (Hg, Cu, and As could not), and even then the model accuracy was low, with accuracy values of 0.424 and 0.361, respectively. The results showed that the modeling accuracy can be significantly improved by using the optimized spectral indices because they effectively reduce the influence of background noise and can detect more detailed spectral characteristics. The optimal spectral indices strengthened the correlation between the dependent and independent variables, thus highlighting information useful for heavy metal(loid) estimation. Therefore, the optimal spectral indices could effectively compensate for the deficiencies of the full-band and characteristic-band methods. Wavelength interaction could also be considered to reduce the influence of irrelevant wavelengths and deal with the overlapping absorption of soil components [70], showing the potential for estimating the content of heavy metal(loid)s in soil by spectral indices.
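The idea of searching band combinations for the index that correlates best with a measured content can be illustrated as follows. This is a hedged sketch, not the optimal band combination algorithm used in this study: the function name `best_band_pair`, the three generic index forms (difference, ratio, normalized difference), and the use of the Pearson correlation as the selection criterion are assumptions made for the example.

```python
import numpy as np


def best_band_pair(spectra, target, index="nd"):
    """Exhaustively search ordered band pairs for the strongest |Pearson r|.

    spectra: array of shape (n_samples, n_bands); target: (n_samples,).
    Returns (i, j, r) with band column indices i, j and the correlation r.
    """
    spectra = np.asarray(spectra, dtype=float)
    target = np.asarray(target, dtype=float)
    n_bands = spectra.shape[1]
    best = (-1, -1, 0.0)
    # Silence divide-by-zero and NaN warnings; such pairs are skipped below.
    with np.errstate(divide="ignore", invalid="ignore"):
        for i in range(n_bands):
            for j in range(n_bands):
                if i == j:
                    continue
                a, b = spectra[:, i], spectra[:, j]
                if index == "difference":
                    values = a - b
                elif index == "ratio":
                    values = a / b
                elif index == "nd":  # normalized difference
                    values = (a - b) / (a + b)
                else:
                    raise ValueError(f"unknown index: {index}")
                r = np.corrcoef(values, target)[0, 1]
                # Keep the first pair that strictly improves on the best |r|.
                if np.isfinite(r) and abs(r) > abs(best[2]):
                    best = (i, j, float(r))
    return best


if __name__ == "__main__":
    # Synthetic reflectances and contents, purely for demonstration.
    rng = np.random.default_rng(0)
    demo_spectra = rng.uniform(0.1, 0.6, size=(40, 8))
    demo_target = demo_spectra[:, 1] / demo_spectra[:, 6]
    print(best_band_pair(demo_spectra, demo_target, index="ratio"))
```

The exhaustive search scales with the square of the number of bands, so for full-resolution spectra a vectorized or subsampled variant would be preferable. Selecting a pair by maximum correlation on the same samples used for modeling can also overstate its strength, which is one reason an independent validation set matters.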
This study showed that the optimal spectral indices can be used successfully to estimate the content of heavy metal(loid)s in agricultural soils and that they markedly improve modeling precision. Among the prediction models established with the MLR, PLSR, RFR, and ANFIS algorithms, MLR is the most commonly used because of its clarity and simple structure [73]. However, PLSR, often described as a second-generation regression method, has better predictive ability than MLR. PLSR combines features of principal component analysis and multiple linear regression, which has the advantage of reducing the constraints on the variables. It builds the regression on mutually independent components, which can effectively eliminate the influence of noise and thereby improve prediction ability [76]. In this study, two relatively new machine learning algorithms (RFR and ANFIS) were also adopted, but their model accuracy was significantly lower than that of MLR and PLSR. The main reason may be that a linear relationship exists between the optimized spectral indices used in this study and the soil Ni, Hg, Cr, Cu, and As contents; some studies also reported a significant linear correlation between spectral bands and the content of heavy metal(loid)s [79]. Although the RFR and ANFIS algorithms are efficient and perform very well on complex nonlinear problems, they tend to lack generalization ability [80]. The small sample size is another possible reason for the limited prediction accuracy of the RFR algorithm in this study. By analyzing the accuracy of the different models, we found that the optimal spectral indices can weaken nonlinear relationships and enhance linear relationships, improving the prediction accuracy of the model. Therefore, estimating soil heavy metal(loid) contents using optimal spectral indices is feasible, which provides technical support for the rapid monitoring of soil heavy metal(loid) contents.
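The linear modeling step can be sketched as an ordinary least-squares fit on index values, evaluated on held-out samples. This is a minimal illustration under assumptions, not the authors' implementation: the function names `fit_mlr`, `predict_mlr`, and `r2_rmse` are hypothetical, the data are synthetic, and the coefficient of determination and root mean square error are assumed here as evaluation metrics.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply fitted coefficients to new index values."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual**2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(np.mean(residual**2))


if __name__ == "__main__":
    # Synthetic stand-in for index values and measured contents (assumption).
    rng = np.random.default_rng(42)
    indices = rng.normal(size=(92, 3))
    content = 20.0 + indices @ np.array([4.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=92)
    # Hypothetical split: first 62 samples for calibration, the rest for validation.
    coef = fit_mlr(indices[:62], content[:62])
    print(r2_rmse(content[62:], predict_mlr(coef, indices[62:])))
```

Reporting the metrics on samples that were not used for fitting, as in the usage above, gives a more honest picture of stability when the sample set is small.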
This study was based on point soil samples, so its scope has some limitations. In future studies, we hope that hyperspectral satellite and UAV remote sensing data can be fully used to obtain soil heavy metal(loid) estimates across the region. Based on these methods, other substances (such as nitrogen, phosphorus, and potassium) could also be studied, which could provide a basis for the comprehensive monitoring and protection of soil. Although we obtained good estimation results in this study, only 92 soil samples were used to develop and verify the model. This small number of samples may have affected the stability of the model, and more samples are needed to improve it further. Some problems have not yet been explored or solved in practical application, such as the influence of soil thickness, soil moisture, soil temperature, and different soil properties on the spectrum, and the correlation between noninteger differential spectra and heavy metal(loid) contents in soil. In summary, the quantitative modeling of soil heavy metal(loid)s using spectral remote sensing still needs further study. Finally, we hope that this study can provide a theoretical basis for the sustainable development of agriculture.
Few previous studies have used optimal spectral indices to predict the content of heavy metal(loid)s in agricultural soils. Taking Ni, Hg, Cr, Cu, and As in the agricultural soil of the Suzi River Basin, Liaoning Province, as an example, we studied the potential of optimal spectral indices for estimating these contents. The results showed that the correlation between the mathematically transformed spectra and the Ni, Hg, Cr, Cu, and As contents in agricultural soil is significantly stronger than that of the original spectra, which indicates that spectral processing is an effective way to eliminate noise and highlight spectral characteristics and is essential in spectral analysis. The correlation between the optimal spectral indices and the contents of Ni, Hg, Cr, Cu, and As in agricultural soils showed that the optimal band combination algorithm could effectively avoid the influence of noise. By using the MLR, PLSR, RFR, and ANFIS algorithms to estimate the Ni, Hg, Cr, Cu, and As contents in the soil, we found a strong linear relationship between the optimized spectral indices and these contents in agricultural soil; the contents were better estimated by MLR and PLSR. The results showed that introducing more spectral indices could weaken the nonlinear relationships and enhance the linear relationships, allowing the contents of Ni, Hg, Cr, Cu, and As in farmland soil to be estimated quickly and accurately. This study also has some limitations. The influence of different soil types, soil temperature, and other conditions on the spectrum was not considered, and more sample data are needed to test the stability of the model. Therefore, in future work, more samples of different soil types will be collected to establish a more scientific and reliable estimation model of soil heavy metal(loid)s.
The findings of this study provide a feasible reference for estimating heavy metal(loid) contents in other regions and for estimating the contents of other heavy metal(loid)s. They also provide a theoretical basis for estimating metal(loid) contents in soil using hyperspectral satellites, which gives the approach strong potential for wider application, and technical support for sustainable agricultural development.