How Spectrally Nearby Samples Influence the Inversion of Soil Heavy Metal Copper

Liu, Yi; Shi, Tiezhu; Chen, Yiyun; Zhang, Wenyi; Yang, Chao; Tang, Yuzhi; Yuan, Lichao; Wang, Chuang; Cui, Wenling

doi:10.3390/land14091830


by Yi Liu ¹, Tiezhu Shi ²,³,*, Yiyun Chen ⁴, Wenyi Zhang ¹, Chao Yang ²,³, Yuzhi Tang ⁵, Lichao Yuan ¹, Chuang Wang ⁶ and Wenling Cui

¹ School of Public Administration, Guangdong University of Finance & Economics, Guangzhou 510320, China

² School of Architecture and Urban Planning, Shenzhen University, Shenzhen 518060, China

³ State Key Laboratory of Subtropical Building and Urban Science & Guangdong-Hong Kong-Macau Joint Laboratory for Smart Cities & Ministry of Natural Resources Key Laboratory for Geo-Environmental Monitoring of Great Bay Area, Shenzhen University, Shenzhen 518060, China

⁴ School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China

⁵ Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, China

⁶ Sociology Department, Lakehead University, Thunder Bay, ON P7B 5E1, Canada

* Author to whom correspondence should be addressed.

Land 2025, 14(9), 1830; https://doi.org/10.3390/land14091830

Submission received: 21 August 2025 / Revised: 5 September 2025 / Accepted: 5 September 2025 / Published: 8 September 2025

(This article belongs to the Special Issue Digital Soil Mapping and Precision Agriculture)


Abstract

Monitoring soil heavy metal contamination in urban land to protect human health requires rapid and low-cost methods. Visible and near-infrared (vis-NIR) spectroscopy shows strong promise for monitoring metals such as copper (Cu). However, an important open question is how “spectrally nearby” samples influence Cu estimation models. This study investigates that issue in depth. We collected 250 soil samples from Shenzhen City, China (the world’s tenth-largest city). During model building, we selected spectrally nearby samples for each validation sample, varying the number of neighbors from 20 to 200 by adding one sample at a time. Results show that, compared with the traditional method, incorporating nearby samples substantially improved Cu prediction: the coefficient of determination in prediction ($R_{p}^{2}$) increased from 0.75 to 0.92, and the root mean square error of prediction (RMSEP) decreased from 8.56 to 4.50 mg·kg⁻¹. The optimal number of nearby samples was 125, representing 62.5% of the 200 calibration samples. Performance followed an L-shaped curve as the number of neighbors increased: rapid improvement at first, then stabilization. We conclude that using spectrally nearby samples is an effective way to improve vis-NIR Cu estimation models. The optimal number of neighbors should balance model accuracy, robustness, and complexity.

Keywords: copper contamination; soil heavy metal; urban environments; environment monitoring

1. Introduction

Heavy metal pollution of soil is a global problem [1,2]. Approximately 2.8 million contaminated sites have been identified in the European Union [3], and in China more than 80% of soil contamination on agricultural land is attributable to heavy metals [4]. Heavy metals are known for their toxicity, which can lead to an increased risk of disease and cancer [5,6]. Moreover, soil stores about 4 trillion tons of carbon, about five times the amount found in the atmosphere (0.8 trillion tons) [7]. Heavy metals can affect soil microorganisms, leading to greater carbon (C) release and less C sequestration, potentially worsening climate change [8,9]. Urban areas differ greatly from natural areas because of extensive human activities. Over half of the global population now lives in cities [10]. Human activities, such as vehicle emissions and industrial processes, are major sources of heavy metals [11]. Heavy metals can contaminate the food chain and drinking water, ultimately affecting humans and animals [12]. Urban areas are particularly vulnerable to heavy metal contamination, so it is crucial to assess heavy metal pollution in urban soils.

Estimating soil levels of heavy metals like Cu with traditional laboratory methods is costly and time-consuming [13]. Copper is one of the major toxic metals in urban soils [14,15]. It is commonly measured by inductively coupled plasma optical emission spectrometry (ICP-OES) [16]. In that process, soil is dissolved in an acidic solution, which is not environmentally friendly [17]. In contrast, visible and near-infrared (vis-NIR) spectroscopy and X-ray fluorescence (XRF) spectroscopy have emerged as rapid and low-cost alternatives to laboratory analyses for estimating soil heavy metal concentrations [18,19]. In this study, we focus only on vis-NIR spectroscopy. Several recent studies illustrate its use. Shi et al. (2024) used vis-NIR spectroscopy to estimate soil Cu and lead (Pb) [20]. Chen et al. (2024) utilized vis-NIR spectroscopy to detect soil chromium (Cr) pollution [21]. Krzebietke et al. (2023) monitored Cu, cadmium (Cd), nickel (Ni), Cr, manganese (Mn), Pb, zinc (Zn), and iron (Fe) in cultivated soils by vis-NIR spectroscopy [22]. Additionally, Wang et al. (2024) reviewed soil heavy metal inversion modeling and reported R² values ranging from 0.52 to 0.91 [23]. Monitoring soil Cu in urban areas requires analyzing many samples, and vis-NIR spectroscopy offers a promising way to handle large quantities efficiently.

The mechanism of vis-NIR spectroscopy for estimating soil properties relies on samples’ absorption and reflectance of vis-NIR light [24,25,26]. For instance, soils high in soil organic matter (SOM) absorb more vis-NIR light and thus exhibit low reflectance. Conversely, a sample with low SOM content will absorb less and reflect more [27]. This means that samples with similar spectra may have similar soil properties: high reflectance corresponds to low SOM, and low reflectance to high SOM. Heavy metals like Cu have few responsive bands, so their estimation depends heavily on other related soil properties, such as SOM or clay content [28,29]. However, the estimation process remains the same: relating spectra to soil properties using multivariate analysis [30]. Just as in SOM estimation, Cu estimation might also benefit from using spectral similarities [31]. Thus, when collecting new samples from urban areas, comparing them to existing calibration samples based on spectral similarity may enhance the accuracy of Cu estimation models.

Previous studies have explored using spectral similarity to predict soil properties. Liu Yande et al. (2024) used spectral similarity to categorize samples, achieving good prediction of soil properties [32]. Viscarra Rossel et al. (2024) enhanced spectral similarity between samples by applying transfer learning [33]. Liu Yi et al. (2024) divided the samples into five groups based on spectral similarity [31]. Other research, including studies by Spiers et al. (2023) [34], Zeng et al. (2023) [35], and Ramirez-Lopez et al. (2013) [36], has focused on algorithms related to spectral similarity. Overall, taking spectral similarity into account often leads to better results [27,37,38]. The remaining issue is how similar the samples used for modeling actually are. Using spectrally nearby (neighbor) samples may be more effective than relying on similarity-based methods for categorizing samples. However, previous studies usually did not use spectrally nearby samples and instead adopted related approaches, such as stratification strategies [39].

Spectrally nearby samples are samples whose spectral characteristics are alike; they are used to predict the Cu content of a target sample. This concept is similar to spatially nearby samples, which are just samples close together on the ground (measured in meters or kilometers). In the same way, spectrally nearby samples are those that share similar spectral reflectance values (%), making it easy to identify a sample’s nearby samples just by looking at the spectral curves. Wang et al. (2024) used nearest neighbor samples based on spectral distance and achieved better prediction of soil properties [40]. Yang et al. (2023) employed spectral neighbor samples to estimate soil organic matter [41]. Tsakiridis et al. (2020) used nearest spectral neighbor samples to predict soil properties [42]. Research shows that using spectrally nearby samples generally leads to improved predictions. This is because spectrally nearby samples tend to share common characteristics in both their spectral data and their environmental factors or parent materials, which strengthens the relationship between spectral data and Cu content [27,43]. As a result, Cu estimations are more accurate when using these spectrally nearby samples [44]. However, few studies have specifically examined using spectrally nearby samples to estimate Cu or show how these samples affect soil Cu estimation models. The optimal number of spectrally nearby samples is also unclear—it is unknown whether adding more always improves predictions or eventually offers no benefit [45].

Shenzhen is a typical coastal city in China that has experienced rapid industrialization and urbanization. Intense human activities, combined with its natural conditions (high temperatures and heavy rainfall), may release large amounts of heavy metals into the soil environment [46]. Heavy metal contamination has raised public concern about environmental safety. Thus, we sampled soils from the city and examined how samples with similar spectral characteristics affect models that estimate Cu in the soil. Specifically, our research addresses two key questions in soil spectroscopy: (1) Does using spectrally nearby samples improve the accuracy of Cu estimation models compared to traditional methods? (2) How does the number of nearby samples affect model performance, and what is the optimal number of samples to use for Cu estimation?

2. Materials and Methods

2.1. Study Area

Shenzhen is located in the Pearl River Delta near the South China Sea (Figure 1). The city extends roughly 97 km east–west and 43 km north–south, with an average elevation of 82 m above sea level (masl). The city has a subtropical monsoon climate, a mean annual temperature of 22.4 °C, and average annual precipitation of 1933 mm. There are over 362 rivers within the city. The dominant soil types are latosolic red soils, red soils, yellow soils, paddy soils, and coastal Solonchaks based on the Genetic Soil Classification of China (GSCC) [31]. In the World Reference Base for Soil Resources (WRB) system, these correspond to Acrisols, Cambisols, Anthrosols, and Solonchaks. The city is well known for its rapid economic and population growth over the past 40 years, which has made it the third largest city in China. However, fast urbanization and industrialization have increased the release of heavy metals into the soils [47]. As a result, urban soils are complex, strongly influenced by human activities, and differ markedly from natural soils. The region’s high temperatures, heavy rainfall, and high humidity promote deposition and accumulation of heavy metals. Soil contamination is an important issue for sustainable development and public health in Shenzhen.

2.2. Soil Sample Collection and Data Acquisition

As shown in Figure 1, 250 samples were collected from the study area in November 2016. The sampling strategy provided even coverage across the study area. We divided the area into 2 × 2 km grid cells and, across five sampling campaigns, collected one soil sample (about 1.5 kg) from a 0–20 cm depth in each cell [48]. Because Shenzhen is highly urbanized and largely covered by impermeable surfaces (concrete, asphalt, and pavers; see Figure 1), we sampled mainly from the limited permeable areas within each cell: green areas along avenues, parks, grasslands, gardens, bare ground, and hillsides. Taking one sample per cell ensured that samples were well spaced. In cells that were inaccessible (e.g., steep hills and lakes), no sample was collected (Figure 1). Since samples were collected from urban areas, artificial deposits were removed. All samples were carefully sealed in bags, labeled with their coordinates and index number, and then sent to the laboratory for analysis.

In the laboratory, the samples were air-dried, ground, and sieved through a 2 mm mesh. Each sample was split into two portions. One portion was digested in concentrated HNO₃ and Cu concentration was measured by ICP-OES [49,50]. The other portion was used for vis-NIR spectral analysis.

The visible and near-infrared spectral reflectance data were collected using an ASD FieldSpec® 3 portable spectroradiometer (Analytical Spectral Devices Inc., Boulder, CO, USA). All measurements were taken in a dark room, with a halogen lamp as the only light source, positioned at a 45° angle above the sample. The sensor probe was held 12 cm above the soil sample at a 90° angle. We acquired 10 scans per sample and averaged them for analysis. Since the probe’s illumination spot can influence the spectrum, the sample remained fixed during all 10 scans. Spectral measurements covered 350–2500 nm with a 1 nm resampling interval.

2.3. Spectrally Nearby Samples

Most previous studies did not select spectrally nearby samples to build models for estimating heavy metal content. In theory, samples that are close in spectral space have similar properties, which could promote the performance of Cu content estimation models. Therefore, this study focuses on using spectrally nearby samples and tests their effect on model construction.

In this study, “spectrally nearby samples” are those whose spectral signatures are most similar to a given validation sample. For each validation sample, we select its nearest neighbors in spectral space and use those neighbors to build the Cu estimation model for that validation sample (Figure 2). For example, as shown in Figure 3, the 20 spectrally closest samples are selected to build the Cu content estimation model. That is, each validation sample is encircled by samples with the most similar spectral signatures. This approach is very different from traditional methods, which typically do not consider whether calibration samples surround the validation sample in spectral space.

To find out whether using more spectrally nearby samples improves model accuracy, we tested different numbers of nearby samples. With 200 calibration samples available, we set the neighbor count between 20 (minimum to avoid unreliable models) and 200 (the maximum). To examine the influence of sample count on Cu estimation modeling, we increased the number by one each time. In other words, we tested with 20, 21, 22, …, up to 200 spectrally nearby samples.
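The sweep over neighbor counts can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the list-of-floats data layout, and the `fit_predict` callable are assumptions introduced here, and the stand-in model simply averages the neighbors' Cu values rather than fitting PLSR. Because the k nearest samples are always a prefix of the distance-ranked list, each validation sample needs to be ranked only once for the whole sweep.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_neighbours(target, calib_spectra):
    """Indices of calibration samples, spectrally nearest first."""
    return sorted(range(len(calib_spectra)),
                  key=lambda i: euclidean(target, calib_spectra[i]))

def sweep_neighbour_counts(val_spectra, calib_spectra, calib_cu,
                           fit_predict, k_min=20, k_max=200):
    """Predict every validation sample for each k in k_min..k_max.

    fit_predict(X, y, target) is a hypothetical callable that trains a
    model on the selected neighbours and returns one prediction.
    Returns {k: [prediction for each validation sample]}.
    """
    k_max = min(k_max, len(calib_spectra))
    # Rank once per validation sample; neighbour sets are nested in k.
    rankings = [rank_neighbours(v, calib_spectra) for v in val_spectra]
    results = {}
    for k in range(k_min, k_max + 1):
        preds = []
        for v, order in zip(val_spectra, rankings):
            idx = order[:k]
            X = [calib_spectra[i] for i in idx]
            y = [calib_cu[i] for i in idx]
            preds.append(fit_predict(X, y, v))
        results[k] = preds
    return results

def mean_of_neighbours(X, y, target):
    """Placeholder model (NOT PLSR): mean Cu of the selected neighbours."""
    return sum(y) / len(y)
```

With the defaults, `range(20, 201)` yields 181 neighborhood sizes, so 181 local models are built for each validation sample.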

To find the spectrally nearby samples, we calculated the spectral distance between samples. We used a simple Euclidean distance, as shown in Equation (1): the reflectance difference between two samples is calculated at each wavelength, squared, and summed over all wavelengths, and the square root of the sum is taken. As shown in Figure 3, these wavelength-by-wavelength differences make spectrally nearby samples easy to identify.

\mathrm{Distance}(i, j) = \sqrt{{(X_{350,i} - X_{350,j})}^{2} + {(X_{351,i} - X_{351,j})}^{2} + {(X_{352,i} - X_{352,j})}^{2} + \dots + {(X_{2500,i} - X_{2500,j})}^{2}}

(1)

In this equation, Distance(i, j) represents the spectral distance between sample i and sample j. $X_{350,i}, X_{351,i}, X_{352,i}, \dots, X_{2500,i}$ are the reflectance values of sample i at wavelengths of 350 nm, 351 nm, 352 nm, …, 2500 nm. Similarly, $X_{350,j}, X_{351,j}, X_{352,j}, \dots, X_{2500,j}$ are the reflectance values of sample j.
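As an illustration of Equation (1), the sketch below computes the spectral distance over the 350–2500 nm grid and returns the k closest calibration spectra. The function names and the flat toy spectra are hypothetical; real spectra would be the averaged ASD scans.

```python
import math

WAVELENGTHS = range(350, 2501)  # 1 nm steps: 2151 bands

def spectral_distance(spec_i, spec_j):
    """Equation (1): square root of the summed squared band-wise differences."""
    if len(spec_i) != len(spec_j):
        raise ValueError("spectra must share the same wavelength grid")
    return math.sqrt(sum((xi - xj) ** 2 for xi, xj in zip(spec_i, spec_j)))

def nearest_neighbours(target, calibration, k):
    """Return (index, distance) pairs for the k calibration spectra closest to target."""
    ranked = sorted(
        ((i, spectral_distance(target, s)) for i, s in enumerate(calibration)),
        key=lambda pair: pair[1],
    )
    return ranked[:k]

# Toy data: flat spectra at different reflectance levels (hypothetical values).
n_bands = len(WAVELENGTHS)
calibration = [[level] * n_bands for level in (0.10, 0.20, 0.30, 0.40)]
target = [0.22] * n_bands
neighbours = nearest_neighbours(target, calibration, k=2)
print([i for i, _ in neighbours])  # indices of the two closest spectra
```

In this toy case the two closest spectra are those at reflectance 0.20 and 0.30, so the selected indices are 1 and 2.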

2.4. Model Construction

As shown in Figure 2, 80% of the 250 samples (200 samples) were assigned to calibration, while 20% (50 samples) were reserved for validation. After ranking all 250 samples by Cu concentration from low to high, we selected every fifth sample for validation. In this way, we obtained validation samples that were uniformly distributed across the Cu range and representative of future samples [51]. This division strategy differs slightly from random selection or the Kennard–Stone (KS) method [40,52,53]. The validation samples remain fixed for testing the model, while the 200 calibration samples serve as the candidate pool from which spectrally nearby samples are selected during model construction.
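The rank-based split can be sketched as below. The function name is hypothetical, and the text does not say which position within each group of five is taken; choosing the 5th, 10th, 15th, … ranked sample is an assumption made here.

```python
def split_every_fifth(cu_values, step=5):
    """Rank samples by Cu and send every `step`-th ranked sample to validation.

    Returns (calibration_indices, validation_indices) into cu_values.
    Assumption: the last sample of each block of `step` is the one selected.
    """
    order = sorted(range(len(cu_values)), key=lambda i: cu_values[i])
    validation = [idx for rank, idx in enumerate(order, start=1)
                  if rank % step == 0]
    chosen = set(validation)
    calibration = [idx for idx in order if idx not in chosen]
    return calibration, validation

# Synthetic Cu values for 250 samples (illustrative only, not the study data).
cu = [float((i * 37) % 251) for i in range(250)]
calibration_idx, validation_idx = split_every_fifth(cu)
print(len(calibration_idx), len(validation_idx))
```

For 250 samples this produces 200 calibration and 50 validation indices, with the validation samples spread evenly along the ranked Cu values.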

For each validation sample, a Cu prediction model was trained using its spectrally nearby samples. As described in Section 2.3 and shown in Figure 2, we tried neighborhood sizes from 20 up to 200, adding one more sample each time. We applied partial least squares regression (PLSR) to perform multivariate regression. Although machine learning and deep learning approaches are becoming more popular in soil science, PLSR remains the most common and stable method [53,54]. PLSR projects the high-dimensional spectral data onto a small number of latent variables and then relates these latent variables to Cu content. This approach is somewhat similar to principal component analysis (PCA), but PLSR chooses its latent variables using Cu content as well as the spectra, so it generally performs better for prediction. We determined the number of latent variables via leave-one-out cross-validation.
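To make the modeling step concrete, here is a minimal pure-Python sketch of single-response PLS (the NIPALS PLS1 algorithm) with leave-one-out selection of the number of latent variables. It is an illustration under stated assumptions, not the software used in the study, which the text does not name; all function names are hypothetical, and an established library implementation would normally be preferred for real spectra.

```python
import math

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    weights, loadings, coefs = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no covariance left to model
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        loadings.append(p_load)
        coefs.append(q)
    return x_mean, y_mean, weights, loadings, coefs

def pls1_predict(model, x):
    """Predict one sample by repeating the projection and deflation steps."""
    x_mean, y_mean, weights, loadings, coefs = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p_load, q in zip(weights, loadings, coefs):
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p_load[j] for j in range(len(e))]
    return y_hat

def loo_rmse(X, y, n_components):
    """Leave-one-out cross-validated RMSE for a given number of components."""
    sq_err = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        sq_err += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(sq_err / len(X))

def choose_components(X, y, max_components):
    """Return (best number of latent variables, {count: LOO RMSE})."""
    scores = {a: loo_rmse(X, y, a) for a in range(1, max_components + 1)}
    return min(scores, key=scores.get), scores
```

In a local-modeling setting, `X` and `y` would be the spectra and Cu values of the selected neighbors of one validation sample, and `choose_components` would be called once per local model. Picking the minimum cross-validated RMSE is only one possible selection rule.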

2.5. Model Evaluation

Several indicators were used to evaluate the performance of the Cu model based on spectrally nearby samples, including the coefficient of determination in prediction ($R_{p}^{2}$), root mean square error of prediction (RMSEP), and residual predictive deviation (RPD). The $R_{p}^{2}$ value normally lies between 0 and 1, and higher values reflect better model quality. RMSEP, measured in mg·kg⁻¹, represents the prediction error, so a lower RMSEP means the model estimates Cu content more accurately. RPD is a dimensionless indicator that allows comparison between different soil properties and different research results. Higher RPD values indicate that the model performs better. The equations for these indicators are as follows:

R_{p}^{2} = 1 - \frac{\sum_{i=1}^{n} {(\hat{y}_{i} - y_{i})}^{2}}{\sum_{i=1}^{n} {(y_{i} - \bar{y})}^{2}}

(2)

\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} {(\hat{y}_{i} - y_{i})}^{2}}{n}}

(3)

\mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSEP}}

(4)

Here, n is the number of samples. For each sample i, $\hat{y}_{i}$ is the predicted value and $y_{i}$ is the measured value. $\bar{y}$ is the mean of the measured values, and SD is the standard deviation of the measured values.
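The three indicators can be computed as in the sketch below. Two choices are assumptions rather than details given in the text: the denominator of $R_{p}^{2}$ is taken as the total sum of squares of the measured values about their mean (the usual definition), and SD is the sample standard deviation (n − 1 denominator) of the measured values.

```python
import math
import statistics

def r2_prediction(measured, predicted):
    """Coefficient of determination in prediction: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(measured)
    ss_res = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_y) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def rmsep(measured, predicted):
    """Root mean square error of prediction, in the units of the measurements."""
    sq_err = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(sq_err / len(measured))

def rpd(measured, predicted):
    """Residual predictive deviation: SD of measured values over RMSEP."""
    return statistics.stdev(measured) / rmsep(measured, predicted)

# Hypothetical Cu values in mg/kg, for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(r2_prediction(measured, predicted),
      rmsep(measured, predicted),
      rpd(measured, predicted))
```

Using the population standard deviation instead would change RPD slightly for small validation sets, so the convention should be reported alongside the values.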

3. Results

3.1. Statistical Description of Soil Samples

Copper concentrations across Shenzhen show considerable variation (20.45–103.24 mg·kg⁻¹), with a mean of 58.29 mg·kg⁻¹ (Table 1). Previous studies in this city have reported mean Cu contents of 14.43 mg·kg⁻¹ [55] and 73.14 mg·kg⁻¹ [56]. The relatively low concentration (14.43 mg·kg⁻¹) is likely because most samples in that study came from natural soils with little human influence. These sites are far away from communities, buildings, roads, and farmland, and many lie in protected ecological areas with strict land-use controls. With minimal human-derived Cu inputs, the measured Cu concentration is close to the background value (17.00 mg·kg⁻¹). Other researchers have also found lower Cu content in natural soils, such as 26.19 mg·kg⁻¹ [57]. In contrast, the mean value of 73.14 mg·kg⁻¹ is closer to our result. Being much higher than the background value, it suggests strong anthropogenic influence, likely from irrigation, construction, industrial activities, and the use of fertilizers and pesticides [2]. Over the past 40 years, more than 10,000 factories have been constructed, and more than 300 buildings taller than 200 m have been built. The city, near 22.5° N, also maintains extensive artificial green areas that likely require fertilizers and pesticides. Because our study area is urban, the Cu concentrations are lower than those reported for mining areas (e.g., 1–5732 mg·kg⁻¹ or 70–2231 mg·kg⁻¹) [58]. Therefore, applying our findings to mining areas will require further study.

The calibration and the validation sets have very similar statistical indicators such as minimum, maximum, mean, standard deviation (SD), and coefficient of variation (CV). This suggests that the samples were successfully divided and that the validation set is representative and likely to cover future samples from the same study area.

3.2. Performance of Cu Models Without Considering Spectrally Nearby Samples

Using the traditional method that does not consider spectrally nearby samples, the Cu estimation model achieved acceptable results (Figure 4). The $R_{p}^{2}$ was 0.75, the RPD was 1.83, and the RMSEP was 8.56 mg·kg⁻¹. Most samples were located within the 95% confidence ellipse and near the fitted line, indicating reasonably robust estimation of Cu. However, some samples were far from the fitted line, suggesting that the model can still be improved. The fitted line was flatter than the 1:1 line (its angle was less than 45°), indicating overestimation of low Cu concentrations and underestimation of high Cu concentrations. As shown in Figure 4a, the region occupied by the calibration samples in spectral space covered that of the validation samples.
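Why a fitted line flatter than the 1:1 line implies this pattern can be seen from a short worked relation. The symbols a (intercept), b (slope), and y* (crossing point) are introduced here for illustration and are not reported in the text; the argument assumes a > 0 and 0 < b < 1.

```latex
\[
\hat{y} = a + b\,y, \qquad 0 < b < 1
\quad\Longrightarrow\quad
\hat{y} - y = a - (1 - b)\,y .
\]
\[
\hat{y} > y \ \text{ for } \ y < y^{*}, \qquad
\hat{y} < y \ \text{ for } \ y > y^{*}, \qquad
y^{*} = \frac{a}{1 - b} .
\]
```

Samples with measured Cu below the crossing point y* are therefore predicted too high, and samples above it are predicted too low.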

3.3. Model Performance Using Different Numbers of Spectrally Nearby Samples

Including more spectrally nearby samples generally led to higher $R_{p}^{2}$ (Figure 5a). With just 20 spectrally nearby samples, the $R_{p}^{2}$ reached 0.86. When the sample size increased from 20 to 45, the $R_{p}^{2}$ rose to 0.89. However, from 45 to 60 samples, the $R_{p}^{2}$ decreased. As the sample size expanded from 60 to 125 samples, the $R_{p}^{2}$ climbed to 0.92, indicating the selection of more crucial samples. Once the sample count exceeded 125, the $R_{p}^{2}$ showed little change, implying only minor enhancements to the Cu models.

Incorporating additional spectrally nearby samples resulted in lower RMSEP (Figure 5b). Using 20 spectrally nearby samples, the RMSEP equaled 6.08 mg·kg⁻¹. From 21 to 45 samples, RMSEP decreased to 5.20 mg·kg⁻¹, showing a considerable enhancement of the Cu prediction model. However, from 45 to 60 samples, the RMSEP increased to 5.34 mg·kg⁻¹. Increasing the sample size from 60 to 125 samples lowered the RMSEP to 4.50 mg·kg⁻¹, indicating a notable decrease. Above 125 samples, RMSEP stabilized, showing only slight variability even when additional samples were selected.

The RPD fluctuated in a wave-like pattern as additional spectrally nearby samples were chosen (Figure 5c). Between 20 and 45 samples, the RPD increased, but it decreased between 45 and 60. From 60 to 125, the RPD alternately increased and decreased, reaching about 3.48 at 125 samples; beyond 125 samples, the improvement became less pronounced. At approximately 195 samples, by which point nearly all calibration samples had been used, the RPD reached 3.85.

Overall, the Cu estimation model exhibited an L-shaped trend: performance improved sharply at first, then leveled off. Increasing nearby samples from 20 to 45 improved the model ($R_{p}^{2}$: 0.86 → 0.89; RMSEP: 6.08 → 5.20 mg·kg⁻¹; RPD: 2.57 → 3.01). Between 45 and 60 samples, the model’s performance decreased slightly ($R_{p}^{2}$ = 0.88, RMSEP = 5.34 mg·kg⁻¹, RPD = 2.92). However, from 60 to 125, the model improved again ($R_{p}^{2}$ = 0.92, RMSEP = 4.50 mg·kg⁻¹, RPD = 3.48). With more than 125 samples, the Cu models stabilized with slight variability.

Compared with the traditional Cu model ($R_{p}^{2}$ = 0.75; RMSEP = 8.56 mg·kg⁻¹; RPD = 1.83; Figure 4), models built from spectrally nearby samples showed significant improvement. Training with only 20 spectrally nearby samples yielded a Cu model with $R_{p}^{2}$ = 0.86, RMSEP = 6.08 mg·kg⁻¹, and RPD = 2.57, outperforming the conventional 200-sample approach. Using 125 samples, the $R_{p}^{2}$ rose to 0.92, RMSEP fell to 4.50 mg·kg⁻¹, and RPD reached 3.48; the RMSEP was nearly 50% lower than that of the traditional approach without spectrally nearby samples.
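The size of the improvement can be checked directly from the reported (rounded) values. The second line is a consistency check added here: since RPD = SD / RMSEP, the product RPD × RMSEP should give the same SD for both models if they were evaluated on the same fixed validation set.

```latex
\[
\frac{8.56 - 4.50}{8.56} = \frac{4.06}{8.56} \approx 0.474
\quad \text{(a 47.4\% reduction in RMSEP)}
\]
\[
\mathrm{SD} = \mathrm{RPD} \times \mathrm{RMSEP}: \qquad
1.83 \times 8.56 \approx 15.66, \qquad
3.48 \times 4.50 = 15.66 \ \ \mathrm{mg\,kg^{-1}}
\]
```

Both products agree at about 15.7 mg·kg⁻¹, as expected when the validation samples are held fixed.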

Based on the results in Figure 5, 125 samples were identified as the optimal number of spectrally nearby samples for estimating Cu concentrations. This value was selected because 125 samples delivered accurate outcomes ($R_{p}^{2}$ = 0.92, RMSEP = 4.50 mg·kg⁻¹, RPD = 3.48), and only slight improvements were observed when more samples were added, as shown by the fitted line (blue dotted line). Therefore, using 125 samples provided a good balance between sample size and model accuracy.

When there were 125 spectrally nearby samples, each validation sample was surrounded by neighboring calibration samples in spectral space (Figure 6). Figure 6a shows that the raw spectral curve of each validation sample was surrounded by those of its neighboring samples. For further analysis, we calculated and displayed the mean reflectance in two wavelength ranges: 350–1400 nm and 1401–2500 nm (Figure 6b). In this view, the validation samples were also surrounded by calibration samples. Moreover, the calibration samples formed an elliptical pattern rather than a circular one. This is due to the strong correlation (r = 0.86) between the reflectance in the 350–1400 nm and 1401–2500 nm ranges. In other words, a sample with low mean reflectance in the 350–1400 nm range also tends to have low mean reflectance in the 1401–2500 nm range. Consequently, the samples were distributed along a line (a 1:1 line at a 45° angle) within the elliptical pattern.
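The two-range summary behind Figure 6b can be sketched as follows: average each spectrum over 350–1400 nm and 1401–2500 nm, then correlate the two means across samples. The function names and the synthetic spectra are hypothetical; the study's spectra would give the reported r = 0.86 rather than the perfect correlation of this toy data.

```python
import math

def band_mean(spectrum, wavelengths, lo, hi):
    """Mean reflectance over the wavelengths lo..hi (inclusive)."""
    vals = [r for w, r in zip(wavelengths, spectrum) if lo <= w <= hi]
    return sum(vals) / len(vals)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def two_range_summary(spectra, wavelengths):
    """Return per-sample means in both ranges and their correlation."""
    short_range = [band_mean(s, wavelengths, 350, 1400) for s in spectra]
    long_range = [band_mean(s, wavelengths, 1401, 2500) for s in spectra]
    return short_range, long_range, pearson_r(short_range, long_range)

# Synthetic spectra: each sample is flat within each range (illustration only).
wavelengths = list(range(350, 2501))
spectra = [[0.1 * k] * 1051 + [0.2 * k] * 1100 for k in range(1, 5)]
short_range, long_range, r = two_range_summary(spectra, wavelengths)
print(round(r, 3))
```

Plotting `short_range` against `long_range` gives the two-dimensional view in which a strong positive correlation stretches the point cloud into an ellipse along the diagonal.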

As shown in Figure 6c, the Cu model trained on 125 samples exhibited excellent prediction ability ($R_{p}^{2}$ = 0.92; RMSEP = 4.50 mg·kg⁻¹; RPD = 3.48). Most samples cluster around the 1:1 line and sit inside the 95% confidence ellipse. These results demonstrate the benefits of using spectrally nearby samples.

4. Discussion

4.1. Assessment of Urban Soil Cu Content Using Vis-NIR Spectroscopy

Contamination of urban soils by heavy metals has become a worldwide environmental concern, prompting many researchers to use vis-NIR spectroscopy to estimate metal concentrations [59]. In our study, we successfully estimated soil Cu levels in urban areas, achieving good results with $R_{p}^{2}$ = 0.92, RMSEP = 4.50 mg·kg⁻¹, and RPD = 3.48. Other studies have also demonstrated good performance in Cu estimation, such as R² = 0.90 [60], R² = 0.92 [22], and R² = 0.96 [61]. However, some researchers have reported poor performance in Cu estimation, with R² = 0.44 [62], R² = 0.46 [63], R² = 0.64 [64], and R² = 0.66 [65]. The varying performance of Cu estimation models indicates that the feasibility of applying vis-NIR spectroscopy needs further evaluation to identify the causes of these discrepancies. Factors such as the spatial extent of the study area, the range of Cu content, and various environmental conditions contribute to these outcomes. For example, the soil in our urban study area is extensively influenced by human activities and covers a large spatial extent with a wide range of Cu concentrations. Therefore, our study provides a valuable case for monitoring urban soil heavy metal contamination through vis-NIR spectroscopy.

To understand the underlying mechanism of estimating Cu using spectroscopy, we performed selectivity ratio analysis to evaluate the usefulness of each wavelength [66]. In our study, the selectivity ratio ranged from 0 to 0.5 (Figure 7). A higher selectivity ratio indicates that a given wavelength is more useful for Cu estimation. Wavelengths between 378 and 525 nm showed high selectivity ratios (>0.1). Similarly, other researchers have reported strong positive correlations between Cu and spectra in the 460–590 nm range [62]. This correlation is attributed to the presence of Fe oxides and SOM [67]. The second peak in the selectivity ratio occurs around 1400 nm, which is consistent with the secondary Al-OH absorption feature of clay minerals [63]. Additionally, the high selectivity ratio at ~2150 nm arises from absorption by silica-alumina minerals.

4.2. How Spectrally Nearby Samples Affect Soil Cu Estimation Accuracy

Compared with conventional Cu estimation models, our approach of using spectrally nearby samples demonstrates significant improvement. Specifically, $R_{p}^{2}$ rose from 0.75 to 0.92, RMSEP dropped from 8.56 to 4.50 mg·kg⁻¹, and RPD rose from 1.83 to 3.48 (Figure 4b and Figure 6c). This improvement can be attributed to the enhanced spectral similarity between calibration and validation samples achieved by using spectrally nearby samples. Spectrally similar samples are more likely to improve the Cu estimation accuracy. Previous studies have reported similar findings. Wang et al. (2024) improved the SOM estimation model’s R² from 0.49 to 0.68 by using spectrally nearest samples [40]. Shi et al. (2015) reduced RMSE from 0.66% to 0.60% for SOM estimation using spectrally nearby samples [45]. Our improvement is larger, which we attribute to our controlled sampling strategy. In our study, samples were collected using a grid-based method (Figure 1), which increased the likelihood of finding spectrally nearby samples that shared common soil properties. Additionally, we focused extensively on the impact of spectrally nearby samples, providing a clearer understanding of their influence on Cu estimation models. As a result, when traditional heavy metal estimation methods underperform, we recommend considering spectrally nearby samples as a practical strategy to improve model performance.

When considering spectral similarity, most researchers use clustering, subsets, or stratification methods rather than spectrally nearby samples. For example, Liu et al. (2024) classified samples into five groups based on spectral similarity [31]. Shoshany et al. (2022) divided samples into five clusters by k-means clustering [43]. In fact, these subsets or grouping methods are different from the use of spectrally nearby samples. As shown in Figure 8, spectral subsets or clusters are much rougher than the spectrally nearby samples approach. Using subsets or groups does not guarantee that a validation sample will be surrounded by its nearest neighbors. As illustrated in Figure 8, a validation sample located at the edge or far from the center of a subset can be quite different from other samples in the same subset. In contrast, the spectrally nearby samples approach ensures that a validation sample is positioned at the center, surrounded by its most spectrally similar neighbors. This provides a clear advantage over spectral clustering or stratification methods. The study by Liu et al. (2024) provides evidence—their model performance actually worsened when using spectral subsets, with R² decreasing from 0.74 to 0.71 [31]. Therefore, it is recommended to use spectrally nearby samples rather than spectral subsets or stratifications.

As the number of spectrally nearby samples increased, the Cu estimation model performance followed an L-shaped pattern: it improved initially and then stabilized (Figure 5). This indicates that adding a few nearby samples can substantially enhance the model’s accuracy, but adding too many yields only marginal gains. A similar L-shaped pattern was observed by Wang et al. (2024) [40], as well as by Liu et al. (2019) [68], Debaene et al. (2014) [69], and Grinand et al. (2012) [70]. In contrast, some studies, such as Shi et al. (2015) [45] and Liu et al. (2024) [51], reported an inverted U-shape, where performance eventually declines if the number of samples becomes too large. Despite this difference, two main conclusions can be drawn: (i) the contribution of spectrally nearby samples to Cu model accuracy depends on the number of samples used; (ii) optimizing this sample count is crucial to achieving the best predictive performance.

To clarify how the number of spectrally nearby samples affects the Cu model, Figure 9 visualizes the mean spectral distance between the validation and calibration sets. Initially, the mean spectral distance increased gradually in a linear pattern. After around 125 samples, however, it rose steeply. Interestingly, this change in mean spectral distance is inversely related to the Cu estimation performance, as seen in the RPD values in Figure 9: the performance improves initially and then stabilizes. As noted in the previous paragraph, a smaller number of spectrally nearby samples is more efficient than a larger number. In other words, these results suggest the existence of a threshold for the optimal count of nearby samples [71]. Within this threshold, as the count of samples increases, the positive effect continues to outweigh the negative effect. Beyond the threshold, the negative effect starts to exceed the positive, and the improvement in the estimation model is lost. According to Figure 9, the threshold may correspond to a mean spectral distance of around 0.1569, which occurs at 125 samples. Beyond 125 samples, the mean spectral distance increases significantly, suggesting that the spectral complexity between validation and calibration samples also increases greatly. This rising spectral complexity makes it challenging to obtain a better Cu estimation model.
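A curve like the mean spectral distance in Figure 9 can be sketched as below. How the mean distance was computed is not specified in the text; this sketch assumes it is the distance from each validation spectrum to its k nearest calibration spectra, averaged first over the neighbors and then over the validation samples. The function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_neighbour_distance(val_spectra, calib_spectra, k_values):
    """Return {k: mean distance from validation samples to their k nearest neighbours}."""
    # Sort each validation sample's distances once; reuse them for every k.
    sorted_dists = [sorted(euclidean(v, c) for c in calib_spectra)
                    for v in val_spectra]
    curve = {}
    for k in k_values:
        per_sample = [sum(d[:k]) / k for d in sorted_dists]
        curve[k] = sum(per_sample) / len(per_sample)
    return curve

# Toy one-band "spectra" (hypothetical values).
calib = [[0.0], [1.0], [2.0], [5.0]]
val = [[0.0]]
curve = mean_neighbour_distance(val, calib, k_values=[1, 2, 3, 4])
print(curve)
```

Under this definition the curve can never decrease with k, because each added neighbor is at least as far away as those already selected; a steep rise marks the point where the remaining candidates are spectrally much less similar.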

The optimal number of spectrally nearby samples is 125, which accounts for 62.5% of the 200-sample calibration set (Figure 6). This subset yields strong Cu estimation performance, with $R_{p}^{2}$ = 0.92, RMSEP = 4.50 mg·kg⁻¹, and RPD = 3.48. The chosen size is neither too small (e.g., 10%) nor too large (e.g., 90%): when a subset is too small, there is insufficient information to build a robust model, whereas a larger subset greatly increases model complexity without significant performance gains [72]. For these 125 samples, the mean spectral distance (validation vs. calibration) is 0.1569 (Figure 10). Visually, the red circle in Figure 10, drawn with a radius of 0.1569, covers an area that is neither too large nor too small, representing a balanced “compromise” in spectral space. Hence, the number of spectrally nearby samples should be chosen carefully so that it is sufficient for accuracy while avoiding unnecessary complexity.
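The three indicators quoted above can be computed as in the sketch below, which assumes the common definitions: $R_{p}^{2}$ as one minus the ratio of residual to total sum of squares, RMSEP as the root mean squared prediction error, and RPD as the standard deviation of the observed values divided by RMSEP. The function names are hypothetical. The last lines check the reported figures against the validation-set standard deviation in Table 1 (15.63 mg·kg⁻¹).

```python
import numpy as np

def rmsep(y_obs, y_pred):
    """Root mean squared error of prediction."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def r2_p(y_obs, y_pred):
    """Coefficient of determination in prediction: 1 - SSE / SST."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    sse = np.sum((y_obs - y_pred) ** 2)
    sst = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - sse / sst)

def rpd(y_obs, y_pred):
    """Ratio of the observed sample SD to RMSEP (assumed definition)."""
    return float(np.std(np.asarray(y_obs, float), ddof=1) / rmsep(y_obs, y_pred))

# Consistency check using only numbers reported in the article.
sd_validation = 15.63                # mg/kg, validation set SD in Table 1
rpd_nearby = sd_validation / 4.50    # about 3.47, reported as 3.48
rpd_baseline = sd_validation / 8.56  # about 1.83, reported as 1.83
share_of_calibration = 125 / 200     # 0.625, i.e. 62.5%
```

Under these assumed definitions, the reported RPD values are reproduced to within rounding, and the 62.5% figure corresponds to 125 of the 200 calibration samples.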

5. Conclusions

The study’s primary contribution is a thorough examination of how spectrally nearby samples affect vis-NIR spectroscopy models for estimating the heavy metal Cu in urban areas. Models built with spectrally nearby samples substantially outperformed the conventional approach that ignores them: $R_{p}^{2}$ rose from 0.75 to 0.92, RMSEP fell from 8.56 to 4.50 mg·kg⁻¹, and RPD rose from 1.83 to 3.48. We also examined how the number of spectrally nearby samples influences model performance. The results show an L-shaped pattern: model accuracy increases as nearby samples are added and then stabilizes. The optimal number was 125 (62.5% of the 200 calibration samples), which balances model robustness and complexity. Overall, this study provides new insights into the importance of spectral similarity for heavy metal estimation in urban soils and offers guidance for monitoring and managing urban soil contamination. However, application to mining areas with much higher Cu levels requires further investigation.

Author Contributions

Conceptualization, Y.L. and T.S.; methodology, W.Z.; software, W.Z.; validation, C.Y., Y.T. and L.Y.; formal analysis, C.W. and W.Z.; investigation, C.W.; resources, Y.C.; data curation, Y.T.; writing—original draft preparation, Y.L. and T.S.; writing—review and editing, Y.C.; visualization, W.C.; supervision, Y.C.; project administration, Y.L.; funding acquisition, T.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2024A1515010110); the National Natural Science Foundation of China (No. 42471312, No. 42201319); and the Open Research Fund Program of MNR Key Laboratory for Geo-Environmental Monitoring of the Great Bay Area (No. GEMLab-2023015).

Data Availability Statement

Data are available upon request from the corresponding author.

Acknowledgments

We thank the editors and reviewers for their valuable comments, which have improved the quality of this paper. We also sincerely thank all the colleagues who provided essential assistance in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Liu, Y.R.; van der Heijden, M.G.A.; Riedo, J.; Sanz-Lazaro, C.; Eldridge, D.J.; Bastida, F.; Moreno-Jiménez, E.; Zhou, X.Q.; Hu, H.W.; He, J.Z.; et al. Soil contamination in nearby natural areas mirrors that in urban greenspaces worldwide. Nat. Commun. 2023, 14, 1706.
2. Hou, D.; Jia, X.; Wang, L.; McGrath, S.P.; Zhu, Y.G.; Hu, Q.; Zhao, F.J.; Bank, M.S.; O’Connor, D.; Nriagu, J. Global soil pollution by toxic metals threatens agriculture and human health. Science 2025, 388, 316–321.
3. Pérez, A.P.; Eugenio, N.R. Status of Local Soil Contamination in Europe; Publications Office of the European Union: Brussels, Belgium, 2018.
4. Zhao, F.J.; Ma, Y.; Zhu, Y.G.; Tang, Z.; McGrath, S.P. Soil Contamination in China: Current Status and Mitigation Strategies. Environ. Sci. Technol. 2015, 49, 750–759.
5. Shokri, S.; Abdoli, N.; Sadighara, P.; Mahvi, A.H.; Esrafili, A.; Gholami, M.; Jannat, B.; Yousefi, M. Risk assessment of heavy metals consumption through onion on human health in Iran. Food Chem. X 2022, 14, 100283.
6. Angon, P.B.; Islam, M.S.; Shreejana, K.C.; Das, A.; Anjum, N.; Poudel, A.; Suchi, S.A. Sources, effects and present perspectives of heavy metals contamination: Soil, plants and human food chain. Heliyon 2024, 10, e28357.
7. Lehmann, J.; Kleber, M. The contentious nature of soil organic matter. Nature 2015, 528, 60–68.
8. Hou, D.; O’Connor, D.; Igalavithana, A.D.; Alessi, D.S.; Luo, J.; Tsang, D.C.; Sparks, D.L.; Yamauchi, Y.; Rinklebe, J.; Ok, Y.S. Metal contamination and bioremediation of agricultural soils for food safety and sustainability. Nat. Rev. Earth Environ. 2020, 1, 366–381.
9. Xu, M.; Cui, Y.; Beiyuan, J.; Wang, X.; Duan, C.; Fang, L. Heavy metal pollution increases soil microbial carbon limitation: Evidence from ecological enzyme stoichiometry. Soil Ecol. Lett. 2021, 3, 230–241.
10. Manoli, G.; Fatichi, S.; Schläpfer, M.; Yu, K.L.; Crowther, T.W.; Meili, N.; Burlando, P.; Katul, G.G.; Bou-Zeid, E. Magnitude of urban heat islands largely explained by climate and population. Nature 2019, 573, 55–60.
11. Palansooriya, K.N.; Li, J.; Dissanayake, P.D.; Suvarna, M.; Li, L.; Yuan, X.; Sarkar, B.; Tsang, D.C.W.; Rinklebe, J.; Wang, X.; et al. Prediction of Soil Heavy Metal Immobilization by Biochar Using Machine Learning. Environ. Sci. Technol. 2022, 56, 4187–4198.
12. O’Riordan, R.; Davies, J.; Stevens, C.; Quinton, J.N.; Boyko, C. The ecosystem services of urban soils: A review. Geoderma 2021, 395, 115076.
13. Viscarra Rossel, R.A.; Behrens, T.; Ben-Dor, E.; Brown, D.; Demattê, J.; Shepherd, K.; Shi, Z.; Stenberg, B.; Stevens, A.; Adamchuk, V.; et al. A global spectral library to characterize the world’s soil. Earth Sci. Rev. 2016, 155, 198–230.
14. Marija, R.; Lana, M.; Helena, B.; Davor, R. Copper Accumulation in Vineyard Soils: Distribution, Fractionation and Bioavailability Assessment. In Environmental Risk Assessment of Soil Contamination; Maria, C.H.-S., Ed.; IntechOpen: Rijeka, Croatia, 2014; pp. 799–825.
15. Cheng, H.X.; Li, K.; Li, M.; Yang, K.; Liu, F.; Cheng, X. Geochemical background and baseline value of chemical elements in urban soil in China. Earth Sci. Front. 2014, 21, 265–306.
16. Olesik, J.W. Elemental Analysis Using ICP-OES and ICP/MS. Anal. Chem. 1991, 63, A12–A21.
17. Kilbride, C.; Poole, J.; Hutchings, T.R. A comparison of Cu, Pb, As, Cd, Zn, Fe, Ni and Mn determined by acid extraction/ICP-OES and ex situ field portable X-ray fluorescence analyses. Environ. Pollut. 2006, 143, 16–23.
18. Viscarra Rossel, R.A.; Behrens, T.; Ben-Dor, E.; Chabrillat, S.; Dematte, J.A.M.; Ge, Y.; Gomez, C.; Guerrero, C.; Peng, Y.; Ramirez-Lopez, L.; et al. Diffuse reflectance spectroscopy for estimating soil properties: A technology for the 21st century. Eur. J. Soil Sci. 2022, 73, e13271.
19. Sacristán, D.; Rossel, R.A.V.; Recatalá, L. Proximal sensing of Cu in soil and lettuce using portable X-ray fluorescence spectrometry. Geoderma 2016, 265, 6–11.
20. Shi, T.R.; Fu, Z.C.; Miao, X.H.; Lin, F.F.; Ma, J.Y.; Gu, S.Y.; Li, L.; Wu, C.F.; Luo, Y.M. Would it be better for partition prediction of heavy metal concentration in soils based on the fusion of XRF and Vis-NIR data? Sci. Total Environ. 2024, 908, 168381.
21. Chen, L.; Tan, K.; Wang, X.; Chen, Y. A rapid soil Chromium pollution detection method based on hyperspectral remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103759.
22. Krzebietke, S.; Daszykowski, M.; Czarnik-Matusewicz, H.; Stanimirova, I.; Pieszczek, L.; Sienkiewicz, S.; Wierzbowska, J. Monitoring the concentrations of Cd, Cu, Pb, Ni, Cr, Zn, Mn and Fe in cultivated Haplic Luvisol soils using near-infrared reflectance spectroscopy and chemometrics. Talanta 2023, 251, 123749.
23. Wang, Y.; Zou, B.; Chai, L.; Lin, Z.; Feng, H.; Tang, Y.; Tian, R.; Tu, Y.; Zhang, B.; Zou, H. Monitoring of soil heavy metals based on hyperspectral remote sensing: A review. Earth Sci. Rev. 2024, 254, 104814.
24. Stenberg, B.; Viscarra Rossel, R.A.; Mouazen, A.M.; Wetterlind, J. Chapter five-visible and near infrared spectroscopy in soil science. Adv. Agron. 2010, 107, 163–215.
25. Dor, E.B.; Granot, A.; Wallach, R.; Francos, N.; Pearlstein, D.H.; Efrati, B.; Boruvka, L.; Gholizadeh, A.; Schmid, T. Exploitation of the SoilPRO® (SP) apparatus to measure soil surface reflectance in the field: Five case studies. Geoderma 2023, 438, 17.
26. Kuang, B.; Mouazen, A.M. Influence of the number of samples on prediction error of visible and near infrared spectroscopy of selected soil properties at the farm scale. Eur. J. Soil Sci. 2012, 63, 421–429.
27. Liu, Y.; Shi, Z.; Zhang, G.; Chen, Y.; Li, S.; Hong, Y.; Shi, T.; Wang, J.; Liu, Y. Application of Spectrally Derived Soil Type as Ancillary Data to Improve the Estimation of Soil Organic Carbon by Using the Chinese Soil Vis-NIR Spectral Library. Remote Sens. 2018, 10, 1747.
28. Hong, Y.; Shen, R.; Cheng, H.; Chen, S.; Chen, Y.; Guo, L.; He, J.; Liu, Y.; Yu, L.; Liu, Y. Cadmium concentration estimation in peri-urban agricultural soils: Using reflectance spectroscopy, soil auxiliary information, or a combination of both? Geoderma 2019, 354, 113875.
29. Gholizadeh, A.; Borůvka, L.; Saberioon, M.M.; Kozák, J.; Vašát, R.; Němeček, K. Comparing different data preprocessing methods for monitoring soil heavy metals based on soil spectral features. Soil Water Res. 2015, 10, 218–227.
30. Tziolas, N.; Tsakiridis, N.; Ogen, Y.; Kalopesa, E.; Ben-Dor, E.; Theocharis, J.; Zalidis, G. An integrated methodology using open soil spectral libraries and Earth Observation data for soil organic carbon estimations in support of soil-related SDGs. Remote Sens. Environ. 2020, 244, 111793.
31. Liu, Y.; Shi, T.; Chen, Y.; Lan, Z.; Guo, K.; Zhuang, D.; Yang, C.; Zhang, W. Monitoring the Soil Copper of Urban Land with Visible and Near-Infrared Spectroscopy: Comparing Spectral, Compositional, and Spatial Similarities. Land 2024, 13, 1279.
32. Liu, Y.; He, C.; Jiang, X. Sample selection method using near-infrared spectral information entropy as similarity criterion for constructing and updating peach firmness and soluble solids content prediction models. J. Chemom. 2024, 38, e3528.
33. Viscarra Rossel, R.A.; Shen, Z.F.; Lopez, L.R.; Behrens, T.; Shi, Z.; Wetterlind, J.; Sudduth, K.A.; Stenberg, B.; Guerrero, C.; Gholizadeh, A.; et al. An imperative for soil spectroscopic modelling is to think global but fit local with transfer learning. Earth Sci. Rev. 2024, 254, 104797.
34. Spiers, R.C.; Norby, C.; Kalivas, J.H. Physicochemical Responsive Integrated Similarity Measure (PRISM) for a Comprehensive Quantitative Perspective of Sample Similarity Dynamically Assessed with NIR Spectra. Anal. Chem. 2023, 95, 12776–12784.
35. Zeng, R.; Rossiter, D.G.; Zhao, Y.; Li, D.; Liu, F.; Zheng, G.; Zhang, G. The choice of spectral similarity algorithms influences suspected soil sample provenance. Forensic Sci. Int. 2023, 347, 111688.
36. Ramirez-Lopez, L.; Behrens, T.; Schmidt, K.; Rossel, R.V.; Demattê, J.; Scholten, T. Distance and similarity-search metrics for use with soil vis–NIR spectra. Geoderma 2013, 199, 43–53.
37. Zeng, R.; Zhang, J.P.; Cai, K.; Gao, W.C.; Pan, W.J.; Jiang, C.Y.; Zhang, P.Y.; Wu, B.W.; Wang, C.H.; Jin, X.Y.; et al. How similar is “similar,” or what is the best measure of soil spectral and physiochemical similarity? PLoS ONE 2021, 16, e0247028.
38. Hong, Y.; Yu, L.; Chen, Y.; Liu, Y.; Liu, Y.; Liu, Y.; Cheng, H. Prediction of soil organic matter by vis–NIR spectroscopy using normalized soil moisture index as a proxy of soil moisture. Remote Sens. 2017, 10, 28.
39. Qi, Y.; Qie, X.; Qin, Q.; Shukla, M.K. Prediction of soil calcium carbonate with soil visible-near-infrared reflection (Vis-NIR) spectral in Shaanxi province, China: Soil groups vs. spectral groups. Int. J. Remote Sens. 2021, 42, 2502–2516.
40. Wang, Z.; Chen, S.C.; Lu, R.; Zhang, X.L.; Ma, Y.X.; Shi, Z. Non-linear memory-based learning for predicting soil properties using a regional vis-NIR spectral library. Geoderma 2024, 441, 116752.
41. Yang, M.; Chen, S.; Xu, D.; Hong, Y.; Li, S.; Peng, J.; Ji, W.; Guo, X.; Zhao, X.; Shi, Z. Strategies for predicting soil organic matter in the field using the Chinese Vis-NIR soil spectral library. Geoderma 2023, 433, 116461.
42. Tsakiridis, N.L.; Keramaris, K.D.; Theocharis, J.B.; Zalidis, G.C. Simultaneous prediction of soil properties from VNIR-SWIR spectra using a localized multi-channel 1-D convolutional neural network. Geoderma 2020, 367, 114208.
43. Shoshany, M.; Roitberg, E.; Goldshleger, N.; Kizel, F. Universal quadratic soil spectral reflectance line and its deviation patterns’ relationships with chemical and textural properties: A global data base analysis. Remote Sens. Environ. 2022, 280, 113182.
44. Guerrero, C.; Wetterlind, J.; Stenberg, B.; Mouazen, A.M.; Gabarrón-Galeote, M.A.; Ruiz-Sinoga, J.D.; Zornoza, R.; Viscarra Rossel, R.A. Do we really need large spectral libraries for local scale SOC assessment with NIR spectroscopy? Soil Tillage Res. 2016, 155, 501–509.
45. Shi, Z.; Ji, W.; Viscarra Rossel, R.A.; Chen, S.; Zhou, Y. Prediction of soil organic matter using a spatially constrained local partial least squares regression and the Chinese vis–NIR spectral library. Eur. J. Soil Sci. 2015, 66, 679–687.
46. Zhang, Y.; Mao, W.; Li, R.; Liu, Y.; Wang, P.; Zheng, Z.; Guan, Y. Distribution characteristics, risk assessment, and quantitative source apportionment of typical contaminants (HMs, N, P, and TOC) in river sediment under rapid urbanization: A study case of Shenzhen river, Pearl River Delta, China. Process Saf. Environ. Prot. 2022, 162, 155–168.
47. Duan, D.Y.; Wang, P.; Rao, X.; Zhong, J.H.; Xiao, M.H.; Huang, F.; Xiao, R.B. Identifying interactive effects of spatial drivers in soil heavy metal pollutants using interpretable machine learning models. Sci. Total Environ. 2024, 934, 173284.
48. Shi, T.; Hu, Z.; Shi, Z.; Guo, L.; Chen, Y.; Li, Q.; Wu, G. Geo-detection of factors controlling spatial patterns of heavy metals in urban topsoil using multi-source data. Sci. Total Environ. 2018, 643, 451–459.
49. Wang, C.; Yang, Z.; Yuan, X.; Browne, P.; Chen, L.; Ji, J. The influences of soil properties on Cu and Zn availability in soil and their transfer to wheat (Triticum aestivum L.) in the Yangtze River delta region, China. Geoderma 2013, 193, 131–139.
50. Lindsay, W.L.; Norvell, W.A. Development of a DTPA Soil Test for Zinc, Iron, Manganese, and Copper. Soil Sci. Soc. Am. J. 1978, 42, 2327.
51. Liu, Y.; Shi, T.Z.; Lan, Z.Y.; Guo, K.; Yang, C.; Chen, Y.Y. Monitoring Soil Copper in Urban Land Using Visible and Near-Infrared Spectroscopy with Spatially Nearby Samples. Sensors 2024, 24, 5612.
52. Dai, L.; Wang, Z.; Zhuo, Z.; Ma, Y.; Shi, Z.; Chen, S. Prediction of soil organic carbon fractions in tropical cropland using a regional visible and near-infrared spectral library and machine learning. Soil Tillage Res. 2025, 245, 106297.
53. Zhou, Y.; Biswas, A.; Hong, Y.; Chen, S.; Hu, B.; Shi, Z.; Guo, Y.; Li, S. Enhancing soil profile analysis with soil spectral libraries and laboratory hyperspectral imaging. Geoderma 2024, 450, 117036.
54. Xu, S.X.; Zhao, Y.C.; Wang, Y.Y. Optimizing machine learning models for predicting soil pH and total P in intact soil profiles with visible and near-infrared reflectance (VNIR) spectroscopy. Comput. Electron. Agric. 2024, 218, 108643.
55. Lin, T.; Zhao, S.H.; Xi, X.P.; Yang, K.; Luo, F. Environmental Background Values of Heavy Metals and Physicochemical Properties in Different Soils in Shenzhen. Environ. Sci. 2021, 42, 3518–3526.
56. Chang, W.; Li, Z.; Zhou, Y.; Zeng, H. Heavy metal pollution and comprehensive ecological risk assessment of surface soil in different functional areas of Shenzhen, China. Chin. J. Appl. Ecol. 2020, 31, 999–1007.
57. Li, X.; Qiu, H.; Ding, A.; Fan, P. Heavy metal concentrations prediction of marine sediments by visible-near infrared spectroscopy based on attention mechanism. J. Hazard. Mater. 2025, 484, 136729.
58. Punia, A. Role of temperature, wind, and precipitation in heavy metal contamination at copper mines: A review. Environ. Sci. Pollut. Res. 2021, 28, 4056–4072.
59. Nawar, S.; Cipullo, S.; Douglas, R.K.; Coulon, F.; Mouazen, A.M. The applicability of spectroscopy methods for estimating potentially toxic elements in soils: State-of-the-art and future trends. Appl. Spectrosc. Rev. 2020, 55, 525–557.
60. Gozukara, G.; Acar, M.; Ozlu, E.; Dengiz, O.; Hartemink, A.E.; Zhang, Y.K. A soil quality index using Vis-NIR and pXRF spectra of a soil profile. Catena 2022, 211, 105954.
61. Yuan, L.; Chen, X.H.; Yao, L.J.; Pan, T. Multi-parameter optimization for Vis-NIR spectroscopic analysis of multiple indicators of soil heavy metal in the tideland reclamation area of the Pearl River Delta. Soil Sediment Contam. 2024, 33, 115–138.
62. Gholizadeh, A.; Saberioon, M.; Ben-Dor, E.; Rossel, R.A.V.; Boruvka, L. Modelling potentially toxic elements in forest soils with vis–NIR spectra and learning algorithms. Environ. Pollut. 2020, 267, 115574.
63. Zhou, W.; Yang, H.; Xie, L.J.; Li, H.R.; Huang, L.; Zhao, Y.P.; Yue, T.X. Hyperspectral inversion of soil heavy metals in Three-River Source Region based on random forest model. Catena 2021, 202, 105222.
64. Yang, K.; Wu, F.; Guo, H.; Chen, D.; Deng, Y.; Huang, Z.; Han, C.; Chen, Z.; Xiao, R.; Chen, P. Hyperspectral Inversion of Soil Cu Content in Agricultural Land Based on Continuous Wavelet Transform and Stacking Ensemble Learning. Land 2024, 13, 1810.
65. Cui, S.C.; Zhou, K.F.; Ding, R.F.; Cheng, Y.Y.; Jiang, G. Estimation of soil copper content based on fractional-order derivative spectroscopy and spectral characteristic band selection. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 275, 121190.
66. Rajalahti, T.; Arneberg, R.; Berven, F.S.; Myhr, K.M.; Ulvik, R.J.; Kvalheim, O.M. Biomarker discovery in mass spectral profiles by means of selectivity ratio plot. Chemom. Intell. Lab. Syst. 2009, 95, 35–48.
67. Gholizadeh, A.; Saberioon, M.; Ben-Dor, E.; Boruvka, L. Monitoring of selected soil contaminants using proximal and remote sensing techniques: Background, state-of-the-art and future perspectives. Crit. Rev. Environ. Sci. Technol. 2018, 48, 243–278.
68. Liu, Y.; Liu, Y.; Chen, Y.; Zhang, Y.; Shi, T.; Wang, J.; Hong, Y.; Fei, T. The Influence of Spectral Pretreatment on the Selection of Representative Calibration Samples for Soil Organic Matter Estimation Using Vis-NIR Reflectance Spectroscopy. Remote Sens. 2019, 11, 450.
69. Debaene, G.; Niedźwiecki, J.; Pecio, A.; Żurek, A. Effect of the number of calibration samples on the prediction of several soil properties at the farm-scale. Geoderma 2014, 214, 114–125.
70. Grinand, C.; Barthes, B.; Brunet, D.; Kouakoua, E.; Arrouays, D.; Jolivet, C.; Caria, G.; Bernoux, M. Prediction of soil organic and inorganic carbon contents at a national scale (France) using mid-infrared reflectance spectroscopy (MIRS). Eur. J. Soil Sci. 2012, 63, 141–151.
71. Li, S.; Viscarra Rossel, R.A.; Webster, R. The cost-effectiveness of reflectance spectroscopy for estimating soil organic carbon. Eur. J. Soil Sci. 2022, 73, 13202.
72. Ng, W.; Minasny, B.; Mendes, W.D.; Demattê, J.A.M. The influence of training sample size on the accuracy of deep learning models for the prediction of soil properties with near-infrared spectroscopy data. Soil 2020, 6, 565–578.

Figure 1. Map of the study area and sampling sites.

Figure 2. The flowchart of spectrally nearby samples. “#i” denotes the i-th sample, and “#” indicates the sample number.

Figure 3. Validation results using 20 and 50 spectrally nearby samples to build the Cu estimation model. (a) Spectral curve and (b) mean reflectance of a validation sample’s 20 spectrally nearby samples; (c) Spectral curve and (d) mean reflectance of a validation sample’s 50 spectrally nearby samples.

Figure 4. Performance of the Cu estimation model without spectrally nearby samples. (a) Spectral curves of the calibration and validation samples. (b) Comparison of spectroscopy-based and measured soil Cu concentrations.

Figure 5. Model performance for soil Cu estimation with different numbers of spectrally nearby samples: (a) $R_{p}^{2}$, (b) RMSEP, and (c) RPD. The red line shows the values of the three indicators, and the blue dotted line represents the fitted line.

Figure 6. Soil Cu estimation model performance when 125 spectrally nearby samples are used. (a) Spectral curve and (b) mean reflectance of a validation sample’s 125 spectrally nearby samples. (c) Comparison of spectroscopy-based and measured soil Cu concentrations.

Figure 7. Selectivity ratio by wavelength for the soil Cu estimation model. Higher selectivity ratios indicate greater utility for Cu estimation.

Figure 8. Examples of spectrally nearby samples and spectral subsets.

Figure 9. Mean spectral distance between validation and calibration sets for varying numbers of nearby samples.

Figure 10. Mean spectral distance between the validation and calibration samples with a selection of 125 nearby samples. The red circle has a radius of 0.1569.

Table 1. Copper concentration for the calibration and validation samples.

Ascription	Count	Range ¹	Min	Max	Mean	SD ²	CV ³	Background Value
Calibration set	200	82.79	20.45	103.24	58.29	15.60	0.27	17.00
Validation set	50	71.85	25.21	97.06	58.30	15.63	0.27
Entire dataset	250	82.79	20.45	103.24	58.29	15.57	0.27

All Cu statistics are in mg·kg⁻¹ except CV, which is dimensionless. ¹ Range denotes the span from the smallest to the largest observation. ² SD denotes standard deviation. ³ CV denotes coefficient of variation.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, Y.; Shi, T.; Chen, Y.; Zhang, W.; Yang, C.; Tang, Y.; Yuan, L.; Wang, C.; Cui, W. How Spectrally Nearby Samples Influence the Inversion of Soil Heavy Metal Copper. Land 2025, 14, 1830. https://doi.org/10.3390/land14091830
