Hyperspectral Unmixing-Based Remote Sensing Inversion of Multiple Heavy Metals in Mining Soils: A Case Study of the Lengshuijiang Antimony Mine, Hunan Province

Zhang, Xinyu; Cao, Li; Ge, Jiawang; Feng, Ruyi; Han, Wei; Huang, Xiaohui; Wang, Sheng; Wang, Yuewei

doi:10.3390/rs18050767


by Xinyu Zhang ¹, Li Cao ¹,²,³, Jiawang Ge ²,³, Ruyi Feng ¹, Wei Han ¹,*, Xiaohui Huang ¹, Sheng Wang ¹ and Yuewei Wang ¹

¹ School of Computer Science, China University of Geosciences, Wuhan 430074, China

² The Second Surveying and Mapping Institute of Hunan Province, Changsha 410029, China

³ Key Laboratory of Natural Resources Monitoring and Supervision in Southern Hilly Region, Ministry of Natural Resources, Changsha 410029, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(5), 767; https://doi.org/10.3390/rs18050767

Submission received: 14 January 2026 / Revised: 24 February 2026 / Accepted: 28 February 2026 / Published: 3 March 2026


Highlights

A spectral-unmixing-based framework mitigates mixed-pixel interference in mining areas.
PPI-VCA with prior-guided SAM improves endmember purity and interpretability.
High-precision mapping of eight heavy metals (Pb, Cd, Cr, Hg, As, Cu, Zn, Ni) is achieved.
Spatial analysis reveals synergistic pollution patterns and potential sources.

Abstract

Soil heavy metal contamination in mining areas poses a serious environmental challenge, requiring monitoring approaches with both wide coverage and high accuracy. Hyperspectral remote sensing provides an effective solution, yet its performance in complex mining environments is often limited by mixed-pixel effects and nonlinear spectral responses. To address these issues, this study proposes a Physically-Constrained Collaborative Endmember Extraction (PCCEE) framework that integrates spectral unmixing with machine learning for multi-element inversion. Using Gaofen-5 hyperspectral imagery, a collaborative workflow combining Pixel Purity Index (PPI), Vertex Component Analysis (VCA), and prior-spectral-constrained Spectral Angle Mapper (SAM) was developed to improve endmember purity and physical interpretability. Among three unmixing models (LMM, NMF, and SVR), the Linear Mixing Model achieved the best balance between accuracy and efficiency. Random Forest regression using retrieved abundances enabled high-accuracy inversion of eight heavy metals (mean R² = 0.85). Spatial analysis revealed significant co-enrichment of Pb, Cd, and Zn related to sulfide weathering, while PCA distinguished compound and independent pollution sources. The proposed PCCEE framework effectively mitigates mixed-pixel interference and provides a transferable approach for heavy metal monitoring and risk assessment in complex mining environments.

Keywords:

hyperspectral remote sensing; endmember extraction; spectral unmixing; soil heavy metal inversion; pollution source apportionment

1. Introduction

Soil heavy metal contamination is a major environmental concern worldwide, especially in areas of mineral resource exploitation [1,2]. Industrialization and mining have led to the accumulation of toxic metals such as Pb, Cd, Cr, Hg, and As in soils [3,4]. While mineral resource development plays a critical role in socio-economic growth, the adverse impacts of mining on the natural environment are often inevitable—such activities release potentially toxic substances into the soil, air, and water [3,4]. Heavy metals exhibit strong biotoxicity, persistence, and long environmental half-lives, making them difficult to degrade. Once absorbed by plants, they degrade soil quality, threaten wildlife and public health, and may enter the food chain, undermining ecosystem stability and biodiversity [5,6]. Monitoring their spatial distribution in mining areas is therefore essential for identifying contamination, guiding reclamation, and protecting public health [7].

Remote sensing, with its ability for rapid, large-scale surface observation, has become indispensable for environmental monitoring. By enabling thematic mapping of diverse surface features, it improves efficiency in detecting soil pollution and analyzing spatial distributions, particularly for heavy metals [8,9,10]. Conventional remote sensing is limited by spectral resolution, restricting accurate identification of different metals. Hyperspectral remote sensing, providing continuous, high-dimensional spectra, captures subtle spectral responses of heavy metals. Recent advances in hyperspectral sensors have enabled applications in soil quality assessment, mining pollution detection, and environmental risk analysis [11,12]. Previous studies have shown that different metals exhibit distinct spectral characteristics, supporting simultaneous multi-element inversion [13]. However, challenges such as mixed pixels and spectral redundancy require refined spectral unmixing for improved inversion accuracy.

Despite the potential of hyperspectral inversion, several limitations constrain its effectiveness in mining areas. Soil heavy metal inversion using hyperspectral remote sensing is often challenged by complex environmental conditions in mining areas [1,14,15]. In particular, vegetation coverage, mixed-pixel effects, and limited soil sampling frequently coexist [16,17,18], making it difficult to directly establish stable relationships between spectral information and heavy metal concentrations. Previous studies have attempted to mitigate these challenges through vegetation indices, spectral unmixing, or advanced regression models [2,19,20]. However, these factors are usually treated independently, and the influence of endmember extraction mechanisms on the stability of heavy metal inversion under such compounded conditions remains insufficiently explored [21,22,23]. Therefore, rather than attempting to simultaneously resolve all environmental and data-related limitations, this study primarily addresses the following scientific problem: how physically constrained collaborative endmember extraction affects the robustness and reliability of soil heavy metal inversion in complex mining environments.

In summary, hyperspectral heavy metal inversion is limited by insufficient spectral characterization, inadequate adaptation to complex surfaces, and a lack of multi-element analysis. To address these limitations, this study proposes a Physically-Constrained Collaborative Endmember Extraction (PCCEE) framework that integrates spectral unmixing and machine learning for multi-element inversion. The framework was applied to the antimony mining area of Lengshuijiang, Hunan, where 53 soil samples were analyzed for eight metals (Pb, Cd, Cr, Hg, As, Cu, Zn, Ni). Multi-metal inversion achieved a mean R² of 0.85, outperforming conventional approaches. PCCEE highlights diagnostic spectral features, mitigates mixed-pixel interference, and enables reliable hotspot mapping. The framework is transferable to other mining environments through expansion of the prior spectral library and reuse of the same unmixing-to-inversion workflow. The remainder of this paper is organized as follows: Section 2 describes the study area, data acquisition, preprocessing, and PCCEE workflow; Section 3 presents results; Section 4 analyzes correlations; Section 5 concludes.

2. Materials and Methods

2.1. Study Area

This study focuses on a large-scale antimony mining area in Lengshuijiang City, Hunan Province, China (hereafter the Antimony Mine) [24]. Known as the “World Capital of Antimony,” the site exhibits typical polymetallic and composite heavy metal contamination resulting from over 120 years of mining activities [25]. Pollution processes span the entire chain of “primary mineral release–secondary waste migration–anthropogenic input,” making it an ideal location for investigating multi-element contamination in complex terrains. The mine is situated in central-western Hunan (111°20′–111°35′E, 27°35′–27°45′N), in a transitional zone between low hills and the Dongting Lake plain. Long-term mining and ore weathering have led to multi-element coexistence [26], strong spatial heterogeneity [27] and ecological risks [28].

The Antimony Mine provides a representative site for hyperspectral studies of eight heavy metals (Pb, Cd, Cr, Hg, As, Cu, Zn, Ni), supporting precision pollution management and remote sensing applications. This makes it an ideal testbed for implementing and evaluating the proposed PCCEE framework.

2.2. Data Sources

2.2.1. Remote Sensing Data

This study employed hyperspectral data from the GaoFen-5 (GF-5) satellite. Launched in May 2018, GF-5 is China’s first comprehensive hyperspectral observation satellite. Its Advanced Hyperspectral Imager (AHSI) covers 330–2500 nm with high spectral resolution (2.5–4 nm) and medium spatial resolution (30 m), enabling effective identification of surface minerals, soils, and vegetation. AHSI has 330 spectral bands (156 in VNIR, 330–1000 nm, and 174 in SWIR, 1000–2500 nm) with an average spectral resolution of ~3 nm. The temporal resolution is four days, suitable for dynamic monitoring of mining areas. The geographical location of the study area is shown in Figure 1.

The dataset used was acquired on 24 September 2024, with good radiometric quality and full coverage of the mining area and all soil sampling sites. Minimal cloud contamination occurred, and the acquisition coincided with the late vegetation growth season, reducing vegetation interference. Level-1A data were obtained from the China Centre for Resource Satellite Data and Application (CRESDA). Considering the Antimony Mine’s rugged terrain (slopes 15–30°), orthorectification ensured accurate alignment between spectral information and ground locations.

2.2.2. Field Sampling Data

Fifty-three soil samples were collected during July–August 2024. Sampling sites covered the core mining area, regions surrounding tailings ponds, and downstream farmlands, with an approximate density of one site per km² [29]. A combined grid-based and random sampling strategy was adopted to ensure spatial representativeness. Surface soil samples (0–20 cm) were collected, air-dried at room temperature, ground, and passed through a 100-mesh nylon sieve prior to chemical analysis. Concentrations of eight heavy metals (Pb, Cd, Cr, Hg, As, Cu, Zn, and Ni) were determined using inductively coupled plasma mass spectrometry (ICP-MS) and atomic fluorescence spectrometry (AFS) in a certified environmental analysis laboratory. ICP-MS was applied for Cd, Cr, Cu, Ni, Pb, and Zn, while AFS was used for As and Hg. Standard laboratory quality assurance procedures were followed, including reagent blanks, multi-point calibration with standard solutions, and repeated measurements of selected samples to ensure analytical consistency. The maximum concentrations of Pb, Cd, and As reached 65.8 mg/kg, 2.45 mg/kg, and 66.09 mg/kg, respectively, exceeding the background values of Hunan Province and indicating significant heavy metal enrichment in the study area [30,31].

2.2.3. Data Preprocessing

To minimize sensor errors, atmospheric interference, and topographic distortions, the GF-5 AHSI imagery was rigorously preprocessed [31]. The steps included radiometric calibration to convert digital numbers to apparent reflectance, FLAASH atmospheric correction to retrieve surface reflectance, and DEM-assisted orthorectification for geometric accuracy. The resulting reflectance data closely approximate the true land surface properties, establishing a reliable basis for subsequent spectral unmixing and heavy metal inversion.

2.3. Spectral Decomposition Workflow

To address the challenges of complex surface characteristics and pervasive mixed pixels in mining areas, this study proposes a Physically-Constrained Collaborative Endmember Extraction (PCCEE) framework based on GF-5 AHSI hyperspectral data [32]. The framework and workflow of the entire research are shown in Figure 2. The framework integrates four key steps: (1) pixel-level data restructuring, (2) complementary endmember extraction through PPI and VCA, (3) prior-knowledge-constrained enhancement using Spectral Angle Mapper (SAM) with laboratory spectra, and (4) automated Top-K optimization. This collaborative design ensures both spectral diversity and physical interpretability, enabling the isolation of meaningful pure endmembers from heterogeneous land cover. Building on these optimized endmembers, subsequent abundance estimation and Random Forest inversion achieve high-accuracy multi-metal mapping in the Lengshuijiang antimony mine. Beyond this specific application, the PCCEE framework is theoretically transferable to other mining environments by expanding the prior spectral (PS) library while applying the same unmixing-to-inversion workflow.

2.3.1. Hyperspectral Data Reading and Dimensional Restructuring

The raw GF-5 AHSI data are stored in GeoTIFF format, with the default dimension order of (bands, rows, columns). To facilitate pixel-level spectral analysis, the data were first reorganized into a three-dimensional array (rows × columns × bands) and subsequently flattened into a two-dimensional pixel matrix (N × b, where N represents the number of pixels and b the number of spectral bands) as input for endmember extraction. This processing ensures an integrated representation of both spatial and spectral information.
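The restructuring step can be illustrated with a minimal NumPy sketch. It assumes the cube has already been read into memory as an array in (bands, rows, columns) order; the function names and the toy cube are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def cube_to_pixel_matrix(cube_brc):
    """Reorder a (bands, rows, cols) cube to (rows, cols, bands), then flatten to (N, b)."""
    cube_rcb = np.transpose(cube_brc, (1, 2, 0))   # rows x cols x bands
    rows, cols, bands = cube_rcb.shape
    X = cube_rcb.reshape(rows * cols, bands)       # one spectrum per row, N = rows * cols
    return X, (rows, cols)

def pixel_vector_to_map(values, shape):
    """Fold a length-N per-pixel vector back into a (rows, cols) map."""
    return np.asarray(values).reshape(shape)

# Toy cube: 5 bands, 3 rows, 4 columns (illustrative values only)
cube = np.arange(5 * 3 * 4, dtype=float).reshape(5, 3, 4)
X, shape = cube_to_pixel_matrix(cube)
```

Row `r * cols + c` of `X` holds the spectrum of the pixel at image position `(r, c)`, so per-pixel results can be folded back into an image with `pixel_vector_to_map`.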

2.3.2. Mixed Pixel Unmixing: Complementary Endmember Extraction via PPI and VCA

In mining area hyperspectral imagery, heavy metal contamination often manifests as localized high-concentration hotspots or mixed spectral signatures, resulting in widespread mixed pixels. Single endmember extraction algorithms are generally insufficient to simultaneously capture both high-purity pollution hotspots and moderately pure surrounding areas. Therefore, this study adopts a complementary strategy combining the Pixel Purity Index (PPI) [33] and Vertex Component Analysis (VCA), leveraging the theoretical strengths of both methods. This approach ensures that the constructed endmember library encompasses both typical contaminants and background materials, providing highly reliable inputs for subsequent heavy metal content inversion.

PPI Algorithm Based on Projection Statistics

To capture the extreme spectral features of heavy metal hotspots (e.g., ore residues), we utilized the Pixel Purity Index (PPI), which isolates pure pixels based on their recurrence as extremes in random projections. The implementation was optimized with GPU acceleration and a spatial constraint filter to remove outliers, thereby ensuring that the extracted endmembers directly represented genuine contamination sources for reliable inversion.

In the specific implementation, for the pixel matrix $X \in \mathbb{R}^{N \times b}$, 10,000 random unit vectors $v_k$ are generated, and the inner product (projection value) between each pixel $i$ and each random vector is calculated:

$$\mathrm{proj}_k(i) = X_i \cdot v_k \qquad (1)$$

We then count, for each pixel, the number of projections in which it is an extreme value (its score):

$$\mathrm{Score}(i) = \sum_{k=1}^{10{,}000} \Pi\left(\mathrm{proj}_k(i) = \max(\mathrm{proj}_k) \lor \mathrm{proj}_k(i) = \min(\mathrm{proj}_k)\right) \qquad (2)$$

where $\mathrm{proj}_k(i)$ is the inner product value of the $i$-th pixel in the $k$-th projection, and $\Pi(\cdot)$ is the indicator function. The final extracted PPI endmembers represent potential ground objects with significant spectral differences (e.g., bare soil, slag).
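The scoring in Equations (1) and (2) can be sketched on the CPU with NumPy. This is an illustrative sketch only: the GPU acceleration and the spatial constraint filter described above are not reproduced, and the function name, batch size, and seed are assumptions.

```python
import numpy as np

def ppi_scores(X, n_proj=10_000, seed=0, batch=64):
    """Count how often each pixel is the max or min of a random projection.

    X is the (N, b) pixel matrix. Projections are processed in batches
    (an implementation choice made here to limit memory use).
    """
    rng = np.random.default_rng(seed)
    n_pix, n_bands = X.shape
    scores = np.zeros(n_pix, dtype=np.int64)
    done = 0
    while done < n_proj:
        m = min(batch, n_proj - done)
        V = rng.standard_normal((n_bands, m))
        V /= np.linalg.norm(V, axis=0, keepdims=True)   # random unit vectors v_k
        P = X @ V                                       # proj_k(i), Eq. (1)
        np.add.at(scores, P.argmax(axis=0), 1)          # pixel that is the maximum
        np.add.at(scores, P.argmin(axis=0), 1)          # pixel that is the minimum
        done += m
    return scores                                       # Score(i), Eq. (2)
```

Candidate endmembers would then be the pixels with the highest scores, for example `np.argsort(scores)[::-1][:10]`. For noise-free linear mixtures, only the pure (vertex) pixels can ever be projection extremes, which is what makes the score a purity indicator.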

Vertex Component Analysis (VCA)

After dimensionality reduction via PCA, VCA iteratively extracts new endmembers that are linearly independent of the existing endmember set using a convex simplex vertex search strategy [34]. Specifically, the mean is first removed from the pixel matrix, and the first few principal components are retained to cover over 95% of the total variance:

$$X' = X - \bar{X}, \quad U = \mathrm{PCA}(X', n_{PC}) \qquad (3)$$

Next, a random weight vector $w$ is initialized, and the projection differences between each pixel and the current endmember set are computed. The pixel with the largest difference is selected as a new endmember:

$$i^{*} = \arg\max_{i} \left| w^{\top} (U_i - P_E U_i) \right| \qquad (4)$$

where $P_E$ denotes the projection matrix of the current endmember set, and $U_i$ represents the $i$-th pixel in the principal component space.

To ensure spectral diversity, VCA was employed to capture medium-purity endmembers (e.g., vegetated soils), complementing the high-purity targets identified by PPI. The resulting candidate endmember library comprised 16 spectra, 10 from PPI and 6 from VCA, a split chosen empirically to represent the major surface materials. This integrated approach provided the foundation for the subsequent heavy metal inversion.
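A simplified version of the vertex search in Equations (3) and (4) is sketched below. Two choices are assumptions made for this sketch rather than details from the paper: the number of retained principal components is set to the number of endmembers minus one instead of using the 95% variance criterion, and a constant coordinate is appended to each pixel (as in the standard VCA formulation) so that simplex vertices stay linearly independent. The function name is hypothetical.

```python
import numpy as np

def vca_like(X, n_end, seed=0):
    """Return indices of n_end pixels found by iterative orthogonal projection."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                            # X' = X - mean, Eq. (3)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    U = Xc @ Vt[: n_end - 1].T                         # PCA scores, N x (n_end - 1)
    # Assumption: append a constant coordinate so vertices are linearly independent.
    c = np.linalg.norm(U, axis=1).max()
    Y = np.hstack([U, np.full((U.shape[0], 1), c)])    # N x n_end
    picked = []
    for _ in range(n_end):
        w = rng.standard_normal(n_end)                 # random weight vector
        if picked:
            A = Y[picked].T                            # current endmembers as columns
            w = w - A @ (np.linalg.pinv(A) @ w)        # (I - P_E) w
        # Y @ w equals w0^T (Y_i - P_E Y_i) for every pixel i, Eq. (4)
        picked.append(int(np.argmax(np.abs(Y @ w))))
    return picked
```

Because the projector is symmetric, applying it once to the weight vector is equivalent to projecting every pixel, which keeps each iteration at a single matrix-vector product.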

2.3.3. Prior Knowledge-Constrained Endmember Enhancement via SAM Matching

This study incorporated domain-specific prior knowledge—laboratory-measured spectra of four key land-cover types (vegetation, urban surfaces, mining soils, and water bodies)—to enhance the geophysical interpretability of endmember extraction. Spectral similarity between image pixels and these reference spectra was quantified using the Spectral Angle Mapper (SAM) [35], defined as follows:

$$\theta = \arccos\left(\frac{\langle x, r \rangle}{\|x\| \cdot \|r\|}\right) \qquad (5)$$

where $x$ denotes the spectral vector of an image pixel, $r$ represents the reference spectrum, $\langle \cdot, \cdot \rangle$ is the inner product, and $\|\cdot\|$ is the L2 norm. A smaller SAM value $\theta$ indicates higher spectral similarity, with values ranging from 0° (identical) to 90° (orthogonal).

To ensure precise matching, all prior spectra were resampled to the GF-5 AHSI wavelength grid via linear interpolation. The pixel with the minimum SAM value to each reference spectrum was selected as a representative endmember. This approach recovered characteristic mining soil features, distinguished them from irrelevant elements, and provided a physically interpretable basis for identifying pollution hotspots and inverting heavy metals.
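The resampling and minimum-angle matching described above can be sketched as follows. The function names are hypothetical, and the sketch assumes the source wavelengths of each laboratory spectrum are sorted in increasing order, as `np.interp` requires.

```python
import numpy as np

def resample_to_grid(wl_src, refl_src, wl_dst):
    """Linearly interpolate a reference spectrum onto the sensor wavelength grid."""
    return np.interp(wl_dst, wl_src, refl_src)

def spectral_angles(X, r):
    """Spectral angle in degrees between every row of X (N, b) and reference r (b,), Eq. (5)."""
    cosang = (X @ r) / (np.linalg.norm(X, axis=1) * np.linalg.norm(r))
    # Clip to guard against rounding slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def best_match_pixel(X, r):
    """Index and angle of the pixel most similar to the reference spectrum."""
    ang = spectral_angles(X, r)
    i = int(np.argmin(ang))
    return i, float(ang[i])
```

Since the spectral angle depends only on the direction of the spectrum, a pixel and a reference that differ by an overall brightness factor still match with an angle near 0°.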

2.3.4. Endmember Optimization

In this study, the endmembers obtained from PPI, VCA, and PS matching were integrated into a comprehensive endmember library $\varepsilon = \{e_j\}_{j=1}^{M}$, followed by an initial validation through visualization. On the one hand, spectral angles (SAM) between endmembers and all image pixels were calculated to generate spatial distribution heatmaps, enabling the assessment of whether the extracted endmembers exhibit reasonable spatial aggregation patterns (e.g., mine tailing piles or vegetation-covered areas). On the other hand, endmember spectral curves were plotted within the same coordinate system to compare their characteristic differences, such as the clay mineral absorption feature around 2200 nm and the vegetation “red edge” at 700–750 nm, thereby providing intuitive guidance for subsequent selection.

Building upon this, we further propose an automated Top-K SAM optimal matching strategy to enhance the physical interpretability and stability of the endmember library. Specifically, for each candidate endmember $e_j$, its minimum spectral angle with the PS set $R = \{r_k\}_{k=1}^{L}$ was computed:

$$s_j = \min_{r_k \in R} \mathrm{SAM}(e_j, r_k) \qquad (6)$$

$$\mathrm{SAM}(e_j, r_k) = \arccos\left(\frac{\langle e_j, r_k \rangle}{\|e_j\| \, \|r_k\|}\right) \qquad (7)$$

Endmembers were then ranked according to their minimum spectral angle values, and the top K endmembers (those with the smallest $s_j$) were selected to construct the optimized endmember library:

$$\varepsilon_{best} = \{e_j \mid s_j \in \text{Top-}K\} \qquad (8)$$

In this study, a PS-constrained Top-K endmember selection strategy was adopted, with K set to 4 to reflect the dominant surface components in the mining area: bare soil, vegetation, urban surfaces, and water. This resulted in an optimized endmember set that captures both pollution hotspots and key mineral features while eliminating redundant or noisy endmembers. Compared with traditional approaches relying on manual interpretation, this strategy introduces physical constraints and a Top-K selection mechanism, enabling quantitative coupling between spectral information and prior knowledge and thereby enhancing objectivity, physical interpretability, and robustness. Notably, in the context of heavy metal pollution monitoring in mining areas, this method facilitates the identification of localized high-concentration anomalies. Preliminary tests with larger K values did not result in consistent improvements in inversion performance, suggesting that K = 4 provides a reasonable balance between representativeness and model stability.
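The Top-K selection of Equations (6) to (8) reduces to a few array operations. The sketch below assumes that candidate endmembers and prior spectra are stored as rows on the same wavelength grid, and that "top" means the smallest minimum angle; the function names are hypothetical.

```python
import numpy as np

def sam_matrix(E, R):
    """Pairwise spectral angles (radians) between candidates E (M, b) and priors R (L, b), Eq. (7)."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    Rn = R / np.linalg.norm(R, axis=1, keepdims=True)
    return np.arccos(np.clip(En @ Rn.T, -1.0, 1.0))

def top_k_endmembers(E, R, k=4):
    """Indices of the k candidates closest to any prior spectrum, with their s_j values."""
    s = sam_matrix(E, R).min(axis=1)            # s_j, Eq. (6)
    keep = np.argsort(s, kind="stable")[:k]     # smallest angles first, Eq. (8)
    return keep, s[keep]
```

One property of a pure ranking on $s_j$ is that it does not by itself guarantee one selected endmember per prior class; if several candidates resemble the same reference spectrum, they can all rank highly.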

2.4. Abundance Estimation Methods

Within the PCCEE framework, abundance estimation serves as an important intermediate step linking extracted endmembers to heavy metal concentrations. It quantifies the fractional contribution of each endmember within individual pixels, bridging surface spectral features with soil geochemistry. In mining pollution contexts, these abundances represent mixed proportions of tailings, bare soil, vegetation, and contaminated soils, providing a basis for mapping the spatial distribution of multiple heavy metals.

To exploit this information, the study implements and compares three representative unmixing strategies: the Linear Mixture Model (LMM) with Non-Negative Least Squares (NNLS) for physically interpretable linear modeling, Non-Negative Matrix Factorization (NMF) for unsupervised decomposition, and Support Vector Regression (SVR) to explore potential nonlinear relationships.

The LMM assumes that the spectral signature of a pixel can be expressed as a linear combination of a finite set of endmember spectra [36], and abundances are estimated using NNLS under non-negativity constraints to ensure physical interpretability. To further examine possible nonlinear mixing effects, such as mineral superposition and vegetation–soil interactions, NMF and SVR are additionally employed as alternative abundance estimation strategies [37].
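As an illustration of the non-negativity-constrained linear model, the sketch below solves the NNLS problem for one pixel with a plain projected-gradient loop. This is a stand-in chosen to keep the example NumPy-only, not the solver used in the study; a library routine such as `scipy.optimize.nnls` solves the same problem. Function names and the iteration count are assumptions.

```python
import numpy as np

def nnls_projected_gradient(endmembers, y, n_iter=5000):
    """Minimise ||A a - y||^2 subject to a >= 0, with A = endmembers.T.

    endmembers: (M, b) array, one endmember spectrum per row.
    y: (b,) pixel spectrum. Returns the (M,) abundance vector.
    """
    A = endmembers.T                              # bands x M mixing matrix
    G = A.T @ A
    h = A.T @ y
    step = 1.0 / np.linalg.eigvalsh(G).max()      # 1 / Lipschitz constant of the gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a = np.maximum(a - step * (G @ a - h), 0.0)   # gradient step, then clip at zero
    return a

def unmix_pixels(X, endmembers, n_iter=5000):
    """Abundances for every row of the (N, b) pixel matrix; returns (N, M)."""
    return np.array([nnls_projected_gradient(endmembers, x, n_iter) for x in X])
```

Reshaping column `m` of the result to (rows, cols) gives the abundance map of endmember `m`. No sum-to-one constraint is imposed here, consistent with the description of NNLS under non-negativity constraints only.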

It should be noted that these abundance estimation methods are not newly proposed in this study, but are adopted as supporting tools to investigate how different unmixing strategies interact with the proposed physically constrained collaborative endmember extraction (PCCEE) framework. Detailed algorithmic implementations are therefore omitted for brevity.

2.5. Heavy Metal Inversion Model Development

The final stage of the PCCEE framework is the spatial inversion of heavy metal concentrations, which translates hyperspectral endmember abundances into quantitative estimates of environmental pollutants. In this study, Random Forest Regression (RFR) models were constructed for eight heavy metals, including Pb, Zn, Cu, and Ni, by integrating multi-endmember abundance features derived from the optimized spectral library. This approach effectively captures nonlinear relationships between surface spectral information and metal concentrations, achieving high prediction accuracy and enabling the generation of detailed spatial distribution maps that reveal contamination hotspots, gradients, and heterogeneity across the mining area. By coupling physically constrained unmixing with machine learning-based inversion, the PCCEE framework provides a robust, scalable, and interpretable methodology for regional soil heavy metal monitoring.

2.5.1. Abundance—Sample Association Modeling

To establish a quantitative relationship between endmember abundances and heavy metal concentrations, abundance features were extracted at field sampling locations. First, the geographic coordinates of sampling points were transformed into the projection system of the GF-5 AHSI imagery to ensure spatial correspondence. The corresponding abundance values for each endmember were then extracted from the abundance maps to construct the feature matrix. Samples located outside the image extent or containing missing values were excluded during this process. The resulting dataset consisted of n effective samples, each associated with valid abundance features and concentration labels for all eight metals, forming a multi-output response matrix. Here, n represents the effective sample size retained after spatial matching and data quality screening and was used consistently in subsequent statistical analyses.
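The extraction of abundance features at sampling points can be sketched as a coordinate-to-pixel lookup. The sketch assumes the sample coordinates have already been transformed into the image projection, that the raster is north-up with square pixels, and that `(x0, y0)` is the outer top-left corner of the image; all names are hypothetical.

```python
import numpy as np

def sample_abundances(abund, xs, ys, x0, y0, pixel_size):
    """Look up abundance vectors at map coordinates.

    abund: (rows, cols, M) abundance cube. Returns the (n, M) feature matrix
    of valid samples and a boolean mask marking which input samples were kept.
    """
    cols = np.floor((np.asarray(xs, float) - x0) / pixel_size).astype(int)
    rows = np.floor((y0 - np.asarray(ys, float)) / pixel_size).astype(int)
    inside = (rows >= 0) & (rows < abund.shape[0]) & (cols >= 0) & (cols < abund.shape[1])
    feats = np.full((len(cols), abund.shape[2]), np.nan)
    feats[inside] = abund[rows[inside], cols[inside]]
    # Drop samples outside the image extent or with missing abundance values
    valid = inside & ~np.isnan(feats).any(axis=1)
    return feats[valid], valid
```

The returned mask can be applied to the laboratory concentration table so that features and labels stay aligned for the n effective samples.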

2.5.2. Model Training and Validation

RFR was employed to model the relationships between endmember abundances and heavy metal concentrations [38]. RFR is well-suited for capturing nonlinear interactions and handling high-dimensional data. Model parameters were set with an appropriate number of decision trees and a fixed random seed to ensure reproducibility. Model performance was assessed using cross-validation metrics, including the coefficient of determination $R^{2}$ and root mean square error (RMSE) for each metal $j$:

$$R_j^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i,j} - \hat{y}_{i,j})^{2}}{\sum_{i=1}^{n} (y_{i,j} - \bar{y}_{j})^{2}} \qquad (9)$$

$$\mathrm{RMSE}_j = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i,j} - \hat{y}_{i,j})^{2}}, \quad j = 1, \ldots, 8 \qquad (10)$$

where $y_{i,j}$ and $\hat{y}_{i,j}$ are the measured and predicted concentrations of metal $j$ at sample $i$, and $\bar{y}_{j}$ is the mean measured concentration of metal $j$.

In addition to R² and RMSE, the Residual Prediction Deviation (RPD) was introduced as a complementary evaluation index to further assess model robustness. RPD is defined as the ratio of the standard deviation (SD) of the measured heavy metal concentrations to the RMSE of model predictions:

$$\mathrm{RPD} = \frac{SD_{obs}}{\mathrm{RMSE}} \qquad (11)$$

RPD reflects the relative predictive accuracy of a model with respect to the inherent variability of the target variable. According to commonly accepted criteria in soil spectroscopy and quantitative remote sensing studies, RPD values lower than 1.4 indicate poor predictive performance, values between 1.4 and 2.0 indicate acceptable performance, values between 2.0 and 2.5 indicate good performance, and values greater than 2.5 indicate excellent predictive performance.
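The three evaluation metrics and the RPD classes listed above can be computed per metal as in the following sketch. The use of the sample standard deviation (`ddof=1`) for $SD_{obs}$ and the assignment of boundary values to classes are assumptions, since the text does not specify them; the function names are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Per-metal R^2, RMSE and RPD for (n, n_metals) arrays of measured and predicted values."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    resid = y_true - y_pred
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    r2 = 1.0 - ss_res / ss_tot                       # Eq. (9)
    rmse = np.sqrt((resid ** 2).mean(axis=0))        # Eq. (10)
    rpd = y_true.std(axis=0, ddof=1) / rmse          # Eq. (11), sample SD assumed
    return r2, rmse, rpd

def rpd_class(rpd):
    """Map an RPD value to the qualitative classes given in the text."""
    if rpd < 1.4:
        return "poor"
    if rpd < 2.0:
        return "acceptable"
    if rpd <= 2.5:
        return "good"
    return "excellent"
```

In a cross-validation setting, `y_pred` would hold the out-of-fold predictions, so that the metrics reflect performance on samples not used for fitting.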

2.5.3. Full-Scene Prediction and Spatial Mapping

To evaluate heavy metal contamination at the regional scale, the abundance features of each pixel were input into the trained RFR models to predict concentrations for all eight metals. The predicted values were reconstructed into complete concentration matrices and exported as spatial distribution maps. Visualization employed a hot color scale to highlight contamination hotspots, such as areas surrounding tailings ponds, in contrast to background low-concentration regions.

2.6. Correlation Analysis of Multi-Metal Elements

2.6.1. Correlation Analysis of Multi-Metal Elements

To elucidate the intrinsic relationships among multi-metal pollutants in the study area, this section conducts a correlation analysis of eight heavy metal elements (Pb, Cd, Cr, Hg, As, Cu, Zn, Ni) using their concentration data derived from hyperspectral inversion. The Pearson correlation coefficient is employed to quantify linear associations between elements [39]:

$$r_{ij} = \frac{\sum_{k=1}^{N} (x_{ik} - \bar{x}_{i})(x_{jk} - \bar{x}_{j})}{\sqrt{\sum_{k=1}^{N} (x_{ik} - \bar{x}_{i})^{2} \sum_{k=1}^{N} (x_{jk} - \bar{x}_{j})^{2}}} \qquad (12)$$

where $r_{ij}$ denotes the correlation coefficient between elements $i$ and $j$, $x_{ik}$ and $x_{jk}$ represent the concentration values of elements $i$ and $j$ for the $k$-th valid pixel, $\bar{x}_{i}$ and $\bar{x}_{j}$ are their respective mean concentrations, and $N$ is the total number of valid pixels. The coefficient $r$ lies in [−1, 1], with absolute values closer to 1 indicating stronger correlations (positive for $r > 0$, negative for $r < 0$).
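Equation (12) can be evaluated for all metal pairs at once. The sketch below assumes the predicted concentrations are arranged as one row per pixel and one column per metal, and treats any pixel with a NaN value as invalid; the function name is hypothetical.

```python
import numpy as np

def pearson_matrix(C):
    """Pearson correlation matrix between the columns of C (N pixels x n_metals)."""
    C = np.asarray(C, float)
    C = C[~np.isnan(C).any(axis=1)]      # keep valid pixels only
    D = C - C.mean(axis=0)               # deviations from the per-metal mean
    cov = D.T @ D                        # numerator of Eq. (12) for every pair
    d = np.sqrt(np.diag(cov))            # per-metal root sum of squares
    return cov / np.outer(d, d)
```

The result matches `np.corrcoef(C_valid, rowvar=False)`; writing it out makes the correspondence with the numerator and denominator of Equation (12) explicit.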

2.6.2. Clustering Methodology and Parameter Validation

Building on the correlation analysis, this study applies the K-means unsupervised clustering algorithm to the eight-metal concentration data to identify spatial patterns of pollution. The optimal number of clusters ($k$) is determined using the elbow method (minimizing within-cluster sum of squares, SSE) and silhouette analysis (maximizing cluster separation), yielding $k = 3$.

K-means clustering minimizes the within-cluster sum of squares (SSE) [40], defined as follows:

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_{i}} \|x - \mu_{i}\|^{2} \qquad (13)$$

where $k$ is the number of clusters, $C_{i}$ denotes the $i$-th cluster, $x$ represents a sample (eight-metal concentration vector), and $\mu_{i}$ is the centroid (mean vector) of cluster $C_{i}$. The silhouette coefficient $s(i)$ quantifies the cohesion of sample $i$ within its cluster and its separation from other clusters:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (14)$$

where $a(i)$ is the average distance from $i$ to the other samples in its cluster, and $b(i)$ is the average distance from $i$ to the samples in the nearest other cluster. A higher $s(i)$ (range: [−1, 1]) indicates better clustering performance.

For $k = 3$, the silhouette score peaks at 0.62, and SSE stabilizes, confirming $k = 3$ as optimal for distinguishing pollution patterns.
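The two validation quantities in Equations (13) and (14) can be computed directly from a set of cluster labels, as in the sketch below. It builds the full pairwise distance matrix, so it is only practical for a sample or subset of pixels; the function names and the convention of assigning $s(i) = 0$ to singleton clusters are assumptions of this sketch.

```python
import numpy as np

def sse(X, labels):
    """Within-cluster sum of squared distances to the cluster centroid, Eq. (13)."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    return float(sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                     for c in np.unique(labels)))

def silhouette_mean(X, labels):
    """Mean silhouette coefficient over all samples, Eq. (14). Needs at least two clusters."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise Euclidean distances
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue                                            # singleton cluster: s(i) = 0
        a = D[i, own].sum() / (n_own - 1)                       # excludes the sample itself
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())
```

Evaluating both functions on the K-means labels obtained for several candidate values of $k$ gives the elbow curve (SSE against $k$) and the silhouette curve used to choose the number of clusters.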

3. Results

Based on GF-5 hyperspectral images and soil sampling data, this section presents the inversion results of soil heavy metal concentrations obtained through the PCCEE framework, which integrates spectral decomposition, mixed pixel unmixing, and multi-element collaborative inversion. Specifically, the results include: (i) spectral decomposition and endmember extraction, (ii) unmixing abundance maps, and (iii) multi-element concentration inversion. Together, these results demonstrate the adaptability of the PCCEE framework to the complex mining surface and provide methodological and data support for precise pollution control.

3.1. Spectral Decomposition Results

3.1.1. Endmember Extraction Results

To systematically evaluate the adaptability of different endmember extraction methods to complex mine surface conditions, this study designed ablation experiments comparing seven endmember extraction workflows: PPI, PPI + VCA, PPI + PS, VCA, VCA + PS, PS, and PPI + VCA + PS. The extracted results are shown in Figure 3.

The performance of different endmember extraction methods was evaluated by applying their derived endmembers to linear unmixing, followed by heavy metal concentration inversion using the resulting abundance maps. Inversion accuracy metrics, including R², were used for comparison, as summarized in Table 1.

Experimental results indicate that the combined method PPI + VCA + PS overcomes the inherent limitations of individual endmember extraction approaches in capturing the complex spectral characteristics of mining areas and yields the most accurate inversion results (mean R² = 0.85; Table 1). Mining sites typically present a spectrally intimate mixing scenario, in which the spectra of pure endmembers are nonlinearly combined. The PPI algorithm effectively identifies the most extreme pixels, which usually correspond to the purest spectral signatures of dominant materials, such as exposed tailings or heavily contaminated soils. However, PPI is prone to selecting outliers or noise. VCA complements PPI by identifying vertices of the data convex hull, capturing secondary land cover types with slightly lower purity but distinctive spectral features, such as soils under varying vegetation cover. Nevertheless, in highly heterogeneous mining environments, both VCA and PPI may converge to “mathematical” endmembers that lack physical meaning. From a geochemical perspective, heavy metals in mining soils do not exist independently. They commonly co-occur within mineral phases (e.g., Pb and Cd in galena, Zn and Cd in sphalerite), migrating and precipitating together to form specific composite contamination patterns in the soil. This geochemical association leads to coupled and overlapping spectral features. The “Prior” strategy is therefore critical. By incorporating laboratory-measured standard spectra of minerals and soils as prior knowledge, we introduce key physical constraints into the endmember extraction process. This not only guides the algorithm toward crucial mineral endmembers associated with specific heavy metals (e.g., iron oxides adsorbing As) but also filters out mathematically pure but physically meaningless endmembers extracted by PPI or VCA. The combination of PPI, VCA, and Prior achieves an optimal balance among mathematical purity, spectral diversity, and physical realism, providing the most reliable spectral foundation for subsequent high-accuracy heavy metal inversion.
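The prior-guided filtering step described above can be illustrated with a small spectral angle check. This is a sketch under stated assumptions rather than the published implementation: the function names, the four-band toy spectra, and the 0.10 rad threshold are hypothetical.

```python
import numpy as np

def spectral_angle(a, b):
    """Spectral angle (radians) between two spectra."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def filter_by_prior(candidates, priors, max_angle=0.10):
    """Keep candidates whose closest prior spectrum lies within max_angle.

    Returns a list of (candidate_index, prior_index, angle) tuples.
    max_angle is a hypothetical threshold, not a value from the paper.
    """
    kept = []
    for i, cand in enumerate(candidates):
        angles = [spectral_angle(cand, p) for p in priors]
        j = int(np.argmin(angles))
        if angles[j] <= max_angle:
            kept.append((i, j, angles[j]))
    return kept

# Hypothetical 4-band prior library (e.g. laboratory mineral and soil spectra).
priors = np.array([[0.10, 0.20, 0.30, 0.40],
                   [0.40, 0.30, 0.20, 0.10]])
# Hypothetical candidates from PPI/VCA.
candidates = np.array([[0.20, 0.40, 0.60, 0.80],   # scaled copy of prior 0
                       [0.41, 0.29, 0.21, 0.10],   # close to prior 1
                       [0.30, 0.05, 0.30, 0.05]])  # no physical counterpart
matches = filter_by_prior(candidates, priors)
```

Because the spectral angle ignores overall brightness, the first candidate matches its prior despite being twice as bright, while the third candidate is discarded as a "mathematical" endmember.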

3.1.2. Unmixing Abundance Results

Using the optimal endmember library (eight classes from PPI + VCA + PS), three abundance estimation methods—LMM, SVR, and NMF—were compared. The extraction results of different methods are shown in Figure 4.

As in the comparison of endmember extraction methods, the abundance maps produced by the three unmixing methods, all based on the four types of endmembers extracted by PPI + VCA + PS, were used as inputs for inversion. The methods were compared in terms of inversion accuracy and computation time: linear unmixing gave the best overall accuracy (Table 2) and required the least time. Linear unmixing was therefore adopted in this study to invert the heavy metals.

Regarding the performance of the unmixing algorithms, the LMM achieves the highest mean accuracy (mean R² = 0.85, versus 0.84 for SVR and 0.82 for NMF; Table 2) while being far more computationally efficient (4.14 s) than NMF (74.34 s) and SVR (1320.58 s). This counterintuitive outcome, in which the simplest model performs best, can be explained from two perspectives. First, endmember quality plays a decisive role: the proposed PPI + VCA + PS workflow extracts high-quality endmembers with clear physical meaning. As a result, the majority of spectral variability in the image is already captured by these endmembers. In this context, pixel spectral mixing is largely linear, and complex nonlinear effects are secondary relative to the dominant linear mixing. Consequently, LMM, based on the linearity assumption, is sufficient to capture the principal mixing relationships. Second, although more flexible models such as SVR and NMF can in theory represent more complex mixing, they involve more parameters and higher model complexity. With a limited number of training samples, such models are prone to overfitting, fitting the noise in the training data rather than the generalizable patterns, which reduces their predictive performance on unseen data. In contrast, the simple and strongly constrained LMM is more robust and generalizes better. Moreover, their high computational cost makes these more complex models impractical for large-scale applications or scenarios requiring rapid processing. Therefore, LMM provides the best balance of accuracy, efficiency, and robustness, making it the reasonable choice for this study.
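To make the linear mixing assumption concrete, the sketch below estimates per-pixel abundances by least squares under non-negativity and sum-to-one constraints. This section does not state which constraints or solver the study used, so the constraints, the projected-gradient solver, the function names, and the toy endmember matrix are all assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def unmix_pixel(E, x, n_iter=500):
    """Estimate abundances of pixel x (bands,) from endmembers E (bands, p).

    Minimises 0.5 * ||E a - x||^2 over the probability simplex by
    projected gradient descent.
    """
    p = E.shape[1]
    a = np.full(p, 1.0 / p)                    # start from equal abundances
    step = 1.0 / np.linalg.norm(E, 2) ** 2     # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = E.T @ (E @ a - x)
        a = project_to_simplex(a - step * grad)
    return a

# Hypothetical endmember matrix: 5 bands, 3 endmembers (columns).
E = np.array([[0.9, 0.1, 0.2],
              [0.8, 0.2, 0.3],
              [0.3, 0.7, 0.4],
              [0.2, 0.8, 0.6],
              [0.1, 0.3, 0.9]])
true_abundance = np.array([0.6, 0.3, 0.1])
pixel = E @ true_abundance                     # noise-free linear mixture
estimated = unmix_pixel(E, pixel)
```

With a noise-free mixture the estimate recovers the true abundances closely. Each pixel needs only a small constrained least-squares solve, which is in line with the low run time reported for LMM.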

3.2. Inversion Results

This section presents the multi-element (Pb, Cd, Cr, Hg, As, Cu, Zn, Ni) heavy metal concentration inversion model, built with RFR on the best endmember extraction and unmixing abundance results. Scatter plots of retrieved versus measured values and content inversion maps were produced. The inversion results for the different metals are shown in Figure 5.

Based on hyperspectral inversion, the spatial distribution maps in Figure 5 reveal pronounced heterogeneity of heavy metal contamination, characterized by irregular high-concentration patches rather than uniform dispersion. These hotspots, mainly associated with historical mining and waste disposal sites, indicate that pollutant transport is jointly controlled by industrial activities and hydrogeological processes. Highly toxic metals such as Hg, Pb, and Cd dominate in risk zones, posing long-term threats through multi-media migration. Different metals exhibit distinct environmental behaviors: Cd and Hg show higher mobility and bioavailability, while Pb and Ni are more strongly retained near sources. Such differences shape the observed dispersion patterns. Moreover, the significant spatial overlap among metals highlights the prevalence of mixed pollution, the combined effects of which on soil, vegetation, and human health may exceed single-metal toxicity. These findings underscore the need for precision management in order to (i) identify and control major sources, (ii) apply remediation strategies tailored to metal mobility, and (iii) establish integrated monitoring systems to track contaminant dynamics across soil–water–biota systems. In summary, hyperspectral inversion not only delineates contamination heterogeneity but also provides critical spatial evidence for understanding pollutant transport and guiding targeted risk control in mining regions.

In this section, model predictive performance was evaluated using inversion scatter plots for individual heavy metals (Figure 6). The training set is shown as blue circles, the validation set as red triangles, and the sample size (n) is indicated for reference. The dashed line represents the ideal 1:1 relationship. Overall, the model exhibits strong predictive capability for most elements, including Cd, Hg, Cu, Cr, As, Ni, and Zn. Data points from both the training and validation sets are closely distributed around the reference line, indicating that the model effectively captures the dominant spectral–concentration relationships and maintains good generalization performance. Among these elements, Cd, Hg, Cu, and Cr achieve the highest inversion accuracy, with minimal dispersion across both datasets. For Zn, although several high-concentration outliers appear in the validation set, the majority of samples follow a clear linear trend, suggesting that the model remains reliable within low to medium concentration ranges. Deviations at higher concentrations are likely attributable to spectral saturation effects or anomalous samples. In contrast, Pb shows substantially weaker inversion performance, characterized by irregular scatter in both training and validation sets. This limitation is primarily related to the intrinsic spectral properties of Pb rather than sample size. In soils, Pb occurs in multiple mineralogical forms (e.g., carbonates and sulfates), resulting in weak and broad spectral responses that are easily masked by background components such as iron oxides and organic matter. Consequently, Pb exhibits a low signal-to-noise ratio, making accurate hyperspectral inversion particularly challenging. In summary, the proposed model demonstrates robust and generalizable inversion capability for multiple heavy metals. 
However, Pb prediction requires a differentiated strategy that accounts for its spectrally insensitive behavior, such as exploiting sub-pixel abundance information or incorporating auxiliary variables related to soil geochemistry.

4. Discussion

4.1. Applicability and Uncertainty of the Proposed Framework

Dense vegetation cover remains a fundamental limitation for optical hyperspectral inversion of soil properties because canopy reflectance can partially or fully obscure soil spectral signals [18]. Although the PCCEE framework incorporates vegetation endmembers and physically constrained unmixing to mitigate vegetation interference under mixed land-cover conditions, its effectiveness is inherently constrained in fully vegetated areas where soil exposure is minimal.

Temporal mismatches between field sampling and satellite acquisition may further alter the soil contribution within mixed pixels. In the present study area, the coexistence of bare soil, tailings, and sparsely to moderately vegetated surfaces allows soil-related spectral information to remain detectable. Under such conditions, the proposed framework primarily improves inversion stability by enhancing endmember representativeness rather than overcoming the physical masking imposed by dense canopy cover. Potential discrepancies induced by vegetation dynamics are therefore acknowledged as an inherent uncertainty. In addition, environmental variables were not explicitly incorporated due to data constraints, which may further influence the spectral–concentration relationship.

Beyond vegetation effects, framework applicability is also influenced by sample availability and model complexity. Given the limited number of field samples, an LMM was adopted because simpler models generally provide more stable generalization under small-sample conditions. The optimized endmember extraction strategy was introduced to improve feature representativeness in a sample-scarce context rather than to increase inversion-stage complexity. Comparative tests with nonlinear approaches showed that LMM achieved comparable accuracy in the study area while maintaining superior computational efficiency and interpretability, which is advantageous for large-scale applications.

Nevertheless, nonlinear mixing caused by mineral superposition, soil heterogeneity, and vegetation–soil interactions may still occur and is not explicitly modeled by LMM. These effects may partly explain the relatively lower performance observed for elements with weak spectral responses, such as Pb. Furthermore, inversion results were derived using a single Random Forest model and evaluated via cross-validation. While this design isolates the contribution of the unmixing strategy, it may introduce model-dependent uncertainty and limit generalization. Future work should incorporate independent validation datasets and multi-model ensembles to further assess robustness.

Overall, the proposed framework is most suitable for mining environments dominated by bare soil, tailings, or sparse vegetation where linear mixing assumptions are more valid. Its transferability to densely vegetated regions or strongly nonlinear environments should be further evaluated, and the integration of additional environmental variables or nonlinear corrections may improve future performance.

4.2. Spatial Clustering and Risk Pattern Analysis of Multi-Metal Pollution

4.2.1. Correlation Patterns of Polymetallic Elements

The correlation matrix of the eight heavy metals in Figure 7 reveals pronounced inter-element associations and clear geochemical grouping characteristics. Overall, Pb, Cd, and Zn exhibit the strongest positive correlations, indicating a common source or coupled migration behavior in the mining environment. In particular, the high Pb–Cd (r = 0.69) and Pb–Zn (r = 0.73) relationships are consistent with the well-known co-occurrence of these elements in sulfide mineral assemblages such as galena and sphalerite.

A second group of moderate correlations is observed among Cr, Ni, Cu, and As. For example, the Cr–Ni (r = 0.42) and Cu–As (r = 0.50) relationships suggest partial geochemical linkage, possibly related to shared lithological backgrounds or secondary redistribution during weathering and transport processes. These associations are weaker than those of the Pb–Cd–Zn group, indicating more complex or mixed sources.

In contrast, several element pairs show weak or negligible correlations, implying relatively independent origins or environmental behaviors. For instance, the near-zero correlation between Cr and As suggests that Cr is more strongly controlled by parent material composition, whereas As enrichment is primarily associated with sulfide oxidation processes in the mining area.

Overall, the observed correlation structure aligns well with the regional geological setting and known mineralization characteristics. The strong coupling within the Pb–Cd–Zn group reflects sulfide mineral symbiosis, whereas weaker associations involving Cr and As highlight the coexistence of both mining-derived and geogenic controls. These findings provide important evidence for interpreting the sources and co-migration mechanisms of heavy metals in the study area.
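A correlation matrix of the kind discussed above can be computed as in the following sketch. The data here are synthetic and purely illustrative: a hypothetical shared "sulfide" factor drives Pb, Cd, and Zn, while Cr varies independently. None of the values correspond to the study's samples, and Pearson correlation is assumed.

```python
import numpy as np

metals = ["Pb", "Cd", "Zn", "Cr"]
rng = np.random.default_rng(0)
n = 200                                           # hypothetical number of samples

sulfide = rng.standard_normal(n)                  # hypothetical shared source factor
samples = np.column_stack([
    sulfide + 0.5 * rng.standard_normal(n),       # Pb
    sulfide + 0.5 * rng.standard_normal(n),       # Cd
    sulfide + 0.5 * rng.standard_normal(n),       # Zn
    rng.standard_normal(n),                       # Cr, independent of the factor
])

# Pearson correlation between columns (one column per metal).
corr = np.corrcoef(samples, rowvar=False)

def r(a, b):
    """Look up the correlation coefficient for a pair of metals."""
    return corr[metals.index(a), metals.index(b)]
```

In this toy setting `r("Pb", "Zn")` is high and `r("Cr", "Pb")` is close to zero, which mirrors how a shared source produces a tightly correlated group while independently controlled elements do not.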

4.2.2. Spatial Clustering Characteristics of Polymetallic Pollution

The spatial clustering results in Figure 8 reveal pronounced heterogeneity in the distribution of heavy metal pollution across the study area, reflecting the combined influence of mining activities, geomorphological controls, and downstream transport processes. Three distinct pollution zones can be identified, each exhibiting different geochemical characteristics and environmental implications.

The low-background zone is mainly distributed in peripheral areas away from the core mining and tailings regions. In this zone, most heavy metals remain at relatively low concentrations with weak inter-element coupling, suggesting limited anthropogenic disturbance and stronger control by natural background conditions. The spatial continuity of this cluster indicates relatively stable soil environmental quality.

The moderate pollution diffusion zone forms a transitional belt surrounding the mining center and extending along local drainage pathways. This zone is characterized by moderate enrichment of multiple elements and increased spatial variability, indicating the outward migration of contaminants from primary sources. The observed pattern suggests that surface runoff and sediment transport play important roles in redistributing heavy metals, leading to secondary accumulation in downstream or low-lying areas.

The high-risk composite pollution zone is concentrated near the mining pits and tailings ponds, where the highest concentrations of Pb, Cd, and Zn are observed. The strong co-enrichment and spatial coincidence of these elements are consistent with sulfide mineral weathering and tailings-related emissions. This cluster represents the primary ecological risk hotspot in the study area and reflects the dominant influence of long-term mining disturbance.

Overall, the spatial clustering pattern demonstrates a clear source–transport–accumulation framework of heavy metal pollution in the mining landscape. The results not only corroborate the correlation analysis but also provide spatially explicit evidence for identifying priority control areas and understanding the coupled behavior of multi-metal contamination.
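The three-zone partition can be illustrated with a small clustering sketch. This section does not name the clustering algorithm, so plain k-means with k = 3 on standardised concentrations is an assumption here, as are the function name, the farthest-point initialisation, and the synthetic Pb, Cd, and Zn values.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers

# Synthetic pixels: three hypothetical enrichment levels for Pb, Cd, Zn (mg/kg).
rng = np.random.default_rng(0)
truth = np.repeat([0, 1, 2], 50)                  # 0 = low, 1 = moderate, 2 = high
levels = np.array([1.0, 4.0, 7.0])
base = np.array([30.0, 0.5, 120.0])
conc = (levels[truth][:, None] + 0.1 * rng.standard_normal((150, 3))) * base

# Standardise each metal so that no single element dominates the distance.
Z = (conc - conc.mean(axis=0)) / conc.std(axis=0)
labels, centers = kmeans(Z, k=3)

# Rank clusters by mean standardised concentration: 0 = low-background zone.
rank = np.empty(3, dtype=int)
rank[np.argsort(centers.mean(axis=1))] = np.arange(3)
zones = rank[labels]
```

Ranking the clusters by their mean standardised concentration gives a simple way to label them as low-background, moderate diffusion, and high-risk zones.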

5. Conclusions

This study developed and applied a Physically-Constrained Collaborative Endmember Extraction (PCCEE) framework to accurately retrieve the concentrations of eight heavy metals (Pb, Cd, Cr, Hg, As, Cu, Zn, Ni) in soils of the Lengshuijiang Antimony Mine, China. By synergistically combining PPI, VCA, and prior-knowledge-guided SAM, PCCEE enhanced the physical interpretability and purity of extracted endmembers. The selected LMM efficiently resolved mixed pixels and provided robust abundance estimates, which were then coupled with RFR to achieve high-precision, multi-element inversion (mean R² = 0.85). This framework effectively addressed spectral mixing and nonlinear interference in complex mining landscapes, offering a reliable pathway for quantifying heavy metal pollution with clear mechanistic interpretability. Beyond this specific case, the PCCEE framework is theoretically transferable to other mining environments by expanding the PS library while applying the same unmixing-to-inversion workflow. Future efforts should further validate the framework across diverse sensors and mining regions and explore integration with deep learning–based unmixing and dynamic monitoring for enhanced predictive accuracy and interpretability. In conclusion, this research underscores the value of combining spectral unmixing and machine learning for multi-element pollution inversion in complex environments. The framework offers a scalable and interpretable tool for supporting precise environmental management and remediation strategies in mining-affected areas.

Author Contributions

Conceptualization, X.Z. and W.H.; methodology, X.Z.; software, X.Z.; validation, X.Z. and R.F.; formal analysis, X.H.; investigation, S.W.; resources, Y.W.; data curation, L.C. and J.G.; writing—original draft preparation, X.Z.; writing—review and editing, X.Z.; visualization, Y.W.; supervision, Y.W.; project administration, Y.W.; funding acquisition, Y.W. and L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 42501594) and the Natural Science Foundation of Hunan Province (No. 2024JJ8353).

Data Availability Statement

The hyperspectral imagery, soil sampling data, and processing code supporting this study are publicly available in the Science Data Bank repository: https://doi.org/10.57760/sciencedb.29811 (accessed on 25 February 2026) [41]. The dataset is accessible without embargo or login requirements for peer review purposes.

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Qi, C.; Hu, T.; Zheng, Y.; Wu, M.; Tang, F.H.; Liu, M.; Zhang, B.; Derrible, S.; Chen, Q.; Hu, G.; et al. Global and regional patterns of soil metal(loid) mobility and associated risks. Nat. Commun. 2025, 16, 2947.
2. Hu, B.; Shao, S.; Ni, H.; Fu, Z.; Hu, L.; Zhou, Y.; Min, X.; She, S.; Chen, S.; Huang, M.; et al. Current status, spatial features, health risks, and potential driving factors of soil heavy metal pollution in China at province level. Environ. Pollut. 2020, 266, 114961.
3. Adnan, M.; Xiao, B.; Ali, M.U.; Xiao, P.; Zhao, P.; Wang, H.; Bibi, S. Heavy metals pollution from smelting activities: A threat to soil and groundwater. Ecotoxicol. Environ. Saf. 2024, 274, 116189.
4. Liu, H.; Qu, M.; Chen, J.; Guang, X.; Zhang, J.; Liu, M.; Kang, J.; Zhao, Y.; Huang, B. Heavy metal accumulation in the surrounding areas affected by mining in China: Spatial distribution patterns, risk assessment, and influencing factors. Sci. Total Environ. 2022, 825, 154004.
5. Hong, Y.; Shen, R.; Cheng, H.; Chen, Y.; Zhang, Y.; Liu, Y.; Zhou, M.; Yu, L.; Liu, Y.; Liu, Y. Estimating lead and zinc concentrations in peri-urban agricultural soils through reflectance spectroscopy: Effects of fractional-order derivative and random forest. Sci. Total Environ. 2019, 651, 1969–1982.
6. Tan, K.; Wang, H.; Chen, L.; Du, Q.; Du, P.; Pan, C. Estimation of the spatial distribution of heavy metal in agricultural soils using airborne hyperspectral imaging and random forest. J. Hazard. Mater. 2020, 382, 120987.
7. Guo, B.; Wang, Y.; Pei, L.; Yu, Y.; Liu, F.; Zhang, D.; Wang, X.; Su, Y.; Zhang, D.; Zhang, B.; et al. Determining the effects of socioeconomic and environmental determinants on chronic obstructive pulmonary disease (COPD) mortality using geographically and temporally weighted regression model across Xi’an during 2014–2016. Sci. Total Environ. 2021, 756, 143869.
8. Westman, W.E. Monitoring the environment by remote sensing. Trends Ecol. Evol. 1987, 2, 333–337.
9. Li, J.; Pei, Y.; Zhao, S.; Xiao, R.; Sang, X.; Zhang, C. A review of remote sensing for environmental monitoring in China. Remote Sens. 2020, 12, 1130.
10. Lovynska, V.; Bayat, B.; Bol, R.; Moradi, S.; Rahmati, M.; Raj, R.; Sytnyk, S.; Wiche, O.; Wu, B.; Montzka, C. Monitoring heavy metals and metalloids in soils and vegetation by remote sensing: A review. Remote Sens. 2024, 16, 3221.
11. Wu, Y.; Chen, J.; Ji, J.; Gong, P.; Liao, Q.; Tian, Q.; Ma, H. A mechanism study of reflectance spectroscopy for investigating heavy metals in soils. Soil Sci. Soc. Am. J. 2007, 71, 918–926.
12. Wan, Z.; Wang, S.; Han, W.; Wang, Y.; Huang, X.; Zhang, X.; Chen, X.; Chen, Y. A systematic survey and meta-analysis of the segment anything model in remote sensing image processing: Challenges, advances, applications, and opportunities. ISPRS J. Photogramm. Remote Sens. 2025, 229, 436–466.
13. Wang, F.; Gao, J.; Zha, Y. Hyperspectral sensing of heavy metals in soil and vegetation: Feasibility and challenges. ISPRS J. Photogramm. Remote Sens. 2018, 136, 73–84.
14. Wu, Y.; Li, X.; Yu, L.; Wang, T.; Wang, J.; Liu, T. Review of soil heavy metal pollution in China: Spatial distribution, primary sources, and remediation alternatives. Resour. Conserv. Recycl. 2022, 181, 106261.
15. Hu, B.; Xie, M.; Zhou, Y.; Chen, S.; Zhou, Y.; Ni, H.; Peng, J.; Ji, W.; Hong, Y.; Li, H.; et al. A high-resolution map of soil organic carbon in cropland of Southern China. Catena 2024, 237, 107813.
16. Twomey, S. Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements; Courier Dover Publications: Garden City, NY, USA, 2019.
17. Su, Y.; Li, B.; Li, J.; Guo, B.; Feng, Q. Hyperspectral remote sensing for soil heavy metal inversion: Insights and applications. Int. J. Digit. Earth 2025, 18, 2520474.
18. Li, Z.; Ma, Z.; van der Kuijp, T.J.; Yuan, Z.; Huang, L. A review of soil heavy metal pollution from mines in China: Pollution and health risk assessment. Sci. Total Environ. 2014, 468, 843–853.
19. Fox, G.A.; Metla, R. Soil property analysis using principal components analysis, soil line, and regression models. Soil Sci. Soc. Am. J. 2005, 69, 1782–1788.
20. Pandit, C.M.; Filippelli, G.M.; Li, L. Estimation of heavy-metal contamination in soil using reflectance spectroscopy and partial least-squares regression. Int. J. Remote Sens. 2010, 31, 4111–4123.
21. Cheng, H.; Shen, R.; Chen, Y.; Wan, Q.; Shi, T.; Wang, J.; Wan, Y.; Hong, Y.; Li, X. Estimating heavy metal concentrations in suburban soils with reflectance spectroscopy. Geoderma 2019, 336, 59–67.
22. Shi, T.; Chen, Y.; Liu, Y.; Wu, G. Visible and near-infrared reflectance spectroscopy—An alternative for monitoring soil contamination by heavy metals. J. Hazard. Mater. 2014, 265, 166–176.
23. Hong, Y.; Chen, S.; Chen, Y.; Linderman, M.; Mouazen, A.M.; Liu, Y.; Guo, L.; Yu, L.; Liu, Y.; Cheng, H.; et al. Comparing laboratory and airborne hyperspectral data for the estimation and mapping of topsoil organic carbon: Feature selection coupled with random forest. Soil Tillage Res. 2020, 199, 104589.
24. Li, T. The development of geological structures in China. GeoJournal 1980, 4, 487–497.
25. Wang, X.; He, M.; Xie, J.; Xi, J.; Lu, X. Heavy metal pollution of the world largest antimony mine-affected agricultural soils in Hunan province (China). J. Soils Sediments 2010, 10, 827–837.
26. Li, X.; Zhao, Z.; Yuan, Y.; Wang, X.; Li, X. Heavy metal accumulation and its spatial distribution in agricultural soils: Evidence from Hunan province, China. RSC Adv. 2018, 8, 10665–10672.
27. Zhang, X.; Yang, H.; Cui, Z. Evaluation and analysis of soil migration and distribution characteristics of heavy metals in iron tailings. J. Clean. Prod. 2018, 172, 475–480.
28. Sharma, R.K.; Agrawal, M. Biological effects of heavy metals: An overview. J. Environ. Biol. 2005, 26, 301–313.
29. Duan, Q.; Lee, J.; Liu, Y.; Chen, H.; Hu, H. Distribution of heavy metal pollution in surface soil samples in China: A graphical review. Bull. Environ. Contam. Toxicol. 2016, 97, 303–309.
30. Carter, M.R.; Gregorich, E.G. Soil Sampling and Methods of Analysis; CRC Press: Boca Raton, FL, USA, 2007.
31. Meng, Q. Remote sensing data preprocessing technology. In Remote Sensing of Urban Green Space; Springer Nature: Singapore, 2023; pp. 9–26.
32. Liu, Y.N.; Sun, D.X.; Cao, K.Q.; Liu, S.F.; Chai, M.Y.; Liang, J.; Yuan, J. Evaluation of GF-5 AHSI on-orbit instrument radiometric performance. J. Remote Sens. 2020, 24, 352–359.
33. González, C.; Resano, J.; Mozos, D.; Plaza, A.; Valencia, D. FPGA implementation of the pixel purity index algorithm for remotely sensed hyperspectral image analysis. EURASIP J. Adv. Signal Process. 2010, 2010, 969806.
34. Nascimento, J.M.; Dias, J.M. Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 898–910.
35. Yang, C.; Everitt, J.H.; Bradford, J.M. Yield estimation from hyperspectral imagery using spectral angle mapper (SAM). Trans. ASABE 2008, 51, 729–737.
36. Veganzones, M.A.; Drumetz, L.; Tochon, G.; Dalla Mura, M.; Plaza, A.; Bioucas-Dias, J.; Chanussot, J. A new extended linear mixing model to address spectral variability. In 2014 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS); IEEE: Piscataway, NJ, USA, 2014; pp. 1–4.
37. Heylen, R.; Parente, M.; Gader, P. A review of nonlinear hyperspectral unmixing methods. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 1844–1868.
38. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
39. López-Granados, F.; Jurado-Expósito, M.; Peña-Barragán, J.M.; García-Torres, L. Using geostatistical and remote sensing approaches for mapping soil properties. Eur. J. Agron. 2005, 23, 279–289.
40. Abbas, A.W.; Minallh, N.; Ahmad, N.; Abid, S.A.R.; Khan, M.A.A. K-Means and ISODATA clustering algorithms for landcover classification using remote sensing. Sindh Univ. Res. J.-SURJ 2016, 48, 315–318.
41. Zhang, X. Mine soil metal hyperspectral retrieval dataset in Lengshuijiang City, Hunan Province. Sci. Data Bank 2025.

Figure 1. Location of the study area and soil sampling sites. The blue vector data are the mine boundaries.

Figure 2. Overall workflow of the PCCEE framework.

Figure 3. (a–g) show the optimal spectral results obtained from seven endmember extraction workflows: PPI, VCA, PS, PPI + VCA, PPI + PS, VCA + PS, and PPI + VCA + PS. The horizontal axis represents band wavelength and the vertical axis represents reflectance. Each curve is labeled with the endmember type and extraction method.

Figure 4. Unmixing results of LMM, NMF, and SVR.

Figure 5. Scatter plots of the correlation between retrieved and measured concentrations for eight heavy metals (mg/kg).

Figure 6. Inversion map of soil metal content in the whole mining area.

Figure 7. Multiple element correlation coefficients.

Figure 8. Pollution source clustering and average heavy metal concentrations across clusters.

Table 1. Inversion accuracy of PPI, PPI + VCA, PPI + PS, VCA, VCA + PS, PS, PPI + VCA + PS for each metal.

	Soil Metal Content Only			PPI
Metal	R²	RMSE	NRMSE	R²	RMSE	NRMSE
As	0.86	7.23	0.07	0.74	9.69	0.09
Cd	0.82	0.3	0.08	0.82	0.29	0.08
Cr	0.83	5.56	0.07	0.8	5.99	0.07
Cu	0.79	4.32	0.08	0.82	3.95	0.07
Hg	0.79	0.09	0.09	0.79	0.09	0.09
Ni	0.83	2.79	0.1	0.78	3.2	0.12
Pb	0.85	3.27	0.07	0.8	3.8	0.08
Zn	0.85	15.69	0.1	0.72	21.76	0.14
Average value	0.83	4.91	0.08	0.78	6.1	0.09
Standard deviation	0.02	4.67	0.01	0.04	6.58	0.02
	VCA			PS
Metal	R²	RMSE	NRMSE	R²	RMSE	NRMSE
As	0.2	17.07	0.16	0.82	8.23	0.08
Cd	0.36	0.56	0.15	0.8	0.31	0.08
Cr	0.43	10.12	0.13	0.81	5.85	0.07
Cu	0.56	6.24	0.12	0.83	3.87	0.07
Hg	0.41	0.15	0.16	0.81	0.08	0.09
Ni	0.31	5.61	0.2	0.79	3.08	0.11
Pb	0.52	5.89	0.13	0.83	3.54	0.08
Zn	0.28	34.77	0.22	0.81	18.09	0.12
Average value	0.38	10.05	0.16	0.81	5.38	0.09
Standard deviation	0.11	10.61	0.03	0.01	5.42	0.02
	PPI + VCA			PPI + PS
Metal	R²	RMSE	NRMSE	R²	RMSE	NRMSE
As	0.81	8.25	0.08	0.84	7.66	0.07
Cd	0.8	0.31	0.09	0.83	0.29	0.08
Cr	0.77	6.43	0.08	0.79	6.16	0.08
Cu	0.84	3.8	0.07	0.82	3.94	0.07
Hg	0.77	0.09	0.1	0.79	0.09	0.09
Ni	0.79	3.07	0.11	0.77	3.27	0.12
Pb	0.82	3.61	0.08	0.84	3.39	0.07
Zn	0.82	17.61	0.11	0.88	14.3	0.09
Average value	0.8	5.4	0.09	0.82	4.89	0.08
Standard deviation	0.02	5.28	0.02	0.03	4.3	0.01
	VCA + PS			PPI + VCA + PS
Metal	R²	RMSE	NRMSE	R²	RMSE	NRMSE
As	0.85	7.43	0.07	0.89	6.4	0.06
Cd	0.85	0.27	0.07	0.88	0.25	0.07
Cr	0.88	4.59	0.06	0.84	5.35	0.07
Cu	0.81	4.11	0.08	0.84	3.72	0.07
Hg	0.8	0.09	0.09	0.84	0.08	0.08
Ni	0.85	2.61	0.09	0.81	2.94	0.11
Pb	0.83	3.51	0.08	0.86	3.19	0.07
Zn	0.8	18.44	0.12	0.84	16.23	0.1
Average value	0.83	5.13	0.08	0.85	4.77	0.08
Standard deviation	0.03	5.5	0.02	0.02	4.79	0.02

Table 2. Inversion accuracy of each metal under LMM, NMF and SVR unmixing results.

Methods	As				Cd
Methods	R²	RMSE	NRMSE	RPD	R²	RMSE	NRMSE	RPD
LMM	0.89	6.4	0.06	3.02	0.88	0.25	0.07	2.87
NMF	0.86	7.12	0.07	2.72	0.84	0.28	0.08	2.52
SVR	0.87	6.88	0.06	2.8	0.82	0.3	0.08	2.54
Methods	Cr				Cu
Methods	R²	RMSE	NRMSE	RPD	R²	RMSE	NRMSE	RPD
LMM	0.84	5.35	0.07	2.52	0.84	3.72	0.07	2.57
NMF	0.82	5.63	0.07	2.39	0.77	4.53	0.09	2.11
SVR	0.84	5.39	0.07	2.36	0.82	4	0.08	2.42
Methods	Hg				Ni
Methods	R²	RMSE	NRMSE	RPD	R²	RMSE	NRMSE	RPD
LMM	0.84	0.08	0.08	2.53	0.81	2.94	0.11	2.53
NMF	0.81	0.08	0.09	2.33	0.82	2.91	0.11	2.33
SVR	0.82	0.08	0.09	2.39	0.81	2.96	0.11	2.39
Methods	Pb				Zn
Methods	R²	RMSE	NRMSE	RPD	R²	RMSE	NRMSE	RPD
LMM	0.86	3.19	0.07	2.34	0.84	16.23	0.1	2.56
NMF	0.84	3.4	0.07	2.36	0.82	17.22	0.11	2.41
SVR	0.87	3.05	0.07	2.27	0.85	16.03	0.1	2.64
Methods	Mean
Methods	R²	RMSE	NRMSE	RPD
LMM	0.85	4.77	0.08	2.64
NMF	0.82	5.15	0.08	2.42
SVR	0.84	4.84	0.08	2.51

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

