Knowledge-Driven Adaptive Direct Sampling for Reconstructing Geochemical Fields Under Sampling Bias

Liu, Yameng; Zi, Jiali; Dong, Yanqi; Xu, Nuo; Zhang, Qing; Chen, Feixiang

doi:10.3390/ijgi15030111

by Yameng Liu ¹, Jiali Zi ¹, Yanqi Dong ¹, Nuo Xu ¹, Qing Zhang ¹ and Feixiang Chen ¹,²,*

¹ School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China

² Engineering Research Center for Forestry-Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2026, 15(3), 111; https://doi.org/10.3390/ijgi15030111

Submission received: 19 December 2025 / Revised: 20 February 2026 / Accepted: 5 March 2026 / Published: 6 March 2026

(This article belongs to the Special Issue Spatial Data Science and Knowledge Discovery)


Abstract

Deriving meaningful mineralization information from raw geospatial datasets is fundamental to the sustainable evaluation and management of mineral resources. As a cornerstone of mineral resource evaluation, identifying geochemical anomalies often faces the significant challenge of sampling bias in practical applications. Strong spatial unevenness often leads to information loss in traditional geostatistical models, where critical anomaly structures may be over-smoothed or obscured. To address this limitation, this study proposes a knowledge-driven adaptive direct sampling (KD-ADS) framework. This approach functions as a geospatial context-aware reconstruction engine. It integrates a multi-factor knowledge-driven weighting system to prioritize regions with high information value and incorporates a dynamic context-aware neighborhood module that adapts to local statistical characteristics. Using 1268 samples from the Jiulian Mountains tungsten metallogenic belt, ablation studies demonstrate the individual contributions of the knowledge-driven weighting and adaptive neighborhood modules to improving reconstruction accuracy and spatial connectivity. Comparative experiments with the traditional direct sampling (DS) algorithm demonstrate that KD-ADS achieves a more accurate reconstruction of geochemical fields and better preserves discrete high-value mineralization anomalies and spatial heterogeneity under sampling-bias conditions. This approach improves the reproducibility of mineralization enrichment patterns and enhances computational efficiency, providing data science-driven support for sustainable mineral exploration and resource allocation.

Keywords:

spatial sampling bias; geochemical anomaly; spatial structure reconstruction; knowledge-driven modeling; adaptive direct sampling; sustainable mineral exploration

1. Introduction

In modern mineral exploration, extracting mineralization-indicative information from raw geospatial observation data is fundamental to the sustainable evaluation and management of mineral resources. Geochemical data are not merely numerical measurements but are vital sources of spatial information that reveal underlying geological processes. Extracting meaningful spatial knowledge from geochemical datasets is crucial for identifying hidden mineralization patterns and providing reliable decision support for resource allocation [1].

As a fundamental component of mineral resource exploration and environmental assessment [2], geochemical anomaly identification primarily aims to detect anomalous patterns within complex multidimensional spatial data that are closely associated with mineralization processes. The process of identifying geochemical anomalies frequently faces the challenge of spatial sampling bias. In practical exploration, irregularly distributed samples resulting from terrain or resource constraints often lead to critical information loss [3]. This may obscure local heterogeneity and high-frequency anomalous structures, thereby undermining the core spatial knowledge essential for mineral resource discovery. Data gridding has been the standard preprocessing step to address this by transforming scattered samples into continuous fields [4], thereby supporting subsequent spatial modeling and analysis. In practical applications, spatial interpolation or geostatistical simulation methods are typically employed to interpolate and densify raw data as preprocessing steps [5,6]. In traditional interpolation methods, inverse distance weighting is limited in accurately inferring values at unsampled locations based on sparse data [7], while the kriging method, although based on variograms, introduces a smoothing effect due to its weighted moving average principle [8], which weakens or loses critical anomaly information [9]. These limitations have driven the transition toward more advanced spatial inference frameworks that better preserve spatial heterogeneity and anomaly characteristics, particularly multiple-point geostatistical simulations (MPS).

MPS is a simulation method based on training images (TIs) that reproduces complex spatial patterns, better capturing spatial variations and characterizing the uncertainty of variables [10,11,12]. MPS demonstrates high accuracy and flexibility in the simulation of geological data, making it widely applicable. Neven et al. [13] employed MPS to model bedrock and reproduce complex karstic geomorphological features. Brilliant et al. [14] employed MPS to reconstruct braided river channels. Li et al. [15] proposed the use of MPS and local singularity analysis to identify regional geochemical anomalies and potential mineral resource areas. MPS often employs random simulation paths, which may compromise knowledge discovery efficiency and structural fidelity, particularly when confronted with the sampling bias inherent in practical geochemical surveys [16]. Hansen et al. [17] introduced a method that enables MPS to be appropriately conditioned to uncertain data. Liu and Journel [18] proposed a structured path guided by information content. Chen et al. [19] adopted a sample density-sensitive path based on spatial partitioning.

With the development of MPS, the direct sampling (DS) algorithm has gained widespread use due to its flexibility and high computational efficiency in contextual pattern matching [20]. DS matches and transplants current data events with patterns in the TI through distance functions [21], theoretically overcoming the reliance of traditional geostatistical methods on the assumption of stationarity [22]. For geochemical element distributions, which often exhibit weak stationarity locally but multi-scale characteristics globally, DS can effectively balance local continuity with global multi-scale structural features, offering a more flexible and precise tool for geochemical spatial modeling and anomaly identification [23]. In recent years, several improvements have been made to address the limitations of DS in terms of efficiency and simulation accuracy. Wang et al. [24] quantified the uncertainty in geochemical data interpolation by combining a convolutional neural network algorithm with DS. Hosseini et al. [25] introduced a local scaling matrix to reduce the bias caused by global scale differences between hard data and the TI. Straubhaar and Renard [26] extended DS to enable inequality constraints on the simulated variables. Gravey and Mariethoz [27,28] proposed the more user-friendly QuickSampling method and achieved automatic parameterization through training image analysis. Juda et al. [29] and Bai et al. [30] focused on simplifying parameters and processes, optimizing parameterization methods, employing two-stage simulations, or adopting fast interpolation-based strategies, significantly reducing computational costs while maintaining or even enhancing simulation quality.

Despite these advancements, the application of DS to geochemical knowledge discovery still has notable limitations. Due to environmental factors, raw samples often exhibit uneven distribution and drastic density variations [31]. Such sampling bias can lead to information imbalance, potentially resulting in overfitting in dense regions and insufficient constraints in sparse regions. Ignoring differences in spatial distribution and attribute values among samples results in an excessive proportion of low-value points that hold limited significance in indicating mineralization, while failing to adequately characterize rare high-value points that constitute the most critical spatial knowledge for mineralization. Fixed search parameters fail to simultaneously accommodate global trend reconstruction and local anomaly capture, resulting in insufficient reproduction of key mineralization anomalies. Recent studies have attempted to mitigate such issues by integrating domain expertise into the modeling process. Zhang et al. [32] proposed a hybrid modeling method integrating geological knowledge to more accurately capture anomaly patterns. Costa et al. [33] proposed an algorithm incorporating hierarchical adjustment of neighborhood size to tackle the subtle heterogeneity and large-scale spatial trends within ore bodies. Shirjang et al. [34] adopted a weighted sampling strategy, dynamically adjusting weights and analysis windows according to terrain conditions and sampling density to identify anomalous areas. Based on this trend, this study proposes a knowledge-driven adaptive direct sampling (KD-ADS) framework based on pyMPSLib [35]. KD-ADS functions as a geospatial context-aware reconstruction engine designed to recover hidden spatial knowledge from biased datasets. The main innovations are as follows:

(1) A knowledge-driven adaptive sampling strategy is proposed, in which a composite weighting system is constructed that integrates spatial distribution values, data attribute values, and spatial structure constraints. This system mathematically quantifies the information priority of each grid point. By assigning higher weights to grid points with high mineralization-indicative potential, the framework provides an intelligent information-mining sequence.

(2) A dynamic context-aware neighborhood mechanism is introduced, which dynamically adjusts search parameters based on local statistical characteristics and variogram constraints. This mechanism enables the framework to balance local detail accuracy in high-density areas and global structural continuity in low-density areas, ensuring the robust reconstruction of complex geochemical fields and improving the reproducibility of spatial heterogeneities.

The rest of this paper is organized as follows. Section 2 introduces the study area data and the proposed KD-ADS. Section 3 presents and discusses the experimental results of KD-ADS. Section 4 provides the conclusions.

2. Materials and Methods

2.1. Study Area and Data Description

The study area is located in the Jiulian Mountains of Guangdong Province, China, one of China’s key tungsten metallogenic belts [36] (Figure 1). Intense tectonic activity and Yanshanian granite intrusions jointly control the predominantly quartz-vein-type tungsten mineralization [37,38,39,40,41], leading to highly heterogeneous mineralization patterns in geochemical data. Moreover, complex terrain severely restricts geological surveys, leading to sampling bias.

This study is based on 1268 samples collected within the study area, covering 12 elements, including Ag, W, and Mo. The coordinates of the samples are based on a projected coordinate system (unit: meters). While the sampling density is high in regions with accessible terrain, it becomes notably sparse in remote or rugged areas. The spatial distribution of the samples is highly uneven, particularly in the horizontal and vertical directions, with a sampling point spacing ratio of approximately 10:1, accompanied by regions where sampling points are entirely missing. This uneven distribution presents significant challenges for subsequent modeling and simulation. To effectively evaluate the adaptability of KD-ADS to data with different spatial structures, the concentration data of the core metallogenic element W and the closely associated element Mo were selected as comparative datasets. These two elements represent the region’s primary mineralization and exhibit contrasting spatial patterns. W shows highly localized anomalies with strong structural control, while Mo displays broader, more continuous trends.

The raw geochemical datasets for Mo and W exhibit significant positive skewness and high kurtosis, as shown in Table 1. This highly skewed and heavy-tailed distribution indicates the presence of localized, high-intensity mineralization anomalies. Such extreme spatial heterogeneity and non-linear data structures pose significant challenges for traditional two-point geostatistical methods (e.g., Kriging), which often over-smooth peak values. This motivates the use of MPS.

2.2. Knowledge-Driven Adaptive Sampling Strategy

Traditional DS typically treats all sample points equally, neglecting the differential contributions of samples in terms of spatial location and attribute characteristics. In this study, a multi-dimensional information priority system is constructed by integrating spatial distribution values, data attribute values, and spatial structure constraints. This system enables context-aware weighting, thereby improving both the fidelity of geochemical simulations and the geological plausibility of reconstructed anomalies. The spatial distribution value weight is determined based on the spatial distance between samples and grid nodes, with nodes closer to samples receiving higher weights. The data attribute value weight is adjusted according to the relationship between the attribute value of each grid node and the target attribute, ensuring higher sampling priority for important regions. The spatial structure constraint weight is obtained by modeling the spatial autocorrelation of the data using a variogram, thereby enhancing the match between the simulation results and the spatial structure.

To mitigate the impact of spatial sampling bias, a spatial distribution value weight is constructed using a k-d tree (k-dimensional tree) for fast calculation of the distance from each grid node to its nearest sample. Considering the uneven distribution of samples, a linear decay is applied: the weight of a node decreases gradually as its distance from the nearest sample increases, which highlights key areas near clustered samples without excessively suppressing distant nodes. For each grid node, the tree is queried and returns the distance to the nearest sample together with that sample's index. The spatial distribution value weight is calculated as follows:

w_{spatial} = 1 - \frac{d}{D_{max}},

(1)

where d is the Euclidean distance between the grid node and its nearest sample, and D_{max} is the maximum distance from all grid nodes to sample points.
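As a minimal sketch of Eq. (1), the following standard-library Python computes the spatial distribution weight for a handful of grid nodes. It makes two assumptions that the text does not state: D_max is taken as the largest nearest-sample distance over the grid (so the weights fall in [0, 1]), and the nearest sample is found by brute force rather than with a k-d tree. The function name `spatial_weights` is hypothetical; for realistic grid sizes, a k-d tree such as `scipy.spatial.cKDTree` would replace the inner loop.

```python
import math

def spatial_weights(grid_nodes, samples):
    """Eq. (1): w_spatial = 1 - d / D_max for every grid node.

    d is the distance from a node to its nearest sample. D_max is assumed
    to be the largest such nearest-sample distance over the whole grid.
    """
    # Brute-force nearest-sample search (a k-d tree would replace this).
    nearest = [min(math.dist(node, s) for s in samples) for node in grid_nodes]
    d_max = max(nearest)
    if d_max == 0.0:
        # Assumption: every node coincides with a sample, so all weights are 1.
        return [1.0 for _ in nearest]
    return [1.0 - d / d_max for d in nearest]

nodes = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (40.0, 0.0)]
samples = [(0.0, 0.0), (20.0, 0.0)]
print(spatial_weights(nodes, samples))  # [1.0, 0.5, 1.0, 0.0]
```

Nodes that coincide with a sample receive weight 1, and the node farthest from any sample receives weight 0, which is the linear decay described above.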

Geochemical element concentrations typically follow a skewed distribution. Low-value regions are widespread but of limited significance, whereas high-value anomalies are rare but may directly indicate mineralization and serve as key evidence for assessing the metallogenic potential of a region. If the framework assigns equal weight to all samples, the abundant low-value data may dominate the simulation, resulting in overly smoothed results and the loss of meaningful high-value anomalies.

The data attribute value weight is constructed using the concept of quantile regression to assign nonlinear weights to the TI attribute values. This reduces the weight of low-value regions while increasing the weight of high-value regions. Adjusting weights based on the distribution of TI values highlights the contribution of high TI values while avoiding distortion of the weights by extreme values. First, the upper limit V_{upper} of the low-value interval is determined from the TI values, and the TI distribution in the region is estimated by calculating the quantiles of the TI values. In the low-value region, attribute values within the low-value interval [V_{min}, V_{upper}] are normalized. An exponential decay function with a fixed coefficient is then used to reduce the weight, and a small base weight is added so that low values are never ignored entirely. Within the low-value region, the weight varies nonlinearly, focusing on rapidly reducing the interference from extremely low values. In the high-value region, a linear compensation term relates the weight to the attribute value: higher element concentrations indicate greater mineralization potential and therefore receive higher weights. Within the high-value region, the weight varies linearly, which keeps the influence of high values stable. Processing the data in these two zones better reproduces the spatial distribution pattern of mineralization anomalies, thereby improving the accuracy of prospecting predictions. Finally, the weights are constrained to a specific range so that they are neither too large nor too small during sampling, which maintains sampling stability. The formula for calculating the data attribute value weight is as follows:

w_{value} = \begin{cases} 0.05 + 0.7 \times \exp\left(-\frac{v - V_{min}}{V_{upper} - V_{min}} / 0.5\right), & v \leq V_{upper} \\ 0.4 + 0.8 \times \frac{v - V_{upper}}{V_{75} - V_{upper}}, & v > V_{upper} \end{cases},

(2)

where v is the TI attribute value, V_{min} is the minimum TI value, V_{upper} is the upper limit of the TI low-value interval, and V_{75} is the 75th percentile of the TI values. The coefficients were set experimentally so that high-value anomalies receive priority processing while overfitting is avoided. This dual-zone strategy suppresses background noise while highlighting anomalous knowledge.
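The piecewise form of Eq. (2) can be sketched directly in Python. The function name `value_weight` and the optional clamp bounds `w_lo` and `w_hi` are hypothetical: the text says the weights are constrained to a specific range but does not give that range, so the clamp is left off by default. The sketch also assumes V_min < V_upper < V_75 so that neither denominator is zero.

```python
import math

def value_weight(v, v_min, v_upper, v_75, w_lo=None, w_hi=None):
    """Eq. (2): piecewise weight for a TI attribute value v."""
    if v <= v_upper:
        # Low-value zone: exponential decay on the normalized value.
        t = (v - v_min) / (v_upper - v_min)
        w = 0.05 + 0.7 * math.exp(-t / 0.5)
    else:
        # High-value zone: linear compensation above V_upper.
        w = 0.4 + 0.8 * (v - v_upper) / (v_75 - v_upper)
    # Optional clamp; the bounds are hypothetical, the text gives no range.
    if w_lo is not None:
        w = max(w, w_lo)
    if w_hi is not None:
        w = min(w, w_hi)
    return w

# Illustrative values only (not taken from the study data).
for v in (0.0, 10.0, 20.0, 30.0):
    print(v, round(value_weight(v, v_min=0.0, v_upper=10.0, v_75=20.0), 4))
```

Evaluating the two branches at their endpoints is a quick way to check a chosen V_upper: with the coefficients as printed, the low-value branch runs from 0.75 at V_min down to 0.05 + 0.7e^{-2} (about 0.145) at V_upper, and the high-value branch starts at 0.4 and reaches 1.2 at V_75.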

Spatial structure constraints quantify the spatial autocorrelation between sample points by introducing a variogram function, describing the variation between samples as distance changes, thereby providing spatial structure constraints for the sampling process. Based on TI values, pairs of lag distances and semi-variances are computed and interpolated to obtain the corresponding variogram values. The spatial structure constraint weight is then calculated based on the distance. Exponential enhancement is applied to strengthen short-range spatial constraints, ensuring that the simulation can reproduce fine local continuity in the TI, while weakening long-range variation constraints to avoid over-constraint and improve flexibility. The spatial structure constraint weight is calculated as follows:

w_{vario} = \begin{cases} 1 + 3.2 \times \left(\frac{\gamma}{\gamma_{max}}\right)^{1.6}, & d < 800 \\ 1 + 0.8 \times \frac{\gamma}{\gamma_{max}}, & d \geq 800 \end{cases},

(3)

where \gamma is the variogram value, \gamma_{max} is the maximum variogram value, and d is the distance between the grid point and the sample point. The threshold on d is set based on the average sampling interval and the spatial correlation length derived from the variogram. The exponent and coefficients were optimized to ensure a smooth yet responsive decay of influence over distance, effectively suppressing noise while preserving local structural features.
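A small sketch of Eq. (3) follows. The text says lag and semi-variance pairs are "interpolated" without naming the scheme, so the linear interpolation in `interp_gamma` is an assumption, as are both function names. The threshold of 800 is taken from Eq. (3), and the lag and semi-variance numbers are illustrative only.

```python
from bisect import bisect_left

def interp_gamma(d, lags, gammas):
    """Linearly interpolate an experimental variogram at distance d.

    lags must be strictly increasing; distances outside the lag range
    are clamped to the first or last semi-variance (an assumption).
    """
    if d <= lags[0]:
        return gammas[0]
    if d >= lags[-1]:
        return gammas[-1]
    i = bisect_left(lags, d)  # lags[i - 1] < d <= lags[i]
    t = (d - lags[i - 1]) / (lags[i] - lags[i - 1])
    return gammas[i - 1] + t * (gammas[i] - gammas[i - 1])

def vario_weight(gamma, gamma_max, d, d_threshold=800.0):
    """Eq. (3): power-law boost below the threshold, linear term beyond it."""
    ratio = gamma / gamma_max
    if d < d_threshold:
        return 1.0 + 3.2 * ratio ** 1.6
    return 1.0 + 0.8 * ratio

lags = [0.0, 400.0, 800.0, 1200.0]   # illustrative lag distances
gammas = [0.0, 1.0, 2.0, 2.0]        # illustrative semi-variances
for d in (200.0, 600.0, 1000.0):
    g = interp_gamma(d, lags, gammas)
    print(d, g, round(vario_weight(g, max(gammas), d), 4))
```

Because the two branches use different coefficients, the weight changes abruptly at the threshold for a given ratio, which is worth keeping in mind when choosing the threshold distance.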

The final composite weight is the product of the spatial distribution weight, the data attribute weight, and the spatial structure constraint weight, normalized to the dimension of the simulation grid. The formula is as follows:

w = w_{spatial} \times w_{value} \times w_{vario} .

(4)

This weighting mechanism effectively reflects the importance of grid points in terms of spatial distribution, attribute value, and spatial structure, thereby providing a basis for subsequent simulation path selection.

Traditional random paths may suffer from insufficient sampling in high-value regions. In this study, the composite weights are normalized to form a probability distribution, and a weighted random sampling method draws the simulation path from it, yielding an intelligent information-mining sequence. By prioritizing grid points with high mineralization potential and strong structural constraints, the framework allocates limited computational resources to the areas with the highest knowledge-discovery value, helping to bridge the gap between sparse observations and complex geological reality.
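One way to turn composite weights into a simulation path is weighted random sampling without replacement. The text does not name the sampling scheme, so the sketch below swaps in the Efraimidis-Spirakis key method (in its logarithmic form) as an assumption; `weighted_path` is a hypothetical name, and the choice to visit zero-weight nodes last is also an assumption. The ordering produced by this method depends only on the ratios between weights, so normalizing them to a probability distribution beforehand does not change the result.

```python
import math
import random

def weighted_path(weights, seed=None):
    """Order node indices by weighted sampling without replacement.

    Each node gets the key log(u) / w with u uniform in (0, 1]; sorting
    the keys in descending order visits heavy nodes earlier on average.
    """
    rng = random.Random(seed)
    keyed = []
    for idx, w in enumerate(weights):
        u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is finite
        # Assumption: zero-weight nodes are still simulated, but last.
        key = math.log(u) / w if w > 0.0 else -math.inf
        keyed.append((key, rng.random(), idx))  # random tie-breaker
    keyed.sort(reverse=True)
    return [idx for _, _, idx in keyed]

# Composite weight per Eq. (4), with illustrative component weights.
w_spatial = [1.0, 0.5, 0.2, 0.8]
w_value = [1.2, 0.1, 0.1, 0.6]
w_vario = [4.0, 1.5, 1.2, 2.0]
composite = [a * b * c for a, b, c in zip(w_spatial, w_value, w_vario)]
print(weighted_path(composite, seed=0))
```

Every node still appears exactly once in the path; the weights only shift how early a node is likely to be visited.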

2.3. Dynamic Context-Aware Neighborhood Mechanism

Traditional direct sampling employs fixed-radius neighborhood search methods, which struggle to accommodate the multi-scale variability of geochemical data. Unevenly distributed data may result in excessive density in some regions and sparse data in others. To mitigate the error caused by spatial sampling bias, this study proposes a dynamic context-aware neighborhood mechanism that adjusts the search radius and the number of sample points based on local statistical characteristics and variogram differences. The adaptive search radius adjustment modifies the radius based on local data density, avoiding edge effects or information omission caused by a fixed radius. A smaller search radius is applied in dense regions to avoid over-smoothing, while a larger radius is used in sparse regions to ensure sufficient neighborhood points. This ensures that both local high-frequency anomalies and global background trends are reconstructed with high fidelity.

The core of dynamic adjustment lies in adaptively adjusting the size of the search neighborhood and the number of sample points based on local statistical characteristics and variogram differences. By analyzing the data in the local region, its statistical characteristics are computed. These statistical characteristics primarily include mean, variance, and skewness. The formulas for calculating local statistical characteristics are as follows:

\mathrm{mean} = \frac{1}{N} \sum_{i=1}^{N} v_{i},

(5)

\mathrm{variance} = \frac{1}{N} \sum_{i=1}^{N} (v_{i} - \mathrm{mean})^{2},

(6)

\mathrm{skewness} = \frac{\frac{1}{N} \sum_{i=1}^{N} (v_{i} - \mathrm{mean})^{3}}{(\sqrt{\mathrm{variance}})^{3}},

(7)

where v_{i} is the attribute value within the local region, and N is the number of valid points in the local region.
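Eqs. (5) to (7) are population (divide-by-N) moments, which the following sketch computes for a list of valid local values. The function name `local_stats` is hypothetical, and returning a skewness of zero for a constant neighborhood (where the variance is zero) is an assumption made to avoid a division by zero.

```python
import math

def local_stats(values):
    """Population mean, variance, and skewness per Eqs. (5)-(7)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0.0:
        # Assumption: a constant neighborhood is treated as symmetric.
        skewness = 0.0
    else:
        third_moment = sum((v - mean) ** 3 for v in values) / n
        skewness = third_moment / math.sqrt(variance) ** 3
    return mean, variance, skewness

print(local_stats([1.0, 2.0, 3.0, 4.0]))  # symmetric: skewness is 0
print(local_stats([1.0, 1.0, 1.0, 5.0]))  # one high value: positive skewness
```

In the second call the mean is 2, the variance is 3, and the skewness is 6 / 3^{1.5}, or about 1.155, showing how a single high value in a low background produces the positive skew that the mechanism looks for.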

Variance serves as a critical indicator of spatial heterogeneity. When the local variance is less than 0.1, it indicates that the local region is relatively uniform. The search radius and number of sample points are expanded accordingly to obtain broader sample information, thereby improving the global consistency of the simulation. When the local variance exceeds 1.0, it indicates significant variation in the local region. The search radius and number of sample points are reduced to focus on local details, avoiding the introduction of excessive irrelevant sample information, thus improving local simulation accuracy. Further adjustments are made based on skewness. Skewness reflects the asymmetry of the attribute value distribution in the local region. The neighborhood is expanded in positively skewed regions, while a moderate neighborhood is maintained in negatively skewed or symmetric regions to better capture local structural features. To ensure that adjusted parameters remain within reasonable ranges, parameter constraints are set. These statistics provide a quantitative basis for the subsequent dynamic adjustment of the search strategy, adjusting both the radius of the neighborhood search and the number of sample points.

To enhance spatial structure matching, a variogram-based feedback mechanism is applied. The adjustment frequency is not fixed but is determined dynamically by local complexity. This ensures that high-heterogeneity regions receive more frequent structural corrections, while computational resources are conserved in uniform zones. At each adjustment step, the semi-variance of the current neighborhood is calculated and compared with the target variogram to generate an adjustment factor, thereby achieving dynamic correction of the spatial structure. The formula for calculating the variogram adjustment factor is as follows:

\mathrm{vario} = 1 + 0.5 \times \tanh\left(\frac{\mathrm{target\_gamma} - \mathrm{current\_gamma}}{\mathrm{target\_gamma}} \times 2\right),

(8)

where target_gamma is the target variogram value and current_gamma is the current variogram value. The ratio (target_gamma - current_gamma) / target_gamma quantifies the relative degree to which the current simulation deviates from the target spatial structure at a specific scale. The tanh function ensures that the adjustment factor is smooth and bounded. If the structure of the current neighborhood is too discrete or the semi-variance is excessively high, the radius after the initial adjustment is amplified, encouraging the framework to search for patterns over a broader range to enhance spatial continuity. If the structure of the current neighborhood exhibits excessive continuity and low semi-variances, the radius is reduced, causing the framework to focus on a smaller area and introduce more variation, avoiding overly smooth and unrealistic geological models. This iterative correction ensures that the reconstructed field remains geologically plausible across multiple scales.
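The adjustment factor of Eq. (8) is easy to evaluate numerically, which makes its bounds concrete. In the sketch below, the function name `vario_adjust_factor` is hypothetical, and returning a neutral factor of 1 when the target semi-variance is not positive is an assumption. The sketch only computes the factor; how the factor is combined with the search radius is not specified here.

```python
import math

def vario_adjust_factor(target_gamma, current_gamma):
    """Eq. (8): 1 + 0.5 * tanh(2 * relative deviation from the target)."""
    if target_gamma <= 0.0:
        # Assumption: no correction when the target semi-variance is degenerate.
        return 1.0
    rel = (target_gamma - current_gamma) / target_gamma
    return 1.0 + 0.5 * math.tanh(rel * 2.0)

for current in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(current, round(vario_adjust_factor(1.0, current), 4))
```

Since tanh lies strictly between -1 and 1, the factor always stays between 0.5 and 1.5: it equals 1 when the current semi-variance matches the target, rises above 1 when the current value is below the target, and falls below 1 when it is above.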

The variogram is estimated using the semi-variance binning calculation method. By calculating the sum of squared differences between effective point pairs and assigning them to distinct lag distance intervals, an estimated variogram is obtained. If the current variogram differs significantly from the target variogram, the neighborhood is adjusted appropriately to bring it closer to the target variogram, thereby further improving the consistency between simulation results and the target variogram. This mechanism enables the search radius and number of sampling points to adapt in real-time to changes in geological conditions, reducing redundant computations while ensuring simulation accuracy.
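The binning step can be sketched as a double loop over point pairs. This is an illustrative standard-library version, not the framework's code: the name `binned_semivariogram`, the use of `None` to mark invalid (unsimulated) values, and the equal-width lag bins are all assumptions. Each bin's semi-variance is the sum of squared differences divided by twice the number of pairs in that bin.

```python
import math

def binned_semivariogram(coords, values, lag_width, n_bins):
    """Experimental semi-variance per lag bin; None for empty bins."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n - 1):
        if values[i] is None:
            continue  # skip invalid points
        for j in range(i + 1, n):
            if values[j] is None:
                continue
            b = int(math.dist(coords[i], coords[j]) // lag_width)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    # gamma(h) = sum of squared differences / (2 * number of pairs)
    return [s / (2.0 * c) if c else None for s, c in zip(sums, counts)]

coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [0.0, 1.0, 2.0]
print(binned_semivariogram(coords, values, lag_width=1.0, n_bins=3))
# [None, 0.5, 2.0]
```

The pairwise loop is quadratic in the number of points, which is acceptable for a local neighborhood but would need a spatial index for a full grid.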

Search parameters are dynamically adjusted based on local statistical characteristics and variogram constraints. In uniform areas with gentle data variation, the search radius is expanded and the number of sampling points is increased to capture more background trend information, enhancing simulation stability. In high-variability areas with abrupt data changes, the radius is contracted and the number of sampling points is reduced to focus on local extreme values, avoiding smoothing out critical high-frequency information. In moderately variable areas, search parameters are further adjusted based on skewness. Areas with positive skewness may contain high-value anomalies, warranting moderate expansion of the search radius to capture spatial patterns represented by rare high-value points and to prevent underestimation of extremes in simulation results. Conversely, relatively conservative search strategies are applied in negatively skewed or symmetrically distributed zones.
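The rules above can be summarized as a small decision function. The variance thresholds of 0.1 and 1.0 come from the text; everything else in this sketch is hypothetical, including the function name `adapt_neighborhood`, the scaling factors (`expand`, `shrink`, `skew_expand`), and the clamp bounds, none of which are given in the source.

```python
def adapt_neighborhood(variance, skewness, base_radius, base_n,
                       expand=1.5, shrink=0.7, skew_expand=1.2,
                       radius_bounds=(5.0, 50.0), n_bounds=(8, 64)):
    """Scale the search radius and node count from local statistics.

    Thresholds 0.1 and 1.0 follow the text; all factors and bounds are
    illustrative placeholders.
    """
    if variance < 0.1:
        factor = expand          # uniform area: widen the search
    elif variance > 1.0:
        factor = shrink          # high variability: focus locally
    elif skewness > 0.0:
        factor = skew_expand     # moderate variance, positive skew
    else:
        factor = 1.0             # symmetric or negative skew: keep as is
    radius = min(max(base_radius * factor, radius_bounds[0]), radius_bounds[1])
    n_nodes = min(max(int(round(base_n * factor)), n_bounds[0]), n_bounds[1])
    return radius, n_nodes

for var, skew in ((0.05, 0.0), (2.0, 0.0), (0.5, 1.2), (0.5, -0.3)):
    print(var, skew, adapt_neighborhood(var, skew, base_radius=20.0, base_n=30))
```

Clamping both outputs keeps the adjusted parameters within a fixed range, in the spirit of the parameter constraints mentioned earlier.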

When acquiring valid data events, an octagonal traversal neighborhood search is employed instead of the traditional rectangular search to quickly locate sample points similar to the target point. Compared to a traditional rectangular window, the octagonal shape provides a more uniform search radius in multiple directions, which minimizes directional bias during pattern matching. The octagonal search offers better symmetry and coverage. By hierarchically traversing the octagonal boundary, it ensures that pixels are visited in order of distance while avoiding repeated calculations. Moving along the eight primary directions (horizontal, vertical, and diagonals) generates an approximately circular traversal path. This enables more uniform sampling within the neighborhood, enhancing search efficiency and better capturing local geological features, thus improving simulation accuracy. The framework systematically generates candidate points along octagonal boundaries through two nested loops. First, it captures the eight vertices of the octagon to ensure detection of extreme points along primary directions. Subsequently, it generates a series of points along each edge to ensure comprehensive coverage. Validity checks ensure that points lie within the search boundary and represent valid simulated values. By recording processed coordinates, the framework avoids repeated calculations and additions for the same point, improving efficiency and ensuring stability in complex simulation environments.
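The octagonal traversal can be approximated on an integer grid as follows. This is a rough sketch under several assumptions and not the framework's implementation: ring r is an octagon with vertices at distance r along the axes and at a rounded r/√2 along the diagonals, `None` marks cells without a valid value, and all function names are hypothetical. The two nested loops walk the eight edges and the points along each edge, and a set of visited coordinates prevents repeated work.

```python
import math

def _round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def octagon_ring(r):
    """Integer offsets along the boundary of an octagon of 'radius' r >= 1."""
    k = _round_half_away(r / math.sqrt(2.0))
    verts = [(r, 0), (k, k), (0, r), (-k, k),
             (-r, 0), (-k, -k), (0, -r), (k, -k)]
    ring = []
    for e in range(8):                      # outer loop: the eight edges
        (x0, y0), (x1, y1) = verts[e], verts[(e + 1) % 8]
        steps = max(abs(x1 - x0), abs(y1 - y0))
        for s in range(steps):              # inner loop: points on the edge
            t = s / steps
            ring.append((_round_half_away(x0 + t * (x1 - x0)),
                         _round_half_away(y0 + t * (y1 - y0))))
    return ring

def octagonal_neighbors(center, max_ring, grid):
    """Collect valid cells ring by ring; grid is a 2-D list, None = invalid."""
    ci, cj = center
    seen = {(ci, cj)}
    found = []
    for r in range(1, max_ring + 1):
        for di, dj in octagon_ring(r):
            i, j = ci + di, cj + dj
            if (i, j) in seen:
                continue                    # already processed
            seen.add((i, j))
            inside = 0 <= i < len(grid) and 0 <= j < len(grid[0])
            if inside and grid[i][j] is not None:
                found.append((i, j))
    return found

grid = [[1.0] * 5 for _ in range(5)]
grid[2][3] = None                            # one cell with no valid value
print(octagonal_neighbors((2, 2), max_ring=2, grid=grid))
```

With this simple rasterization, some cells are only reached by a larger ring than their Euclidean distance would suggest, so the visiting order is only approximately by distance and a denser edge sampling would be needed for complete coverage.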

3. Experiments and Discussions

3.1. Training Image Construction and Parameter Settings

To construct the TI required for the subsequent spatial inference process, the irregularly distributed geochemical samples were transformed into a continuous spatial distribution field on a 20 m grid. Ordinary Kriging interpolation was employed to construct the TIs. This step translates discrete spatial observations into a structured format, providing the prior structural knowledge required for the subsequent simulations. While the TI generated by Ordinary Kriging inherently exhibits a smoothing effect, the KD-ADS framework mitigates this limitation through its knowledge-driven weighting system, which assigns higher importance to nodes based on their proximity to original sample points, their alignment with spatial autocorrelation, and the presence of critical extreme values. By integrating these adaptive weights with the stochastic nature of MPS, KD-ADS restores the high-frequency spatial variability of geochemical fields that is typically suppressed in deterministic interpolation, thereby achieving a high-fidelity reconstruction of localized mineralization patterns.

Figure 2 illustrates the interpolation result for W. It can be observed that the W concentration in the study area exhibits high spatial heterogeneity. The entire region is characterized by a background of numerous low values, with high-value anomaly points being extremely sparse and spatially isolated.

Figure 3 illustrates the interpolation result for the Mo element. Its distribution is more continuous, with high-value regions being relatively extensive and exhibiting better connectivity between them.

These two elements represent distinct geochemical spatial patterns: W serves as a test case for discrete, high-frequency anomaly recovery, while Mo evaluates the framework’s ability to maintain broad structural continuity, providing a basis for verifying the versatility of KD-ADS in dealing with different geological phenomena. However, the inherent smoothing effect of Kriging interpolation is evident in the figures, manifested as blurred boundaries in high-value anomaly areas, weakened peak intensities, a significant reduction in the value range of the original data, and difficulty in clearly delineating potentially sharp-boundary mineralization anomalies. This inevitably leads to the loss of detailed spatial structure information related to mineralization processes. Therefore, these results are not directly used for mineral resource assessment but serve as training images for subsequent spatial inference to more realistically restore the complex spatial structures under uneven distribution.

To ensure the rigor of the experiments, all simulation algorithms were implemented using the same set of optimization parameters. The optimization objective is to achieve a balance between computational efficiency and statistical fidelity. In this study, efficiency is evaluated by the total simulation time, while fidelity is quantified by the Jensen-Shannon Divergence (JSD) between the simulation realizations and the TI. These parameters, including the distance threshold, search radius, maximum number of nodes and max fraction of TI, were calibrated to ensure that the spatial structures of the TI are captured without incurring excessive computational costs. The final parameter set used for all experiments is summarized in Table 2.
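For readers unfamiliar with the fidelity metric, the Jensen-Shannon divergence between two value distributions can be sketched as below. The text does not say how the distributions are binned or which logarithm base is used, so the shared-edge histogram and the base-2 logarithm (which bounds the result between 0 and 1) are assumptions, and both function names are hypothetical.

```python
import math

def histogram(values, edges):
    """Normalized histogram over shared bin edges (last bin is closed)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for b in range(len(counts)):
            is_last = b == len(counts) - 1
            if edges[b] <= v < edges[b + 1] or (is_last and v == edges[-1]):
                counts[b] += 1
                break
    total = sum(counts)  # assumes at least one value falls inside the edges
    return [c / total for c in counts]

def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logarithms (0 <= JSD <= 1)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0.0)
    m = [(x + y) / 2.0 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

edges = [0.0, 2.0, 4.0, 6.0]
ti_values = [0.5, 1.0, 2.5, 3.0, 5.0, 5.5]    # illustrative TI values
sim_values = [0.2, 0.8, 1.5, 3.5, 4.5, 5.9]   # illustrative realization
print(round(jsd(histogram(ti_values, edges), histogram(sim_values, edges)), 4))
```

Identical histograms give a divergence of 0 and histograms with no overlap give 1 under this convention, so lower values indicate that a realization reproduces the TI's value distribution more closely.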

3.2. Ablation Study

To systematically evaluate the individual contribution of each proposed component and their collective impact on capturing complex geochemical patterns, we conducted an ablation study using element Mo as a representative variable. The experiment was designed around four incremental variants: Variant A serves as the baseline utilizing the traditional DS algorithm; Variant B introduces the Knowledge-driven Simulation Path; Variant C further incorporates the Dynamic Context-Aware Neighborhood Mechanism; and Variant D represents the complete KD-ADS framework, integrating all proposed optimization strategies. The quantitative performance of these variants, evaluated via RMSE, Mean Bias, JSD, and computational time, is summarized in Table 3, where a checkmark marks a component that is enabled and a hyphen marks one that is not.

The results indicate that the knowledge-driven adaptive sampling strategy introduced in Variant B plays a decisive role in correcting systematic error. Compared with the baseline in Variant A, the Mean Bias shrank in magnitude from −0.2453 to −0.0596, a 75.7% reduction. This improvement suggests that the knowledge-driven weighting strategy guides the simulation path toward agreement with geochemical priors and counteracts the inherent bias of random paths on non-uniformly sampled data. The RMSE rose slightly over this transition (from 4.2808 to 4.4633) and reached its minimum only in the complete framework. The dynamic context-aware neighborhood mechanism in Variant C reflects the model’s response to local conditioning data. Although the JSD increased to 0.1673 and the Mean Bias was −0.1916, these changes indicate that the dynamic neighborhood mechanism prioritizes the reproduction of localized geochemical anomalies and hard data constraints over the global statistical patterns of the training image. This trade-off is essential for honoring high-frequency spatial variations in real-world sampling. Variant D achieved the best point-wise accuracy, with the lowest RMSE of 4.0164. Integrating all components also partly recovered execution efficiency, reducing the simulation time from 33.67 s in Variant C to 32.89 s (the baseline Variant A took 29.41 s). This suggests that combining the two components in the KD-ADS framework balances point-wise precision against computational practicality.
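The 75.7% figure follows directly from the Mean Bias magnitudes of Variants A and B in Table 3. The short check below reproduces it; the helper name is introduced here only for illustration.

```python
def relative_reduction(before, after):
    """Percentage by which the magnitude of `after` is smaller than that of `before`."""
    return (abs(before) - abs(after)) / abs(before) * 100.0

# Mean Bias of Variant A (baseline) and Variant B, as reported in Table 3.
bias_a, bias_b = -0.2453, -0.0596
print(f"{relative_reduction(bias_a, bias_b):.1f}%")  # prints 75.7%
```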

3.3. Qualitative Comparative Analysis: Pattern Reconstruction and Fidelity

Comparative experiments were carried out using DS and KD-ADS on the W and Mo data. As shown in Figure 4a, the DS reconstruction for W appears fragmented and stochastic. The field looks relatively uniform, with high-value points distributed sparsely and randomly and no discernible structure. High values are generally weakened: the map is dominated by green points with only a few dim yellow points, so meaningful anomaly clusters are difficult to identify. This suggests a smoothing effect on high values and limitations in reproducing complex spatial structures, which is unfavorable for identifying mineralization enrichment centers. Figure 4b shows the simulation result for W using KD-ADS. The result exhibits more intricate and heterogeneous spatial structures. The high-value points are not entirely isolated but show some clustering, which may better reflect the enrichment characteristics of elements in natural geological processes. The transition between background and anomalous values is more gradual, while spatial heterogeneity is preserved. There are more high-value points in the figure, including bright yellow points, indicating that KD-ADS has an enhanced capability to capture key mineralization features. For W, the KD-ADS results more clearly highlight potential mineralization enrichment areas, and the clustered high-value patterns provide a more reasonable spatial basis for subsequent mineralization anomaly identification.

Figure 4c shows the simulation result for Mo using DS. The overall result is characterized by a blurred boundary between the anomaly and the background. The bluish-green high-value points blend highly with the background, making the shapes of anomalies difficult to identify. The background field is relatively homogeneous, potentially oversimplifying the spatial variability. Figure 4d shows the simulation result for Mo using KD-ADS. The yellow high-value areas contrast distinctly with the dark blue low-value background. The shapes of anomaly areas are clearer. Within the low-value background field, there are numerous points of varying intensities, better conforming to the continuously varying nature of geochemical fields and reducing abrupt transitions. As an important associated element, the spatial distribution pattern of Mo is critical for understanding mineralization distribution. KD-ADS more effectively reveals the local enrichment patterns of Mo, facilitating subsequent metallogenic analysis.

KD-ADS performs better at reproducing high-value anomalies and preserving spatial heterogeneity for two main reasons. First, the knowledge-driven adaptive sampling strategy transforms the simulation from a purely stochastic process into an information-based, prioritized search. By integrating spatial distribution, attribute significance, and structural constraints, this strategy ensures that the simulation sequence focuses first on the grid nodes with the highest information value for mineralization discovery. Second, the dynamic context-aware neighborhood mechanism enables the framework to adjust its spatial focusing range according to local statistical characteristics and variogram constraints. By adapting the search radius and neighborhood capacity during the simulation, this mechanism helps bridge the information gap caused by sampling bias. Together, these components enhance the interpretability of the simulation results, moving the output from a purely mathematical interpolation toward a geologically plausible reconstruction that accounts for the complex characteristics of geochemical anomalies.
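To make the idea of an information-based, prioritized simulation sequence concrete, the sketch below orders grid nodes by a combined score built from three factor maps. It is an illustration only: the factor names, the min-max rescaling, and the equal default weights are assumptions made here, and the actual weighting system is the one defined in Section 2.2.

```python
import numpy as np

def prioritized_path(spatial_score, attribute_score, structure_score,
                     weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return flat grid indices ordered from highest to lowest combined score.

    Hypothetical helper: the three inputs are same-shaped score maps and the
    weights are placeholders, not the coefficients used in the study.
    """
    scores = [np.asarray(s, dtype=float)
              for s in (spatial_score, attribute_score, structure_score)]

    def rescale(a):
        # Min-max rescaling so that each factor contributes on a [0, 1] scale.
        span = a.max() - a.min()
        return np.zeros_like(a) if span == 0 else (a - a.min()) / span

    combined = sum(w * rescale(s) for w, s in zip(weights, scores))
    # A stable sort on the negated score keeps tied nodes in grid order.
    return np.argsort(-combined.ravel(), kind="stable")
```

Nodes at the front of the returned sequence would be simulated first, in contrast to the random visiting order of the traditional DS algorithm.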

3.4. Quantitative Performance Evaluation

The ability to reproduce the statistical distribution is fundamental for assessing whether a simulation method can faithfully reflect the statistical characteristics of the original data. Cumulative distribution function (CDF) plots and probability distribution histograms were generated for the simulation results obtained using DS and KD-ADS for both elements, compared with the TI (Figure 5 and Figure 6).

As shown in Figure 5, the CDF curves for both methods agree closely in their overall trend, rising rapidly in the low-value region and then levelling off. This indicates that KD-ADS reproduces the core distribution characteristics captured by DS, which supports its effectiveness and reliability. In the DS panels (Figure 5a,c), the CDF curves of the simulation results overlap almost entirely with the TI’s CDF curve. Because DS essentially replicates the statistical distribution of the TI, and the TI itself carries the smoothing of high-value anomalies, DS reproduces both the data distribution of the TI and the smoothing effect introduced by Kriging interpolation. When the data are unevenly distributed, DS may therefore be constrained by errors introduced at the training image stage, which limits its ability to rearrange the inherent patterns of the TI and makes it difficult to honor the extreme values that represent stronger mineralization intensity. In the KD-ADS panels (Figure 5b,d), the CDF curves of the simulation results lie below the TI’s CDF curve. For the same element concentration, KD-ADS therefore has a lower cumulative probability, which means that the simulation allocates a higher proportion of pixels to high-value intervals and better preserves the intensity of critical anomalies. This suggests that KD-ADS does not merely replicate the TI but, through the knowledge-driven weighting system, reconstructs the low-probability, high-impact edge scenarios that represent critical mineralization anomalies. From a practical exploration perspective, this ability to mitigate the spatial dilution of anomalies is crucial. By reconstructing the connectivity of high-intensity patterns in the upper tail of the CDF, the framework provides a more realistic representation of ore-forming processes. This helps ensure that subtle yet significant geochemical anomalies are not suppressed, thereby enhancing the precision of anomaly delineation and reducing the risk of missing high-grade exploration targets in data-sparse regions.
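The link between a CDF curve that lies below the TI's curve and a larger share of high-value pixels can be checked with an empirical CDF. The sketch below uses toy numbers, not the study's data, and the function name is introduced here for illustration.

```python
import numpy as np

def ecdf_at(values, threshold):
    """Empirical CDF F(t) = P(value <= t) evaluated at a threshold."""
    v = np.sort(np.asarray(values, dtype=float).ravel())
    return np.searchsorted(v, threshold, side="right") / v.size

# Toy data only (not taken from the study).
ti = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
sim = np.array([1, 2, 3, 4, 5, 6, 9, 10, 11, 12], dtype=float)

t = 8.0
exceed_ti = 1.0 - ecdf_at(ti, t)    # share of TI cells above t
exceed_sim = 1.0 - ecdf_at(sim, t)  # share of simulated cells above t
```

Here the simulated CDF at the threshold (0.6) is lower than the TI's (0.8), so the share of cells above the threshold is correspondingly higher in the simulation.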

As shown in Figure 6, the histograms for DS and KD-ADS exhibit strong consistency in peak position and probability density decay trends, both showing significant peaks in the low-value region followed by rapid decay. This suggests that KD-ADS effectively replicates the probability distribution characteristics. The histogram of the KD-ADS simulation results more effectively retains high values and details. For W, where DS has almost no distribution in the high-value region, KD-ADS still retains a certain probability density, indicating that KD-ADS can simulate low-probability but realistically occurring edge scenarios. For Mo, the KD-ADS histogram exhibits richer probability density stratification in the medium-value region, achieving more detailed capture and representation compared to DS. Furthermore, the density change from the peak to the high-value region is relatively smooth, reflecting its optimization for simulating the continuity of the probability distribution.

To check that the KD-ADS framework is robust across different geochemical variables, we evaluated the tail statistics for both Mo and W. As shown in Table 4, although all realizations are strictly bounded by the value range of the TI, their performance in reproducing high-value structures differs significantly.

For element Mo, KD-ADS achieves an exceedance rate (6.41% versus 7.03% for DS, against 5.00% for the TI) and a 95th percentile (6.22 versus 6.41, against 5.38) that are closer to the target TI than those of the baseline, indicating more precise control over the anomaly scale. For element W, although the DS maximum (546.70) is numerically closer to the TI’s maximum (594.66) than that of KD-ADS (224.05), KD-ADS exhibits a higher 99th percentile (77.67) than both the TI (64.42) and DS (64.41). This suggests that KD-ADS prioritizes the spatial clustering and continuity of high-value patterns over the random sampling of isolated outliers. By integrating the knowledge-driven simulation path and the dynamic context-aware neighborhood mechanism, KD-ADS allocates TI-derived extreme values to regions with high predictive potential, which helps prevent the fragmentation of anomalies and enhances the structural reliability of the simulated field.
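The tail statistics of Table 4 can be sketched as follows. The exceedance threshold is an assumption: because the TI's exceedance rate in Table 4 is exactly 5.00%, the sketch takes the threshold to be the TI's 95th percentile, which the text does not state explicitly. The function name is also introduced here for illustration.

```python
import numpy as np

def tail_statistics(values, threshold):
    """Maximum, upper percentiles, and share of values above a threshold.

    `threshold` is assumed here to be the 95th percentile of the TI.
    """
    v = np.asarray(values, dtype=float).ravel()
    return {
        "max": float(v.max()),
        "p95": float(np.percentile(v, 95)),
        "p99": float(np.percentile(v, 99)),
        "exceedance_rate_pct": float(np.mean(v > threshold) * 100.0),
    }
```

Applied to the TI itself with its own 95th percentile as the threshold, the exceedance rate is 5% by construction; a realization with a higher rate places more cells in the upper tail than the TI does.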

Spatial structure is the foundation of geostatistical simulation. Scatter plots and cross-variogram plots were employed to assess the spatial correlation between the simulation results and the TI (Figure 7). The scatter plot directly reflects the consistency between the range of the simulated values and the TI. Given that the essence of DS is to generate equiprobable random realizations of the TI, rather than an exact replica, the method’s ability to reproduce the statistical distribution is assessed by whether the simulation results in the scatter plot reproduce the value range of the TI. By comparing Figure 7a,c, it is evident that the distribution of simulated values in the medium-high range is relatively sparse in DS, weakening the simulation results in these regions. This suggests that DS may have failed to fully capture the spatial characteristics of the TI in the medium-high value regions, causing the simulated values to concentrate in the lower range while neglecting the variation trend in the medium-high values. In contrast, the distribution of medium-high simulated values performs relatively well in KD-ADS, providing a more accurate simulation of the high-value regions in the TI. This suggests that KD-ADS can enhance the performance of simulation results in the medium-high value range, improving simulation accuracy and reliability.

The cross-variogram plot illustrates the similarity between the TI and the simulation results across different spatial scales. As illustrated in Figure 7, the cross-variogram of the KD-ADS simulation results exhibits a significantly higher degree of smoothness and structural stability compared to the DS method. While the DS curve shows erratic fluctuations, the KD-ADS curve follows a more consistent spatial correlation trend. This stability indicates that by incorporating knowledge-driven adaptive weights, KD-ADS effectively filters out the noise introduced by sampling bias and preserves the underlying co-occurrence patterns of W and Mo.
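For reference, the standard experimental cross-variogram of two co-located grids at lag h is half the mean product of their increments over that lag. The sketch below evaluates it along one grid axis only; the function name and the axis-aligned lags are assumptions made here, since the text does not specify the estimator or the lag directions used for Figure 7.

```python
import numpy as np

def cross_variogram_x(z1, z2, max_lag):
    """Experimental cross-variogram of two same-shaped grids along axis 1.

    gamma(h) = 0.5 * mean[(z1(x+h) - z1(x)) * (z2(x+h) - z2(x))]
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    if z1.shape != z2.shape:
        raise ValueError("grids must share the same shape")
    gammas = []
    for h in range(1, max_lag + 1):
        d1 = z1[:, h:] - z1[:, :-h]  # increments of the first grid at lag h
        d2 = z2[:, h:] - z2[:, :-h]  # increments of the second grid at lag h
        gammas.append(0.5 * np.mean(d1 * d2))
    return np.array(gammas)
```

When both inputs are the same grid, the function reduces to the ordinary semivariogram, which gives a convenient sanity check.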

In addition to evaluating the spatial pattern consistency with the TI, it is crucial to assess the predictive performance against actual geochemical samples to verify their practical reliability. To achieve this, a random cross-validation was performed. The original dataset was partitioned into a training set comprising 80% of the total samples and an independent validation set comprising 20% of the total samples. The validation samples were strictly withheld from all modeling stages, including TI generation and weight calculation.
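A random 80/20 partition of this kind can be sketched as below. The seed, the rounding rule, and the function name are assumptions for illustration; the sample count of 1268 is the figure given in the abstract.

```python
import numpy as np

def train_validation_split(n_samples, validation_fraction=0.2, seed=0):
    """Randomly partition sample indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * validation_fraction))
    # The first n_val shuffled indices are withheld for validation.
    return np.sort(order[n_val:]), np.sort(order[:n_val])

train_idx, val_idx = train_validation_split(1268)
```

With 1268 samples this yields 1014 training and 254 validation indices. Only the training indices would then be passed to TI generation and weight calculation.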

Ordinary Kriging (OK) serves as a deterministic benchmark, while the summary statistics, including the mean and standard deviation derived from 50 independent realizations for both DS and KD-ADS, are used for the comparison. This ensemble approach ensures that the performance evaluation is robust against stochastic fluctuations. The quantitative metrics, including RMSE, $R^{2}$, Pearson correlation coefficient ($r$) and Mean Bias, are calculated at the validation locations to evaluate the predictive accuracy and statistical unbiasedness.
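The four validation metrics can be computed as in the sketch below. Two points are assumptions: the sign convention for Mean Bias (predicted minus observed) and the use of the sample standard deviation when summarizing realizations, neither of which is stated in the text. With $R^{2}$ defined as 1 − SS_res/SS_tot, the value becomes negative whenever the predictions have a larger squared error than the mean of the observations would.

```python
import numpy as np

def validation_metrics(observed, predicted):
    """RMSE, R^2, Pearson r and Mean Bias at the validation locations."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    residual = pred - obs  # assumed sign convention: predicted minus observed
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return {
        "rmse": float(np.sqrt(np.mean(residual ** 2))),
        "r2": float(1.0 - ss_res / ss_tot),
        "pearson_r": float(np.corrcoef(obs, pred)[0, 1]),
        "mean_bias": float(np.mean(residual)),
    }

def summarize(metric_values):
    """Mean and sample standard deviation of one metric across realizations."""
    v = np.asarray(metric_values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1))
```

For a stochastic method, `validation_metrics` would be evaluated once per realization and each metric then passed through `summarize` to obtain a mean ± standard deviation entry.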

The validation results on the independent dataset for both elements are presented in Table 5. KD-ADS achieves slightly lower mean RMSE and slightly higher mean $R^{2}$ values than the traditional DS algorithm for both elements, although the differences lie within one standard deviation of the 50 realizations. Specifically, for element W, the Pearson correlation coefficient changes from −0.005 ± 0.05 to −0.001 ± 0.04; the narrower spread is consistent with the knowledge-driven weights constraining the stochastic search process. Furthermore, KD-ADS shows a clear advantage in statistical unbiasedness for Mo: its Mean Bias (−0.002 ± 0.13) is closer to zero than that of DS (−0.139 ± 0.12) and even that of the OK method (−0.0416), as shown in Table 5. For W, by contrast, the Mean Bias of KD-ADS (4.50 ± 1.01) is slightly larger than that of DS (4.09 ± 1.00). While OK yields higher $R^{2}$ and $r$ values due to its deterministic design for error minimization, it inherently smooths the geochemical field at the expense of local heterogeneity. By contrast, the KD-ADS method provides a more robust and geologically plausible predictive framework within the stochastic simulation category, helping to bridge the gap between point-wise precision and the preservation of complex spatial patterns. Overall, the validation results favor KD-ADS over the baseline DS, though the margins on this dataset are small.

Although simulation accuracy and reliability are the primary criteria for evaluating a method, computational efficiency is another critical indicator of its practical applicability. Under the same computational environment, DS and KD-ADS were each executed 10 times for the two element datasets, and the average time consumption was recorded, as shown in Table 6. The results indicate that KD-ADS improves computational efficiency while also enhancing simulation accuracy. The gain is clearest for W, where the average time falls by 9.8% (from 27.95 s to 25.22 s); for Mo the reduction is about 0.9% (from 27.56 s to 27.30 s). The improvement in computational efficiency is primarily attributed to the knowledge-driven weighting system and the octagonal search strategy, which reduce global search overhead by prioritizing candidates that align with the spatial structure and by avoiding redundant calculations. These results indicate that KD-ADS is an efficient inference engine suitable for complex geochemical surveys.
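The averaging procedure and the reported reduction can be sketched as follows. The timing helper is a generic illustration with hypothetical names, not the benchmarking code used in the study; the percentages are computed from the mean times in Table 6.

```python
import time

def mean_runtime(run, repeats=10):
    """Average wall-clock seconds of `run()` over `repeats` calls."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

def percent_reduction(baseline_s, improved_s):
    """Relative time saving of the improved method, in percent."""
    return (baseline_s - improved_s) / baseline_s * 100.0

# Mean times (s) for DS and KD-ADS from Table 6.
print(f"W:  {percent_reduction(27.95, 25.22):.1f}%")  # prints W:  9.8%
print(f"Mo: {percent_reduction(27.56, 27.30):.1f}%")  # prints Mo: 0.9%
```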

4. Conclusions

This paper proposes a knowledge-driven adaptive direct sampling framework to address the critical challenge of spatial structure reconstruction in mineral exploration under sampling bias. The proposed knowledge-driven adaptive sampling strategy prioritizes data with high information value to recover key structural details for identifying mineralization centers. Furthermore, the dynamic context-aware neighborhood mechanism provides a sophisticated solution to spatial non-stationarity by adaptively adjusting search parameters according to local statistical characteristics and variogram constraints. This framework achieves a significant reduction in structural variance compared to conventional methods, ensuring high-fidelity reproduction of complex geological patterns. KD-ADS not only enhances the precision of anomaly delineation but also improves computational efficiency, thereby serving as a robust decision-support tool for sustainable mineral resource evaluation. By minimizing exploration uncertainty and enabling more efficient resource allocation, this methodology offers a transformative approach to converting raw geospatial data into actionable geological knowledge, ultimately supporting more informed and sustainable strategic planning in complex exploration environments. Future research will focus on extending this adaptive mechanism to multi-element synergistic simulations, exploring the cross-correlations among complex geochemical assemblages to enhance the reliability of deep-seated mineral potential mapping.

Author Contributions

Conceptualization, Yameng Liu and Feixiang Chen; methodology, Yameng Liu and Yanqi Dong; software, Yameng Liu and Qing Zhang; validation, Yameng Liu and Jiali Zi; formal analysis, Jiali Zi and Nuo Xu; investigation, Yameng Liu and Nuo Xu; resources, Feixiang Chen; data curation, Yameng Liu and Yanqi Dong; writing—original draft preparation, Yameng Liu; writing—review and editing, Feixiang Chen and Yameng Liu; visualization, Yameng Liu and Qing Zhang; supervision, Feixiang Chen; project administration, Feixiang Chen; funding acquisition, Feixiang Chen. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Outstanding Youth Team Project of Central Universities, grant number QNTD202504; the National Key R&D Program of China, grant number 2022YFF1302700; and the Emergency Open Competition Project of National Forestry and Grassland Administration, grant number 202303.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to institutional data protection policies and confidentiality restrictions related to the geological mapping project.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Geological map of the study area.

Figure 2. The interpolation result of W.

Figure 3. The interpolation result of Mo.

Figure 4. (a) DS simulation results of W; (b) KD-ADS simulation results of W; (c) DS simulation results of Mo; (d) KD-ADS simulation results of Mo.

Figure 5. (a) CDF of W using DS simulation; (b) CDF of W using KD-ADS simulation; (c) CDF of Mo using DS simulation; (d) CDF of Mo using KD-ADS simulation.

Figure 6. (a) Distribution of W (TI); (b) Distribution of W (DS simulation); (c) Distribution of W (KD-ADS simulation); (d) Distribution of Mo (TI); (e) Distribution of Mo (DS simulation); (f) Distribution of Mo (KD-ADS simulation).

Figure 7. (a) Scatter plot of the TI and DS simulation results for W; (b) Cross-variogram of the TI and DS simulation results for W; (c) Scatter plot of the TI and KD-ADS simulation results for W; (d) Cross-variogram of the TI and KD-ADS simulation results for W; (e) Scatter plot of the TI and DS simulation results for Mo; (f) Cross-variogram of the TI and DS simulation results for Mo; (g) Scatter plot of the TI and KD-ADS simulation results for Mo; (h) Cross-variogram of the TI and KD-ADS simulation results for Mo.

Table 1. Statistical summary of the raw geochemical datasets (Mo and W).

Element	Min	Max	Mean	Skewness	Kurtosis
W	1.70	1846.93	33.34	10.42	162.82
Mo	0.35	52.40	2.05	7.75	87.88

Table 2. Base simulation parameters.

Parameter	Value
Distance threshold	0.1
Search radius	20
Maximum number of nodes	40
Max fraction of TI	0.5

Table 3. Quantitative results of the ablation study.

Knowledge-Driven Adaptive Sampling Strategy	Dynamic Context-Aware Neighborhood Mechanism	Mean Bias	RMSE	JSD	Time (s)
-	-	−0.2453	4.2808	0.0982	29.41
√	-	−0.0596	4.4633	0.0971	32.02
-	√	−0.1916	4.4429	0.1673	33.67
√	√	−0.2024	4.0164	0.1643	32.89

Table 4. Detailed tail statistics for Mo and W.

Data	Method	Max Value	95th Percentile	99th Percentile	Exceedance Rate (%)
W	TI	594.66	35.55	64.42	5.00
	DS	546.70	36.08	64.41	5.44
	KD-ADS	224.05	38.09	77.67	7.14
Mo	TI	13.18	5.38	8.75	5.00
	DS	10.69	6.41	8.63	7.03
	KD-ADS	10.85	6.22	8.84	6.41

Table 5. Performance metrics comparison on the independent validation set.

Data	Method	RMSE	$R^{2}$	Pearson $r$	Mean Bias
W	OK	30.47	0.6945	0.8919	2.6032
	DS	57.38 ± 2.84	−0.086 ± 0.11	−0.005 ± 0.05	4.09 ± 1.00
	KD-ADS	57.26 ± 2.37	−0.083 ± 0.09	−0.001 ± 0.04	4.50 ± 1.01
Mo	OK	2.85	0.4841	0.7437	−0.0416
	DS	4.34 ± 0.12	−0.198 ± 0.06	−0.005 ± 0.06	−0.139 ± 0.12
	KD-ADS	4.32 ± 0.11	−0.187 ± 0.06	−0.002 ± 0.05	−0.002 ± 0.13

Table 6. Comparison of simulation time between DS and KD-ADS.

Data	Method	Time (s)
W	DS	27.95
W	KD-ADS	25.22
Mo	DS	27.56
Mo	KD-ADS	27.30

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
