Knowledge–Data Collaboration-Driven Mineral Prospectivity Prediction with Graph Attention Networks

Sheng, Shiting; Wang, Yongzhi; Tian, Jiangtao; Chen, Xingyu; Ning, Yan; Dong, Yuhao; Bilal, Muhammad Atif; An, Zhaofeng

doi:10.3390/min15111164


by Shiting Sheng ¹, Yongzhi Wang ¹,²,³,*, Jiangtao Tian ³, Xingyu Chen ¹, Yan Ning ¹, Yuhao Dong ¹, Muhammad Atif Bilal ¹ and Zhaofeng An ¹

¹ College of Geoexploration Science and Technology, Jilin University, Changchun 130026, China

² Institute of Integrated Information for Mineral Resources Prediction, Jilin University, Changchun 130026, China

³ Xinjiang Academy of Geological Research, Urumqi 830057, China

* Author to whom correspondence should be addressed.

Minerals 2025, 15(11), 1164; https://doi.org/10.3390/min15111164

Submission received: 8 September 2025 / Revised: 26 October 2025 / Accepted: 31 October 2025 / Published: 4 November 2025

(This article belongs to the Special Issue Smart Exploration of Critical Minerals: Integrating Multi-Source Data for Enhanced Mineral Prospectivity Mapping)


Abstract

Predicting mineral deposits accurately requires capturing the complex interactions among geological structures, geochemical anomalies, and alteration patterns. To address this challenge, this study develops a Knowledge–Data Collaboration Graph Attention Network (KDCGAT) to improve copper mineralization prediction by integrating multi-source geological data. The model combines Graph Attention Network (GAT) with multimodal geoscience data, including fracture structures, remote sensing alteration maps, and geochemical anomalies. Spatial correlations are captured through a self-attention mechanism, aligning deep learning predictions with geological and geochemical knowledge. Using the eastern Tien Shan copper belt in Xinjiang as a case study, KDCGAT achieves a copper deposit identification accuracy of 85.9%, outperforming Weight of Evidence (WoE) by 7%, Graph Convolutional Network (GCN) by 11.3%, and Convolutional Neural Network (CNN) by 19.7%. Ablation experiments show a 21.1% improvement over the baseline GAT model. Finally, five Class A and three Class B mineralization prediction zones are delineated. This study demonstrates the effectiveness of graph neural networks for copper prospectivity prediction and highlights knowledge–data collaboration as a practical tool for mineral exploration.

Keywords:

copper mineral prediction; knowledge-data collaboration; graph attention networks; multi-source data fusion; eastern Tien Shan

1. Introduction

The exploration and prediction of mineral resources are fundamental tasks in geological science. Their accuracy directly affects the efficiency, economic cost, and environmental sustainability of resource development [1]. With the increasing depletion of shallow mineral deposits, the exploration of concealed ore bodies imposes higher demands on the capacity of prediction models to represent complex geological systems. Therefore, developing high-precision and interpretable intelligent prediction methods has both theoretical significance and practical value. Traditional mineral prospectivity mapping relies largely on experience-driven approaches [2,3], which have significant bottlenecks. The Weight of Evidence (WoE) model, based on Bayesian probability superposition, assumes that ore-forming factors are independent [4,5]. This simplification neglects the nonlinear interactions among faults, alterations, and geochemical anomalies, as well as the dynamic coupling effects of multi-stage tectonic and hydrothermal processes [6]. Consequently, such approaches often yield fragmented or overly smoothed prediction results. Moreover, heterogeneous data sources—such as geological, geophysical, geochemical, and remote sensing datasets—differ in spatial resolution, format, and semantics [7]. Conventional techniques like layer overlay and weighted summation struggle to achieve adaptive fusion of multimodal features, thus limiting the efficiency of information utilization.

In recent years, deep learning has demonstrated great potential for mineral prospectivity prediction, such as end-to-end prediction of gold mineral targets based on U-Net [8] and geologically constrained Graph Convolutional Network (GCN) models [9,10]. However, existing studies still face key bottlenecks. Convolutional Neural Networks (CNNs) are constrained to Euclidean space, making it difficult to model irregular topologies such as fractal structures of fracture networks [11,12], resulting in distorted mineralization boundaries [13]. GCN generates over-smoothing effects due to fixed adjacency matrices [14], especially in large-scale systems where accuracy decreases significantly [15], and cannot dynamically allocate weights to key mineralization nodes [16,17]. Furthermore, the “black-box” nature of deep learning models limits their interpretability [18], making it difficult to incorporate prior geological knowledge such as “fault-controlled mineralization” or “alteration zoning” [19]. This disconnect between prediction and geological processes reduces the credibility and usefulness of model results [20]. Taking the eastern Tien Shan copper belt as an example, regional copper mineralization is controlled by the spatiotemporal coupling between multi-stage fault activity and Late Paleozoic magmatic intrusions [21]. Existing models, however, cannot explicitly represent this “fault–magma–alteration–trap” mechanism [22], leading to the omission of critical high-potential zones [23].

Current research mainly combines single data sources with deep learning models. It lacks methods for collaborative feature extraction, cross-modal information fusion, and adaptive spatial correlation modeling of multi-source geological data based on graph neural networks (GNNs) [24]. To address these challenges, this study proposes a Knowledge–Data Collaboration Graph Attention Network (KDCGAT) framework that integrates the Graph Attention Network (GAT) with multimodal geoscience data. The self-attention mechanism in GAT enables the model to capture nonlinear spatial correlations among fractures, alterations, and geochemical anomalies, overcoming the fixed adjacency limitations of GCNs [25,26]. Multi-head attention learns different geological feature patterns in parallel, such as the consistency of fault strike and the superposition effect of alteration, allowing the fusion of multi-source heterogeneous information [27]. The study also constructs a four-element geological knowledge subnet of “fracture–rock mass–alteration–geochemical anomaly.” This enables bidirectional verification between deep learning decisions and geological genesis logic, enhancing interpretability. The GAT’s capability to model directional geological processes, such as hydrothermal migration and alteration zoning evolution, provides physical support for genesis-oriented prediction [28].

This study applies the KDCGAT model to the eastern Tien Shan copper belt in Xinjiang, integrating GAT with a multi-source geoscience fusion framework to construct a four-element mineralization feature extraction model based on “fault–rock–alteration–geochemical anomaly.” The proposed method achieves high-precision mineralization potential mapping and contributes the following:

(1): Model Innovation: We integrate GAT with multimodal geoscience data to establish the KDCGAT model, which effectively models nonlinear relationships. The self-attention mechanism captures spatial correlations among geological factors and combines them with a geological knowledge module, enabling bidirectional validation between model decisions and geological principles.
(2): Data Integration: Geological structures, magmatic rocks, remote sensing alteration information, and geochemical anomalies are integrated in ArcGIS 10.8 to build a multi-channel feature cube through spatial resolution unification, band stacking, and dynamic feature reconstruction, improving feature extraction and fusion efficiency.
(3): Prediction Performance: In the eastern Tien Shan copper belt, the model achieves a prediction accuracy of 85.9%, which is 7.0%, 11.3%, and 19.7% higher than WoE, GCN, and CNN, respectively. Eight metallogenic prediction zones were identified, with Class A zones showing strong spatial correspondence to known ore deposits, providing a scientific basis for concealed copper exploration.
(4): Knowledge-Driven Enhancement: Ablation experiments demonstrate that the geological knowledge module enhances model performance, improving GAT accuracy by 21.1% compared with the baseline. The proposed intelligent prediction framework is extendable to other complex mineralization systems, offering a new paradigm for interpretable, knowledge-guided mineral prediction.

2. Geological Background and Data Sources

2.1. Regional Geological Background

The eastern Tien Shan orogenic belt is a key tectonic hub in the southern part of the Central Asian orogenic system. It lies at the crossroads of the Junggar-Kazakhstan plate, the Siberian plate, and the Tarim plate. This region has experienced several tectonic changes. These include Paleozoic collision orogeny, Mesozoic thermal subsidence, and Cenozoic differential uplift. These processes created a geotectonic pattern with a typical arc-basin system [29].

The regional fracture system shows clear zoning. It mainly features four deep fractures: Achik Kuduk-Shaquanzi (South Tien Shan Suture Zone), Yamansu-Bitsui (Pre-arc Retroflexion Belt), Kanggurtag-Huangshan (Intra-arc Transformational Faults), and Dacaotan-Dananhu (Island Arc Basement Detachment) [30]. The first three fractures act as boundaries for tectonic units. They influence the tectonic evolution of the Middle Tien Shan Massif, the Aqishan-Yamansu accretionary wedge, and the Kanggurtag intra-arc basin. The tectonic-magmatic response from their activities directly affects the regional Cu mineralization system.

The Dananhu-Toussouquan Island arc sits at the southern edge of the Turpan-Hami Basin. It connects the Dacaotan Fault and the Kanggul Fault. This arc is a vital part of the Central Asian orogenic belt and one of the main metallogenic belts in Xinjiang [31]. The region’s geology is mainly volcanic, dating from the Devonian to Carboniferous periods. It consists of moderately alkaline volcanic rocks, acidic volcaniclastic rocks, clastic rocks, limestone, and turbidites. Notably, magmatism here has lasted from the Ordovician to the Carboniferous. This extended period of activity has resulted in various metal deposits, including porphyries, hydrothermal veins, and volcanogenic massive sulfides. See Figure 1 for the associated geological map.

2.2. Data Sources and Pre-Processing

Based on the copper deposit model and available data, we identified key evidence layers for the mineralization of eastern Tien Shan copper deposits. These layers include geotectonic layers, main mineralized strata, remotely sensed hydroxyl and iron-stained alteration zones, and geochemical anomalies of Cu and related elements. The datasets used are:

(1): 1:5000 Geological Overview Map: This map shows geological ages, tectonic details, and major mineralized strata, crucial for understanding the area’s geology.
(2): 1:20,000 Geochemical Data: This dataset includes 39 elements, such as Ag, As, Au, B, Ba, Be, Bi, Cd, Co, Cr, Cu, F, and Hg. Samples were collected by the Xinjiang Academy of Geological Research.
(3): 1:20,000 Geophysical Data: This dataset contains gravity data for the area, used to examine the physical traits of geological bodies.
(4): Landsat 8 OLI Remote Sensing Images: We selected several recent Landsat 8 OLI images of the eastern Tien Shan with cloud cover below 0.1%, acquired in spring and summer to avoid snow. Each image was pre-processed with radiometric calibration and atmospheric correction, and water bodies and vegetation were masked to improve the accuracy of mineral alteration extraction. The resulting images defined the final area used for surface mineral alteration extraction.

In this study, the analysis is limited to rasterized 2D geological data, which simplifies graph construction and spatial representation while maintaining adequate resolution for mineral prospectivity modeling. Table 1 summarizes the four data types used in this study and their preprocessing.

3. Methods

3.1. Remote Sensing Alteration Information Extraction

The Landsat 8 satellite is equipped with a push-broom Operational Land Imager (OLI) sensor [32]. The OLI data are characterized by high geometric stability and a strong signal-to-noise ratio (Table 2). Principal Component Analysis (PCA) is commonly applied to multispectral remote sensing images to extract mineral alteration information [33]. PCA transforms the original multi-band data into a set of uncorrelated principal components that preserve most of the spectral variance [34]. The preprocessed Landsat 8-OLI remote sensing images for this study appear in Figure 2.

Previous studies have shown that Fe³⁺ exhibits strong absorption features in OLI Band 5. To enhance iron-stained alteration information, we selected OLI Bands 2, 4, 5, and 6 for PCA processing, as this combination effectively emphasizes the spectral characteristics of iron oxides [35]. For hydroxyl-bearing (OH⁻) clay minerals, the characteristic absorption features lie near OLI Band 7, so PCA was applied to OLI Bands 2, 5, 6, and 7 to enhance hydroxyl alteration information. After PCA processing, we applied a 4 × 4 Gaussian low-pass filter to the relevant principal component images and classified them into anomaly classes using the threshold method. The anomaly grade was determined with the threshold X + kδ, where X is the mean value, δ is the standard deviation, and k is a multiplier that sets the anomaly level.
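The PCA and threshold steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the random stand-in for the four OLI bands, the multipliers `ks`, and the choice of which principal component carries the alteration signal are all hypothetical and are not taken from the study.

```python
import numpy as np

def pca_components(bands):
    """Return principal-component images for a (n_bands, rows, cols) stack."""
    n_bands, rows, cols = bands.shape
    pixels = bands.reshape(n_bands, -1).T.astype(float)   # (n_pixels, n_bands)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)                  # band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)                # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                     # largest variance first
    scores = centered @ eigvecs[:, order]                 # project pixels onto the PCs
    return scores.T.reshape(n_bands, rows, cols), eigvals[order]

def grade_anomaly(pc_image, ks=(1.5, 2.0, 2.5)):
    """Grade each pixel by how many thresholds mean + k*std it exceeds (0 = background)."""
    mean, std = pc_image.mean(), pc_image.std()
    grades = np.zeros(pc_image.shape, dtype=int)
    for k in ks:                                          # ks are assumed values
        grades += (pc_image > mean + k * std).astype(int)
    return grades

rng = np.random.default_rng(0)
stack = rng.normal(size=(4, 50, 50))      # hypothetical stand-in for OLI Bands 2, 4, 5, 6
pcs, variances = pca_components(stack)
grades = grade_anomaly(pcs[3])            # using the fourth PC is an assumption
```

In practice the component is chosen by inspecting the eigenvector loadings on the diagnostic bands, and the filtered component image, not the raw one, would be passed to the grading step.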

3.2. Extraction of Information on Geochemical Anomalies

The geochemical anomaly information was derived from rock geochemical samples collected in the study area. Prior to analysis, erroneous and outlier data were removed to ensure that the distribution of each element approximated normality, which is essential for subsequent statistical processing [36]. As this study focuses on copper (Cu) mineralization, both the regional geological–tectonic background and previous geochemical investigations were considered. We start with a correlation analysis [37] of the 1:200,000 chemical exploration data, selecting elements that correlate strongly with Cu.

Subsequently, factor analysis was performed on these Cu-associated elements to extract representative geochemical factors. The dataset was evaluated using Bartlett’s sphericity test and the Kaiser–Meyer–Olkin (KMO) test to confirm its suitability for factor analysis [38]. After identifying elemental combinations, we can determine the anomalous lower limit of elemental concentration in our dataset using the Cumulative Frequency Method (CFM) [39]. In this approach, the cumulative frequency distribution of each element’s concentration is calculated to identify key inflection points in the dataset.

The 85% cumulative frequency criterion was adopted, meaning that the concentration value corresponding to an 85% cumulative frequency was defined as the lower threshold for anomaly detection. Based on these thresholds, elemental anomaly maps were constructed, allowing visualization of spatial patterns and identification of geochemical anomaly zones through the analysis of elemental associations and combination characteristics.
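As a minimal sketch of the 85% cumulative frequency criterion, the threshold can be read directly from the sorted concentrations. The function name and the sample values below are made up for illustration and do not come from the study's dataset.

```python
import math

def cumulative_frequency_threshold(values, cutoff=0.85):
    """Smallest concentration whose cumulative frequency reaches `cutoff`."""
    ordered = sorted(values)
    # Rank of the first sample at which (rank / n) >= cutoff.
    rank = math.ceil(cutoff * len(ordered))
    return ordered[max(rank, 1) - 1]

# Hypothetical Cu concentrations (ppm), after outlier removal.
cu_ppm = [12, 18, 20, 22, 25, 27, 30, 31, 33, 35,
          36, 38, 40, 42, 45, 48, 55, 70, 95, 160]

threshold = cumulative_frequency_threshold(cu_ppm)
anomalous = [v for v in cu_ppm if v > threshold]   # samples above the lower anomaly limit
```

Samples above `threshold` would then be contoured to produce the elemental anomaly map.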

3.3. Knowledge–Data Collaboration and Graph Attention Network Model

3.3.1. Data Preprocessing Summary

As shown in Figure 3, this study examines the eastern Tien Shan copper belt using multi-source geological data, focusing on extracting and integrating key information for copper deposit prediction. First, we analyze geological and geophysical data to identify fracture structures and ore-forming strata, including magmatic bodies. Then, we extract hydroxyl and iron-stained alteration information from multispectral remote sensing images and identify anomalous areas of ore-forming elements from the geochemical data. Next, we convert and normalize the heterogeneous data according to their storage formats, and reconstruct the information on elements favorable for mineralization into spatial rasters on a GIS platform. Spatial resolution and channel dimensions are unified through resampling. Finally, band stacking generates a multi-channel mineralization feature raster dataset with a uniform pixel size and aligned rows and columns.

3.3.2. GAT-Based Framework

The GAT model, which is based on graph structure, captures spatial correlations and nonlinear features among geological bodies. In this study, geological knowledge is not only used to select evidence layers but also explicitly embedded into the graph structure and attention propagation. The spatial relationships among geological units, faults, and alteration zones are encoded as graph edges and neighborhood weights, allowing the network to learn contextual dependencies guided by geological reasoning. In this sense, the proposed framework represents a form of knowledge–data collaboration, where domain knowledge constrains and informs data-driven learning rather than merely stacking input layers. It uses a self-attention mechanism to effectively model the complex relationships between mineralized nodes and multiple elements, such as fractures, alteration zones, and geochemical anomalies. GAT produces a new set of node features from an input set of node features. It obtains the graph attention layer through specific steps [40]:

(1): Calculate the attention coefficient function as follows:

$e_{i j} = \mathrm{attention} (W \vec{n_{i}}, W \vec{n_{j}}) = \mathrm{LeakyReLU} ({\vec{a}}^{T} [W \vec{n_{i}} \, \| \, W \vec{n_{j}}])$

(1)

where $W$ is the weight matrix, $\vec{a}$ is the learnable attention weight vector, and $\|$ denotes concatenation. $e_{i j}$ indicates the importance of node $j$'s features for node $i$; $\mathrm{attention}$ is a single-layer feed-forward neural network, and $\mathrm{LeakyReLU}$ is a nonlinear activation function with a negative slope $α$ = 0.2.
(2): The attention coefficients are normalized over the neighborhood $N_{i}$ of node $i$ using the softmax function:

$α_{i j} = \mathrm{softmax}_{j} (e_{i j}) = \frac{\exp (e_{i j})}{\sum_{k \in N_{i}} \exp (e_{i k})}$

(2)
(3): Calculate the final node feature vector as a linear combination of the neighboring node features weighted by the normalized attention coefficients. In the final prediction layer, the outputs of the $P$ attention heads are averaged before the nonlinearity $σ$ is applied:

$\vec{n_{i}}^{\prime} = σ (\frac{1}{P} \sum_{p = 1}^{P} \sum_{j \in N_{i}} α_{i j}^{p} W^{p} \vec{n_{j}})$

(3)

where $α_{i j}^{p}$ is the normalized attention coefficient computed by the $p$th attention mechanism, and $W^{p}$ is the weight matrix of the corresponding linear transformation in the $p$th attention mechanism.
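The three steps can be traced numerically with a small NumPy sketch. It is a toy illustration, not the study's implementation: the function names, the random weights, the four-node graph, and the use of a sigmoid for $σ$ are assumptions. The activation in step (1) is written as a leaky rectifier with slope 0.2, which is what the stated negative-slope parameter describes, and each neighborhood is assumed to include the node itself.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_head(features, neighbors, W, a):
    """One attention head: sum_j alpha_ij * W n_j for every node i, plus the alphas."""
    h = features @ W.T                          # W n_i for every node, shape (N, F_out)
    out = np.zeros_like(h)
    alphas = {}
    for i, nbrs in neighbors.items():
        # Step (1): unnormalized score from the concatenated transformed features.
        e = np.array([leaky_relu(a @ np.concatenate([h[i], h[j]])) for j in nbrs])
        # Step (2): softmax over the neighborhood of node i (shifted for stability).
        w = np.exp(e - e.max())
        alpha = w / w.sum()
        alphas[i] = alpha
        out[i] = alpha @ h[nbrs]                # attention-weighted neighbor features
    return out, alphas

def gat_output_layer(features, neighbors, Ws, As):
    """Step (3): average the P heads, then apply sigma (a sigmoid here, by assumption)."""
    heads = [gat_head(features, neighbors, W, a)[0] for W, a in zip(Ws, As)]
    mean = np.mean(heads, axis=0)
    return 1.0 / (1.0 + np.exp(-mean))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))                     # 4 nodes with 5 features each
nbrs = {0: [0, 1, 2], 1: [1, 0, 3], 2: [2, 0, 3], 3: [3, 1, 2]}
P, F_out = 3, 2                                 # assumed number of heads and output size
Ws = [rng.normal(size=(F_out, 5)) for _ in range(P)]
As = [rng.normal(size=2 * F_out) for _ in range(P)]
out = gat_output_layer(X, nbrs, Ws, As)
```

Because the softmax runs over each neighborhood separately, the coefficients of every node sum to one regardless of how many neighbors it has.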

3.3.3. Graph Construction Process

To implement the Graph Attention Network (GAT), we explicitly construct a spatial adjacency graph from the rasterized multi-channel dataset [41]. The construction follows three steps: node definition, edge definition, and edge-weight initialization.

(1): Node definition: Each grid cell (30 m × 30 m) corresponds to one node $v_{i}$ with a feature vector $x_{i} \in R^{5}$ . This design allows the GAT to learn local spatial dependencies between adjacent geological units.
(2): Edge definition: GAT requires a graph structure that defines local neighborhoods for message passing. We construct an undirected spatial adjacency graph based on k-nearest-neighbour (k-NN) relationships. For each node, the four closest raster cells within its 3 × 3 spatial window are connected as neighbors (k = 4). This corresponds to a 4-neighbor topology that effectively captures local geological continuity without over-densifying the graph.
(3): Edge weighting: The standard GAT automatically learns adaptive attention coefficients ( $α_{i j}$ ) that represent the importance of one node to another during message passing [41]. This mechanism replaces fixed edge weights with learnable parameters, allowing the network to dynamically adjust the influence between connected geological knowledge nodes according to their spatial and feature correlations.

In summary, the graph structure strictly follows the GAT formulation—each raster cell is a node, adjacency defines the neighborhood, and the effective edge weights are learned adaptively through the attention mechanism rather than being predefined. This graph design allows the GAT to respect the irregular spatial topology of geological features, while still operating on a regular raster grid. It provides a flexible balance between spatial adjacency (topology) and feature-based attention (semantics), which is critical for mineral prospectivity mapping.
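The node and edge definitions can be illustrated with a small pure-Python sketch of a 4-neighbor raster graph. The function names are hypothetical, and one simplification is assumed: cells on the raster border simply keep fewer than four neighbors instead of being padded up to k = 4.

```python
def grid_edges(rows, cols):
    """Undirected 4-neighbor edges between raster cells, as (i, j) pairs with i < j."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c                    # row-major node index of cell (r, c)
            if c + 1 < cols:
                edges.append((i, i + 1))        # east neighbor
            if r + 1 < rows:
                edges.append((i, i + cols))     # south neighbor
    return edges

def neighbor_lists(rows, cols):
    """Adjacency lists built from the undirected edge list."""
    nbrs = {i: [] for i in range(rows * cols)}
    for i, j in grid_edges(rows, cols):
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs

edges = grid_edges(3, 4)
nbrs = neighbor_lists(3, 4)
```

Only the connectivity is fixed here. The weight attached to each edge is left to the attention coefficients, so no numeric edge weight is stored.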

The edge weights in KDCGAT are designed to encode geological similarity and expert knowledge, including lithological continuity, structural orientation, and geochemical correlation. However, misrepresentation or omission of such knowledge may introduce bias into the graph structure and propagate through the attention mechanism, potentially affecting subsurface predictions. Therefore, geological inputs should be carefully verified and curated before model training to minimize uncertainty and ensure geological validity.

During the model’s iterative process, key parameters like mineralization element weight coefficients, learning rate strategies, and decision tree splitting criteria adjust dynamically through an adaptive parameter optimization mechanism. This drives the model to converge to its optimal state step by step [42]. On the technical side, we create a multimodal data processing framework based on Python 3.8. We integrate professional software like ArcGIS 10.8 (for spatial analysis), ENVI 5.6 (for remote sensing interpretation), and Golden Software Surfer (for geochemical field modelling) (https://www.goldensoftware.com/products/surfer/, accessed on 30 October 2025). This forms a collaborative technological chain for multi-source and heterogeneous data, ensuring an efficient connection between geoscientific big data and deep learning models.

3.4. Model Comparison and Evaluation Metrics

To evaluate the performance of the proposed Knowledge–Data Collaboration Graph Attention Network (KDCGAT), three representative models were selected for comparison: the Weight of Evidence (WoE) model, the Convolutional Neural Network (CNN), and the Graph Convolutional Network (GCN). These models represent three major categories of mineral prospectivity mapping (MPM) methods: statistical, grid-based deep learning, and graph-based deep learning, respectively.

(1): Weight of Evidence (WoE): WoE is a classical statistical approach widely applied in mineral prospectivity mapping [4]. It quantifies the correlation between known mineral occurrences and evidence layers (e.g., lithology, faults, geochemistry) by calculating conditional probabilities. Although effective and interpretable, WoE assumes independence among evidential layers and lacks the ability to model spatial interactions between features.
(2): Convolutional Neural Network (CNN): CNNs have been used to learn spatial patterns directly from rasterized geoscience data [11]. Each convolutional layer aggregates information from neighboring pixels to identify local geological features related to mineralization. However, CNNs are limited to fixed-grid structures and cannot flexibly represent irregular geological geometries or topological relationships.
(3): Graph Convolutional Network (GCN): GCNs extend deep learning to non-Euclidean domains by representing spatial data as graphs [9]. Each node aggregates information from its neighbors through a shared convolution operation, capturing spatial adjacency. Nonetheless, GCNs use uniform weighting in the aggregation process, which may overlook the varying geological relevance between neighboring nodes.

The proposed KDCGAT builds upon the advantages of GCN while addressing its limitations by incorporating knowledge-driven edge weights and a graph attention mechanism. This allows the model to assign adaptive importance to neighboring nodes according to their geological similarity, thereby achieving a more meaningful integration of domain knowledge and spatial relationships. All four models (WoE, CNN, GCN, and KDCGAT) were trained and evaluated using identical input datasets and the same training–testing split for fair comparison.
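For reference, the conditional probabilities behind the WoE baseline reduce to two log-ratio weights per binary evidence layer. The sketch below uses the standard WoE definitions; the function name and the cell counts are hypothetical and are not taken from the study area.

```python
import math

def woe_weights(n_total, n_deposits, n_evidence, n_both):
    """Positive weight, negative weight, and contrast for one binary evidence layer.

    n_total:    number of cells in the study area
    n_deposits: cells containing a known deposit (D)
    n_evidence: cells where the evidence pattern is present (B)
    n_both:     cells where both occur
    """
    p_b_given_d = n_both / n_deposits
    p_b_given_not_d = (n_evidence - n_both) / (n_total - n_deposits)
    w_plus = math.log(p_b_given_d / p_b_given_not_d)
    w_minus = math.log((1 - p_b_given_d) / (1 - p_b_given_not_d))
    return w_plus, w_minus, w_plus - w_minus

# Made-up counts: 20 deposits, 12 of which fall inside a fault-buffer layer.
w_plus, w_minus, contrast = woe_weights(n_total=10000, n_deposits=20,
                                        n_evidence=1000, n_both=12)
```

The posterior log-odds are obtained by adding the weights of all layers to the prior log-odds, which is valid only if the layers are conditionally independent. This is the assumption that the graph-based models relax.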

This study uses the receiver operating characteristic (ROC) curve and the area under the curve (AUC) as the main evaluation indicators [43]. The ROC curve is obtained by sweeping the classification threshold over a set of values and plotting the True Positive Rate (TPR) on the vertical axis against the False Positive Rate (FPR) on the horizontal axis. The area under the ROC curve measures the accuracy of the results. The expressions for TPR and FPR are as follows:

$TPR = \frac{TP}{TP + FN}$

(4)

$FPR = \frac{FP}{FP + TN}$

(5)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.

AUC values are interpreted in four categories: 0.5–0.7 indicates low performance, 0.7–0.85 average performance, and 0.85–0.95 very good performance, while an AUC of 1 indicates an ideal classifier.
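A minimal sketch of how the ROC curve and AUC are computed from labels and predicted scores is given below, using the standard definitions TPR = TP/(TP + FN) and FPR = FP/(FP + TN). The function names and the six toy samples are illustrative only.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by sweeping the decision threshold over the scores."""
    pos = sum(labels)                 # TP + FN
    neg = len(labels) - pos           # FP + TN
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]                 # 1 = ore-bearing, 0 = barren (toy data)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # hypothetical model outputs
area = auc(roc_points(labels, scores))
```

The AUC equals the fraction of ore-bearing/barren sample pairs that the model ranks in the correct order, which is why a value of 0.5 corresponds to random ranking.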

4. Results

4.1. Geological Formations and Extraction of Ore-Bearing Strata

This study investigates the metallogenic and ore-controlling patterns of the eastern Tien Shan through an analysis of its tectonic–magmatic evolution, based on regional geological surveys and structural analysis. The main fault systems controlling copper mineralization were identified, with the Dacaotan Fault Zone and the central segment of the Kanggul Fault constituting the primary tectonic framework for ore formation. Most known copper deposits are distributed near these fault zones and are closely associated with Late Paleozoic granitic intrusions [44].

Vector map layers representing magmatic strata and major fault structures were extracted as key evidential layers for mineralization analysis (Figure 4). Additionally, we extract raster layers of igneous rocks (Figure 5) and faults (Figure 6) using the ArcGIS platform to support subsequent spatial modeling and mineralization prediction.

4.2. Remote Sensing Alteration Extraction Results

This study utilizes the spectral sensitivity of the Landsat 8 OLI sensor in the near-infrared (NIR) and short-wave infrared (SWIR) bands to extract hydroxyl and iron-stained alteration anomalies through Principal Component Analysis (PCA) (Figure 7, Figure 8 and Figure 9). The classification of anomaly intensity reveals that iron-stained anomalies mainly run along the NNW-SSE direction. Their spatial features closely correspond to the regional structural lineaments. These anomalies display a clear banded pattern along the sides of the Kangguertag-Huangshan fault zone and the Aqikkuduk-Shaquanzi fault zone. Importantly, the areas with concentrated iron-stained anomalies correlate strongly with known copper mineralization points. This suggests that these anomalies can indicate copper enrichment zones.

Hydroxyl alteration anomalies exhibit a similar spatial distribution to the iron-stained zones. Their main bodies also display banded arrangements along major fault structures, with anomaly intensity gradually decreasing from the fault cores toward their margins. This distribution pattern is characteristic of epithermal mineralization systems. The presence of alteration assemblages such as kaolinization and sericitization within these zones provides direct surface evidence of supergene mineralization processes.

Significant overlapping zones of dual (iron-stained and hydroxyl) anomalies occur in the southern segment of the Aqikkuduk Fault and the eastern segment of the Shaquanzi Fault. These overlap zones correspond closely to the regional porphyry–skarn copper metallogenic belt. Considering the Late Paleozoic tectono-magmatic evolution of the eastern Tien Shan, the double-anomaly superposition zone in the northwestern part of the study area aligns spatially with Carboniferous intermediate-acid intrusive bodies. This spatial correspondence reflects the typical metallogenic characteristics of porphyry copper deposits, where alteration centers are structurally controlled. Therefore, the extraction of alteration anomaly information provides valuable metallogenic evidence and supports comprehensive mineral prospectivity analysis in the region.

4.3. Extraction of Geochemical Anomaly Information

According to the regional geological background and research objectives, the target element is Cu. Based on correlation analysis and relevant literature [45,46], 14 elements were selected for analysis: Cu and 13 associated elements (Ni, Co, Cr, Mn, P, Ti, V, Zn, Fe, Ag, Au, Mo, and Pb). Outliers were first removed from the dataset, and the lower anomaly limit of each element was determined using the 85% CFM [47]. The correlation coefficient between each element and copper and the lower anomaly limits are shown in Table 3.

Factor analysis of the 14 elements was conducted in IBM SPSS Statistics 27.0.1. The KMO value was 0.891 (greater than 0.6) and Bartlett's test was significant (reported significance of 0, below 0.05), so both test criteria were met [38]. The total variance explained (Table 4) and the rotated component matrix (Table 5) were then obtained. As shown in Table 4, the three extracted factors together accounted for 60.995% of the total variance of the 14 original variables, indicating acceptable information loss and a satisfactory factor structure. Therefore, these three factors were selected for further interpretation and spatial analysis.

The composition of each factor was determined based on the loadings of individual elements, reflecting their behavior under specific geological processes. The rotated component matrix (Table 5) indicates that three principal factors, each with eigenvalues greater than 1, were extracted: F1 (Co–Cu–Mn–P–Ti–V–Zn–Fe), F2 (Cr–Ni), and F3 (Ag–Au–Mo–Pb). Factor scores were subsequently used to study the spatial distribution characteristics of elements associated with different metallogenic processes.

Anomalies of different elements that overlap spatially were defined as composite anomalies, and the anomalies of all elements belonging to the same factor were plotted on the same map. Guided by the geological background, anomalous areas with poor geological conditions for mineralization or unfavorable target element combinations were manually screened out. Composite anomalies covering relatively large areas were manually subdivided according to geological conditions and the combination characteristics of elements within and across factors, giving the final combined anomalies.

Ultimately, this study identified a total of 76 composite anomalies, including 43 F1 (Co–Cu–Mn–P–Ti–V–Zn–Fe) anomaly zones (Figure 10a), 16 F2 (Cr–Ni) anomaly zones (Figure 10b), and 17 F3 (Ag–Au–Mo–Pb) anomaly zones (Figure 10c). Combining the anomalies delineated for each factor yielded a comprehensive geochemical anomaly map (Figure 10d). The geochemical anomalies follow the overall trend of the faults and are mostly distributed near them. They overlap well with known copper deposits and indicate strong prospecting potential.

4.4. Integration of Information on Favorable Elements for Mineralization

Using ArcGIS Pro, we extracted raster layers for five key metallogenic elements: geological structure, spatial distribution of magmatic rocks, composite geochemical anomalies, hydroxyl alteration, and iron-stained alteration. The layers were resampled by bilinear interpolation to a common spatial resolution with aligned rows and columns, and were then combined by multiband raster compositing into a single five-band raster with strictly matched spatial coordinates (Figure 11). While this multi-channel raster input represents the data-integration stage, the subsequent KDCGAT framework transforms it into a graph-based representation in which geological knowledge defines the spatial adjacency and contextual interactions between cells. This step moves beyond data stacking and operationalizes expert knowledge through graph connectivity and adaptive attention weighting.
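The resampling and band-stacking step can be illustrated outside ArcGIS Pro. The following is a minimal sketch in plain Python, assuming a corner-aligned bilinear mapping and toy 2 × 2 layers. The layer names and the functions `bilinear_resample` and `stack_layers` are hypothetical, and the resampling convention used by ArcGIS Pro may differ in detail.

```python
def bilinear_resample(grid, out_rows, out_cols):
    """Resample a 2D grid to a new shape with corner-aligned bilinear interpolation."""
    in_rows, in_cols = len(grid), len(grid[0])
    out = []
    for r in range(out_rows):
        # Fractional source row for this output row.
        y = r * (in_rows - 1) / (out_rows - 1) if out_rows > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_rows - 1)
        fy = y - y0
        row = []
        for c in range(out_cols):
            x = c * (in_cols - 1) / (out_cols - 1) if out_cols > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_cols - 1)
            fx = x - x0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def stack_layers(layers):
    """Combine aligned 2D layers into one grid whose cells hold a tuple of band values."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[tuple(layer[r][c] for layer in layers) for c in range(cols)]
            for r in range(rows)]


# Hypothetical band names for the five evidential layers.
layer_names = ["structure", "magmatic_rocks", "geochem_anomaly", "hydroxyl", "iron_stain"]
coarse = [[0.0, 2.0], [4.0, 6.0]]  # toy layer on a coarse grid
resampled = [bilinear_resample(coarse, 3, 3) for _ in layer_names]
cube = stack_layers(resampled)
print(len(cube), len(cube[0]), len(cube[0][0]))  # rows, columns, bands
```

Each cell of `cube` then carries one value per band, which is the per-cell feature vector that a graph node would inherit in the later graph construction.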

4.5. Model Performance

All experiments were conducted on a workstation (2 × 20-core Intel Gold 6133 CPU, 128 GB memory, and 48 GB VRAM RTX 4090 GPU). Figure 12 presents the ROC curves and AUC values for the four models: (a) WoE, (b) GCN, (c) CNN, and (d) KDCGAT. As shown, the proposed KDCGAT model achieves the highest classification performance, with AUC values exceeding 0.99 for all classes. The ROC curves of KDCGAT are consistently closer to the upper-left corner, indicating a stronger ability to distinguish between ore-bearing and barren samples.
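For readers less familiar with the metric, the AUC equals the probability that a randomly chosen ore-bearing sample receives a higher score than a randomly chosen barren one. The sketch below computes it by direct pairwise comparison on made-up scores. The data and the function `roc_auc` are illustrative only and are unrelated to the values in Figure 12.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: label 1 = ore-bearing, label 0 = barren.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why curves closer to the upper-left corner indicate better discrimination.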

In contrast, the WoE and CNN models exhibit lower AUC values (mostly between 0.85 and 0.96), suggesting a weaker capability to capture complex nonlinear and spatial dependencies in geological data. Although the GCN model improves upon CNN by introducing graph structural learning, its AUC values (0.93–0.97) are still slightly lower than those of KDCGAT, mainly because GCN treats neighboring nodes with uniform importance.

By incorporating an attention mechanism that adaptively weighs node relationships based on geological knowledge, KDCGAT enhances the representation of both local and regional spatial dependencies. This leads to more robust classification boundaries and improved discrimination accuracy, particularly for the complex and overlapping mineralization zones. Therefore, the ROC and AUC analyses quantitatively confirm that KDCGAT provides the most reliable and discriminative performance among all tested models.
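The difference between uniform and attention-weighted neighbor aggregation can be made concrete. The sketch below follows the general form of graph attention (scores from a LeakyReLU of a learned vector applied to concatenated node features, then a softmax over neighbors), with the linear feature transform omitted for brevity. The feature vectors and the attention vector `a` are hypothetical values, not parameters of KDCGAT.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(h_i, neighbours, a):
    """Softmax-normalized scores e_ij = LeakyReLU(a . [h_i || h_j])."""
    scores = []
    for h_j in neighbours:
        concat = h_i + h_j  # list concatenation of the two feature vectors
        scores.append(leaky_relu(sum(w * x for w, x in zip(a, concat))))
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(neighbours, weights):
    """Weighted sum of neighbor feature vectors."""
    dim = len(neighbours[0])
    return [sum(w * h[d] for w, h in zip(weights, neighbours)) for d in range(dim)]


h_i = [1.0, 0.0]                                    # features of the target cell
neighbours = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # features of three neighbors
a = [0.5, 0.0, 1.0, -1.0]                           # hypothetical attention vector

alpha = attention_weights(h_i, neighbours, a)
uniform = [1.0 / len(neighbours)] * len(neighbours)
attended = aggregate(neighbours, alpha)
averaged = aggregate(neighbours, uniform)
print("attention weights:", [round(w, 3) for w in alpha])
print("attention aggregate:", [round(v, 3) for v in attended])
print("uniform aggregate:", [round(v, 3) for v in averaged])
```

With uniform weights every neighbor contributes equally, whereas the attention weights let the neighbor most similar to the target cell dominate the aggregate.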

4.6. Model Prediction Results and Comparative Analysis

This study applied the KDCGAT model to predict potential copper mineralization zones within the study area, establishing a knowledge–data dual-driven prediction framework. To evaluate model performance, the predictive capabilities of KDCGAT were compared with three representative models: the traditional Weight-of-Evidence (WOE) method, and two deep learning models—Graph Convolutional Network (GCN) and Convolutional Neural Network (CNN).

Prediction maps for each model are shown in Figure 13. Validation against 71 known copper sites, including the Tuwu copper mine, is summarized in Table 6. The Level 1 prediction area of the KDCGAT model contains 38.0% of these sites, the Level 2 area 56.3%, and the Level 3 area 85.9%. At Level 3, this is 7.0, 11.3, and 19.7 percentage points higher than the WOE, GCN, and CNN models, respectively.
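The reported improvements are differences between Level 3 capture rates, that is, percentage points rather than relative gains. The short sketch below reproduces them from the percentages in Table 6 and, for contrast, also computes the relative gain over each comparison model. The variable names are illustrative.

```python
# Level 3 capture rates (%) as reported in Table 6.
level3 = {"WOE": 78.9, "GCN": 74.6, "CNN": 66.2, "KDCGAT": 85.9}

# Absolute gap in percentage points between KDCGAT and each comparison model.
gaps = {
    model: round(level3["KDCGAT"] - rate, 1)
    for model, rate in level3.items()
    if model != "KDCGAT"
}

# Relative gain (%) with respect to each comparison model's own rate.
relative = {model: round(100.0 * gaps[model] / level3[model], 1) for model in gaps}

for model in gaps:
    print(f"{model}: +{gaps[model]} points, {relative[model]}% relative gain")
```

The two measures diverge most for the weakest baseline, so it matters which one a stated "improvement" refers to.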

Although the WOE model achieved relatively high accuracy, its reliance on linear statistical assumptions limits its ability to capture nonlinear spatial relationships between evidential layers and ore occurrences. Consequently, it tends to produce fragmented grid-based predictions, insufficiently reflecting the geological controls on mineralization and lacking interpretability [48].

The GCN model, which employs a symmetrically normalized adjacency matrix, exhibited an overall prediction pattern similar to that of KDCGAT. However, due to the over-smoothing effect caused by its fixed aggregation rules, the boundary precision of its predictions was lower. In contrast, the CNN model, implemented based on a U-Net architecture to process rasterized evidential layers, is constrained by Euclidean spatial assumptions. This leads to higher prediction errors when modeling non-Euclidean relationships inherent in geological data.
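The over-smoothing effect of a fixed aggregation rule can be seen on a toy graph. The sketch below assumes the common form of symmetric normalization with self-loops, D^(-1/2) (A + I) D^(-1/2), and applies it repeatedly to a signal on a four-node path graph without weights or nonlinearities. This is a simplified illustration, not the GCN configuration used in the comparison.

```python
import math


def sym_normalize(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a dense adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(norm, x):
    """One round of fixed neighborhood aggregation."""
    return [sum(norm[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]


# Path graph 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
norm = sym_normalize(adj)

signal = [1.0, 0.0, 0.0, 0.0]  # only node 0 carries information initially
for _ in range(50):
    signal = propagate(norm, signal)
print([round(v, 4) for v in signal])
```

After many rounds the two end nodes hold the same value although only one of them carried the original signal. What remains depends on node degree rather than on the input, which is the loss of boundary detail described above.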

These comparative experiments clearly demonstrate the advantages of the KDCGAT model. The attention mechanism effectively captures nonlinear interactions among multiple evidential layers, while the knowledge embedding module enhances model interpretability. By integrating geological process understanding with deep learning inference, the KDCGAT model significantly improves the reliability and practicality of mineral prospectivity prediction. The proposed “fracture–rock body–alteration–geochemical anomaly” quaternary attention subnetwork provides a robust decision-support basis for identifying regional copper mineralization targets.

4.7. Ablation Experiment

To test the knowledge–data collaboration-driven approach to copper prospectivity prediction, an ablation experiment was set up that compares the KDCGAT model with a baseline GAT model, which does not use geological data. The results comprise a prediction map (Figure 14) and a table of site proportions for the different prediction zones (Table 7). The baseline GAT captures only 25 known sites in the Level 1 area, 28 in the Level 2 area, and 46 in the Level 3 area. With all other conditions kept the same, KDCGAT is 2.8, 16.9, and 21.1 percentage points higher than the baseline GAT at Levels 1, 2, and 3, respectively.
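The ablation figures follow directly from the site counts. The sketch below converts the baseline GAT counts to capture rates over the 71 known sites and subtracts them from the KDCGAT percentages reported in Table 7. The variable names are illustrative.

```python
TOTAL_SITES = 71

# Known sites captured by the baseline GAT at each level (Table 7).
gat_hits = {"Level 1": 25, "Level 2": 28, "Level 3": 46}

# KDCGAT capture rates (%) as reported in Table 7.
kdcgat_pct = {"Level 1": 38.0, "Level 2": 56.3, "Level 3": 85.9}

gat_pct = {lvl: round(100.0 * n / TOTAL_SITES, 1) for lvl, n in gat_hits.items()}
gain = {lvl: round(kdcgat_pct[lvl] - gat_pct[lvl], 1) for lvl in gat_hits}

for lvl in gat_hits:
    print(f"{lvl}: GAT {gat_pct[lvl]}%, KDCGAT {kdcgat_pct[lvl]}%, gain {gain[lvl]} points")
```

The gap widens from Level 1 to Level 3, so most of the benefit of the knowledge-driven module appears in the broader prediction areas.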

The 21.1% improvement achieved by KDCGAT over the baseline GAT is not merely due to the inclusion of additional geological features. Rather, it stems from the integration of domain knowledge into graph construction and attention-based feature propagation. By embedding geological connectivity information—such as faults and lithological boundaries—into the adjacency structure, and allowing attention weights to reflect the relative influence of neighboring geological units, KDCGAT effectively captures knowledge-driven spatial dependencies. This confirms that the proposed model performs genuine knowledge integration rather than simple data aggregation.
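One way to picture how geological connectivity can enter the adjacency structure is sketched below. It is a hypothetical construction for illustration only and does not reproduce the graph construction procedure of this study: grid cells are linked to their four spatial neighbors, and cells lying on the same fault trace are additionally linked to each other. The fault name, the cell coordinates, and both helper functions are assumptions.

```python
from itertools import combinations


def grid_edges(rows, cols):
    """Edges between each cell and its right and lower neighbors (4-connectivity)."""
    edges = set()
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                edges.add(((r, c), (r + 1, c)))
            if c + 1 < cols:
                edges.add(((r, c), (r, c + 1)))
    return edges


def fault_edges(fault_cells):
    """Extra edges linking every pair of cells that lie on the same fault."""
    edges = set()
    for cells in fault_cells.values():
        for u, v in combinations(sorted(cells), 2):
            edges.add((u, v))
    return edges


rows, cols = 3, 3
# Hypothetical fault trace crossing the grid diagonally.
faults = {"fault_1": [(0, 0), (1, 1), (2, 2)]}

spatial = grid_edges(rows, cols)
knowledge = fault_edges(faults)
graph = spatial | knowledge
print(len(spatial), "spatial edges,", len(knowledge), "fault edges,", len(graph), "in total")
```

In this toy graph the fault edge between cells (0, 0) and (2, 2) lets information pass between cells that are not spatial neighbors, and an attention layer could then learn how much weight such an edge deserves.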

These results validate the effectiveness of the knowledge–data dual-drive framework for copper prospectivity prediction. They also highlight the performance gap between traditional purely data-driven models and models informed by geological knowledge, providing a solid theoretical foundation for further research in knowledge-guided mineral prediction.

4.8. Delineation and Evaluation of Prediction Areas

Based on the KDCGAT predictions, together with the mineralization conditions and prospecting indicators, this study delineated eight mineralization prediction areas. Following the principles for delineating prediction areas, these were divided into five Class A and three Class B areas (Figure 15). Each area is described below:

A1: This zone is located within the Kangurtag–Huangshan Fault Zone, along the southern margin of the Turpan–Hami Basin. It encompasses the well-known Tuwu–Yandong porphyry copper deposits. The dominant fracture orientations are nearly east–west, northwest, and northeast. The exposed strata primarily include the Carboniferous Pengquanshan Group, the Jurassic Xishanyao Formation, and Quaternary deposits. Copper mineralization mainly occurs in the Carboniferous plagioclase granite porphyry and the Pengquanshan Group, defining a Co–Cu–Mn–P–Ti–V–Zn–Fe high anomaly zone.

A2 and A3: These prediction zones cover several known deposits within the Achishan–Yamansu island arc belt. The principal stratigraphic units include the Ordovician Yamansu Formation and the Silurian Achishan Formation. Porphyry copper mineralization is primarily hosted in the Lower Devonian andesitic volcanic and subvolcanic rocks, forming another Co–Cu–Mn–P–Ti–V–Zn–Fe enrichment zone.

A4 and A5: These zones are situated within the Achikkuske–Shaquanzi Fault Zone of the Middle Tien Shan Island Arc Belt, where several known mineralization sites are distributed. The Ordovician–Devonian volcano-sedimentary rock series serves as both the source and the host for mineralization. Devonian–Carboniferous intermediate–acid magmatism provided ore-forming fluids, while the Achikkuske–Shaquanzi Fault and its subsidiary structures acted as conduits for hydrothermal fluid migration and ore precipitation.

B1, B2, and B3: Although no known copper deposits have been identified within these three prediction zones, they exhibit favorable tectonic, stratigraphic, and magmatic conditions for mineralization. The strong spatial coincidence of geochemical and alteration anomalies indicates promising metallogenic potential. These zones are therefore considered prospective copper targets worthy of further geological and geochemical exploration.

5. Conclusions and Discussion

The KDCGAT model for copper prospectivity prediction was constructed by fusing GAT with multimodal geoscience data and was applied to the eastern Tien Shan copper belt in Xinjiang. The main conclusions are as follows:

(1): The accuracy of KDCGAT in copper prediction reaches 85.9%, which is 7.0, 11.3, and 19.7 percentage points higher than that of Weight of Evidence (WoE), Graph Convolutional Network (GCN), and Convolutional Neural Network (CNN), respectively. The ablation experiment shows that the geological knowledge-driven module raises the prediction accuracy of KDCGAT by 21.1 percentage points over the baseline GAT model, which supports the key role of the “knowledge–data” collaborative framework in reducing the over-smoothing effect and optimizing the allocation of feature weights.
(2): The data standardization process on the ArcGIS platform combines five key metallogenic elements: fractures, magmatic rocks, hydroxyl alteration, iron-stained alteration, and geochemical anomalies. It produces standardized feature cubes that serve as consistent inputs for global–local feature aggregation in GAT.
(3): The model identified eight mineralization prediction zones. Zones A1–A5 are closely linked to known mine sites, while the Class B zones contain no known deposits and represent possible new exploration targets. Together they provide a basis for future exploration.

While the KDCGAT framework demonstrates strong predictive performance in the eastern Tien Shan, we acknowledge that its effectiveness may vary in geologically distinct regions due to differences in tectonic evolution, lithology, and mineralization patterns. When applied to other metallogenic belts, recalibration or fine-tuning of model parameters and attention weights using local geological, geochemical, and geophysical data may be necessary. Nevertheless, the methodology is inherently generalizable. Its graph-based architecture can flexibly integrate heterogeneous, partially missing, and multi-scale datasets, making it well suited for application to polymetallic and concealed ore systems characterized by data sparsity and complex geological structures. Such adaptability underscores the potential of KDCGAT as a transferable framework for mineral prospectivity mapping across diverse tectonic environments.

In the future, we will further explore the integration of 3D geological modeling with dynamic processes. Currently, temporal evolution information of tectonic deformation and hydrothermal fluid migration is unavailable at sufficient spatial resolution, which limits the model to static predictions. Extending the current 2D framework to 3D geological modeling will allow the incorporation of stratigraphic depth, fault dip, and subsurface alteration zones, enabling more accurate characterization of complex ore-forming systems in both space and time.

Author Contributions

Conceptualization, Y.W. and S.S.; Data curation, X.C.; Formal analysis, S.S.; Funding acquisition, Y.W.; Investigation, S.S., Y.D., Z.A. and M.A.B.; Methodology, S.S., Y.W. and X.C.; Project administration, S.S.; Resources, Y.W.; Software, S.S. and X.C.; Supervision, S.S., Y.W.; Validation, S.S. and Y.W.; Visualization, J.T.; Writing—original draft, S.S. and Y.N.; Writing—review and editing, S.S. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2023YFC2907105, 2021YFC2901801, 2023YFC2906903), the Key Project of the National Natural Science Foundation of China (42230810), the Key Science & Technology Support Project of the Ministry of Natural Resources of China (ZKKJ202419), and the Geological Exploration Fund of Xinjiang Uygur Autonomous Region of China (K24-XJ009).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to thank the anonymous referee for his/her constructive comments that improved the manuscript. We also appreciate the researchers from all participating institutions involved in this project.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Akbari, S.; Ramazi, H.; Ghezelbash, R. A Novel Framework for Optimizing the Prediction of Areas Favorable to Porphyry-Cu Mineralization: Combination of Ant Colony and Grid Search Optimization Algorithms with Support Vector Machines. Nat. Resour. Res. 2025, 34, 703–729.
2. Jiang, F.; Li, N.; Zhou, L.L. Grain Segmentation of Sandstone Images Based on Convolutional Neural Networks and Weighted Fuzzy Clustering. IET Image Process. 2020, 14, 3499–3507.
3. Mery, N.; Maleki, M.; País, G.; Molina, A.; Cáceres, A.; Emery, X. Fuzzy Classification of Mineral Resources: Moving Toward Overlapping Categories to Account for Geological, Economic, Metallurgical, Environmental, and Operational Criteria. Nat. Resour. Res. 2025, 34, 1271–1299.
4. Ma, L.; Yao, W.; Dai, X.; Jia, R. A New Evidence Weight Combination and Probability Allocation Method in Multi-Sensor Data Fusion. Sensors 2023, 23, 722.
5. Molla, S.H.; Rukhsana. Fuzzy-AHP and GIS-Based Modeling for Food Grain Cropping Suitability in Sundarban, India. Nat. Resour. Res. 2024, 33, 1913–1940.
6. Yang, F.; Mao, J.; Wang, Y.; Bierlein, F.P. Geology and geochemistry of the Bulong quartz–barite vein-type gold deposit in the Xinjiang Uygur Autonomous Region, China. Ore Geol. Rev. 2006, 29, 52–76.
7. Tian, M.; Xie, Z.; Qiu, Q.; Wu, Q.; Chen, J.; Duan, Y.; Tao, L. PMinrKG: Polymetallic mineral resources knowledge graph construction and its applications integrated with multimodal data. Int. J. Digit. Earth 2025, 18, 2494285.
8. Wang, R.; Xue, L.; Li, Y.; Wang, J.; Yan, Q.; Ran, X. Mineral prospectivity prediction based on the dynamic relation model Atten-GCN: A case study of gold prospecting in the Yingfengjie area, Shaanxi province (northern China). Ore Geol. Rev. 2025, 176, 106399.
9. Zuo, R.; Carranza, E.J.M. Machine Learning-Based Mapping for Mineral Exploration. Math. Geosci. 2023, 55, 891–895.
10. Xiong, Y.; Zuo, R.; Carranza, E.J.M. Mapping mineral prospectivity through big data analytics and a deep learning algorithm. Ore Geol. Rev. 2018, 102, 811–817.
11. Tang, C.; Ye, Z.; Zhao, H.; Bai, L.; Lin, J. DeepSCNN: A simplicial convolutional neural network for deep learning. Appl. Intell. 2025, 55, 281.
12. Puzyrev, V.; Zelic, M.; Duuring, P. Applying neural networks-based modelling to the prediction of mineralization: A case-study using the Western Australian Geochemistry (WACHEM) database. Ore Geol. Rev. 2023, 152, 105242.
13. Parsa, M.; Lawley, C.J.M.; Cawood, T.; Martins, T.; Cumani, R.; Zhang, S.E.; Thompson, A.; Schetselaar, E.; Beyer, S.; Lentz, D.R.; et al. Pan-Canadian Predictive Modeling of Lithium–Cesium–Tantalum Pegmatites with Deep Learning and Natural Language Processing. Nat. Resour. Res. 2025, 34, 639–668.
14. Bae, J.-H.; Yu, G.-H.; Lee, J.-H.; Vu, D.T.; Anh, L.H.; Kim, H.-G.; Kim, J.-Y. Superpixel Image Classification with Graph Convolutional Neural Networks Based on Learnable Positional Embedding. Appl. Sci. 2022, 12, 9176.
15. Sihombing, F.M.; Palin, R.M.; Hughes, H.S.; Robb, L.J. Improved mineral prospectivity mapping using graph neural networks. Ore Geol. Rev. 2024, 172, 106215.
16. Cao, C.; Wang, X.; Yang, F.; Xie, M.; Liu, B.; Kong, Y.; Li, C.; Zhou, Z. Attention-driven graph convolutional neural networks for mineral prospectivity mapping. Ore Geol. Rev. 2025, 180, 106554.
17. Yang, F.; Zuo, R. Geologically Constrained Convolutional Neural Network for Mineral Prospectivity Mapping. Math. Geosci. 2024, 56, 1605–1628.
18. Liu, C.; Wang, W.; Tang, J.; Wang, Q.; Zheng, K.; Sun, Y.; Zhang, J.; Gan, F.; Cao, B. A deep-learning-based mineral prospectivity modeling framework and workflow in prediction of porphyry–epithermal mineralization in the Duolong ore District, Tibet. Ore Geol. Rev. 2023, 157, 105419.
19. Qaderi, S.; Maghsoudi, A.; Yousefi, M.; Pour, A.B. Translation of mineral system components into time step-based ore-forming events and evidence maps for mineral exploration: Intelligent mineral prospectivity mapping through adaptation of recurrent neural networks and random forest algorithm. Ore Geol. Rev. 2025, 179, 106537.
20. Juliani, C.; Juliani, E. Deep learning of terrain morphology and pattern discovery via network-based representational similarity analysis for deep-sea mineral exploration. Ore Geol. Rev. 2021, 129, 103936.
21. Zhao, H.; Liao, Q.; Li, S.; Xiao, D.; Wang, G.; Guo, R.; Xue, Z.; Li, X. Early Paleozoic tectonic evolution and magmatism in the Eastern Tianshan, NW China: Evidence from geochronology and geochemistry of volcanic rocks. Gondwana Res. 2022, 102, 354–371.
22. Lou, W.; Zhang, D. Applications of Deep Learning in Mineral Discrimination: A Case Study of Quartz, Biotite and K-Feldspar from Granite. J. Earth Sci. 2025, 36, 29–45.
23. Yu, X.; Yu, P.; Wang, K.; Cao, W.; Zhou, Y. Data-Driven Mineral Prospectivity Mapping Based on Known Deposits Using Association Rules. Nat. Resour. Res. 2024, 33, 1025–1048.
24. Zhang, Z.; Wang, G.; Carranza, E.J.M.; Li, Y.; Liu, X.; Peng, W.; Fan, J.; Xu, F. Mapping of Gold Prospectivity in the Qingchengzi Pb–Zn–Ag–Au Polymetallic District, China, with Ensemble Learning Algorithms. Nat. Resour. Res. 2025, 34, 41–60.
25. Luo, H.; Guo, N.; Li, C.; Jiang, H. Prediction of Lithium Mineralization Potential in the Jiulong Area, Western Sichuan (China), Using Spectral Residual Attention Convolutional Neural Network. Nat. Resour. Res. 2025, 34, 1331–1350.
26. Yuan, J.; Cao, M.; Cheng, H.; Yu, H.; Xie, J.; Wang, C. A unified structure learning framework for graph attention networks. Neurocomputing 2022, 495, 194–204.
27. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
28. Xu, Y.; Li, Z.; Xie, Z.; Cai, H.; Niu, P.; Liu, H. Mineral prospectivity mapping by deep learning method in Yawan-Daqiao area, Gansu. Ore Geol. Rev. 2021, 138, 104316.
29. Feng, W.; Zheng, J.; Shen, P. Petrology, mineralogy, and geochemistry of the Carboniferous Katbasu Au-Cu deposit, western Tianshan, Northwest China: Implications for petrogenesis, ore genesis, and tectonic setting. Ore Geol. Rev. 2023, 161, 105659.
30. Tang, Q.; Sun, W.; Ao, S.; Fu, L.-Y.; Xiao, W. Strong lateral heterogeneities of upper mantle shear-wave structures beneath the central and eastern Tien Shan. Int. J. Earth Sci. (Geol. Rundsch.) 2022, 111, 2555–2569.
31. Soloviev, S.G.; Kryazhev, S.G.; Semenova, D.V.; Kalinin, Y.A.; Bortnikov, N.S. Late Paleozoic Potassic Intrusions of the Eastern Part of the Nikolaev Line and Associated W–Mo–Cu–Au Mineralization: First Isotopic U–Pb Zircon Data (LA-ICP-MS Method) for Rocks from the Adyrtor Intrusions (Middle Tien Shan, Eastern Kyrgyzstan). Dokl. Earth Sci. 2024, 517, 1288–1296.
32. Zhang, Z.; Yin, F.; Zhu, Y.; Liu, L. Lithologic Mapping in the Karamaili Ophiolite–Mélange Belt in Xinjiang, China, with Machine Learning and Integration of SDGSAT-1 TIS, Landsat-8 OLI and ASTER-GDEM. Nat. Resour. Res. 2025, 34, 1437–1465.
33. Zerai, F.T.; Gorsevski, P.V.; Panter, K.S.; Farver, J.; Tangestani, M.H. Integration of ASTER and Soil Survey Data by Principal Components Analysis and One-Class Support Vector Machine for Mineral Prospectivity Mapping in Kerkasha, Southwestern Eritrea. Nat. Resour. Res. 2023, 32, 2463–2493.
34. Deng, S.; Guo, N.; Tang, N.; Shi, W.; Li, X.; Li, C.; Zhou, W.; Huang, C.; Luo, H. Indicator mineral characteristics of potassic alteration zones in porphyry copper deposits based on infrared spectroscopy technology: A case study of the Qulong porphyry copper deposit, Tibet. Ore Geol. Rev. 2025, 178, 106475.
35. Chen, Q.; Cai, D.; Xia, J.; Zeng, M.; Yang, H.; Zhang, R.; He, Y.; Zhang, X.; Chen, Y.; Xu, X.; et al. Remote sensing identification of hydrothermal alteration minerals in the Duobuza porphyry copper mining area in Tibet using WorldView-3 and GF-5 data: The impact of spatial and spectral resolution. Ore Geol. Rev. 2025, 180, 106573.
36. Zhao, M.; Jin, Y.; Dong, J.; Zheng, J.; Xia, Q. A Novel Multifractal Method for Geochemical Element Distribution Analysis. Nat. Resour. Res. 2025, 34, 619–637.
37. Zhang, J.; Ge, X.; Hou, X.; Han, L.; Zhang, Z.; Feng, W.; Zhou, Z.; Luo, X. Strategies for Soil Salinity Mapping Using Remote Sensing and Machine Learning in the Yellow River Delta. Remote Sens. 2025, 17, 2619.
38. Sheng, S.; Niu, X.; Pan, J.; Jiang, L.; Sun, Y.; Wang, H.; Wang, K. Research on smoke relative concentration identification method based on Landsat8-OLI multispectral data and multivariate analysis. Int. J. Remote Sens. 2023, 44, 3550–3571.
39. Cao, S.; Guo, Z.; Sun, C.; Yuan, L.; Wu, X. Decarbonization of urban heating systems: Mining water and electricity saving and carbon reduction potentials in substations via multi-step clustering and cumulative frequency method. Sustain. Cities Soc. 2025, 130, 106525.
40. Zuo, R.; Xu, Y. Graph Deep Learning Model for Mapping Mineral Prospectivity. Math. Geosci. 2023, 55, 1–21.
41. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2017, arXiv:1710.10903.
42. Yuan, B.; Wang, Q.; Xu, W.; He, C.; Xie, W. An improved extreme learning machine algorithm for prospectivity mapping of copper deposits using multi-source remote sensing data: A case study in the North Altyn Tagh, Xinjiang, China. Int. J. Digit. Earth 2025, 18, 2510567.
43. Xie, L.; Zhang, R.; Zhan, J.; Li, S.; Shama, A.; Zhan, R.; Wang, T.; Lv, J.; Bao, X.; Wu, R. Wildfire Risk Assessment in Liangshan Prefecture, China Based on An Integration Machine Learning Algorithm. Remote Sens. 2022, 14, 4592.
44. Ding, J.; Han, C.; Xiao, W.; Wang, Z.; Song, D. Geochronology, geochemistry and Sr-Nd isotopes of the granitic rocks associated with tungsten deposits in Beishan district, NW China, Central Asian Orogenic Belt: Petrogenesis, metallogenic and tectonic implications. Ore Geol. Rev. 2017, 89, 441–462.
45. Chen, Y.; Cheng, Q. A deep learning-based method for relationship exploration between geochemical elements. J. Appl. Geophys. 2021, 194, 104320.
46. Grunsky, E.C.; Agterberg, F.P. Spatial and multivariate analysis of geochemical data from metavolcanic rocks in the Ben Nevis area, Ontario. Math. Geol. 1988, 20, 825–861.
47. Liu, B.; Guo, S.; Wei, Y.; Zhan, Z. A Fast Independent Component Analysis Algorithm for Geochemical Anomaly Detection and Its Application to Soil Geochemistry Data Processing. J. Appl. Math. 2014, 1, 1–12.
48. Chen, Y.; Sui, Y. Dictionary learning for integration of evidential layers for mineral prospectivity modeling. Ore Geol. Rev. 2022, 141, 104649.

Figure 1. (a) Map of China; (b) geographical location of the eastern Tien Shan research area; (c) geological map of the eastern Tien Shan research area.

Figure 2. Preprocessed Landsat8-OLI remote sensing imagery.

Figure 3. General workflow diagram of the KDCGAT model.

Figure 4. Faults and igneous rocks extraction.

Figure 5. Extraction of the faults raster layer.

Figure 6. Extraction of igneous rocks raster layer.

Figure 7. Extraction of iron-stained alteration anomaly information.

Figure 8. Extraction of hydroxyl alteration anomaly information.

Figure 9. Spatial relationships between alteration anomalies, copper deposits, and faults.

Figure 10. Geochemical comprehensive anomaly delineation map: (a) F1, (b) F2, (c) F3, (d) comprehensive anomaly delineation.

Figure 11. Comprehensive raster layer with five band features.

Figure 12. ROC curves and AUC values for each model: (a) WOE, (b) GCN, (c) CNN, (d) KDCGAT.

Figure 13. Copper prediction maps for each model: (a) WOE, (b) GCN, (c) CNN, (d) KDCGAT.

Figure 14. Comparison of (a) Baseline GAT Model and (b) KDCGAT Model Predictions.

Figure 15. Map showing the mineralization prediction areas delineated with the KDCGAT model.

Table 1. The four types of data aggregation and preprocessing used in this study.

Data Type	Source	Native Resolution	Preprocessing Steps	Role in Model
Geological map (vector)	Xinjiang Academy of Geological Research	1:5000	Digitize → rasterize 30 m → topology smoothing	fracture & lithology channel
Geochemical samples (point)	Xinjiang Academy of Geological Research	1:20,000	Outlier removal → 85% CFM thresholding → Ordinary Kriging to 30 m grid	geochemical anomaly channel
Gravity (gridded)	Xinjiang Academy of Geological Research	1:20,000	Regional trend removal → Bilinear interpolation to 30 m	fracture channel
Landsat 8 OLI (multispectral)	USGS	30 m bands	Radiometric calibration, atmospheric correction, PCA (bands 2, 4, 5, 6 for Fe; 2, 5, 6, 7 for OH) → Gaussian filter → thresholding	alteration channels

Table 2. Technical performance and attributes of Landsat8-OLI data.

Band Number	Spectral Range (μm)	Signal-to-Noise Ratio	Spatial Resolution (m)
1-COASTAL/AEROSOL	0.43–0.45	130	30
2-Blue	0.45–0.51	130	30
3-Green	0.53–0.59	100	30
4-Red	0.64–0.67	90	30
5-NIR	0.85–0.88	90	30
6-SWIR1	1.57–1.65	100	30
7-SWIR2	2.11–2.29	100	30
8-PAN	0.50–0.68	80	15
9-Cirrus	1.36–1.38	50	30

Table 3. Correlation coefficient with Cu and anomaly lower limit for each element.

Element	Correlation Coefficient with Cu	Anomaly Lower Limit
Ag	0.123	63
Au	0.036	2
Co	0.577	12.375
Cr	0.336	51.9
Cu	1.000	31.08
Mn	0.509	814
Mo	0.176	1.064
Ni	0.304	23.2
P	0.348	631.49
Pb	0.015	19.9
Ti	0.472	3816
V	0.588	94.2
Zn	0.473	66
Fe	0.583	4.86

Table 4. Total variance explained by the extracted factors.

Component	Initial Eigenvalues			Extraction Sums of Squared Loadings			Rotation Sums of Squared Loadings
Component	Total	% of Variance	Cumulative %	Total	% of Variance	Cumulative %	Total	% of Variance	Cumulative %
1	5.696	40.689	40.689	5.696	40.689	40.689	4.982	35.589	35.589
2	1.552	11.087	51.776	1.552	11.087	51.776	2.014	14.384	49.973
3	1.291	9.219	60.995	1.291	9.219	60.995	1.543	11.022	60.995

Table 5. Rotated component matrix.

Variable/Element	1	2	3
Ag	0.096	−0.053	0.743
Au	−0.011	0.035	0.172
Co	0.749	0.503	−0.044
Cr	0.277	0.885	0.031
Cu	0.670	0.187	0.165
Mn	0.766	0.105	0.098
Mo	0.120	−0.007	0.546
Ni	0.220	0.896	0.072
P	0.651	0.053	0.032
Pb	−0.067	0.014	0.689
Ti	0.858	0.185	−0.132
V	0.884	0.199	−0.063
Zn	0.694	0.013	0.343
Fe	0.899	0.218	−0.055

Table 6. Proportion of known copper sites falling within the prediction areas of each of the four models.

Models	Level 1	Level 2	Level 3
WOE	24/71 (33.8%)	35/71 (49.3%)	56/71 (78.9%)
GCN	19/71 (26.8%)	31/71 (43.7%)	53/71 (74.6%)
CNN	8/71 (11.3%)	27/71 (38.0%)	47/71 (66.2%)
KDCGAT	27/71 (38.0%)	40/71 (56.3%)	63/71 (85.9%)

Table 7. Proportion of sites in each model prediction area.

Models	Level 1	Level 2	Level 3
GAT	25/71 (35.2%)	28/71 (39.4%)	46/71 (64.8%)
KDCGAT	27/71 (38.0%)	40/71 (56.3%)	63/71 (85.9%)

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
