CADF-Net: A Conflict-Aware Adaptive Distillation Network for Fusing Multi-Source Land-Cover Products for Key Vegetation Classes in Cross-Border Regions

Zhang, Yubo; Fu, Long; Li, Zehong; Yang, Yuanyuan; Chen, Hongbing; Zhang, Shuwen

doi:10.3390/rs18091294

Open Access Article

by Yubo Zhang ¹, Long Fu ¹, Zehong Li ²,*, Yuanyuan Yang ³, Hongbing Chen ¹ and Shuwen Zhang ⁴

¹ College of Information Technology, Jilin Agricultural University, Changchun 130118, China
² Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
³ School of Public Administration and Policy, Renmin University of China, Beijing 100872, China
⁴ Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(9), 1294; https://doi.org/10.3390/rs18091294

Submission received: 3 March 2026 / Revised: 21 April 2026 / Accepted: 22 April 2026 / Published: 24 April 2026

(This article belongs to the Special Issue Advanced AI Technology for Remote Sensing Analysis (Second Edition))

Highlights

What is the main finding?

The proposed Conflict-aware Adaptive Distillation Fusion Network (CADF-Net) improves both overall accuracy and spatial consistency by explicitly modeling inter-product discrepancies during the fusion of multi-source land-cover products, particularly for key vegetation classes in cross-border regions.
CADF-Net enhances discriminative performance for confusion-prone classes (cropland, forestland, and grassland), effectively balancing boundary regularity and fine-scale structural detail.

What are the implications of the main findings?

Introducing a pixel-level Conflict Index (CI) enables spatially differentiated learning, offering a generalizable strategy for resolving boundary inconsistencies in heterogeneous landscapes.
Producing spatially coherent maps of cropland, forestland, and grassland in cross-border regions improves the reliability of transboundary ecological assessment and supports harmonized land-resource management.

Abstract

Cross-border regions often exhibit complex vegetation-related land-cover patterns due to contrasting natural conditions and divergent development trajectories, causing multi-source land-cover products to suffer from disagreements in class assignment and boundary delineation, especially for cropland, forestland, and grassland. Because border zones are rarely mapping priorities, classification instability near national boundaries undermines transboundary comparisons. To address this, we propose a Conflict-aware Adaptive Distillation Fusion Network (CADF-Net) that fuses multi-source land-cover products to improve the discrimination and spatial consistency of key vegetation classes in cross-border regions. Taking the transnational China–Russia border (Sanjiang Plain and Primorskiy Kray) as a representative case, we integrate geo-environmental factors and introduce a pixel-level Conflict Index (CI) to explicitly steer the model toward discrepancy-prone areas. Building on this, we develop an Adaptive Distillation U-Net (AD-UNet) with uncertainty-adaptive distillation and employ a confidence-guided, dynamically weighted ensemble to generate the final fused land-cover product (CADF-LC). Quantitative assessments demonstrate that CADF-LC achieved an OA of 0.8600, a Kappa of 0.8133, and an mIoU of 0.7589, outperforming all input land-cover products. Compared with the strongest input product, Esri Land Cover, CADF-LC improved OA by 0.0150 and mIoU by 0.0222. Furthermore, it effectively mitigates the trade-off between detail loss and morphological fragmentation. Ultimately, CADF-Net enhances classification stability for key vegetation classes, offering a reliable foundation for transboundary ecological monitoring and land management.

Keywords:

land-cover products; geo-environmental factors; cross-border region; conflict index; U-Net; knowledge distillation

1. Introduction

Land-cover data provide a fundamental basis for understanding land-surface structure and associated ecological processes, thereby supporting global change research, ecosystem function assessment, and resource and environmental management [1]. With intensifying climate change and mounting ecological pressures, producing land-cover data that are accurate, spatially consistent, and comparable across regions has become a central objective in geographic information science on a global scale [2]. Such high-quality data underpin a broad range of applications, including ecological conservation planning (e.g., conservation priority zoning), food security monitoring and early-warning systems, sustainable grazing and stocking-rate management, evaluation of ecological restoration initiatives (e.g., the Grain for Green Program), and ecosystem service accounting [3].

In recent years, numerous global and regional land-cover products have become available, providing an important basis for studies of land-surface processes and related resource and environmental applications [4]. However, substantial inconsistencies in spatial patterns and classification accuracy persist among these products due to differences in data sources, classification systems, observation scales, and image acquisition timing [5,6]. These discrepancies are spatially structured rather than random: they tend to cluster in fragmented landscapes, areas with frequent land-cover transitions, and regions with intensive human activities, where any single product may fail to represent actual land-surface conditions consistently [7].

Among major land-cover types, cropland, forestland, and grassland constitute core components of terrestrial ecosystems. They are essential for food security and livestock production and also play key roles in carbon cycling and climate regulation [8,9,10,11]. Owing to vegetation phenology and land-use practices, these classes often exhibit overlapping spectral and textural characteristics in transition zones and heterogeneous, fragmented landscapes, leading to frequent confusion—particularly in agro-pastoral ecotones, forest-edge zones, and arid and semi-arid regions [12,13]. These challenges are further amplified in cross-border regions, where administrative boundaries and divergent management practices add further complexity. Differences in natural conditions, human activities, and governance systems across borders can produce discontinuities in land-cover patterns and class assignment, as reported by cross-regional and cross-administrative-unit studies [14]. In practice, cross-border discrepancies often manifest as unstable discrimination and ambiguous delineation between adjacent classes—especially cropland, forestland, and grassland—thereby introducing systematic biases when integrating multi-source land-cover products [15]. Moreover, existing land-cover mapping and accuracy assessment efforts typically emphasize region-wide metrics, whereas border areas and their transition zones are rarely the focus of analysis; under complex landscape structures, class assignment near national borders can therefore be locally less reliable [16]. Under such conditions, a single land-cover product is unlikely to capture surface characteristics accurately and consistently. Multi-source fusion is thus essential for leveraging complementary strengths across products, improving spatial continuity and cross-regional consistency, and ultimately generating land-cover maps with greater reliability and classification stability [17].

A variety of fusion strategies have been proposed to enhance accuracy and spatial consistency. Clinton et al. [18] developed decision-level fusion approaches that combine class assignments from different products using majority voting, weighted voting, or rule-based schemes. While these methods can improve overall consistency, they typically do not explicitly model class-specific confidence or regional heterogeneity and therefore perform poorly in structurally complex landscapes. More recently, Zhang et al. [19] and Wang et al. [20] introduced probabilistic frameworks that improve discrimination through pixel-level inference. However, such approaches often rely on prior assumptions and are sensitive to sample quality and regional variability, which limits their robustness in spatially dynamic landscapes and complex cross-border settings. Pérez-Hoyos et al. [21] employed Multi-Criteria Analysis (MCA) to integrate multiple indicators into a composite scoring framework, thereby improving decision stability, but this strategy emphasizes global criteria and can underrepresent local variability. Bethuel et al. [22] applied the Dempster–Shafer theory of evidence to handle inter-product discrepancies by representing uncertainty via basic probability assignment, thereby improving discrimination for key classes such as cropland and forestland; nonetheless, this approach remains sensitive to inter-dataset differences and prior specifications, which can limit generalizability in complex and multi-class scenarios. To reduce reliance on purely rule-based or prior-driven designs, Schepaschenko et al. [23] and Nabil et al. [24] incorporated auxiliary data and external statistics to enhance reliability and spatial realism. 
However, such information is often used as external constraints or for post hoc calibration rather than being tightly integrated into pixel-level fusion decisions, leaving limited capacity to identify and resolve confusion in transition zones and structurally complex areas. Overall, despite notable progress, existing fusion methods still struggle to achieve both strong local discrimination and robust cross-regional consistency in challenging scenarios dominated by confusion-prone classes (e.g., cropland, forestland, and grassland) and pronounced cross-border discrepancies [25,26].

To address these challenges, we propose a Conflict-aware Adaptive Distillation Fusion Network (CADF-Net) to enhance the discrimination and spatial consistency of cropland, forestland, and grassland in cross-border landscapes. The novelty of CADF-Net does not lie in introducing an entirely new standalone distillation or ensemble model, but in organizing conflict characterization, uncertainty-adaptive knowledge transfer, and confidence-guided decision fusion into a unified framework for resolving multi-source land-cover disagreement in cross-border regions. The main contributions are threefold. First, we construct a pixel-level Conflict Index (CI) to characterize discrepancies among multi-source land-cover products. Unlike generic weighting-based composite indicators, the proposed CI is specifically designed for land-cover fusion by jointly representing inter-product class disagreement, local neighborhood heterogeneity, and class-assignment uncertainty. Second, we develop an Adaptive Distillation U-Net (AD-UNet) in which knowledge transfer is conditioned by predictive uncertainty, enabling the student model to inherit stable supervision in consistent regions while preserving flexibility in conflict-prone areas. Third, we couple conflict information with confidence-guided decision fusion, so that model aggregation is performed in a spatially adaptive manner rather than through globally fixed weighting, thereby improving both local discrimination and overall spatial consistency. We validate CADF-Net in a cross-border study area spanning the Sanjiang Plain (China) and Primorskiy Kray (Russia), where natural settings are broadly continuous while land management regimes differ markedly. Using multi-source land-cover products for 2020, we conduct fusion experiments focusing on cropland, forestland, and grassland to systematically evaluate the proposed method’s performance and practical potential in complex cross-border scenarios. 
The results demonstrate that CADF-Net achieves improved accuracy and spatial consistency for cropland, forestland, and grassland compared with individual products and fusion baselines, offering a reliable foundation for transboundary ecological monitoring.

2. Materials and Methods

2.1. Study Area

This study selected the Sanjiang Plain (China) and Primorskiy Kray (Russia) as the study area (Figure 1), spanning the China–Russia border. The two adjacent regions are characterized by plains, hills, and alluvial lowlands. Under broadly continuous climatic and ecological conditions, they form a typical cropland–forestland–grassland mosaic zone in Northeast Asia [27]. Nevertheless, differences in land-management regimes, agricultural practices, and resource-use patterns between the two countries have produced pronounced cross-border contrasts in landscape structure and land-cover patterns.

The Sanjiang Plain, formed by alluvial deposition from the Heilongjiang (Amur), Songhua, and Ussuri rivers, is a major commodity grain production base in China and a representative wetland region [28]. In the plains, cropland occurs in large, contiguous tracts, where rice, soybean, and maize are the dominant crops [29]. Forest is mainly distributed in the surrounding hilly terrain, whereas wetlands and grasslands are interspersed among cropland patches and play important roles in maintaining regional biodiversity and regulating hydrological processes [30]. In recent years, driven by agricultural reclamation and climate change, wetland areas have declined and grasslands have become increasingly fragmented [31].

Primorskiy Kray, located in the Russian Far East, comprises mountains, hills, and coastal plains and receives relatively abundant precipitation. The region is characterized by high forest cover, primarily concentrated in the Sikhote-Alin Mountains and surrounding hills. Dominant forest types include coniferous forests and mixed broadleaf–conifer forests, making Primorskiy Kray an important forest-resource region in Northeast Asia [32]. Grasslands—mainly occurring as forest steppe or meadow steppe—are distributed in the Khanka Lake Basin and the middle reaches of the Suifen River [33]. By contrast, cropland occupies a relatively small area and is mainly concentrated in plains near the China–Russia border [34]. Overall, the strong cross-border contrast in cropland and forest distributions, together with the ecological importance of grasslands on both sides, makes this region well suited for evaluating the adaptability of the proposed fusion method in highly heterogeneous landscapes.

2.2. Datasets

2.2.1. Land-Cover Products

We systematically reviewed more than 70 currently available and relevant global and regional land-cover products (Table S1 in the Supplementary Materials), including comprehensive products that map overall land-cover types and thematic products targeting specific classes such as vegetation, cropland, and urban areas [35]. Considering spatial resolution, temporal coverage, and regional applicability, we shortlisted eight candidate products spanning multiple sensors, spatial resolutions, and production agencies to ensure representativeness and complementarity: (1) GLC-FCS30, released by the Aerospace Information Research Institute, Chinese Academy of Sciences [36]; (2) FROM-GLC, produced by Tsinghua University using Landsat imagery [37]; (3) ESA WorldCover, generated from Sentinel-1 and Sentinel-2 data by the European Space Agency (ESA) [38]; (4) ESA CCI-LC, released by ESA [39]; (5) CGLS-LC100, provided by the Copernicus Global Land Service (CGLS) [40]; (6) GlobeLand30, released by the National Geomatics Center of China [41]; (7) Esri Land Cover, produced by Esri using Sentinel-2 imagery [42]; and (8) the MODIS annual land-cover product (MCD12Q1) provided by the National Aeronautics and Space Administration (NASA) [43].

Among these candidates, MCD12Q1 provides multiple classification schemes to support diverse application contexts. Specifically, it includes the International Geosphere–Biosphere Programme (IGBP) scheme for global ecosystem characterization [44], as well as several schemes derived from the Food and Agriculture Organization of the United Nations (FAO) Land Cover Classification System (LCCS), including LCCS1, LCCS2, and LCCS3. These LCCS-based schemes differ in their emphasis on vegetation structure, land-cover functions and environmental conditions, and wetland/hydrological settings, respectively. In this study, we examined four schemes (IGBP, LCCS1, LCCS2, and LCCS3) to understand how differing classification logics influence class representation. Based on semantic clarity and suitability for cropland–forestland–grassland mosaics, we ultimately selected Esri Land Cover, GLC-FCS30, ESA CCI-LC, and the IGBP and LCCS2 layers from MCD12Q1 as the input land-cover products used in the fusion experiments.

2.2.2. Geo-Environmental Factors

To better represent geo-environmental heterogeneity and the ecological context of the study area, we incorporated five geo-environmental factors as auxiliary inputs (Table 1): digital elevation model (DEM), slope, land surface temperature (LST), normalized difference vegetation index (NDVI), and soil water content. Together, these variables capture key dimensions of terrain, vegetation condition, soil moisture, and the surface thermal environment, providing complementary environmental information for land-cover discrimination. Factors were selected to minimize information redundancy while maintaining ecological interpretability and remote-sensing relevance [45]. Specifically, DEM and slope describe topographic controls; soil water content characterizes moisture availability; NDVI represents vegetation greenness and cover; and LST reflects surface energy balance and evapotranspiration. Aspect was not included because its effects are largely mediated through illumination, humidity, and temperature [46], which are more directly represented by the selected variables.

The specific data sources and processing methods are as follows. The DEM was obtained from the 30 m NASADEM product, which was reprocessed from the original Shuttle Radar Topography Mission (SRTM) radar observations [47]. Slope was derived from the DEM to characterize local relief and was computed as:

\mathrm{Slope} = \arctan\left(\sqrt{\left(\frac{\partial z}{\partial x}\right)^{2} + \left(\frac{\partial z}{\partial y}\right)^{2}}\right)

(1)

where \partial z / \partial x and \partial z / \partial y denote the elevation gradients in the x and y directions, respectively. Soil water content was obtained from the OpenLandMap Soil Water product at 250 m resolution, which estimates soil moisture conditions based on field capacity (33 kPa) [48]. We used the 0 cm depth layer as a proxy for topsoil moisture. NDVI was derived from the U.S. Geological Survey (USGS) Landsat 8 Level-2 surface reflectance product at 30 m resolution [49]. To reduce seasonal phenological effects and improve stability in cross-border settings, Google Earth Engine (GEE) was used to compute the annual mean NDVI for 2020:

\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}

(2)

where NIR and RED represent the near-infrared (Band 5) and red (Band 4) bands, respectively. LST was derived from the MODIS/Terra MOD11A2 v061 product at 1 km resolution [50], and the annual mean LST for 2020 was computed to represent the surface thermal environment.
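As a minimal sketch of Equations (1) and (2), the functions below compute slope at an interior cell of a small elevation grid and NDVI from a pair of reflectance values. The central-difference gradient, the 30 m `cell_size` default, the conversion to degrees, the zero-denominator guard, and all function names are illustrative assumptions; the text does not state which gradient operator was used, and the annual mean is assumed here to be a plain average of per-observation NDVI values.

```python
import math

def slope_degrees(dem, row, col, cell_size=30.0):
    """Equation (1) at an interior cell, using central differences (assumed)."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2.0 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2.0 * cell_size)
    # arctan returns radians; degrees are used here only for readability.
    return math.degrees(math.atan(math.sqrt(dz_dx ** 2 + dz_dy ** 2)))

def ndvi(nir, red):
    """Equation (2); returns 0.0 when both bands are zero (assumed guard)."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0

def annual_mean_ndvi(observations):
    """Average NDVI over a list of (nir, red) pairs for one pixel."""
    values = [ndvi(nir, red) for nir, red in observations]
    return sum(values) / len(values)
```

A plane that rises 30 m per 30 m cell in the x direction has a unit gradient, so `slope_degrees` should return 45 degrees for it.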

2.2.3. Reference Dataset Construction

To obtain reliable reference data, we constructed a sample library characterized by spatial independence and interpretation consistency, serving as the basis for model training and evaluation datasets. Initially, we generated 3686 sample points based on a regular fishnet grid, followed by 2000 supplementary points generated via random sampling. Throughout this process, a strict spatial isolation strategy was enforced to ensure the two sets were spatially non-overlapping. A minimum geographic buffer of 300 m was applied to mitigate information leakage caused by spatial autocorrelation, thereby ensuring that subsequent evaluations genuinely reflect the model’s generalization capability. All sample points were visually interpreted based on 2020 high-resolution Google Earth imagery under unified interpretation rules. To enhance interpretation reliability, field survey records and unmanned aerial vehicle (UAV) imagery were utilized as auxiliary references, and representative UAV image examples are shown in Figure 2b. It should be noted that the UAV imagery and field observations served solely to assist visual judgment and improve accuracy; they were not used directly for the final assignment of sample labels. To further assess the reliability of the interpretation-based reference labels, we randomly selected 400 reference samples and re-interpreted them under the same interpretation protocol, and we calculated the agreement rate between the original and repeated interpretations.

Considering the class imbalance in the original raw samples, and to ensure fair comparison among models without bias toward majority classes, we merged the initial sample sets into a single pool. From this pool, we randomly selected 650 points per land-cover class, resulting in a final class-balanced reference dataset totaling 2600 points. Subsequently, adhering to the principle of class balance, these reference points were partitioned into training, validation, and testing subsets. Specifically, 1800 points were allocated for model training, 400 points for validation, and the remaining 400 points served as an independent test set for the final accuracy assessment. The spatial distribution of the 2600 reference samples is shown in Figure 2a. This rigorous sampling and partitioning protocol ensures the reliability of the reference data while enhancing the fairness and stability of the comparative experiments.
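The balancing and partitioning step can be sketched as follows. The per-class split sizes (450 training, 100 validation, 100 test) are an assumption derived by dividing the stated totals of 1800, 400, and 400 evenly over four classes; the function name, the `(sample_id, class_label)` tuple layout, and the fixed seed are likewise illustrative and not taken from the paper.

```python
import random
from collections import defaultdict

def balanced_split(samples, per_class=650, n_train=450, n_val=100, seed=0):
    """Draw per_class samples from each class and split them class by class.

    samples: list of (sample_id, class_label) tuples (hypothetical layout).
    Returns a dict with 'train', 'val', and 'test' lists.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    split = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < per_class:
            raise ValueError(f"class {label} has only {len(pool)} samples")
        chosen = rng.sample(pool, per_class)  # sampling without replacement
        split["train"] += chosen[:n_train]
        split["val"] += chosen[n_train:n_train + n_val]
        split["test"] += chosen[n_train + n_val:]
    return split
```

With four classes this yields 1800, 400, and 400 points, matching the totals reported above. The 300 m spatial buffer between subsets is not enforced in this sketch.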

2.3. Data Preprocessing

To ensure consistency in spatial scale, projection, and semantic representation, all datasets used in the experiments were systematically preprocessed. All layers were reprojected to a customized Albers equal-area conic projection based on the World Geodetic System 1984 (WGS84) ellipsoid, with a central meridian of 135° E and standard parallels of 40° N and 55° N, chosen to match the study area. All layers were then resampled to a unified spatial resolution of 30 m to enable pixel-level alignment between the input land-cover products and the geo-environmental factors, thereby supporting multi-channel model inputs.

Given differences in native resolution across sources, we applied resampling strategies tailored to data type. For categorical land-cover layers, products with native pixel sizes finer than 30 m were aggregated to 30 m using majority resampling to reduce salt-and-pepper noise [51]. Products coarser than 30 m were resampled to 30 m using nearest-neighbor interpolation to preserve class codes [52]. Resampling coarse products to 30 m served spatial alignment, pixel-wise channel stacking, and joint learning within a unified patch-based framework; it does not add intrinsic spatial detail to the original information, and it may introduce block-like boundaries and scale-related artifacts. In effect, the operation projects the low-frequency contextual information of coarse-resolution inputs onto the common 30 m analysis grid. Accordingly, fine-scale delineation in the fused results is expected to rely mainly on the finer-resolution land-cover products and 30 m variables, whereas the coarser inputs provide broader environmental constraints.
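The two resampling rules for categorical layers can be illustrated on small grids. This sketch assumes the source and target grids are aligned and differ by an integer factor (for example, 10 m to 30 m is a factor of 3), which will not hold for every product; the function names are hypothetical, and ties in the majority vote are broken by the first label encountered.

```python
from collections import Counter

def majority_aggregate(grid, factor):
    """Coarsen a categorical grid by taking the most frequent label per block."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [grid[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(Counter(block).most_common(1)[0][0])
        out.append(out_row)
    return out

def nearest_upsample(grid, factor):
    """Refine a categorical grid by repeating each cell; class codes are kept."""
    return [[grid[r // factor][c // factor]
             for c in range(len(grid[0]) * factor)]
            for r in range(len(grid) * factor)]
```

Nearest-neighbor upsampling only repeats existing codes, which is why it preserves class labels but produces the block-like boundaries noted above.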

To address inconsistencies among classification systems, we developed a unified class-harmonization scheme. We consolidated class definitions across the eight candidate land-cover products. For MCD12Q1, the LCCS1, LCCS2, and IGBP schemes were considered, whereas LCCS3 was excluded because it is primarily designed for wetland-oriented scenarios. Based on semantic consistency and a focus on cropland, forestland, and grassland, we derived mapping rules comprising four Level-1 categories and 23 subclasses (Table S2 in the Supplementary Materials). The selected input land-cover products were then reclassified into four target categories: cropland, forestland, grassland, and other.
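Reclassification into the four target categories amounts to a lookup-table mapping. The native codes in `EXAMPLE_LUT` below are purely hypothetical placeholders for an imaginary product; the actual mapping rules are those given in Table S2, which is not reproduced here. The integer codes assigned to the target classes are also an assumption.

```python
# Assumed integer codes for the four harmonized target classes.
TARGET = {"cropland": 0, "forestland": 1, "grassland": 2, "other": 3}

# Hypothetical native codes for an imaginary product (not from Table S2).
EXAMPLE_LUT = {10: "cropland", 20: "forestland", 30: "grassland"}

def harmonize(grid, lut, default="other"):
    """Map native class codes to target codes; unmapped codes become 'other'."""
    return [[TARGET[lut.get(code, default)] for code in row] for row in grid]
```

Sending every unmapped code to "other" mirrors the four-category scheme, in which all non-target land-cover types share a single class.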

In addition, geo-environmental factors were normalized to reduce the influence of differing value ranges and units across variables during model training [53] and to place all channels on a comparable numerical scale. Collectively, these steps produced spatially aligned inputs with unified semantics, providing a solid foundation for subsequent model development and evaluation.

2.4. Conflict Index Construction

Different land-cover products may assign different classes to the same pixel. Such discrepancies are not necessarily errors; they often reflect intrinsic land-surface complexity and class ambiguity in heterogeneous landscapes. We therefore define a pixel-level Conflict Index (CI) to quantify disagreement among the input land-cover products and to provide an explicit guidance signal for model learning and fusion decisions. The CI integrates three components: inter-product class disagreement, local neighborhood heterogeneity, and class-assignment uncertainty. First, we quantify the concentration of class assignments across products at pixel x_i using the proportion of the most frequent class (mode frequency):

f_{\mathrm{mode}}(x_i) = \frac{n_{\max}(x_i)}{N}

(3)

where n_max(x_i) is the number of occurrences of the most frequent class at pixel x_i, and N is the number of input land-cover products. Lower mode frequency indicates stronger divergence in class assignments; therefore, 1 − f_mode(x_i) represents inter-product class disagreement. Second, local neighborhood heterogeneity around x_i is characterized by the concentration of class labels within a 3 × 3 neighborhood after pooling labels from all input products:

L(x_i) = \frac{n_{\max}(S(x_i))}{|S(x_i)|}

(4)

where S(x_i) denotes the multiset of valid class labels collected from the 3 × 3 neighborhood of pixel x_i across all input land-cover products, |S(x_i)| is the number of valid labels in the pooled set, and n_max(S(x_i)) is the count of the most frequent label. Smaller L(x_i) indicates a more mixed neighborhood, a manifestation of stronger local heterogeneity; thus, 1 − L(x_i) is used to represent local spatial inconsistency. Third, we use entropy to characterize uncertainty in the class distribution across products:

H (x_{i}) = - \sum_{k = 1}^{K} p_{k} (x_{i}) \log p_{k} (x_{i})

(5)

where K is the number of classes and p_k(x_i) is the frequency with which pixel x_i is labeled as class k across products. Larger entropy indicates higher uncertainty. In the absence of prior knowledge regarding the relative importance of the three components, the CI was computed as the unweighted sum of these three terms [54]:

\mathrm{CI}(x_i) = (1 - f_{\mathrm{mode}}(x_i)) + (1 - L(x_i)) + H(x_i)

(6)

A higher CI indicates more pronounced disagreement among the input land-cover products, highlighting conflict-prone areas that are difficult to classify reliably. Compared with single-metric approaches (e.g., using only consistency or entropy) [55], the proposed CI jointly captures inter-product class disagreement, local neighborhood heterogeneity, and class-assignment uncertainty. Rather than serving as a generic composite score, it is designed here as a task-specific pixel-level conflict indicator for multi-source land-cover fusion and further used as an explicit spatial guidance signal in CADF-Net.
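Equations (3) to (6) can be combined into a single per-pixel routine, sketched below. Several details are assumptions because the text leaves them open: the natural logarithm is used for the entropy, the 3 × 3 window is clipped at the raster edge, the centre pixel is assumed to carry a valid label in every product, and the `nodata` argument and function name are hypothetical.

```python
import math
from collections import Counter

def conflict_index(products, row, col, nodata=None):
    """CI at (row, col) for a list of equally shaped 2-D label grids."""
    # Equation (3): share of products that agree with the modal class.
    labels = [p[row][col] for p in products]
    n = len(labels)
    counts = Counter(labels)
    f_mode = max(counts.values()) / n

    # Equation (4): modal share in the 3 x 3 window pooled over all products.
    n_rows, n_cols = len(products[0]), len(products[0][0])
    pooled = [p[r][c]
              for p in products
              for r in range(max(0, row - 1), min(n_rows, row + 2))
              for c in range(max(0, col - 1), min(n_cols, col + 2))
              if p[r][c] != nodata]
    l_value = max(Counter(pooled).values()) / len(pooled)

    # Equation (5): entropy of the class frequencies across products.
    entropy = -sum((k / n) * math.log(k / n) for k in counts.values())

    # Equation (6): unweighted sum of the three components.
    return (1.0 - f_mode) + (1.0 - l_value) + entropy
```

When all products agree throughout the window, each term vanishes and the CI is zero. Note that the entropy term is not bounded by 1 (its maximum is the logarithm of the number of classes), so the three components do not share a common range in this unweighted form.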

2.5. Adaptive Distillation U-Net

2.5.1. Network Architecture and Input Design

To enhance the discriminative performance of the model in high-conflict regions, this study constructs an Adaptive Distillation U-Net (AD-UNet). We draw inspiration from knowledge distillation, enabling a student model to learn from the “soft targets” of a teacher, thereby capturing richer inter-class relationships than those provided by hard labels alone [56]. This soft supervision mechanism not only preserves the teacher model’s stable and smooth decision boundaries but also effectively improves the student’s discriminative ability for complex samples. In multi-source land-cover fusion, the rigid categorization of hard labels often fails to characterize the transitional relationships between classes in high-conflict regions, potentially amplifying classification errors. In contrast, soft labels generated via knowledge distillation better reflect the classification difficulty in these regions, guiding the student model to learn more plausible class-transition patterns. Consequently, we introduced this framework to leverage the stable representation of the teacher alongside the student’s sensitivity to high-conflict areas, enabling pixel-level adaptive decision-making.

The overall training framework employs a unified U-Net encoder–decoder architecture for both the teacher and student models, given that land-cover recognition is highly sensitive to spatial structures and class boundaries [57]. U-Net features symmetric downsampling and upsampling paths connected by skip connections, which effectively fuse local details with global semantic features; this architecture has been widely validated for its stability and efficiency in pixel-level remote sensing classification [58]. Adopting a unified network structure helps maintain a consistent feature space during distillation, ensuring the stable transfer of soft supervision information and avoiding feature misalignment caused by architectural discrepancies.

At the model input level, we constructed a multi-feature joint input layer comprising 26 channels, integrated into the AD-UNet backbone architecture illustrated in Figure 3. Specifically, the five input land-cover products were individually encoded using a unified four-class one-hot scheme; each product contributes 4 channels, totaling 20 channels that provide class indicators and spatial priors. Five geo-environmental factor channels were included to characterize the topographic and environmental context, assisting the model in understanding the spatial distribution patterns of different land-cover types. Furthermore, a single CI channel was introduced as explicit guidance, enabling the model to distinguish between high-conflict regions and areas with consistent classification.
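The channel arithmetic (5 products × 4 one-hot channels + 5 geo-environmental factors + 1 CI channel = 26) can be checked with a per-pixel sketch. The channel ordering shown here (products first, then factors, then CI) and the function names are assumptions; the text specifies only the channel counts.

```python
N_CLASSES = 4  # cropland, forestland, grassland, other

def one_hot(label):
    """Encode a target class code (0..3) as a four-element indicator list."""
    return [1.0 if label == k else 0.0 for k in range(N_CLASSES)]

def pixel_features(product_labels, geo_factors, ci):
    """Build the 26-element input vector for one pixel (assumed ordering)."""
    if len(product_labels) != 5 or len(geo_factors) != 5:
        raise ValueError("expected 5 product labels and 5 geo-environmental factors")
    features = []
    for label in product_labels:
        features.extend(one_hot(label))   # 5 products x 4 channels = 20
    features.extend(geo_factors)          # 5 normalized factor channels
    features.append(ci)                   # 1 Conflict Index channel
    return features
```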

The encoder consists of four convolutional blocks with channel depths of 64, 128, 256, and 512, respectively. Each block contains two 3 × 3 convolutional layers, each followed by Batch Normalization and a ReLU activation. Dropout (rate = 0.3) is applied to mitigate overfitting. Stepwise downsampling is achieved via 2 × 2 max pooling to extract spatial semantic features from local to global scales. The bottleneck layer uses a 1024-channel convolutional block to integrate multi-scale contextual information and enhance semantic understanding of complex features. In the decoding stage, upsampling is performed via transposed convolution, and features are concatenated with the corresponding encoder levels via skip connections to fuse shallow spatial details with deep semantic information. The output layer employs a 1 × 1 convolution to map features to four class channels, followed by a Softmax function to obtain the pixel-level class probability distribution. To accommodate training on large-scale remote sensing data, input features and labels were constructed using a patch-based approach. We used a sliding window of 256 × 256 pixels with a stride of 128 pixels (50% overlap between adjacent patches) to ensure sufficient regional coverage while limiting spatial redundancy between samples.
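To make the patch geometry and feature-map sizes concrete, the sketch below enumerates sliding-window origins and traces the encoder tensor shapes for one 256 × 256 patch. It assumes that convolutions are padded so that only max pooling changes the spatial size, and that windows which would extend past the raster edge are dropped; neither detail is stated in the text, and both function names are hypothetical.

```python
def patch_origins(height, width, size=256, stride=128):
    """Top-left (row, col) of every full window that fits inside the raster."""
    rows = range(0, height - size + 1, stride)
    cols = range(0, width - size + 1, stride)
    return [(r, c) for r in rows for c in cols]

def encoder_shapes(size=256, channels=(64, 128, 256, 512), bottleneck=1024):
    """(channels, height, width) after each encoder block and the bottleneck."""
    shapes = []
    for ch in channels:
        shapes.append((ch, size, size))
        size //= 2  # 2 x 2 max pooling halves each spatial dimension
    shapes.append((bottleneck, size, size))
    return shapes
```

Under these assumptions a 512 × 512 raster yields nine overlapping patches, and the bottleneck operates on 16 × 16 feature maps.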

2.5.2. Two-Stage Training and Adaptive Distillation Strategy

This study adopts a two-stage training strategy, as illustrated in the overall knowledge distillation framework in Figure 4. Stage I involves the supervised pre-training of the teacher model, utilizing high-quality, manually interpreted training and validation samples. To enhance the stability of pixel-level supervision and mitigate potential bias arising from uneven pixel counts across classes, a Weighted Cross-Entropy (WCE) loss function is employed for optimization, defined as follows:

L_{WCE} = - \sum_{c = 1}^{C} w_{c} y_{c} \log (p_{c})

(7)

where C is the number of classes, y_c is the c-th component of the one-hot label (y_c = 1 if the pixel belongs to class c, and 0 otherwise), and p_c is the predicted probability for class c. The class weight w_c was computed by normalizing the inverse class frequency in the training set:

w_{c} = \frac{1 / f_{c}}{\sum_{k = 1}^{C} (1 / f_{k})}

(8)

where f_c denotes the proportion of pixels belonging to class c in the training set. This normalization ensures \sum_{c = 1}^{C} w_{c} = 1, assigning larger weights to rare classes and smaller weights to common classes. The objective of Stage I was to obtain a high-accuracy teacher that provides reliable probabilistic supervision for subsequent distillation.
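To make Equations (7) and (8) concrete, the sketch below computes normalized inverse-frequency weights and the weighted cross-entropy for a single pixel. The class frequencies are hypothetical values chosen for the example, and the small `eps` inside the logarithm is a numerical guard added here, not part of Equation (7).

```python
import numpy as np


def inverse_frequency_weights(freqs):
    """Eq. (8): normalized inverse class frequencies (weights sum to 1)."""
    inv = 1.0 / np.asarray(freqs, dtype=np.float64)
    return inv / inv.sum()


def weighted_cross_entropy(probs, label, weights, eps=1e-12):
    """Eq. (7) for one pixel with a one-hot label: -w_c * log(p_c)."""
    return float(-weights[label] * np.log(probs[label] + eps))


# Hypothetical class proportions (not the study's actual frequencies)
weights = inverse_frequency_weights([0.5, 0.25, 0.125, 0.125])
# weights are proportional to 2 : 4 : 8 : 8, i.e. [1/11, 2/11, 4/11, 4/11]
pixel_loss = weighted_cross_entropy(np.array([0.7, 0.1, 0.1, 0.1]), 0, weights)
```

In this example the rarest classes receive four times the weight of the most common class, so misclassifying a rare-class pixel contributes proportionally more to the loss.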

Stage II involves the training of the student model based on adaptive distillation. In this stage, we employed a joint supervision strategy that incorporates soft labels generated by the teacher model for knowledge transfer, while simultaneously integrating hard-label constraints from the training data. This approach aims to enhance model generalization while ensuring discriminative accuracy in critical regions. First, knowledge distillation is performed using soft labels output by the teacher model. These soft labels encode inter-class similarities and spatial transition information, facilitating more robust feature representation in scenarios characterized by blurred boundaries or strong landscape heterogeneity [59,60]. We set the temperature coefficient to T = 2 during distillation to soften the probability distribution, enabling the student model to learn the probabilistic structure of non-maximal classes rather than merely mimicking the final classification outcome.

Second, to prevent potential semantic drift caused by relying solely on distillation supervision, we further introduced a hard-label cross-entropy term as an auxiliary constraint. Although input land-cover products exhibit discrepancies in local regions, hard labels provide an explicit categorical semantic reference. By setting the weighting coefficient λ = 0.2, we incorporated hard labels to correct the student model’s class semantics while preserving the rich information provided by soft labels, thereby stabilizing decision boundaries and enhancing the reliability of the training process. Conventional knowledge distillation typically transfers class probabilities to the student model uniformly, making it difficult to distinguish differences in predictive reliability across regions [61]. To address this, we quantified the teacher’s predictive uncertainty using information entropy and constructed adaptive distillation weights based on this metric. The entropy H_i is defined as:

H_{i} = - \sum_{c = 1}^{C} p_{i, c}^{τ} \log (p_{i, c}^{τ} + ϵ)

(9)

where p_{i, c}^{τ} denotes the teacher model’s predicted probability softened by temperature T, and ϵ is a small constant to prevent numerical instability. Based on this, the pixel-level distillation weight w_i is constructed as:

w_{i} = 1 - \frac{H_{i}}{\log C}

(10)

This mechanism enhances knowledge transfer in high-confidence regions while appropriately reducing weights in high-uncertainty areas to avoid noise propagation. Finally, the total loss consists of the weighted Kullback–Leibler (KL) divergence distillation term and the hard-label cross-entropy term (L_CE):

L_{total} = L_{KD} + λ L_{CE}

(11)

L_{KD} = \frac{1}{|S|} \sum_{i \in S} w_{i} \cdot T^{2} \cdot KL (q_{i}^{τ} ∥ p_{i}^{τ})

(12)

where p_{i}^{τ} and q_{i}^{τ} represent the softened output distributions of the teacher and student models at temperature T, respectively; S is the set of valid pixels; and λ is the balancing coefficient for the cross-entropy term. Through this joint supervision mechanism, the student model not only inherits distributed knowledge and spatial transition information from the teacher but also stabilizes class semantics via hard-label constraints, resulting in more robust discriminative capability and reliable boundary representation in complex regions.
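Equations (9) to (12) can be combined into a compact NumPy sketch. This is an illustrative reading of the formulas, not the authors' training code. It makes three assumptions: logits are given as (N, C) arrays over the valid pixels S; the KL term uses the argument order printed in Equation (12); and L_CE is taken as an unweighted cross-entropy on the student's unsoftened (T = 1) output, which the text does not specify.

```python
import numpy as np


def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def adaptive_distillation_loss(teacher_logits, student_logits, labels,
                               T=2.0, lam=0.2, eps=1e-8):
    """Return (L_total, w) for N valid pixels and C classes."""
    C = teacher_logits.shape[-1]
    p = softmax(teacher_logits, T)  # teacher p^tau
    q = softmax(student_logits, T)  # student q^tau
    H = -(p * np.log(p + eps)).sum(axis=-1)  # Eq. (9): teacher entropy
    w = 1.0 - H / np.log(C)                  # Eq. (10): pixel-wise weight
    # Eq. (12): KL(q || p), following the argument order as printed
    kl = (q * (np.log(q + eps) - np.log(p + eps))).sum(axis=-1)
    l_kd = np.mean(w * T**2 * kl)
    # Hard-label term (assumed unweighted, evaluated at T = 1)
    q1 = softmax(student_logits, 1.0)
    l_ce = -np.mean(np.log(q1[np.arange(len(labels)), labels] + eps))
    return l_kd + lam * l_ce, w  # Eq. (11)
```

A uniform teacher prediction gives H_i = log C and therefore w_i ≈ 0, so that pixel contributes almost nothing to the distillation term, whereas a near one-hot teacher prediction gives w_i close to 1.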

2.6. Confidence-Guided Dynamically Weighted Ensemble

To fully exploit the complementary strengths of the teacher and student models under varying surface conditions, we adopted a confidence-based decision ensemble strategy. The objective of introducing the student model and distillation training is not to replace the teacher, but to enhance robustness in high-conflict and transition zones. Since the teacher model generally exhibits superior consistency and reliability in regions with stable class distributions, a differentiated decision strategy is required during inference. This avoids the introduction of unnecessary switching and noise that would arise from applying a uniform fusion rule globally. First, during inference, the teacher and student models independently output pixel-wise probability distributions, denoted as p_{i}^{T} and p_{i}^{S}. We quantify their prediction confidence using the maximum Softmax probabilities, denoted as k_{i}^{T} and k_{i}^{S}, respectively. Based on these confidence measures, a dynamic weight α_i can be defined as:

α_{i} = \frac{k_{i}^{T}}{k_{i}^{T} + k_{i}^{S} + ϵ}

(13)

where k_{i}^{T} and k_{i}^{S} represent the maximum Softmax probabilities of the teacher and student models, and ϵ is a small constant (e.g., 10⁻⁸) to avoid division by zero.
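For reference, Equation (13) reduces to a few array operations. The sketch below assumes the two probability maps are stored as (N, C) arrays; the function name is illustrative.

```python
import numpy as np


def dynamic_weight(p_teacher, p_student, eps=1e-8):
    """Eq. (13): teacher share of the combined maximum-softmax confidence."""
    k_t = p_teacher.max(axis=-1)  # teacher confidence k_i^T
    k_s = p_student.max(axis=-1)  # student confidence k_i^S
    return k_t / (k_t + k_s + eps)


p_t = np.array([[0.9, 0.05, 0.03, 0.02]])
p_s = np.array([[0.3, 0.3, 0.2, 0.2]])
alpha = dynamic_weight(p_t, p_s)  # 0.9 / (0.9 + 0.3), i.e. about 0.75
```

Values above 0.5 indicate that the teacher is the more confident model at that pixel, and values below 0.5 indicate the student.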

Notably, we do not apply a continuous weighted average across all pixels using this weight. Instead, we combine confidence information with spatial heterogeneity constraints, utilizing the CI to spatially gate the decision ensemble process. This minimizes needless model switching in low-conflict regions and suppresses spatial noise. We adaptively determine a high-conflict threshold based on the CI distribution; in our experiments, pixels with the top 20% CI values were categorized as high-conflict areas, a quantile that demonstrated stability in preliminary tests. The decision logic is structured as follows. For pixels where both models agree, the agreed class is directly adopted to ensure output stability. For pixels where predictions disagree but the conflict level is low, the teacher’s prediction is prioritized to leverage its reliability in stable regions, thereby avoiding unnecessary noise.

However, for pixels within high-conflict areas where predictions disagree, we further introduce a reliability calibration mechanism. This calibration mechanism maps model confidence to actual accuracy based on empirical statistics from the validation set. Specifically, valid pixels in the validation set are divided into intervals based on prediction confidence. Within each interval, we calculate the empirical accuracy (the proportion of correct predictions). Considering that confidence interpretability may vary by class, we compile these statistics into a 2D lookup table (“Class–Confidence Interval”). Thus, for any given pixel, its calibrated reliability can be retrieved based on its predicted class and confidence score. Based on this, we apply a difference-based decision rule with a teacher prior. The student’s output is selected only when its reliability explicitly exceeds that of the teacher:

\hat{y}_{i} = \begin{cases} \hat{c}_{i}^{S}, & (r_{i}^{S} - r_{i}^{T}) > Δ \\ \hat{c}_{i}^{T}, & \text{otherwise} \end{cases}

(14)

where r_{i}^{T} and r_{i}^{S} denote the calibrated reliability of the teacher and student at pixel i, respectively, and Δ is a difference threshold (set to 0.02). This threshold suppresses frequent switching when the reliability difference is marginal, thereby reducing spatial noise. Through this decision ensemble strategy, the teacher model maintains overall stability in low-conflict or stable regions, while the student model is conditionally introduced in high-conflict areas under strict reliability constraints. This ensures that the discriminative improvements gained from distillation are concentrated in complex transition zones, enhancing local rationality while preserving spatial consistency.
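The decision logic of this section, including the class-by-confidence lookup table and Equation (14), can be sketched as follows. Several details are assumptions made for illustration: ten equal-width confidence bins, the 80th percentile of the CI as the high-conflict threshold, and a fallback to the teacher whenever a lookup cell is empty. All function names are hypothetical.

```python
import numpy as np


def _bin_index(conf, n_bins):
    # Equal-width confidence bins; conf == 1.0 falls into the last bin
    return np.minimum((conf * n_bins).astype(int), n_bins - 1)


def build_reliability_table(pred, conf, truth, n_classes=4, n_bins=10):
    """Empirical accuracy per (predicted class, confidence bin) on validation pixels."""
    bins = _bin_index(conf, n_bins)
    table = np.full((n_classes, n_bins), np.nan)  # NaN marks empty cells
    for c in range(n_classes):
        for b in range(n_bins):
            mask = (pred == c) & (bins == b)
            if mask.any():
                table[c, b] = np.mean(truth[mask] == c)
    return table


def lookup(table, pred, conf):
    """Calibrated reliability r_i for each pixel."""
    return table[pred, _bin_index(conf, table.shape[1])]


def gated_decision(c_t, c_s, r_t, r_s, ci, delta=0.02, q=0.80):
    """Eq. (14) applied only to disagreeing pixels inside high-conflict areas."""
    high_conflict = ci >= np.quantile(ci, q)  # top 20% of CI values
    # A NaN reliability makes the comparison False, so the teacher is kept
    use_student = (c_t != c_s) & high_conflict & ((r_s - r_t) > delta)
    return np.where(use_student, c_s, c_t)
```

Pixels where both models agree, and disagreeing pixels in low-conflict areas, keep the teacher's class; the student's class is adopted only when all three conditions hold at once.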

2.7. Baseline Methods

To comprehensively evaluate the effectiveness and applicability of CADF-Net, we compared it with six representative baseline methods spanning rule-based decision fusion, evidence-based probabilistic reasoning, conventional machine learning, and deep learning. All comparative experiments used identical preprocessing procedures and the same training and validation samples to ensure fair and directly comparable results.

Majority Voting was employed as a rule-based fusion strategy that aggregates class labels from multiple inputs and assigns the most frequent class as the final output [62]. It requires no training and is computationally efficient, serving as a straightforward baseline for fusion.
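A per-pixel majority vote over the input products can be written in a few lines. The tie-breaking rule is not stated in the text; in this sketch ties go to the lowest class id, which is purely an assumption of the example.

```python
import numpy as np


def majority_vote(products, n_classes=4):
    """products: (K, H, W) integer labels; returns the (H, W) modal class.

    Assumption: ties are resolved in favour of the lowest class id.
    """
    products = np.asarray(products)
    # Count votes per class at every pixel: shape (n_classes, H, W)
    counts = np.stack([(products == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)


# Five products, one row of two pixels
votes = np.array([[[0, 3]], [[1, 3]], [[1, 0]], [[2, 0]], [[1, 2]]])
fused = majority_vote(votes)  # pixel 0 -> class 1; pixel 1 is a 0/3 tie -> class 0
```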

The Dempster–Shafer theory of evidence (D–S) represents uncertainty and discrepancies among sources using belief and plausibility measures, quantifying disagreement via a conflict coefficient [63]. In this study, classification outputs were converted to Basic Probability Assignments (BPAs) and fused using the D–S combination rule; the final class was determined by the maximum-BPA criterion. This method serves as a classical evidence-combination baseline for probabilistic fusion.
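The text does not specify how class labels were converted to BPAs, so the following is only a simplified sketch of Dempster's combination rule. It assumes every source assigns mass exclusively to single classes, with no mass on compound sets. Under that assumption the rule reduces to a normalized product. The example masses are hypothetical.

```python
import numpy as np


def dempster_combine_singletons(bpas):
    """Combine K BPAs whose focal elements are single classes.

    bpas: (K, C) array, each row sums to 1.
    Returns (fused masses, conflict coefficient).
    """
    joint = np.prod(np.asarray(bpas, dtype=np.float64), axis=0)
    agreement = joint.sum()      # mass on which all sources agree
    conflict = 1.0 - agreement   # remaining mass is conflicting
    if agreement <= 0.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return joint / agreement, conflict


m1 = [0.7, 0.1, 0.1, 0.1]  # hypothetical masses for two sources
m2 = [0.6, 0.2, 0.1, 0.1]
fused_mass, k = dempster_combine_singletons([m1, m2])
fused_class = int(fused_mass.argmax())  # maximum-BPA criterion
```

Note that if each source placed all of its mass on one class and the sources disagreed, the product would be zero and the rule would be undefined, which is why some smoothing or discounting of the BPAs is generally needed in practice.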

Random Forest (RF) was used as a representative ensemble classifier that aggregates predictions from multiple decision trees [64]. We employed 500 trees with the Gini index as the split criterion. Inputs were 20-dimensional one-hot features derived from the five input land-cover products, and outputs corresponded to the four target classes. This baseline evaluates the performance of a conventional machine-learning approach.

For the deep-learning comparison, we trained a Standard U-Net to assess the performance of a conventional semantic segmentation architecture. The input comprised only the 20-channel one-hot encodings of the five land-cover products, excluding geo-environmental factors and the CI. The model was trained using weighted cross-entropy, and its core architecture matched that of the AD-UNet.

To isolate the contribution of geo-environmental information, we also evaluated a U-Net with geo-environmental factors (U-Net (Geo)). We extended the Standard U-Net by adding five geo-environmental factors as inputs, resulting in a 25-channel input. This baseline evaluates the extent to which geo-environmental context improves discrimination. Note that a baseline simply concatenating the CI was excluded, as the CI in our framework functions as a specific guidance signal coupled with the distillation and ensemble mechanisms. We further included DeepLabV3+ as an additional recent deep learning baseline and trained it under the same data split and evaluation protocol to strengthen the comparison with modern semantic segmentation approaches.

2.8. Accuracy Assessment and Spatial Evaluation

To rigorously evaluate the performance and applicability of CADF-Net, we conducted a comprehensive statistical analysis using an independent test set (N = 400), covering both overall performance and class-level discrimination. Specifically, we computed Overall Accuracy (OA), Kappa coefficient (Kappa), and Mean Intersection over Union (mIoU) to assess overall performance. For class-level discrimination (targeting cropland, forestland, and grassland), Producer Accuracy (PA), User Accuracy (UA), F1 Score (F1), and class-wise Intersection over Union (IoU) were calculated. PA and UA quantify errors of omission and commission, respectively; F1 represents the harmonic mean of PA and UA; and IoU measures the spatial overlap for a given class. The specific calculation formulas for these metrics are provided in Table 2.

In these formulas, C denotes the number of classes, N is the total number of test samples, p_ii is the number of correctly classified samples for class i, and p_ij and p_ji denote the number of samples of class i misclassified as class j and samples of class j misclassified as class i, respectively. For Kappa, p_o represents the observed agreement rate, which is equivalent to the OA, and p_e denotes the expected agreement rate by chance.

To statistically assess the preservation of spatial patterns in the fusion results, we evaluated spatial consistency from two perspectives: pixel-wise agreement and distribution similarity. Pixel-wise agreement was computed as the proportion of pixels assigned to the same class between CADF-LC and each input land-cover product, characterizing the spatial correspondence among datasets [65]. Distribution similarity was assessed by comparing the class-wise pixel-frequency distributions between CADF-LC and each input product, explicitly quantifying the consistency in overall class composition [66].
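The two consistency measures can be illustrated as follows. Pixel-wise agreement follows directly from the definition above. The exact similarity measure used for the class-frequency distributions is not given in this section, so cosine similarity is used here purely as a stand-in assumption.

```python
import numpy as np


def pixel_agreement(a, b):
    """Share of pixels assigned to the same class in both maps."""
    return float(np.mean(np.asarray(a) == np.asarray(b)))


def class_frequencies(labels, n_classes=4):
    counts = np.bincount(np.asarray(labels).ravel(), minlength=n_classes)
    return counts / counts.sum()


def distribution_similarity(a, b, n_classes=4):
    """Cosine similarity of class-frequency vectors (assumed measure)."""
    fa = class_frequencies(a, n_classes)
    fb = class_frequencies(b, n_classes)
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb)))


map_a = np.array([[0, 1], [2, 3]])
map_b = np.array([[1, 0], [3, 2]])
agree = pixel_agreement(map_a, map_b)        # no pixel matches
sim = distribution_similarity(map_a, map_b)  # identical class composition
```

The toy example shows why both measures are reported: two maps can share exactly the same class composition while disagreeing at every pixel.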

To provide a visual summary of the methodology described above, the overall workflow of this study is illustrated in Figure 5.

3. Results

3.1. Accuracy and Spatial Consistency

Using the independent test set, we compared CADF-LC with the five input land-cover products (ESA CCI-LC, GLC-FCS30, MCD12Q1-IGBP, MCD12Q1-LCCS2, and Esri Land Cover) for quantitative accuracy. Table 3 summarizes these results. To further assess the reliability of the interpretation-based reference labels, we randomly selected 400 reference samples and re-interpreted them under the same interpretation protocol. We found that the overall agreement rate between the original and repeated interpretations was 95.25%, indicating high stability of the reference labels. This additional check provides quantitative evidence that interpretation uncertainty in the reference labels appears limited under the adopted interpretation protocol. Overall, CADF-LC achieved the best performance, with OA = 0.8600, Kappa = 0.8133, and mIoU = 0.7589, surpassing all input products. Compared with the strongest inputs (Esri Land Cover and GLC-FCS30), CADF-LC improved OA by 0.0150 and 0.1050, and mIoU by 0.0222 and 0.1570, respectively. Notably, CADF-LC is the only product achieving a Kappa coefficient exceeding 0.8, indicating superior classification reliability compared to individual inputs. Larger gains were observed relative to the other products: mIoU improved by 0.2502 compared to ESA CCI-LC, and by over 0.34 relative to both MCD12Q1 products, demonstrating the fusion strategy effectively enhances overall classification consistency.

At the class level, CADF-LC showed clear advantages for confusion-prone categories. Regarding cropland, CADF-LC yielded a PA and UA of 0.8400 (IoU = 0.7241, F1 = 0.8400). Compared to Esri Land Cover, although CADF-LC produced a slightly lower PA, it achieved a higher UA, indicating the fused product reduces commission errors and enhances prediction purity. In forestland regions, CADF-LC maintained high accuracy (PA = 0.8900, UA = 0.9674, IoU = 0.8641, F1 = 0.9271), comparable to Esri Land Cover. This suggests no significant performance loss in this highly separable category while maintaining exceptional user accuracy. Grassland is typically a weak point across input products, yet CADF-LC exhibited substantial improvement. With a PA of 0.7900, IoU of 0.6475, and F1 of 0.7861, CADF-LC outperformed Esri Land Cover, ESA CCI-LC, and GLC-FCS30. This effectively mitigates identification deficiency and boundary instability common in grassland mapping.

In summary, CADF-LC outperformed all input products across overall metrics and achieved more balanced performance at the class level. It significantly enhanced discriminative capability for grassland while preserving high accuracy for forestland. Although cropland recall declined slightly, the improvements in prediction purity and overlap metrics highlight the fusion method’s advantages in reducing class confusion and improving spatial consistency and reliability.

Table 4 presents the pixel consistency and distribution similarity between CADF-LC and the input products. Distribution similarity remained consistently high across all products (0.9897–0.9996), with Esri Land Cover showing the highest alignment. This indicates that CADF-LC closely preserves the overall class composition at the regional scale without significant deviation. In contrast, pixel consistency exhibited pronounced variations (0.8334–0.9533), reflecting different levels of spatial agreement. Specifically, consistency was highest with Esri Land Cover (0.9533) but dropped below 0.84 for the MCD12Q1 products. GLC-FCS30 and ESA CCI-LC fell within this range, recording intermediate pixel consistency values of 0.8946 and 0.8628, respectively. Given that distribution similarity is consistently close to 1, these variations primarily reflect discrepancies in spatial allocation, highlighting the impact of local boundary delineation on spatial alignment.

3.2. Comparison with Baseline Methods

Figure 6, Figure 7 and Figure 8 systematically present the accuracy assessment results of various fusion methods on the independent test set. Regarding overall metrics (Figure 6), CADF-Net demonstrated significant performance superiority. Its OA, Kappa, and mIoU reached 0.8600, 0.8133, and 0.7589, respectively, ranking first among all compared methods. DeepLabV3+ yielded OA = 0.8574, Kappa = 0.8099, and mIoU = 0.7548, indicating that it is a highly competitive segmentation baseline. Specifically, compared to the Standard U-Net, CADF-Net achieved improvements of 1.25, 1.66, and 1.84 percentage points across the three metrics. Even when compared to U-Net (Geo), which incorporates auxiliary geo-environmental factors, CADF-Net maintained stable gains ranging from 0.75 to 1.22 percentage points. Compared with DeepLabV3+, CADF-Net still maintained absolute gains of 0.0026 in OA, 0.0034 in Kappa, and 0.0041 in mIoU. The comparison between the Standard U-Net and U-Net (Geo) also provides a partial empirical indication that the aligned multi-resolution geo-environmental variables contribute useful contextual support after resampling, rather than merely introducing disruptive scale artifacts. Taken together, these results suggest that the advantage of CADF-Net cannot be attributed merely to a standard segmentation architecture or simple feature stacking, but is related to its conflict-aware guidance, adaptive distillation strategy, and confidence-guided ensemble design. In contrast, while RF exhibited reliable classification performance (OA = 0.8450), the Majority Voting and D–S methods yielded significantly lower metrics (OA = 0.6900 and 0.6250, respectively). The performance gap was particularly pronounced in terms of mIoU, where the D–S method plummeted to 0.4235, indicating severe fragmentation in the classification results. 
This suggests that traditional strategies relying solely on pixel-level rule synthesis or evidence combination struggle to overcome accuracy bottlenecks in scenarios characterized by complex surface landscapes and significant discrepancies among input products.

To further clarify the error characteristics of CADF-LC on the independent test set, we summarized the class-wise prediction outcomes in terms of correct predictions, false positives (FP), and false negatives (FN). Among the 400 test samples, CADF-LC correctly identified 84 cropland, 89 forestland, 79 grassland, and 92 other samples. The corresponding FP/FN counts were 16/16 for cropland, 3/11 for forestland, 22/21 for grassland, and 15/8 for other. These results indicate that the remaining errors are concentrated mainly in grassland and cropland, whereas forestland and other remain relatively stable. In particular, grassland exhibits the largest FP and FN counts, confirming that it is still the most challenging category, which is consistent with the class-specific metrics reported above.
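As a consistency check, the overall and class-level metrics reported in this section can be recomputed from the counts listed above, using the standard definitions of OA, Kappa, PA, UA, F1, and IoU. Only the counts come from the text; the variable names are illustrative.

```python
correct = {"cropland": 84, "forestland": 89, "grassland": 79, "other": 92}
fp = {"cropland": 16, "forestland": 3, "grassland": 22, "other": 15}
fn = {"cropland": 16, "forestland": 11, "grassland": 21, "other": 8}
n = 400

oa = sum(correct.values()) / n  # observed agreement p_o

# Reference totals are correct + FN; predicted totals are correct + FP
pe = sum((correct[c] + fn[c]) * (correct[c] + fp[c]) for c in correct) / n**2
kappa = (oa - pe) / (1 - pe)

pa = {c: correct[c] / (correct[c] + fn[c]) for c in correct}  # producer accuracy
ua = {c: correct[c] / (correct[c] + fp[c]) for c in correct}  # user accuracy
f1 = {c: 2 * pa[c] * ua[c] / (pa[c] + ua[c]) for c in correct}
iou = {c: correct[c] / (correct[c] + fp[c] + fn[c]) for c in correct}
miou = sum(iou.values()) / len(iou)

print(f"OA={oa:.4f}  Kappa={kappa:.4f}  mIoU={miou:.4f}")
# prints: OA=0.8600  Kappa=0.8133  mIoU=0.7589
```

The recomputed values match the reported OA, Kappa, and mIoU, as well as the class-level figures such as forestland UA (0.9674) and grassland F1 (0.7861).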

The distribution of PA and UA at the category level (Figure 7) further elucidates the trade-offs between errors of omission and commission across different methods. For cropland, most methods exhibited a “high PA, low UA” pattern, indicating a tendency to prioritize recall at the expense of precision. This trend is also mirrored in the forestland category, where majority voting and D–S suffered from severe commission errors compared to the deep learning methods. In contrast, CADF-Net achieved a balanced high performance in both PA and UA (both 0.8400), effectively suppressing commission errors while ensuring extraction integrity. In the most challenging grassland category, discrepancies between methods were sharply amplified. The D–S method displayed an extreme “high UA (1.0000), low PA (0.0400)” pattern, revealing an overly conservative decision logic that resulted in severe omission of grassland pixels. Conversely, CADF-Net achieved the most balanced accuracy for grassland (PA = 0.7900, UA = 0.7822), demonstrating strong robustness in handling easily confused classes. DeepLabV3+ remained highly competitive at the class level and slightly surpassed CADF-Net on a few individual indicators, such as forestland PA and grassland IoU/F1. However, CADF-Net provided the strongest overall performance and a more stable trade-off across categories, particularly in terms of OA, Kappa, mIoU, cropland discrimination, and overall class balance.

Combined with the heatmap analysis of IoU and F1 (Figure 8), it is evident that CADF-Net’s performance gains are primarily driven by the precise identification of difficult-to-classify categories. RF achieved a marginal lead in the highly separable forestland category (IoU = 0.8725), suggesting that classification accuracy for this category is approaching saturation. However, in cropland, grassland, and other, which are characterized by stronger spatial heterogeneity, CADF-Net achieved IoU values of 0.7241, 0.6475, and 0.8000, respectively. These scores were consistently at the optimal or near-optimal levels among the compared methods. This indicates that CADF-Net’s mIoU advantage does not stem from an anomaly in a single category, but rather from a comprehensive enhancement in identifying complex and error-prone categories. In summary, the results from Figure 6, Figure 7 and Figure 8 consistently demonstrate that CADF-Net not only leads in overall statistical metrics but also exhibits superiority as a fusion model in terms of class balance and adaptability to challenging samples.

3.3. Spatial Pattern Analysis

Figure 9 illustrates the spatial distribution of the CADF-LC and the corresponding CI. At the regional scale, CADF-LC shows strong visual coherence, with reduced local noise while preserving the overall landscape structure. It clearly delineates the contiguous cropland matrix in the Sanjiang Plain (China) and the broad forest tracts in Primorskiy Kray (Russia), yielding an interpretable cross-border land-cover configuration and indicating effective reconciliation of multi-source discrepancies. Notably, the scattered grassland patches are distinct from the surrounding cropland, effectively avoiding the over-merging often seen in these spectrally similar classes.

At finer scales, CADF-LC provides continuous delineations in transition-rich environments. Complex boundaries—particularly along the river corridors in the lowland Sanjiang Plain on the Chinese side and the hilly terrain surrounding Lake Khanka in western Primorskiy Kray on the Russian side—are represented with smooth yet realistic curvature, mitigating the fragmentation commonly observed in single-source products. In these areas, CADF-LC retains transitional textures rather than producing abrupt edges, suggesting that spatial context is effectively leveraged to regularize pixel-level predictions. Meanwhile, the map does not exhibit excessive over-smoothing, as narrow linear features and small but plausible patch assemblages remain visible in ecotonal zones. Such morphological fidelity ensures that the fused product reflects the genuine spatial complexity of the terrain rather than distortions resulting from resolution mismatches among source data.

The CI map further indicates that inter-product disagreements are spatially clustered rather than random. High-conflict pixels form aggregated belts that are mainly concentrated in agro-pastoral ecotones and some near-border segments, which are likely associated with mixed land-cover composition and ambiguous boundaries. Conversely, low CI values dominate extensive homogeneous regions, implying stronger inter-product consensus. Together, these patterns suggest that CADF-LC maintains stable mapping in consistent landscapes while providing improved behavior in high-conflict transition zones. To complement the visual interpretation of the CI map, we further quantified conflict contributions from both class and product perspectives. The results are summarized in Table 5. From the class perspective, grassland shows the highest mean CI, followed by Other, whereas forestland exhibits the lowest conflict level. From the product perspective, disagreement was measured as the mismatch rate between each input product and the consensus mode label. The resulting disagreement rates were 14.45% for MCD12Q1-IGBP, 21.80% for ESA CCI-LC, 13.88% for Esri Land Cover, 13.57% for GLC-FCS30, and 26.13% for MCD12Q1-LCCS2. These results show that conflict contributions are not evenly distributed across products, with MCD12Q1-LCCS2 and ESA CCI-LC showing the highest disagreement rates. These quantitative results provide additional support that the proposed conflict modeling strategy is both explicitly defined and empirically grounded.

To further evaluate the performance of CADF-LC in representing fine-grained details and complex transitions, we selected eight representative sub-regions for local-scale comparative analysis (Figure 10), including (a) a river valley agricultural zone, (b) a cropland–forestland interlaced area, (c) an urban and peri-urban area, (d) a near-border segment, (e) lake-shore surroundings, (f) a plain transition zone, (g) an area with distinct fluvial features, and (h) a cross-border transition zone. In river valley agricultural zones and areas with distinct fluvial features (Figure 10a,g), input products often homogenize riparian structures. Particularly in narrow channel scenarios, the MCD12Q1 products fail to preserve river features, leading to their near-complete omission. In contrast, CADF-LC accurately reconstructs the continuous river trajectory, aligning closely with reference imagery. While GLC-FCS30 and Esri Land Cover capture channel alignment, they are often prone to boundary fragmentation, whereas CADF-LC strikes a balance by delineating riparian belts with superior continuity. A similar advantage is observed in cropland–forestland interlaced areas and plain transition zones (Figure 10b,f). Where MCD12Q1 and ESA CCI-LC are overly smoothed and GLC-FCS30 lacks boundary coherence, CADF-LC maintains structural integrity without sacrificing local detail. In anthropogenic and aquatic environments, the proposed method demonstrates significant improvements. Specifically, in urban and peri-urban regions (Figure 10c), GLC-FCS30 tends to display fragmented pixel patterns, often referred to as the “salt-and-pepper” effect, while MCD12Q1 presents blocky structures inconsistent with the actual urban layout. CADF-LC effectively mitigates these artifacts, yielding urban boundaries that are spatially continuous. 
This robustness extends to lake surroundings (Figure 10e), where CADF-LC reduces the unstable pixel patterns observed in other products, providing precise shoreline positioning. Furthermore, in near-border segments and cross-border transition zones (Figure 10d,h), where inconsistent mapping standards frequently create sharp discontinuities across administrative boundaries, CADF-LC improves spatial consistency, producing seamless transitions. Synthesizing this analysis, discrepancies among datasets primarily manifest in patch scale and boundary regularity. CADF-LC effectively bridges the extremes of excessive smoothing and morphological fragmentation, ensuring coherent spatial connectivity—particularly in complex transition zones.

3.4. Evaluation of Different Conflict Index Settings

To further verify the role of the Conflict Index (CI), we compared the model performance under five CI settings: Base (No CI), C1 only, C2 only, C3 only, and Full CI (CADF-Net). Here, C1 denotes inter-product class disagreement, C2 denotes local neighborhood heterogeneity, and C3 denotes class-assignment uncertainty, consistent with the CI definition in Section 2.4. As shown in Table 6, removing the CI channel led to the lowest performance, whereas the full CI achieved the best overall results (OA = 0.8600, Kappa = 0.8133, and mIoU = 0.7589), outperforming the Base setting and all three single-component variants. Among the individual components, C2 yielded the strongest single-component contribution, while C1 and C3 provided only modest gains. These results indicate that the three CI components are complementary rather than redundant, and that integrating them yields the optimal CI scheme.

4. Discussion

4.1. Methodological Effectiveness and Mechanism Analysis

Using the Sanjiang Plain (China) and Primorskiy Kray (Russia) as a representative cross-border region, we developed and evaluated CADF-Net for multi-source land-cover fusion. Compared with individual input products and representative fusion baselines, CADF-LC achieved the strongest overall performance, with particularly stable behavior in spatially heterogeneous landscapes. These improvements stem from the coordinated interaction of conflict-aware modeling, adaptive distillation, decision-level weighting, and environmental constraints. Importantly, the contribution of CADF-Net is not a simple combination of existing distillation and ensemble concepts, but a conflict-centered integration strategy in which discrepancy characterization, uncertainty-aware knowledge transfer, and spatially gated decision fusion are explicitly linked within one unified framework. This clearer positioning is further supported by the additional comparison with DeepLabV3+, under which CADF-Net still achieved the highest OA, Kappa, and mIoU, indicating that explicitly modeling inter-product conflict provides benefits beyond generic multi-scale feature extraction alone.

Relative to conventional rule-based approaches such as majority voting and Dempster–Shafer (D–S) evidence combination, CADF-Net does not rely on pixel-wise label aggregation or static prior specifications. Instead, it learns cross-product consistencies and discrepancies directly from multi-channel inputs. Rule-based and evidence-combination strategies typically treat pixels independently and make limited use of spatial semantic dependencies [67], which can amplify fragmented errors in transition-rich and heterogeneous landscapes. In contrast, CADF-Net leverages convolutional feature extraction to jointly model land-cover characteristics and spatial context, preserving within-class continuity and enhancing classification stability in complex regions.

A key contribution of CADF-Net is the explicit representation of inter-product disagreement through the Conflict Index (CI). Unlike global probabilistic inference methods that rely on predefined priors or transition matrices [68], the CI integrates mode frequency, neighborhood heterogeneity, and entropy into a unified spatial indicator of conflict intensity. This mechanism relaxes the implicit assumption of uniform learning difficulty across pixels [69] by distinguishing stable regions from discrepancy-prone areas. Consequently, the network allocates greater modeling capacity to hard-to-classify locations while maintaining coherent global spatial patterns.

The adaptive distillation mechanism further strengthens conflict-aware learning. In high-consistency regions, the teacher model provides stable probabilistic supervision, encouraging smooth and reliable decision boundaries. In contrast, entropy-derived pixel-wise distillation weights attenuate the influence of low-confidence teacher predictions in high-uncertainty areas, allowing the student model to capture fine-grained local patterns without inheriting systematic bias. This spatially differentiated transfer strategy is particularly suitable for multi-source fusion tasks characterized by structured disagreement, outperforming conventional distillation schemes that apply uniform knowledge transfer across the scene.
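The idea of entropy-derived pixel-wise distillation weights can be sketched as follows. The weight form (one minus the normalized teacher entropy), the temperature value, and all names are assumptions for illustration; the sketch uses plain NumPy rather than a training framework and is not the loss implemented in CADF-Net.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def entropy_weighted_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Pixel-weighted KL(teacher || student) for (C, H, W) logit arrays."""
    eps = 1e-12
    n_classes = teacher_logits.shape[0]
    p_t = softmax(teacher_logits / temperature)
    p_s = softmax(student_logits / temperature)
    # Normalized teacher entropy in [0, 1]; 1 means a uniform, uninformative teacher
    h_t = -(p_t * np.log(p_t + eps)).sum(axis=0) / np.log(n_classes)
    # Assumed weight: confident teacher pixels get weights near 1, uncertain ones near 0
    weight = 1.0 - h_t
    kl = (p_t * (np.log(p_t + eps) - np.log(p_s + eps))).sum(axis=0)
    return float((weight * kl).mean()), weight

# Toy logits (hypothetical): pixel 0 has a confident teacher, pixel 1 a uniform one
teacher = np.array([[[6.0, 0.0]], [[0.0, 0.0]], [[0.0, 0.0]]])
student = np.zeros((3, 1, 2))
loss, w = entropy_weighted_kd_loss(teacher, student)
self_loss, _ = entropy_weighted_kd_loss(teacher, teacher)
```

With this weighting, the uniform teacher pixel contributes almost nothing to the distillation term, so the student is free to follow the hard labels there, while the confident pixel is still pulled toward the teacher distribution.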

In addition, the confidence-guided dynamically weighted ensemble enhances robustness at the decision level. Rather than applying static or globally fixed weights, the proposed strategy adaptively assigns fusion weights according to spatially varying confidence and conflict intensity. In stable regions, high-confidence predictions dominate the final decision, reinforcing reliable spatial structures. In high-conflict areas, the ensemble mechanism prevents over-reliance on any single source, thereby reducing error propagation and mitigating abrupt boundary artifacts. This spatially adaptive aggregation complements feature-level modeling and distillation, ensuring that final classifications reflect both local reliability and cross-source consensus.
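A minimal sketch of such spatially varying weighting is given below. It assumes that each ensemble member outputs per-pixel class probabilities, that confidence is the maximum class probability, and that the weights are pulled linearly toward uniform as the CI increases. These choices and all names are hypothetical and are intended only to illustrate the behaviour described above.

```python
import numpy as np

def confidence_guided_fusion(probs: np.ndarray, ci: np.ndarray) -> np.ndarray:
    """Fuse (M, C, H, W) class probabilities using an (H, W) conflict map in [0, 1]."""
    m = probs.shape[0]
    confidence = probs.max(axis=1)  # (M, H, W): per-member confidence
    conf_w = confidence / confidence.sum(axis=0, keepdims=True)
    uniform_w = np.full_like(conf_w, 1.0 / m)
    # Low CI: confidence-driven weights. High CI: weights approach uniform,
    # so that no single member dominates in contested areas.
    weights = (1.0 - ci) * conf_w + ci * uniform_w  # sums to 1 over members
    return (weights[:, None, :, :] * probs).sum(axis=0)  # (C, H, W)

# Toy case (hypothetical): two members, two classes, one pixel
probs = np.array([
    [[[0.9]], [[0.1]]],  # member 0: confident in class 0
    [[[0.4]], [[0.6]]],  # member 1: weakly prefers class 1
])
fused_low = confidence_guided_fusion(probs, ci=np.array([[0.0]]))
fused_high = confidence_guided_fusion(probs, ci=np.array([[1.0]]))
```

In the low-conflict case the confident member receives the larger weight (fused class probabilities 0.70 and 0.30), whereas in the high-conflict case the result reduces to the simple average (0.65 and 0.35).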

Finally, the integration of geo-environmental factors enriches the feature space and constrains class decisions to maintain ecological plausibility. According to ecological niche theory, vegetation distribution is closely linked to topography, temperature, and moisture conditions. For example, steep slopes generally limit cropland expansion, whereas high soil moisture is often associated with grassland formation in low-lying areas [70]. By incorporating such environmental context, CADF-Net reinforces class separability when spectral cues are ambiguous, thereby improving the stability and reliability of land-cover discrimination in heterogeneous landscapes.

Overall, the effectiveness of CADF-Net arises not from a single component, but from the coordinated interaction of conflict characterization, adaptive knowledge transfer, spatially differentiated decision aggregation, and ecological constraints. This integrated mechanism enables the framework to achieve both strong local discrimination and robust cross-regional spatial consistency in complex cross-border environments. In the current manuscript, interpretability is discussed mainly at the mechanism and spatial-response levels: the CI explicitly identifies conflict-prone areas, the confidence-guided ensemble provides a traceable decision logic for stable versus high-conflict regions, and the local comparisons in Figure 9 and Figure 10 reveal how the framework behaves in representative transition zones.

4.2. Spatial Consistency and Complementary Effects

The quantitative results indicate that the advantage of CADF-LC is not limited to a marginal improvement in overall accuracy, but also lies in how the fusion result reorganizes spatial information under multi-source constraints. As shown in Table 3, CADF-LC achieved higher OA, Kappa, and mIoU than every input product, indicating that the proposed framework improves both global classification performance and class-level balance. In particular, the improvement in grassland is noteworthy, because this class is typically more vulnerable to omission, fragmentation, and boundary instability in heterogeneous landscapes. By contrast, forestland remained comparatively stable, suggesting that the fusion process preserved the strong separability of this class while improving the discrimination of more confusion-prone categories. This pattern is consistent with previous studies showing that grassland and transitional vegetation classes are often more difficult to map reliably than structurally dominant classes such as forestland [71].

At the regional scale, Table 4 shows that CADF-LC maintained very high distribution similarity with all input products (0.9897 to 0.9996), while pixel-wise agreement varied more widely across datasets (0.8334 to 0.9533). This combination suggests that the fused product largely preserves the overall class composition of the study area, but does not simply replicate the local spatial allocation of any single input product. Instead, CADF-LC can be understood as a spatial reorganization of multi-source information, in which regional-scale composition is retained while local boundaries and patch configurations are adjusted. Similar patterns have been reported in previous land-cover comparison studies, where high consistency in overall composition may coexist with considerable differences in local boundary placement and patch morphology [72].

The spatial results in Figure 9 and Figure 10 further help explain why these statistical improvements occur. Figure 9 shows that high-CI areas are concentrated mainly in ecotonal and boundary-rich landscapes, indicating that inter-product disagreement is spatially structured rather than random. In these areas, the local comparisons in Figure 10 suggest that CADF-LC performs better at balancing two competing tendencies commonly observed in input products: over-smoothing in coarse or conservative products, and excessive fragmentation in finer but less stable products [73]. This is especially evident in river valleys, urban fringes, lakeshore environments, and cross-border transition zones, where CADF-LC preserves key boundary structures while maintaining greater spatial continuity. Therefore, the practical significance of CADF-LC lies not only in its higher numerical accuracy, but also in its ability to produce a more coherent and ecologically plausible spatial pattern in complex cross-border landscapes.

4.3. Limitations and Future Work

Although CADF-Net achieved improved accuracy and spatial consistency in the study region, several limitations should be acknowledged. First, the present study evaluated CADF-Net only in one representative cross-border region located in a mid-to-high-latitude monsoon environment. Therefore, the current results should be interpreted as evidence of the framework’s effectiveness in this specific ecological setting rather than definitive proof of universal applicability. In lower-latitude regions, as well as in southern or western border areas with different vegetation phenology, cropping systems, background reflectance, and class-confusion mechanisms, the transferability and robustness of the framework still require further validation. Second, to reconcile inter-source differences in class definitions, we adopted a unified classification scheme centered on cropland, forestland, and grassland. While such harmonization is necessary for consistent fusion and facilitates learning shared representations across products, it inherently reduces thematic granularity and may constrain downstream applications that require sub-class information. Third, model performance depends on the quality and bias characteristics of the input land-cover products: systematic biases shared by multiple sources may influence CI estimation and, in turn, affect fusion decisions. Fourth, although the additional label-consistency assessment indicated high stability of the reference labels, the labels are still derived primarily from visual interpretation rather than fully independent ground-truth observations, and some residual interpretation uncertainty may remain. Our re-interpretation results suggest that this uncertainty is limited in magnitude and is unlikely to affect the overall conclusions of the comparative evaluation; in future work, we will further improve reference-label reliability by incorporating additional independent expert checks and richer ground-reference support where available.
Finally, we used equal weights for the three CI components (inter-product disagreement, local neighborhood heterogeneity, and uncertainty) to avoid introducing subjective assumptions; however, their relative contributions may vary across ecological regions and landscape configurations. Future work could incorporate independent validation samples or region-specific accuracy feedback to learn component weights in a data-driven manner and enable zonal adaptive optimization, thereby improving the robustness and transferability of conflict characterization.

Future research could extend CADF-Net in at least four directions. (1) Additional auxiliary data sources (e.g., LiDAR, hyperspectral imagery, or socio-economic variables) could be integrated to enrich feature support for challenging classes and transition landscapes. (2) The framework could be generalized to long-term fusion to produce spatiotemporally consistent datasets that better support land-cover change analysis. (3) Hierarchical fusion strategies could be explored to reduce information loss from class harmonization, allowing the model to recover or infer sub-classes (e.g., evergreen vs. deciduous forestland) where high-confidence evidence is available, thereby yielding fused land-cover datasets with finer thematic detail. (4) Cross-regional validation should be conducted in ecologically contrasting border areas, particularly low-latitude regions and southern/western border zones, as well as arid and semi-arid environments, to further assess the robustness and transferability of CADF-Net.

5. Conclusions

This study proposed a Conflict-aware Adaptive Distillation Fusion Network (CADF-Net) to mitigate class-assignment biases and boundary inconsistencies for key vegetation classes when integrating multi-source land-cover products in cross-border environments. Using the Sanjiang Plain (China) and Primorskiy Kray (Russia) as a representative case, the proposed framework was systematically evaluated against multiple input products and baseline fusion approaches. The results demonstrate that CADF-Net consistently improves both overall and class-level accuracy, achieving the highest OA (0.8600), Kappa (0.8133), and mIoU (0.7589) among all compared methods. Compared with the strongest input product, Esri Land Cover, CADF-LC improved OA by 0.0150 and mIoU by 0.0222, while also yielding more coherent spatial patterns in transition-rich environments. Beyond statistical gains, the fused product (CADF-LC) exhibits enhanced spatial coherence, reduced fragmentation, and more stable boundary delineation in heterogeneous landscapes, particularly in cropland–forestland–grassland ecotones and cross-border transition zones. These improvements are driven by the coordinated integration of geo-environmental constraints and three complementary mechanisms. The Conflict Index (CI) provides an interpretable pixel-wise representation of inter-product disagreement, guiding the model to focus on discrepancy-prone areas. The adaptive distillation strategy preserves reliable knowledge in stable regions while mitigating error propagation in high-conflict pixels. The confidence-guided dynamically weighted ensemble further enhances robustness and spatial consistency during the final decision stage. Nevertheless, the performance of the framework remains influenced by the quality and compatibility of input land-cover products. In addition, harmonizing heterogeneous classification legends into a unified scheme inevitably reduces thematic granularity. 
Overall, CADF-Net provides a practical and robust solution for multi-source land-cover fusion of key vegetation classes in complex cross-border regions, offering improved spatial consistency and classification stability to support transboundary ecological monitoring and land management applications. Future work will focus on cross-regional validation in ecologically contrasting border areas, particularly low-latitude and arid/semi-arid regions, while also incorporating multi-temporal features, additional environmental and socio-economic variables, and hierarchical fusion strategies to further improve robustness, transferability, and thematic detail.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs18091294/s1, Table S1: Summary of global and supranational land-cover datasets reviewed in this study; Table S2: Class mapping rules for harmonizing land-cover legends.

Author Contributions

Y.Z.: Methodology, Software, Formal analysis, Funding acquisition, Project administration; L.F.: Writing—original draft, Software, Formal analysis, Validation, Data curation; Z.L.: Writing—review and editing, Supervision; Y.Y.: Writing—review and editing, Investigation; H.C.: Conceptualization, Data curation, Formal analysis; S.Z.: Writing—review and editing, Conceptualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Basic Resources Investigation Program of China (Grant No. 2022FY101901-2).

Data Availability Statement

The data generated and analyzed in this study are available free of charge upon request from the corresponding author via email.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Aplin, P. Remote sensing: Land cover. Prog. Phys. Geogr. 2004, 28, 283–293.
2. Liu, L.; Zhang, X.; Gao, Y.; Chen, X.; Shuai, X.; Mi, J. Finer-resolution mapping of global land cover: Recent developments, consistency analysis, and prospects. J. Remote Sens. 2021, 2021, 5289697.
3. Troy, A.; Wilson, M.A. Mapping ecosystem services: Practical challenges and opportunities in linking GIS and value transfer. Ecol. Econ. 2006, 60, 435–449.
4. Feng, M.; Bai, Y. A global land cover map produced through integrating multi-source datasets. Big Earth Data 2019, 3, 191–219.
5. Fritz, S.; You, L.; Bun, A.; See, L.; McCallum, I.; Schill, C.; Perger, C.; Liu, J.; Hansen, M.; Obersteiner, M. Cropland for sub-Saharan Africa: A synergistic approach using five land cover data sets. Geophys. Res. Lett. 2011, 38, L04404.
6. Ramankutty, N.; Evan, A.T.; Monfreda, C.; Foley, J.A. Farming the planet: 1. Geographic distribution of global agricultural lands in the year 2000. Glob. Biogeochem. Cycles 2008, 22, GB1003.
7. Xu, X.; Li, D.; Liu, H.; Zhao, G.; Cui, B.; Yi, Y.; Yang, W.; Du, J. Comparative validation and misclassification diagnosis of 30-meter land cover datasets in China. Remote Sens. 2024, 16, 4330.
8. Foley, J.A.; DeFries, R.; Asner, G.P.; Barford, C.; Bonan, G.; Carpenter, S.R.; Chapin, F.S.; Coe, M.T.; Daily, G.C.; Gibbs, H.K.; et al. Global consequences of land use. Science 2005, 309, 570–574.
9. Peng, H.; Zhang, X.; Ren, W.; He, J. Spatial pattern and driving factors of cropland ecosystem services in a major grain-producing region: A production-living-ecology perspective. Ecol. Indic. 2023, 155, 111024.
10. Pergola, M.; De Falco, E.; Cerrato, M. Grassland ecosystem services: Their economic evaluation through a systematic review. Land 2024, 13, 1143.
11. Psistaki, K.; Tsantopoulos, G.; Paschalidou, A.K. An overview of the role of forests in climate change mitigation. Sustainability 2024, 16, 6089.
12. Senf, C.; Leitão, P.J.; Pflugmacher, D.; van der Linden, S.; Hostert, P. Mapping land cover in complex Mediterranean landscapes using Landsat: Improved classification accuracies from integrating multi-seasonal and synthetic imagery. Remote Sens. Environ. 2015, 156, 527–536.
13. Song, J.; Hu, S.; Sun, Z.; Wang, Y.; Liang, X.; Yang, Z.; Liao, Z. Assessing spatiotemporal dynamics of poplar plantation in Northern China’s farming-pastoral ecotone (1989–2022). Forests 2025, 16, 1502.
14. Kuemmerle, T.; Radeloff, V.C.; Perzanowski, K.; Hostert, P. Cross-border comparison of land cover and landscape pattern in Eastern Europe using a hybrid classification technique. Remote Sens. Environ. 2006, 103, 449–464.
15. Piquer-Rodríguez, M.; Gasparri, N.I.; Zarbá, L.; Aráoz, E.; Grau, H.R. Land systems’ asymmetries across transnational ecoregions in South America. Sustain. Sci. 2021, 16, 1519–1538.
16. Tsendbazar, N.; de Bruin, S.; Fritz, S.; Herold, M. Spatial accuracy assessment and integration of global land cover datasets. Remote Sens. 2015, 7, 15804–15821.
17. Tsendbazar, N.; Herold, M.; Li, L.; Tarko, A.; de Bruin, S.; Masiliunas, D.; Lesiv, M.; Fritz, S.; Buchhorn, M.; Smets, B.; et al. Towards operational validation of annual global land cover maps. Remote Sens. Environ. 2021, 266, 112686.
18. Clinton, N.; Yu, L.; Gong, P. Geographic stacking: Decision fusion to increase global land cover map accuracy. ISPRS J. Photogramm. Remote Sens. 2015, 103, 57–65.
19. Zhang, W.; Wang, J.; Lin, H.; Cong, M.; Wan, Y.; Zhang, J. Fusing multiple land cover products based on locally estimated map-reference cover type transition probabilities. Remote Sens. 2023, 15, 481.
20. Wang, H.; Hu, Y.; Feng, Z. Fusion and analysis of land use/cover datasets based on Bayesian-fuzzy probability prediction: A case study of the Indochina Peninsula. Remote Sens. 2022, 14, 5786.
21. Pérez-Hoyos, A.; Udías, A.; Rembold, F. Integrating multiple land cover maps through a multi-criteria analysis to improve agricultural monitoring in Africa. Int. J. Appl. Earth Obs. Geoinf. 2020, 88, 102064.
22. Bethuel, C.; Arvor, D.; Corpetti, T.; Hélie, J.; Descals, A.; Gaveau, D.; Chéron-Bessou, C.; Gignoux, J.; Corgne, S. Applying the Dempster–Shafer fusion theory to combine independent land-use maps: A case study on the mapping of oil palm plantations in Sumatra, Indonesia. Remote Sens. 2025, 17, 234.
23. Schepaschenko, D.; See, L.; Lesiv, M.; McCallum, I.; Fritz, S.; Salk, C.; Moltchanova, E.; Perger, C.; Shchepashchenko, M.; Shvidenko, A.; et al. Development of a global hybrid forest mask through the synergy of remote sensing, crowdsourcing and FAO statistics. Remote Sens. Environ. 2015, 162, 208–220.
24. Nabil, M.; Zhang, M.; Wu, B.; Bofana, J.; Elnashar, A. Constructing a 30m African cropland layer for 2016 by integrating multiple remote sensing, crowdsourced, and auxiliary datasets. Big Earth Data 2022, 6, 54–76.
25. Herold, M.; See, L.; Tsendbazar, N.; Fritz, S. Towards an integrated global land cover monitoring and mapping system. Remote Sens. 2016, 8, 1036.
26. Tuanmu, M.; Jetz, W. A global 1-km consensus land-cover product for biodiversity and ecosystem modelling. Glob. Ecol. Biogeogr. 2014, 23, 1031–1045.
27. Jia, S.; Yang, Y. Spatiotemporal characteristics and driving factors of land-cover change in the Heilongjiang (Amur) River Basin. Remote Sens. 2023, 15, 3730.
28. Zou, Y.; Duan, X.; Xue, Z.; E, M.; Sun, M.; Lu, X.; Jiang, M.; Yu, X. Water use conflict between wetland and agriculture. J. Environ. Manag. 2018, 224, 140–146.
29. Li, M.; Zhang, R.; Luo, H.; Gu, S.; Qin, Z. Crop mapping in the Sanjiang Plain using an improved object-oriented method based on Google Earth Engine and combined growth period attributes. Remote Sens. 2022, 14, 273.
30. Wang, H.; Song, C.; Song, K. Regional ecological risk assessment of wetlands in the Sanjiang Plain with respect to human disturbance. Sustainability 2020, 12, 1974.
31. Jin, S.; Liu, X.; Yang, J.; Lv, J.; Gu, Y.; Yan, J.; Yuan, R.; Shi, Y. Spatial-temporal changes of land use/cover change and habitat quality in Sanjiang plain from 1985 to 2017. Front. Environ. Sci. 2022, 10, 1032584.
32. Hu, Y.; Hu, Y. Detecting forest disturbance and recovery in Primorskiy Kray, Russia, using annual Landsat time series and multi-source land cover products. Remote Sens. 2020, 12, 129.
33. Marchuk, E.A.; Kvitchenko, A.K.; Kameneva, L.A.; Yuferova, A.A.; Kislov, D.E. East Asian forest-steppe outpost in the Khanka Lowland (Russia) and its conservation. J. Plant Res. 2024, 137, 997–1018.
34. Dubrovin, K.; Stepanov, A.; Verkhoturov, A. Cropland mapping using Sentinel-1 data in the southern part of the Russian Far East. Sensors 2023, 23, 7902.
35. García-Álvarez, D.; Camacho Olmedo, M.T.; Paegelow, M.; Mas, J.F. (Eds.) Land Use Cover Datasets and Validation Tools: Validation Practices with QGIS; Springer: Cham, Switzerland, 2022.
36. Zhang, X.; Zhao, T.; Xu, H.; Liu, W.; Wang, J.; Chen, X.; Liu, L. GLC_FCS30D: The first global 30 m land-cover dynamics monitoring product with a fine classification system for the period from 1985 to 2022 generated using dense-time-series Landsat imagery and the continuous change-detection method. Earth Syst. Sci. Data 2024, 16, 1353–1381.
37. Gong, P.; Wang, J.; Yu, L.; Zhao, Y.; Zhao, Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S.; et al. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. Int. J. Remote Sens. 2013, 34, 2607–2654.
38. Zanaga, D.; Van De Kerchove, R.; Xu, P.; Tsendbazar, N.; Lesiv, M. WorldCover Product User Manual V2.0; ESA WorldCover Project: Mol, Belgium, 2022.
39. Defourny, P.; Lamarche, C.; Bontemps, S. Land Cover CCI Product User Guide Version 2.0; ESA Land Cover CCI: Oxfordshire, UK, 2017.
40. Tsendbazar, N.E.; Tarko, A.; Li, L.; Herold, M.; Lesiv, M.; Fritz, S.; Maus, V. Copernicus Global Land Service: Land Cover 100m: Version 3 Globe 2015–2019: Validation Report; Zenodo: Geneva, Switzerland, 2020.
41. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; He, C.; Han, G.; Peng, S.; Lu, M.; et al. Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 2015, 103, 7–27.
42. Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.C.; Mathis, M.; Brumby, S.P. Global land use/land cover with Sentinel 2 and deep learning. In Proceedings of the IGARSS 2021 - 2021 IEEE International Geoscience and Remote Sensing Symposium, Brussels, Belgium, 11–16 July 2021.
43. Sulla-Menashe, D.; Friedl, M.A. User Guide to Collection 6 MODIS Land Cover (MCD12Q1 and MCD12C1) Product; Boston University: Boston, MA, USA, 2018.
44. Sulla-Menashe, D.; Gray, J.M.; Abercrombie, S.P.; Friedl, M.A. Hierarchical mapping of annual global land cover 2001 to present: The MODIS Collection 6 Land Cover product. Remote Sens. Environ. 2019, 222, 183–194.
45. Rocchini, D.; Andreo, V.; Förster, M.; Garzon-Lopez, C.X.; Gutierrez, A.P.; Gillespie, T.W.; Hauffe, H.C.; He, K.S.; Kleinschmit, B.; Mairota, P.; et al. Potential of remote sensing to predict species invasions: A modelling perspective. Prog. Phys. Geogr. 2015, 39, 283–309.
46. Yin, G.; Xie, J.; Ma, D.; Xie, Q.; Verger, A.; Descals, A.; Filella, I.; Peñuelas, J. Aspect matters: Unraveling microclimate impacts on mountain greenness and greening. Geophys. Res. Lett. 2023, 50, e2023GL105879.
47. Fan, Y.; Ke, C.Q.; Zhou, X.; Shen, X.; Yu, X.; Lhakpa, D. Glacier mass-balance estimates over High Mountain Asia from 2000 to 2021 based on ICESat-2 and NASADEM. J. Glaciol. 2023, 69, 500–512.
48. Zhang, Y.; Schaap, M.G.; Wei, Z. Development of hierarchical ensemble model and estimates of soil water retention with global coverage. Geophys. Res. Lett. 2020, 47, e2020GL088819.
49. Crawford, C.J.; Roy, D.P.; Arab, S.; Barnes, C.; Vermote, E.; Hulley, G.; Gerace, A.; Choate, M.; Engebretson, C.; Micijevic, E.; et al. The 50-year Landsat collection 2 archive. Sci. Remote Sens. 2023, 8, 100103.
50. Duan, S.B.; Li, Z.L.; Li, H.; Göttsche, F.M.; Wu, H.; Zhao, W.; Leng, P.; Zhang, X.; Coll, C. Validation of Collection 6 MODIS land surface temperature product using in situ measurements. Remote Sens. Environ. 2019, 225, 16–29.
51. Johnson, J.M.; Clarke, K.C. An area preserving method for improved categorical raster resampling. Cartogr. Geogr. Inf. Sci. 2021, 48, 281–295.
52. Ali, K.; Johnson, B.A. Land-use and land-cover classification in semi-arid areas from medium-resolution remote-sensing imagery: A deep learning approach. Sensors 2022, 22, 8750.
53. Singh, D.; Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 2020, 97, 105524.
54. Ferla, G.; Mura, B.; Falasco, S.; Caputo, P.; Matarazzo, A. Multi-criteria decision analysis (MCDA) for sustainability assessment in food sector: A systematic literature review on methods, indicators and tools. Sci. Total Environ. 2024, 946, 174235.
55. Liu, K.; Xu, E. Fusion and correction of multi-source land cover products based on spatial detection and uncertainty reasoning methods in Central Asia. Remote Sens. 2021, 13, 244.
56. Wang, L.; Yoon, K.J. Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3048–3068.
57. Li, X.; Shao, G. Object-based land-cover mapping with high resolution aerial photography at a county scale in Midwestern USA. Remote Sens. 2014, 6, 11372–11390.
58. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015.
59. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
60. Müller, R.; Kornblith, S.; Hinton, G. When does label smoothing help? In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
61. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge distillation: A survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
62. Waske, B.; Benediktsson, J.A. Fusion of support vector machines for classification of multisensor data. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3858–3866.
63. Zhao, K.; Li, L.; Chen, Z.; Sun, R.; Yuan, G.; Li, J. A survey: Optimization and applications of evidence fusion algorithm based on Dempster–Shafer theory. Appl. Soft Comput. 2022, 124, 109075.
64. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
65. Herold, M.; Mayaux, P.; Woodcock, C.E.; Baccini, A.; Schmullius, C. Some challenges in global land cover mapping: An assessment of agreement and accuracy in existing 1 km datasets. Remote Sens. Environ. 2008, 112, 2538–2556.
66. Cui, P.; Chen, T.; Li, Y.; Liu, K.; Zhang, D.; Song, C. Comparison and assessment of different land cover datasets on the cropland in Northeast China. Remote Sens. 2023, 15, 5134.
67. Blaschke, T.; Hay, G.J.; Kelly, M.; Lang, S.; Hofmann, P.; Addink, E.; Feitosa, R.Q.; van der Meer, F.; van der Werff, H.; van Coillie, F.; et al. Geographic object-based image analysis–Towards a new paradigm. ISPRS J. Photogramm. Remote Sens. 2014, 87, 180–191.
68. Pontius, R.G.; Malanson, J. Comparison of the structure and accuracy of two land change models. Int. J. Geogr. Inf. Sci. 2005, 19, 243–265.
69. Kemker, R.; Salvaggio, C.; Kanan, C. Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning. ISPRS J. Photogramm. Remote Sens. 2018, 145, 60–77.
70. Zou, L.; Tian, F.; Schaepman-Strub, G.; Liang, T.; Fensholt, R.; He, T. Topographic effects on vegetation greening and area expansion in global alpine zones under climate change. Int. J. Appl. Earth Obs. Geoinf. 2025, 142, 104727.
71. Aryal, K.; Apan, A.; Maraseni, T. Comparing global and local land cover maps for ecosystem management in the Himalayas. Remote Sens. Appl. Soc. Environ. 2023, 30, 100952.
72. Wang, Y.; Xu, Y.; Xu, X.; Jiang, X.; Mo, Y.; Cui, H.; Zhu, S.; Wu, H. Evaluation of six global high-resolution global land cover products over China. Int. J. Digit. Earth 2024, 17, 2301673.
73. Opravil, Š.; Baumann, M.; Goga, T.; Afzali, H.; Kuemmerle, T.; Pazúr, R. Consensus land-cover mapping improves grassland classification in European mountain landscapes. Sci. Rep. 2026, 16, 8077.

Figure 1. Location and topographic setting of the study area: (a) Geographic location; (b) Digital elevation model (DEM).

Figure 2. Reference data used for model training and evaluation: (a) Spatial distribution of the 2600 reference samples; (b) Representative UAV image examples.

Figure 3. The detailed architecture of the U-Net backbone.

Figure 4. The two-stage training and adaptive knowledge distillation framework.

Figure 5. Overall workflow of the proposed CADF-Net framework.

Figure 6. Comparison of overall accuracy metrics (OA, Kappa, and mIoU) among different methods.

Figure 7. Comparison of Producer Accuracy (PA) and User Accuracy (UA) across different land cover classes.

Figure 8. Category-level performance comparison of different methods based on IoU and F1 scores.

Figure 9. Spatial results of the proposed fusion method. (a) The spatial distribution of the fused land-cover product (CADF-LC). (b) The spatial distribution of the Conflict Index (CI).

Figure 10. Local-scale comparison of land-cover products across the representative sub-regions: (a) a river valley agricultural zone, (b) a cropland–forestland interlaced area, (c) an urban and peri-urban area, (d) a near-border segment, (e) lake-shore surroundings, (f) a plain transition zone, (g) an area with distinct fluvial features, and (h) a cross-border transition zone.

Table 1. Sources of geo-environmental factors.

Data	Resolution	Data Source	Release Date
NASADEM	30 m	https://doi.org/10.5067/MEASURES/NASADEM/NASADEM_HGT.001 (accessed on 10 May 2025)	2020
OpenLandMap	250 m	https://zenodo.org/records/2784001 (accessed on 16 June 2025)	2019
MODIS LST	1 km	https://doi.org/10.5067/MODIS/MOD11A2.061 (accessed on 20 March 2025)	2021
Landsat 8 C2	30 m	https://earthexplorer.usgs.gov (accessed on 15 July 2025)	2020

Table 2. Accuracy metrics and corresponding formulas.

Metric Name	Formula
Overall Accuracy (OA)	$\mathrm{OA} = \frac{\sum_{i=1}^{n} p_{ii}}{N}$
Kappa Coefficient (Kappa)	$\mathrm{Kappa} = \frac{p_{o} - p_{e}}{1 - p_{e}}$
Mean Intersection over Union (mIoU)	$\mathrm{mIoU} = \frac{1}{n} \sum_{i=1}^{n} \frac{p_{ii}}{\sum_{j=1}^{n} p_{ij} + \sum_{j=1}^{n} p_{ji} - p_{ii}}$
Producer Accuracy (PA)	$\mathrm{PA}_{i} = \frac{p_{ii}}{\sum_{j=1}^{n} p_{ij}}$
User Accuracy (UA)	$\mathrm{UA}_{i} = \frac{p_{ii}}{\sum_{j=1}^{n} p_{ji}}$
F1 Score (F1)	$\mathrm{F1}_{i} = 2 \times \frac{\mathrm{PA}_{i} \times \mathrm{UA}_{i}}{\mathrm{PA}_{i} + \mathrm{UA}_{i}}$
Intersection over Union (IoU)	$\mathrm{IoU}_{i} = \frac{p_{ii}}{\sum_{j=1}^{n} p_{ij} + \sum_{j=1}^{n} p_{ji} - p_{ii}}$
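
For readers who wish to reproduce the metrics in Table 2, the following sketch computes them from a confusion matrix with reference classes on rows and predicted classes on columns, which is the orientation implied by the PA and UA definitions. The 3 x 3 matrix is hypothetical toy data and the function name is illustrative.

```python
import numpy as np

def accuracy_metrics(cm: np.ndarray) -> dict:
    """Table 2 metrics from an n x n confusion matrix (rows: reference, columns: predicted)."""
    cm = cm.astype(float)
    total = cm.sum()
    diag = np.diag(cm)
    row = cm.sum(axis=1)  # reference totals per class
    col = cm.sum(axis=0)  # predicted totals per class
    oa = diag.sum() / total
    pe = (row * col).sum() / total**2  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    pa = diag / row
    ua = diag / col
    iou = diag / (row + col - diag)
    f1 = 2 * pa * ua / (pa + ua)
    return {"OA": oa, "Kappa": kappa, "mIoU": iou.mean(),
            "PA": pa, "UA": ua, "IoU": iou, "F1": f1}

# Hypothetical confusion matrix for three classes, ten reference samples each
cm = np.array([
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
])
m = accuracy_metrics(cm)
print(round(m["OA"], 4), round(m["Kappa"], 4), round(m["mIoU"], 4))
```

As a side check, if the evaluation samples were balanced across the four classes (an assumption, not a statement from the text), the chance agreement would be 0.25, and the OA of 0.8600 reported for CADF-LC in Table 3 would correspond to Kappa = (0.8600 − 0.25)/0.75 ≈ 0.8133, which matches the tabulated value.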

Table 3. Accuracy assessment results of CADF-LC and the input land-cover products.

Dataset	OA	Kappa	mIoU	Class	PA	UA	IoU	F1
ESA CCI-LC	0.6875	0.5833	0.5087	Cropland	0.9100	0.5833	0.5515	0.7109
				Forestland	0.9400	0.6812	0.6528	0.7899
				Grassland	0.2900	0.8056	0.2710	0.4265
				Other	0.6100	0.8714	0.5596	0.7176
GLC-FCS30	0.7550	0.6733	0.6019	Cropland	0.9300	0.6327	0.6039	0.7530
				Forestland	0.9300	0.8455	0.7949	0.8857
				Grassland	0.3700	0.8043	0.3394	0.5068
				Other	0.7900	0.8144	0.6695	0.8020
MCD12Q1-IGBP	0.5950	0.4600	0.4086	Cropland	0.8100	0.6639	0.5745	0.7297
				Forestland	0.9500	0.4822	0.4703	0.6397
				Grassland	0.1700	0.4857	0.1441	0.2519
				Other	0.4500	0.9783	0.4455	0.6164
MCD12Q1-LCCS2	0.6000	0.4667	0.4155	Cropland	0.8100	0.6694	0.5786	0.7330
				Forestland	0.9400	0.4821	0.4677	0.6373
				Grassland	0.1700	0.5000	0.1453	0.2537
				Other	0.4800	0.9600	0.4706	0.6400
Esri Land Cover	0.8450	0.7933	0.7367	Cropland	0.8700	0.7909	0.7073	0.8286
				Forestland	0.9000	0.9574	0.8654	0.9278
				Grassland	0.7500	0.7813	0.6198	0.7653
				Other	0.8600	0.8600	0.7544	0.8600
CADF-LC	0.8600	0.8133	0.7589	Cropland	0.8400	0.8400	0.7241	0.8400
				Forestland	0.8900	0.9674	0.8641	0.9271
				Grassland	0.7900	0.7822	0.6475	0.7861
				Other	0.9200	0.8598	0.8000	0.8889

Table 4. Pixel consistency and distribution similarity of CADF-LC with input products.

Dataset	Pixel Agreement	Distribution Similarity
ESA CCI-LC	0.8628	0.9897
GLC-FCS30	0.8946	0.9942
MCD12Q1-IGBP	0.8334	0.9899
MCD12Q1-LCCS2	0.8335	0.9899
Esri Land Cover	0.9533	0.9996

Table 5. Quantitative analysis of conflict contributions by land-cover class and input product.

Panel A. Land-Cover Class	Mean CI
Cropland	0.2067
Forestland	0.0862
Grassland	0.6938
Other	0.3752
Panel B. Input Product	Disagreement Rate with Mode Label (%)
MCD12Q1-IGBP	14.45
ESA CCI-LC	21.80
Esri Land Cover	13.88
GLC-FCS30	13.57
MCD12Q1-LCCS2	26.13
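
Panel B reports, for each input product, how often its label differs from the per-pixel mode label across products. A rough sketch of that calculation is given below. It assumes the mode is the most frequent label among all input products at a pixel and that ties are broken by taking the smallest class code; the tie-breaking rule is an assumption of this sketch, since this section does not state how ties are handled. The function names and the three-product toy input are hypothetical.

```python
from collections import Counter


def mode_label(labels):
    """Most frequent label at one pixel; ties go to the smallest label (assumption)."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(lab for lab, c in counts.items() if c == best)


def disagreement_rates(products):
    """Percent of pixels where each product differs from the per-pixel mode.

    products: dict mapping a product name to a flat list of class labels;
    all lists must have the same length (one entry per pixel).
    """
    names = list(products)
    n_pixels = len(products[names[0]])
    modes = [mode_label([products[k][p] for k in names]) for p in range(n_pixels)]
    return {
        k: 100.0 * sum(a != m for a, m in zip(products[k], modes)) / n_pixels
        for k in names
    }


# Toy input: three hypothetical products over four pixels.
toy = {
    "A": [1, 1, 2, 3],
    "B": [1, 2, 2, 3],
    "C": [1, 1, 2, 4],
}
print(disagreement_rates(toy))  # {'A': 0.0, 'B': 25.0, 'C': 25.0}
```

With real rasters, the same logic would be applied to co-registered, class-harmonized arrays flattened to one label per pixel.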

Table 6. Performance comparison under different Conflict Index (CI) settings.

Method	OA	Kappa	mIoU	Class	PA	UA	IoU	F1
Base (No CI)	0.8373	0.7831	0.7249	Cropland	0.8600	0.7765	0.6894	0.8161
				Forestland	0.8945	0.9519	0.8558	0.9223
				Grassland	0.7794	0.7913	0.6466	0.7854
				Other	0.8152	0.8429	0.7077	0.8288
C1 only	0.8379	0.7839	0.7264	Cropland	0.8200	0.7961	0.6777	0.8079
				Forestland	0.8945	0.9443	0.8496	0.9187
				Grassland	0.7920	0.7614	0.6345	0.7764
				Other	0.8456	0.8608	0.7439	0.8531
C2 only	0.8448	0.7931	0.7366	Cropland	0.8300	0.8384	0.7155	0.8342
				Forestland	0.8894	0.9516	0.8510	0.9195
				Grassland	0.7845	0.7560	0.6260	0.7700
				Other	0.8759	0.8439	0.7538	0.8596
C3 only	0.8398	0.7864	0.7280	Cropland	0.8400	0.7962	0.6914	0.8175
				Forestland	0.8844	0.9462	0.8421	0.9143
				Grassland	0.7594	0.7891	0.6312	0.7739
				Other	0.8759	0.8357	0.7473	0.8554
Full CI (CADF-Net)	0.8600	0.8133	0.7589	Cropland	0.8400	0.8400	0.7241	0.8400
				Forestland	0.8900	0.9674	0.8641	0.9271
				Grassland	0.7900	0.7822	0.6475	0.7861
				Other	0.9200	0.8598	0.8000	0.8889

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, Y.; Fu, L.; Li, Z.; Yang, Y.; Chen, H.; Zhang, S. CADF-Net: A Conflict-Aware Adaptive Distillation Network for Fusing Multi-Source Land-Cover Products for Key Vegetation Classes in Cross-Border Regions. Remote Sens. 2026, 18, 1294. https://doi.org/10.3390/rs18091294

