Article

Semi-Supervised Black-Soil Area Detection on the Qinghai–Tibetan Plateau

1 National Cryosphere Desert Data Center, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China
2 School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(24), 3977; https://doi.org/10.3390/rs17243977
Submission received: 20 October 2025 / Revised: 24 November 2025 / Accepted: 1 December 2025 / Published: 9 December 2025
(This article belongs to the Section Environmental Remote Sensing)

Highlights

  • We propose a novel semi-supervised framework (SBLS) for black-soil area detection, which employs two branches with identical architectures but non-shared parameters. These branches perform mutual learning through cross-branch weak-to-strong pseudo supervision, effectively improving pseudolabel reliability and model generalization.
  • We introduce a cross-branch weak-to-strong pseudo-supervision strategy, where pseudolabels generated from weakly augmented views guide the training of multiple strongly augmented counterparts. This strategy, coupled with high-confidence filtering, enhances training stability and robustness.
  • We design a dual-level contrastive learning mechanism that integrates different contrastive objectives across two feature levels. To further encourage complementary representation learning between branches, we apply two distinct block-wise mixing augmentations to each pair of strongly augmented images. This design increases feature diversity and enables each branch to provide richer supervision for the other.

Abstract

The Qinghai–Tibetan Plateau is undergoing severe grassland degradation that produces so-called black-soil areas, driven by overgrazing, climate change, and rodent activity. Accurate black-soil area detection is critical for guiding ecological restoration. However, obtaining large-scale annotated datasets is costly due to the ambiguous visual characteristics and high ecological variability of black-soil areas, often necessitating expert validation and repeated refinement. To address this challenge, we propose SBLS (Semi-supervised Black-Soil area detection), a semi-supervised approach that leverages limited labeled data alongside abundant unlabeled imagery. SBLS adopts a cross-branch pseudo supervision strategy, where pseudolabels generated from weakly augmented views in one branch supervise four strongly augmented views in the other branch. To further capitalize on the unlabeled data, we implement a dual-level contrastive learning approach that operates across both low-level and high-level feature spaces of strongly augmented pairs. Experiments demonstrate that SBLS significantly outperforms existing state-of-the-art methods, establishing a new benchmark for black-soil area detection in semi-supervised settings.

1. Introduction

The Qinghai–Tibetan Plateau (QTP) plays a vital role in climate regulation, water security, and biodiversity conservation in Asia, and also serves as an important carbon sink in the global ecosystem [1]. In recent decades, extensive grassland degradation has occurred across the plateau, leading to the formation of black-soil areas, a unique type of degraded grassland mainly distributed in cold and high-altitude regions between 3000 and 5000 m [2]. These areas are characterized by the loss of the mattic epipedon, exposure of dark and compacted soil, and sharp declines in vegetation cover and soil organic matter [3]. The degradation process is driven by several interacting factors, including overgrazing, permafrost thawing, soil erosion, and rodent disturbance [4]. As a result, black-soil areas show reduced soil fertility, weakened water retention, and increased carbon emissions, which further aggravate regional desertification and threaten the livelihoods of local herders.
Long-term ecological restoration studies in the Sanjiangyuan region have shown that black-soil areas are among the most difficult ecosystems to restore on the QTP, often requiring decades or even centuries for natural recovery [2]. This degradation not only reduces the productive and ecological functions of alpine grasslands but also disturbs the stability of the plateau’s climate and hydrological systems. Therefore, accurate identification and monitoring of black-soil areas are essential for evaluating degradation patterns, guiding ecological engineering, and implementing effective restoration measures. Traditional field surveys and expert-based mapping are limited by high labor costs, small spatial coverage, and the strong spatial heterogeneity of degraded patches. In contrast, remote sensing offers a cost-effective approach for large-scale monitoring, but the spectral and spatial similarity between black-soil areas and mattic epipedons makes automatic detection challenging [4].
Existing supervised learning approaches rely heavily on large volumes of images with ground-truth labels [4]. However, acquiring such labels is difficult: annotating black-soil areas demands expert knowledge of vegetation morphology, iterative visual confirmation, and precise boundary delineation, making the process prohibitively resource-intensive and hard to scale. In response, we turn to a semi-supervised approach to improve black-soil area detection with limited labeled data. We propose the Semi-supervised Black-Soil area detection (SBLS) method, which leverages weak-to-strong cross-branch pseudo supervision to exploit the diversity between two branches. Unlike most semi-supervised methods that rely on a single branch [5,6,7,8,9], SBLS trains two branches in a mutual learning framework, combining multiview pixel-level contrastive learning with cross-branch pseudo supervision. This design specifically addresses the challenge of biased pseudolabels and the need to capture both low-level textures and high-level semantic cues in black-soil areas.
To enhance generalization, we introduce distinct block-wise mixing augmentations to strongly augmented images [10]. These augmented views enable the model to learn robust and complementary representations, where knowledge from one view aids the learning process in the other. By introducing spatial perturbations, this strategy encourages the model to maintain consistency despite disruptions. While block-wise mixing alongside other strong augmentations may introduce increased variability in predictions, such diversity aligns with the core principles of contrastive learning.
Although contrastive learning, dual-branch models, and weak-to-strong pseudo supervision have each been used in semi-supervised learning, our choice to combine them is driven by the characteristics of black-soil detection. First, a single model tends to reinforce its own errors when generating pseudolabels, especially when boundaries are complex. Two branches with independent parameters naturally produce different predictions, giving complementary information. Second, black-soil areas show both subtle texture differences and broader semantic variations, so consistency at only one feature level is not sufficient. Using contrastive learning at both low and high levels helps the model learn fine-texture cues while keeping the semantic space well separated. Third, weak-to-strong cross-branch supervision connects these parts by letting stable predictions from weak views guide the strongly augmented views of the other branch. This helps avoid the noise that strong augmentations usually introduce while still gaining their regularization benefits. Together, these design choices form a framework in which the two branches reduce bias, the two contrastive levels improve feature discrimination, and the cross-branch supervision stabilizes training. This combination provides advantages that are difficult to achieve when any of the components is used alone.
SBLS consists of two key components: dual-level contrastive learning and cross-branch pseudo supervision. First, we incorporate dual-level contrastive learning across both low-level and high-level feature spaces to fully leverage unlabeled data. In each branch, we generate two pairs of strongly augmented views, applying a contrastive loss to each pair. One pair comes from low-level image augmentations, while the other comes from high-level feature perturbations.
This approach facilitates more efficient use of unlabeled data and improves feature separability, allowing the model to more accurately distinguish spatial and semantic patterns relevant to black-soil areas. Second, we propose a cross-branch pseudo supervision strategy, where pseudolabels generated from weakly augmented views in one branch supervise two pairs of strongly augmented views in the other branch. To enhance the reliability of these pseudolabels, we apply a predefined threshold to filter out low-confidence predictions. This cross-branch weak-to-strong pseudo supervision strengthens model robustness by encouraging consistent predictions across different augmentations.
By integrating the two components, SBLS enables effective detection of black-soil areas and mattic epipedon areas [11], while fully exploiting large-scale unlabeled data collected by UAV platforms. Experimental results on the QTP-BS benchmark datasets [4] demonstrate that our approach SBLS achieves state-of-the-art performance, showcasing its strong potential for real-world applications in large-scale black-soil area mapping. This work contributes not only to the advancement of semi-supervised learning in remote sensing, but also provides practical support for long-term ecological monitoring and grassland restoration on the QTP.
The main contributions of this work are as follows:
  • We propose a novel semi-supervised framework (SBLS) for black-soil area detection, which employs two non-shared sub-nets with identical architectures. These networks perform mutual learning through cross-branch weak-to-strong pseudo supervision, effectively enhancing pseudolabel reliability and model generalization.
  • We introduce a cross-branch weak-to-strong pseudo supervision strategy, where pseudolabels from weakly augmented views supervise multiple strongly augmented counterparts. Coupled with high-confidence filtering, this strategy improves training stability and robustness.
  • We develop a dual-level contrastive learning mechanism that combines distinct contrastive losses across two feature levels. To further encourage complementary representation learning between views, we apply two distinct block-wise mixing augmentations to each pair of strongly augmented images. This design increases feature diversity across views, helping each sub-net provide more informative supervision for the other.
The rest of this paper is organized as follows: Section 2 reviews related works on semi-supervised semantic segmentation and remote sensing image analysis. Section 3 provides a detailed description of the proposed SBLS framework, including the cross-branch pseudo supervision strategy and the dual-level contrastive learning mechanism. Section 4 presents the experimental setup and results on the QTP-BS dataset, covering comparisons with state-of-the-art methods, ablation studies, and visual analysis under various supervision levels. Section 5 discusses the performance, generalization ability, limitations, and potential applications of the proposed SBLS framework. Finally, Section 6 concludes the paper.

2. Related Work

Semi-supervised semantic segmentation (SSSS) has emerged as a promising solution to alleviate the reliance on costly pixel-level annotations, especially in remote sensing scenarios where expert knowledge and fine-grained boundary delineation are required [12]. Existing SSSS methods can be broadly categorized into two lines of research: pseudo supervision and consistency regularization.
Pseudo supervision methods [13,14] typically employ a teacher model to generate pseudolabels for unlabeled data, which are then used to supervise a student model. Noisy Student [15] improves this strategy by adding noise and strong augmentations during student training. However, such methods often rely on offline teacher–student updates or multi-stage pipelines, limiting scalability for real-world applications.
Consistency regularization offers an end-to-end alternative by enforcing prediction consistency across different perturbations [16,17]. FixMatch [5] applies weak-to-strong augmentation consistency using hard pseudolabels, while Mean Teacher [18] employs exponential moving average (EMA) to maintain temporal stability. These methods provide effective means to exploit unlabeled data, yet many remain single-branch frameworks with limited ability to capture complementary representations.
To improve pseudolabel stability and network generalization, dual-branch architectures have gained popularity. CPS [19] introduces cross-pseudo supervision, where two independent networks generate and exchange pseudolabels for each other. This mutual learning framework enhances training diversity and reduces confirmation bias. CCVC [20] further enforces cross-branch consistency, compelling each branch to learn distinct and complementary features, while a conflict-aware filtering strategy promotes robust learning under noisy supervision. CrossMatch [21] extends this idea by introducing cross-branch weak-to-strong consistency and discrepancy-based constraints, which improve boundary sensitivity and model robustness.
Beyond architectural improvements, contrastive learning has recently been adopted to enhance feature discrimination. DSSN [22] proposes a dual-level Siamese structure with pixel-wise contrastive losses in both image and feature spaces, maximizing the representation capacity of unlabeled data. Additionally, it introduces a class-aware pseudolabel selection strategy, selecting the top confident prediction for each class to alleviate class imbalance and improve long-tailed performance.
UniMatch [6] revisits consistency training by incorporating a unified dual-stream strategy that supervises two strongly augmented views with a shared weak prediction. It leverages diverse augmentations such as CutMix and feature dropout to enforce consistency at both the image and representation levels, implicitly promoting contrastive learning across views. While effective, it remains a single-branch model with limited cross-branch interactions.
CorrMatch [9] takes a different approach by leveraging correlation maps to propagate labels. It introduces pixel propagation and region propagation strategies to expand high-confidence regions and refine pseudolabels based on feature similarity and shape cues. A dynamic thresholding mechanism further adapts pseudolabel confidence filtering during training, improving label quality without incurring inference overhead.
In addition to semi-supervised approaches, many studies have employed supervised classification for land cover mapping and object extraction in remote sensing. Traditional pixel-wise classifiers, object-based image analysis (OBIA), and multi-scale feature extraction techniques have achieved high accuracy in detecting various land cover types [23,24,25]. For example, random forest and support vector machine classifiers have been widely used to map vegetation, soil, and water bodies at different spatial scales.
With the advent of deep learning, fully supervised models have further improved pixel-level accuracy by capturing hierarchical spatial and spectral features. CNN-based frameworks, such as the interpretable CNN with SHAP for EuroSAT [26], MSG-GCN for forest type classification [27], and 3D-1D-CNN for urban hyperspectral feature extraction [28], demonstrate strong capability in extracting discriminative features and handling complex textures, shadows, and multi-scale objects. Multi-branch architectures have also been proposed to integrate global context and local multi-scale information, achieving superior performance on benchmark datasets such as UC-Merced, SIRI-WHU, and EuroSAT [29].
Generative adversarial network (GAN)-based supervised methods have been explored to address limited training data. Encoder-based modified GANs [30], SPG-GAN [31], and supervised Lie group feature learning GANs [32] can generate labeled samples for land use and scene classification, significantly improving accuracy under small-sample conditions. Semi-supervised multi-scale GANs have also been applied for semantic segmentation of small or challenging objects in remote sensing images [33].
Recurrent neural networks (RNNs) and long short-term memory (LSTM) networks have been used to capture spatial dependencies in high-dimensional hyperspectral images, improving pixel-level classification [34,35,36]. Autoencoder (AE)-based and transformer-based models, including masked autoencoder spectral–spatial transformers (MAEST) [37] and stacked AE with hidden Markov random field [38], further enhance feature learning and classification robustness, particularly in noisy or spectrally heterogeneous regions.
These studies show that while supervised approaches provide reliable feature extraction and high-resolution classification, they require sufficient labeled data and may degrade in low-label or highly degraded regions such as black-soil areas on the Qinghai–Tibetan Plateau. Semi-supervised methods, on the other hand, leverage multiple views, augmentations, and contrastive objectives to reduce label dependency, but most rely on single-branch consistency or lack multi-level supervision across strongly perturbed views. Motivated by these limitations, our method SBLS integrates dual-level contrastive learning with cross-branch pseudo supervision in a dual-network architecture, explicitly enhancing both representation diversity and pseudolabel reliability.

3. Method

An overview of the proposed SBLS architecture is illustrated in Figure 1, which highlights the cross-branch pseudo supervision strategy and the dual-level contrastive learning approach across different augmented views. In SSSS, we are given a set of fully pixel-wise annotated images $\mathcal{D}_l = \{(X_i, T_i)\}_{i=1}^{M}$ and a set of unlabeled images $\mathcal{D}_u = \{X_i\}_{i=M+1}^{M+N}$, where $X \in \mathbb{R}^{h \times w \times c}$ denotes an input image of size $h \times w$ with $c$ channels, and $T \in \{0,1\}^{h \times w \times k}$ is its corresponding pixel-wise one-hot ground-truth label, with $k$ denoting the number of semantic classes. $M$ and $N$ denote the numbers of labeled and unlabeled images, respectively, where $N \gg M$ in most cases.
To unify notation, we denote pixel-wise predictions and supervision as follows. Given an image $X = [x_i]$, each pixel $x_i \in \mathbb{R}^c$ is treated as a feature vector across channels. We denote the ground-truth vector for the $i$-th pixel as $t_i$. The model outputs a probability vector $y_i = [y_{ij}] \in \mathbb{R}^k$, where $y_{ij}$ indicates the predicted probability of pixel $i$ belonging to class $j$, with $j \in \{1, \ldots, k\}$. The prediction $y_i$ is produced by an encoder–decoder network:
$$y_i = g(f(x_i \mid \theta) \mid \varphi),$$
where $f(\cdot \mid \theta)$ and $g(\cdot \mid \varphi)$ represent the encoder and decoder, parameterized by $\theta$ and $\varphi$, respectively.
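As a minimal illustrative sketch of this per-pixel pipeline $y_i = g(f(x_i \mid \theta) \mid \varphi)$, the toy linear "encoder" and "decoder" below (hypothetical stand-ins for the real networks, with made-up weights) map one pixel's channel vector to a probability vector over $k$ classes:

```python
import math

def softmax(logits):
    """Convert a logit vector into a probability vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def encoder(x, theta):
    """f(x | theta): toy feature extractor mapping channels -> latent vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in theta]

def decoder(z, phi):
    """g(z | phi): toy classifier head mapping latent vector -> class probabilities."""
    return softmax([sum(w * v for w, v in zip(row, z)) for row in phi])

# One pixel with c = 3 channels, latent dimension 2, and k = 2 classes.
x_i = [0.2, 0.5, 0.1]                                  # pixel feature vector
theta = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]            # hypothetical encoder weights
phi = [[1.0, -1.0], [-1.0, 1.0]]                       # hypothetical decoder weights

y_i = decoder(encoder(x_i, theta), phi)
assert abs(sum(y_i) - 1.0) < 1e-9   # y_i is a valid probability vector over k classes
```

In the real model, $f$ and $g$ are deep convolutional networks operating on full images rather than per-pixel linear maps; the sketch only fixes the input/output contract used by the equations that follow.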

3.1. Cross-Branch Weak-to-Strong Pseudo Supervision

We introduce our proposed cross-branch weak-to-strong pseudo supervision strategy, which is designed to fully exploit unlabeled data. Specifically, we process the same input image with different augmentation techniques, allowing the network to extract diverse and informative features. These augmented inputs offer varied and complementary representations of the black-soil scene. To enable mutual supervision across branches, we adopt a cross-branch structure, where one weakly augmented view from one branch is used to supervise four strongly augmented views from the other branch. This asymmetric supervision enhances prediction consistency across different levels of data perturbation and improves the utilization of the unlabeled data.
We first apply a weak augmentation operation to the input, denoted as
$$x_i^{w} = \mathrm{wag}(x_i),$$
where $\mathrm{wag}(\cdot)$ represents a weak augmentation function. The resulting image is then passed through the encoder $f(\cdot \mid \theta)$ to extract latent features,
$$z_i^{w} = f(x_i^{w} \mid \theta).$$
Subsequently, the decoder $g(\cdot \mid \varphi)$ generates the weak-view prediction:
$$y_i^{w} = g(z_i^{w} \mid \varphi).$$
Since the decoder ends with a softmax layer, we derive the pseudolabel by selecting the class with the highest predicted probability:
$$\hat{t}_i = \mathrm{onehot}(y_i^{w}),$$
where $\mathrm{onehot}(\cdot)$ converts the predicted probability vector into a discrete label vector in which the entry for the highest-probability class is set to 1 and all others are set to 0.
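The $\mathrm{onehot}(\cdot)$ conversion amounts to an argmax followed by one-hot encoding; a minimal sketch:

```python
def onehot(y):
    """Convert a class-probability vector y into a one-hot pseudolabel t_hat:
    the argmax entry becomes 1, all others 0."""
    j_star = max(range(len(y)), key=lambda j: y[j])
    return [1 if j == j_star else 0 for j in range(len(y))]

y_i_w = [0.1, 0.85, 0.05]        # weak-view class probabilities (example values)
t_hat = onehot(y_i_w)
assert t_hat == [0, 1, 0]        # class 1 has the highest predicted probability
```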
At the low level, the input image is subjected to two independent instances of strong augmentation, and the augmented images are passed through the model to obtain their predictions:
$$y_i^{s1} = g(f(\mathrm{lsag}(x_i) \mid \theta) \mid \varphi), \quad y_i^{s2} = g(f(\mathrm{lsag}(x_i) \mid \theta) \mid \varphi),$$
where $\mathrm{lsag}(\cdot)$ denotes our proposed low-level stochastic strong-augmentation operator, incorporating transformations such as blurring and deformation; its randomness ensures that the two resulting views differ, enhancing input diversity. In addition, $\mathrm{lsag}(\cdot)$ adopts distinct block-wise mixing augmentations to enhance complementary representation learning between views. Specifically, we apply two different CutMix operations to each pair of strongly augmented images, generating new training samples by cutting and pasting regions among four images. To preserve semantic integrity, the bounding boxes used for CutMix are constrained to be smaller than one-fourth of the original image size. This design increases feature diversity across views and mitigates overfitting while maintaining the primary spatial structure of the input; such block-wise mixing encourages each sub-network to capture distinct yet complementary visual cues, improving the robustness of contrastive learning at the low level.
At the high level, we first obtain the latent feature $z_i^{w}$ as given by Equation (3). This feature is then subjected to an additional strong augmentation $\mathrm{hsag}(\cdot)$ in the latent space, which includes random dropout. The augmented features are decoded by $g(\cdot \mid \varphi)$ to produce the high-level predictions:
$$y_i^{s3} = g(\mathrm{hsag}(z_i^{w}) \mid \varphi), \quad y_i^{s4} = g(\mathrm{hsag}(z_i^{w}) \mid \varphi).$$
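The block-wise mixing can be sketched as below. This is an illustrative toy implementation (not the authors' code), assuming single-channel images stored as nested lists; the box side is capped at half the image side so the pasted region covers at most one-fourth of the image, matching the size constraint described above:

```python
import random

def cutmix(img_a, img_b, rng):
    """Block-wise mixing: paste a random box from img_b into a copy of img_a.
    Box sides are at most half the image sides, so the pasted area is at
    most one-fourth of the image. Real inputs would be H x W x C arrays."""
    h, w = len(img_a), len(img_a[0])
    bh, bw = rng.randint(1, h // 2), rng.randint(1, w // 2)   # box size
    top, left = rng.randint(0, h - bh), rng.randint(0, w - bw)  # box position
    out = [row[:] for row in img_a]                            # copy, keep img_a intact
    for r in range(top, top + bh):
        for c in range(left, left + bw):
            out[r][c] = img_b[r][c]
    return out

rng = random.Random(0)
a = [[0] * 8 for _ in range(8)]      # all-zero "image"
b = [[1] * 8 for _ in range(8)]      # all-one "image"
mixed = cutmix(a, b, rng)
pasted = sum(v for row in mixed for v in row)
assert 0 < pasted <= 16              # pasted area at most 1/4 of the 64-pixel image
```

In SBLS, two such operations with independently sampled boxes are applied to each pair of strong views, yielding four mixed training samples per input pair.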
In semi-supervised learning, pseudo supervision is a common strategy for improving model performance. Specifically, a weakly augmented view of the input is used to generate predictions with relatively high confidence. These weak-view predictions then supervise the predictions of strongly augmented views of the same sample. This weak-to-strong pseudo supervision paradigm effectively leverages information from unlabeled data, as the reliable predictions from the weak view guide the model on more challenging, perturbed views, thereby enhancing detection performance.
As shown in Figure 2, we realize weak-to-strong pseudo supervision by enforcing bidirectional cross-branch supervision within the SBLS framework. Let the two collaborative branches be denoted by branch a and branch b. In the a→b direction, the weakly augmented prediction from branch a, $y_i^{w,a} = g_a(f_a(\mathrm{wag}(x_i) \mid \theta_a) \mid \varphi_a)$, is converted into a pseudolabel $\hat{t}_i^{a}$ using Equation (5). This pseudolabel then supervises the strongly augmented predictions of branch b, namely $y_i^{s1,b}$, $y_i^{s2,b}$, $y_i^{s3,b}$, and $y_i^{s4,b}$. Supervision is applied only where the weak prediction is sufficiently confident, and a masked cross-entropy loss is computed between $\hat{t}_i^{a}$ and each strong view of branch b. The opposite direction b→a is defined symmetrically, providing reciprocal guidance. This bidirectional design (weak → strong in both directions) reduces confirmation bias compared with single-branch self-training, since each branch supervises the other using a comparatively reliable weak view, while the multiple strong views increase robustness to severe perturbations and improve generalization.
To ensure reliable supervision, we introduce a pixel-wise binary confidence mask m i j that discards low-confidence predictions [5]. For each pixel i, we define its confidence score as the maximum predicted probability over all classes. The mask is set to 1 only if this confidence exceeds a predefined threshold τ = 0.95 ; otherwise, it is set to 0. This threshold follows common practice in semi-supervised segmentation methods such as FixMatch [5] and CPS [19], and ablation studies (Section 4.3) show that 0.95 provides a good balance between including enough supervision and avoiding noisy labels, thereby improving training stability and model generalization.
$$m_{ij} = \begin{cases} 1, & \text{if } y_{ij}^{w} > \tau, \\ 0, & \text{otherwise}. \end{cases}$$
This filtering strategy prevents noisy pseudo labels from degrading the learning process.
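The confidence filtering reduces to a simple threshold test on the weak view's top-class probability; a minimal sketch with the paper's $\tau = 0.95$:

```python
TAU = 0.95  # confidence threshold used in the paper

def confidence_mask(y_w):
    """Return 1 iff the weak-view prediction's maximum class probability
    strictly exceeds TAU; otherwise the pixel is excluded from supervision."""
    return 1 if max(y_w) > TAU else 0

assert confidence_mask([0.97, 0.02, 0.01]) == 1   # confident -> supervise
assert confidence_mask([0.60, 0.30, 0.10]) == 0   # uncertain -> ignore
```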
The cross-pseudo supervision loss on unlabeled data is given as
$$\mathcal{L}_{w2s}^{a2b} = -\sum_{i \in \mathcal{D}_u} \sum_{j=1}^{k} m_{ij}^{a} \, \hat{t}_{ij}^{a} \left( \log y_{ij}^{s1,b} + \log y_{ij}^{s2,b} + \log y_{ij}^{s3,b} + \log y_{ij}^{s4,b} \right).$$
To avoid bias introduced by varying numbers of selected pixels across images, the final pseudo supervision loss is normalized over the total number of valid pixels:
$$\mathcal{L}_{w2s} = \mathcal{L}_{w2s}^{a2b} + \mathcal{L}_{w2s}^{b2a}.$$
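The one-direction loss can be sketched per pixel as a masked cross-entropy from one branch's pseudolabels onto the other branch's four strong views, normalized over valid pixels (an illustrative sketch, not the training code; the example probabilities are made up):

```python
import math

def w2s_loss(pixels):
    """Cross-branch weak-to-strong loss for one direction (e.g. a -> b).
    Each pixel entry: (mask, one-hot pseudolabel, [y_s1, y_s2, y_s3, y_s4]),
    where each y_s* is a probability vector from a strong view of branch b."""
    total, valid = 0.0, 0
    for mask, t_hat, strong_views in pixels:
        if mask == 0:
            continue                      # low-confidence pixel: no supervision
        valid += 1
        j = t_hat.index(1)                # pseudolabel class
        total += -sum(math.log(y[j]) for y in strong_views)
    return total / max(valid, 1)          # normalize over valid pixels

# Two pixels: one confident (supervised), one filtered out by the mask.
pixels = [
    (1, [0, 1], [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9], [0.25, 0.75]]),
    (0, [1, 0], [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]),
]
loss = w2s_loss(pixels)
assert loss > 0
```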
This cross-branch weak-to-strong consistency strategy not only propagates high-confidence pseudolabels between branches but also effectively mitigates noise caused by uncertain predictions.

3.2. Dual-Level Contrastive Learning

To further enhance the utilization of the unlabeled data, our SBLS framework applies contrastive learning in both low-level and high-level features. We adopt pixel-wise contrastive learning to enforce prediction consistency across different views of the same pixel [22]. In each branch, two strongly augmented views are generated, from which positive pixel pairs are formed by matching the same spatial locations across views. Specifically, for each set of strongly augmented inputs within a branch, we compute contrastive losses at multiple feature levels, encouraging contrastive representations across different views.
We denote by h the logits before the softmax layer in the decoder g ( · φ ) . In line with DIM [39], the pixel-wise contrastive objective L cl is given by
$$\mathcal{L}_{cl} = -\frac{1}{|\mathcal{P}|} \sum_{(i,i) \in \mathcal{P}} \log d(h_i^{s1}, h_i^{s2}) - \frac{1}{|\mathcal{N}|} \sum_{(i,j) \in \mathcal{N}} \log\left(1 - d(h_i^{s1}, h_j^{s2})\right),$$
where $d(\cdot,\cdot)$ measures the similarity between two logits, $\mathcal{P}$ and $\mathcal{N}$ denote the sets of positive and negative pixel pairs, respectively, and $h_i^{s1}$ and $h_i^{s2}$ are the logits of pixel $i$ from the two low-level augmented views.
Following the idea of avoiding explicit negative sampling in BYOL [40], we retain only the positive term
$$\mathcal{L}_{cl} = -\frac{1}{|\mathcal{P}|} \sum_{(i,i) \in \mathcal{P}} \log d(h_i^{s1}, h_i^{s2}).$$
Referring to DSSN [22], we define the similarity score using a Gaussian kernel: $d(h_i^{s1}, h_i^{s2}) = \exp\left(-\frac{\|h_i^{s1} - h_i^{s2}\|_2^2}{2}\right)$. Substituting this into Equation (12), we have
$$\mathcal{L}_{cl} = \frac{1}{2|\mathcal{P}|} \sum_{(i,i) \in \mathcal{P}} \|h_i^{s1} - h_i^{s2}\|_2^2,$$
which is equivalent to a mean squared error (MSE) loss between the logits from the two augmented views.
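This equivalence is easy to verify numerically: for the Gaussian-kernel similarity, the negative log-similarity of a positive pair equals half the squared distance between its logits (a small check with made-up logit vectors):

```python
import math

def gaussian_sim(h1, h2):
    """d(h1, h2) = exp(-||h1 - h2||^2 / 2), the Gaussian-kernel similarity."""
    sq = sum((a - b) ** 2 for a, b in zip(h1, h2))
    return math.exp(-sq / 2.0)

h1, h2 = [0.5, 1.0], [0.1, 0.8]          # example logits of one positive pair
neg_log = -math.log(gaussian_sim(h1, h2))
sq_half = sum((a - b) ** 2 for a, b in zip(h1, h2)) / 2.0
# -log d(h1, h2) equals the halved squared distance, i.e. an MSE-style loss.
assert abs(neg_log - sq_half) < 1e-9
```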
Consequently, the low-level pixel-wise contrastive loss becomes
$$\mathcal{L}_{low} = \frac{1}{hw} \sum_{i \in \mathcal{D}_u} \|h_i^{s1} - h_i^{s2}\|_2^2.$$
Similarly, the high-level pixel-wise contrastive loss is given by
$$\mathcal{L}_{high} = \frac{1}{hw} \sum_{i \in \mathcal{D}_u} \|h_i^{s3} - h_i^{s4}\|_2^2.$$
Aggregating the dual-level consistency losses from both branches, we have
$$\mathcal{L}_{dlcl} = \mathcal{L}_{low}^{a} + \mathcal{L}_{low}^{b} + \mathcal{L}_{high}^{a} + \mathcal{L}_{high}^{b}.$$

3.3. Overall Supervision for SBLS

To summarize, we present two complementary techniques designed to exploit unlabeled images. Our holistic framework, referred to as SBLS, integrates both strategies and is illustrated in Figure 1. In this section, we present the SBLS algorithm, which is illustrated in Algorithm 1. It takes a small fraction of labeled data and a large fraction of unlabeled data as input to train the model. The supervised loss is computed between the model predictions on labeled data and the corresponding ground-truth labels using Equation (17), ensuring effective learning from annotated data:
$$\mathcal{L}_{sup} = -\sum_{i \in \mathcal{D}_l} \sum_{j=1}^{k} t_{ij} \left( \log y_{ij}^{a} + \log y_{ij}^{b} \right).$$
The overall loss function combines all components as a weighted sum:
$$\mathcal{L}_{overall} = \lambda_1 \mathcal{L}_{sup} + \lambda_2 \mathcal{L}_{dlcl} + \lambda_3 \mathcal{L}_{w2s}.$$
The weights λ1, λ2, and λ3 serve as trade-off parameters among different loss terms.
By jointly optimizing this composite loss, the model is able to effectively leverage both labeled and unlabeled data, thereby enhancing its generalization capability and robustness.
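Combining the scalar loss values is a plain weighted sum; a one-line sketch (the $\lambda$ values shown are hypothetical placeholders, since the paper treats them as tunable trade-off hyperparameters):

```python
def overall_loss(l_sup, l_dlcl, l_w2s, lambdas=(1.0, 0.5, 0.5)):
    """L_overall = lambda1 * L_sup + lambda2 * L_dlcl + lambda3 * L_w2s.
    The default lambda values are hypothetical, not from the paper."""
    l1, l2, l3 = lambdas
    return l1 * l_sup + l2 * l_dlcl + l3 * l_w2s

loss = overall_loss(1.0, 0.4, 0.6)
assert abs(loss - 1.5) < 1e-12   # 1.0*1.0 + 0.5*0.4 + 0.5*0.6
```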
Algorithm 1 The SBLS Algorithm.
Require: Labeled dataset 𝒟_l, unlabeled dataset 𝒟_u, batch size b, threshold τ
Ensure: Trained model parameters θ_a, φ_a, θ_b, and φ_b
 1: Initialize θ_a, φ_a, θ_b, φ_b; epoch ← 0
 2: while epoch < max_epochs do
 3:     Sample a batch from 𝒟_l ∪ 𝒟_u
 4:     for each pixel x_i in the batch do
 5:         x_i^w ← wag(x_i)  ▹ weak augmentation shared by both branches
 6:         z_i^{w,a} ← f_a(x_i^w | θ_a);  y_i^{w,a} ← g_a(z_i^{w,a} | φ_a)
 7:         z_i^{w,b} ← f_b(x_i^w | θ_b);  y_i^{w,b} ← g_b(z_i^{w,b} | φ_b)
 8:         t̂_i^a ← onehot(y_i^{w,a});  t̂_i^b ← onehot(y_i^{w,b})  ▹ generate pseudolabels
 9:         y_i^{s1,a} ← g_a(f_a(lsag(x_i) | θ_a) | φ_a);  y_i^{s2,a} ← g_a(f_a(lsag(x_i) | θ_a) | φ_a)  ▹ low-level strong augmentations
10:         y_i^{s1,b} ← g_b(f_b(lsag(x_i) | θ_b) | φ_b);  y_i^{s2,b} ← g_b(f_b(lsag(x_i) | θ_b) | φ_b)
11:         y_i^{s3,a} ← g_a(hsag(z_i^{w,a}) | φ_a);  y_i^{s4,a} ← g_a(hsag(z_i^{w,a}) | φ_a)  ▹ high-level strong augmentations
12:         y_i^{s3,b} ← g_b(hsag(z_i^{w,b}) | φ_b);  y_i^{s4,b} ← g_b(hsag(z_i^{w,b}) | φ_b)
13:         m_{ij}^a ← 1[y_{ij}^{w,a} > τ];  m_{ij}^b ← 1[y_{ij}^{w,b} > τ]  ▹ confidence masking
14:     end for
15:     L_sup ← entropy(Y^a, T) + entropy(Y^b, T)  ▹ supervised loss on labeled data
16:     L_w2s ← L_w2s^{a2b} + L_w2s^{b2a}  ▹ cross-branch pseudo supervision
17:     L_dlcl ← L_low^a + L_low^b + L_high^a + L_high^b  ▹ dual-level contrastive loss
18:     L_overall ← λ_1 L_sup + λ_2 L_dlcl + λ_3 L_w2s
19:     Update θ_a, φ_a, θ_b, φ_b via backpropagation of L_overall
20:     epoch ← epoch + 1
21: end while

4. Experiments

4.1. Experimental Setting

The QTP-BS dataset [4] used in this study was developed from unmanned aerial vehicle (UAV) imagery collected in Machin County, Guoluo Prefecture, Qinghai Province, China. The study area is located on the Qinghai–Tibetan Plateau at an average altitude of approximately 3588 m and is characterized by severe alpine degradation and fragile ecological conditions. The UAV images cover representative black-soil areas with diverse surface features, including exposed soil, gravel, and rocks [41], as well as typical weed and poisonous plant species commonly found in degraded grasslands [42]. The dataset also includes multiple degradation stages, ranging from slightly degraded to severely desertified grasslands, capturing the spatial variability of black-soil area formation on the plateau. In total, about 400 high-resolution UAV images (5472 × 3648 pixels) were acquired under natural lighting conditions. From these, 100 representative images were carefully selected after removing those affected by shadows, blurriness, or other interference factors [4]. Pixel-level annotations were manually created under the supervision of ecological experts through an iterative process of labeling, verification, and refinement to ensure high precision. Before training, all images were geometrically corrected and cropped into 384 × 384 pixel patches for model input. The QTP-BS dataset provides high-quality spatial and spectral information for the detection and analysis of black-soil areas and effectively reduces potential regional bias in the experiments [4].
The QTP-BS dataset consists of 8400 images with a resolution of 384 × 384 pixels for training, 3600 images for validation, and two separate test sets each containing 1500 images. Compared with the first test set, images in the second test set were captured under poorer lighting conditions, often featuring significant shadows and low contrast. Furthermore, the boundaries of black-soil areas in this set are more ambiguous, increasing the difficulty of the segmentation task. This test set serves as a valuable benchmark for evaluating the robustness and generalization performance of models in visually complex and low-quality scenarios. In the semi-supervised setting, only the labeled and unlabeled portions of the training set are used for optimization. The validation set is employed solely for model selection, and its labels are not accessed during training, ensuring a fair semi-supervised evaluation. This setting enables a comprehensive evaluation of the model’s adaptability and generalization capability under semi-supervised learning conditions. All images were captured by UAVs and subsequently processed through radiometric correction, geometric correction, and image registration before being uniformly cropped to the standard size. Note that in this study, the term “black-soil area” refers to severely degraded alpine meadow regions characterized by exposed dark soil and sparse vegetation, while the “mattic epipedon” denotes the intact root-mat layer typical of healthy meadows. The segmentation task is formulated as a binary classification between these two categories.
We evaluate the segmentation performance using the mean Intersection over Union (mIoU), a widely adopted metric in semantic segmentation. The IoU for each class is defined as the ratio of the overlap between the predicted mask and the ground truth to their union. The mIoU is then obtained by averaging the IoU values across all C categories: $\mathrm{mIoU} = \frac{1}{C}\sum_{j=1}^{C}\frac{\mathrm{TP}_j}{\mathrm{TP}_j + \mathrm{FP}_j + \mathrm{FN}_j}$, where $\mathrm{TP}_j$, $\mathrm{FP}_j$, and $\mathrm{FN}_j$ denote the number of true positives, false positives, and false negatives for the j-th class, respectively.
In addition to mIoU, we also report the pixel accuracy (Acc), which measures the proportion of correctly predicted pixels across all categories: $\mathrm{Acc} = \frac{\sum_{j=1}^{C} \mathrm{TP}_j}{\sum_{j=1}^{C} (\mathrm{TP}_j + \mathrm{FN}_j)}$, where the denominator equals the total number of pixels. Acc reflects the overall pixel-wise prediction correctness and serves as a complementary metric to mIoU, providing a global measure of segmentation reliability.
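For concreteness, both metrics can be computed from integer label maps as in the following NumPy sketch (the function name and the two-class default are illustrative; every class is assumed to appear in at least one of the maps):

```python
import numpy as np

def miou_and_acc(pred, gt, num_classes=2):
    """Per-class IoU averaged into mIoU, plus overall pixel accuracy.

    pred, gt: integer label maps of identical shape.
    """
    ious = []
    for j in range(num_classes):
        tp = np.sum((pred == j) & (gt == j))   # true positives for class j
        fp = np.sum((pred == j) & (gt != j))   # false positives
        fn = np.sum((pred != j) & (gt == j))   # false negatives
        ious.append(tp / (tp + fp + fn))
    acc = float(np.mean(pred == gt))           # correct pixels / all pixels
    return float(np.mean(ious)), acc
```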
To validate the effectiveness of our SBLS method in the semi-supervised segmentation task of black-soil areas, we conduct comprehensive comparative experiments against several state-of-the-art SSSS approaches. Specifically, we compare SBLS with CPS [19], CCVC [20], DSSN [22], UniMatch [6], and CorrMatch [9], along with a supervised baseline named OnlySup, which is trained solely on labeled samples. These methods represent diverse advanced strategies in the field of semi-supervised segmentation. For example, CCVC [20] employs a dual-branch co-training framework that encourages two sub-networks to learn informative features from uncorrelated views, while UniMatch [6] is the first to apply a weak-to-strong consistency paradigm in SSSS. For a fair comparison, all methods use the same data splits for training and evaluation. To simulate various annotation-scarce scenarios, experiments were conducted under four labeled-data fractions of the training set: 1/2, 1/4, 1/8, and 1/16. All models were evaluated on both test sets, allowing a thorough assessment of their robustness and generalization under different supervision levels.
To select a suitable backbone for SBLS and the comparison methods, we conducted a backbone comparison under the 1/4 data split without pretraining. As shown in Figure 3, BS-Mamba [4] is the most suitable for the black-soil detection task; it is therefore adopted as the backbone network for all comparison methods in this study. BS-Mamba is a deep model specifically designed for black-soil area detection, demonstrating strong task adaptability and scene generalization. Benefiting from its task-oriented design, BS-Mamba serves as a reliable baseline for assessing the core performance of SBLS, particularly in remote sensing scenarios with limited labeled samples.
The optimizer used was stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 1 × 10⁻⁴. For each semi-supervised segmentation method, the learning rate and the total number of training epochs were set according to the characteristics of that method to ensure training stability and convergence. The learning rate schedule combined a warm-up phase with polynomial decay: the learning rate was increased linearly during the initial phase (typically the first 5 epochs), followed by polynomial decay with a power of 0.9. All input images were standardized through preprocessing and uniformly center-cropped to a resolution of 384 × 384 pixels to mitigate the impact of image size variation on training stability. The batch size was set to 4 throughout training, balancing computational efficiency, memory usage, and model performance. High-level augmentation techniques, including dropout with a rate of 0.5, were also applied to further enhance model generalization.
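The warm-up plus polynomial-decay schedule can be sketched as follows. The base learning rate and total epoch count are placeholders, since the paper tunes them per method; only the warm-up length (5 epochs) and decay power (0.9) come from the text:

```python
def lr_at_epoch(epoch, total_epochs, base_lr=0.01, warmup_epochs=5, power=0.9):
    """Linear warm-up followed by polynomial decay with the given power.

    base_lr and total_epochs are illustrative placeholders.
    """
    if epoch < warmup_epochs:
        # linear ramp from base_lr/warmup_epochs up to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return base_lr * (1.0 - progress) ** power  # polynomial decay
```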

4.2. Experimental Results

This section evaluates the performance of our proposed SBLS method on the QTP-BS dataset. We assess its effectiveness and robustness using both mIoU and pixel accuracy (Acc), providing a comprehensive evaluation of segmentation quality from the perspectives of pixel-level overlap and overall prediction correctness. The results in Table 1 and Table 2 are obtained from the best-performing model trained on the training and validation sets and then tested on the two strictly independent test sets within QTP-BS, ensuring a fair and credible evaluation.
Table 1 shows the performance comparison between SBLS and several advanced SSSS methods on the first test set. Overall, SBLS achieves the best segmentation performance under all label ratios in terms of both mIoU and Acc, demonstrating excellent robustness and generalization ability. Compared with the supervised baseline OnlySup, SBLS yields consistent improvements across all label settings, indicating that our framework effectively enhances feature discrimination and reduces boundary misclassification even when annotated data is scarce.
Compared with other state-of-the-art semi-supervised methods such as CPS [19], CCVC [20], DSSN [22], UniMatch [6], and CorrMatch [9], SBLS maintains the leading mIoU and Acc under all label ratios. As the proportion of labeled data decreases, all methods exhibit varying degrees of performance degradation. However, SBLS consistently outperforms the others, achieving nearly 11% improvement over CPS [19] in the extremely low-label scenario (1/16). In addition, SBLS exhibits the smallest fluctuation in Acc, reflecting its stronger stability and lower sensitivity to pseudo-label noise. In contrast, CPS shows significant performance drops as annotations are reduced, highlighting its stronger dependence on pseudo-label quality.
To further validate the generalizability of SBLS, we conduct experiments on a more challenging secondary test set and present the results in Table 2. This test set contains complex black-soil degradation patterns, including patchy mattic epipedons, exposed soil, snow residues, and rodent-disturbed surfaces. SBLS demonstrates consistent superiority over OnlySup and other semi-supervised methods across all label splits in both mIoU and Acc. Notably, SBLS yields improvements of 7.25%, 5.18%, 7.49%, and 9.29% in mIoU over OnlySup under the 1/2, 1/4, 1/8, and 1/16 label ratios, respectively, while achieving comparable gains in Acc. These results confirm the strong generalization capability of SBLS across real-world degraded surface conditions, particularly in low-annotation scenarios.
To visually compare the performance of different methods in complex scenes, we select representative samples from the QTP-BS dataset for qualitative analysis, including large-scale image views (Figure 4, Figure 5, Figure 6 and Figure 7) and local regions (Figure 8 and Figure 9). In these figures, red areas indicate regions where the ground truth is background but the model incorrectly predicts as foreground, while green areas indicate regions where the ground truth is foreground but the model incorrectly predicts as background. These visualizations effectively showcase the models’ capabilities in perceiving global structures and discriminating fine-grained regions. These figures are best viewed in color to appreciate the distinction between error types.
Figure 4 and Figure 5 show two panoramic remote sensing images from the first test set, with diverse and clear feature distributions. This set includes various types of black-soil areas, such as desertified grasslands, regions covered by weeds and poisonous plants, and areas with mattic epipedons [11]. In particular, weeds and poisonous plants (e.g., Artemisia frigida, Morina kokonorica) often coexist and intertwine with mattic epipedons, resulting in blurred and difficult-to-distinguish boundaries. As illustrated in Figure 4, compared with (c) CPS, our approach SBLS demonstrates superior capability in accurately identifying weeds and poisonous plants. Similarly, in Figure 5, SBLS shows a stronger ability to detect the boundaries and contours of black-soil areas compared with (f) CorrMatch. Benefiting from the cross-branch architecture and multi-level consistency mechanisms, SBLS achieves excellent performance in both structural restoration and texture discrimination, with predictions that align closely with the ground truth.
Figure 6 and Figure 7 present two panoramic remote sensing images from the secondary test set, which contain common challenges such as low illumination, shadows, and human disturbances (e.g., fences). Moreover, the boundaries between black-soil areas and mattic epipedons are highly ambiguous. As shown in Figure 7, (d) DSSN produces numerous false negatives, while (c) CPS suffers from a large number of false positives. These errors primarily stem from their inability to accurately distinguish the boundaries between black-soil areas and mattic epipedons. In contrast, our approach (g) SBLS provides more precise recognition of different types of black-soil areas under these challenging conditions. This improvement is attributed to the cross-branch weak-to-strong consistency strategy, which effectively establishes stable consistency across diverse augmented views and enhances model robustness.
Figure 8 shows four local regions from the first test set, including areas covered by weeds and poisonous plants and complex terrains. These regions exhibit complex textures, colors, and boundaries, which require high pseudolabel accuracy. As shown in Figure 8, both (c) CPS and (f) CorrMatch produce a considerable number of false positives, primarily due to their limited ability to accurately identify weeds and poisonous plants. In contrast, our approach (g) SBLS effectively mitigates this issue by introducing CutMix-based consistency regularization in both the image and feature spaces. This strategy not only preserves boundary continuity but also enhances class discriminability, significantly reducing both false positives and false negatives in these complex regions.
Figure 9 presents four complex local regions from the secondary test set, including soil-type black-soil areas, snow patches, plateau pika burrows [43], and regions coexisting with mattic epipedons. Soil-type black-soil areas are particularly challenging due to their visual similarity to mattic epipedons and are often misclassified as background. As shown in Figure 9, both (d) DSSN and (e) UniMatch exhibit limited boundary perception when detecting soil-type black-soil areas and plateau pika burrows, often resulting in blurred contours or missed targets. In contrast, our approach (g) SBLS significantly improves adaptability to complex scenes through the joint design of pixel-level contrastive constraints and cross-level pseudolabel supervision. Even under challenging conditions such as snow coverage or rodent disturbances, SBLS consistently preserves boundary continuity and class discriminability, further demonstrating the effectiveness of its cross-branch consistency mechanism in improving model robustness.
These results demonstrate that SBLS effectively enhances pseudolabel quality and model perception by leveraging multiview collaboration, cross-branch consistency, and hierarchical regularization. In particular, the cross-branch weak-to-strong supervision encourages diverse yet reliable training signals, while the dual-level contrastive learning refines feature discrimination across image and representation spaces. Together, these designs account for the consistent robustness and superior segmentation accuracy of SBLS across different label ratios and challenging scenarios.

4.3. Ablation Study

Effect of cross-branch design. We conduct ablation experiments to evaluate the contribution of the proposed cross-branch weak-to-strong pseudo supervision by comparing it with a single-branch baseline under different labeled data ratios, as shown in Figure 10. In the single-branch setting, weak predictions are only used to guide its own strong augmentations, which restricts the diversity of supervision signals and increases the risk of confirmation bias. In contrast, the cross-branch design exchanges weak predictions across branches to supervise multiple strongly augmented views, thereby improving pseudo-label reliability and enhancing consistency under strong perturbations. The results show that the cross-branch framework consistently outperforms the single-branch baseline on both test sets of the QTP-BS dataset, with larger gains when labeled data are scarce. This demonstrates that the proposed design effectively promotes feature diversity and strengthens model robustness, leading to superior segmentation performance across different label ratios.
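The cross-branch exchange described above can be sketched in NumPy as follows. The shapes (softmax maps of size C × H × W), function names, and the pixel-wise cross-entropy form are illustrative assumptions; the key point is that each branch's weak-view pseudolabels supervise the other branch's strongly augmented predictions, with only confident pixels contributing:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the class axis (axis 0)."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def masked_pixel_ce(logits, labels, mask):
    """Cross-entropy averaged over pixels kept by mask; logits: (C, H, W)."""
    probs = softmax(logits)
    h, w = labels.shape
    picked = probs[labels, np.arange(h)[:, None], np.arange(w)[None, :]]
    loss = -np.log(picked + 1e-8)
    return float(loss[mask].mean()) if mask.any() else 0.0

def cross_branch_w2s(weak_a, weak_b, strong_logits_a, strong_logits_b, tau=0.95):
    """Pseudolabels from each branch's weakly augmented view supervise the
    other branch's strongly augmented predictions (illustrative names)."""
    lab_a, keep_a = weak_a.argmax(0), weak_a.max(0) >= tau
    lab_b, keep_b = weak_b.argmax(0), weak_b.max(0) >= tau
    return (masked_pixel_ce(strong_logits_b, lab_a, keep_a) +
            masked_pixel_ce(strong_logits_a, lab_b, keep_b))
```

In the single-branch ablation, each branch would instead consume its own weak-view pseudolabels, which is exactly the confirmation-bias risk the cross-branch design avoids.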
Value of the confidence threshold τ . We conduct an ablation study of this hyperparameter on the QTP-BS dataset under the 1/4 split setting, as shown in Figure 11. We observe that τ = 0.95 achieves the highest mIoU on both test sets. A lower threshold introduces more noisy pseudolabels, whereas a higher threshold reduces the number of available labels. Therefore, τ = 0.95 strikes a good balance between label quality and quantity, and is adopted in all subsequent experiments.
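The quality–quantity trade-off behind this ablation can be made concrete with a small illustrative helper that reports how many pixels survive a given threshold; raising τ monotonically shrinks the pool of usable pseudolabels:

```python
import numpy as np

def retained_fraction(probs, tau):
    """Fraction of pixels whose top-class confidence reaches tau.

    probs: softmax output of shape (C, H, W).
    """
    return float((probs.max(axis=0) >= tau).mean())
```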
Effect of distinct CutMix augmentation. We conduct ablation experiments on the QTP-BS dataset to investigate the impact of distinct CutMix operations, as shown in Figure 12. The results reveal that applying CutMix consistently improves mIoU across various label ratios, with particularly notable improvements under low-label scenarios. This indicates that the block-wise spatial perturbations introduced by CutMix not only encourage the model to learn more robust representations but also promote the complementary learning between branches in our semi-supervised framework. Therefore, CutMix is adopted as the default augmentation in all subsequent experiments.
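A minimal sketch of the block-wise mixing, following the standard CutMix recipe [10]; the √(1 − λ) box-size rule, parameter names, and default λ are illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np

def cutmix_pair(img_a, img_b, rng, lam=0.75):
    """Paste a random rectangular block of img_b into img_a (H, W, C arrays).

    lam is the fraction of img_a that is kept; the block side lengths follow
    the sqrt(1 - lam) rule from CutMix [10].
    """
    h, w = img_a.shape[:2]
    cut_h = int(h * np.sqrt(1 - lam))
    cut_w = int(w * np.sqrt(1 - lam))
    cy = rng.integers(0, h - cut_h + 1)   # random top-left corner
    cx = rng.integers(0, w - cut_w + 1)
    mixed = img_a.copy()
    mixed[cy:cy + cut_h, cx:cx + cut_w] = img_b[cy:cy + cut_h, cx:cx + cut_w]
    return mixed
```

In a segmentation setting, the same box must also be applied to the corresponding pseudolabel maps so that supervision stays spatially aligned with the mixed image.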
Effectiveness of Dual-Level Contrastive Learning. The proposed SBLS framework integrates low-level and high-level contrastive learning to enforce feature consistency at both texture and semantic levels. As shown in Table 3, removing both contrastive losses results in mIoU scores of 77.75% and 67.43% on the first and second test sets, respectively. Introducing only the high-level contrastive learning improves the results to 79.13% and 67.98%, while applying only the low-level contrastive learning achieves 78.80% and 68.82%. This indicates that low-level feature alignment contributes more significantly to stabilizing texture and boundary representations, which is crucial for distinguishing visually similar land surfaces such as black-soil areas and mattic epipedon. When both contrastive modules are jointly applied, the model attains the highest mIoU of 79.21% and 69.07%, yielding improvements of +1.46% and +1.64% over the baseline. These results verify the complementary and cooperative effects of low- and high-level contrastive learning in enhancing feature discrimination within the SBLS framework.

5. Discussion

Although the proposed SBLS framework was specifically designed for detecting black-soil areas on the Qinghai–Tibetan Plateau, we believe it could also be applied to other types of remote sensing or ground imagery for object detection tasks. The reasons are twofold. First, SBLS is built on a common semi-supervised foundation, namely a pseudo-label-driven teacher–student paradigm, which provides a general basis for its design. On top of this, SBLS incorporates two core modules, cross-branch pseudo supervision and dual-level contrastive learning, to enhance pseudo-label reliability and feature discrimination. In addition, the CutMix augmentation strategy facilitates simultaneous learning from both sparse and dense targets. These mechanisms make SBLS particularly suitable for the challenging black-soil detection task in complex highland terrain, while its performance is expected to remain stable in simpler or more general scenarios.
Second, experimental results demonstrate the strong generalization capability of SBLS. Evaluations on the QTP-BS dataset show that SBLS maintains high mIoU and pseudo-label stability even under low-label conditions. Comparative experiments with DSSN, UniMatch, CPS, and CCVC further highlight SBLS’s superior robustness in complex terrain, diverse target shapes, and varying scales. This indicates that SBLS is not only effective for the challenging black-soil detection task, but also has potential to generalize to other highland or complex terrain remote sensing tasks.
Nevertheless, SBLS has certain limitations. It is currently limited to visible-light imagery due to dataset constraints. Challenges remain for high-altitude regions with shadows or cloud cover, as well as for multi-modal data such as SAR and infrared imagery. Moreover, the dual-branch structure and contrastive learning mechanism increase computational complexity, which may hinder deployment in resource-constrained environments.
Future work could proceed along several directions. One is to explore multi-modal semi-supervised detection by incorporating SAR, infrared, or other types of imagery to improve performance in complex scenarios. Another is to extend SBLS to weakly supervised or unsupervised settings, reducing the reliance on labeled data. In addition, model lightweighting could facilitate practical deployment, and extending SBLS to multi-temporal data analysis may enable dynamic monitoring of grassland degradation, providing valuable support for highland ecological management. While the proposed framework is tailored to the QTP-BS dataset, its underlying principles are general and can be extended to other remote sensing domains by adjusting the augmentation configurations and network backbones. Future work will explore the scalability of our approach to multi-source datasets with varying spatial resolutions and imaging conditions.

6. Conclusions

We propose a novel semi-supervised framework (SBLS) for black-soil area detection on the QTP. The proposed method effectively utilizes both labeled and unlabeled remote sensing data by leveraging dual-level contrastive learning and cross-branch pseudo supervision. Specifically, we design a dual-branch mutual learning structure in which two unshared branches extract complementary features from distinct augmented views and exchange supervision through a one-to-four consistency constraint. Additionally, we introduce dual-level contrastive losses in both image and feature spaces to enhance feature discrimination and improve the utilization of unlabeled data. Experimental results on the QTP-BS benchmark dataset demonstrate that SBLS achieves superior performance, particularly under low-label regimes, showing great potential for large-scale black-soil area mapping and ecological monitoring.

Author Contributions

Conceptualization, C.M. and X.M.; Methodology, Y.M., C.M. and X.M.; Software, Y.M., C.M., X.M. and Z.L.; Validation, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2024YFF0729104) and the Natural Science Foundation of Gansu Province (25JRRA495). The National Cryosphere Desert Data Center (China) provided the computing platform.

Data Availability Statement

All the data used in this study have been shared at https://drive.google.com/file/d/1x91CinTrJd08omRcuY4ZMm7XPn1yWFzZ/view, and the data availability is indefinite (accessed on 1 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cao, J.; Adamowski, J.F.; Deo, R.C.; Xu, X.; Gong, Y.; Feng, Q. Grassland Degradation on the Qinghai-Tibetan Plateau: Reevaluation of Causative Factors. Rangel. Ecol. Manag. 2019, 72, 988–995. [Google Scholar] [CrossRef]
  2. Shang, Z.; Dong, Q.; Shi, J.; Zhou, H.; Dong, S.; Shao, X.; Li, S.; Wang, Y.; Ma, Y.; Ding, L.; et al. Research progress in recent ten years of ecological restoration for ‘Black Soil Land’ degraded grassland on Tibetan Plateau: Concurrently discuss of ecological restoration in Sangjiangyuan region. Acta Agrestia Sin. 2018, 26, 1. [Google Scholar]
  3. Daily, G.C. Restoring value to the world’s degraded lands. Science 1995, 269, 350–354. [Google Scholar] [CrossRef]
  4. Ma, X.; Lv, Z.; Ma, C.; Zhang, T.; Xin, Y.; Zhan, K. BS-Mamba for black-soil area detection on the Qinghai-Tibetan plateau. J. Appl. Remote Sens. 2025, 19, 028502. [Google Scholar] [CrossRef]
  5. Sohn, K.; Berthelot, D.; Carlini, N.; Zhang, Z.; Zhang, H.; Raffel, C.A.; Cubuk, E.D.; Kurakin, A.; Li, C.L. FixMatch: Simplifying semi-supervised learning with consistency and confidence. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual, 6–12 December 2020; Volume 33, pp. 596–608. [Google Scholar]
  6. Yang, L.; Qi, L.; Feng, L.; Zhang, W.; Shi, Y. Revisiting weak-to-strong consistency in semi-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 18–22 June 2023; pp. 7236–7246. [Google Scholar]
  7. Lu, X.; Jiao, L.; Li, L.; Liu, F.; Liu, X.; Yang, S.; Feng, Z.; Chen, P. Weak-to-strong consistency learning for semisupervised image segmentation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3272552. [Google Scholar] [CrossRef]
  8. Huang, W.; Shi, Y.; Xiong, Z.; Zhu, X.X. AdaptMatch: Adaptive matching for semisupervised binary segmentation of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 3332490. [Google Scholar] [CrossRef]
  9. Sun, B.; Yang, Y.; Zhang, L.; Cheng, M.M.; Hou, Q. CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 3097–3107. [Google Scholar]
  10. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032. [Google Scholar]
  11. Shang, Z.; Ma, Y.; Long, R.; Ding, L. Effect of fencing, artificial seeding and abandonment on vegetation composition and dynamics of ‘black soil land’ in the headwaters of the Yangtze and the Yellow rivers of the Qinghai-Tibetan plateau. Land Degrad. Dev. 2008, 19, 554–563. [Google Scholar] [CrossRef]
  12. Peláez-Vegas, A.; Mesejo, P.; Luengo, J. A survey on semi-supervised semantic segmentation. arXiv 2023, arXiv:2302.09899. [Google Scholar] [CrossRef]
  13. Lee, D.H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proceedings of the ICML 2013 Workshop: Challenges in Representation Learning (WREPL), Atlanta, GA, USA, 21 June 2013; Volume 3, p. 896. [Google Scholar]
  14. Cascante-Bonilla, P.; Tan, F.; Qi, Y.; Ordonez, V. Curriculum labeling: Revisiting pseudo-labeling for semi-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 6912–6920. [Google Scholar]
  15. Xie, Q.; Luong, M.T.; Hovy, E.; Le, Q.V. Self-training with noisy student improves imagenet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 10687–10698. [Google Scholar]
  16. Sajjadi, M.; Javanmardi, M.; Tasdizen, T. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In Proceedings of the 29th Neural Information Processing Systems (NIPS), Barcelona, Spain, 5–11 December 2016; Volume 29, pp. 1171–1179. [Google Scholar]
  17. Miyato, T.; Maeda, S.I.; Koyama, M.; Ishii, S. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2858821. [Google Scholar] [CrossRef]
  18. Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 1195–1204. [Google Scholar]
  19. Chen, X.; Yuan, Y.; Zeng, G.; Wang, J. Semi-supervised semantic segmentation with cross pseudo supervision. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2613–2622. [Google Scholar]
  20. Wang, Z.; Zhao, Z.; Xing, X.; Xu, D.; Kong, X.; Zhou, L. Conflict-based cross-view consistency for semi-supervised semantic segmentation. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 19585–19595. [Google Scholar]
  21. Liu, R.; Luo, T.; Huang, S.; Wu, Y.; Jiang, Z.; Zhang, H. CrossMatch: Cross-view matching for semi-supervised remote sensing image segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 3507050. [Google Scholar] [CrossRef]
  22. Tian, Z.; Zhang, X.; Zhang, P.; Zhan, K. Improving semi-supervised semantic segmentation with dual-level Siamese structure network. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 4200–4208. [Google Scholar]
  23. Guo, R.; Liu, J.; Li, N.; Liu, S.; Chen, F.; Cheng, B.; Duan, J.; Li, X.; Ma, C. Pixel-wise classification method for high resolution remote sensing imagery using deep neural networks. ISPRS Int. J. Geo-Inf. 2018, 7, 110. [Google Scholar] [CrossRef]
  24. Hossain, M.D.; Chen, D. Segmentation for Object-Based Image Analysis (OBIA): A review of algorithms and challenges from remote sensing perspective. ISPRS J. Photogramm. Remote Sens. 2019, 150, 115–134. [Google Scholar] [CrossRef]
  25. Zhang, S.; Li, C.; Qiu, S.; Gao, C.; Zhang, F.; Du, Z.; Liu, R. EMMCNN: An ETPS-based multi-scale and multi-feature method using CNN for high spatial resolution image land-cover classification. Remote Sens. 2019, 12, 66. [Google Scholar] [CrossRef]
  26. Temenos, A.; Temenos, N.; Kaselimi, M.; Doulamis, A.; Doulamis, N. Interpretable deep learning framework for land use and land cover classification in remote sensing using SHAP. IEEE Geosci. Remote Sens. Lett. 2023, 20, 8500105. [Google Scholar] [CrossRef]
  27. Pei, H.; Owari, T.; Tsuyuki, S.; Zhong, Y. Application of a novel multiscale global graph convolutional neural network to improve the accuracy of forest type classification using aerial photographs. Remote Sens. 2023, 15, 1001. [Google Scholar] [CrossRef]
  28. Ma, X.; Man, Q.; Yang, X.; Dong, P.; Yang, Z.; Wu, J.; Liu, C. Urban feature extraction within a complex urban area with an improved 3D-CNN using airborne hyperspectral data. Remote Sens. 2023, 15, 992. [Google Scholar] [CrossRef]
  29. Khan, S.D.; Basalamah, S. Multi-branch deep learning framework for land scene classification in satellite imagery. Remote Sens. 2023, 15, 3408. [Google Scholar] [CrossRef]
  30. Ansith, S.; Bini, A. Land use classification of high resolution remote sensing images using an encoder based modified GAN architecture. Displays 2022, 74, 102229. [Google Scholar] [CrossRef]
  31. Ma, A.; Yu, N.; Zheng, Z.; Zhong, Y.; Zhang, L. A supervised progressive growing generative adversarial network for remote sensing image scene classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5618818. [Google Scholar] [CrossRef]
  32. Xu, C.; Shu, J.; Zhu, G. Adversarial remote sensing scene classification based on lie group feature learning. Remote Sens. 2023, 15, 914. [Google Scholar] [CrossRef]
  33. Wang, J.; Liu, B.; Zhou, Y.; Zhao, J.; Xia, S.; Yang, Y.; Zhang, M.; Ming, L.M. Semisupervised multiscale generative adversarial network for semantic segmentation of remote sensing image. IEEE Geosci. Remote Sens. Lett. 2020, 19, 8003805. [Google Scholar] [CrossRef]
  34. Ma, A.; Filippi, A.M.; Wang, Z.; Yin, Z. Hyperspectral image classification using similarity measurements-based deep recurrent neural networks. Remote Sens. 2019, 11, 194. [Google Scholar] [CrossRef]
  35. Tang, Y.; Qiu, F.; Wang, B.; Wu, D.; Jing, L.; Sun, Z. A deep relearning method based on the recurrent neural network for land cover classification. GISci. Remote Sens. 2022, 59, 1344–1366. [Google Scholar] [CrossRef]
  36. Sohail, M.; Chen, Z.; Yang, B.; Liu, G. Multiscale spectral-spatial feature learning for hyperspectral image classification. Displays 2022, 74, 102278. [Google Scholar] [CrossRef]
  37. Ibanez, D.; Fernandez-Beltran, R.; Pla, F.; Yokoya, N. Masked auto-encoding spectral–spatial transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5542614. [Google Scholar] [CrossRef]
  38. Chen, Y.; Jiao, L.; Li, Y.; Zhao, J. Multilayer projective dictionary pair learning and sparse autoencoder for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6683–6694. [Google Scholar] [CrossRef]
  39. Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; Bengio, Y. Learning deep representations by mutual information estimation and maximization. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  40. Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent-a new approach to self-supervised learning. NeurIPS 2020, 33, 21271–21284. [Google Scholar]
  41. Zhang, Y.; Wu, X.; Li, X.; Zhang, F.; Dong, X.; Wang, Y.; Zhang, H. Identification of Degraded Grassland in Qinghai Area of Yellow River Source Based on High-resolution Images. Acta Agric. Boreali-Occident. Sin. 2023, 32, 198–211. [Google Scholar]
  42. Shang, Z.; Long, R.; Ma, Y.; Ding, L. Spatial Heterogeneity and Similarity of Adult Plants and Seedlings in ‘Black Soil Land’ Secondary Weed Community, Qinghai-Tibetan Plateau. J. Plant Ecol. 2008, 32, 1157–1165. [Google Scholar]
  43. Guo, Z.G.; Zhou, X.R.; Hou, Y. Effect of available burrow densities of plateau pika (Ochotona curzoniae) on soil physicochemical property of the bare land and vegetation land in the Qinghai-Tibetan Plateau. Acta Ecol. Sin. 2012, 32, 104–110. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed SBLS framework. The model employs a dual-branch mutual learning structure, where each branch receives different augmented views of the input image, including low-level strong augmentation (lsag), high-level strong augmentation (hsag), and weak augmentation (wag). Each branch produces predictions supervised by cross-branch pseudo labels and one-hot ground truth when available. Dual-level contrastive consistency is applied at both the feature and prediction levels to encourage agreement between branches while reducing the impact of noisy pseudo labels. The superscript a or b denotes the branch, and the subscripts low and high denote the level of augmentation. This design allows the model to learn richer features from limited labeled data and maintain stable performance under challenging conditions.
Figure 1. Overview of the proposed SBLS framework. The model employs a dual-branch mutual learning structure, where each branch receives different augmented views of the input image, including low-level strong augmentation (lsag), high-level strong augmentation (hsag), and weak augmentation (wag). Each branch produces predictions supervised by cross-branch pseudo labels and one-hot ground truth when available. Dual-level contrastive consistency is applied at both feature and prediction levels to encourage agreement between branches while reducing the impact of noisy pseudo labels. The superscripts a or b denotes the branch, and the subcripts l o w and h i g h denote the level of augmentation. This design allows the model to learn richer features from limited labeled data and maintain stable performance under challenging conditions.
Remotesensing 17 03977 g001
Figure 2. Cross-branch weak-to-strong pseudo supervision.
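The cross-branch weak-to-strong pseudo supervision of Figure 2 can be sketched in a few lines: pseudo labels from one branch's weakly augmented view supervise the other branch's strongly augmented view, with low-confidence pixels masked out. This is an illustrative numpy sketch, not the paper's implementation; the function name `pseudo_supervise` and the threshold default are assumptions.

```python
import numpy as np

def pseudo_supervise(weak_probs_a, strong_logits_b, tau=0.95):
    """Cross-branch weak-to-strong pseudo supervision (illustrative sketch).

    weak_probs_a:    (H, W, C) softmax probabilities from branch a's weak view.
    strong_logits_b: (H, W, C) raw logits from branch b's strong view.
    tau: confidence threshold; pixels below it are ignored.
    """
    # Hard pseudo labels and their confidences from the weak view.
    pseudo = weak_probs_a.argmax(axis=-1)            # (H, W) class indices
    conf = weak_probs_a.max(axis=-1)                 # (H, W) confidences
    mask = conf >= tau                               # keep only confident pixels

    # Softmax over branch b's strong-view logits (numerically stable).
    z = strong_logits_b - strong_logits_b.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

    # Masked per-pixel cross-entropy against the pseudo labels.
    h, w = pseudo.shape
    ce = -np.log(p[np.arange(h)[:, None], np.arange(w)[None, :], pseudo] + 1e-12)
    return (ce * mask).sum() / max(mask.sum(), 1)
```

In the full framework this loss is applied symmetrically (a→b and b→a) and combined with the supervised loss on labeled patches; Figure 11 explores the sensitivity to the threshold `tau`.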
Figure 3. Performance evaluation of SBLS backbones under the 1/4 split on the two test sets without pretraining: (a) the first test set; (b) the secondary test set, highlighting the suitability of BS-Mamba for black-soil detection.
Figure 4. Visual comparison of SBLS with the fully supervised baseline and other advanced semi-supervised methods on the first test set with high patch coverage.
Figure 5. Visual comparison of SBLS with the fully supervised baseline and other advanced semi-supervised methods on the first test set with low patch coverage.
Figure 6. Visual comparison of SBLS with the fully supervised baseline and other advanced semi-supervised methods on the secondary test set with high patch coverage.
Figure 7. Visual comparison of SBLS with the fully supervised baseline and other advanced semi-supervised methods on the secondary test set with low patch coverage.
Figure 8. Visual comparison of SBLS with the fully supervised baseline and other advanced semi-supervised methods on local regions of the first test set.
Figure 9. Visual comparison of SBLS with the fully supervised baseline and other advanced semi-supervised methods on local regions of the secondary test set.
Figure 10. Ablation study comparing single-branch and cross-branch designs. (a) The first test set. (b) The secondary test set.
Figure 11. Performance analysis under different pseudolabel confidence thresholds at the 1/4 split on the two test sets. (a) The first test set. (b) The secondary test set.
Figure 12. Ablation results of CutMix under different labeled data ratios on the two test sets. (a) The first test set. (b) The secondary test set. Bars marked as w/ CutMix represent models trained with CutMix augmentation, while those marked as w/o CutMix represent models trained without it.
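The block-wise CutMix augmentation ablated in Figure 12 pastes a random rectangle from one image (and its label map) into another. Below is a minimal numpy sketch under standard CutMix conventions; the helper name `cutmix_pair` and the Beta(1, 1) mixing ratio are assumptions, not taken from the paper.

```python
import numpy as np

def cutmix_pair(img_a, img_b, lbl_a, lbl_b, rng=None):
    """Block-wise CutMix (illustrative sketch): paste a random rectangle from
    img_b/lbl_b into copies of img_a/lbl_a.  Images (H, W, C), labels (H, W)."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = lbl_a.shape
    lam = rng.beta(1.0, 1.0)                         # mixing ratio in [0, 1]
    cut_h = int(h * np.sqrt(1 - lam))                # rectangle size so that
    cut_w = int(w * np.sqrt(1 - lam))                # its area is ~(1-lam)*H*W
    cy, cx = rng.integers(0, h), rng.integers(0, w)  # rectangle centre
    y0, y1 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x0, x1 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)

    mixed_img, mixed_lbl = img_a.copy(), lbl_a.copy()
    mixed_img[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]    # image and label are mixed
    mixed_lbl[y0:y1, x0:x1] = lbl_b[y0:y1, x0:x1]    # with the same rectangle
    return mixed_img, mixed_lbl
```

Applying the same rectangle to the image and its (pseudo) label keeps the supervision spatially consistent, which is what allows CutMix to be used on unlabeled strong views.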
Table 1. Results of different methods on the first test set in terms of mIoU (%) and Acc (%) under different data splits. The bold values indicate the best result.

Methods            1/2            1/4            1/8            1/16
                   mIoU   Acc    mIoU   Acc    mIoU   Acc    mIoU   Acc
Onlysup            80.61  89.66  77.50  88.42  75.82  87.71  75.78  87.37
CPS [19]           73.17  86.57  71.92  85.84  71.34  85.36  67.07  83.17
CCVC [20]          76.13  87.37  73.85  86.54  75.60  87.83  74.65  86.06
DSSN [22]          79.17  88.76  77.30  87.74  74.22  86.95  74.68  86.58
UniMatch [6]       80.16  89.48  77.46  87.90  76.38  87.96  76.85  87.77
CorrMatch [9]      79.98  88.99  77.38  88.32  74.48  87.18  76.99  88.23
SBLS (ours)        80.97  89.73  79.21  88.66  77.30  87.32  78.13  87.92
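The two metrics reported in Tables 1 and 2, mean intersection-over-union (mIoU) and overall pixel accuracy (Acc), can be computed from a confusion matrix as in the sketch below. The function name `miou_acc` is ours; for the binary black-soil task, `num_classes=2`.

```python
import numpy as np

def miou_acc(pred, target, num_classes=2):
    """Compute mIoU (%) and overall accuracy (%) from per-pixel predictions.

    Illustrative sketch; assumes every class appears at least once, otherwise
    the per-class IoU would divide by zero.
    """
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        cm[t, p] += 1
    tp = np.diag(cm)                                  # per-class true positives
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp) # TP / (TP + FP + FN)
    return iou.mean() * 100, tp.sum() / cm.sum() * 100
```

Note that on imbalanced scenes (e.g., the low-coverage patches of Figures 5 and 7) mIoU penalises missed black-soil pixels much more strongly than Acc, which is why the two metrics can rank methods differently.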
Table 2. Results of different methods on the secondary test set in terms of mIoU (%) and Acc (%) under different data splits. The bold values indicate the best result.

Methods            1/2            1/4            1/8            1/16
                   mIoU   Acc    mIoU   Acc    mIoU   Acc    mIoU   Acc
Onlysup            64.25  77.85  63.89  77.66  63.03  76.50  58.74  74.67
CPS [19]           64.70  77.55  67.29  79.60  59.33  73.63  61.37  75.11
CCVC [20]          65.92  78.74  66.83  80.96  65.32  78.28  64.16  78.68
DSSN [22]          68.19  81.60  64.93  79.86  69.93  82.16  59.39  75.85
UniMatch [6]       70.87  83.23  65.31  80.23  65.50  78.39  67.32  80.62
CorrMatch [9]      70.25  82.16  66.38  78.92  63.40  76.65  67.07  79.47
SBLS (ours)        71.50  83.98  69.07  81.90  70.52  83.13  68.03  81.55
Table 3. Ablation of low- and high-level contrastive learning under the 1/4 split on the two test sets. (a) The first test set. (b) The secondary test set.

(a) The first test set
L_low   L_high   mIoU
  –       –      77.75
  ✓       –      79.13
  –       ✓      78.80
  ✓       ✓      79.21

(b) The secondary test set
L_low   L_high   mIoU
  –       –      67.43
  ✓       –      67.98
  –       ✓      68.82
  ✓       ✓      69.07
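The contrastive objectives ablated in Table 3 pull matched features of the two branches together while pushing mismatched ones apart. A generic InfoNCE-style loss between paired branch features is sketched below; the paper's exact low-level and high-level objectives may differ, and the function name `info_nce` and the temperature default are assumptions.

```python
import numpy as np

def info_nce(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style contrastive loss between matched feature vectors of the
    two branches (generic sketch).  feats_*: (N, D); row i of each matrix
    forms a positive pair, all other rows serve as negatives."""
    # L2-normalise so the dot products below are cosine similarities.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                   # (N, N) similarity logits

    # Cross-entropy with the diagonal (matched pairs) as the positive class.
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))
```

In a dual-level setup, one such loss would act on intermediate (low-level) feature maps and another on the final (high-level) representations, matching the L_low/L_high rows of Table 3.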

Share and Cite

Min, Y.; Ma, C.; Ma, X.; Lv, Z. Semi-Supervised Black-Soil Area Detection on the Qinghai–Tibetan Plateau. Remote Sens. 2025, 17, 3977. https://doi.org/10.3390/rs17243977