Article

Spectral Prototype Attention Domain Adaptation for Hyperspectral Image Classification

1 School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China
2 Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK
3 School of Computer Science and Engineering, Jimei University, Xiamen 361021, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Remote Sens. 2025, 17(23), 3901; https://doi.org/10.3390/rs17233901
Submission received: 25 October 2025 / Revised: 25 November 2025 / Accepted: 28 November 2025 / Published: 30 November 2025

Highlights

What are the main findings?
  • SPADA integrates attention-guided spectral–spatial encoding with prototype-based alignment, achieving state-of-the-art accuracy and Kappa gains across Pavia, Houston, and Shanghai–Hangzhou benchmarks.
  • Active adaptation and confident refinement stabilize prototype updates, yielding consistent and low-variance improvements in domain adaptation.
What is the implication of the main finding?
  • Coupling attention and prototype alignment enhances cross-scene generalization with minimal target labels.
  • The approach offers a reliable path toward scalable, label-efficient hyperspectral image adaptation across varying sensors and conditions.

Abstract

Hyperspectral image (HSI) classification is often challenged by cross-scene domain shifts and limited target annotations. Existing approaches relying on class-agnostic moment matching or confidence-based pseudo-labeling tend to blur decision boundaries, propagate noise, and struggle with spectral overlap and class imbalance. We propose Spectral Prototype Attention Domain Adaptation (SPADA), a framework that integrates an attention-guided spectral–spatial backbone with dual prototype banks and distance-based posterior modeling. SPADA performs global and class-conditional alignment through source supervision, kernel-based distribution matching, and prototype coupling, followed by diversity-aware active adaptation and confidence-calibrated refinement via prior-adjusted self-training. Across multiple cross-scene benchmarks in urban and inter-city scenarios, SPADA consistently outperforms strong baselines in overall accuracy, average accuracy, and Cohen’s κ, achieving clear gains on classes affected by spectral overlap or imbalance and maintaining low variance across runs, demonstrating robust and stable domain transfer.

1. Introduction

Hyperspectral image (HSI) classification plays an essential role in land-cover mapping, crop and forest monitoring, and mineral exploration [1,2]. However, the reliable annotation of HSIs is often costly and limited to a few scenes. When the imaging condition, sensor type, or acquisition season changes, the data distribution also shifts. Consequently, a classifier trained on a labeled source scene often performs poorly on a new target scene without labels. This study focuses on domain adaptation for HSI classification, where unlabeled target samples are available along with a labeled source set.
In HSI classification, domain adaptation methods can be divided into three broad categories. Discrepancy minimization aligns source and target distributions by matching their statistical moments, typically through kernel mean embeddings or covariance alignment. Representative approaches include maximum mean discrepancy (MMD)-based feature alignment and CORrelation ALignment (CORAL)-based covariance matching [3,4]. Adversarial alignment introduces a domain discriminator to enforce domain-invariant representation learning, where the discriminator and feature extractor are optimized in an adversarial manner. Conditioning the discriminator on task predictions has been shown to reduce negative transfer and enhance feature discriminability [5,6]. Classifier-driven adaptation and self-training refine target-domain decision boundaries by exploiting pseudo-labels, disagreement signals, or margin-based selection to iteratively adjust classification boundaries [7,8]. In HSI classification, spectral–spatial networks such as RepSSRN and SpectralFormer [9,10] have further improved representation quality by jointly modeling spectral correlations and spatial structures. However, these strategies rarely couple attention mechanisms, which highlight informative spectral bands and spatial regions, with prototype reliability for class-conditional alignment.
Several key limitations still restrict the effectiveness of prototype-based domain adaptation (DA) for HSIs. First, many alignment objectives operate at a global level, matching marginal distributions without explicitly protecting classwise separation. When classes share similar spectra, this may blur decision boundaries. Second, prototype-based or pseudo-label-driven schemes often rely on classifier confidence alone to update target prototypes. Confidence, however, is unstable under spectral distortions, sensor noise, and class imbalance, which frequently appear in HSI data. These approaches rarely incorporate the geometric structure of spectral clusters or use attention responses as indicators of prototype reliability. Third, spectral–spatial attention modules improve representation quality within modern HSI backbones, but they are rarely used to guide class-conditional prototype refinement. This disconnect limits the reliability of prototypes when the target domain exhibits strong spectral variability or contains underrepresented classes.
To tackle these issues, we introduce Spectral Prototype Attention Domain Adaptation (SPADA), a method designed to combine attention-guided spectral–spatial representations with classwise prototype modeling. SPADA is built around three coordinated modules. An attention-guided backbone extracts spectral–spatial embeddings where channel and spatial attention highlight informative spectral bands and regions. Two prototype banks store class centers for the source and target domains. For each feature, distance-based posteriors to class prototypes are computed and used to enforce class-conditional alignment together with distribution matching. A small labeled target subset initializes target prototypes, while a confidence-aware rule incorporates reliable unlabeled samples to update both prototype banks and the classifier. By coupling attention responses with prototype reliability and aligning classes through distance-based posteriors, SPADA improves cross-scene generalization under limited target supervision.
The main contributions of this work are summarized as follows:
  • An attention-guided spectral–spatial backbone that calibrates spectral and spatial responses before adaptation, stabilizing feature geometry across scenes.
  • Spectral prototype banks for both source and target domains that produce distance-based posteriors and enable class-conditional alignment by linking attention responses with prototype reliability.
  • A domain adaptation training pipeline that unifies source supervision, distribution matching, prototype coupling, and confidence-aware updates on a small labeled target subset, effectively reducing noise from incorrect pseudo labels and improving minority-class adaptation.

2. Related Work

2.1. Domain Adaptation

DA aims to transfer knowledge from a labeled source domain to an unlabeled target domain with differing data distributions. Theoretical analyses have shown that the target-domain risk can be upper-bounded by the source-domain risk and the divergence between source and target distributions [11]. This theoretical insight motivates the development of algorithms that either minimize explicit domain discrepancies or learn domain-invariant representations through adversarial training. Discrepancy-based methods focus on reducing the statistical distance between domains by aligning their feature distributions. Deep adaptation network (DAN) [3] employs the MMD criterion to align mean feature embeddings, while CORAL [4] matches second-order statistics to align feature covariances. More recently, the Domain Transformer (DoT) [12] incorporates domain-level attention and manifold regularization under a Wasserstein distance bound, improving cross-domain stability. Although effective, these methods often face challenges such as feature noise sensitivity and mode collapse under severe domain shifts [13].
Adversarial alignment techniques address these limitations through minimax optimization between a feature extractor and a domain discriminator. The Domain-Adversarial Training of Neural Networks (DANN) [5] formalizes this approach by jointly training the feature extractor to confuse the discriminator, thereby inducing domain invariance. The Conditional Domain Adversarial Network (CDAN) [6] extends this framework by conditioning the discriminator on task predictions to achieve class-aware feature alignment. To improve optimization smoothness and convergence stability, Smooth Domain Adversarial Training (SDAT) [14] introduces regularization with respect to task loss. Complementary to adversarial training, discrepancy-guided self-training methods refine feature alignment using classifier disagreement as an uncertainty measure. Maximum Classifier Discrepancy (MCD) [7] and Margin Disparity Discrepancy (MDD) [8] exploit prediction inconsistency across multiple classifiers to identify ambiguous regions and enhance decision boundary refinement.
Self-training and pseudo-labeling frameworks further enhance target-domain representation by leveraging model confidence and consistency constraints. Cross-Domain Transformer (CDTrans) [15] integrates triple-branch cross-attention with center-aware pseudo-labeling to strengthen inter-domain semantic consistency. Safe Self-Refinement for Transformer-based (SSRT) [16] domain adaptation introduces input perturbation and Kullback–Leibler divergence regularization to reduce confirmation bias during iterative refinement. Patch-Mix Transformer (PMTrans) [17] models unsupervised domain adaptation (UDA) as a three-player Nash game among a feature extractor, classifier, and PatchMix generator that synthesizes mixed-domain representations for more robust adaptation. The Transferable Vision Transformer (TVT) [18] embeds a transferability adaptation module within a Vision Transformer (ViT) backbone to simultaneously enhance feature discriminability and domain invariance. Beyond single-view adaptation, multiview-based UDA frameworks leverage complementary feature perspectives for improved robustness. Multiview Latent Space Learning (MLSL) [19] incrementally fine-tunes latent subspaces to ensure feature consistency, while Tensorial Multiview Low-Rank High-Order Graph Learning (MLRGL) [20] captures higher-order dependencies among multiple views for structure-preserving alignment. Consensus-Augmented Masking for Subspace Alignment (CAMSA) [21] enforces interview consensus constraints to achieve stable subspace adaptation. Collectively, these studies highlight the growing importance of integrating adversarial learning, pseudo-label refinement, and multiview representation learning to achieve stable, generalizable, and semantically consistent unsupervised domain adaptation.

2.2. Domain Adaptation for Hyperspectral Image Classification

For HSI classification, early UDA frameworks primarily employed convolutional neural networks (CNNs) with one-dimensional, two-dimensional, or three-dimensional kernels to capture spectral–spatial features. However, limited receptive fields hindered their generalization across scenes. Recent research focuses on hierarchical alignment, prototype- and memory-based learning, and generative adaptation to improve robustness against spectral distortion and class imbalance.
Hierarchical alignment frameworks such as multilevel unsupervised domain adaptation (MLUDA) [22] jointly align image-, feature-, and classifier-level representations through hierarchical consistency constraints. Masked Self-Distillation Domain Adaptation (MSDA) [23] reconstructs incomplete spectral–spatial information to enhance stability under domain shifts. Confident Learning-Based Domain Adaptation (CLDA) [24] employs reliability-aware pseudo-label filtering to mitigate noise and improve class separability.
Prototype- and memory-based adaptation methods have achieved fine-grained alignment between domains. The Prototype-Based Inter-Intra Domain Alignment Network (PIIDAN) [25] estimates class prototypes with uncertainty-aware pseudo-label filtering to prevent error propagation. Consistency-Aware Customized Learning (CACL) [26] combines adversarial and contrastive objectives within a prototype memory to enforce classwise and domain-level consistency. Soft Instance-Level Domain Adaptation (SoftInstance) [27] applies graph-based contrastive regularization at the instance level to enhance discriminative robustness under illumination variation. A prototype-guided, class-balanced active domain adaptation method [28] balances class distributions through prototype-guided active querying to alleviate sample bias. The Feature Consistency-Based Prototype Network (FCPN) [29] refines target features using supervised contrastive learning and prototype updates, while Supervised Contrastive Learning-Based Unsupervised Domain Adaptation (SCLUDA) [30] enhances cross-domain clustering through contrastive prototype refinement.
Generative and physically interpretable approaches further expand the adaptation spectrum. Unmixing-Based Domain Alignment (UBDA) [31] projects hyperspectral data into abundance space constrained by endmember composition and aligns distributions via metric learning. Class-Aligned and Class-Balancing Generative Domain Adaptation (CCGDA) [32] synthesizes balanced target-like samples using dual generators and capsule-based discriminators to reduce class imbalance. Causal Invariance Domain Adaptation (CIDA) [33] disentangles class-relevant and domain-specific features to eliminate spurious correlations. The Test-Time Adaptable Transformer for Hyperspectral Degradation (HyperTTA) [34] updates normalization parameters during inference to adapt to unseen target data without accessing the source domain.
Despite these advances, cross-scene HSI adaptation remains challenging due to spectral corruption, class imbalance, and domain-specific noise. Many methods rely heavily on pseudo labels, which can propagate errors under severe domain shifts. Future work may focus on uncertainty-aware adaptation, causal feature disentanglement, and leveraging vision–language priors to reduce dependence on labeled data. Continued improvements in dataset diversity and annotation quality are also essential for achieving reliable and transferable hyperspectral domain adaptation.

3. Methods

3.1. Framework Overview

We consider UDA for HSI classification, where the objective is to transfer discriminative knowledge from a labeled source domain to an unlabeled target domain that shares the same label space but exhibits distributional discrepancies. Formally, the source domain is denoted by $\mathcal{D}_S = \{(x_i^S, y_i^S)\}_{i=1}^{N_S}$, where each hyperspectral patch $x_i^S \in \mathbb{R}^{B \times H \times W}$ contains $B$ spectral bands and spatial dimensions $H \times W$, and $y_i^S \in \{1, \dots, C\}$ represents its class label. The target domain is represented as $\mathcal{D}_T = \{x_j^T\}_{j=1}^{N_T}$, which is unlabeled during training. The encoder $F(\cdot;\theta)$ with parameters $\theta$ extracts a compact feature representation $f = F(x;\theta) \in \mathbb{R}^D$, and the classifier $g(\cdot;\phi)$ with parameters $\phi$ maps it to class logits $z = g(f;\phi)$ and probabilities $p = \mathrm{softmax}(z)$.
As shown in Figure 1, the proposed framework integrates a Spectral–Spatial Attention Backbone (SSAB) with a Prototype-Coupled Domain Alignment (PCDA) mechanism and adopts a three-phase training strategy composed of unsupervised adaptation, active adaptation, and confident refinement. SSAB serves as the feature extractor, capturing both spectral correlations and spatial context through two complementary branches, one emphasizing spectral dependencies and the other spatial structures. Their outputs are fused via a residual refinement and attention mechanism to yield domain-invariant spectral–spatial embeddings. These embeddings are subsequently fed into PCDA, which maintains source and target prototype banks representing classwise centroids in feature space. Each feature is linked to these prototypes through distance-based posteriors, enabling class-conditional alignment while preserving local decision boundaries. All features are $\ell_2$-normalized, and prototype banks are updated each epoch to follow the evolving feature distribution during optimization.
The training process unfolds in three sequential phases. The first phase, unsupervised adaptation, constructs a high-confidence target subset through a three-way agreement among the nearest source prototype, the classifier prediction, and neighborhood consensus. The model is then optimized using a composite loss that combines global alignment with prototype-guided coupling, thereby aligning marginal and conditional distributions jointly.
The second phase, active adaptation, augments the framework with an uncertainty-driven querying mechanism. A unified score based on prediction entropy, top-two probability margin, and prototype disagreement identifies the most informative target samples. A k-center diversity criterion further ensures that selected queries are representative and diverse. The queried samples are pseudo-labeled or manually annotated and added to a labeled target pool, after which the model is updated using mixed batches from the source, unlabeled target, and labeled target sets.
The final phase, confident refinement, refines the decision boundaries by calibrating target posteriors with estimated class priors. A class-balanced confident subset is formed using quantile thresholds on confidence scores, and hard pseudo-labels are assigned. The network is then fine-tuned with a confidence-aware loss while maintaining alignment and prototype-update constraints from earlier phases. Through this progressive training pipeline, the encoder–classifier pair effectively reduces both global domain shifts and class-conditional misalignment, achieving reliable target generalization with minimal labeled supervision.

3.2. Spectral–Spatial Attention Backbone (SSAB)

The SSAB functions as the feature extractor of SPADA and is specifically designed to model both spectral correlations and spatial structures inherent in hyperspectral imagery. As shown in Figure 2, it consists of two parallel branches that separately capture spectral and spatial information, followed by an attention-guided fusion module that integrates them into a unified spectral–spatial representation. This structure allows the model to effectively balance fine-grained spectral variations with spatial coherence while maintaining a consistent representation dimension suitable for domain adaptation.
Let the hyperspectral input patch be denoted as $x \in \mathbb{R}^{B \times H \times W}$, where $B$ is the number of spectral bands, and $H$ and $W$ are the spatial dimensions. The spectral branch applies a series of three-dimensional convolutions with kernel size $(k_b, 1, 1)$ to progressively remove spectral redundancy while preserving spatial context. The resulting tensor $S \in \mathbb{R}^{D_s \times H \times W}$ encodes inter-band dependencies and long-range spectral relationships. In parallel, the spatial branch performs spectral squeezing followed by two-dimensional convolutions with kernel size $(k_s, k_s)$ to model local continuity and spatial boundaries. This produces a spatial feature map $P \in \mathbb{R}^{D_p \times H \times W}$ that complements the spectral features.
The two branches are subsequently fused through channel-wise concatenation and refined via a residual transformation to stabilize joint learning, as given by Equation (1):
$$F = \mathrm{Res}([S, P]) \in \mathbb{R}^{D \times H \times W} \tag{1}$$
where $D = D_s + D_p$, $[\cdot,\cdot]$ denotes concatenation along the channel dimension, and $\mathrm{Res}(\cdot)$ represents a convolution–normalization–activation block. As presented in Equation (1), this residual refinement enables stable gradient propagation and the balanced integration of spectral and spatial cues by addressing potential disparities in their feature distributions.
To enhance feature selectivity, SSAB employs a dual-attention mechanism composed of channel and spatial attention modules. Channel attention adaptively recalibrates spectral responses by emphasizing channels with higher discriminative importance, while spatial attention identifies salient regions that contribute to class separation. The attention-weighted feature representation is expressed as Equation (2):
$$F_{\mathrm{att}} = F \odot W_c \odot W_s \tag{2}$$
where $\odot$ denotes element-wise multiplication, and $W_c$ and $W_s$ represent the channel and spatial attention weights, respectively. As indicated in Equation (2), these attention maps guide the network toward spectrally informative bands and spatially discriminative regions, improving representational robustness and focus.
Finally, a global aggregation function condenses the attended feature map into a compact embedding suitable for subsequent alignment, which is stated by Equation (3):
$$f = F(x;\theta) = \Psi(F_{\mathrm{att}}) \tag{3}$$
where $\Psi(\cdot)$ denotes global average pooling followed by dropout regularization. As shown in Equation (3), the resulting vector $f \in \mathbb{R}^D$ serves as a domain-stable representation, forming the foundation for prototype construction and domain alignment in later stages.
Through hierarchical spectral–spatial modeling and adaptive attention mechanisms, SSAB produces discriminative and transferable features that effectively capture intrinsic hyperspectral structures while mitigating domain discrepancies between source and target datasets.
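To make the backbone concrete, the following is a minimal PyTorch sketch of an SSAB-style forward pass covering Equations (1)–(3). The layer widths, number of 3D-convolution channels, kernel sizes $k_b$ and $k_s$, and the exact attention sub-networks are illustrative assumptions; the paper specifies only the overall branch–fuse–attend–pool structure.

```python
import torch
import torch.nn as nn

class SSAB(nn.Module):
    def __init__(self, bands=102, d_s=96, d_p=96, k_b=7, k_s=3, p_drop=0.1):
        super().__init__()
        # Spectral branch: 3D convolutions with kernel (k_b, 1, 1) squeeze
        # inter-band redundancy while leaving the spatial grid untouched.
        self.spectral = nn.Sequential(
            nn.Conv3d(1, 8, (k_b, 1, 1), padding=(k_b // 2, 0, 0)),
            nn.BatchNorm3d(8), nn.ReLU(),
            nn.Conv3d(8, 1, (k_b, 1, 1), padding=(k_b // 2, 0, 0)))
        self.spec_proj = nn.Conv2d(bands, d_s, 1)
        # Spatial branch: spectral squeeze (1x1 conv) then (k_s, k_s) convolutions.
        self.spatial = nn.Sequential(
            nn.Conv2d(bands, d_p, 1),
            nn.Conv2d(d_p, d_p, k_s, padding=k_s // 2),
            nn.BatchNorm2d(d_p), nn.ReLU())
        d = d_s + d_p
        # Residual refinement: conv-norm-activation block (Equation (1)).
        self.res = nn.Sequential(
            nn.Conv2d(d, d, 3, padding=1), nn.BatchNorm2d(d), nn.ReLU())
        # Channel attention W_c and spatial attention W_s (Equation (2)).
        self.w_c = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(d, d // 8, 1), nn.ReLU(),
            nn.Conv2d(d // 8, d, 1), nn.Sigmoid())
        self.w_s = nn.Sequential(nn.Conv2d(d, 1, 7, padding=3), nn.Sigmoid())
        self.drop = nn.Dropout(p_drop)

    def forward(self, x):                                # x: (N, B, H, W)
        s = self.spec_proj(self.spectral(x.unsqueeze(1)).squeeze(1))
        p = self.spatial(x)
        f = torch.cat([s, p], dim=1)
        f = f + self.res(f)                              # Equation (1)
        f = f * self.w_c(f) * self.w_s(f)                # Equation (2)
        return self.drop(f.mean(dim=(2, 3)))             # Equation (3): GAP + dropout

print(SSAB()(torch.randn(4, 102, 9, 9)).shape)           # torch.Size([4, 192])
```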

3.3. Prototype-Coupled Domain Alignment (PCDA)

The PCDA module achieves class-conditional alignment by maintaining two prototype banks that capture the geometric structure of the source and target feature spaces. Each feature is linked to these banks during optimization, which allows PCDA to align distributions across domains while preserving class boundaries. Let $F(\cdot;\theta)$ denote the encoder parameterized by $\theta$, and let $g(\cdot;\phi)$ denote the classifier parameterized by $\phi$. Each hyperspectral patch is represented as a feature $f \in \mathbb{R}^D$, where $D$ is the feature dimension, and it is classified into one of $C$ categories. For a labeled source sample $(x_i^S, y_i^S)$, the encoded feature is $f_i^S = F(x_i^S;\theta)$. For an unlabeled target sample $x_j^T$, we obtain $f_j^T = F(x_j^T;\theta)$ and the class probability vector $p_j = \mathrm{softmax}(g(f_j^T;\phi)) \in [0,1]^C$, with $p_{jc}$ representing the predicted probability of class $c$. To maintain consistent feature magnitudes across domains, all features are normalized as $\tilde{f} = f / \|f\|_2$ in subsequent formulations.
Before constructing prototype banks, PCDA stabilizes the feature space using a preparatory batch objective composed of four complementary parts. The first term is the supervised source risk that enforces discriminability on labeled data and provides gradient stability during early training, as shown in Equation (4):
$$R_S(B_S) = \frac{1}{|B_S|} \sum_{(x,y) \in B_S} \mathrm{CE}\big(g(F(x)), y\big) \tag{4}$$
where CE denotes cross-entropy. This term anchors class boundaries within the source domain and facilitates reliable prototype formation.
The second term mitigates global domain shifts by aligning first-order statistics in a reproducing kernel Hilbert space (RKHS). Given the normalized feature sets $\tilde{F}_S = \{\tilde{f}(x) : (x,y) \in B_S\}$ and $\tilde{F}_T = \{\tilde{f}(x) : x \in B_T\}$, the squared maximum mean discrepancy (MMD) is defined in Equation (5):
$$\mathrm{MMD}_\kappa^2(A, B) = \big\|\mu_\kappa(A) - \mu_\kappa(B)\big\|_{\mathcal{H}_\kappa}^2, \qquad \mu_\kappa(A) = \frac{1}{|A|} \sum_{a \in A} \phi_\kappa(a) \tag{5}$$
where $\phi_\kappa$ is the feature mapping of a kernel $\kappa$ (e.g., an RBF mixture), and $\mathcal{H}_\kappa$ is its induced RKHS. As shown in Equation (5), minimizing $\mathrm{MMD}_\kappa^2$ aligns the mean embeddings between domains and mitigates large global shifts.
The third term aligns the second-order statistics by matching covariance structures of the two domains, as defined in Equation (6):
$$\mathrm{Cov}(\tilde{F}) = \frac{1}{|\tilde{F}| - 1} \sum_{\tilde{f} \in \tilde{F}} (\tilde{f} - \bar{f})(\tilde{f} - \bar{f})^\top, \qquad \bar{f} = \frac{1}{|\tilde{F}|} \sum_{\tilde{f} \in \tilde{F}} \tilde{f} \tag{6}$$
and the covariance discrepancy is regularized by a scale factor $1/(4D^2)$ to avoid dominance in high-dimensional spaces. This term, together with Equation (5), improves statistical consistency between domains by aligning both mean and spread.
The fourth term encourages low-entropy predictions on target data to refine decision boundaries and prevent class collapse. For a target batch $B_T$, the normalized entropy is given by Equation (7):
$$H_T(B_T) = -\frac{1}{|B_T| \log C} \sum_{x \in B_T} \sum_{c=1}^{C} p_c(x) \log p_c(x), \qquad p(x) = \mathrm{softmax}\big(g(F(x))\big) \tag{7}$$
which is bounded in $[0, 1]$ for stable scaling. Integrating the four terms yields the preparatory objective as defined in Equation (8):
$$\mathcal{L}_{\mathrm{align}}(B_S, B_T) = R_S(B_S) + \mathrm{MMD}_\kappa^2\big(\tilde{F}_S, \tilde{F}_T\big) + \frac{1}{4D^2} \big\|\mathrm{Cov}(\tilde{F}_S) - \mathrm{Cov}(\tilde{F}_T)\big\|_F^2 + H_T(B_T) \tag{8}$$
where $\|\cdot\|_F$ is the Frobenius norm. As shown in Equation (8), this preparatory loss constructs a domain-balanced feature space where class separability, global shift reduction, and prediction confidence are jointly optimized, enabling stable prototype selection in subsequent stages.
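The preparatory objective in Equation (8) can be written down directly from Equations (4)–(7). Below is a minimal PyTorch sketch; the RBF bandwidths of the kernel mixture are assumptions, and a plain V-statistic MMD estimate is used for brevity.

```python
import math

import torch
import torch.nn.functional as F

def mmd2(a, b, gammas=(0.5, 1.0, 2.0)):
    """Squared MMD with an RBF mixture kernel (Equation (5)), V-statistic form."""
    def k(x, y):
        d2 = torch.cdist(x, y).pow(2)
        return sum(torch.exp(-g * d2) for g in gammas)
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def cov(f):
    """Unbiased feature covariance (Equation (6))."""
    f = f - f.mean(dim=0, keepdim=True)
    return f.t() @ f / (f.shape[0] - 1)

def align_loss(logits_s, y_s, logits_t, f_s, f_t):
    """Preparatory objective of Equation (8) on one (B_S, B_T) batch pair."""
    f_s, f_t = F.normalize(f_s, dim=1), F.normalize(f_t, dim=1)
    d = f_s.shape[1]
    ce = F.cross_entropy(logits_s, y_s)                           # Equation (4)
    mmd = mmd2(f_s, f_t)                                          # Equation (5)
    cov_gap = (cov(f_s) - cov(f_t)).pow(2).sum() / (4 * d ** 2)   # Equation (6)
    p_t = logits_t.softmax(dim=1).clamp_min(1e-8)
    ent = -(p_t * p_t.log()).sum(1).mean() / math.log(p_t.shape[1])  # Equation (7)
    return ce + mmd + cov_gap + ent
```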
After obtaining a stabilized space, the prototype banks are updated. For each class $c \in \{1, \dots, C\}$, the source prototype is computed as the $\ell_2$-normalized class mean, as shown in Equation (9):
$$\mu_c^S = \frac{\sum_{i: y_i^S = c} f_i^S}{\big\|\sum_{i: y_i^S = c} f_i^S\big\|_2} \in \mathbb{R}^D \tag{9}$$
representing the centroid of class c in the source domain.
To construct the target prototypes, each target feature $f_j^T$ is assigned a pseudo-label through a three-way agreement among prototype, k-nearest neighbor (k-NN), and classifier predictions. Let $\hat{y}_j^{\mathrm{proto}} = \arg\min_c \|\tilde{f}_j^T - \mu_c^S\|_2$ be the nearest source-prototype label, $\hat{y}_j^{\mathrm{pred}} = \arg\max_c p_{jc}$ be the classifier prediction, and $\hat{y}_j^{\mathrm{kNN\text{-}all}}$ be the k-NN consensus label that exists only when all $k$ neighbors share the same class. The high-confidence set is defined in Equation (10):
$$\mathcal{H} = \big\{ j \mid \hat{y}_j^{\mathrm{kNN\text{-}all}} \text{ exists and } \hat{y}_j^{\mathrm{pred}} = \hat{y}_j^{\mathrm{proto}} = \hat{y}_j^{\mathrm{kNN\text{-}all}} \big\}, \qquad \mathcal{H}_c = \big\{ j \in \mathcal{H} \mid \hat{y}_j^{\mathrm{pred}} = c \big\}, \qquad \hat{B}_T = \big\{ (x_j^T, \hat{y}_j^{\mathrm{pred}}) : j \in \mathcal{H} \big\} \tag{10}$$
As described in Equation (10), this agreement criterion ensures that only samples with consistent pseudo labels are used to form reliable target prototypes, thereby avoiding error propagation.
The target prototype for each class c is then estimated in Equation (11):
$$\mu_c^T = \frac{\sum_{j \in \mathcal{H}_c} f_j^T}{\big\|\sum_{j \in \mathcal{H}_c} f_j^T\big\|_2} \in \mathbb{R}^D \tag{11}$$
When a small labeled target set is available, its samples are merged into the corresponding $\mathcal{H}_c$ before normalization. The source and target prototype banks, $\{\mu_c^S\}$ and $\{\mu_c^T\}$, are periodically updated throughout training to reflect the evolving feature space.
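A sketch of the prototype-bank construction follows: $\ell_2$-normalized class means (Equations (9) and (11)) and the three-way agreement rule of Equation (10). The paper does not state which pool the k-NN consensus is computed over; using labeled source features here is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def class_means(feats, labels, n_classes):
    """l2-normalized class means: Equation (9) for source, (11) for target."""
    protos = torch.zeros(n_classes, feats.shape[1])
    for c in range(n_classes):
        mask = labels == c
        if mask.any():
            protos[c] = F.normalize(feats[mask].sum(0), dim=0)
    return protos

def high_confidence_set(f_t, p_t, f_s, y_s, protos_s, k=5):
    """Three-way agreement of Equation (10): prototype, classifier, k-NN."""
    f_t, f_s = F.normalize(f_t, dim=1), F.normalize(f_s, dim=1)
    y_proto = torch.cdist(f_t, protos_s).argmin(1)       # nearest source prototype
    y_pred = p_t.argmax(1)                               # classifier prediction
    # k-NN labels (assumed to come from the labeled source pool).
    nn_lab = y_s[torch.cdist(f_t, f_s).topk(k, largest=False).indices]
    unanimous = (nn_lab == nn_lab[:, :1]).all(1)         # all k neighbors agree
    agree = unanimous & (y_pred == y_proto) & (y_pred == nn_lab[:, 0])
    return agree.nonzero(as_tuple=True)[0], y_pred       # indices of H, pseudo-labels
```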
To align the prototype geometry across domains, PCDA introduces a prototype-guided objective composed of three parts: (1) a source center pull, (2) a target center pull, and (3) a symmetric divergence term comparing the two posterior distributions derived from the prototype banks. For each feature, distance-based posteriors are computed using Equation (12):
$$\pi^S(\tilde{f}) = \frac{\big[\exp(-\|\tilde{f} - \mu_1^S\|_2^2), \dots, \exp(-\|\tilde{f} - \mu_C^S\|_2^2)\big]}{\sum_{r=1}^{C} \exp(-\|\tilde{f} - \mu_r^S\|_2^2)}, \qquad \pi^T(\tilde{f}) = \frac{\big[\exp(-\|\tilde{f} - \mu_1^T\|_2^2), \dots, \exp(-\|\tilde{f} - \mu_C^T\|_2^2)\big]}{\sum_{r=1}^{C} \exp(-\|\tilde{f} - \mu_r^T\|_2^2)} \tag{12}$$
Equation (12) models each feature’s probability distribution over classes based on prototype proximity, where smaller distances yield higher posterior values.
Finally, the overall prototype-guided loss integrates these three components, as described by Equation (13):
$$\mathcal{L}_{\mathrm{proto}} = \frac{1}{|B_S|} \sum_{(x,y) \in B_S} \big\|\tilde{f}(x) - \mu_y^S\big\|_2^2 + \frac{1}{|\hat{B}_T|} \sum_{(x,\hat{y}) \in \hat{B}_T} \big\|\tilde{f}(x) - \mu_{\hat{y}}^T\big\|_2^2 + \frac{1}{2|B_S \cup B_T|} \sum_{x \in B_S \cup B_T} \Big[\mathrm{KL}\big(\pi^S(\tilde{f}(x)) \,\big\|\, \pi^T(\tilde{f}(x))\big) + \mathrm{KL}\big(\pi^T(\tilde{f}(x)) \,\big\|\, \pi^S(\tilde{f}(x))\big)\Big] \tag{13}$$
As shown in Equation (13), the first two terms attract source and target features toward their respective prototypes, while the symmetric Kullback–Leibler divergence enforces cross-domain consistency between the posterior distributions. During each training step, the model minimizes $\mathcal{L}_{\mathrm{align}}$ in Equation (8) to stabilize the shared space, updates prototypes using Equations (9) and (11), computes posteriors from Equation (12), and optimizes $\mathcal{L}_{\mathrm{proto}}$ from Equation (13). The combined loss therefore enforces domain alignment at both the global and class-conditional levels, ensuring stable and consistent adaptation across heterogeneous hyperspectral domains.
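The distance-based posteriors of Equation (12) and the three-part prototype loss of Equation (13) reduce to a few tensor operations, as in the following sketch (features and prototypes are assumed $\ell_2$-normalized already):

```python
import torch

def posteriors(f, protos):
    """Softmax over negative squared prototype distances (Equation (12))."""
    return torch.softmax(-torch.cdist(f, protos).pow(2), dim=1)

def proto_loss(f_s, y_s, protos_s, f_conf, y_conf, protos_t, f_t):
    """Three-part prototype-guided loss of Equation (13)."""
    pull_s = (f_s - protos_s[y_s]).pow(2).sum(1).mean()        # source center pull
    pull_t = (f_conf - protos_t[y_conf]).pow(2).sum(1).mean()  # target center pull
    f_all = torch.cat([f_s, f_t], dim=0)                       # features over B_S and B_T
    pi_s = posteriors(f_all, protos_s).clamp_min(1e-8)
    pi_t = posteriors(f_all, protos_t).clamp_min(1e-8)
    sym_kl = 0.5 * ((pi_s * (pi_s / pi_t).log()).sum(1)
                    + (pi_t * (pi_t / pi_s).log()).sum(1)).mean()
    return pull_s + pull_t + sym_kl
```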

3.4. Training Pipeline

Training proceeds in three phases that share a common alignment backbone and a prototype-guided coupling mechanism. Phase I establishes a stable spectral–spatial geometry before any target labels are introduced. At the beginning of each epoch, we compute features for all source and target samples, construct the high-confidence target set $\hat{B}_T$ using the three-way agreement rule in Equation (10), and recompute the source and target prototype banks via Equations (9) and (11) using the full, non-shuffled loaders. Within the epoch, each iteration draws paired mini-batches $(B_S, B_T)$ and updates the encoder and classifier by minimizing the unsupervised adaptation objective, as demonstrated in Equation (14):
$$\mathcal{L}_{\mathrm{UDA}} = \mathcal{L}_{\mathrm{align}} + \lambda_{\mathrm{proto}} \mathcal{L}_{\mathrm{proto}} \tag{14}$$
where $\mathcal{L}_{\mathrm{align}}$ combines the source cross-entropy in Equation (4), the MMD term in Equation (5), the covariance alignment induced by Equation (6), and the normalized target entropy in Equation (7), while $\mathcal{L}_{\mathrm{proto}}$ is given in Equation (13). The prototype banks are maintained at the epoch level and can be refreshed online from mini-batch statistics so that Equation (13) is always evaluated on the current $\hat{B}_T$, which keeps the coupling aligned with the evolving target feature clusters.
Phase II introduces disagreement-aware querying while preserving the same alignment backbone. At the sampling epochs, all unlabeled target indices are scored by the uncertainty- and disagreement-aware function in Equation (15):
$$s_j = \alpha H_j + \beta (1 - M_j) + \gamma \delta_j \tag{15}$$
where $H_j = -\sum_c p_{jc} \log p_{jc} / \log C$ is the normalized entropy, $M_j = p_j^{(1)} - p_j^{(2)}$ is the top-two margin with $p_j^{(1)} \geq p_j^{(2)}$, and $\delta_j = \mathbb{I}\big[\hat{y}_j^{\mathrm{pred}} \neq \hat{y}_j^{\mathrm{near}}\big]$ indicates prototype disagreement using $\hat{y}_j^{\mathrm{near}} = \arg\min_c \|\tilde{f}_j^T - \mu_c^T\|_2$. A k-center selection on cosine distance diversifies the top pool, and $\eta$ samples are queried and added to the labeled target set. Within the same epoch, training then proceeds on mini-batches $(B_S, B_T, B_{\mathrm{LBT}})$ with the objective formulated in Equation (16):
$$\mathcal{L}_{\mathrm{ADA}} = \mathcal{L}_{\mathrm{UDA}} + \frac{1}{|B_{\mathrm{LBT}}|} \sum_{(x,y) \in B_{\mathrm{LBT}}} \mathrm{CE}\big(g(F(x)), y\big) \tag{16}$$
which preserves the alignment terms from Phase I and adds a supervised loss on the queried targets. Here, $B_{\mathrm{LBT}}$ is the mini-batch drawn from the labeled target set built by querying.
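A sketch of the Phase II query score (Equation (15)) follows, with a greedy farthest-point pass standing in for the k-center selection; the greedy heuristic and its arbitrary seed point are assumptions, as the paper does not specify the exact k-center solver.

```python
import math

import torch
import torch.nn.functional as F

def query_scores(p_t, f_t, protos_t, alpha=0.6, beta=0.4, gamma=0.5):
    """Uncertainty- and disagreement-aware score of Equation (15)."""
    p = p_t.clamp_min(1e-8)
    ent = -(p * p.log()).sum(1) / math.log(p.shape[1])    # normalized entropy H_j
    top2 = p_t.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]                      # top-two margin M_j
    y_near = torch.cdist(F.normalize(f_t, dim=1), protos_t).argmin(1)
    disagree = (p_t.argmax(1) != y_near).float()          # prototype disagreement
    return alpha * ent + beta * (1 - margin) + gamma * disagree

def k_center_select(f, cand_idx, eta):
    """Greedy farthest-point diversification on cosine distance."""
    g = F.normalize(f[cand_idx], dim=1)
    chosen = [0]                                   # arbitrary seed among candidates
    d = 1 - g @ g[0]
    for _ in range(eta - 1):
        nxt = int(d.argmax())
        chosen.append(nxt)
        d = torch.minimum(d, 1 - g @ g[nxt])
    return cand_idx[torch.tensor(chosen)]
```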
Phase III activates confidence-calibrated self-training. Class priors on the target are estimated and used to calibrate the posteriors in Equation (17):
$$\tilde{p}_{jc} = \frac{p_{jc} / \hat{\pi}_c}{\sum_{r=1}^{C} p_{jr} / \hat{\pi}_r} \tag{17}$$
where $\hat{\pi}_c = \frac{1}{N} \sum_j \mathbb{I}\big[\arg\max_r p_{jr} = c\big]$ is the empirical prior for class $c$. Classwise quantile thresholds then define the confident set and the hard pseudo-labels, as shown in Equation (18):
$$\mathcal{C}_T = \bigcup_{c=1}^{C} \big\{ j : \arg\max_r \tilde{p}_{jr} = c,\ \tilde{p}_{jc} \geq \tau_c \big\}, \qquad \tilde{y}_j = \arg\max_c \tilde{p}_{jc} \tag{18}$$
where the hyperparameter $q$ determines each $\tau_c$ as the classwise $q$-quantile of $\tilde{p}_{jc}$ over unlabeled targets. Training alternates the Phase II objective in Equation (16) with confidence-balanced fine-tuning, as defined in Equation (19), which is added to form the Phase III objective defined in Equation (20):
$$\mathcal{L}_{\mathrm{conf}} = \frac{1}{|\mathcal{C}_T|} \sum_{j \in \mathcal{C}_T} \mathrm{CE}\big(g(F(x_j^T)), \tilde{y}_j\big) \tag{19}$$
$$\mathcal{L}_{\mathrm{FT}} = \mathcal{L}_{\mathrm{ADA}} + \mathcal{L}_{\mathrm{conf}} \tag{20}$$
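Phase III's calibration and selection steps are straightforward to sketch: prior-adjusted posteriors (Equation (17)), classwise quantile thresholds with hard pseudo-labels (Equation (18)), and the confident cross-entropy (Equation (19)).

```python
import torch
import torch.nn.functional as F

def calibrate(p_t):
    """Prior-adjusted posteriors of Equation (17), with empirical class priors."""
    prior = torch.bincount(p_t.argmax(1), minlength=p_t.shape[1]).float() / len(p_t)
    p = p_t / prior.clamp_min(1e-8)
    return p / p.sum(1, keepdim=True)

def confident_set(p_cal, q=0.5):
    """Classwise q-quantile thresholds and hard pseudo-labels (Equation (18))."""
    y_tilde = p_cal.argmax(1)
    conf = p_cal.gather(1, y_tilde[:, None]).squeeze(1)
    keep = torch.zeros_like(conf, dtype=torch.bool)
    for c in range(p_cal.shape[1]):
        mask = y_tilde == c
        if mask.any():
            keep |= mask & (conf >= torch.quantile(conf[mask], q))
    return keep.nonzero(as_tuple=True)[0], y_tilde

def conf_loss(logits_t, idx, y_tilde):
    """Confidence-balanced cross-entropy of Equation (19)."""
    return F.cross_entropy(logits_t[idx], y_tilde[idx])
```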
Algorithm 1 summarizes the three-phase training pipeline of the SPADA framework. At the beginning of each epoch, the encoder $F(\cdot;\theta)$ is applied to the full, non-shuffled source and target loaders to compute SSAB features, construct the high-confidence target set $\hat{B}_T$ via Equation (10), and recompute the source and target prototype banks using Equations (9) and (11); all prototype and posterior terms operate on $\ell_2$-normalized features, and the resulting banks are kept fixed within that epoch.
Algorithm 1 Training Pipeline with Unsupervised Adaptation, Active Adaptation, and Confident Refinement
Require: Source set D_S, target set D_T; encoder F(·; θ), classifier g(·; ϕ)
Require: Phase breakpoints T_1 < T_2; sampling epoch set E_act
Ensure: Trained parameters (θ, ϕ)
 1: Initialize θ, ϕ; labeled target set D_LBT ← ∅; annotated index set I ← ∅
 2: for epoch e = 1, …, T_max do
 3:     Build source loader L_S^full and unlabeled target loader L_T^full
 4:     Compute SSAB features by Equation (3) (using Equations (1) and (2))
 5:     Compute source prototypes P_S(e) by Equation (9) from L_S^full
 6:     Build high-confidence target set B̂_T(e) by Equation (10) from L_T^full
 7:     Compute target prototypes P_T(e) by Equation (11) from B̂_T(e)
 8:     if e ≤ T_1 then                               ▹ Phase I: Unsupervised adaptation
 9:         for each mini-batch (B_S, B_T) do
10:             Compute L_align by Equation (8)
11:             Compute posteriors by Equation (12) and L_proto by Equation (13) using (P_S(e), P_T(e))
12:             Form L_UDA by Equation (14) and update (θ, ϕ)
13:         end for
14:     else if T_1 < e ≤ T_2 then                    ▹ Phase II: Active adaptation
15:         if e ∈ E_act then
16:             Compute target prototypes P_T^sel(e) by Equation (11) using B̂_T(e) and D_LBT
17:             Score all unlabeled targets by Equation (15) with P_T^sel(e)
18:             Select η targets by k-center diversity on cosine distance and query labels
19:             Update D_LBT and index set I; update unlabeled pool D_T ← D_T \ D_LBT
20:         end if
21:         for each mini-batch (B_S, B_T, B_LBT) do
22:             Compute L_align and L_proto with (P_S(e), P_T(e))
23:             Form L_ADA by Equation (16) and update (θ, ϕ)
24:         end for
25:     else                                          ▹ Phase III: Confident refinement
26:         Calibrate target posteriors by Equation (17)
27:         Build confident set C_T(e) and hard labels ỹ_j by Equation (18)
28:         for each mini-batch (B_S, B_T, B_LBT) do
29:             Compute L_align and L_proto as in Phase II
30:             Update (θ, ϕ) using L_ADA
31:         end for
32:         for each mini-batch B_conf ⊂ C_T(e) do
33:             Compute L_conf by Equation (19) with labels ỹ_j
34:             Form L_FT by Equation (20) and update (θ, ϕ)
35:         end for
36:     end if
37: end for
38: return (θ, ϕ)
In Phase I (Unsupervised Adaptation), the method trains on mini-batches $(B_S, B_T)$ of source samples and unlabeled targets, using $\mathcal{L}_{\mathrm{align}}$ and $\mathcal{L}_{\mathrm{proto}}$ to define the unsupervised adaptation objective $\mathcal{L}_{\mathrm{UDA}}$ in Equation (14).
In Phase II (Active Adaptation), a subset of epochs is designated as sampling epochs; at each of these epochs, the target prototype bank is recomputed for selection, all unlabeled targets are scored by Equation (15), k-center selection on cosine distance is applied, and the $\eta$ queried samples are added to the labeled buffer $D_{\mathrm{LBT}}$. The model then optimizes the active adaptation loss $\mathcal{L}_{\mathrm{ADA}}$ in Equation (16) on mixed mini-batches $(B_S, B_T, B_{\mathrm{LBT}})$ that combine source, unlabeled target, and labeled target data.
In Phase III (Confident Refinement), the calibrated posteriors from Equation (17) and the quantile rule in Equation (18) are used to build a confident target set $\mathcal{C}_T$ with hard labels $\tilde{y}_j$. This phase first reuses the Phase II objective $\mathcal{L}_{\mathrm{ADA}}$ on $(B_S, B_T, B_{\mathrm{LBT}})$ and then performs confident fine-tuning on mini-batches from $\mathcal{C}_T$ by minimizing $\mathcal{L}_{\mathrm{conf}}$ in Equation (19) and the Phase III objective $\mathcal{L}_{\mathrm{FT}}$ in Equation (20).
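Putting the phases together, the objectives compose additively (Equations (14), (16), and (20)). The helper below is a schematic sketch of that composition, with lambda_proto the weighting named in Equation (14); the loss terms are assumed to come from the helper sketches above.

```python
def phase_loss(phase, l_align, l_proto, l_lbt=0.0, l_conf=0.0, lambda_proto=1.0):
    """Compose the per-phase objective from its shared ingredients."""
    l_uda = l_align + lambda_proto * l_proto       # Equation (14): Phase I
    if phase == 1:
        return l_uda
    l_ada = l_uda + l_lbt                          # Equation (16): Phase II
    if phase == 2:
        return l_ada
    return l_ada + l_conf                          # Equation (20): Phase III L_FT
```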

4. Results

4.1. Experiment Settings

  • A. Dataset Description
To comprehensively assess the performance of the proposed unsupervised domain adaptation framework for HSI classification, three publicly available benchmark datasets are employed: Pavia, Houston, and Shanghai–Hangzhou. Each dataset represents distinct acquisition conditions, spatial resolutions, and spectral characteristics, enabling a systematic evaluation of cross-domain generalization ability. The following subsections provide detailed descriptions of each dataset, including their spatial–spectral properties, domain configurations, and selected class distributions.
(1) Pavia Dataset: The Pavia dataset comprises two subsets, the University of Pavia (UP) and the Pavia Center (PC), which serve as the source and target domains, respectively [35]. Both subsets were captured by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over Pavia, Italy. The UP image contains 610 × 610 pixels with 103 spectral bands, while the PC image has 1096 × 1096 pixels with 102 bands. After discarding noisy and invalid samples, the valid region of UP is 610 × 315 pixels and that of PC is 1096 × 715 pixels. To maintain consistent dimensionality across domains, the last band of UP is removed, yielding 102 aligned spectral bands covering the range of 0.430–0.834 μm. Figure 3 presents the false-color composites and corresponding ground-truth maps for both subsets.
Seven common land-cover classes shared between the two domains are used for evaluation, including vegetation, built-up, and bare surface types. The sample distribution for each class is listed in Table 1.
(2) Houston Dataset: The Houston dataset contains two temporally separated scenes of the University of Houston campus, captured with different sensors and acquisition years, making it suitable for temporal domain adaptation [36,37]. Houston 2013 serves as the source domain, and Houston 2018 acts as the target domain. The Houston 2013 image comprises 349 × 1905 pixels with 144 spectral bands and a spatial resolution of 2.5 m, while Houston 2018 consists of 209 × 955 pixels with 48 spectral bands at a spatial resolution of 1 m. Both datasets share a spectral range of 380–1050 nm. For consistent spatial correspondence, a 209 × 955 subset from Houston 2013 spatially overlapping with Houston 2018 is extracted and resampled to match the 48 spectral bands. Figure 4 shows the false-color composites and reference maps for both domains.
For cross-domain classification, seven shared land-cover categories are used, encompassing vegetation, buildings, and roads. The sample counts for each class are summarized in Table 2.
(3) Shanghai–Hangzhou Dataset: The Shanghai–Hangzhou dataset consists of two scenes acquired by the EO-1 Hyperion sensor over urban and peri-urban regions in eastern China, forming a geographically distinct cross-city adaptation benchmark [38]. The Shanghai scene is used as the source domain, and the Hangzhou scene serves as the target domain. Each image originally contains 198 spectral bands, which are reduced after preprocessing by removing noisy and water-absorption bands while retaining all valid spectral information. The Shanghai image has dimensions of 1600 × 230 pixels, and the Hangzhou image has 590 × 230 pixels. Figure 5 displays the false-color composites and corresponding ground-truth maps for both datasets.
Three shared categories are considered for cross-domain classification: water, land/building, and vegetation. The sample distribution for each class is reported in Table 3.
Overall, these three datasets collectively capture diverse spectral–spatial characteristics, environmental conditions, and sensor modalities. Their inclusion ensures a robust evaluation of the proposed framework’s domain adaptation capability across spatial, temporal, and geographic shifts.
  • B. Experiment Settings
All experiments were conducted on a workstation equipped with an x86_64 CPU and a single NVIDIA GeForce RTX 4090 Ti GPU (12 GB) operating under CUDA 12.1. The encoder employed the SSAB-based DCRN backbone with an intermediate feature dimension of 288 and a dropout rate of 0.1, while the classifier consisted of a two-layer MLP equipped with batch normalization and a dropout rate of 0.2. Both modules were optimized using stochastic gradient descent (SGD) with a learning rate of $10^{-3}$, momentum of 0.9, and weight decay of $5 \times 10^{-4}$. Training was performed for 100 epochs using a mini-batch size of 32.
Active querying was performed at predefined sampling epochs, where each round selected $\eta = 15$ target samples based on the score defined in Equation (15). The scoring function combines its three criteria with fixed weights of 0.6, 0.4, and 0.5. A k-center selection strategy based on cosine distance was then applied to ensure diversity among queried samples. The confident self-training stage used a class-balanced controller with a retention ratio of $\rho = 0.5$ and a confidence weight of $\gamma = 0.03$ to prevent class imbalance during pseudo-labeling.
For the Pavia dataset, hyperspectral patches contained $B = 102$ spectral bands within a spatial window of $9 \times 9$ (half-width 4), and training was conducted on seven land-cover classes. The same optimization settings and training protocol were applied to the Houston and Shanghai–Hangzhou datasets, using their respective spectral dimensions and class definitions. Each experiment was repeated ten times with different random seeds, and the reported results include the mean and standard deviation across all runs.
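As an illustration of the patch extraction described above, the following NumPy sketch cuts 9×9 windows (half-width 4) around labeled pixels of a B × H × W cube; reflect padding at scene borders is an assumption, since the paper does not state its border handling.

```python
import numpy as np

def extract_patches(cube, coords, half=4):
    """cube: (B, H, W) array; coords: (row, col) centers of labeled pixels."""
    padded = np.pad(cube, ((0, 0), (half, half), (half, half)), mode="reflect")
    return np.stack([padded[:, r:r + 2 * half + 1, c:c + 2 * half + 1]
                     for r, c in coords])

cube = np.random.rand(102, 610, 315).astype(np.float32)  # Pavia-UP-sized cube
patches = extract_patches(cube, [(10, 20), (300, 100)])
print(patches.shape)  # (2, 102, 9, 9)
```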
The model’s performance was assessed using three standard metrics: Overall Accuracy (OA), Average Accuracy (AA), and the Kappa coefficient ($\kappa$). Let $\mathbf{N} = [n_{ij}]_{i,j=1}^{C}$ denote the confusion matrix over $C$ classes, where $n_{ij}$ is the number of samples from ground-truth class $i$ predicted as class $j$, $n_{i+} = \sum_{j=1}^{C} n_{ij}$, $n_{+i} = \sum_{j=1}^{C} n_{ji}$, and $N = \sum_{i=1}^{C} \sum_{j=1}^{C} n_{ij}$ is the total number of samples. The three metrics are defined as Equations (21)–(23):
$$\mathrm{OA} = \frac{1}{N} \sum_{i=1}^{C} n_{ii} \tag{21}$$
$$\mathrm{AA} = \frac{1}{C} \sum_{i=1}^{C} \frac{n_{ii}}{n_{i+}} \tag{22}$$
$$\kappa = \frac{\mathrm{OA} - p_e}{1 - p_e}, \qquad p_e = \frac{1}{N^2} \sum_{i=1}^{C} n_{i+} n_{+i} \tag{23}$$
where OA measures the overall classification correctness, AA represents the mean per-class accuracy, and κ quantifies agreement beyond chance. These metrics are widely adopted in hyperspectral image classification and provide a comprehensive evaluation of both global and class-specific model performance.
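Equations (21)–(23) can be computed directly from the confusion matrix, as in this short NumPy sketch:

```python
import numpy as np

def metrics(conf):
    """conf[i, j]: samples of true class i predicted as class j."""
    n = conf.sum()
    oa = np.trace(conf) / n                                   # Equation (21)
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))            # Equation (22)
    pe = (conf.sum(axis=1) * conf.sum(axis=0)).sum() / n**2   # chance agreement p_e
    kappa = (oa - pe) / (1 - pe)                              # Equation (23)
    return oa, aa, kappa

conf = np.array([[50, 2, 1], [3, 40, 2], [0, 1, 30]])
print(metrics(conf))
```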

4.2. Comparison with SOTA

Table 4 shows the experimental results of different methods on the Pavia task. The proposed SPADA method achieves an OA of 97.89 ± 0.68, AA of 97.73 ± 0.84, and Kappa of 97.46 ± 0.82, surpassing the strongest baseline, CLDA, which obtains 92.64 ± 1.12 in OA and 91.13 ± 1.35 in Kappa. The method also exceeds SCLUDA in AA by +4.73 (93.00 ± 1.11). In terms of error-rate reduction, OA improves from 7.36% to 2.11%, corresponding to a 71.3% reduction. The OA margin is approximately four times the pooled standard deviation, indicating a strong standardized effect. On the class level, SPADA markedly improves class 4 from 85.32 to 97.28 (+11.96) and class 7 from 96.95 to 98.53 (+1.58), while maintaining near-perfect accuracy for class 5 (99.99). Slight performance declines are observed for classes 1, 3, and 6, with a maximum drop of 1.76 points. These results are consistent with the design rationale of attention-weighted spectral–spatial encoding, which reduces band confusion before adaptation, and distance-based posteriors that guide features toward class prototypes. The largest gains appear in classes exhibiting pronounced cross-scene spectral drift (class 4) or clutter (class 7), where global moment matching alone fails and class-conditional coupling becomes crucial for improved Kappa.
Across the five comparative panels shown in Figure 6, SPADA yields the highest visual consistency with the ground truth. The predicted maps exhibit strong spatial coherence, with homogeneous filling across large regions and the effective suppression of isolated artifacts and background noise. Boundary delineations are thin, continuous, and well aligned with the true contours, while corner structures are sharply preserved without the feathering observed in several baselines. In contrast, CLDA and MSDA show noticeable label bleeding across boundaries and partial edge erosion, leading to blurred or elongated contours in textured zones. SCLUDA generates scattered misclassifications and fragmented regions, particularly in cluttered areas, whereas MLUDA tends to oversmooth fine structures, breaking slender or elongated segments into discontinuous pieces. SPADA effectively preserves compact small regions and narrow linear features, maintaining clear separation between adjacent materials and demonstrating reduced spectral confusion after adaptation. The few remaining artifacts produced by SPADA are sparse and largely confined to minute or highly textured areas, where cross-scene spectral drift introduces ambiguity in decision boundaries.
On the Houston dataset shown in Table 5, SPADA achieves an OA of 84.24 ± 1.51, AA of 80.12 ± 3.38, and Kappa of 74.32 ± 2.53, outperforming the best baseline MSDA by +3.21 in OA, +2.46 in AA, and +4.07 in Kappa. The OA error rate is reduced from 18.97% to 15.76%, a 16.9% improvement, with an OA margin of approximately 1.46 pooled standard deviations. The method achieves top accuracy for classes 2 (90.97, +1.60), 3 (79.13, +10.35), 6 (87.24, +5.55), and 7 (81.20, +2.53), demonstrating the effectiveness of attention-guided feature selection and prototype-based alignment in stabilizing class separation. However, performance lags in classes 1, 4, and 5 by 28.29, 10.91, and 9.86 points, respectively, suggesting that the single-prototype assumption may be insufficient for classes with multi-modal spectra or spatial fragmentation. Qualitatively, misclassifications on Class 1 often occur at mixed or transitional pixels along the interfaces between healthy grass, stressed grass, and road margins, where the spectral signatures are affected by shadows and soil exposure. In such regions, the prototype-guided attraction in Equation (13) and the global alignment in Equation (8) can pull ambiguous spectra toward the more abundant stressed-grass or road prototypes, causing healthy grass to be under-represented. For Class 4, errors concentrate on thin water bodies and small ponds near bridges and building shadows, where water pixels exhibit spectral mixing with adjacent built-up areas; these pixels are easily absorbed into urban classes when the prototypes for water are weakly defined. For Class 5, failures appear near the boundaries between residential and non-residential buildings, where roofs with similar materials but different surrounding context are grouped together; the single centroid used for residential buildings in the prototype banks does not fully capture the diversity of roof types and local neighborhood patterns. In these situations, the symmetric Kullback–Leibler divergence in Equation (13) can amplify early biases by driving posteriors toward dense, but slightly mismatched, prototype regions.
A major driver of these limitations is the severe class imbalance between source and target domains. Class 4 has only 22 labeled target pixels, and Class 1 has far fewer samples than the dominant urban classes. In the early stages of training, the high-confidence target set constructed by Equation (10) and the queried targets selected by Equation (15) are dominated by Classes 2, 6, and 7, because they occupy large areas and produce high-confidence predictions. As a result, the initial target prototypes for Classes 1, 4, and 5 are estimated from very few examples and are highly sensitive to noise and local context. The prototype banks for these rare or heterogeneous classes are therefore less stable, and small shifts in the feature space can move their centroids toward nearby dense clusters of stressed grass, non-residential buildings, or roads. Even though posterior calibration and quantile-based thresholding in Equations (17) and (18) improve the overall AA by boosting recall for minority classes, they may still down-weight extremely rare classes when their confidence distributions remain dominated by ambiguous boundary pixels.
On the Houston dataset shown in Figure 7, SPADA produces classification maps that accurately capture the rectilinear urban structure, aligning closely with the underlying spatial layout. Building footprints remain compact with well-defined right angles, while long road segments retain consistent single-lane widths without exhibiting edge swelling. Intersections are continuous and coherent, avoiding the fragmentation observed in baseline methods. In comparison, CLDA tends to widen boundaries and occasionally propagates labels across major corridors. MSDA introduces granular noise within otherwise homogeneous regions, and SCLUDA produces scattered mislabeled patches that disrupt block regularity and break thin connectors. MLUDA over-smooths the scene, merging adjacent parcels and erasing narrow strips. By contrast, SPADA effectively suppresses patch-like noise along dense block arrangements, maintains precise alignment of parallel edges over extended distances, and distinctly separates neighboring polygons with minimal label bleeding. These results demonstrate SPADA’s superior capability in mitigating spectral mixing and preserving fine structural details near class boundaries.
As shown in Table 6, for the Shanghai–Hangzhou dataset, SPADA attains an OA of 94.93 ± 1.88, AA of 94.14 ± 2.09, and Kappa of 90.91 ± 3.44, outperforming MSDA by +2.16, +1.38, and +3.65, respectively. The OA error rate decreases from 7.23% to 5.07%, yielding a 29.9% reduction and an OA margin of 0.86 pooled standard deviations. The most notable improvement occurs in class 2 (98.56 vs. 95.04, +3.52), suggesting that attention-guided band selection enhances spectral discrimination, while prototype coupling reduces class overlap. Minor drops in classes 1 and 3 (–2.97 and –1.50) indicate residual sensitivity to intra-class variability, implying that multi-centroid modeling could further alleviate confusion. The variance remains similar to baselines for OA and slightly higher for AA and Kappa, which is expected given the smaller number of categories and stronger influence of per-class fluctuations.
On the Shanghai–Hangzhou dataset shown in Figure 8, CLDA exhibits boundary leakage along riverbanks and introduces scattered noise within agricultural regions. MSDA mitigates some leakage but leaves residual salt-and-pepper artifacts within dense parcel clusters. SCLUDA produces numerous pointwise misclassifications and small breaks along narrow paths, while MLUDA excessively smooths boundaries, causing merged patches and rounded shoreline corners. SPADA effectively suppresses both high-frequency speckle and over-smoothing artifacts, maintaining clear separations between adjacent land-cover types and preventing color bleeding at class interfaces. These observations confirm SPADA’s strong capacity for spatial consistency preservation and precise adaptation under complex cross-scene spectral variations.
Across all three cross-scene benchmarks, SPADA consistently improves OA by +2.16–+5.25, AA by +1.38–+4.73, and Kappa by +3.65–+6.33 compared with recent strong methods. The consistent gains stem from two complementary mechanisms: (1) attention-weighted spectral–spatial representations that reduce inter-band ambiguity before adaptation and (2) prototype-coupled distance-based posteriors that enable fine-grained class-conditional alignment and sharpen decision boundaries, which is reflected in the consistently higher Kappa. The remaining weaknesses in a few Houston classes highlight the limitations of single-centroid modeling under multi-modal distributions. Extending SPADA with mixture prototypes, region-aware priors, and adaptive margins would be a promising direction for further improvement. Overall, the results confirm that integrating attention-guided feature selection with prototype-based domain alignment yields robust and transferable hyperspectral representations for cross-scene HSI adaptation.

5. Discussion

5.1. Ablation Study

The contribution of each component in the proposed framework is systematically analyzed using the Pavia University → Pavia Center cross-scene HSI UDA benchmark. All results are averaged over ten independent runs, with mean ± standard deviation values of OA, AA, and Cohen’s κ reported in Table 7.
Phase I jointly optimizes the alignment loss $\mathcal{L}_{\mathrm{align}}$ in Equation (8) and the prototype-guided loss $\mathcal{L}_{\mathrm{proto}}$ in Equation (13) using unlabeled target samples. Phases II and III introduce active adaptation, driven by the query-based selection strategy defined in Equation (15), and confident refinement, based on posterior calibration and quantile selection, as formulated in Equations (17) and (18), respectively.
When only $\mathcal{L}_{\mathrm{align}}$ is applied, the framework achieves an OA of 96.82 ± 0.72, AA of 97.22 ± 0.75, and $\kappa$ of 96.17 ± 0.87. Incorporating only $\mathcal{L}_{\mathrm{proto}}$ further improves the performance, reaching an OA of 97.47 ± 1.06, AA of 97.57 ± 0.94, and $\kappa$ of 96.96 ± 1.28. This shows that class-conditional feature structuring guided by Equation (12) enhances discriminability even in the absence of labeled target samples. However, directly combining $\mathcal{L}_{\mathrm{align}}$ and $\mathcal{L}_{\mathrm{proto}}$ within Phase I without any refinement leads to a severe decline in performance, yielding an OA of 90.12 ± 2.29, AA of 89.83 ± 1.32, and $\kappa$ of 88.17 ± 2.68. In this early single-phase setting, the prototype banks are built only from high-confidence pseudo-labels extracted by Equation (10), which are still noisy and biased toward dominant classes. The global alignment term in Equation (8) pushes source and target means and covariances together, while the symmetric Kullback–Leibler divergence in Equation (13) forces posteriors toward class prototypes that may already be misaligned. The combination of noisy prototypes and a symmetric divergence that increases the gradient magnitude when distributions disagree produces a positive feedback loop: early mistakes in pseudo-labels distort prototypes, which then attract more features to incorrect centers and further amplify the error over epochs, leading to unstable optimization dynamics and the observed performance drop.
Introducing Phase II on top of the Phase I objective restores performance to an OA of 97.28 ± 0.86, AA of 97.12 ± 0.86, and $\kappa$ of 96.71 ± 1.03. The queried labels obtained via Equation (15) provide sparse but reliable supervision that reduces prototype noise and stabilizes the posterior feature distribution in Equation (12) so that the interaction between Equations (8) and (13) becomes consistent instead of conflicting. When only Phase III is applied (without Phase II), the framework attains the highest AA, 97.81 ± 0.57, along with an OA of 97.79 ± 0.85 and $\kappa$ of 97.33 ± 1.02. The posterior calibration in Equation (17) and the adaptive classwise confidence thresholding in Equation (18) increase the reliability of pseudo-labels, especially for minority classes, which leads to a larger gain in AA than in OA by improving recall on under-represented categories. When Phases II and III are jointly activated, the framework achieves the best overall results, with an OA of 97.89 ± 0.68, AA of 97.73 ± 0.84, and $\kappa$ of 97.46 ± 0.82, and it also exhibits the smallest OA variance. These results indicate that the integration of queried supervision and calibrated pseudo-labels regularizes the prototype memory and suppresses the error amplification that appears when Equations (8) and (13) are used together without any correction.
These behaviors explain why training is organized as a strict three-phase sequence rather than a loosely coupled set of independent modules. Phase I serves as a noise-aware pre-alignment stage: it learns global spectral–spatial alignment and a preliminary prototype structure, but its prototypes are intentionally treated as tentative because they are derived only from high-confidence predictions. Phase II is then introduced to inject a small but trusted set of target labels that anchor the prototype banks and break the feedback loop caused by early pseudo-label errors; the queried samples guide the update of Equation (12) so that prototypes move toward true class centers instead of drifting toward biased regions. Phase III is activated only after the prototypes and approximate class priors have been stabilized by Phases I–II so that calibrated pseudo-labels in Equations (17) and (18) densify supervision without reinforcing systematic errors. Table 7 shows that skipping these refinement phases while still applying the joint loss leads to the largest degradation, whereas progressively adding active adaptation and confident refinement recovers and surpasses the performance of single-loss baselines, which provides empirical evidence that the ordered three-phase pipeline is needed to control prototype instability and error amplification.
In summary, the global alignment formulated in Equation (8) provides a robust baseline, and the prototype-coupled alignment defined in Equation (13) becomes the primary mechanism for class-conditional transfer once reliable prototypes are established. The synergistic integration of active adaptation and confident refinement enables the framework to fully exploit prototype-guided alignment by reducing label noise and correcting class-prior biases, which is particularly beneficial for HSI UDA under limited annotation conditions.

5.2. Sensitivity Analysis

To evaluate the influence of the weighting parameters in Equation (15), a sensitivity analysis is conducted by varying one coefficient at a time while keeping the remaining two fixed at their default values. This analysis investigates how the entropy term H_j (weight α), the margin term M_j (weight β), and the prototype disagreement term δ_j (weight γ) affect the selection of target samples and the resulting classification accuracy in the Pavia University → Pavia Center domain adaptation task. The results are shown in Figure 9.
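A compact sketch of how the three weighted criteria can combine into a single query score is given below; since Equation (15) is not reproduced here, the normalization and sign conventions are assumptions made for illustration.

```python
import torch

def query_scores(probs, feats, prototypes, alpha=0.2, beta=0.6, gamma=0.4):
    # probs: (N, C) classifier posteriors; feats: (N, D) embeddings;
    # prototypes: (C, D). alpha, beta, gamma mirror the weights on the
    # entropy, margin, and prototype-disagreement terms of Equation (15).
    eps = 1e-8
    # Entropy H_j: large for ambiguous predictions.
    H = -(probs * (probs + eps).log()).sum(dim=1)
    # Margin M_j: a small top-1/top-2 gap marks a boundary sample; negated
    # here so that larger scores always mean "more informative".
    top2 = probs.topk(2, dim=1).values
    M = -(top2[:, 0] - top2[:, 1])
    # Disagreement delta_j: classifier argmax vs. nearest-prototype label.
    proto_pred = torch.cdist(feats, prototypes).argmin(dim=1)
    delta = (probs.argmax(dim=1) != proto_pred).float()
    return alpha * H + beta * M + gamma * delta

# Active querying would then label the top-scoring target samples, e.g.
# idx = query_scores(probs, feats, prototypes).topk(budget).indices
```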
For the entropy weight α, performance exhibits a non-monotonic pattern. The configuration α = 0.2 provides the highest performance, achieving an OA of 97.96 ± 0.62, AA of 97.98 ± 0.45, and κ of 97.55 ± 0.74. Increasing α beyond this point reduces accuracy: α = 0.4 yields an OA of 97.51 ± 0.99, and α = 0.6 declines further to an OA of 97.39 ± 0.70, followed by partial recovery at α = 0.8 with an OA of 97.71 ± 0.60. These results indicate that an excessively large entropy weight overemphasizes ambiguous samples with uncertain pseudo-labels, introducing noise into the queried set. Conversely, a smaller entropy weight maintains a favorable balance between exploration and stability, prioritizing informative yet reliable samples.
For the margin weight β , a moderate value yields the most consistent improvement. Setting β = 0.6 achieves an OA of 97.51 ± 0.99, AA of 97.45 ± 0.84, and κ of 97.00 ± 1.18, outperforming both β = 0.2 (OA: 97.09 ± 1.06) and β = 0.8 (OA: 97.28 ± 0.66). This observation suggests that a balanced weighting of margin-based uncertainty facilitates decision boundary refinement without introducing excessive bias toward uncertain or outlier samples, thereby maintaining stable class separation.
The effect of the disagreement weight γ is comparatively minor. Across γ ∈ {0.4, 0.6, 0.8}, performance remains largely unchanged, with an OA of 97.39 ± 0.70, AA of 97.37 ± 0.52, and κ of 96.85 ± 0.84, while γ = 0.2 yields a negligible increase (OA: 97.41 ± 0.73). This limited variation indicates that the prototype memory is well regularized and that the classifier rarely exhibits persistent conflicts with its nearest prototype associations. Consequently, γ primarily contributes to secondary stabilization rather than exerting a strong influence on sample selection.
Overall, the scoring function in Equation (15) demonstrates robust performance across a wide parameter range, with OA varying by less than approximately 0.9% across all configurations. The optimal configuration for the Pavia University → Pavia Center adaptation is α = 0.2, β = 0.6, and γ ∈ [0.2, 0.4], which jointly balances the exploitation of confident and informative samples with controlled exploration of boundary regions, ensuring both stability and adaptability in the active querying process.

5.3. Visualization of Model Attention

As shown in Figure 10, the proposed SPADA yields a markedly different attention pattern from CLDA, MSDA, SCLUDA, and MLUDA. For CLDA and MSDA, the highlighted responses appear as numerous isolated patches spread over both foreground and background, including building roofs, water surfaces, and open areas, indicating that these baselines still respond strongly to local textures and domain-specific clutter rather than to stable semantic structures. SCLUDA and MLUDA partially enhance region-level consistency, but their heatmaps remain noisy, with high responses forming scattered blocks that frequently spill across class boundaries and mix roads, buildings, and backgrounds within the same attended regions. In contrast, SPADA combines channel attention and spatial attention on domain-aligned spectral prototypes, first reweighting spectral channels toward discriminative bands and then modulating spatial locations according to semantic structure. The resulting heatmap is sparse yet coherent: strong responses concentrate along building contours, road networks, and river boundaries, while homogeneous regions such as large roof interiors and water areas are largely suppressed. This pattern indicates that SPADA focuses on physically meaningful spectral–spatial cues that are stable across domains, reduces attention to irrelevant background, and provides more reliable and interpretable guidance for target predictions than the compared methods.
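The reweight-then-modulate order described above can be made concrete with a minimal channel-then-spatial attention block. This CBAM-style sketch only illustrates the mechanism; its pooling choices, reduction ratio, and kernel size are assumptions rather than the actual SSAB layers.

```python
import torch
import torch.nn as nn

class SpectralSpatialAttention(nn.Module):
    # Minimal channel-then-spatial attention over an HSI patch, illustrating
    # the band reweighting followed by spatial modulation attributed to
    # SPADA in Figure 10.
    def __init__(self, bands, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(bands, bands // reduction), nn.ReLU(),
            nn.Linear(bands // reduction, bands))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                        # x: (B, bands, H, W)
        # Channel attention: emphasize discriminative spectral bands.
        avg = x.mean(dim=(2, 3))
        mx = x.amax(dim=(2, 3))
        w = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * w[:, :, None, None]
        # Spatial attention: emphasize semantically structured locations
        # such as contours, while suppressing homogeneous interiors.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))

# e.g., for the 103-band Pavia University scene:
# att = SpectralSpatialAttention(bands=103)
# y = att(torch.rand(2, 103, 9, 9))
```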

6. Conclusions and Future Work

This work introduced SPADA, a domain adaptation framework for HSI classification that integrates the Spectral–Spatial Attention Backbone (SSAB) with Prototype-Coupled Domain Alignment (PCDA) under a structured three-phase training pipeline. The proposed approach effectively mitigates global distribution shifts and enhances classwise separability by jointly exploiting spectral–spatial representations and prototype-based alignment. Experimental results on multiple cross-scene benchmarks demonstrate consistent improvements in OA, AA, and Cohen's κ, confirming that the integration of queried supervision and calibrated pseudo-labels substantially stabilizes prototype dynamics and strengthens cross-domain transferability.
Nevertheless, although SPADA improves global metrics and structural fidelity, it remains vulnerable on rare or spectrally ambiguous classes, where the single-prototype assumption and class imbalance jointly limit the stability of prototype-based alignment. This motivates future work on class-aware query budgets, reweighted prototype updates, and mixture or region-conditional prototypes for Houston-like urban HSI scenes.

Author Contributions

Conceptualization, W.Z. and C.Z.; methodology, W.Z.; software, W.Z., J.W. and L.Z.; validation, R.H., L.Z. and C.Z.; formal analysis, J.W. and C.Z.; investigation, W.Z.; data curation, R.H.; writing—original draft preparation, W.Z. and R.H.; writing—review and editing, C.Z.; visualization, J.W. and L.Z.; supervision, C.Z.; project administration, C.Z.; funding acquisition, C.Z. and L.Z. All authors have read and agreed to the published version of this manuscript.

Funding

This work was supported by the CNPC Innovation Fund (No. 2024DQ02-0501); Intelligent Manufacturing Longcheng Laboratory under Grant CJ20254004; Royal Society (IEC_NSFC_233444); Postgraduate Research and Practice Innovation Project of Jiangsu Province (No. KYCX25_3385); and Youth Science and Technology Talent Promotion Project of Jiangsu Province (JSTJ-2025-137).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets employed in this study are publicly accessible. The two subsets of the Pavia dataset (University of Pavia and Pavia Centre) were obtained from the UPV/EHU Computer Vision Group’s Hyperspectral Remote Sensing Scenes archive (https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes (accessed on 30 October 2025)). The Houston 2013 and Houston 2018 hyperspectral imagery were accessed via the University of Houston Hyperspectral Image Analysis Lab’s IEEE GRSS Data Fusion Contest pages (2013 contest: https://machinelearning.ee.uh.edu/2013-ieee-grss-data-fusion-contest/ (accessed on 30 October 2025); 2018 challenge: https://machinelearning.ee.uh.edu/2018-ieee-grss-data-fusion-challenge-fusion-of-multispectral-lidar-and-hyperspectral-data/ (accessed on 30 October 2025)). The Shanghai–Hangzhou dataset (EO-1 Hyperion scenes) is described in Li et al. [38] “A Two-Stage Deep Domain Adaptation Method for Hyperspectral Image Classification” (Remote Sensing, 12(7):1054; https://www.mdpi.com/2072-4292/12/7/1054 (accessed on 30 October 2025)). No new proprietary datasets were generated in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HSI: Hyperspectral Image
SPADA: Spectral Prototype Attention Domain Adaptation
MMD: Maximum Mean Discrepancy
CORAL: CORrelation ALignment
DA: Domain Adaptation
DAN: Deep Adaptation Network
DoT: Domain Transformer
DANN: Domain-Adversarial Training of Neural Networks
CDAN: Conditional Domain Adversarial Network
SDAT: Smooth Domain Adversarial Training
MCD: Maximum Classifier Discrepancy
MDD: Margin Disparity Discrepancy
CDTrans: Cross-Domain Transformer
SSRT: Safe Self-Refinement for Transformer-Based Domain Adaptation
PMTrans: Patch-Mix Transformer
UDA: Unsupervised Domain Adaptation
TVT: Transferable Vision Transformer
ViT: Vision Transformer
MLSL: Multiview Latent Space Learning
MLRGL: Tensorial Multiview Low-Rank High-Order Graph Learning
CAMSA: Consensus Augmented Masking for Subspace Alignment
CNNs: Convolutional Neural Networks
MLUDA: Multilevel Unsupervised Domain Adaptation
MSDA: Masked Self-Distillation Domain Adaptation
CLDA: Confident Learning-Based Domain Adaptation
PIIDAN: Prototype-Based Inter–Intra Domain Alignment Network
CACL: Consistency-Aware Customized Learning
SoftInstance: Soft Instance-Level Domain Adaptation
FCPN: Feature Consistency-Based Prototype Network
SCLUDA: Supervised Contrastive Learning-Based Unsupervised Domain Adaptation
UBDA: Unmixing-Based Domain Alignment
CCGDA: Class-Aligned and Class-Balancing Generative Domain Adaptation
CIDA: Causal Invariance Domain Adaptation
HyperTTA: Test-Time Adaptable Transformer for Hyperspectral Degradation
SSAB: Spectral–Spatial Attention Backbone
PCDA: Prototype-Coupled Domain Alignment
UP: University of Pavia
PC: Pavia Center
ROSIS: Reflective Optics System Imaging Spectrometer
SGD: Stochastic Gradient Descent
OA: Overall Accuracy
AA: Average Accuracy

References

  1. Zhang, W.T.; Bai, Y.; Zheng, S.D.; Cui, J.; Huang, Z.Z. Tensor Transformer for hyperspectral image classification. Pattern Recognit. 2025, 163, 111470. [Google Scholar] [CrossRef]
  2. Wu, X.; Arshad, T.; Peng, B. Spectral spatial window attention transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1–13. [Google Scholar]
  3. Long, M.; Cao, Y.; Wang, J.; Jordan, M. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the International Conference on Machine Learning PMLR, Lille, France, 7–9 July 2015; pp. 97–105. [Google Scholar]
  4. Sun, B.; Saenko, K. Deep CORAL: Correlation Alignment for Deep Domain Adaptation. In Computer Vision–ECCV 2016 Workshops; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 443–450. [Google Scholar] [CrossRef]
  5. Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-Adversarial Training of Neural Networks. In Domain Adaptation in Computer Vision Applications; Springer International Publishing: Berlin/Heidelberg, Germany, 2017; Volume 17, pp. 189–209. [Google Scholar] [CrossRef]
  6. Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Conditional adversarial network for unsupervised domain adaptation. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; pp. 1647–1657. [Google Scholar]
  7. Saito, K.; Watanabe, K.; Ushiku, Y.; Harada, T. Maximum Classifier Discrepancy for Unsupervised Domain Adaptation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 3723–3732. [Google Scholar] [CrossRef]
  8. Zhang, W.; Tang, Y.; Chen, Y.; Zou, J.; Wang, J. Bridging theory and algorithm for domain adaptation. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7364–7373. [Google Scholar]
  9. Wu, Y.; Zhou, T.; Hu, X.; Shi, L.; Yang, W. RepSSRN: The Structural Reparameterization Applied to SSRN for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  10. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
  11. Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; Vaughan, J.W. A theory of learning from different domains. Mach. Learn. 2009, 79, 151–175. [Google Scholar] [CrossRef]
  12. Ren, C.X.; Zhai, Y.; Luo, Y.W.; Yan, H. Towards Unsupervised Domain Adaptation via Domain-Transformer. Int. J. Comput. Vis. 2024, 132, 6163–6183. [Google Scholar] [CrossRef]
  13. Zhu, C.; Zhu, H.; Zhang, L.; Wang, F.; Zhu, Z. Statistically-aligned feature augmentation for robust unsupervised domain adaptation in industrial fault diagnosis. J. Intell. Manuf. 2025, 1–17. [Google Scholar] [CrossRef]
  14. Rangwani, H.; Aithal, S.K.; Mishra, M.; Jain, A.; Babu, R.V. A closer look at smoothness in domain adversarial training. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 18378–18399. [Google Scholar]
  15. Xu, T.; Chen, W.; Wang, P.; Wang, F.; Li, H.; Jin, R. CDTrans: Cross-domain transformer for unsupervised domain adaptation. In Proceedings of the International Conference on Learning Representations, Online, 25–29 April 2022. [Google Scholar]
  16. Sun, T.; Lu, C.; Zhang, T.; Ling, H. Safe Self-Refinement for Transformer-based Domain Adaptation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 7181–7190. [Google Scholar] [CrossRef]
  17. Zhu, J.; Bai, H.; Wang, L. Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 3561–3571. [Google Scholar] [CrossRef]
  18. Yang, J.; Liu, J.; Xu, N.; Huang, J. TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 520–530. [Google Scholar] [CrossRef]
  19. Zhu, C.; Wang, Q.; Xie, Y.; Xu, S. Multiview latent space learning with progressively fine-tuned deep features for unsupervised domain adaptation. Inf. Sci. 2024, 662, 120223. [Google Scholar] [CrossRef]
  20. Zhu, C.; Zhang, L.; Luo, W.; Jiang, G.; Wang, Q. Tensorial multiview low-rank high-order graph learning for context-enhanced domain adaptation. Neural Netw. 2025, 181, 106859. [Google Scholar] [CrossRef]
  21. Zhu, C.; Luo, W.; Xie, Y.; Fu, L. Multiview unsupervised domain adaptation through consensus augmented masking for subspace alignment. Appl. Intell. 2025, 55, 946. [Google Scholar] [CrossRef]
  22. Cai, M.; Xi, B.; Li, J.; Feng, S.; Li, Y.; Li, Z.; Chanussot, J. Mind the Gap: Multilevel Unsupervised Domain Adaptation for Cross-Scene Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14. [Google Scholar] [CrossRef]
  23. Fang, Z.; He, W.; Li, Z.; Du, Q.; Chen, Q. Masked Self-Distillation Domain Adaptation for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–20. [Google Scholar] [CrossRef]
  24. Fang, Z.; Yang, Y.; Li, Z.; Li, W.; Chen, Y.; Ma, L.; Du, Q. Confident Learning-Based Domain Adaptation for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–16. [Google Scholar] [CrossRef]
  25. Xie, Z.; Duan, P.; Liu, W.; Kang, X.; Li, S. Prototype-based Inter-Intra Domain Alignment Network for Unsupervised Cross-Scene Hyperspectral Image Classification. In Proceedings of the IGARSS 2024—2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 7795–7798. [Google Scholar] [CrossRef]
  26. Ding, K.; Lu, T.; Fang, Y.; Li, S. Consistency-Aware Customized Learning for Cross-Scene Hyperspectral Image Classification. In Proceedings of the IGARSS 2024—2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 9080–9083. [Google Scholar] [CrossRef]
  27. Cheng, Y.; Chen, Y.; Kong, Y.; Wang, X. Soft Instance-Level Domain Adaptation with Virtual Classifier for Unsupervised Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
  28. Luo, H.; Zhong, S.; Gong, C. Prototype-Guided Class-Balanced Active Domain Adaptation for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1–16. [Google Scholar] [CrossRef]
  29. Xie, Z.; Duan, P.; Liu, W.; Kang, X.; Wei, X.; Li, S. Feature Consistency-Based Prototype Network for Open-Set Hyperspectral Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 9286–9296. [Google Scholar] [CrossRef] [PubMed]
  30. Li, Z.; Xu, Q.; Ma, L.; Fang, Z.; Wang, Y.; He, W.; Du, Q. Supervised Contrastive Learning-Based Unsupervised Domain Adaptation for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–17. [Google Scholar] [CrossRef]
  31. Baghbaderani, R.K.; Qu, Y.; Qi, H. Unsupervised Hyperspectral Image Domain Adaptation through Unmixing-Based Domain Alignment. In Proceedings of the IGARSS 2023—2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 16–21 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 5906–5909. [Google Scholar] [CrossRef]
  32. Feng, J.; Zhou, Z.; Shang, R.; Wu, J.; Zhang, T.; Zhang, X.; Jiao, L. Class-Aligned and Class-Balancing Generative Domain Adaptation for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–17. [Google Scholar] [CrossRef]
  33. Wang, B.; Xu, Y.; Wu, Z.; Wei, Z.; Chanussot, J. Unsupervised Domain Adaptation for Hyperspectral Image Classification via Causal Invariance. In Proceedings of the IGARSS 2024–2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1522–1525. [Google Scholar] [CrossRef]
  34. Yue, X.; Liu, A.; Chen, N.; Huang, C.; Liu, H.; Huang, Z.; Fang, L. HyperTTA: Test-Time Adaptation for Hyperspectral Image Classification under Distribution Shifts. arXiv 2025, arXiv:2509.08436. [Google Scholar]
  35. UPV/EHU Computer Vision Group. Hyperspectral Remote Sensing Scenes. Available online: https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes (accessed on 24 October 2025).
  36. University of Houston, Hyperspectral Image Analysis Lab. 2013 IEEE GRSS Data Fusion Contest. Hyperspectral (144 Bands, 2.5 m) and LiDAR over the University of Houston. 2013. Available online: https://machinelearning.ee.uh.edu/2013-ieee-grss-data-fusion-contest/ (accessed on 24 October 2025).
  37. University of Houston, Hyperspectral Image Analysis Lab. 2018 IEEE GRSS Data Fusion Challenge: Fusion of Multispectral LiDAR and Hyperspectral Data. Hyperspectral (48 Bands, 1 m), Multispectral LiDAR, and RGB. 2018. Available online: https://machinelearning.ee.uh.edu/2018-ieee-grss-data-fusion-challenge-fusion-of-multispectral-lidar-and-hyperspectral-data/ (accessed on 24 October 2025).
  38. Li, Z.; Tang, X.; Li, W.; Wang, C.; Liu, C.; He, J. A two-stage deep domain adaptation method for hyperspectral image classification. Remote Sens. 2020, 12, 1054. [Google Scholar] [CrossRef]
Figure 1. Overall architecture of the proposed SPADA framework. The model integrates a Spectral–Spatial Attention Backbone (SSAB) with a Prototype-Coupled Domain Alignment (PCDA) module and employs a three-phase training pipeline.
Figure 2. Architecture of the Spectral–Spatial Attention Backbone (SSAB).
Figure 3. False-color composite images and corresponding ground-truth maps for the Pavia University (source) and Pavia Center (target) hyperspectral datasets. (a) False-color image of Pavia University. (b) Ground-truth map of Pavia University. (c) False-color image of Pavia Center. (d) Ground-truth map of Pavia Center.
Figure 4. False-color composite images and corresponding ground-truth maps for the Houston 2013 (source) and Houston 2018 (target) hyperspectral datasets. (a) False-color image of Houston 2013. (b) False-color image of Houston 2018. (c) Ground-truth map of Houston 2013. (d) Ground-truth map of Houston 2018.
Figure 5. False-color composite images and corresponding ground-truth maps for the Shanghai (source) and Hangzhou (target) hyperspectral datasets. (a) False-color image of Shanghai. (b) False-color image of Hangzhou. (c) Ground-truth map of Shanghai. (d) Ground-truth map of Hangzhou.
Figure 6. Visual comparison of classification maps generated by different domain adaptation methods for the Pavia dataset.
Figure 7. Visual comparison of classification maps generated by different domain adaptation methods for the Houston dataset.
Figure 8. Visual comparison of classification maps generated by different domain adaptation methods for the Shanghai–Hangzhou dataset.
Figure 9. Sensitivity analysis of hyperparameters α, β, and γ on the Pavia University → Pavia Center cross-scene DA task.
Figure 10. Comparison of attention heatmaps on the target scene for (a) CLDA, (b) MSDA, (c) SCLUDA, (d) MLUDA, and (e) SPADA of the Pavia dataset. Warmer colors indicate higher attention responses.
Table 1. Number of samples per class in the Pavia University (source) and Pavia Center (target) datasets. The Total row is highlighted for emphasis.

No. | Class Name | Pavia University (Source) | Pavia Center (Target)
1 | Trees | 3064 | 7598
2 | Asphalt | 6631 | 9248
3 | Bricks | 3682 | 2685
4 | Bitumen | 1330 | 7287
5 | Shadows | 947 | 2863
6 | Meadows | 18,649 | 3090
7 | Bare Soil | 5029 | 6584
Total |  | 39,332 | 39,355
Table 2. Number of samples per class in the Houston 2013 (source) and Houston 2018 (target) datasets.

No. | Class Name | Houston 2013 (Source) | Houston 2018 (Target)
1 | Grass healthy | 345 | 1353
2 | Grass stressed | 365 | 4888
3 | Trees | 365 | 2766
4 | Water | 285 | 22
5 | Residential buildings | 319 | 5347
6 | Non-residential buildings | 408 | 32,459
7 | Road | 443 | 6365
Total |  | 2530 | 53,200
Table 3. Number of samples per class in the Shanghai (source) and Hangzhou (target) datasets.

No. | Class Name | Shanghai (Source) | Hangzhou (Target)
1 | Water | 123,123 | 18,043
2 | Land/Building | 161,689 | 77,450
3 | Plant | 83,188 | 40,207
Total |  | 368,000 | 135,700
Table 4. Classification results (mean ± standard deviation) on the Pavia task. Bold values indicate the best performance in each row.

Class No. | CLDA [24] | MSDA [23] | SCLUDA [30] | MLUDA [22] | SPADA
1 | 97.20 | 93.44 | 95.63 | 92.00 | 97.02
2 | 99.90 | 98.49 | 97.28 | 93.15 | 99.26
3 | 80.63 | 79.92 | 93.39 | 98.31 | 96.55
4 | 83.73 | 83.87 | 84.66 | 85.32 | 97.28
5 | 99.98 | 99.84 | 99.01 | 99.97 | 99.99
6 | 91.14 | 86.47 | 95.10 | 97.74 | 96.75
7 | 91.26 | 96.95 | 85.93 | 77.22 | 98.53
OA | 92.64 ± 1.12 | 92.44 ± 0.98 | 92.42 ± 0.82 | 89.96 ± 0.93 | 97.89 ± 0.68
AA | 91.45 ± 1.38 | 91.28 ± 1.85 | 93.00 ± 1.11 | 91.96 ± 0.68 | 97.73 ± 0.84
Kappa | 91.13 ± 1.35 | 90.90 ± 1.19 | 90.92 ± 0.97 | 88.11 ± 1.10 | 97.46 ± 0.82
Table 5. Classification results (mean ± standard deviation) on the Houston task. Bold values indicate the best performance in each row.

Class No. | CLDA [24] | MSDA [23] | SCLUDA [30] | MLUDA [22] | SPADA
1 | 61.86 | 59.65 | 88.10 | 65.01 | 59.81
2 | 89.37 | 88.97 | 67.84 | 77.32 | 90.97
3 | 68.78 | 67.61 | 63.83 | 58.60 | 79.13
4 | 93.18 | 80.91 | 90.00 | 94.09 | 83.18
5 | 94.02 | 92.53 | 85.38 | 86.58 | 84.16
6 | 57.93 | 81.69 | 80.37 | 78.08 | 87.24
7 | 78.67 | 72.27 | 55.20 | 46.19 | 81.20
OA | 66.16 ± 1.85 | 81.03 ± 1.59 | 76.05 ± 2.25 | 73.69 ± 3.32 | 84.24 ± 1.51
AA | 77.35 ± 1.61 | 77.66 ± 2.01 | 75.81 ± 3.45 | 72.27 ± 4.30 | 80.12 ± 3.38
Kappa | 53.44 ± 1.99 | 70.25 ± 2.00 | 62.18 ± 2.51 | 58.35 ± 5.36 | 74.32 ± 2.53
Table 6. Classification results (mean ± standard deviation) on the Shanghai–Hangzhou task. Bold values indicate the best performance in each row.

Class No. | CLDA [24] | MSDA [23] | SCLUDA [30] | MLUDA [22] | SPADA
1 | 98.59 | 96.57 | 99.63 | 99.99 | 97.02
2 | 88.67 | 95.04 | 86.93 | 84.89 | 98.56
3 | 88.66 | 86.69 | 86.84 | 88.53 | 87.16
OA | 89.92 ± 2.57 | 92.77 ± 1.66 | 88.59 ± 5.83 | 87.96 ± 2.88 | 94.93 ± 1.88
AA | 91.81 ± 1.57 | 92.76 ± 1.46 | 91.13 ± 4.54 | 91.14 ± 2.43 | 94.14 ± 2.09
Kappa | 82.70 ± 4.14 | 87.26 ± 2.93 | 80.85 ± 9.36 | 79.83 ± 4.67 | 90.91 ± 3.44
Table 7. Ablation results on the Pavia University → Pavia Center cross-scene DA task. Bold values indicate the best performance in each row. A check mark (✓) denotes that the corresponding phase is enabled; a cross (×) denotes that it is disabled.

Loss | Active Adaptation | Confident Refinement | OA | AA | Kappa
L_align | × | × | 96.82 ± 0.72 | 97.22 ± 0.75 | 96.17 ± 0.87
L_proto | × | × | 97.47 ± 1.06 | 97.57 ± 0.94 | 96.96 ± 1.28
L_align + L_proto | × | × | 90.12 ± 2.29 | 89.83 ± 1.32 | 88.17 ± 2.68
L_align + L_proto | ✓ | × | 97.28 ± 0.86 | 97.12 ± 0.86 | 96.71 ± 1.03
L_align + L_proto | × | ✓ | 97.79 ± 0.85 | 97.81 ± 0.57 | 97.33 ± 1.02
L_align + L_proto | ✓ | ✓ | 97.89 ± 0.68 | 97.73 ± 0.84 | 97.46 ± 0.82
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
