Article

DSCSRN: Physically Guided Symmetry-Aware Spatial-Spectral Collaborative Network for Single-Image Hyperspectral Super-Resolution

1 School of Computer Science, Hubei University of Technology, Wuhan 430068, China
2 Hubei Provincial Key Laboratory of Green Intelligent Computing Power Network, Wuhan 430068, China
3 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
4 China Centre for Resources Satellite Data and Application, Beijing 100095, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(9), 1520; https://doi.org/10.3390/sym17091520
Submission received: 11 August 2025 / Revised: 6 September 2025 / Accepted: 8 September 2025 / Published: 12 September 2025
(This article belongs to the Section Computer)

Abstract

Hyperspectral images (HSIs), with their rich spectral information, are widely used in remote sensing; yet the inherent trade-off between spectral and spatial resolution in imaging systems often limits spatial details. Single-image hyperspectral super-resolution (HSI-SR) seeks to recover high-resolution HSIs from a single low-resolution input, but the high dimensionality and spectral redundancy of HSIs make this task challenging. In HSIs, spectral signatures and spatial textures often exhibit intrinsic symmetries, and preserving these symmetries provides additional physical constraints that enhance reconstruction fidelity and robustness. To address these challenges, we propose the Dynamic Spectral Collaborative Super-Resolution Network (DSCSRN), an end-to-end framework that integrates physical modeling with deep learning and explicitly embeds spatial–spectral symmetry priors into the network architecture. DSCSRN processes low-resolution HSIs with a Cascaded Residual Spectral Decomposition Network (CRSDN) to compress redundant channels while preserving spatial structures, generating accurate abundance maps. These maps are refined by two Synergistic Progressive Feature Refinement Modules (SPFRMs), which progressively enhance spatial textures and spectral details via a multi-scale dual-domain collaborative attention mechanism. The Dynamic Endmember Adjustment Module (DEAM) then adaptively updates spectral endmembers according to scene context, overcoming the limitations of fixed-endmember assumptions. Grounded in the Linear Mixture Model (LMM), this unmixing–recovery–reconstruction pipeline restores subtle spectral variations alongside improved spatial resolution. Experiments on the Chikusei, Pavia Center, and CAVE datasets show that DSCSRN outperforms state-of-the-art methods in both perceptual quality and quantitative performance, achieving an average PSNR of 43.42 and a SAM of 1.75 (×4 scale) on Chikusei. The integration of symmetry principles offers a unifying perspective aligned with the intrinsic structure of HSIs, producing reconstructions that are both accurate and structurally consistent.

1. Introduction

Hyperspectral imaging, which integrates spatial and spectral information, has attracted increasing attention in recent years for applications in remote sensing and fine-grained recognition tasks. Hyperspectral images (HSIs) simultaneously capture two-dimensional spatial information along with dozens to hundreds of contiguous, narrow spectral bands. Each pixel in an HSI contains not only spatial location data but also complete spectral features, enabling the identification of subtle spectral differences across various materials. This high-dimensional representation grants HSIs superior material discrimination capabilities compared to multispectral or natural images. Thanks to their rich spectral characteristics, HSIs are widely applied in high-precision sensing tasks such as defect detection [1], mineral exploration [2], agricultural monitoring [3], and environmental protection [4]. However, due to the limited energy reception capacity of imaging sensors, acquiring narrow and continuous spectral bands typically requires lowering the spatial sampling rate within a fixed acquisition time, leading to image blurring and loss of spatial details. This reduction in spatial resolution undermines the image’s ability to represent fine-grained spatial structures and negatively affects performance in practical tasks such as change detection [5], object recognition [6], and classification [7]. Notably, many hyperspectral scenes and their spectral–spatial representations possess inherent symmetries, such as repetitive textures in spatial patterns and balanced reflectance characteristics across certain spectral bands. Exploiting these symmetries can act as an additional prior, reinforcing both the physical and structural consistency of reconstructed HSIs. Motivated by the challenges of low spatial resolution, HSI-SR has emerged as an active research area. HSI-SR enhances the spatial resolution of HSIs through post-processing, without requiring modifications to the imaging hardware. This approach not only improves visual quality but also provides richer spatial–spectral representations for downstream tasks, making it valuable in both theoretical research and practical applications. Current HSI-SR methods can be broadly categorized into two types: fusion-based super-resolution and single-image super-resolution [8]. Fusion-based methods typically integrate low-resolution HSIs with high-resolution multispectral or RGB images to reconstruct high-quality, high-resolution HSIs [9]. In contrast, single-image methods rely solely on a low-resolution HSI for super-resolution, eliminating the need for auxiliary data [10]. This independence simplifies data acquisition and avoids the challenges of image alignment, making single-image approaches more practical and adaptable.
Early single-image HSI-SR methods were primarily based on traditional image processing techniques, such as bicubic interpolation [11], edge-guided reconstruction [12], sparse representation [13], and low-rank tensor recovery [14]. While these methods have achieved certain improvements under predefined priors, they generally suffer from limited modeling capacity and struggle to recover complex textures and details. With the advancement of deep learning, convolutional neural network (CNN)-based methods for HSI-SR have achieved remarkable progress [15]. Mei et al. [16] proposed a method based on a three-dimensional fully convolutional neural network (3D-FCNN) to jointly model spectral and spatial information, capturing spectral–spatial correlations at the voxel level. However, HSIs typically consist of hundreds of spectral bands, resulting in significant computational and storage overhead in practical applications. Moreover, the intrinsic locality of 3D convolutions restricts their ability to capture long-range spectral dependencies. To address these limitations, Li et al. [17] proposed a divide-and-conquer strategy, partitioning the spectral dimension into multiple subgroups for separate modeling. This approach reduced computational complexity and improved network efficiency. However, it failed to fully consider inter-group redundancy and cross-channel correlations, which constrained the overall reconstruction performance. To better capture critical information, Hu et al. [18] introduced a spectral-spatial attention mechanism that assigns explicit weights to features by evaluating the importance of different spectral channels and spatial locations. Although this method improved spectral fidelity and detail restoration, spectral distortions still occurred in complex scenes. Additionally, these attention-based methods heavily rely on feature fusion strategies, which reduce their robustness and generalization ability. Furthermore, Li et al. [19] developed a multi-stage refinement framework, and Yan et al. [20] proposed a recursive residual enhancement method to improve high-frequency detail recovery. However, their increased structural complexity and training costs hinder practical deployment. More recently, Li et al. [21] introduced Transformer-based architectures and frequency-domain modeling to capture long-range dependencies, further enhancing reconstruction quality. Nevertheless, these methods face challenges such as large parameter counts [22], slow training convergence [23], and limited adaptability to small-sample scenarios [24], which restrict their applications. Recent studies have shown that incorporating physical priors, such as abundance modeling, can help alleviate redundancy caused by high spectral dimensionality [25]. In particular, the linear mixture model (LMM) framework [26], which maps low-resolution HSIs to a low-dimensional abundance space and combines endmember information to reconstruct high-resolution images, has proven to be an effective and physically interpretable approach. For example, the aeDPCN framework proposed by Wang et al. [27] employs an autoencoder to extract abundance features and uses dilated convolutional networks to progressively refine reconstruction, achieving high-quality results with low computational cost.
Despite advancements in network design and reconstruction performance, current methods still face three critical challenges. First, most approaches focus solely on feature-level spectral–spatial fusion while overlooking the physical prior that HSIs are formed by a limited number of endmembers through linear mixing, thereby compromising the physical consistency of spectral reconstruction. Second, these methods often struggle to accurately model mixed pixels in low-resolution images, leading to suboptimal recovery of high-frequency details, particularly in complex scenes. Third, the spatial and spectral attention mechanisms in fusion modules are typically designed in isolation, lacking deep interaction and resulting in limited synergy between spatial textures and spectral features. Consequently, there is an urgent need for a method that can incorporate spectral unmixing priors and strengthen the joint modeling of spectral and spatial information, in order to achieve more accurate and physically consistent HSI-SR. Equally important, explicitly modeling spatial–spectral symmetries offers a complementary pathway to reduce redundancy, better handle mixed pixels, and align the reconstruction process with the intrinsic organization of real-world hyperspectral data.
To address these issues, we propose the Dynamic Spectral Collaborative Super-Resolution Network (DSCSRN)—an HSI-SR framework that integrates the strengths of physical priors with deep learning. The core idea is to establish a super-resolution closed-loop process, leveraging abundance maps as intermediaries and grounded in the LMM. This design enables the synergistic enhancement of spatial detail and spectral consistency through a two-stage progressive reconstruction strategy. Unlike prior works that independently adopt spatial–spectral priors, attention mechanisms, or unmixing-based strategies, DSCSRN introduces a collaborative closed-loop framework that unifies these paradigms under the guidance of the LMM. Specifically, (i) CRSDN estimates abundances with residual refinement, (ii) DEAM dynamically adjusts endmember representations to strengthen spectral reconstruction, and (iii) SPFRM progressively enhances spatial–spectral features via the DDSA mechanism. This synergy not only ensures strong physical interpretability but also delivers superior reconstruction accuracy, distinguishing DSCSRN from existing approaches. Our main contributions are summarized as follows:
(1) Unified Physics–Deep Learning framework: Embedding the LMM into a data-driven architecture bridges the gap between traditional spectral unmixing and modern deep learning, directly addressing the neglect of physical priors and improving the physical interpretability of reconstructed spectra. By incorporating symmetry priors into this framework, the network benefits from balanced dual-domain processing that reflects the natural organization of spectral–spatial information.
(2) Cascaded Residual Spectral Decomposition Network (CRSDN): By performing residual channel compression and spatial preservation simultaneously, CRSDN enforces the spectral formation prior while reducing spectral redundancy, enabling accurate and efficient abundance estimation—thereby overcoming the challenge of physically inconsistent spectral reconstruction. Its channel–spatial decomposition structure follows a symmetric processing pattern across domains, ensuring balanced feature extraction.
(3) Two Synergistic Progressive Feature Refinement Modules (SPFRMs): Arranged sequentially, SPFRM employs a two-level upsampling structure with multiple Dual-Domain Synergy Attention (DDSA) blocks to progressively recover spatial textures and spectral details. This design directly mitigates the difficulty of modeling mixed pixels and restores high-frequency details even in complex scenes. The dual-branch DDSA is inherently symmetric in its spatial and spectral pathways, enabling mutual enhancement and avoiding bias toward a single domain.
(4) Dynamic Endmember Adjustment Module (DEAM): By adaptively updating endmember signatures according to scene context, DEAM overcomes the rigidity of fixed-endmember assumptions and strengthens the synergy between spectral and spatial domains, solving the limitation of isolated attention mechanisms. The DEAM process maintains symmetry in its treatment of spectral endmembers and abundance maps, ensuring consistency in both directions of the reconstruction pipeline.

2. Methods

2.1. Overall Architecture

The architecture of the DSCSRN network is illustrated in Figure 1. Based on the LMM, the framework adopts a three-stage process of unmixing–recovery–reconstruction, encompassing spectral unmixing modeling, abundance map recovery, and high-frequency detail reconstruction. This hierarchical design enables collaborative optimization that integrates physical modeling with high-fidelity image reconstruction. Moreover, the architecture is intentionally designed with spatial–spectral symmetry in mind: each processing stage maintains a balanced, dual-path interaction between spatial and spectral domains, ensuring that improvements in one domain are mirrored and reinforced in the other.
Specifically, the input low-resolution hyperspectral image $X \in \mathbb{R}^{C \times W \times H}$ is first processed by the CRSDN to generate the initial abundance map $A \in \mathbb{R}^{N \times W \times H}$, where $C$ denotes the number of spectral channels, $N$ is the number of endmembers, and $W \times H$ represents the spatial resolution. CRSDN preserves both the spatial distribution and the low-dimensional spectral structure of the input, providing a physically meaningful prior for the subsequent super-resolution stages. This initial stage forms a symmetric mapping between spectral signatures and their spatial distributions, acting as the first step of the network’s symmetry-preserving pipeline.
Building on CRSDN’s output, SPFRM enhances the spatial resolution of the abundance map in two stages. Each SPFRM incorporates multiple DDSA modules (see Figure 1) to progressively extract and fuse spatial–spectral features. The first SPFRM upsamples $A$ to an intermediate resolution, yielding $A_0 \in \mathbb{R}^{N \times (r/2)W \times (r/2)H}$, while the second SPFRM further refines it to produce the high-resolution abundance map $\hat{A} \in \mathbb{R}^{N \times rW \times rH}$, where $r$ is the spatial scaling factor. The two-stage SPFRM processing exhibits a scale-symmetric refinement: each upsampling step mirrors the spectral–spatial attention flow, ensuring that feature enhancement is structurally consistent across resolutions.
Subsequently, the DEAM leverages $\hat{A}$ and a globally shared endmember library $E \in \mathbb{R}^{C \times N}$ to adaptively refine endmember representations through context-aware modulation. These adjusted endmembers are linearly combined with $\hat{A}$ to generate a preliminary high-resolution hyperspectral image $Y_0 \in \mathbb{R}^{C \times rW \times rH}$. This adaptive refinement also follows a symmetry-driven design, treating each endmember–abundance pair as a matched structural unit to maintain balance between spectral precision and spatial coherence.
To enhance texture recovery, a global residual connection incorporates a bicubically upsampled version of the input image in $\mathbb{R}^{C \times rW \times rH}$, which is added to $Y_0$ to produce the final reconstructed image $Y \in \mathbb{R}^{C \times rW \times rH}$. The residual addition completes a symmetry loop, aligning the reconstructed output with both the original low-resolution input and the refined high-resolution prediction.
Unlike a purely sequential design, DSCSRN implements the unmixing–recovery–reconstruction pipeline in a closed-loop, end-to-end manner. Abundance maps estimated by CRSDN are iteratively refined through DEAM, while SPFRMs progressively enforce spatial–spectral consistency. This joint optimization, combined with multi-level supervision, effectively suppresses error propagation across stages and ensures stable, high-fidelity reconstructions.
In summary, DSCSRN’s hierarchical architecture revolves around CRSDN as the cornerstone, with SPFRM and DEAM collaboratively refining its output to achieve high-quality hyperspectral reconstruction. By embedding symmetry principles into the network’s multi-stage pipeline and feature interaction design, DSCSRN achieves a harmonious balance between spatial detail enhancement and spectral fidelity, which is essential for physically consistent reconstruction. This bottom-up process balances physical interpretability with the expressive power of deep neural networks, ensuring robust performance under stringent structural and spectral constraints.
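To make the data flow concrete, the following is a minimal PyTorch-style sketch of the unmixing–recovery–reconstruction pipeline described above. The class and argument names are illustrative, and the internal structure of CRSDN, SPFRM, and DEAM (detailed in Sections 2.2–2.4) is abstracted behind placeholder modules supplied by the caller.

```python
# Minimal sketch of the DSCSRN forward pass, assuming crsdn/spfrm1/spfrm2/deam
# are modules implementing the stages described in Sections 2.2-2.4.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSCSRNSketch(nn.Module):
    def __init__(self, crsdn, spfrm1, spfrm2, deam, scale=4):
        super().__init__()
        self.crsdn, self.spfrm1, self.spfrm2, self.deam = crsdn, spfrm1, spfrm2, deam
        self.scale = scale

    def forward(self, x):                      # x: (B, C, H, W) low-resolution HSI
        a = self.crsdn(x)                      # (B, N, H, W) abundance maps
        a = self.spfrm1(a)                     # upsampled by scale/2
        a = self.spfrm2(a)                     # upsampled by another factor of 2
        y0 = self.deam(a)                      # (B, C, H*scale, W*scale) preliminary HSI
        x_up = F.interpolate(x, scale_factor=self.scale,
                             mode="bicubic", align_corners=False)
        return y0 + x_up                       # global residual with bicubic upsampling
```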

2.2. Cascaded Residual Spectral Decomposition Network

To obtain physically interpretable abundance representations from low-resolution HSIs, we designed the CRSDN as the first-stage core module (Figure 2). Unlike existing single-image HSI-SR methods that directly fuse spectral–spatial features without explicit physical constraints—often leading to physically inconsistent spectra in mixed-pixel-rich scenes—CRSDN embeds the spectral unmixing process, grounded in LMM, into the network architecture. This integration allows the network to compress redundant spectral channels while preserving spatial structures, producing abundance maps that adhere to non-negativity and sum-to-one constraints, and thereby provides a physically meaningful prior for subsequent high-resolution reconstruction.
This module is grounded in the classical LMM, which assumes that the spectral vector of each pixel can be expressed as a linear combination of multiple endmember spectra:
$$X_1(i,j) = \sum_{n=1}^{N} A_{n,i,j}\, e_n + \varepsilon$$
Here, $X_1(i,j) \in \mathbb{R}^{C}$ denotes the observed spectral vector at pixel location $(i,j)$, $e_n \in \mathbb{R}^{C}$ represents the spectral signature of the n-th endmember, $A_{n,i,j}$ is the corresponding abundance coefficient, and $\varepsilon$ is a residual error term.
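As a purely illustrative example of this mixing process (the spectra and abundances below are made up, not taken from any endmember library used in the paper), a single mixed pixel can be simulated as follows:

```python
# Toy illustration of the linear mixture model for one pixel: the observed spectrum
# is a non-negative, sum-to-one combination of endmember spectra plus a small residual.
import numpy as np

C, N = 5, 3                                   # 5 bands, 3 endmembers (illustrative sizes)
E = np.random.rand(C, N)                      # endmember library, one column per endmember
a = np.array([0.6, 0.3, 0.1])                 # abundances: non-negative, sum to one
eps = 0.01 * np.random.randn(C)               # residual term
x = E @ a + eps                               # observed pixel spectrum, as in the equation above
```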
Within the proposed framework, CRSDN takes the low-resolution hyperspectral image $X$ as input and generates the corresponding abundance maps $A$:
$$A = H_{\mathrm{CRSDN}}(X)$$
The architecture of CRSDN follows a layer-wise residual stacking design, consisting of multiple Residual Space Preservation Blocks (RSPBs) arranged in series. As shown in Figure 2, each RSPB contains a 3 × 3 convolutional layer, a Spatial Attention (SA) mechanism [28], a Tanh activation function [29], and a residual connection that directly links the input and output of the block.
The SA layer enhances the model’s capacity to capture fine-grained local features by integrating global spatial context, making it especially effective at delineating edges and structural transitions. The Tanh activation function provides smooth nonlinear mapping while mitigating the gradient saturation problem, maintaining a stable gradient flow during training. Meanwhile, the residual connection facilitates shallow feature reuse and enables better information propagation across layers, effectively mitigating degradation in deep structures.
The channel dimension in the RSPBs is progressively reduced in a cascading fashion as follows:
$$C \xrightarrow{\ \mathrm{RSPB}_1\ } 8N \xrightarrow{\ \mathrm{RSPB}_2\ } 4N \xrightarrow{\ \mathrm{RSPB}_3\ } 2N$$
This progressive channel compression mechanism aids in aggregating high-dimensional spectral features into a compact representation, mitigating the “curse of dimensionality” [30] while preserving multi-scale spectral information.
After the final RSPB, a 1 × 1 convolutional layer maps the channel dimension to N, corresponding to the number of endmembers. The output is subsequently passed through the GELU activation function [31], which enhances the nonlinearity modeling capacity of the network. To ensure physical interpretability, the model imposes two constraints from the LMM—non-negativity and sum-to-one—on the abundance maps via Softmax normalization [32]:
$$A_{n,i,j} = \frac{\exp\left(Z_{n,i,j}\right)}{\sum_{l=1}^{N} \exp\left(Z_{l,i,j}\right)}$$
where $Z_{n,i,j}$ denotes the unnormalized abundance response of the n-th endmember at pixel $(i,j)$, and the denominator sums over all $N$ endmembers. This operation ensures that all abundance coefficients are non-negative and sum to one at each pixel, thereby aligning with the physical principles of hyperspectral unmixing. Enforcing these constraints not only enhances the physical realism of the reconstruction process but also improves the model’s generalization across diverse scenes.
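A hedged sketch of how this stage could be realized in PyTorch is given below. The spatial-attention layer, the 1×1 skip projection used when block input and output widths differ, and the exact kernel sizes are assumptions made to keep the example self-contained; only the cascaded channel compression (C → 8N → 4N → 2N) and the Softmax abundance head follow the description above.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Simple spatial attention over pooled channel statistics (an assumption; the paper cites [28])."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

class RSPB(nn.Module):
    """Residual Space Preservation Block: 3x3 conv -> spatial attention -> Tanh, plus a skip path."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.attn = SpatialAttention()
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.tanh(self.attn(self.conv(x))) + self.skip(x)

class CRSDNSketch(nn.Module):
    """Cascaded compression C -> 8N -> 4N -> 2N, then a 1x1 + GELU head and Softmax abundances."""
    def __init__(self, C, N):
        super().__init__()
        self.blocks = nn.Sequential(RSPB(C, 8 * N), RSPB(8 * N, 4 * N), RSPB(4 * N, 2 * N))
        self.head = nn.Sequential(nn.Conv2d(2 * N, N, 1), nn.GELU())

    def forward(self, x):
        z = self.head(self.blocks(x))            # unnormalized responses Z
        return torch.softmax(z, dim=1)           # per-pixel non-negative, sum-to-one abundances
```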
Beyond these standard constraints, CRSDN introduces a symmetry prior to enhance physical interpretability. The key intuition is that abundance maps corresponding to spectrally correlated or geometrically symmetric endmembers should exhibit structural consistency. This is formulated as:
$$\mathcal{L}_{\mathrm{sym}} = \sum_{(i,j) \in P} \left\| A_i - S(A_j) \right\|_2^2$$
where $A_i$ and $A_j$ denote the abundance maps of an endmember pair $(i,j) \in P$, and $S(\cdot)$ represents a predefined symmetry transformation. Unlike conventional regularization that penalizes parameter magnitudes, this prior encodes a physically grounded constraint, ensuring consistency across abundance maps of symmetric spectral structures.
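A minimal sketch of this prior is shown below, assuming the pair set P is given and taking a horizontal flip as a stand-in for the symmetry transform S(·); the actual transform and the pair selection are design choices of the paper that are not specified here.

```python
# Sketch of the abundance-symmetry prior: for each designated endmember pair (i, j),
# the abundance map of i should match a symmetry transform of the abundance map of j.
import torch

def symmetry_loss(abundances, pairs, transform=lambda a: torch.flip(a, dims=[-1])):
    # abundances: (B, N, H, W); pairs: list of (i, j) endmember index pairs
    loss = abundances.new_zeros(())
    for i, j in pairs:
        loss = loss + ((abundances[:, i] - transform(abundances[:, j])) ** 2).sum()
    return loss
```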
Collectively, CRSDN integrates (i) LMM-guided unmixing constraints (non-negativity and sum-to-one), (ii) residual cascaded compression with spatial attention for effective spectral–spatial feature preservation, and (iii) a symmetry prior that enforces structural consistency across abundance maps. This combination yields abundance representations that are both physically interpretable and robust, providing a solid foundation for the subsequent refinement modules in DSCSRN.

2.3. Synergistic Progressive Feature Refinement Module

To bridge the gap between physically interpretable abundance estimation and perceptually faithful high-resolution reconstruction, we propose the SPFRM as the second-stage core component of the DSCSRN framework. Unlike direct single-step upsampling strategies that often amplify noise and blur fine textures, SPFRM adopts a progressive refinement paradigm, gradually enhancing structural integrity and spectral fidelity through coarse-to-fine optimization. By embedding cross-domain feature interactions—explicitly coupling spatial and spectral cues—into each refinement stage, SPFRM not only improves local detail recovery but also enhances robustness to spectral distortions, ensuring that the reconstructed HSI remains consistent with both physical priors and high-frequency scene details.
Within the DSCSRN framework, SPFRM adopts a two-stage stacked structure for coarse recovery and fine enhancement. Given a scaling factor $r \in \{4, 8\}$, the first stage upsamples the initial abundance map $A \in \mathbb{R}^{N \times W \times H}$ by a factor of $r/2$, producing an intermediate map $A_0 \in \mathbb{R}^{N \times (r/2)W \times (r/2)H}$. The second stage then performs another 2× upsampling, resulting in the final high-resolution abundance map $\hat{A} \in \mathbb{R}^{N \times rW \times rH}$. This progressive design facilitates layer-wise optimization from structural perception to fine-grained detail enhancement:
$$A_0 = \mathrm{SPFRM}_1(A), \qquad \hat{A} = \mathrm{SPFRM}_2(A_0)$$
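One plausible realization of a single SPFRM stage is sketched below: a stack of DDSA blocks followed by sub-pixel (pixel-shuffle) upsampling, consistent with the progressive upsampling strategy described in the experimental setup (Section 3.2). The number of DDSA blocks per stage and the exact placement of the upsampler are assumptions.

```python
import torch.nn as nn

class SPFRMSketch(nn.Module):
    """One SPFRM stage: several DDSA blocks, then sub-pixel (pixel-shuffle) upsampling."""
    def __init__(self, n_endmembers, up_factor, ddsa_block, n_blocks=3):
        super().__init__()
        self.refine = nn.Sequential(*[ddsa_block() for _ in range(n_blocks)])
        self.upsample = nn.Sequential(
            nn.Conv2d(n_endmembers, n_endmembers * up_factor ** 2, 3, padding=1),
            nn.PixelShuffle(up_factor),
        )

    def forward(self, a):
        return self.upsample(self.refine(a))

# For a scale factor r = 4: stage 1 upsamples by r/2 = 2, stage 2 by another factor of 2, e.g.
# spfrm1 = SPFRMSketch(N, up_factor=2, ddsa_block=lambda: DDSASketch(N))
# spfrm2 = SPFRMSketch(N, up_factor=2, ddsa_block=lambda: DDSASketch(N))
```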
Each SPFRM module incorporates multiple DDSA units, whose detailed structure is illustrated in Figure 3. DDSA is designed to jointly extract spatial and spectral features, achieving effective multi-scale refinement. Given an input feature $f_x \in \mathbb{R}^{N \times W \times H}$, DDSA models spectral and spatial interactions through two parallel branches.
In the spectral branch, to capture inter-band relationships and spectral correlations, depthwise convolution [33] is first applied to extract local responses of each band, followed by pointwise convolution [34] to integrate cross-channel information. This yields the spectral attention map:
$$\alpha_{\mathrm{spe}} = \sigma\left(\mathrm{Conv}_{1\times1}\left(\mathrm{Conv}_{3\times3}^{\mathrm{dw}}(f_x)\right)\right)$$
where $\sigma(\cdot)$ is the Sigmoid function. The enhanced spectral features are obtained via element-wise multiplication: $f_{\mathrm{spe}}(x) = f_x \cdot \alpha_{\mathrm{spe}}$.
In parallel, the spatial branch enhances structural sensitivity and texture details. The channel dimension is compressed through a 1 × 1 convolution, producing $f_{\mathrm{red}} \in \mathbb{R}^{C/2 \times W \times H}$, followed by a 3 × 3 convolution with GELU activation and another 1 × 1 convolution to generate the spatial attention map:
$$\alpha_{\mathrm{spa}} = \sigma\left(\mathrm{Conv}_{1\times1}\left(\mathrm{GELU}\left(\mathrm{Conv}_{3\times3}(f_{\mathrm{red}})\right)\right)\right)$$
The spatially enhanced output is $f_{\mathrm{spa}}(x) = f_x \cdot \alpha_{\mathrm{spa}}$.
To fuse spatial and spectral features, the two outputs $f_{\mathrm{spe}}(x)$ and $f_{\mathrm{spa}}(x)$ are concatenated along the channel dimension, forming $f_{\mathrm{cat}} \in \mathbb{R}^{2C \times H \times W}$, and then passed through two sequential 1 × 1 convolutions with GELU activation:
$$f_{\mathrm{fused}}(x) = \mathrm{Conv}_{1\times1}^{(2)}\left(\mathrm{GELU}\left(\mathrm{Conv}_{1\times1}^{(1)}(f_{\mathrm{cat}})\right)\right)$$
This process generates high-order features that effectively combine spatial and spectral information with reduced computational complexity.
Finally, to adaptively fuse the original and enhanced features, DDSA employs a Multi-Source Adaptive Fusion Gate (MSAFG). The four feature maps $f_x$, $f_{\mathrm{spe}}(x)$, $f_{\mathrm{spa}}(x)$, and $f_{\mathrm{fused}}(x)$ are concatenated and compressed to $f_{\mathrm{red}} \in \mathbb{R}^{C \times H \times W}$ via a 1 × 1 convolution. A gating map $\alpha \in [0,1]^{C \times W \times H}$ is generated by another 1 × 1 convolution followed by a Sigmoid function:
$$\alpha = \sigma\left(\mathrm{Conv}_{1\times1}(f_{\mathrm{red}})\right)$$
The final fused feature is obtained through gated multiplication and residual connection:
$$\hat{f}_x = \mathrm{Conv}_{3\times3}^{(2)}\left(\mathrm{GELU}\left(\mathrm{Conv}_{3\times3}^{(1)}(f_{\mathrm{red}})\right)\right) \cdot \alpha + f_x$$
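The sketch below assembles the spectral branch, spatial branch, fusion path, and MSAFG gate described above into one PyTorch module. Channel widths (for example, restoring the spatial branch from C/2 back to C before gating) and the internal naming are assumptions; only the overall dual-branch, gated-fusion structure follows the equations above.

```python
import torch
import torch.nn as nn

class DDSASketch(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Spectral branch: depthwise 3x3 then pointwise 1x1 -> attention over bands
        self.spe = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch),   # depthwise
            nn.Conv2d(ch, ch, 1),                          # pointwise
            nn.Sigmoid(),
        )
        # Spatial branch: 1x1 reduce -> 3x3 + GELU -> 1x1 -> attention over positions
        self.spa = nn.Sequential(
            nn.Conv2d(ch, ch // 2, 1),
            nn.Conv2d(ch // 2, ch // 2, 3, padding=1), nn.GELU(),
            nn.Conv2d(ch // 2, ch, 1),
            nn.Sigmoid(),
        )
        # Fusion of the two enhanced branches (two 1x1 convolutions with GELU)
        self.fuse = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.GELU(), nn.Conv2d(ch, ch, 1))
        # MSAFG: compress the four feature sources, produce a gate, and refine
        self.reduce = nn.Conv2d(4 * ch, ch, 1)
        self.gate = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.out = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.GELU(),
                                 nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, f):
        f_spe = f * self.spe(f)
        f_spa = f * self.spa(f)
        f_fused = self.fuse(torch.cat([f_spe, f_spa], dim=1))
        f_red = self.reduce(torch.cat([f, f_spe, f_spa, f_fused], dim=1))
        return self.out(f_red) * self.gate(f_red) + f       # gated multiplication + residual
```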
Unlike conventional attention that may simply amplify dominant features, our collaborative attention mechanism is explicitly formulated to align spatial–spectral dependencies across multiple scales, ensuring consistent feature refinement throughout the reconstruction pipeline. Moreover, by employing a gating strategy with dynamic weighting, it suppresses redundant feature responses while preserving complementary information, thereby enhancing cross-scale consistency without introducing unnecessary complexity.
Taken together, SPFRM integrates progressive upsampling, dual-branch spatial–spectral attention, and multi-source adaptive fusion into a unified refinement pipeline. This design enables stepwise enhancement from structural perception to fine-grained texture reconstruction, while maintaining spectral consistency through adaptive cross-domain interactions. By jointly leveraging abundance priors and learned high-order features, SPFRM effectively mitigates detail loss, suppresses spectral artifacts, and delivers high-resolution HSIs with both physical plausibility and visual fidelity. Furthermore, the progressive refinement inherently respects a form of structural symmetry, where spatial details and spectral patterns are iteratively balanced and reinforced across scales. This symmetry-driven interaction ensures that enhancement in one domain does not compromise the other, resulting in reconstructions that exhibit both geometrical coherence and spectral integrity.

2.4. Dynamic Endmember Adjustment Module

The Dynamic Endmember Adjustment Module (DEAM) is designed to address the challenges of complex and nonlinear spectral mixing by adaptively refining endmember representations and enhancing the fusion between abundances and endmembers, as shown in Figure 4. Through a nonlinear mapping from abundance space to spectral space, DEAM captures intricate spectral dependencies, enabling more accurate reconstruction and improved spectral fidelity.
DEAM begins with an attention generation sub-network, which aggregates spatial features from the input abundance map $A \in \mathbb{R}^{N \times W \times H}$. This process extracts importance weights for each endmember across samples by applying convolution, GELU activation, global average pooling, and a Sigmoid function to generate an attention map $M \in \mathbb{R}^{N \times 1 \times 1}$:
$$M = \sigma\left(\mathrm{Conv}_{1\times1}\left(\mathrm{AvgPool}\left(\mathrm{GELU}\left(\mathrm{Conv}_{3\times3}(A)\right)\right)\right)\right)$$
Here, $\sigma(\cdot)$ denotes the Sigmoid function, and AvgPool represents adaptive global average pooling, which captures the global spatial context.
The generated attention map is then used to adaptively adjust the globally shared endmember library $E \in \mathbb{R}^{C \times N}$, allowing the model to refine endmember representations per sample. For the b-th sample, the adjusted endmember matrix $\tilde{E}^{(b)} \in \mathbb{R}^{C \times N}$ is computed as:
$$\tilde{E}^{(b)} = E \odot M^{(b)}, \qquad b = 1, \ldots, B$$
where $\odot$ denotes element-wise multiplication.
To synthesize the spectral image, DEAM employs a Dynamic Matrix Synthesis (DMS) strategy [35] that fuses the adjusted endmembers and the abundance map. The high-resolution abundance tensor is first flattened into $A_{\mathrm{flat}}^{(b)} \in \mathbb{R}^{N \times (rW \cdot rH)}$, then linearly combined with $\tilde{E}^{(b)}$ using Einstein summation [36]:
$$Y^{(b)} = \tilde{E}^{(b)} A_{\mathrm{flat}}^{(b)}, \qquad Y_{c,q}^{(b)} = \sum_{n=1}^{N} \tilde{E}_{c,n}^{(b)}\, A_{n,q}^{(b)}, \qquad Y^{(b)} \in \mathbb{R}^{C \times (rW \cdot rH)}$$
Finally, the spectral output is reshaped into spatial dimensions to produce a preliminary high-resolution hyperspectral image $Y_0 \in \mathbb{R}^{C \times rH \times rW}$. This mechanism enables more accurate estimation of abundances and endmembers, significantly improving the reconstruction quality under complex mixing conditions.
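The following sketch shows one way this module could be written in PyTorch, with the attention sub-network, per-sample endmember modulation, and einsum-based synthesis. Treating the shared library E as a learnable parameter and the intermediate channel widths are assumptions.

```python
import torch
import torch.nn as nn

class DEAMSketch(nn.Module):
    def __init__(self, n_bands, n_endmembers):
        super().__init__()
        self.endmembers = nn.Parameter(torch.rand(n_bands, n_endmembers))   # E: (C, N)
        # Conv3x3 -> GELU -> global average pooling -> Conv1x1 -> Sigmoid, as in the equation above
        self.attn = nn.Sequential(
            nn.Conv2d(n_endmembers, n_endmembers, 3, padding=1), nn.GELU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(n_endmembers, n_endmembers, 1), nn.Sigmoid(),
        )

    def forward(self, a):                      # a: (B, N, rH, rW) high-resolution abundances
        m = self.attn(a)                       # (B, N, 1, 1) per-endmember weights
        e_adj = self.endmembers.unsqueeze(0) * m.squeeze(-1).squeeze(-1).unsqueeze(1)
        # e_adj: (B, C, N); combine adjusted endmembers with abundances at every pixel
        return torch.einsum("bcn,bnhw->bchw", e_adj, a)   # preliminary HSI (B, C, rH, rW)
```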
While adaptive endmember models have been explored in traditional unmixing—such as linear perturbation schemes, low-rank constrained adjustments, or adaptive linear mixing models—these methods generally treat endmember refinement as an isolated optimization task.
In contrast, DEAM is seamlessly embedded into the DSCSRN framework, where endmember features are dynamically refined in tandem with abundance estimation and progressive reconstruction. This joint optimization not only captures complex spectral variability but also enforces consistency between the abundance and endmember domains, thereby bridging physical interpretability with deep network adaptability.
Collectively, DEAM tailors endmember spectra to each input scene, mitigating spectral variability and nonlinearity effects. By synergistically integrating refined endmembers with high-resolution abundances, it enhances spectral reconstruction accuracy and improves robustness across diverse hyperspectral scenarios. Moreover, DEAM preserves a functional symmetry between abundance distributions and endmember spectra: adjustments to one are balanced by refinements in the other. This bidirectional symmetry ensures spectral fidelity while maintaining spatial coherence, harmonizing physical interpretability with the flexibility of deep learning.

3. Experiments

3.1. Datasets

To thoroughly evaluate the performance of the proposed method on HSI-SR tasks, three widely used benchmark datasets were selected: Chikusei [37], Pavia Centre (PaviaC) [38], and CAVE [39]. These datasets differ significantly in terms of spectral range, spatial resolution, and scene complexity, offering a comprehensive and diverse testing platform for performance validation. A summary of their basic characteristics is presented in Table 1.
For the Chikusei dataset, which contains rich spatial and spectral content, we selected the central region of the image (2048 × 2048 pixels). The top portion was cropped into twelve non-overlapping patches of 512 × 512 pixels to form the training set, while the remaining region was divided into 256 × 256 patches for validation and testing.
For the PaviaC dataset, the original image was partitioned into 24 spatial patches of 120 × 120 pixels. Among these, eighteen were used for training, and the remaining six were used for validation and testing.
The CAVE dataset consists of 32 HSIs of natural indoor scenes. After simulating the corresponding low- and high-resolution image pairs, 22 images were randomly selected for training, and the remaining 10 were used for validation and testing.
These three datasets were deliberately chosen to ensure complementary diversity: Chikusei captures large-scale natural and agricultural areas, Pavia Centre focuses on high-resolution urban environments, and CAVE represents indoor laboratory scenes with controlled spectral variations. This combination covers a broad spectrum of acquisition settings (airborne, urban remote sensing, and indoor imaging), thereby supporting the generalizability of our evaluation and enabling fair benchmarking against existing HSI-SR methods.

3.2. Experimental Setup

In the proposed DSCSRN framework, the number of input channels in the first layer is configured according to the number of spectral bands in the dataset. Following previous work [27], the number of endmembers is set to N = 30, serving as the basis for abundance estimation and subsequent reconstruction.
Super-resolution experiments are conducted using scale factors of ×4 and ×8, and a progressive upsampling strategy is adopted to reduce model complexity. At each upsampling stage, sub-pixel convolution (pixel shuffle) is applied to effectively preserve spatial details.
Model training is conducted using the PyTorch framework on an NVIDIA RTX 4090 GPU. The Adam optimizer is used, with an initial learning rate of $2 \times 10^{-4}$, and training is performed over 1000 epochs. The batch size is set to 16, and automatic mixed precision (AMP) [40] is employed to accelerate training and reduce memory usage. For data augmentation, random rotations and flipping are applied to improve generalization. The model is optimized using L1 Loss [41] to ensure pixel-level reconstruction accuracy.
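Under the stated setup (Adam with learning rate $2 \times 10^{-4}$, L1 loss, AMP), one training step could look roughly as follows; the dataloader yielding low-/high-resolution pairs and the model construction are assumed to exist elsewhere.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, scaler, device="cuda"):
    criterion = nn.L1Loss()                        # pixel-level reconstruction loss
    model.train()
    for lr_hsi, hr_hsi in loader:                  # low- / high-resolution HSI pairs
        lr_hsi, hr_hsi = lr_hsi.to(device), hr_hsi.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():            # automatic mixed precision
            loss = criterion(model(lr_hsi), hr_hsi)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

# Illustrative usage:
# optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
# scaler = torch.cuda.amp.GradScaler()
# for epoch in range(1000):
#     train_one_epoch(model, train_loader, optimizer, scaler)
```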
Table 2 summarizes the computational efficiency of DSCSRN during training and validation. For the Chikusei dataset at a ×4 scale, each epoch takes approximately 75 s, with a total training time of 1375 min. Validation of a single image requires around 2.76 s. Pavia Center and CAVE exhibit faster per-epoch training times (45 s and 26 s, respectively), reflecting their smaller image sizes and fewer spectral bands. GPU memory consumption remains moderate across all datasets, requiring approximately 5 GB for training and around 3–4 GB for validation, demonstrating that DSCSRN can be trained and evaluated efficiently on a single high-end GPU.

3.3. Experimental Evaluation

To quantitatively assess the quality of the reconstructed high-resolution HSIs, six widely used objective metrics are adopted: Peak Signal-to-Noise Ratio (PSNR) [42], Structural Similarity Index (SSIM) [43], the dimensionless Global Error of Synthesis (ERGAS) [44], Spectral Angle Mapper (SAM) [45], Cross-Correlation (CC) [46], and Root Mean Square Error (RMSE) [47]. Higher values of PSNR, CC, and SSIM indicate better reconstruction performance, while lower values of SAM, RMSE, and ERGAS suggest more accurate results. The ideal values for these metrics are as follows: PSNR → +∞, SAM → 0, CC → 1, RMSE → 0, SSIM → 1, and ERGAS → 0. In practice, a PSNR above 30 dB is generally considered to indicate high-quality reconstruction in HSI tasks. These metrics jointly evaluate signal fidelity (PSNR), global synthesis error (ERGAS), spectral consistency (SAM), and structural similarity (SSIM), providing a comprehensive evaluation framework for assessing the quality of hyperspectral reconstruction.
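For reference, two of these metrics can be computed as in the illustrative snippet below, assuming reconstructed and ground-truth images are float arrays scaled to [0, 1] with shape (bands, height, width); other value ranges or per-band averaging conventions would change the numbers slightly.

```python
import numpy as np

def psnr(ref, est, max_val=1.0):
    # Peak Signal-to-Noise Ratio over the whole cube
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def sam(ref, est, eps=1e-8):
    # Mean spectral angle (degrees) between per-pixel spectra
    ref_flat = ref.reshape(ref.shape[0], -1)
    est_flat = est.reshape(est.shape[0], -1)
    cos = np.sum(ref_flat * est_flat, axis=0) / (
        np.linalg.norm(ref_flat, axis=0) * np.linalg.norm(est_flat, axis=0) + eps)
    return np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
```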

3.4. Comparative Experiments

To demonstrate the effectiveness of the proposed method, comparative experiments are conducted with several state-of-the-art single-image super-resolution (SISR) methods, including FCNN [16], RFSR [48], DualSR [17], aeDPCN [27], EUNet [49], CST [50], and SNLSR [51]. For each method, the number of channels in the convolutional layers is adjusted to match the spectral dimensionality of the input datasets. Wherever possible, publicly available official implementations are used, and hyperparameters are tuned to achieve optimal performance, ensuring fair and reproducible comparisons across all methods.

3.4.1. Evaluation of the Chikusei Dataset

Table 3 presents the average results of six evaluation metrics across multiple test samples from the Chikusei dataset. At both 4× and 8× scaling factors, the DSCSRN framework consistently outperforms state-of-the-art methods, achieving superior scores in key metrics such as PSNR, SSIM, and SAM. These results demonstrate its exceptional capability in preserving spatial structures and ensuring spectral consistency. This performance stems from DSCSRN’s three-stage hierarchical design, which integrates the CRSDN, SPFRM, and DEAM. CRSDN generates a physically constrained initial abundance map, SPFRM enhances spatial resolution, and DEAM optimizes endmember representations, collectively ensuring a balance between spatial detail and spectral fidelity.
Figure 5 further illustrates DSCSRN’s reconstruction performance through visual comparisons of three representative test samples at a 4× scaling factor, accompanied by their corresponding average spectral difference curves. Pseudo-color images, formed by combining the 70th, 100th, and 36th spectral bands, are annotated with their respective PSNR and SAM values. DSCSRN exhibits higher fidelity in restoring edge structures and texture details, significantly reducing blurring, over-smoothing, and blocky artifacts (highlighted in red boxes). This advantage arises from its multi-stage feature-refinement strategy. The SPFRM’s DDSA modules progressively fuse spatial–spectral features to enhance the spatial resolution of the abundance map, while CRSDN’s initial abundance map ensures physical consistency. The spectral difference curves shown in the right part of Figure 5 further confirm that DSCSRN achieves the lowest spectral distortion across the full wavelength range, validating its superior spectral fidelity.
Other methods, such as FCNN, aeDPCN, DualSR, RFSR, and EUNet, exhibit limitations in spatial integrity or spectral context capture. FCNN produces over-smoothed outputs due to its lack of multi-stage refinement; aeDPCN disrupts spatial integrity by neglecting spectral autocorrelation; DualSR and RFSR show instability under high-bandwidth conditions; and EUNet struggles with edge preservation. In contrast, DSCSRN achieves a robust balance between spatial detail and spectral fidelity through CRSDN’s abundance map generation, SPFRM’s spatial–spectral feature fusion, and DEAM’s dynamic endmember adjustment.
Overall, both quantitative metrics and qualitative visualization results verify that DSCSRN delivers superior reconstruction performance, exhibiting strong generalization abilities, better interpretability, and a balanced preservation of both spatial detail and spectral integrity.

3.4.2. Evaluation of the Pavia Center Dataset

Table 4 summarizes the average quantitative performance results for the test images from the PaviaC dataset. Characterized by complex scene structures, pronounced spatial variations, and high spectral noise—along with a relatively limited number of training samples compared to other datasets—PaviaC poses considerable challenges to both reconstruction accuracy and model generalization. Experimental results show that the proposed DSCSRN consistently achieves superior performance across key metrics, including PSNR, SSIM, and SAM, underscoring its strong modeling capability and generalization robustness.
To further assess the visual reconstruction quality, the left part of Figure 6 presents the results for three representative test images at a 4× scaling factor. These include the ground-truth grayscale images, the absolute error maps of the reconstructions, and the corresponding average spectral difference curves. The visualized images are synthesized from the 99th, 46th, and 89th spectral bands. In the absolute error maps, DSCSRN’s reconstructions exhibit consistently darker regions, indicating substantially lower pixel-wise errors compared to the original high-resolution images—especially in edge-dense areas such as building boundaries and heterogeneous land-cover zones—demonstrating its superior error suppression. Moreover, the average spectral difference curves in the right part of Figure 6 further confirm DSCSRN’s high spectral fidelity: its curve remains the lowest across the entire wavelength range, reflecting minimal spectral distortion and strong spectral consistency.
This enhanced performance can be attributed to two key components of the model: the DDSA mechanism, which effectively captures complementary spatial and spectral features, and the DEAM, which adaptively refines endmembers to improve both the accuracy and stability of spectral reconstruction.

3.4.3. Evaluation of the CAVE Dataset (Natural Hyperspectral Images)

To further evaluate the generalization ability, adaptability, and robustness of the proposed method on natural-scene hyperspectral images (HSIs), additional experiments were conducted using the CAVE dataset. Unlike remote-sensing hyperspectral data, CAVE images feature more diverse and visually complex indoor scenes. Although the spectral dimensionality of CAVE is relatively lower, slightly reducing the difficulty of high-dimensional feature modeling, the dataset provides a rich variety of structural and textural patterns. This allows the model to demonstrate its capacity to generalize across different domains and effectively learn both spatial structures and spectral characteristics from a broader context.
For these experiments, the number of endmembers was set to N = 8 to match the lower spectral complexity of the CAVE dataset. The quantitative results, presented in Table 5, show that in the ×4 magnification task, the proposed DSCSRN method outperforms existing state-of-the-art methods on core metrics such as PSNR, SSIM, and ERGAS, demonstrating its superior ability in structural restoration and detail recovery. Although DSCSRN performs slightly worse than CST in terms of SAM, CC, and RMSE, it still surpasses SNLSR overall, achieving a favorable balance between spectral consistency and spatial clarity.
In the more challenging ×8 magnification task, DSCSRN also demonstrates superior performance, outperforming SNLSR in PSNR, SSIM, ERGAS, and RMSE, and significantly exceeding CST. These results indicate that DSCSRN provides stronger structural reconstruction and spectral stability under the complex-scene content and high scaling of natural images. Notably, although CST achieves better SAM scores, it struggles to maintain spectral accuracy at high magnifications. While SNLSR performs steadily in spectral preservation, it lacks sharpness and structural detail.
To further illustrate the reconstruction quality, three test image sets from the CAVE dataset were selected. The ×4 magnification results of the 18th spectral band are shown in Figure 7, along with the corresponding average spectral difference curves.
The visual comparisons clearly show that DSCSRN excels in structure recovery and texture preservation—especially in terms of edge sharpness, fine details, and transitions between materials. The generated reconstructed images are more natural and visually accurate. In contrast, methods such as RFSR, DualSR, and aeDPCN exhibit noticeable blurring and block artifacts in textured regions, while CST and SNLSR show structural distortions. Analysis of the spectral difference curves further reveals that DSCSRN maintains the lowest deviation across all bands, confirming its superior performance in preserving spectral fidelity.
Overall, DSCSRN demonstrates strong generalization ability and robustness in natural scenes, as evidenced by its performance on the CAVE dataset. Despite the domain gap between CAVE and remote-sensing datasets, the model maintains high reconstruction quality, showcasing its adaptability to diverse environments. This ability to balance spatial detail and spectral fidelity arises from the synergistic interplay among CRSDN’s physically grounded abundance estimation, SPFRM’s progressive feature refinement, and DEAM’s adaptive spectral optimization.

3.4.4. Cross-Dataset Evaluation (Trained on Chikusei, Tested on Pavia)

To further investigate the generalization ability of hyperspectral super-resolution models under domain shifts, we conducted a cross-dataset evaluation by training all models on the Chikusei dataset and testing them on the Pavia dataset. The results are evaluated on an input size of 102 × 40 × 40 at a scale factor of 4. Compared with Chikusei, which contains large agricultural scenes with rich spectral variability, the Pavia dataset primarily captures urban areas with distinct man-made structures, resulting in noticeable differences in both spatial complexity and spectral distributions. This setting introduces significant spectral distribution and spatial content discrepancies between training and testing scenes, making it a challenging benchmark for evaluating model robustness.
Table 6 reports the quantitative metrics under this cross-domain setting. The proposed DSCSRN achieves the best overall performance, with a PSNR of 28.95 dB and SSIM of 0.7566, alongside the lowest ERGAS (6.3550), SAM (5.9191), and RMSE (0.0381). These results clearly demonstrate the model’s superior ability to maintain both spectral fidelity and spatial detail despite significant dataset differences. Among competing methods, FCNN and DualSR exhibit relatively strong performance but remain inferior to DSCSRN, while other approaches such as CST and EUNet show a substantial decline in accuracy, highlighting their sensitivity to distribution shifts.
Figure 8 provides visual comparisons of three representative Pavia test scenes. The left column shows the reconstructed grayscale images and their corresponding absolute error maps. DSCSRN produces consistently darker error regions, particularly around building boundaries and mixed land-cover zones, reflecting its superior reconstruction stability in complex scenes. The mean absolute difference curves (right) further validate this advantage: DSCSRN achieves the lowest spectral error across the entire wavelength range, followed by FCNN and DualSR, indicating its effectiveness in preserving spectral consistency under cross-domain conditions.
This performance gain is primarily attributed to the synergy between DSCSRN’s DDSA mechanism, which extracts complementary spatial–spectral representations, and DEAM, which dynamically refines endmember features to enhance spectral reconstruction robustness across diverse datasets.

3.4.5. Parameter and Complexity Analysis

Beyond reconstruction accuracy, model complexity and parameter size are critical for assessing the practicality of hyperspectral image super-resolution methods. Figure 9 illustrates the trade-off between computational cost (measured in FLOPs) and reconstruction performance (PSNR) for different models, with their parameter sizes annotated alongside each marker. Specifically, blue squares represent existing approaches, while the red circle highlights our proposed DSCSRN. The results are obtained on the Chikusei dataset with a 4× upscaling factor, using input patches of size 128 × 50 × 50. The horizontal axis corresponds to FLOPs (in G), and the vertical axis shows PSNR (in dB).
As shown in Figure 9, DualSR suffers from extremely high computational complexity (1666.9 G FLOPs) despite achieving only moderate accuracy, reflecting the heavy cost of 3D convolutions. CST also involves a large number of parameters (21.3 M) and notable FLOPs (150.8 G), which limits its efficiency in practice. By contrast, lightweight methods such as FCNN (0.84 M, 69.3 G) and EUNet (0.84 M, 69.3 G) exhibit lower complexity but also deliver relatively weaker performance. RFSR strikes a balance with 0.98 M parameters and 60.4 G FLOPs, though its reconstruction quality remains behind the leading methods. Among prior works, SNLSR achieves a strong balance of efficiency and accuracy (1.65 M, 13.7 G FLOPs, 43.16 dB), while aeDPCN further reduces parameters (0.57 M, 14.4 G FLOPs) but with slightly lower accuracy. Most importantly, our proposed DSCSRN (red circle) achieves the best reconstruction quality (43.43 dB) with only 2.13 M parameters and 18.9 G FLOPs, clearly standing out from other methods. This favorable trade-off demonstrates that DSCSRN not only ensures superior image fidelity but also maintains high efficiency, making it well-suited for practical HSI super-resolution tasks.

3.5. Ablation Study

In the proposed DSCSRN, the CRSDN module serves as the core unmixing component, designed to estimate abundance maps under physical constraints. To evaluate its effectiveness, we vectorize the input hyperspectral image into a two-dimensional matrix and replace the spectral constraint decomposition process in CRSDN with a Multilayer Perceptron (MLP) of similar structure. The MLP uses the same activation functions and parameter sizes as CRSDN and directly predicts the initial abundance map from the image pixels. As shown in Table 7, removing CRSDN results in a PSNR drop of 0.94 dB, a decrease of 0.0444 in SSIM, and a significant increase in SAM to 5.7098. These results indicate that removing the physically constrained unmixing modeling introduces more spectral distortion and weakens spatial structure perception, thereby degrading reconstruction accuracy.
The DEAM is designed to dynamically update the endmember library based on high-resolution abundance features to better handle complex mixing scenarios. To evaluate its impact, we replace DEAM with static endmembers, where a fixed set of endmember vectors is used throughout training and inference without adaptive updates. The results show that removing DEAM causes PSNR to drop to 30.5431 (a decrease of 0.10 dB), and SAM increases to 5.2438. This demonstrates that static endmembers limit the model’s capacity to handle nonlinear mixing effects and reduce its ability to capture fine-grained spectral variations.
The MSAFG plays a key role in DSCSRN by integrating spatial, spectral, and cross-domain features with backbone information. To assess its fusion performance, we compared two alternative strategies: (1) concatenating the four feature branches followed by a 3×3 convolution (Concatenation), and (2) performing element-wise addition of the four branches (Addition). The results indicated that compared to MSAFG, the concatenation strategy reduced the PSNR by 0.0931 dB, while the Addition strategy yielded a slightly larger drop of 0.0993 dB. In both cases, the SAM also saw an increase. These findings suggest that such simple fusion mechanisms cannot adequately capture the nonlinear interactions among heterogeneous features. By contrast, MSAFG incorporates a gating mechanism to dynamically control feature flow, enabling more efficient coordination of multi-source features and improved spatial–spectral reconstruction.
To visually compare the impact of different settings, the left part of Figure 10 shows pseudo-color reconstructions and their corresponding absolute error maps under five configurations: the complete DSCSRN model, and four ablation settings (removing CRSDN, removing DEAM, and replacing MSAFG with Concatenation or Addition). It is evident that the DSCSRN-generated images display clearer textures and more precise edge restoration, producing visual results closer to the original image.
In the absolute error maps, DSCSRN outputs appear significantly darker in most regions, indicating lower reconstruction errors. At image boundaries and high-frequency regions in particular, DSCSRN effectively suppresses block artifacts and preserves structural details, whereas the ablated models exhibit noticeable error accumulation in these areas.
To further quantify spectral reconstruction performance, we plot the average spectral difference curves for each ablation variant in the right part of Figure 10. The full DSCSRN achieves the lowest spectral error, while removing CRSDN leads to the highest degradation. Although replacing DEAM or substituting MSAFG with simple concatenation or addition slightly reduces performance, none of these variants outperform the complete DSCSRN.
Together, these visual and quantitative analyses demonstrate the complementary contributions of each module, confirming their indispensable role in enhancing spatial detail and spectral accuracy in DSCSRN.
These ablation results collectively underscore the complementary and synergistic roles of the core modules in DSCSRN. CRSDN provides physically grounded unmixing, forming the basis for accurate abundance estimation. DEAM adaptively refines the spectral bases to better capture complex, scene-specific endmember variations, effectively bridging the gap between low-resolution inputs and high-resolution spectral representations. MSAFG further enhances feature integration by dynamically regulating the flow of information across spatial, spectral, and cross-domain dimensions. Together, these modules construct a coherent “unmix–refine–reconstruct” pipeline: physically interpretable unmixing establishes the foundation, progressive refinement amplifies discriminative features, and adaptive fusion facilitates efficient, context-aware reconstruction. The interplay among modules not only improves reconstruction accuracy across quantitative metrics but also enhances robustness in visually challenging regions, validating the effectiveness of DSCSRN’s design from both theoretical and empirical perspectives.

4. Conclusions

This paper presents a novel HSI-SR method named Dynamic Spectral Collaborative Super-Resolution Network (DSCSRN), which seamlessly integrates physical modeling with deep learning. By formulating a three-stage collaborative pipeline—unmixing, recovery, and reconstruction—the proposed framework effectively models the complex hierarchical relationships between spatial and spectral domains, while preserving physical interpretability. In addition, DSCSRN explicitly leverages the intrinsic symmetries present in hyperspectral data—such as repetitive spatial structures and balanced spectral relationships—using a dual-domain symmetric architecture to ensure consistency between spatial and spectral reconstructions.
The framework jointly exploits CRSDN for abundance estimation, DEAM for adaptive endmember refinement, SPFRM for spatial–spectral feature enhancement, and MSAFG for balanced multi-scale fusion. Collectively, these modules ensure robust reconstruction under diverse scenarios and highlight the unique synergy between physics-inspired priors and deep architectures.
While grounded in the Linear Mixture Model (LMM), we explicitly acknowledge its limitations under nonlinear mixing and illumination variations. DSCSRN addresses these challenges through DEAM, which dynamically refines endmember spectra, and the DDSA mechanism in SPFRM, which enforces spatial–spectral consistency. This hybrid design enhances robustness to practical deviations from LMM, distinguishing DSCSRN from conventional LMM-based methods.
Extensive experiments on three widely used datasets—Chikusei, PaviaC, and CAVE—demonstrate that DSCSRN consistently surpasses state-of-the-art methods across multiple evaluation metrics, exhibiting superior performance in both spatial detail preservation and spectral fidelity. The integration of symmetry principles into both the network architecture and the feature interaction process contributes to this improvement, ensuring that reconstructed HSIs maintain both physical realism and structural balance. The strong performance across diverse datasets further highlights the generalization ability of the proposed model in handling various HSI types and scene domains. Ablation studies also validate the effectiveness and necessity of each proposed module.
Future research will explore cross-modal collaboration mechanisms and endmember uncertainty modeling to further enhance the generalization and robustness of DSCSRN in real-world scenarios. In particular, future work will extend the evaluation to task-driven metrics (e.g., classification accuracy on super-resolved HSIs) and investigate more sophisticated symmetry-aware feature modeling strategies, enabling the network to exploit higher-order geometric and spectral symmetries for improved adaptability. Moreover, the symmetry-driven design principle is not confined to super-resolution; it holds potential for broader inverse imaging problems such as unmixing, fusion, and denoising, where maintaining cross-domain consistency is crucial. We will further pursue optimized symmetry-aware strategies to extend DSCSRN’s applicability across these tasks.

Author Contributions

Conceptualization, J.L. and X.C.; methodology, J.L. and X.C.; validation, X.C. and X.H.; formal analysis, M.Y. and G.W.; investigation, X.H. and M.Y.; data curation, X.C.; writing—original draft preparation, J.L.; writing—review and editing, M.Y. and X.C.; visualization, M.Y. and G.W.; supervision, X.C.; project administration, M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 41701527) and the Natural Science Foundation of Hubei Province (Grant No. 2022CFB447).

Data Availability Statement

The datasets used and/or analyzed during the current study are publicly available and can be accessed without any restrictions. If needed, they can also be obtained by contacting the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Luo, J.; Yang, Z.; Li, S.; Wu, Y. FPCB surface defect detection: A decoupled two-stage object detection framework. IEEE Trans. Instrum. Meas. 2021, 70, 5012311. [Google Scholar] [CrossRef]
  2. Zhang, L.; Zhang, M.; Huang, J.; Zhang, C.; Ye, F.; Pan, W. A new approach for mineral mapping using drill-core hyperspectral image. IEEE Geosci. Remote Sens. Lett. 2023, 20, 5511705. [Google Scholar] [CrossRef]
  3. Avola, G.; Matese, A.; Riggi, E. An overview of the special issue on “precision agriculture using hyperspectral images”. Remote Sens. 2023, 15, 1917. [Google Scholar] [CrossRef]
  4. Ellis, R.J.; Scott, P.W. Evaluation of hyperspectral remote sensing as a means of environmental monitoring in the St. Austell China clay (kaolin) region, Cornwall, UK. Remote Sens. Environ. 2004, 93, 118–130. [Google Scholar] [CrossRef]
  5. Karaca, A.C.; Çeşmeci, D.; Ertürk, A.; Güllü, M.K.; Ertürk, S. Hyperspectral change detection with stereo disparity information enhancement. In Proceedings of the IEEE 2014 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Lausanne, Switzerland, 24–27 June 2014; pp. 1–4. [Google Scholar]
  6. Guo, S.; Wang, Y.; Tan, Y.; Liu, T.; Qin, Q. Efficient Coastal Mangrove Species Recognition Using Multi-Scale Features Enhanced by Multi-Head Attention. Symmetry 2025, 17, 461. [Google Scholar] [CrossRef]
  7. Wang, A.; Li, M.; Wu, H. A Novel Classification Framework for Hyperspectral Image Data by Improved Multilayer Perceptron Combined with Residual Network. Symmetry 2022, 14, 611. [Google Scholar] [CrossRef]
  8. Wang, X.; Hu, Q.; Cheng, Y.; Ma, J. Hyperspectral image super-resolution meets deep learning: A survey and perspective. IEEE/CAA J. Autom. Sin. 2023, 10, 1668–1691. [Google Scholar] [CrossRef]
  9. Xie, Q.; Zhou, M.; Zhao, Q.; Meng, D.; Zuo, W.; Xu, Z. Multispectral and hyperspectral image fusion by MS/HS fusion net. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 1585–1594. [Google Scholar]
  10. Chen, C.; Wang, Y.; Zhang, N.; Zhang, Y.; Zhao, Z. A review of hyperspectral image super-resolution based on deep learning. Remote Sens. 2023, 15, 2853. [Google Scholar] [CrossRef]
  11. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014, Proceedings, Part IV 13; Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 184–199. [Google Scholar]
  12. Wang, Y.; Huang, Z.; Wang, X.; Zhang, S.; Liu, S.; Feng, L. Lightweight Edge-Guided Super-Resolution Network for Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5626714. [Google Scholar] [CrossRef]
  13. Dong, W.; Fu, F.; Shi, G.; Cao, X.; Wu, J.; Li, G.; Li, X. Hyperspectral image super-resolution via non-negative structured sparse representation. IEEE Trans. Image Process. 2016, 25, 2337–2352. [Google Scholar] [CrossRef] [PubMed]
  14. Chang, Y.; Yan, L.; Zhao, X.L.; Fang, H.; Zhang, Z.; Zhong, S. Weighted low-rank tensor recovery for hyperspectral image restoration. IEEE Trans. Cybern. 2020, 50, 4558–4572. [Google Scholar] [CrossRef]
  15. Dixit, M.; Yadav, R.N. A review of single image super resolution techniques using convolutional neural networks. Multimed. Tools Appl. 2024, 83, 29741–29775. [Google Scholar] [CrossRef]
  16. Mei, S.; Yuan, X.; Ji, J.; Zhang, Y.; Wan, S.; Du, Q. Hyperspectral image spatial super-resolution via 3D full convolutional neural network. Remote Sens. 2017, 9, 1139. [Google Scholar] [CrossRef]
  17. Li, Q.; Yuan, Y.; Jia, X.; Wang, Q. Dual-stage approach toward hyperspectral image super-resolution. IEEE Trans. Image Process. 2022, 31, 7252–7263. [Google Scholar] [CrossRef] [PubMed]
  18. Hu, J.F.; Huang, T.Z.; Deng, L.J.; Jiang, T.X.; Vivone, G.; Chanussot, J. Hyperspectral image super-resolution via deep spatiospectral attention convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 7251–7265. [Google Scholar] [CrossRef] [PubMed]
  19. Li, J.; Zheng, K.; Gao, L.; Ni, L.; Huang, M.; Chanussot, J. Model-informed multi-stage unsupervised network for hyperspectral image super-resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5516117. [Google Scholar]
  20. Yan, S.; Li, M.; He, Y.; Gou, Y.; Zhang, Y. SHSRD: Efficient Conditional Diffusion Model for Single Hyperspectral Image Super-Resolution. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 15537–15550. [Google Scholar] [CrossRef]
  21. Li, A.; Zhang, L.; Liu, Y.; Zhu, C. Exploring frequency-inspired optimization in transformer for efficient single image super-resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 3141–3158. [Google Scholar] [CrossRef]
  22. Wang, Z.; Feng, P.; Lin, Y.; Cai, S.; Bian, Z.; Yan, J.; Zhu, X. Crowdvlm-r1: Expanding r1 ability to vision language model for crowd counting using fuzzy group relative policy reward. arXiv 2025, arXiv:2504.03724. [Google Scholar]
  23. Lu, W.; Wang, J.; Wang, T.; Zhang, K.; Jiang, X.; Zhao, H. Visual style prompt learning using diffusion models for blind face restoration. Pattern Recognit. 2025, 161, 111312. [Google Scholar] [CrossRef]
  24. Yao, Z.; Zhu, Q.; Zhang, Y.; Huang, H.; Luo, M. Minimizing Long-Term Energy Consumption in RIS-Assisted UAV-Enabled MEC Network. IEEE Internet Things J. 2025, 12, 20942–20958. [Google Scholar] [CrossRef]
  25. Irmak, H.; Akar, G.B.; Yuksel, S.E. A MAP-based approach for hyperspectral imagery super-resolution. IEEE Trans. Image Process. 2018, 27, 2942–2951. [Google Scholar] [CrossRef]
  26. Nascimento, J.M.P.; Bioucas-Dias, J.M. Nonlinear mixture model for hyperspectral unmixing. In Proceedings of the SPIE Image and Signal Processing for Remote Sensing XV, Berlin, Germany, 31 August–3 September 2009; Volume 7477, pp. 157–164. [Google Scholar]
  27. Wang, X.; Ma, J.; Jiang, J.; Zhang, X.P. Dilated projection correction network based on autoencoder for hyperspectral image super-resolution. Neural Netw. 2022, 146, 107–119. [Google Scholar] [CrossRef] [PubMed]
  28. Zhu, X.; Cheng, D.; Zhang, Z.; Lin, S.; Dai, J. An empirical study of spatial attention mechanisms in deep networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6688–6697. [Google Scholar]
  29. Fan, E. Extended tanh-function method and its applications to nonlinear equations. Phys. Lett. A 2000, 277, 212–218. [Google Scholar] [CrossRef]
  30. Donoho, D.L. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Chall. Lect. 2000, 1, 32. [Google Scholar]
  31. Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415. [Google Scholar]
  32. Liu, W.; Wen, Y.; Yu, Z.; Yang, M. Large-margin softmax loss for convolutional neural networks. arXiv 2016, arXiv:1612.02295. [Google Scholar]
  33. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  34. Hua, B.S.; Tran, M.K.; Yeung, S.K. Pointwise convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 984–993. [Google Scholar]
  35. Kuhar, E.J.; Stahle, C.V. Dynamic transformation method for modal synthesis. AIAA J. 1974, 12, 672–678. [Google Scholar] [CrossRef]
  36. Åhlander, K. Einstein summation for multidimensional arrays. Comput. Math. Appl. 2002, 44, 1007–1017. [Google Scholar] [CrossRef]
  37. Yokoya, N.; Iwasaki, A. Airborne Hyperspectral Data Over Chikusei; SAL-2016-05-27; Space Application Laboratory, The University of Tokyo: Tokyo, Japan, 2016; p. 5. [Google Scholar]
  38. Huang, X.; Zhang, L. A comparative study of spatial approaches for urban mapping using hyperspectral ROSIS images over Pavia City, northern Italy. Int. J. Remote Sens. 2009, 30, 3205–3221. [Google Scholar] [CrossRef]
  39. Yasuma, F.; Mitsunaga, T.; Iso, D.; Nayar, S.K. Generalized assorted pixel camera: Postcapture control of resolution, dynamic range, and spectrum. IEEE Trans. Image Process. 2010, 19, 2241–2253. [Google Scholar] [CrossRef]
  40. Micikevicius, P.; Narang, S.; Alben, J.; Diamos, G.; Elsen, E.; Garcia, D.; Ginsburg, B.; Houston, M.; Kuchaiev, O.; Venkatesh, G.; et al. Mixed precision training. arXiv 2017, arXiv:1710.03740. [Google Scholar]
  41. Barron, J.T. A general and adaptive robust loss function. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 4331–4339. [Google Scholar]
  42. Singh, A.K.; Kumar, H.V.; Kadambi, G.R.; Kishore, J.K.; Shuttleworth, J.; Manikandan, J. Quality metrics evaluation of hyperspectral images. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2014, 40, 1221–1226. [Google Scholar] [CrossRef]
  43. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  44. Wald, L. Quality of high resolution synthesised images: Is there a simple criterion? In Proceedings of the SEE/URISCA 3rd Conference “Fusion of Earth Data: Merging Point Measurements, Raster Maps and Remotely Sensed Images”, Sophia Antipolis, France, 26–28 January 2000; pp. 99–103. [Google Scholar]
  45. Yuhas, R.H.; Boardman, J.W.; Goetz, A.F.H. Determination of semi-arid landscape endmembers and seasonal trends using convex geometry spectral unmixing techniques. In Proceedings of the JPL 4th Annual JPL Airborne Geoscience Workshop—AVIRIS Workshop, Washington, DC, USA, 25–29 October 1993; Volume 1. [Google Scholar]
  46. Oppenheim, A.V. Discrete-Time Signal Processing; Pearson Education: Delhi, India, 1999. [Google Scholar]
  47. Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
  48. Wang, X.; Hu, Q.; Jiang, J.; Ma, J. A group-based embedding learning and integration network for hyperspectral image super-resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–16. [Google Scholar] [CrossRef]
  49. Liu, D.; Li, J.; Yuan, Q.; Zheng, L.; He, J.; Zhao, S.; Xiao, Y. An efficient unfolding network with disentangled spatial-spectral representation for hyperspectral image super-resolution. Inf. Fusion 2023, 94, 92–111. [Google Scholar] [CrossRef]
  50. Chen, S.; Zhang, L.; Zhang, L. Cross-scope spatial-spectral information aggregation for hyperspectral image super-resolution. IEEE Trans. Image Process. 2024, 33, 5878–5891. [Google Scholar] [CrossRef] [PubMed]
  51. Hu, Q.; Wang, X.; Jiang, J.; Zhang, X.P.; Ma, J. Exploring the spectral prior for hyperspectral image super-resolution. IEEE Trans. Image Process. 2024, 33, 5260–5272. [Google Scholar] [CrossRef]
Figure 1. The overall framework of the proposed DSCSRN.
Figure 2. The overall structure of the CRSDN module.
Figure 3. The architecture of the DDSA module.
Figure 4. The structure of the DEAM module.
Figure 5. Three groups of reconstructed images from the Chikusei dataset and their corresponding mean spectral difference curves.
Figure 6. Three groups of reconstructed images from the Pavia Center dataset and their corresponding mean spectral difference curves.
Figure 7. Three groups of reconstructed images from the CAVE dataset and their corresponding mean spectral difference curves.
Figure 8. Cross-dataset evaluation results when trained on Chikusei and tested on Pavia.
Figure 9. PSNR–FLOPs–Params comparison for the Chikusei dataset. Blue squares denote existing methods and the red circle represents the proposed DSCSRN.
Figure 10. Visualization of ablation study results for the Pavia Center dataset and the corresponding mean spectral difference curves.
Table 1. Descriptions of the Chikusei, PaviaC, and CAVE datasets.

| Dataset | Spatial Resolution | Spectral Bands | Wavelength Range (nm) | Image Size (C × H × W) | Sensor/Platform |
|---|---|---|---|---|---|
| Chikusei | 2.5 m | 128 | 363–1018 | 128 × 2517 × 2335 | Headwall Hyperspec VNIR |
| Pavia Center | 1.3 m | 102 | 430–860 | 102 × 1096 × 1096 | ROSIS |
| CAVE | ~0.3 mm/pixel | 31 | 400–700 | 31 × 512 × 512 | Indoor Acquisition |
Table 2. Training and validation efficiency.

| Dataset | Phase | Avg Time per Epoch (s) | Total Training Time (min) | GPU Memory Usage |
|---|---|---|---|---|
| Chikusei | Training | 75 | 1375 | 5.1 GB |
| Chikusei | Validation (per image) | 2.76 | - | 3.5 GB |
| Pavia Center | Training | 45 | 825 | 4.9 GB |
| Pavia Center | Validation (per image) | 1.30 | - | 3.3 GB |
| CAVE | Training | 26 | 477 | 4.7 GB |
| CAVE | Validation (per image) | 4.95 | - | 3.8 GB |
Table 3. Quantitative performance of the Chikusei dataset. Note: Bold values indicate the best performance, and underlined values denote the second-best. d denotes the upscaling factor.

| Methods | d | PSNR↑ | SSIM↑ | ERGAS↓ | SAM↓ | CC↑ | RMSE↓ |
|---|---|---|---|---|---|---|---|
| Bicubic | 4 | 40.0107 | 0.9211 | 5.5602 | 2.6670 | 0.9246 | 0.0129 |
| FCNN [16] | 4 | 41.1959 | 0.9419 | 4.9828 | 2.3463 | 0.9407 | 0.0111 |
| RFSR [48] | 4 | 42.7493 | 0.9568 | 4.1746 | 2.0135 | 0.9565 | 0.0095 |
| DualSR [17] | 4 | 41.6073 | 0.9466 | 5.9922 | 2.4182 | 0.9411 | 0.0105 |
| aeDPCN [27] | 4 | 42.5565 | 0.9545 | 4.2307 | 1.9371 | 0.9551 | 0.0098 |
| EUNet [49] | 4 | 42.8889 | 0.9575 | 4.0947 | 1.8648 | 0.9579 | 0.0094 |
| CST [50] | 4 | 43.0638 | 0.9592 | 4.0290 | 1.8775 | 0.9595 | 0.0093 |
| SNLSR [51] | 4 | 43.1622 | 0.9601 | 3.9986 | 1.8007 | 0.9600 | 0.0092 |
| DSCSRN | 4 | 43.4283 | 0.9618 | 3.8886 | 1.7542 | 0.9618 | 0.0089 |
| Bicubic | 8 | 36.7340 | 0.8495 | 4.0193 | 4.0013 | 0.8348 | 0.0190 |
| FCNN [16] | 8 | 36.8121 | 0.8511 | 4.0101 | 3.8557 | 0.8487 | 0.0174 |
| RFSR [48] | 8 | 38.0138 | 0.8788 | 3.7709 | 3.5375 | 0.8655 | 0.0168 |
| DualSR [17] | 8 | 37.3567 | 0.8672 | 4.0635 | 3.9346 | 0.8545 | 0.0172 |
| aeDPCN [27] | 8 | 38.3404 | 0.8884 | 3.3543 | 3.0587 | 0.8833 | 0.0160 |
| EUNet [49] | 8 | 38.3095 | 0.8882 | 3.3700 | 3.0685 | 0.8828 | 0.0161 |
| CST [50] | 8 | 38.0569 | 0.8824 | 3.4791 | 3.4218 | 0.8768 | 0.0164 |
| SNLSR [51] | 8 | 38.5256 | 0.8920 | 3.3036 | 3.0002 | 0.8873 | 0.0157 |
| DSCSRN | 8 | 38.7012 | 0.8961 | 3.2526 | 2.9146 | 0.8910 | 0.0154 |
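As a reading aid for the metrics reported in Tables 3–7, the snippet below is a minimal NumPy sketch of how PSNR and SAM can be computed on a hyperspectral cube; it follows the standard definitions rather than the exact evaluation protocol (e.g., per-band averaging conventions) used in this paper.

```python
import numpy as np


def psnr(ref: np.ndarray, est: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio over the whole cube (per-band averaging may differ)."""
    mse = np.mean((ref - est) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))


def sam(ref: np.ndarray, est: np.ndarray, eps: float = 1e-8) -> float:
    """Mean spectral angle in degrees between per-pixel spectra; cubes are (bands, H, W)."""
    ref_v = ref.reshape(ref.shape[0], -1)
    est_v = est.reshape(est.shape[0], -1)
    cos = np.sum(ref_v * est_v, axis=0) / (
        np.linalg.norm(ref_v, axis=0) * np.linalg.norm(est_v, axis=0) + eps
    )
    return float(np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))))


# Toy example: 128-band cube of 64 x 64 pixels with small additive noise.
gt = np.random.rand(128, 64, 64)
pred = gt + 0.01 * np.random.randn(128, 64, 64)
print(psnr(gt, pred), sam(gt, pred))
```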
Table 4. Quantitative performance of the Pavia Center dataset. Note: Bold values indicate the best performance, and underlined values denote the second-best. d denotes the upscaling factor.

| Methods | d | PSNR↑ | SSIM↑ | ERGAS↓ | SAM↓ | CC↑ | RMSE↓ |
|---|---|---|---|---|---|---|---|
| Bicubic | 4 | 28.9464 | 0.7561 | 6.3584 | 5.9078 | 0.8892 | 0.0380 |
| FCNN [16] | 4 | 29.6709 | 0.8005 | 5.8634 | 6.0106 | 0.9052 | 0.0348 |
| RFSR [48] | 4 | 30.4910 | 0.8335 | 5.3398 | 5.4287 | 0.9214 | 0.0317 |
| DualSR [17] | 4 | 29.8552 | 0.8068 | 5.7609 | 6.3710 | 0.9073 | 0.0343 |
| EUNet [49] | 4 | 30.5021 | 0.8341 | 5.3386 | 5.2527 | 0.9215 | 0.0316 |
| CST [50] | 4 | 29.7451 | 0.8311 | 5.5829 | 5.3885 | 0.9021 | 0.0337 |
| SNLSR [51] | 4 | 30.4662 | 0.8362 | 5.3492 | 5.0823 | 0.9211 | 0.0319 |
| DSCSRN | 4 | 30.6427 | 0.8405 | 5.2531 | 5.0732 | 0.9238 | 0.0311 |
| Bicubic | 8 | 26.0420 | 0.5453 | 7.9215 | 7.6157 | 0.7708 | 0.0542 |
| FCNN [16] | 8 | 26.2882 | 0.5524 | 7.4205 | 7.5831 | 0.7759 | 0.0532 |
| RFSR [48] | 8 | 26.5872 | 0.5988 | 7.1662 | 7.3475 | 0.7899 | 0.0511 |
| DualSR [17] | 8 | 25.9474 | 0.5513 | 7.4697 | 9.3803 | 0.7622 | 0.0544 |
| EUNet [49] | 8 | 26.5251 | 0.5970 | 7.1826 | 7.3987 | 0.7948 | 0.0510 |
| CST [50] | 8 | 26.5282 | 0.5570 | 7.1652 | 7.2988 | 0.7984 | 0.0511 |
| SNLSR [51] | 8 | 26.5914 | 0.6037 | 7.1566 | 7.2427 | 0.7973 | 0.0505 |
| DSCSRN | 8 | 26.6275 | 0.6118 | 7.1365 | 7.0153 | 0.7997 | 0.0504 |
Table 5. Quantitative performance of the CAVE dataset. Note: Bold values indicate the best performance, and underlined values denote the second-best. d denotes the upscaling factor.

| Methods | d | PSNR↑ | SSIM↑ | ERGAS↓ | SAM↓ | CC↑ | RMSE↓ |
|---|---|---|---|---|---|---|---|
| Bicubic | 4 | 35.8303 | 0.9624 | 4.3569 | 3.6099 | 0.9920 | 0.0170 |
| FCNN [16] | 4 | 36.4173 | 0.9573 | 5.1094 | 3.2935 | 0.9928 | 0.0155 |
| RFSR [48] | 4 | 39.1474 | 0.9730 | 2.9552 | 3.9509 | 0.9963 | 0.0116 |
| DualSR [17] | 4 | 38.9682 | 0.9727 | 2.9683 | 3.9888 | 0.9960 | 0.0120 |
| aeDPCN [27] | 4 | 37.6579 | 0.9689 | 3.5153 | 3.6876 | 0.9947 | 0.0138 |
| EUNet [49] | 4 | 39.1303 | 0.9748 | 2.9523 | 3.4106 | 0.9962 | 0.0117 |
| CST [50] | 4 | 40.1006 | 0.9772 | 3.3686 | 2.9119 | 0.9969 | 0.0103 |
| SNLSR [51] | 4 | 39.8743 | 0.9777 | 2.7004 | 3.3553 | 0.9968 | 0.0107 |
| DSCSRN | 4 | 40.1449 | 0.9784 | 2.6792 | 3.2617 | 0.9968 | 0.0105 |
| Bicubic | 8 | 30.9645 | 0.9114 | 3.6745 | 4.6226 | 0.9766 | 0.0297 |
| FCNN [16] | 8 | 32.3365 | 0.9246 | 2.9752 | 4.5975 | 0.9802 | 0.0247 |
| RFSR [48] | 8 | 34.5322 | 0.9386 | 2.5226 | 4.1501 | 0.9878 | 0.0201 |
| DualSR [17] | 8 | 34.2430 | 0.9347 | 2.5693 | 5.1626 | 0.9868 | 0.0208 |
| aeDPCN [27] | 8 | 33.3474 | 0.9317 | 2.8143 | 4.5546 | 0.9859 | 0.0227 |
| EUNet [49] | 8 | 34.0472 | 0.9372 | 2.6358 | 4.4245 | 0.9874 | 0.0208 |
| CST [50] | 8 | 34.6527 | 0.9212 | 3.0605 | 5.6185 | 0.9898 | 0.0193 |
| SNLSR [51] | 8 | 34.6181 | 0.9437 | 2.4917 | 4.1639 | 0.9886 | 0.0194 |
| DSCSRN | 8 | 34.6935 | 0.9438 | 2.4820 | 4.1458 | 0.9888 | 0.0192 |
Table 6. Quantitative performance of cross-dataset evaluation, where models are trained on the Chikusei dataset and tested on the Pavia dataset. Note: Bold values indicate the best performance, and underlined values denote the second-best.

| Methods | PSNR↑ | SSIM↑ | ERGAS↓ | SAM↓ | CC↑ | RMSE↓ |
|---|---|---|---|---|---|---|
| FCNN [16] | 28.6269 | 0.7590 | 6.8344 | 7.5196 | 0.8849 | 0.0403 |
| DualSR [17] | 28.3931 | 0.7236 | 6.9560 | 8.4517 | 0.8705 | 0.0410 |
| EUNet [49] | 26.6029 | 0.6994 | 8.8808 | 15.0928 | 0.8812 | 0.0494 |
| CST [50] | 27.2989 | 0.7155 | 8.0557 | 13.3922 | 0.8811 | 0.0457 |
| SNLSR [51] | 28.3228 | 0.7307 | 6.9268 | 6.1927 | 0.8869 | 0.0406 |
| DSCSRN | 28.9513 | 0.7566 | 6.3550 | 5.9191 | 0.8893 | 0.0381 |
Table 7. Ablation study results for the Pavia Center dataset. Note: Bold values indicate the best performance, and underlined values denote the second-best.

| Methods | PSNR↑ | SSIM↑ | SAM↓ | RMSE↓ |
|---|---|---|---|---|
| w/o CRSDN | 29.7070 | 0.7961 | 5.7098 | 0.0347 |
| w/o DEAM | 30.5431 | 0.8360 | 5.2438 | 0.0314 |
| Concatenation | 30.5496 | 0.8364 | 5.1594 | 0.0314 |
| Addition | 30.5434 | 0.8377 | 5.0722 | 0.0315 |
| DSCSRN | 30.6427 | 0.8405 | 5.0732 | 0.0311 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
