3.1. Implementation Details (Network Architecture, Datasets and Experimental Setup)
To ensure reproducibility, we provide a complete specification of the proposed RFLSR’s architectural hyperparameters. Unless otherwise noted, these configurations are consistent across all experiments.
The input low-resolution HSI of size h × w × B is first reshaped to 1 × B × h × w by adding a singleton dimension, where B is the number of spectral bands (dataset-dependent). This results in an effective input channel dimension of 1 for the subsequent 3D convolutions. The initial 3D convolutional layer uses a kernel size of 3 × 3 × 3, with C_in = 1 and C_out = 32. All subsequent 3D convolutions in this module maintain C_in = 32 and C_out = 32. The module consists of two residual sub-layers, each containing two 3D convolutions with a ReLU activation in between (see Figure 2).
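For concreteness, the 3D module described above can be sketched in PyTorch as follows. This is a minimal sketch under the stated hyperparameters; the class and variable names are our own and are not taken from the RFLSR implementation:

```python
import torch
import torch.nn as nn

class Residual3DBlock(nn.Module):
    """One residual sub-layer: two 3x3x3 convolutions with a ReLU in between."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class Shallow3DLayer(nn.Module):
    """Initial 3D feature extractor: project 1 -> 32 channels, then apply
    two residual sub-layers (four 3D convolutions, two ReLUs in total)."""
    def __init__(self, channels=32):
        super().__init__()
        self.head = nn.Conv3d(1, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(Residual3DBlock(channels),
                                    Residual3DBlock(channels))

    def forward(self, lr_hsi):
        # lr_hsi: (N, B, h, w) -> add the singleton channel dim -> (N, 1, B, h, w)
        x = lr_hsi.unsqueeze(1)
        return self.blocks(self.head(x))
```

Under this reading, the two residual sub-layers contribute the four 3D convolutions and two ReLU activations counted in the setup description, with the initial 1-to-32 projection on top.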
In the 2D-layer, the input feature map is split into two equal groups along the channel dimension for parallel processing via grouped convolutions (Equation (6)). Each dense convolutional branch comprises four 2D convolutional layers with intermediate channel dimensions of 64, 128, and 64. The Channel Attention (CA) block uses a reduction ratio of 0.125. For the Efficient Recursive Self-Attention (ERSA) module (Figure 4): the Recursive Generalized Module (RGM) compresses the spatial dimensions by a factor of 4; the channel compression ratio for generating the Key and Query matrices is 0.5 (i.e., C_K = C_Q = C/2), while the Value matrix retains the full channel dimension C; the number of attention heads is 2; and the feed-forward network employs a hidden dimension expansion factor of 2.
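The stated ERSA hyperparameters can be illustrated with a simplified attention sketch. This is only a loose approximation: a single strided convolution stands in for the recursive spatial compression of the RGM, and all module and variable names are ours, not the authors':

```python
import torch
import torch.nn as nn

class EfficientAttentionSketch(nn.Module):
    """Simplified self-attention with spatially compressed Key/Value and
    channel-compressed Key/Query, using the stated ERSA hyperparameters:
    4x spatial compression, C_K = C_Q = C/2, full-channel Value, 2 heads,
    and a feed-forward expansion factor of 2."""
    def __init__(self, channels=32, heads=2, spatial_factor=4, kq_ratio=0.5):
        super().__init__()
        self.heads = heads
        kq = int(channels * kq_ratio)  # compressed channel dim for K and Q
        self.to_q = nn.Conv2d(channels, kq, 1)
        # Strided convolutions stand in for the recursive spatial compression.
        self.to_k = nn.Conv2d(channels, kq, spatial_factor, stride=spatial_factor)
        self.to_v = nn.Conv2d(channels, channels, spatial_factor, stride=spatial_factor)
        self.proj = nn.Conv2d(channels, channels, 1)
        self.ffn = nn.Sequential(  # feed-forward with expansion factor 2
            nn.Conv2d(channels, channels * 2, 1), nn.GELU(),
            nn.Conv2d(channels * 2, channels, 1))

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.to_q(x).flatten(2)   # (N, C/2, h*w)
        k = self.to_k(x).flatten(2)   # (N, C/2, h*w/16)
        v = self.to_v(x).flatten(2)   # (N, C,   h*w/16)

        def split(t):  # fold the head dimension into the batch dimension
            return t.reshape(n * self.heads, -1, t.shape[-1])

        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q.transpose(1, 2) @ k / k.shape[1] ** 0.5, dim=-1)
        out = (v @ attn.transpose(1, 2)).reshape(n, c, h, w)
        out = x + self.proj(out)
        return out + self.ffn(out)
```

Because attention is computed against a spatially compressed Key/Value pair, the attention matrix has h·w rows but only h·w/16 columns, which is the source of the efficiency gain over standard full self-attention.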
For an upscaling factor of s (s = 2, 4, or 8), the network employs log2(s) progressive upsampling stages, each performing ×2 upscaling. Each stage uses a sub-pixel convolution (pixel-shuffle) layer preceded by a convolutional layer.
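A minimal sketch of one such progressive upsampling pathway follows. It assumes, as is conventional for pixel-shuffle, that each stage's convolution quadruples the channel count before a ×2 shuffle; the helper name is ours:

```python
import torch
import torch.nn as nn

def make_upsampler(scale, channels=32):
    """Progressive upsampling: log2(scale) stages, each a 3x3 convolution
    followed by a x2 sub-pixel (pixel-shuffle) layer."""
    assert scale & (scale - 1) == 0, "scale must be a power of two"
    stages = []
    s = scale
    while s > 1:
        # Conv expands C -> 4C so that PixelShuffle(2) restores C channels
        # while doubling both spatial dimensions.
        stages += [nn.Conv2d(channels, channels * 4, 3, padding=1),
                   nn.PixelShuffle(2)]
        s //= 2
    return nn.Sequential(*stages)
```

For example, `make_upsampler(4)` builds two ×2 stages, and `make_upsampler(8)` builds three.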
The input HSI with B bands is projected to 32 feature channels by the initial 3D convolution. This dimension is maintained as the primary feature channel count through most of the deep feature learning pathway. The final layers reconstruct the output with B channels.
We implemented and trained the proposed RFLSR with PyTorch 2.9.1, training the network with scaling factors of 2, 4, and 8 and fine-tuning the hyperparameters to achieve optimal results. Unless specifically mentioned, all 3D convolution kernels are 3 × 3 × 3 and all 2D convolution kernels are 3 × 3. The reduction ratio for the channel attention module is set to 0.125, following the common practice established in [40]. This value provides an effective compromise between model efficiency and the capacity for modeling spectral-channel interdependencies. Our 3D-layer consists of four 3D convolutions and two ReLU activation functions, a design choice examined in the subsequent ablation studies. For training, we employed the Adam optimizer with default settings for 30 epochs with a learning rate of 0.0001, running all experiments on an NVIDIA RTX 4070 GPU.
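As an illustration of the 0.125 reduction ratio, a squeeze-and-excitation style channel attention block in the spirit of [40] might look as follows; this is a generic sketch, not the exact RFLSR block:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention. A reduction ratio of
    0.125 shrinks the bottleneck to C/8 channels, trading a small amount of
    capacity for efficiency in modeling channel interdependencies."""
    def __init__(self, channels=64, ratio=0.125):
        super().__init__()
        hidden = max(1, int(channels * ratio))  # e.g. 64 -> 8
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),            # global spatial pooling
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid())

    def forward(self, x):
        # Rescale each channel by its learned, input-dependent weight in (0, 1).
        return x * self.gate(x)
```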
We evaluated the proposed method on three datasets: Chikusei [
48], CAVE [
49], and Harvard [
50], with scaling factors of 4 and 8. During training, standard data augmentation including random horizontal/vertical flips and 90-degree rotations was applied to the image patches to enhance generalization. The method was compared with six other approaches, including Bicubic, SFCSR [
51], SSPSR [
27], FRSR [
52], MSD [
41], and F3DUN [
22]. Among these, MSD (MSDformer) [41] represents a recent, state-of-the-art Transformer-based architecture specifically designed for HSI-SR, which has demonstrated superior performance in modeling complex spectral–spatial dependencies; we therefore consider it a strong and highly relevant Transformer-based benchmark for evaluating the efficacy of our proposed efficient attention mechanism. We also tuned the hyperparameters of the compared methods as far as possible to achieve their best performance. To assess spectral and spatial quality, we adopted six commonly used evaluation metrics: Peak Signal-to-Noise Ratio (PSNR), Spectral Angle Mapper (SAM), Structural Similarity Index (SSIM), Correlation Coefficient (CC), Root Mean Square Error (RMSE), and the Relative Dimensionless Global Error in Synthesis (ERGAS). For brevity in table captions, we refer to these collectively as PQIS (Perceptual Quality and Image Similarity metrics). It is worth noting that the evaluation in this work relies on datasets with available ground-truth HR images, following the common benchmark protocol. For potential real-world applications where such ground truth is unavailable, performance assessment would require alternative strategies, such as expert-led qualitative inspection or indirect validation via improved performance in downstream tasks (e.g., classification or detection). The robust cross-dataset generalization demonstrated in our experiments (across Chikusei, CAVE, and Harvard) provides a foundational indication of the model’s applicability in more diverse, practical scenarios.
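Of the six metrics, PSNR and SAM carry most of the discussion below; their standard definitions can be computed as in the following NumPy sketch, assuming reflectance values scaled to [0, 1]:

```python
import numpy as np

def psnr(ref, rec, peak=1.0):
    """Peak Signal-to-Noise Ratio (dB) over the whole hyperspectral cube."""
    mse = np.mean((ref - rec) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def sam(ref, rec, eps=1e-8):
    """Spectral Angle Mapper: mean angle (degrees) between the per-pixel
    spectra of reference and reconstruction; ref, rec have shape (H, W, B)."""
    dot = np.sum(ref * rec, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(rec, axis=-1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return np.degrees(np.mean(np.arccos(cos)))
```

Note that SAM is invariant to per-pixel scaling of the spectrum, which is why it isolates spectral fidelity from brightness errors captured by PSNR and RMSE.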
Statistical Analysis. To ensure the robustness and statistical significance of our performance comparisons, we conducted a rigorous statistical analysis. For each dataset and scaling factor, the performance metrics (PSNR, SSIM, SAM, CC, RMSE, ERGAS) were calculated for each individual test image. The reported performance for each method is expressed as the mean ± standard deviation across all test images, providing a measure of central tendency and variability.
To determine whether the performance difference between our proposed RFLSR method and a baseline method is statistically significant, we focused our formal inference on the primary image quality metric, PSNR. Recognizing that the distribution of such metrics may not satisfy the normality assumption required for parametric tests, we employed the non-parametric Wilcoxon signed-rank test. This test is appropriate for paired comparisons (i.e., comparing two methods on the same set of images) and does not rely on assumptions about the underlying data distribution. A
p-value less than 0.05 was considered to indicate a statistically significant difference. The results of this test are indicated in the performance tables (
Table 1,
Table 2 and
Table 3), where a dagger symbol (†) next to a baseline method’s PSNR value signifies that our RFLSR method is statistically superior to that baseline. The other metrics (SSIM, SAM, etc.) are reported for descriptive completeness.
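The paired test can be reproduced with SciPy; the PSNR values below are purely illustrative placeholders, not results from our tables:

```python
import numpy as np
from scipy.stats import wilcoxon

# Illustrative per-image PSNR values (dB); a real comparison pairs the two
# methods over the same set of test images.
psnr_ours     = np.array([40.1, 39.8, 41.3, 40.4])
psnr_baseline = np.array([39.7, 39.5, 40.8, 40.2])

# Non-parametric paired test; alternative="greater" asks whether our
# method's per-image PSNR is systematically higher than the baseline's.
stat, p_value = wilcoxon(psnr_ours, psnr_baseline, alternative="greater")
significant = p_value < 0.05  # criterion for the dagger (†) in the tables
```

With only four paired samples the one-sided exact p-value cannot fall below 1/16 = 0.0625, so meaningful significance testing requires the full set of test images, as done in our evaluation.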
Note on Baseline Re-implementation and Fair Comparison. The quantitative results presented in
Table 1,
Table 2 and
Table 3 are obtained from our re-implementation of all methods—including the proposed RFLSR and the baseline approaches—under strictly identical experimental conditions. This encompasses: identical training and testing data partitions, the same data augmentation strategies, consistent optimization settings (Adam optimizer, 30 epochs, learning rate 0.0001, batch size 8), and unified evaluation code for all six metrics. While absolute performance values may exhibit slight variations from those reported in the original publications—a common occurrence in reproducibility studies due to differences in training details, hyperparameter tuning, and implementation nuances—the relative performance rankings observed in our experiments align with those established in the literature. Crucially, this controlled protocol ensures that any observed performance differences are attributable to architectural innovations rather than advantages in training strategy or data processing. For reference, we note that the original publications report the following best performances on similar test settings: MSD [
41] achieves ~40.21 dB PSNR on Chikusei at ×4 scale; SSPSR [
27] reports ~40.15 dB; and F3DUN [
22] reports ~40.05 dB. Our re-implementations yield comparable results, validating the fairness of our comparative framework.
3.2. Experimental Results on Chikusei Dataset
The Chikusei dataset consists of hyperspectral images of Ibaraki, Japan, acquired using the Hyperspec-VNIR-CIRIS spectrometer. The ground sampling distance is 2.5 m, and the original scene size is 2517 × 2335 pixels with 128 spectral bands covering a spectral range from 363 nm to 1018 nm. For our experiments, we cropped the central region (size 2304 × 2048 × 128), which contains rich information, for training and validation purposes. Following prior works, we cropped the upper region into four non-overlapping hyperspectral images of size 512 × 512 × 128 for testing. The remaining area was used to extract overlapping patches for training, with 10% of these patches reserved for validation.
To ensure sufficient training samples and avoid boundary artifacts, we employed a stride-based overlapping cropping strategy during training patch extraction. Specifically:
For the scaling factor of ×4, we extracted high-resolution (HR) training patches of size 64 × 64 × 128 with a stride of 32 pixels, resulting in a 50% overlap (i.e., 32 pixels of overlap in both height and width dimensions).
For the scaling factor of ×8, we extracted HR training patches of size 128 × 128 × 128 with a stride of 64 pixels, similarly resulting in a 50% overlap.
For the scaling factor of ×2, we extracted HR training patches of size 32 × 32 × 128 with a stride of 26 pixels (6 pixels of overlap in both height and width dimensions).
This overlapping strategy generates a larger number of training samples from the limited available data while ensuring spatial continuity in the learned features. The HR patches were then bicubically downsampled by the respective scaling factors (×2, ×4 or ×8) to generate the corresponding low-resolution (LR) training pairs.
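The stride-based cropping can be sketched as follows (the function name is ours; the bicubic downsampling to LR counterparts would typically be applied per patch with an image-resizing library):

```python
import numpy as np

def extract_patches(hr_cube, patch=64, stride=32):
    """Stride-based overlapping cropping: slide a patch x patch window over
    the spatial dimensions of an (H, W, B) cube. With stride = patch // 2,
    adjacent patches overlap by 50% in each spatial dimension."""
    h, w, _ = hr_cube.shape
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            patches.append(hr_cube[top:top + patch, left:left + patch, :])
    return np.stack(patches)  # (num_patches, patch, patch, B)
```

For example, a 128 × 128 spatial region with 64 × 64 patches and a 32-pixel stride yields a 3 × 3 grid of nine overlapping patches.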
As shown in
Table 1, we present a comparison between our method and several advanced methods on the Chikusei dataset, where bold indicates the best result and underline the second best. The performance of these methods is evaluated using the average values of six objective quantification metrics for scaling factors of 2, 4, and 8. From the table, it is evident that F3DUN, which employs only 3D convolutions for hyperspectral image super-resolution, may overlook some spatial information and thus performs poorly on the Chikusei dataset. In contrast, SSPSR and MSD, which combine 2D and 3D convolutions, perform well on certain metrics. However, since SFCSR, FRSR, and SSPSR do not use Transformers, they may overlook important long-range dependencies. Our method therefore consistently demonstrates superior performance across all objective metrics.
The statistical validation provided in
Table 1 confirms the consistent superiority of our RFLSR method over all compared baselines and evaluation metrics. The performance improvements are statistically significant even against highly competitive recent SOTA methods, further solidifying the effectiveness of our approach. Crucially, RFLSR’s advantage is most pronounced, and most often statistically significant, in spectral fidelity (SAM) and overall reconstruction error (ERGAS, RMSE), underscoring its strength in delivering balanced, high-quality reconstruction for HSI.
For visual assessment, we generate false-color RGB images by assigning spectral bands 70 (~700 nm, red-edge), 100 (~850 nm, NIR), and 36 (~520 nm, green) to the red, green, and blue channels, respectively. This band combination is effective for highlighting vegetation health and land cover distinctions in remote sensing imagery. In
Figure 5 and
Figure 6, we provide visualizations of the reconstruction results on the test set of the Chikusei dataset (for scaling factors of ×4 and ×8), using the false-color band composition described above. It is evident that our method outperforms the other algorithms in recovering finer details.
Furthermore, to quantitatively substantiate the visual superiority in spectral preservation evident in
Figure 5, we computed the average PSNR for the three specific bands used in the visualization (bands 36, 70, and 100, corresponding to approximately 520 nm, 700 nm, and 850 nm, respectively). Our RFLSR method achieves PSNR of 41.2 dB, 40.8 dB, and 40.5 dB for these bands, compared to SSPSR’s 40.9 dB, 40.6 dB, 40.2 dB and MSD’s 40.8 dB, 40.5 dB, 40.1 dB. This confirms that the improved visual fidelity corresponds to measurable quantitative gains of 0.2–0.4 dB in these representative spectral bands, with particular strength in the critical red-edge region around 700 nm (band 70) where vegetation discrimination occurs.
3.3. Experimental Results on Harvard Dataset
The Harvard dataset consists of 77 hyperspectral images captured under daytime lighting conditions, both indoors and outdoors, using the Nuance FX camera by CRI Inc. Each image covers a wavelength range of 400 nm to 700 nm, evenly divided into 31 spectral bands, with a spatial resolution of 1040 × 1392. All images are stored in .mat file format. Our training samples are randomly cropped spatially, with 5 outdoor and 3 indoor images selected for the test set, and the remaining images used for training. When the scaling factor is 4, we cropped the images into 64 × 64 × 31 patches (with 32 pixels of overlap), and when the scaling factor is 8, we extracted 128 × 128 × 31 patches (with 64 pixels of overlap). These high spatial resolution hyperspectral images were bicubically downsampled to generate corresponding low-resolution hyperspectral images.
As shown in
Table 2, we present a comparison of our method with several advanced methods on the Harvard dataset. The performance of these methods is evaluated using the average values of six objective quantification metrics for scaling factors of 4 and 8. From the table, it is evident that our method achieves excellent reconstruction performance on the Harvard dataset. SSPSR, likely owing to its use of channel attention, also achieves similarly high performance. F3DUN, on the other hand, still struggles on certain datasets, performing poorly on the CC metric here, likely due to the loss of some spatial information.
The statistical analysis for the Harvard dataset (
Table 2) reveals a highly competitive landscape. Our RFLSR method achieves the best mean performance across all metrics, with notably lower standard deviations, indicating superior result stability. The improvements over all baseline methods are statistically significant, with RFLSR consistently exhibiting an advantage in spectral preservation, as evidenced by lower SAM values across both scales. This underscores that our method’s primary contribution lies in enhancing spectral fidelity while maintaining state-of-the-art pixel-level accuracy.
To complement the visual assessment in
Figure 7, we quantified the reconstruction accuracy for the specific bands (8, 15, 23) composing the RGB visualization. RFLSR achieves PSNR of 43.5 dB, 43.1 dB, and 42.8 dB for these bands, outperforming SSPSR (43.2 dB, 42.9 dB, 42.5 dB) and MSD (43.1 dB, 42.8 dB, 42.4 dB) by margins of 0.2–0.4 dB. This band-wise analysis reinforces that our method’s advantage extends beyond composite metrics to individual spectral components.
The visualized results are presented as RGB images composed of bands 8 (~470 nm, blue), 15 (~550 nm, green), and 23 (~630 nm, red). This selection approximates natural color perception, facilitating intuitive comparison of spatial details in indoor and outdoor scenes. In
Figure 7,
Figure 8,
Figure 9 and
Figure 10, we provide visualizations of the reconstruction results on the test set of the Harvard dataset, together with the corresponding error maps (for scaling factors of ×4 and ×8), using the RGB band composition described above. It is evident that our method outperforms the other algorithms in detail recovery; notably, the error maps show that the error of our method is smaller than that of the competing methods.
3.4. Experimental Results on CAVE Dataset
The CAVE dataset, distinct from the two remote sensing hyperspectral image datasets above, is widely used in hyperspectral image super-resolution tasks for natural scenes. It consists of 32 everyday images with a spatial resolution of 512 × 512, covering 31 spectral bands in the range of 400 nm to 700 nm at a 10 nm interval. We selected 20 images for training (with 10% of the extracted patches reserved for validation), and the remaining images were used for testing. When the scaling factor is 4, the images were cropped into 64 × 64 × 31 patches (with 32 pixels of overlap), and when the scaling factor is 8, we extracted 128 × 128 × 31 patches (with 64 pixels of overlap). These high-spatial-resolution hyperspectral images were bicubically downsampled to generate the corresponding low-resolution hyperspectral images.
The results on the CAVE dataset (
Table 3) further validate the robustness of RFLSR. At the ×4 scale, our method secures the best mean values on all six metrics, with statistically significant improvements over all baseline methods. The key distinction of RFLSR is consistently manifested in its spectral preservation capability, achieving the lowest (best) SAM score and leading in comprehensive error metrics such as ERGAS. This pattern persists at the ×8 scale, where RFLSR delivers the best SAM and RMSE, alongside highly competitive PSNR and SSIM. The statistical tests confirm that RFLSR’s advantages in spectral fidelity are significant, highlighting its superior spectral–spatial reconstruction capability.
The superior detail recovery visible in
Figure 11 (band 8, ~470 nm) is quantitatively supported by a per-band PSNR of 41.5 dB for RFLSR, compared to 41.2 dB for MSD and 41.3 dB for F3DUN. This 0.2–0.3 dB advantage in this individual band aligns with the overall performance trend in
Table 3 and demonstrates consistent spectral–spatial reconstruction capability.
We visualize the reconstruction quality by displaying the 7th spectral band (approximately 460 nm, within the blue visible range) of the CAVE dataset, as it provides clear contrast for the textures and objects in these indoor scenes. In
Figure 11 and
Figure 12, we provide visualizations of the reconstruction results on the test set of the CAVE dataset using this band. It is evident that our method outperforms the other algorithms in detail recovery.
3.5. Ablation Study
The RFLSR method proposed in this paper primarily consists of four main components: 2D convolutional layers, 3D convolutional layers, Transformer layers, and a progressive upsampling strategy. To validate the effectiveness of these components, we modified our model and compared the objective differences in the results. We used the training images from the Chikusei dataset as the training set and performed training with a scaling factor of 4.
To determine the optimal depth of the 3D convolutional module, we conducted an ablation study by varying the number of 3D-layer blocks (denoted as
N). As summarized in
Table 4, increasing
N from 1 to 2 yields a substantial performance gain (+0.223 dB in PSNR and a significant reduction in SAM and ERGAS), indicating the importance of sufficient spectral–spatial feature extraction in the shallow stage. However, further increasing
N to 3 provides only marginal improvements (+0.003 dB in PSNR) at the cost of a ~10% increase in model parameters. Therefore,
N = 2 represents an optimal trade-off between model capacity, computational complexity, and reconstruction accuracy, and is adopted in our final architecture.
The comprehensive ablation results, presented in
Table 5, reveal the distinct contribution and the performance-efficiency trade-off of each proposed component.
Efficient Recursive Self-Attention (ERSA): The removal of the ERSA module (Our-w/o TR) leads to a significant reduction in computational cost (FLOPs decrease by ~24%, parameters by ~16%, and inference time by ~26%). However, this comes at the expense of degraded spectral fidelity, as evidenced by an increase in SAM (from 2.3103 to 2.3205) and ERGAS. This contrast strongly validates our core design principle: the proposed linear-complexity ERSA module successfully captures crucial long-range spectral dependencies that are difficult to model with convolutions alone, and it does so with a manageable computational overhead compared to standard quadratic-complexity transformers.
Progressive Upsampling (PU): Ablating the progressive upsampling strategy (Our-w/o PU) results in the most severe deterioration in overall reconstruction quality, with the largest drop in PSNR (−0.61 dB) and a substantial increase in SAM (+0.45). While offering a minor efficiency gain, this variant confirms that the PU strategy is essential for achieving high accuracy. By decomposing the large-scale upscaling task, it facilitates stable multi-scale feature learning and provides clearer gradient flow, which is critical for optimization, especially under large scaling factors.
Grouped 2D Convolutions and 3D Convolutions: Replacing the grouped 2D convolutions with a single branch (Our-w/o 2D) or substituting the 3D module with 2D convolutions (Our-w/o 3D) both lead to consistent declines across all performance metrics (e.g., SAM increases by 0.168 and 0.153, respectively), while yielding only marginal improvements in efficiency. This demonstrates that these components provide fundamental and complementary feature extraction: the 2D-layer captures diverse spatial patterns through grouping, and the 3D-layer establishes initial spectral–spatial correlations, which are difficult to compensate for with simple architectural adjustments.
In summary, the ablation study confirms that each component in RFLSR plays a vital role. The ERSA module is the key to efficient global spectral modeling, the PU strategy is crucial for high-quality multi-scale reconstruction, and the hybrid 2D/3D convolutional foundation is indispensable for robust local feature extraction. The design achieves an effective balance, where strategic increases in complexity (e.g., from ERSA) are justified by significant gains in spectral and spatial accuracy.
In addition to the component-wise ablation, we conduct a comparative analysis of model complexity and inference efficiency against the state-of-the-art methods, addressing a critical practical aspect. The comparison is performed under a consistent setting on the Chikusei dataset with a scale factor of ×4. We report three key metrics: the number of parameters (indicative of model size and memory footprint), floating-point operations (FLOPs, indicative of computational cost), and average inference time per image on fixed hardware. The results are summarized in
Table 6.
As shown in
Table 6, our RFLSR model, with 10.02M parameters, is more compact than SSPSR (12.89M) and MSD (15.69M), though larger than the lightweight FRSR (1.59M). In terms of computational cost, RFLSR requires 4.70G FLOPs, comparable to MSD (4.66G) and notably lower than F3DUN (4.93G). This demonstrates that the proposed Efficient Recursive Self-Attention (ERSA) module provides global spectral interaction at linear complexity, avoiding both the quadratic cost of standard Transformers and the high cost of extensive 3D convolutions. Consequently, the inference time of RFLSR (0.689 s) is practical, positioned between the faster lightweight models and the computationally heavier ones. This analysis confirms that RFLSR achieves its superior reconstruction performance (as evidenced in
Table 1,
Table 2 and
Table 3) without incurring prohibitive computational overhead, establishing an effective trade-off between accuracy and efficiency that is desirable for practical HSI-SR applications.
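The parameter counts and per-image timings reported in Table 6 can be reproduced with straightforward helpers such as the following (a generic sketch; FLOPs are usually obtained with a profiler library such as fvcore or thop and are omitted here, and the toy model stands in for the full networks):

```python
import time
import torch
import torch.nn as nn

def count_parameters(model):
    """Number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

def average_inference_time(model, sample, runs=10):
    """Mean wall-clock forward-pass time per image (seconds), after a
    warm-up pass; on GPU, torch.cuda.synchronize() should bracket the loop."""
    model.eval()
    with torch.no_grad():
        model(sample)  # warm-up (allocations, lazy initialization)
        start = time.perf_counter()
        for _ in range(runs):
            model(sample)
    return (time.perf_counter() - start) / runs

# Stand-in model for illustration; the real comparison runs the full RFLSR
# and baseline networks on identical inputs and hardware.
toy = nn.Conv2d(31, 31, 3, padding=1)
```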