1. Introduction
Hyperspectral imaging is a frontier technology in optical sensing. By integrating imaging and spectroscopy, it performs dense narrowband sampling of spectral signatures to simultaneously capture 2D spatial information and high-dimensional spectral information of a target. Compared with conventional three-band RGB images, hyperspectral images (HSIs) contain substantially richer material and fine-grained details, and have been widely used in many critical fields, such as remote sensing [
1,
2,
3], medical image processing [
4,
5], food safety [
6,
7], environmental monitoring [
8,
9], and geological exploration [
10,
11]. HSI is also increasingly leveraged in downstream perception tasks, e.g., hyperspectral video tracking [
12,
13,
14,
15] and hyperspectral anomaly detection [
16,
17,
18], as well as adverse-weather sensing with near-infrared multi-/tri-spectral imaging [
19,
20,
21]. However, traditional hyperspectral imaging systems often rely on bulky spectrometers and perform scanning along the spatial and/or spectral dimension, which makes acquisition time-consuming and the hardware large in size. These limitations severely hinder their adoption in dynamic scenes and in portable or low-cost applications. To address these challenges, researchers have developed various lightweight hyperspectral acquisition solutions, including on-chip spectral imaging [
22], spectral encoders based on nanophotonic structures or metasurfaces [
23], and snapshot compressive imaging (SCI) systems [
24,
25]. Despite progress in hardware miniaturization, many of these techniques remain at the laboratory-prototype stage. They are often constrained by fabrication processes, optical efficiency, or system stability, and thus are difficult to deploy broadly in real-world scenarios. Moreover, even relatively low-cost SCI systems typically cost on the order of tens of thousands to one hundred thousand dollars, which further limits their adoption in consumer-grade and cost-sensitive applications.
To alleviate the above bottlenecks, spectral reconstruction (SR) has emerged as a promising alternative. SR aims to recover high-dimensional, continuous, and physically consistent spectral information from low-dimensional observations (e.g., RGB or multispectral images). Since the mapping from three bands to hundreds of bands is a severely under-determined inverse problem [
26], the SR task is intrinsically ill-posed. This ill-posedness is shared by a broad class of underdetermined inverse problems in signal processing (e.g., localization/suppression in underdetermined SAR systems [
27]), which highlights the necessity of incorporating effective priors or learned constraints. Accordingly, existing SR research has broadly evolved into two major paradigms [
28]. Model-based optimization methods [
26,
29,
30,
31] explicitly formulate an imaging degradation model and incorporate priors such as sparsity and low-rankness as regularization terms, then solve the problem via iterative optimization. These methods are interpretable, but they are sensitive to hand-crafted priors, and they often struggle to capture non-local and higher-order spatial–spectral dependencies in complex scenes. In contrast, data-driven deep learning approaches [
32,
33,
34,
35,
36,
37] have achieved substantial performance gains on public benchmarks due to their strong representation capacity. However, most state-of-the-art reconstruction algorithms still follow a supervised learning paradigm and heavily rely on large-scale, pixel-wise, precisely registered spectral annotations. Because hyperspectral data acquisition requires expensive hardware, complicated procedures, and strict calibration, annotation is highly costly, which imposes clear scalability limitations on supervised methods in real-world applications.
Semi-supervised/unsupervised SR methods [
38,
39,
40,
41] can substantially reduce the reliance on densely annotated spectral labels; however, in complex scenes, the stability of proxy supervision signals such as consistency constraints or pseudo-labels still needs to be improved. Moreover, the lack of explicit physical constraints may also compromise the physical plausibility of the reconstructed spectra. Therefore, it is necessary to explore reconstruction frameworks that achieve both reliable supervision and physical consistency under limited annotations.
We propose a semi-supervised learning framework guided by spectrally aware mini-patches (SA-MP), as illustrated in
Figure 1, which delivers reliable reconstruction performance under limited spectral annotations. The proposed method introduces patch-wise averaged spectra as region-level statistical supervision on only a small number of local patches. This design reformulates the conventional pixel-wise spectral constraint into a local, region-based statistical constraint, enabling hyperspectral reconstruction without dense pixel-level spectral labels over the entire image and substantially reducing the dependence on high-density spectral annotations. Given the inherently under-determined nature of hyperspectral reconstruction, we explicitly integrate a Tikhonov-based physical prior into the network optimization process to balance physical plausibility and data-driven representation. In particular, the physical layer is formulated in an optimizable manner and its regularization is adaptively updated during end-to-end training, which improves reconstruction accuracy while preserving the physical prior. In addition, we develop a deep reconstruction network that fuses spectral and spatial information. A hybrid attention mechanism jointly models inter-spectral correlations and spatial structural details, strengthens feature representation, and leads to higher-quality reconstruction results.
The main contributions of this paper are summarized as follows:
A deployment-friendly new paradigm: We introduce SSHSR, a semi-supervised hyperspectral reconstruction framework that reduces reliance on high-precision spectral annotations while maintaining a lightweight design. With just 1.59 M parameters, fewer than the 1.62 M of MST++ [42], it is well suited to practical deployment. SSHSR runs efficiently on general-purpose hardware, such as portable laptops, without requiring dedicated workstations.
Low data requirements with SA-MP guidance: The core of our SSHSR is the SA-MP guidance module, which extracts patch-level averaged spectra from local regions to provide supervision for the reconstruction model. This design allows the network to stably learn the RGB–HSI mapping even when spectral annotations are scarce.
Performance gains via physics–data synergistic fusion: We integrate an optimizable Tikhonov prior for adaptive regularization and enforce spectral–spatial attention with a frequency-domain consistency loss, improving physical fidelity and reconstruction accuracy. On the GDFC remote-sensing dataset [
43], our method achieves a 6.8% improvement in PSNR and a 22.1% reduction in SAM.
The remainder of this paper is organized as follows.
Section 2 reviews related work on RGB-to-HSI reconstruction and semi-supervised learning.
Section 3 introduces the proposed SSHSR framework, including the learnable Tikhonov prior, the SA-MP supervision mechanism, as well as the network architecture and loss functions.
Section 4 presents the experimental settings and results on multiple benchmarks, together with ablation studies and further analyses.
Section 5 reports experimental results and analysis on the real-world collected dataset.
Section 6 discusses the challenges of the method and future work. Finally,
Section 7 concludes the paper.
3. Materials and Methods
This section presents the proposed SSHSR. We first formulate the problem model. We then describe the initial spectral estimation based on an optimizable Tikhonov regularization. Next, the reconstruction network architecture that integrates spectral and spatial information is introduced. After that, we present the semi-supervised setting under limited spectral annotations and the construction of SA-MP. Finally, we detail the corresponding composite loss function. The complete training and inference procedure is outlined in Algorithm 1.
Algorithm 1 Training and inference of SSHSR.
Require: RGB observation, SRF matrix, MaxEpoch
Ensure: reconstructed HSI
1: Parameters: learnable Tikhonov matrix and coefficient, network weights, loss weights
2: for epoch = 1 to MaxEpoch do
3:  Compute the initial spectral estimate with Equation (4)
4:  Run the reconstruction network to obtain the multi-scale outputs
5:  // SA-MP from GT, then aligned sampling on SR
6:  Compute the GT patch mean spectra with Equations (10) and (11)
7:  Compute the reconstructed patch mean spectra with Equations (10) and (11)
8:  Compute the mean-spectrum loss with Equation (12)
9:  Compute the frequency-consistency loss with Equation (13)
10:  Compute the semi-supervised loss with Equation (14)
11:  Compute the degradation-consistency loss with Equation (15)
12:  Aggregate the losses across scales with Equations (16) and (17)
13:  Compute the total loss with Equation (18)
14:  Update all learnable parameters by Adam
15: end for
16: Inference: compute the Tikhonov initial estimate, run the network, and return the reconstructed HSI
3.1. Problem Formulation
In the imaging model, an RGB image can be regarded as the result of a weighted spectral integration of an HSI along the spectral dimension under the camera spectral response function (SRF). Specifically, the pixel value of an RGB observation $Y$ at channel $c$ can be modeled as an integration process over the spectral dimension:
$$Y_c(i, j) = \int_{\Lambda} S_c(\lambda)\, H(i, j, \lambda)\, \mathrm{d}\lambda,$$
where $S_c(\lambda)$ denotes the spectral response function of channel $c$ and the spectral range $\Lambda$ is typically set to 400–700 nm. For a more compact representation, we rewrite the above process in matrix form:
$$Y = \Phi H.$$
Let $Y \in \mathbb{R}^{3 \times hw}$ denote the RGB vectors, $H \in \mathbb{R}^{L \times hw}$ denote the corresponding hyperspectral vectors, and $\Phi \in \mathbb{R}^{3 \times L}$ be the spectral response matrix. Here, $h$ and $w$ denote the image height and width, respectively, and $L$ denotes the number of spectral bands.
3.2. Initial Spectral Estimation Based on Tikhonov Regularization
Based on the imaging model described above, we construct a physics-inspired layer using Tikhonov regularization. This layer performs a linear inverse mapping from the input RGB image to produce an initial hyperspectral estimate, and it serves as the first module of our network. Specifically, we define the initial spectrum $H_0$ as the solution to the following Tikhonov-type optimization problem:
$$H_0 = \arg\min_{H} \; \|\Phi H - Y\|_2^2 + \lambda \|\Gamma H\|_2^2.$$
The first term is a data-fidelity term, which constrains the error between the reconstructed HSI and the observation in the RGB space so that it adheres to the imaging model described above. The second term is a regularization term that introduces a spectral smoothness prior. Here, $\Gamma$ is the Tikhonov matrix (typically implemented as a second-order finite-difference Laplacian operator), and $\lambda$ is the regularization coefficient, which is generally set to 0.01.
Since Equation (3) is a convex quadratic function of $H$ and all terms are continuously differentiable, an analytical solution can be obtained by taking the derivative with respect to $H$ and setting it to zero:
$$H_0 = \left(\Phi^{\top} \Phi + \lambda\, \Gamma^{\top} \Gamma\right)^{-1} \Phi^{\top} Y.$$
The resulting closed-form solution is treated as a physics-constrained initial spectral estimate and is fed into the subsequent deep network for refinement. This design provides a more reasonable initialization for the network and, at the architectural level, explicitly embeds a spectral continuity prior that carries through the entire reconstruction process. Furthermore, we extend this fixed formulation to an optimizable version by treating $\Gamma$ and $\lambda$ as learnable parameters and updating them adaptively during training, which allows the model to better match the spectral statistics of different datasets.
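For illustration, the closed-form Tikhonov estimate can be sketched in NumPy as below; this is a per-pixel sketch under synthetic placeholder SRF values, not the paper's trained layer, and the function names are our own.

```python
import numpy as np

def second_order_laplacian(n_bands):
    """Second-order finite-difference operator along the spectral axis."""
    G = np.zeros((n_bands - 2, n_bands))
    for i in range(n_bands - 2):
        G[i, i], G[i, i + 1], G[i, i + 2] = 1.0, -2.0, 1.0
    return G

def tikhonov_initial_estimate(rgb, srf, lam=0.01):
    """Closed-form estimate (S^T S + lam G^T G)^{-1} S^T y, applied per pixel.

    rgb: (h, w, 3) observation; srf: (3, L) spectral response matrix.
    Returns an (h, w, L) physics-constrained initial HSI.
    """
    h, w, _ = rgb.shape
    L = srf.shape[1]
    G = second_order_laplacian(L)
    A = srf.T @ srf + lam * (G.T @ G)      # (L, L); the prior makes A invertible
    Y = rgb.reshape(-1, 3).T               # (3, h*w)
    H0 = np.linalg.solve(A, srf.T @ Y)     # (L, h*w)
    return H0.T.reshape(h, w, L)
```

With a small regularization coefficient, re-projecting the estimate through the SRF reproduces the RGB observation almost exactly, while the Laplacian term keeps the recovered spectra smooth.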
3.3. Overall Network Architecture
We construct a multi-scale hybrid attention network based on U-Net (
Figure 2) to jointly model the complex mapping relationship between spectral and spatial information, and to achieve multi-scale feature fusion in the encoder–decoder structure. The network is built upon three key designs. First, the network embeds hybrid attention modules in the encoder/decoder to model the dependencies between local spatial textures and spectral information through the combination of spectral–spatial attention. Then, we adopt the MIMO (multi-input, multi-output) paradigm [
65,
66] together with PixelShuffle [
67] for information-preserving rearrangement, which promotes progressive reconstruction of multi-scale features from coarse to fine. Following this, we introduce a physics-guided global residual learning mechanism; the initial smooth estimate produced by the optimizable Tikhonov regularization is used as the network input, and image-level skip connections add this estimate to the network predictions at each scale. With these designs, the reconstruction task is explicitly reformulated as learning the residual between the “physics-based initial estimate” and the target spectra. This encourages the network to focus on recovering high-frequency details suppressed by regularization, reduces the optimization difficulty to some extent, and builds synergy between physics-based constraints and data-driven learning, thereby improving the stability and interpretability of the reconstruction process.
Hybrid Attention Module (HAM)
The hybrid attention module includes spectral attention and spatial attention blocks. The structure of the spectral attention block is shown in
Figure 3a. Its main function is similar to that of channel attention [
68]; it explicitly models inter-band correlations to adaptively recalibrate the responses of spectral channels. Let the input feature be $F \in \mathbb{R}^{C \times H \times W}$. First, the global average pooling (GAP) branch computes the mean response of each spectral channel over the spatial dimensions, while the global max pooling (GMP) branch extracts the corresponding maximum response, resulting in two spectral descriptor vectors.
Next, the two descriptor vectors are separately fed into two
convolution layers (with non-shared weights) followed by nonlinear activations, producing two spectral weight vectors.
The two weight vectors are then multiplied with the input feature
F in a spectral-wise manner, yielding two weighted feature maps.
Finally, the two weighted features are concatenated along the spectral dimension and fused through a convolution to obtain the output of the spectral attention block. Here, ReLU and Sigmoid serve as the nonlinear activations, $\odot$ denotes the spectral-wise multiplication, concatenation is performed along the spectral dimension, and the fusion convolution carries the learnable weights.
Spatial attention [
69] aims to generate a spatial saliency map that guides the network to focus on key regions with complex textures or rich edges (
Figure 3b). First, we apply GAP and GMP to
$F$ along the channel dimension, producing two single-channel feature maps. We then concatenate them and feed the result into a convolution followed by a Sigmoid activation to obtain a spatial attention map. Finally, this attention map is multiplied with the input feature in an element-wise manner, yielding the output of the spatial attention module.
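A minimal PyTorch sketch of the two blocks described above; the two-layer 1×1-conv branch layout and the 7×7 spatial kernel are assumptions (the exact kernel sizes appear only in Figure 3), so this is an illustrative sketch rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """GAP/GMP descriptors -> non-shared 1x1 convs -> spectral reweighting."""
    def __init__(self, channels):
        super().__init__()
        self.fc_avg = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.ReLU(),
                                    nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fc_max = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.ReLU(),
                                    nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # fuse the two branches

    def forward(self, f):
        w_avg = self.fc_avg(f.mean(dim=(2, 3), keepdim=True))  # GAP descriptor
        w_max = self.fc_max(f.amax(dim=(2, 3), keepdim=True))  # GMP descriptor
        return self.fuse(torch.cat([f * w_avg, f * w_max], dim=1))

class SpatialAttention(nn.Module):
    """Channel-wise GAP/GMP maps -> conv -> sigmoid saliency map."""
    def __init__(self, kernel_size=7):  # kernel size is an assumption
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f):
        m = torch.cat([f.mean(dim=1, keepdim=True),
                       f.amax(dim=1, keepdim=True)], dim=1)
        return f * torch.sigmoid(self.conv(m))
```

Both blocks preserve the feature shape, so they can be dropped into the encoder/decoder stages without changing the surrounding tensor dimensions.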
3.4. SSHSR Scheme
Based on SA-MP, we first construct spectral supervision on a small number of local patches and then extend this mechanism to the multi-scale outputs of the network, as shown in
Figure 2. Specifically, to avoid spatial bias caused by sparse annotations, we perform regular grid sampling on the ground-truth (GT) HSI
$H$ at the highest resolution. We partition the image into a $3 \times 3$ grid and crop a small patch centered in each grid cell as a supervised region, denoted as $P_k$. The spectral vectors of all pixels within the $k$-th patch are averaged to obtain its local mean spectrum
$$\bar{h}_k = \frac{1}{|P_k|} \sum_{p \in P_k} H(p),$$
with $k = 1, \dots, K$ and $K = 9$. To extend the supervised patches to a multi-scale setting, we progressively resize these nine supervised patches extracted at the highest resolution along the same downsampling pathway as the network. This produces the corresponding supervised regions $P_k^{(s)}$ at each scale $s$. Let $D_s$ denote the downsampling operator at scale $s$. The hyperspectral patch at scale $s$ for the $k$-th region is given by
$$P_k^{(s)} = D_s(P_k),$$
where $P_k$ denotes the patch cropped from the original high-resolution image and $P_k^{(s)}$ denotes the pixel set of the downsampled patch. The mean spectrum of the $k$-th patch at scale $s$ is defined as
$$\bar{h}_k^{(s)} = \frac{1}{|P_k^{(s)}|} \sum_{p \in P_k^{(s)}} H^{(s)}(p).$$
By first selecting nine patches in a $3 \times 3$ grid at the highest resolution and then scaling each patch along the network’s downsampling pathway to compute the mean spectrum at every scale, we obtain a set of cross-scale local mean spectra for each SA-MP. These spectra are then used to define the multi-scale mean-spectrum loss.
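As a concrete sketch, the grid-center mini-patch sampling and mean-spectrum computation might look as follows in NumPy; `samp_mean_spectra` is a hypothetical helper name, and the default `patch` size is an assumption, not a value taken from the paper.

```python
import numpy as np

def samp_mean_spectra(hsi, grid=3, patch=8):
    """Mean spectrum of a small patch centred in each cell of a grid x grid layout.

    hsi: (h, w, L) hyperspectral cube. grid=3 yields the nine supervised
    mini-patches; the patch size is an assumption. Returns (grid*grid, L).
    """
    h, w, L = hsi.shape
    spectra = []
    for gy in range(grid):
        for gx in range(grid):
            cy = int((gy + 0.5) * h / grid)        # grid-cell centre row
            cx = int((gx + 0.5) * w / grid)        # grid-cell centre column
            y0 = max(cy - patch // 2, 0)
            x0 = max(cx - patch // 2, 0)
            p = hsi[y0:y0 + patch, x0:x0 + patch]  # (patch, patch, L) region
            spectra.append(p.reshape(-1, L).mean(axis=0))
    return np.stack(spectra)
```

Applying the same routine to the GT cube and to the network output at each scale (after the corresponding downsampling) yields the aligned mean-spectrum pairs used for supervision.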
3.5. Composite Loss Function
Under the setting of limited spectral annotations, we design a composite loss function composed of multiple constraints. First, we define a semi-supervised loss $\mathcal{L}_{\mathrm{semi}}$ only on the few regions with spectral annotations. It consists of two terms: a mean-spectrum loss and a frequency-domain consistency loss. Specifically, following the SA-MP construction in
Section 3.4, we compute the patch-wise mean spectra at each scale for both the ground truth and the reconstructed HSI, denoted as $\bar{h}_k^{(s)}$ and $\hat{\bar{h}}_k^{(s)}$, respectively. At scale $s$, the mean-spectrum loss $\mathcal{L}_{\mathrm{ms}}^{(s)}$ is defined as the average distance between the mean spectra over the $K$ patches. Correspondingly, after applying a 1D FFT to the mean spectrum of each patch, we compute the average difference between the GT and reconstructed frequency-domain representations over the $K$ patches, which yields the frequency-consistency loss $\mathcal{L}_{\mathrm{freq}}^{(s)}$ at scale $s$.
Finally, we obtain the overall semi-supervised loss $\mathcal{L}_{\mathrm{semi}}$ by a weighted summation of $\mathcal{L}_{\mathrm{ms}}^{(s)}$ and $\mathcal{L}_{\mathrm{freq}}^{(s)}$ across all scales:
$$\mathcal{L}_{\mathrm{semi}} = \sum_{s} \left( \alpha\, \mathcal{L}_{\mathrm{ms}}^{(s)} + \beta\, \mathcal{L}_{\mathrm{freq}}^{(s)} \right),$$
in which $\alpha$ and $\beta$ are the weighting factors for the two terms.
Degradation-consistency loss: Since spectral labels are available only at a few locations, while RGB observations are known over the entire image, we explicitly model the HSI-to-RGB imaging degradation using the camera SRF, as shown in
Figure 1. For the reconstructed HSI at each scale, $\hat{H}^{(s)}$, we project it to the RGB space through the degradation model to obtain a synthesized RGB image $\hat{Y}^{(s)} = \Phi \hat{H}^{(s)}$. We then compare $\hat{Y}^{(s)}$ with the input RGB at the same scale (obtained by applying the same downsampling operator to the original RGB) in a pixel-wise manner, and define the degradation-consistency loss at scale $s$ as $\mathcal{L}_{\mathrm{deg}}^{(s)}$. The total degradation-consistency loss is the weighted sum across scales:
$$\mathcal{L}_{\mathrm{deg}} = \sum_{s} w_s\, \mathcal{L}_{\mathrm{deg}}^{(s)}.$$
By combining the two sets of constraints above, the final overall training objective is written as
$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{semi}} + \gamma\, \mathcal{L}_{\mathrm{deg}}.$$
In our experiments, the three balancing factors $\alpha$, $\beta$, and $\gamma$ are set to 0.1, 0.1, and 100, respectively, to weight the spectral statistical terms and the physics-based degradation consistency term.
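The per-scale degradation-consistency term can be sketched as follows in PyTorch; the L1 penalty is an assumption, since the text specifies only a pixel-wise comparison, and the function name is our own.

```python
import torch
import torch.nn.functional as F

def degradation_consistency_loss(hsi_pred, rgb, srf):
    """Project the reconstructed HSI back to RGB via the SRF and compare.

    hsi_pred: (B, L, H, W) reconstruction, rgb: (B, 3, H, W) observation,
    srf: (3, L) spectral response matrix. Pixel-wise L1 penalty assumed.
    """
    # Spectral integration: contract the band axis against the SRF rows.
    rgb_syn = torch.einsum('cl,blhw->bchw', srf, hsi_pred)
    return F.l1_loss(rgb_syn, rgb)
```

Because the RGB observation is known everywhere, this term supervises every pixel of the reconstruction, complementing the sparse SA-MP spectral supervision.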
4. Experiments and Results
4.1. Dataset
We conduct independent experiments on three public hyperspectral benchmark datasets and further validate the proposed SSHSR on a self-collected real-world scene dataset, in order to assess its reconstruction performance under different imaging scenarios and spectral distributions. The public benchmarks cover natural scenes (ARAD-1K [
70] and CAVE [
71]), as well as complex remote-sensing earth-observation scenarios (IEEE GRSS DFC 2018 [
43,
72] (GDFC)). The real-world dataset is captured using a commercial Specim IQ hyperspectral camera (Specim, Oulu, Finland) and includes typical indoor targets such as office supplies, potted plants, cups, and a standard color chart. In total, 31 hyperspectral images are collected, with a spectral range of 400–700 nm. All RGB observations used in our experiments are synthesized via the spectral integration projection model described in
Section 3.1, using the SRFs of standard cameras (e.g., Canon 60D and Basler ace 2 [
70]), which ensures that the constructed RGB–HSI pairs are consistent with the physical imaging mechanism. Detailed dataset specifications, including spatial resolution, spectral band range, and train/val splits, etc., are summarized in
Table 1.
Dataset-specific preprocessing is applied to ensure physical validity and to reduce the influence of abnormal radiometric values. For ARAD-1K, we perform data cleaning and remove abnormal samples containing all-zero values or invalid pixels. For CAVE, the original images contain invalid black borders and boundary artifacts; we therefore crop the images to remove invalid edge regions and retain only the effective field of view. For GDFC, RGB observations are generated using the SRF of the Basler ace 2 camera. The mismatch between the camera response range (360–750 nm) and the HSI spectral coverage (380–1050 nm) is mitigated by linearly interpolating and extending the SRF, which establishes a more physically consistent RGB–HSI mapping over the full spectral range.
4.2. Evaluation Metrics
A rigorous quantitative evaluation and fair comparison are conducted using four widely adopted metrics: mean relative absolute error (MRAE), root mean squared error (RMSE), and peak signal-to-noise ratio (PSNR) to measure pixel-wise numerical reconstruction fidelity, and spectral angle mapper (SAM) to assess the geometric similarity of spectral signatures. A higher PSNR indicates better performance, whereas lower values are preferred for the other metrics:
$$\mathrm{MRAE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| y_i - \hat{y}_i \right|}{y_i}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2},$$
$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}, \qquad \mathrm{SAM} = \frac{1}{M} \sum_{j=1}^{M} \arccos \frac{\langle s_j, \hat{s}_j \rangle}{\| s_j \|_2\, \| \hat{s}_j \|_2}.$$
In the above, $y_i$ and $\hat{y}_i$ denote the $i$-th pixel values of the GT and reconstructed HSI, respectively, $N$ is the total number of pixels, and $\mathrm{MAX}$ is the peak signal value. $s_j$ and $\hat{s}_j$ denote the 1D spectral signatures of the $j$-th hyperspectral pixel in the GT and reconstructed images, respectively, and $M$ is the number of pixels in an image slice.
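For reference, the four metrics can be computed as in the following sketch; the `evaluate` helper name and the small `eps` guard against division by zero are our additions.

```python
import numpy as np

def evaluate(gt, rec, data_max=1.0, eps=1e-8):
    """Standard spectral-reconstruction metrics; gt/rec are (H, W, L) arrays."""
    diff = gt - rec
    mrae = np.mean(np.abs(diff) / (np.abs(gt) + eps))
    mse = np.mean(diff ** 2)
    rmse = np.sqrt(mse)
    psnr = 10 * np.log10(data_max ** 2 / (mse + eps))
    # SAM: mean spectral angle (radians) over all spatial positions.
    g = gt.reshape(-1, gt.shape[-1])
    r = rec.reshape(-1, rec.shape[-1])
    cos = np.sum(g * r, axis=1) / (
        np.linalg.norm(g, axis=1) * np.linalg.norm(r, axis=1) + eps)
    sam = np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))
    return mrae, rmse, psnr, sam
```

Note that SAM depends only on the direction of each spectral vector, so a uniformly scaled reconstruction can score a near-zero SAM while still showing a large MRAE.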
4.3. Implementation Details
During training, we randomly crop 128 × 128 patches from the original RGB images as input. End-to-end optimization is performed using the Adam optimizer. During initialization, we set $\Gamma$ as a second-order finite-difference Laplacian operator and initialize $\lambda$ to a small positive value of 0.01. To ensure that the regularization strength remains positive during optimization, we adopt a non-negative parameterization of $\lambda$, ensuring that $\lambda > 0$ throughout training. To accommodate the different convergence behaviors of the physics layer and the deep network, we adopt a hierarchical learning-rate scheme; the initial learning rate is set separately for the learnable regularization matrix and coefficient in the Tikhonov layer and for the remaining network parameters. The learning rate is halved every 500 epochs, and the model is trained for a total of 3000 epochs. The batch size is set to 16 on ARAD-1K and 4 on the other datasets. The learnable Tikhonov parameters are optimized alongside the network weights. All compared methods are implemented under the same semi-supervised training framework and likewise use only about 2% labeled data. It is worth noting that most competing methods follow a single-input, single-output architecture; for consistency and simplicity, we therefore train these methods with a unified single-scale version of the semi-supervised loss. The proposed network is implemented in PyTorch 2.0 and trained on a single NVIDIA RTX 4090 GPU.
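The hierarchical learning-rate scheme can be expressed with Adam parameter groups; the stand-in modules and the concrete rates below are placeholders, as the paper's exact values are not reproduced here.

```python
import torch
import torch.nn as nn

# Stand-in modules; in SSHSR these would be the Tikhonov layer and the backbone.
tikhonov_layer = nn.Linear(3, 31)
backbone = nn.Sequential(nn.Conv2d(31, 31, 3, padding=1))

# One group for the learnable regularization matrix/coefficient, another for
# the remaining network parameters. The rates here are placeholders.
optimizer = torch.optim.Adam([
    {"params": tikhonov_layer.parameters(), "lr": 1e-3},
    {"params": backbone.parameters(), "lr": 4e-4},
])
# Halve the learning rate every 500 epochs, as described above.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.5)
```

Each group keeps its own rate under the shared halving schedule, which lets the physics layer and the deep network converge at their own pace.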
4.4. Results
4.4.1. Quantitative Results
We conducted comparative experiments on the ARAD-1K [
70], CAVE [
71], and GDFC datasets [
43] against several representative spectral reconstruction methods, including HSCNND [
51], HRNet [
55], AWAN [
54], MST++ [
42], RepCPSI [
62], FSDFF [
60], and LTRN [
61]. The results are summarized in
Table 2. On ARAD-1K, our method achieves improvements of 11.8%, 7.3%, 10.2%, and 0.029% in MRAE, RMSE, SAM, and PSNR, respectively, over the second-best result, indicating a clear gain in reconstruction accuracy. On CAVE, our approach further improves MRAE, RMSE, and PSNR by 15.4%, 7.1%, and 2.2%, respectively, and remains the top performer overall. Notably, the improvement in SAM is relatively limited, suggesting that the spectral shape agreement still leaves room for improvement. This behavior may be attributed to severe metamerism in CAVE, under which all methods exhibit degraded SAM performance. Several recent studies [
73] have examined this phenomenon in depth, yet an effective remedy has not been established. On the GDFC dataset, the proposed method continues to show a pronounced advantage; compared with the second-best method, it improves MRAE, RMSE, SAM, and PSNR by 23.1%, 20.9%, 22.1%, and 6.8%, respectively. Overall, consistent gains are observed across these representative datasets. In addition, although MST++ performs strongly under full supervision, its performance drops markedly with only 2% labeled data, indicating limited effectiveness when spectral supervision is scarce.
4.4.2. Qualitative and Visual Results
To assess the perceptual quality of the reconstructed HSI, we visualize band-wise MRAE maps for randomly selected validation samples from the ARAD-1K, CAVE, and GDFC datasets, comparing our method with other representative reconstruction approaches, as shown in
Figure 4,
Figure 5 and
Figure 6. In these error maps, darker colors indicate higher reconstruction accuracy. The heatmaps show that our method produces darker regions over a larger area, suggesting lower reconstruction errors and richer texture details in the reconstructed HSI. Compared with other methods, our results exhibit a more uniform error distribution and better recovery of local structures and fine details. In addition, spectral fidelity is evaluated by comparing the mean spectral signatures within selected regions, as shown in
Figure 7. The selected regions are highlighted by red boxes in
Figure 4 and
Figure 6. As can be seen from
Figure 7, the spectra recovered by our method are closer to the GT.
4.4.3. Model Complexity and Efficiency
Beyond reconstruction performance, we further report the number of model parameters (Params), floating-point operations (FLOPs), and inference time to comprehensively assess storage overhead, computational complexity, and practical runtime efficiency. Since FLOPs depend on the input resolution, we compute FLOPs on the ARAD-1K dataset using a unified input size of
, and measure inference time under the same hardware conditions. As shown in
Table 3, our method achieves lower Params and FLOPs, resulting in a smaller model size and reduced computational cost. Compared with HRNet and AWAN, which obtain the second-best performance on some datasets, our approach significantly reduces the parameter scale and computational burden while delivering higher reconstruction accuracy. Moreover, relative to the recently proposed FSDFF and LTRN, our method also demonstrates superior efficiency in terms of Params and FLOPs. In addition, our method requires only 2.26 s for inference, further validating its high efficiency in practical deployment scenarios. Overall, the proposed method achieves a better trade-off between high-fidelity reconstruction and lightweight, efficient inference.
4.5. Ablation Analysis
This section conducts a systematic ablation study on the CAVE dataset to validate the effectiveness of the spectrally aware mini-patches (SA-MP), the contributions of different module designs, the role of the frequency-domain consistency loss, the impact of single-scale versus multi-scale loss design, and the comparison of mini-patch selection strategies (random sampling vs. grid-center sampling). The corresponding ablation results are summarized in
Table 4,
Table 5,
Table 6,
Table 7 and
Table 8, providing a systematic analysis of how each component contributes to improving semi-supervised spectral reconstruction performance.
Effect of SA-MP. We investigate the role of SA-MP in semi-supervised learning. As reported in
Table 4, compared with the purely unsupervised setting without SA-MP, introducing SA-MP reduces MRAE, RMSE, and SAM by 10.6%, 36.8%, and 27.1%, respectively, and increases PSNR by 15.1%. These results indicate that relying solely on the unsupervised degradation-consistency loss tends to cause noticeable spectral distortion, whereas the regional mean-spectrum guidance provided by SA-MP imposes more effective constraints on the learned spectral distribution, leading to a substantial improvement in reconstruction performance.
Effect of different modules: To evaluate the contribution of each component, we adopt a baseline network composed only of residual blocks and then progressively incorporate the hybrid attention module (HAM) and the physics-prior layer (Tikhonov layer) for comparison. As reported in
Table 5, incorporating HAM improves MRAE, RMSE, SAM, and PSNR by 4.4%, 17.1%, 6.5%, and 6.3%, respectively, compared to the baseline. This demonstrates that the hybrid attention mechanism can more effectively couple spatial textures with spectral correlations, thereby enhancing the feature representation capability. Further introducing the Tikhonov layer yields additional gains over both the baseline and the HAM-only setting, demonstrating the benefit of the physics prior for reconstruction. When the fixed-form Tikhonov layer is upgraded to an optimizable version, the best performance is achieved; relative to the baseline, the four metrics improve by 19.2%, 31.7%, 19.8%, and 12.2%, respectively, and compared with the fixed (non-optimizable) Tikhonov layer, the improvements are 3.3%, 9.6%, 6.8%, and 2.2%, respectively. Meanwhile, the parameter count remains low, indicating a favorable trade-off between reconstruction accuracy and model complexity.
Effect of the Frequency Consistency Loss: In SSHSR, we introduce Frequency Consistency Loss (FCL) into the spectral supervision to constrain the discrepancy between the reconstructed and GT spectra in the frequency domain, thereby encouraging better modeling of inter-band correlations. As shown in
Table 6, compared with the setting without FCL, adding FCL reduces MRAE, RMSE, and SAM by 5.1%, 8.0%, and 14.7%, respectively, and improves PSNR by 3.1%. These results indicate that the frequency-domain consistency constraint provides stable performance gains within our framework and enhances the model’s ability to capture spectral structural information.
Effect of single-scale/multi-scale loss functions: We further conduct an ablation study on single-scale and multi-scale losses. The single-scale loss is calculated based on the mean spectrum of SA-MP at the highest spatial resolution output. In contrast, the multi-scale loss applies joint constraints across the network’s multi-scale outputs; using the GT SA-MPs at the highest resolution as a reference, these patches are downsampled at each scale to obtain corresponding spectral references. The mean spectrum of each patch at each scale is then compared with the mean spectrum of the corresponding patch in the network’s output at that scale, strengthening the local spectral consistency across different spatial resolutions. As shown in
Table 7, compared to the single-scale loss, introducing the multi-scale loss reduces MRAE, RMSE, and SAM by 3.6%, 6.1%, and 7.0%, respectively, and improves PSNR by 2.0%. This indicates that the multi-scale constraint provides more comprehensive spectral supervision across different spatial resolutions, alleviating the scale bias caused by supervision at a single scale, and thus improving overall reconstruction accuracy.
Comparison of mini-patch selection strategies: To examine the effect of different mini-patch selection schemes on SA-MP mean-spectrum supervision, we keep all other settings unchanged and vary only the sampling strategy of the mini-patches, comparing random sampling with grid-center sampling (i.e., selecting the centers of a $3 \times 3$ grid within each 128 × 128 crop). Both strategies use the same number of mini-patches for supervision. As shown in
Table 8, compared with random sampling, grid-center sampling achieves reductions of 2.7%, 4.7%, and 5.8% in MRAE, RMSE, and SAM, respectively, along with a 2.0% improvement in PSNR. These results indicate that the spatial placement of mini-patches affects the effectiveness of mean-spectrum supervision and, consequently, the final reconstruction performance.
5. Analysis of Results in Real-World Scenarios
Following the validation on standard benchmark datasets, this section further explores the performance of the proposed method in real-world environments. We conducted an extended experimental analysis based on hyperspectral data collected from real-world scenes to further validate the effectiveness and advantages of the semi-supervised hyperspectral reconstruction paradigm in practical application scenarios. The real-world dataset was captured using the commercial Specim IQ hyperspectral camera (Specim, Oulu, Finland), with scenes including typical objects such as indoor office supplies, potted plants, cups, and standard color targets. The spectral range of the data is 400–700 nm, with 31 spectral bands. A total of 31 hyperspectral images were collected, with 26 randomly selected for the training set and 5 for the validation set. During data collection, the hyperspectral camera was fixed above the target object at an approximately 45° viewing angle. A halogen lamp was used as the light source, positioned directly above the camera to achieve uniform illumination of the target. The corresponding RGB observations were generated by spectral integration of the collected hyperspectral data using the SRF of the Canon 60D camera.
Quantitative and Visual Results
Our method achieves strong performance on three representative datasets, namely ARAD-1K, CAVE, and GDFC. To further evaluate its reconstruction capability under different data distributions and imaging conditions, we apply the proposed approach to real captured data for comparative validation. As shown in
Table 9, the proposed method also delivers consistent performance gains on the real-scene dataset, outperforming the second-best result by 13.7%, 16.2%, 11.9%, and 6.0% in terms of MRAE, RMSE, SAM, and PSNR, respectively. This indicates that our method can maintain stable and competitive reconstruction performance in real-world acquisition scenarios.
Figure 8 presents the MRAE maps on the real-scene validation set, and
Figure 9 shows the corresponding spectral curves. From the visualization of the MRAE maps, our method exhibits darker colors over larger regions, indicating higher-quality reconstructed hyperspectral images and more accurate recovery of both spatial details and spectral information. As can be clearly observed in
Figure 9, the spectra reconstructed by our method are closer to the ground-truth spectra, demonstrating its notable advantages in spectral consistency and spectral fidelity.
7. Conclusions
This paper focuses on the RGB-to-hyperspectral reconstruction problem under the condition of limited spectral annotations, and proposes a semi-supervised reconstruction framework guided by spectrally aware mini-patches (SA-MP). The framework samples a small number of local spectral mini-patches through SA-MP and constructs supervision terms based on their region-averaged spectra, allowing the limited local spectral information to effectively constrain the full-image prediction, thereby enabling semi-supervised full-image hyperspectral recovery. Furthermore, we embed a learnable Tikhonov regularization physical layer into the reconstruction process, jointly optimizing the regularization matrix and coefficients to provide more stable physical constraints and reliable initialization. At the network level, a hybrid attention reconstruction structure is designed, which promotes the full interaction of spectral–spatial information and detailed recovery by jointly modeling spectral and spatial features. Extensive experimental results demonstrate that the proposed method achieves competitive reconstruction performance on multiple public benchmark and extended datasets.
Finally, we summarize the main limitations of SSHSR (e.g., metamerism-related ambiguity, potential SRF mismatch in practical remote sensing, and limited generalization under extreme illumination variations) and outline several possible future directions in the Discussion section, including more robust semi-supervised supervision designs, improved tolerance to SRF uncertainty, illumination-robust training/augmentation, and the potential use of complementary self-supervised or adversarial constraints.