HCNet: Multi-Exposure High-Dynamic-Range Reconstruction Network for Coded Aperture Snapshot Spectral Imaging

Shi, Hang; Chen, Jingxia; Li, Yahui; Zhang, Pengwei; Tian, Jinshou

doi:10.3390/s26010337


by Hang Shi ¹,², Jingxia Chen ¹, Yahui Li ²,*, Pengwei Zhang ¹ and Jinshou Tian ²

¹ School of Electronics Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China

² The Key Laboratory of Ultra-Fast Photoelectric Diagnostics Technology, Xi’an Institute of Optics and Precision Mechanics of CAS, Xi’an 710119, China

* Author to whom correspondence should be addressed.

Sensors 2026, 26(1), 337; https://doi.org/10.3390/s26010337

Submission received: 20 November 2025 / Revised: 21 December 2025 / Accepted: 2 January 2026 / Published: 5 January 2026

(This article belongs to the Section Optical Sensors)


Abstract

Coded Aperture Snapshot Spectral Imaging (CASSI) is a rapid hyperspectral imaging technique with broad application prospects. Owing to its three-dimensional compressed acquisition mode and hardware constraints, the compressed measurements output by practical CASSI systems have a limited dynamic range, which degrades hyperspectral reconstruction quality. To address this issue, a high-quality hyperspectral reconstruction method based on multi-exposure fusion is proposed. A multi-exposure data acquisition strategy is established to capture low-, medium-, and high-exposure low-dynamic-range (LDR) measurements. A multi-exposure fusion-based high-dynamic-range (HDR) CASSI measurement reconstruction network (HCNet) is designed to reconstruct physically consistent HDR measurement images. Unlike conventional HDR networks designed for visual enhancement, HCNet employs a multiscale feature fusion architecture and combines local–global convolutional joint attention with residual enhancement mechanisms to efficiently fuse complementary information from multiple exposures. This makes it better suited to CASSI systems and ensures high-fidelity reconstruction of hyperspectral data in both the spatial and spectral dimensions. A mathematical model of multi-exposure fusion CASSI is constructed, and an experimental CASSI system is built. Simulation and real-world experimental results demonstrate that the proposed method significantly improves hyperspectral image reconstruction quality compared with traditional single-exposure strategies and is highly robust to jitter in the multi-exposure intervals and to shot noise in practical systems. By leveraging the higher-dynamic-range target information acquired through multiple exposures, the method reconstructs HDR scenes in particular with enhanced contrast in both bright and dark details and with higher spectral correlation, validating its ability to improve CASSI reconstruction and to enable effective measurement in HDR scenarios.

Keywords:

snapshot compressive spectral imaging; multi-exposure fusion; high dynamic range; compressed measurement reconstruction

1. Introduction

Hyperspectral imaging technology can simultaneously capture spatial and spectral information from a scene and has significant application prospects in fields such as environmental monitoring [1], agricultural detection [2,3], medical diagnosis [4], and cultural heritage diagnosis [5]. The Coded Aperture Snapshot Spectral Imaging (CASSI) [6] technique uses a coded mask and a dispersive element to encode and compress three-dimensional (3D) data into a two-dimensional (2D) measurement image. Compressed sensing algorithms then reconstruct the 3D data from the 2D compressed measurement, enabling snapshot hyperspectral imaging. Numerous advanced CASSI reconstruction algorithms have been developed, including numerical iterative methods [7,8] and deep learning approaches [9,10,11,12,13,14,15], all of which have demonstrated excellent performance on idealized synthetic datasets. However, practical CASSI systems inevitably encounter factors that significantly degrade reconstruction quality, such as dynamic range limitations. There is therefore a pressing need for high-quality hyperspectral imaging methods suited to real CASSI systems. Regarding dynamic range, constrained by the system’s detection scheme and hardware limitations, all spectral channel images are compressed into a single measurement with a finite dynamic range. The recovered spectral channels must share the dynamic range of this single measurement, so the effective dynamic range available to each channel shrinks and reconstruction quality drops; the more spectral channels involved, the lower the reconstruction quality. Additionally, the system exposure level (overexposure or underexposure) directly determines the data acquisition quality of CASSI. Overexposure causes loss of highlight detail, while underexposure reduces the dark-field signal-to-noise ratio (SNR), which is significantly affected by quantization noise and shot noise, further degrading or even invalidating subsequent CASSI hyperspectral image reconstruction.

To address the low reconstruction quality caused by limited dynamic range, and inspired by high-dynamic-range (HDR) imaging technology [16,17], this work proposes a high-quality CASSI hyperspectral imaging method based on multi-exposure fusion. Multiple CASSI compressed measurements with different exposure levels are captured to expand the dynamic range of the system’s data acquisition. Lower-exposure images preserve highlight details, while higher-exposure images improve the SNR in dark regions. The multi-exposure low-dynamic-range (LDR) measurements are then fused to generate an HDR measurement image. Multi-exposure HDR reconstruction networks have matured in RGB imaging (RGB-HDR) and are widely applied in visual enhancement fields such as camera photography and image display; examples include HDR-GAN [18], SelfHDR [19], SAFNet [20], Cen-HDR [21], DRHDR [22], and HDRFlow [23]. However, adapting these methods to HDR CASSI compressed measurement prediction presents fundamental differences and challenges. In terms of input data, RGB-HDR reconstruction networks process RGB images containing spatial structures and three color channels, whereas the proposed HDR-CASSI reconstruction network operates on compressed measurement maps, which are two-dimensional grayscale images obtained by encoding and compressing three-dimensional spatiospectral data. In terms of output targets, RGB-HDR aims to produce perceptually pleasing nonlinear HDR images, whereas HDR-CASSI prioritizes predicting a linear HDR compressed measurement that satisfies the physical imaging model, which ensures the fidelity and physical consistency of the subsequent hyperspectral reconstruction. In terms of loss functions, RGB-HDR accommodates visually driven tonal compression, whereas HDR-CASSI must preserve true physical proportions to avoid compromising spatial–spectral consistency.

Therefore, to enhance the imaging quality of real CASSI systems, this work proposes a CASSI multi-exposure data acquisition strategy and a multi-exposure fusion-based HDR CASSI measurement reconstruction network (HCNet). Without increasing hardware costs, the proposed acquisition strategy and reconstruction algorithm effectively expand the dynamic range of CASSI compressed measurements, providing physically consistent HDR compressed measurements for subsequent high-quality hyperspectral image reconstruction. We construct a mathematical model of the multi-exposure CASSI system that accounts for the effects of exposure level, overexposure clipping, and noise on LDR measurements. The composition and working principle of HCNet are introduced, and the method is evaluated by comparing the hyperspectral reconstruction quality obtained with the traditional single-exposure strategy and with the proposed multi-exposure strategy. The reconstruction robustness of HCNet under different multi-exposure settings and noise conditions is verified. Finally, a real CASSI imaging system is built to validate the high-quality hyperspectral imaging capability of the proposed method for scenes with relatively low and high dynamic ranges.

2. Methods

Figure 1 shows the working principle of the traditional single-exposure and the proposed multi-exposure CASSI data acquisition strategies. In a CASSI system, a three-dimensional hyperspectral dataset $(x, y, \lambda)$ is compressed into a two-dimensional (2D) measurement through spatial encoding, spectral shifting, and integration on a camera. For single-exposure CASSI, a single 2D compressed measurement is used to recover hyperspectral images via a hyperspectral reconstruction network. For multi-exposure CASSI, three measurements captured at different exposure levels are first fused by HCNet into an HDR measurement, which is then fed into the hyperspectral reconstruction network to produce high-quality hyperspectral images.

To achieve high-quality hyperspectral reconstruction, this section builds upon the single-exposure CASSI model to establish a mathematical description for the proposed multi-exposure strategy, clarifying the objective function and fusion mechanism. Subsequently, the proposed HDR CASSI measurement reconstruction network (HCNet) is described.

2.1. Mathematical Model for Single- and Multi-Exposure CASSI

2.1.1. Single-Exposure CASSI

For single-exposure CASSI, the target’s three-dimensional information $(x, y, \lambda)$ can be represented as a data cube $X = \{X_l\}_{l=1}^{L}$ comprising $L$ spectral bands, where $X_l \in \mathbb{R}^{H \times W}$ denotes the spatial image of the $l$th spectral band.

For a real CASSI system, the camera’s exposure directly affects the number of photons received by the detector, thereby altering the intensity and noise level of the measurement. The exposure can be quantified by the exposure value $EV = \log_2(G/K)$, where $G$ is the exposure coefficient and $K$ is the reference coefficient. When $G = K$, $EV = 0$, indicating that the exposure matches the reference exposure. In this work, $G = 1$ denotes optimal exposure, at which the gray values of the output image occupy the full dynamic range without overexposure.
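Inverting the definition gives $G = K \cdot 2^{EV}$. The Python sketch below assumes $K = 2$, which is what the simulation setting of Section 3.1 implies (medium exposure $G = 2$ at $EV = 0$); under that assumption it reproduces the coefficients $G = [0.5, 2, 8]$ quoted there for $EV = [-2, 0, +2]$. The function names are illustrative, not taken from the paper.

```python
import math


def exposure_value(g: float, k: float) -> float:
    """EV = log2(G / K)."""
    return math.log2(g / k)


def exposure_coefficient(ev: float, k: float) -> float:
    """Inverse relation: G = K * 2**EV."""
    return k * 2.0 ** ev


# Assumption: reference coefficient K = 2, matching G = 2 at EV = 0 in the simulations.
K_REF = 2.0
gains = [exposure_coefficient(ev, K_REF) for ev in (-2, 0, 2)]
print(gains)  # [0.5, 2.0, 8.0]
```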

After spatial modulation with an encoding pattern $C \in \mathbb{R}^{H \times W}$, the $l$th spectral image detected by the camera can be expressed as

$$X_l^{M} = G \times (C \odot X_l) \tag{1}$$

where $\odot$ denotes the Hadamard (element-wise) product. The modulated spectral channels are then shifted by a dispersive element so that each channel is imaged at a different position on the detector. The shifted spectral channels overlap on the detector, producing a compressed image $Y \in \mathbb{R}^{H' \times W'}$ that can be expressed as

$$Y = \sum_{l=1}^{L} S_d\left(X_l^{M}\right) \tag{2}$$

where $H' = H$, $W' = W + d(L - 1)$, $d$ is the shift in pixels between adjacent channels, and $S_d(\cdot)$ denotes the spatial shift operation.
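A minimal NumPy sketch of Equations (1) and (2) follows. It assumes a single-disperser layout in which channel $l$ is shifted by $l \cdot d$ columns along the width axis, a random binary mask, and $d = 2$; the paper does not state $d$ here, so this value, like the function name `cassi_forward`, is illustrative.

```python
import numpy as np


def cassi_forward(x: np.ndarray, mask: np.ndarray, gain: float = 1.0, d: int = 2) -> np.ndarray:
    """Noise-free CASSI measurement, Eqs. (1)-(2).

    x    : spectral cube of shape (L, H, W)
    mask : coded aperture C of shape (H, W)
    gain : exposure coefficient G
    d    : shift in pixels between adjacent channels (assumed value)
    """
    n_bands, h, w = x.shape
    y = np.zeros((h, w + d * (n_bands - 1)))          # H' = H, W' = W + d(L - 1)
    for l in range(n_bands):
        x_m = gain * (mask * x[l])                     # Eq. (1): G x (C ⊙ X_l)
        y[:, l * d : l * d + w] += x_m                 # S_d shifts band l by l*d columns; Eq. (2) sums
    return y


rng = np.random.default_rng(0)
x = rng.random((28, 256, 256))                         # stand-in 256 x 256 x 28 cube, band-first layout
mask = (rng.random((256, 256)) > 0.5).astype(float)    # hypothetical random binary coded aperture
y = cassi_forward(x, mask)
print(y.shape)  # (256, 310)
```

Because the model is linear in $G$, a noise-free measurement at exposure $G_E$ is simply $G_E$ times the $G = 1$ measurement; noise, clipping, and quantization are applied afterwards.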

Accounting for the inherent shot noise of the detected signal as well as the camera’s limited full-well capacity (FWC) and analog-to-digital converter (ADC) bit depth, the detected image can be expressed as

$$Y_D = Q\left(SC\left(SN(Y)\right)\right) \tag{3}$$

where $SN(\cdot)$ denotes shot noise addition [12], $SC(\cdot) = \min(\cdot, Y_{max})$ represents saturation clipping, $Y_{max}$ is the maximum output value of the camera, and $Q(\cdot)$ denotes quantization by the ADC. Ideally, $Y_D$ is a linear combination of all spectral channels, providing an optimal measurement for subsequent hyperspectral reconstruction. However, when the compressed measurement is overexposed, clipping occurs in bright regions; this clipping is nonlinear and irreversible and leads to reconstruction distortion. Additionally, quantization errors obscure subtle contrasts within the compressed measurement, causing loss of spatial and spectral detail in the recovered images, especially in dark regions.

2.1.2. Multi-Exposure CASSI

To enhance reconstruction quality and robustness, a multi-exposure CASSI measurement strategy is proposed in which CASSI measurements with different exposure levels are captured for the same scene. Low exposure preserves bright-region details, while high exposure enhances dark-region contrast, enabling the acquisition of a higher-dynamic-range signal.

Following common HDR settings, three exposure levels are adopted, with the medium exposure serving as the reference ($EV = 0$). The exposure coefficients for the low, medium, and high exposures are denoted as $G_{low}$, $G_{mid}$, and $G_{high}$, respectively. According to Equations (2) and (3), the compressed measurements acquired by the low-dynamic-range (LDR) camera at the three exposure levels can be expressed as

$$Y_{LDR}^{E} = Q\left(SC\left(SN\left(Y^{E}\right)\right)\right), \quad E \in \{low, mid, high\} \tag{4}$$

where

$$Y^{E} = \sum_{l=1}^{L} S_d\left(X_l^{E}\right), \quad E \in \{low, mid, high\} \tag{5}$$

and $X_l^{E} = G_E \times (C \odot X_l)$ is the modulated spectral image of Equation (1) with $G = G_E$.
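The degradation chain of Equations (3) to (5) can be sketched in NumPy as below. Several details are assumptions not given in the paper: measurements are normalized so that $Y_{max} = 1$ corresponds to the optimal $G = 1$ measurement, shot noise is drawn from a Poisson distribution with a hypothetical scale `photons_per_unit` (photoelectrons per unit signal, standing in for the full-well capacity), and the ADC is modeled as uniform rounding. Because Equations (1) and (2) are linear, $Y^{E}$ is obtained here by scaling a $G = 1$ measurement by $G_E$.

```python
import numpy as np


def ldr_measurement(y, bits=8, y_max=1.0, photons_per_unit=None, rng=None):
    """Y_LDR = Q(SC(SN(Y))), Eqs. (3)-(4); noise-free when photons_per_unit is None.

    y is assumed non-negative; photons_per_unit is a hypothetical noise-scale parameter.
    """
    y = np.asarray(y, dtype=np.float64)
    if photons_per_unit is not None:                   # SN: Poisson shot noise (assumed model)
        rng = np.random.default_rng() if rng is None else rng
        y = rng.poisson(y * photons_per_unit) / photons_per_unit
    y = np.minimum(y, y_max)                           # SC: saturation clipping at Y_max
    levels = 2 ** bits - 1
    return np.round(y / y_max * levels) / levels * y_max   # Q: uniform ADC quantization


rng = np.random.default_rng(0)
y_ref = rng.random((256, 310))                        # stand-in G = 1 measurement, normalized to [0, 1]
gains = {"low": 0.5, "mid": 2.0, "high": 8.0}         # EV = [-2, 0, +2] with reference K = 2
ldr = {e: ldr_measurement(g * y_ref) for e, g in gains.items()}   # Eq. (5) via linearity, then Eq. (4)
for e, m in ldr.items():
    print(e, f"saturated fraction = {np.mean(m == 1.0):.3f}")
```

The saturated fraction grows with exposure, which is exactly the clipping loss that the low-exposure measurement is meant to compensate.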

2.1.3. CASSI Measurement Fusion and Hyperspectral Reconstruction Objectives

The multi-exposure CASSI measurements $Y_{LDR}^{E}$, $E \in \{low, mid, high\}$, are jointly fed into an HDR CASSI measurement reconstruction network to predict an HDR CASSI measurement $\hat{Y}_{HDR} \in \mathbb{R}^{H' \times W'}$:

$$\hat{Y}_{HDR} = f_{\theta}\left(Y_{LDR}^{low}, Y_{LDR}^{mid}, Y_{LDR}^{high}\right) \tag{6}$$

where $f_{\theta}(\cdot)$ denotes the HDR CASSI measurement reconstruction network. The predicted $\hat{Y}_{HDR}$ is then input into a pretrained hyperspectral reconstruction network to recover the hyperspectral data cube $\hat{X}$.

2.2. HDR CASSI Measurement Reconstruction

CASSI compressed measurements are two-dimensional projections obtained after spatial–spectral encoding and multiplexing. Existing HDR fusion methods are primarily designed for RGB formats and optimized for human viewing, and they include nonlinear operations such as gamma mapping, whereas HDR CASSI measurement recovery must preserve a linear mapping relationship. Fidelity is the priority, as it directly determines the reconstruction quality and physical interpretability of the hyperspectral images extracted from the HDR CASSI measurement.

To address this issue, an HDR CASSI measurement estimation framework is proposed for high-quality hyperspectral reconstruction. An overview of the framework is shown in Figure 2; it is composed of an HDR CASSI measurement reconstruction network (HCNet) (Figure 2a) and a pre-trained hyperspectral reconstruction network. The multi-exposure LDR measurements $Y_{LDR}^{E}$, $E \in \{low, mid, high\}$, are first fused by HCNet into an HDR measurement, which is then fed into the pre-trained hyperspectral reconstruction network to produce a high-quality hyperspectral data cube. HCNet prioritizes physical consistency and structural efficiency; it comprises a fusion module for the preliminary merging of multi-exposure measurements and an enhancement module for feature compression and augmentation. HCNet is trained with an HDR measurement loss, and the hyperspectral reconstruction network is trained on synthetic HDR measurements without clipping and quantization.

2.2.1. Fusion Module

The Fusion module centers on multiscale context awareness and channel–spatial attention enhancement. It integrates Parallel Adaptive Channel-Spatial Fusion Attention (PAFCA), context enhancement mechanisms, and cross-scale feature interaction into a feature fusion architecture with context modeling capabilities, as illustrated by the Fusion module block in Figure 2b. First, features are extracted from each of the three LDR measurements by a $3 \times 3$ convolution:

$$F^{E} = \mathrm{Conv}_{3\times3}\left(Y_{LDR}^{E}\right), \quad E \in \{low, mid, high\} \tag{7}$$

The three feature maps $F^{E}$ are concatenated along the channel dimension to form the fused input feature $F_{cat}$. Subsequently, $F_{cat}$ passes through PAFCA to yield $F_{fuse}^{attn}$. PAFCA combines global channel statistical modeling with local spatial feature perception to adaptively enhance features, emphasizing HDR information in the measurements while suppressing redundancy and noise. After attention-enhanced feature fusion, a lightweight single-layer downsampling–upsampling architecture is employed to expand the receptive field. The attention-enhanced features are downsampled by a convolution to yield a low-resolution contextual feature $F_{down}$, which likewise passes through PAFCA to obtain $F_{down}^{attn}$, strengthening contextual information across spatial scales. $F_{down}^{attn}$ is then upsampled to its original size via transposed convolution and residually fused with the preceding features to yield $F_{res}$. The fused features are then progressively enhanced by PAFCA to produce $F_e$, ensuring a comprehensive and hierarchical feature representation.

To mitigate information loss and maintain feature consistency, the progressively attention-enhanced feature $F_e$ is added to the initial concatenated feature $F_{cat}$, forming the cross-stage residual fusion feature

$$F_{merge} = F_e + F_{cat} \tag{8}$$

This residual connection retains the discriminative power of the enhanced features while reintroducing original information from shallow layers, mitigating potential feature drift or information dilution after multiple processing steps. Finally, channel information in the fused feature $F_{merge}$ is further integrated by a convolution and an activation function, and the output serves as the input to the subsequent Enhancement module.
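The data flow of the Fusion module (Equations (7) and (8)) can be sketched in PyTorch, the framework used in this work. This is not the authors’ implementation: the channel width `c`, the kernel sizes of the down- and upsampling layers, the final GELU, and the default attention stand-in (a $3 \times 3$ convolution with GELU in place of PAFCA, Section 2.2.2) are all assumptions. Any shape-preserving module factory `attn(channels)` can be substituted. The stride-2 down/upsampling pair assumes even height and width.

```python
import torch
import torch.nn as nn


def conv_gelu(ch: int) -> nn.Module:
    """Default stand-in for PAFCA (assumption): shape-preserving 3x3 conv + GELU."""
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.GELU())


class FusionModule(nn.Module):
    """Sketch of the Fusion module, Eqs. (7)-(8); layer choices are assumptions."""

    def __init__(self, c: int = 16, attn=conv_gelu):
        super().__init__()
        cc = 3 * c                                                        # channels after concatenation
        self.extract = nn.ModuleList([nn.Conv2d(1, c, 3, padding=1) for _ in range(3)])  # Eq. (7)
        self.attn_fuse = attn(cc)
        self.down = nn.Conv2d(cc, cc, 3, stride=2, padding=1)             # halve H and W
        self.attn_down = attn(cc)
        self.up = nn.ConvTranspose2d(cc, cc, 2, stride=2)                 # restore H and W
        self.attn_out = attn(cc)
        self.merge = nn.Sequential(nn.Conv2d(cc, cc, 3, padding=1), nn.GELU())

    def forward(self, y_low, y_mid, y_high):
        feats = [conv(y) for conv, y in zip(self.extract, (y_low, y_mid, y_high))]  # F^low, F^mid, F^high
        f_cat = torch.cat(feats, dim=1)                                   # F_cat
        f_fuse = self.attn_fuse(f_cat)                                    # F_fuse^attn
        f_down = self.attn_down(self.down(f_fuse))                        # F_down -> F_down^attn
        f_res = f_fuse + self.up(f_down)                                  # F_res (residual fusion)
        f_e = self.attn_out(f_res)                                        # F_e
        f_merge = f_e + f_cat                                             # Eq. (8)
        return self.merge(f_merge), feats[1]                              # merged features and F^mid


net = FusionModule(c=16)
ys = [torch.rand(1, 1, 64, 118) for _ in range(3)]     # low, mid, high LDR measurements
f_merge, f_mid = net(*ys)
print(tuple(f_merge.shape), tuple(f_mid.shape))        # (1, 48, 64, 118) (1, 16, 64, 118)
```

Returning $F^{mid}$ alongside the merged features reflects the later residual connection to the medium-exposure reference features in the Enhancement module.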

2.2.2. Parallel Adaptive Channel-Spatial Fusion Attention (PAFCA)

During multi-exposure feature fusion and context enhancement, efficiently modeling the response distribution within channels and the local–global dynamics of spatial regions is crucial for suppressing redundancy and improving detail preservation and generalization in HDR measurement prediction. Traditional convolutional attention mechanisms (e.g., BAM [24] and CBAM [25]) and existing HDR fusion networks [21,26,27] typically model global information with vectors compressed by global pooling. However, local regions in CASSI HDR measurement maps reflect not only brightness response and exposure intensity distributions but also band-specific information and mask-overlap effects. Simple global statistics struggle to capture these nuances, diluting critical features. To address this, we designed the Parallel Adaptive Channel-Spatial Fusion Attention module (PAFCA), shown in Figure 2d. PAFCA models local–global statistical properties along both a channel branch (CA) (Figure 2e) and a spatial branch (SA) (Figure 2f) and then fuses them within a unified attention adjustment flow, which effectively enhances feature representation and contextual adaptability.

To capture fine-grained statistical distributions, PAFCA first applies localized pooling. The feature map $F \in \mathbb{R}^{C \times H \times W}$ undergoes adaptive average pooling and max pooling with a scale factor $s$:

$$F_{avg} = \mathrm{AvgPool}_{\frac{H}{s}\times\frac{W}{s}}(F), \quad F_{max} = \mathrm{MaxPool}_{\frac{H}{s}\times\frac{W}{s}}(F) \tag{9}$$

Unlike traditional global pooling, this $\frac{H}{s} \times \frac{W}{s}$ localized pooling preserves finer-grained statistics, reflecting global trends within channels while perceiving local dynamic feature distributions. The concatenated channel features then undergo local context enhancement through depthwise separable convolutions:

$$F_{ca} = DW\left(\mathrm{Concat}(F_{avg}, F_{max})\right) \tag{10}$$

Then, $F_{ca}$ is upsampled back to the original size and fused with the spatial branch features.

The PAFCA spatial branch applies depthwise separable convolutions directly in the spatial dimension to perceive and enhance local structures and spatial variations, yielding $F_{sa}$. The outputs of the spatial and channel branches are dynamically weighted by learnable parameters $\alpha$ and $\beta$ to produce the dual-branch fusion feature. This fusion output is activated by GELU and used to weight the original input features, with a residual connection to the input. The result is then processed by a set of convolutional transformations and activations to enhance the expressive power of the features:

$$F_{mod} = \alpha \cdot F_{ca} + \beta \cdot F_{sa} \tag{11}$$

$$F_{out} = F \odot \mathrm{GELU}(F_{mod}) + F \tag{12}$$

$$F_{attn} = \mathrm{Conv}_{1\times1}\left(\mathrm{GELU}\left(\mathrm{Conv}_{3\times3}(F_{out})\right)\right) \tag{13}$$
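A PyTorch sketch of PAFCA following Equations (9) to (13) is given below. The scale factor `s = 4`, the bilinear upsampling of $F_{ca}$, the initialization $\alpha = \beta = 1$, and the exact layer sizes are assumptions, since the text does not specify them; the class is illustrative rather than the authors’ code. $DW$ is implemented as a depthwise $3 \times 3$ convolution followed by a pointwise $1 \times 1$ convolution, the standard form of a depthwise separable convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def dw_separable(c_in: int, c_out: int) -> nn.Sequential:
    """Depthwise separable conv: depthwise 3x3 followed by pointwise 1x1."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in),
        nn.Conv2d(c_in, c_out, 1),
    )


class PAFCA(nn.Module):
    """Sketch of PAFCA, Eqs. (9)-(13); s, init, and upsampling mode are assumptions."""

    def __init__(self, c: int, s: int = 4):
        super().__init__()
        self.s = s
        self.ca = dw_separable(2 * c, c)            # channel branch on [F_avg, F_max], Eq. (10)
        self.sa = dw_separable(c, c)                # spatial branch
        self.alpha = nn.Parameter(torch.ones(1))    # learnable branch weights
        self.beta = nn.Parameter(torch.ones(1))
        self.proj = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.GELU(), nn.Conv2d(c, c, 1))  # Eq. (13)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        grid = (max(h // self.s, 1), max(w // self.s, 1))       # (H/s) x (W/s) pooling grid
        f_avg = F.adaptive_avg_pool2d(x, grid)                   # Eq. (9)
        f_max = F.adaptive_max_pool2d(x, grid)
        f_ca = self.ca(torch.cat([f_avg, f_max], dim=1))         # Eq. (10)
        f_ca = F.interpolate(f_ca, size=(h, w), mode="bilinear", align_corners=False)
        f_sa = self.sa(x)
        f_mod = self.alpha * f_ca + self.beta * f_sa             # Eq. (11)
        f_out = x * F.gelu(f_mod) + x                            # Eq. (12)
        return self.proj(f_out)                                  # Eq. (13)


block = PAFCA(c=48)
out = block(torch.rand(2, 48, 64, 118))
print(tuple(out.shape))  # (2, 48, 64, 118)
```

Since PAFCA preserves the tensor shape, the same block can be applied at full and at reduced resolution, as the Fusion module requires.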

2.2.3. Enhancement Module

To compress channel information and obtain high-quality HDR measurements, the Enhancement module performs further feature compression and prediction, as shown in Figure 2c. To reduce subsequent convolutional computation, the fused output features are fed into a $3 \times 3$ convolution that compresses the multichannel features back to the original channel number ($3C \to C$). To enhance deep feature representation, supplement local details, and improve network robustness, the compressed features are then passed through a ResNet with five stacked residual blocks, inspired by the classic ResNet [28] architecture, for successive feature reconstruction:

$$F_s = \mathrm{ResNet}\left(\mathrm{Conv}_{3\times3}(F_{merge})\right) \tag{14}$$

To prevent information drift or degradation during deep feature processing and to incorporate reliable information from the reference measurement, the ResNet output features are added element-wise to the medium-exposure reference features:

$$\hat{F}_{merge} = F_s + F^{mid} \tag{15}$$

Reconstructing high-quality hyperspectral images requires a physically consistent HDR measurement free of distortion. Unlike existing HDR reconstruction networks for RGB images, HCNet generates the HDR measurement $\hat{Y}_{HDR}$ with a $3 \times 3$ convolution wrapped in a Sigmoid activation function:

$$\hat{Y}_{HDR} = \mathrm{Sigmoid}\left(\mathrm{Conv}_{3\times3}(\hat{F}_{merge})\right) \tag{16}$$
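The Enhancement module of Equations (14) to (16) can be sketched in PyTorch as follows. The residual-block layout (conv–ReLU–conv plus identity), the channel width `c`, and the single-channel output head are assumptions; only the $3C \to C$ compression, the five residual blocks, the addition of $F^{mid}$, and the final convolution with Sigmoid come from the text.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Basic residual block (assumed layout): x + Conv(ReLU(Conv(x)))."""

    def __init__(self, c: int):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(c, c, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)


class EnhancementModule(nn.Module):
    """Sketch of Eqs. (14)-(16)."""

    def __init__(self, c: int = 16, n_blocks: int = 5):
        super().__init__()
        self.compress = nn.Conv2d(3 * c, c, 3, padding=1)                     # 3C -> C
        self.resnet = nn.Sequential(*[ResidualBlock(c) for _ in range(n_blocks)])
        self.head = nn.Conv2d(c, 1, 3, padding=1)                              # one-channel HDR measurement

    def forward(self, f_merge: torch.Tensor, f_mid: torch.Tensor) -> torch.Tensor:
        f_s = self.resnet(self.compress(f_merge))   # Eq. (14)
        f_hat = f_s + f_mid                         # Eq. (15): add medium-exposure reference features
        return torch.sigmoid(self.head(f_hat))      # Eq. (16): output bounded to (0, 1)


enh = EnhancementModule(c=16)
y_hdr = enh(torch.rand(1, 48, 64, 118), torch.rand(1, 16, 64, 118))
print(tuple(y_hdr.shape))  # (1, 1, 64, 118)
```

The Sigmoid bounds the prediction to the same normalized range as a linear GT-HDR measurement generated at $G = 1$, without any tone-mapping curve.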

2.2.4. Loss Function for HDR CASSI Measurement Estimation

Existing HDR networks [20,21,23,27,29,30,31,32] usually construct their loss functions on $\mu$-law-mapped images to reduce the dominance of bright areas in the loss and enhance the perceived quality of details. However, this strategy is unsuitable for CASSI systems because $\mu$-law mapping introduces strong nonlinear compression in bright regions. Therefore, the $L_1$ loss is employed to provide high-fidelity HDR CASSI measurements for the subsequent hyperspectral reconstruction:

$$L_{HDR\text{-}Y} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{Y}_{HDR}^{i} - Y_{HDR}^{i}\right| \tag{17}$$

where $i$ denotes the $i$th pixel of the image and $N = H' \times W'$ is the number of pixels in the measurement.
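The following NumPy sketch makes the argument concrete: it applies the $\mu$-law mapping $T_\mu$ (defined with the evaluation metrics in Section 3.1, $\mu = 5000$) to the same absolute error of 0.01 in a dark pixel (0.02) and in a bright pixel (0.90). The pixel values are arbitrary illustrative choices. After the mapping, the bright-pixel error falls below its linear value and becomes more than an order of magnitude smaller than the dark-pixel error, so a $\mu$-law loss would barely penalize errors in the bright regions of a linear measurement.

```python
import numpy as np

MU = 5000.0  # compression parameter used for mu-PSNR in this work


def mu_law(x):
    """T_mu(x) = log(1 + mu*x) / log(1 + mu), for x normalized to [0, 1]."""
    return np.log1p(MU * x) / np.log1p(MU)


err = 0.01                                   # same absolute (L1) error for both pixels
mapped_err = {gt: float(abs(mu_law(gt + err) - mu_law(gt))) for gt in (0.02, 0.90)}
for gt, e in mapped_err.items():
    print(f"gt = {gt:.2f}: L1 error = {err:.4f}, error after mu-law = {e:.5f}")
```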

3. Simulation Experiments

3.1. Experimental Setup

Two datasets, CAVE [33] and KAIST [34], are used for training and evaluation. The CAVE dataset comprises 32 hyperspectral scenes with a spatial resolution of $512 \times 512$ pixels and a spectral resolution of 10 nm; it includes 31 spectral channels spanning 400 nm to 700 nm with a bit depth of 16 bits. The KAIST dataset comprises 30 hyperspectral scenes with a spatial resolution of $2704 \times 3376$ pixels and a spectral resolution of 10 nm; it also includes 31 spectral channels spanning 400 nm to 700 nm with a bit depth of 16 bits. Following previous CASSI reconstruction studies [9,11,12,35], training data are drawn from the CAVE dataset. Training samples are randomly cropped into $256 \times 256$ pixel blocks for data augmentation, and 28 spectral channels spanning 450 nm to 650 nm are generated via spectral interpolation. The test data comprise 10 scenes from the KAIST dataset, cropped and interpolated into $256 \times 256 \times 28$ hyperspectral data cubes. The default exposure coefficient $G = 2$ ($EV = 0$) is used for the medium exposure so that it includes moderate overexposure. For instance, the multi-exposure combination $EV = [-2, 0, +2]$ corresponds to exposure coefficients $G = [0.5, 2, 8]$. Except for the noise robustness experiment, all simulation experiments were conducted under noise-free conditions.
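The training-data preparation can be sketched with NumPy as a random spatial crop followed by per-pixel linear interpolation along the spectral axis. The text only states that 28 channels spanning 450 nm to 650 nm are generated; the uniform target spacing, linear interpolation, and channel-last layout used here are assumptions, and the scene is random stand-in data rather than a CAVE image.

```python
import numpy as np

SRC_WL = np.arange(400.0, 701.0, 10.0)   # 31 source bands: 400, 410, ..., 700 nm
TGT_WL = np.linspace(450.0, 650.0, 28)   # 28 target bands (uniform spacing assumed)


def random_crop(cube: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Randomly crop an (H, W, L) cube to (size, size, L)."""
    h, w = cube.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return cube[top:top + size, left:left + size]


def interpolate_bands(cube: np.ndarray) -> np.ndarray:
    """Linear interpolation of every pixel spectrum from SRC_WL to TGT_WL (vectorized)."""
    idx = np.clip(np.searchsorted(SRC_WL, TGT_WL) - 1, 0, len(SRC_WL) - 2)  # left neighbour band
    t = (TGT_WL - SRC_WL[idx]) / (SRC_WL[idx + 1] - SRC_WL[idx])           # weight of right band
    return cube[..., idx] * (1.0 - t) + cube[..., idx + 1] * t


rng = np.random.default_rng(0)
scene = rng.random((512, 512, 31))       # stand-in for one CAVE scene normalized to [0, 1]
patch = interpolate_bands(random_crop(scene, 256, rng))
print(patch.shape)  # (256, 256, 28)
```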

The LDR CASSI measurements $Y_{LDR}^{E}$, $E \in \{low, mid, high\}$, are simulated as 8-bit images according to the mathematical model in Equation (4). The ground-truth HDR measurement (GT-HDR) is generated with the forward imaging model (Equation (2), $G = 1$) without clipping or quantization and serves as the supervision target for training HCNet to estimate HDR measurements. The optimization objective is to minimize the HDR measurement prediction loss $L_{HDR\text{-}Y}$, yielding a predicted HDR measurement (Pred-HDR). The classical DAUHST-3stg [9] is adopted as the downstream pre-trained hyperspectral reconstruction network and is trained on GT-HDR measurements with an $L_1$ loss function. The model is implemented in PyTorch 1.12.0 and trained on a single RTX 3090 GPU. Training HCNet for 100 epochs took approximately 3.5 h and achieved convergence. Both HCNet and DAUHST are trained using the Adam optimizer [36] ($\beta_1 = 0.9$, $\beta_2 = 0.999$) for 300 epochs. The learning rate is initialized to $4 \times 10^{-4}$ and scheduled by a cosine annealing strategy.
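The optimizer and schedule described above map directly onto standard PyTorch components, as in the sketch below. `ToyFusion` is a hypothetical stand-in for HCNet, the random batch replaces a real data loader, and stepping the cosine schedule once per epoch (with `T_max` equal to the number of epochs) is an assumption about granularity.

```python
import torch
import torch.nn as nn


class ToyFusion(nn.Module):
    """Hypothetical stand-in for HCNet: three LDR inputs -> one HDR measurement."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, 3, padding=1)

    def forward(self, y_low, y_mid, y_high):
        return torch.sigmoid(self.conv(torch.cat([y_low, y_mid, y_high], dim=1)))


def make_optim(model: nn.Module, epochs: int = 300):
    opt = torch.optim.Adam(model.parameters(), lr=4e-4, betas=(0.9, 0.999))
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched


def train_one_epoch(model, batches, opt, sched, loss_fn=nn.L1Loss()):
    model.train()
    for y_low, y_mid, y_high, y_gt in batches:
        opt.zero_grad()
        loss = loss_fn(model(y_low, y_mid, y_high), y_gt)   # L1 loss of Eq. (17)
        loss.backward()
        opt.step()
    sched.step()                                            # one cosine-annealing step per epoch


model = ToyFusion()
opt, sched = make_optim(model, epochs=300)
batches = [tuple(torch.rand(2, 1, 32, 32) for _ in range(4))]   # random stand-in data
train_one_epoch(model, batches, opt, sched)
print(opt.param_groups[0]["lr"])   # slightly below the initial 4e-4 after one epoch
```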

The evaluation metrics include PSNR, $\mu$-PSNR [37], SSIM [38], and the spectral angle mapper (SAM). The results reported in the following simulations are averaged over the 10 scenes of the KAIST test set.

The peak signal-to-noise ratio (PSNR) is defined as

$$\mathrm{PSNR} = 10\log_{10}\left(\frac{MAX^2}{MSE}\right) \tag{18}$$

where $MAX$ is the maximum pixel value of the image and $MSE$ is the mean squared error

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(I_{pred}(i) - I_{gt}(i)\right)^2 \tag{19}$$

in which $N$ denotes the number of pixels in the image (width × height × number of channels), and $I_{pred}(i)$ and $I_{gt}(i)$ are the reconstructed and ground-truth values at pixel $i$, respectively.
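Equations (18) and (19) translate to a few lines of NumPy. The sketch assumes data normalized to $[0, 1]$, so $MAX = 1$; the MSE is taken over all $H \times W \times L$ entries, as the definition of $N$ above specifies.

```python
import numpy as np


def mse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Eq. (19): mean squared error over all H x W x L entries."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Eq. (18); max_val is MAX (1.0 for data normalized to [0, 1], an assumption)."""
    return float(10.0 * np.log10(max_val ** 2 / mse(pred, gt)))


gt = np.zeros((4, 4, 28))
pred = gt + 0.1                      # uniform error of 0.1, so MSE = 0.01
print(f"{psnr(pred, gt):.2f} dB")    # 20.00 dB
```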

To better reflect reconstruction quality in both dark and bright regions, we adopt the $\mu$-law-modulated PSNR ($\mu$-PSNR), inspired by HDR RGB image quality evaluation methods [18,19,21,32,37,39,40], which is defined as

$$\mu\text{-}\mathrm{PSNR} = 10\log_{10}\left(\frac{MAX^2}{MSE\left(T_\mu(I_{pred}), T_\mu(I_{gt})\right)}\right) \tag{20}$$

where

$$T_\mu(x) = \frac{\log(1 + \mu x)}{\log(1 + \mu)} \tag{21}$$

$x \in [0, 1]$ is the normalized pixel value, and $\mu > 0$ is the compression parameter ($\mu = 5000$ in this work).
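A NumPy sketch of Equations (20) and (21) follows. Because $T_\mu$ maps $[0, 1]$ onto $[0, 1]$, $MAX = 1$ after the mapping; clipping the inputs to $[0, 1]$ first is an added safeguard, not part of the definition above.

```python
import numpy as np


def mu_law(x: np.ndarray, mu: float = 5000.0) -> np.ndarray:
    """Eq. (21): T_mu(x) = log(1 + mu*x) / log(1 + mu)."""
    return np.log1p(mu * x) / np.log1p(mu)


def mu_psnr(pred: np.ndarray, gt: np.ndarray, mu: float = 5000.0) -> float:
    """Eq. (20) with MAX = 1, since T_mu maps [0, 1] onto [0, 1]."""
    p = mu_law(np.clip(pred, 0.0, 1.0), mu)
    g = mu_law(np.clip(gt, 0.0, 1.0), mu)
    return float(10.0 * np.log10(1.0 / np.mean((p - g) ** 2)))


rng = np.random.default_rng(0)
gt = rng.random((64, 64, 28))
pred = np.clip(gt + rng.normal(0.0, 0.01, gt.shape), 0.0, 1.0)
print(f"mu-PSNR = {mu_psnr(pred, gt):.2f} dB")
```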

The structural similarity index (SSIM) is defined as

$$\mathrm{SSIM}(I_{gt}, I_{pred}) = \frac{\left(2\mu_{I_{gt}}\mu_{I_{pred}} + C_1\right)\left(2\sigma_{I_{gt}I_{pred}} + C_2\right)}{\left(\mu_{I_{gt}}^2 + \mu_{I_{pred}}^2 + C_1\right)\left(\sigma_{I_{gt}}^2 + \sigma_{I_{pred}}^2 + C_2\right)} \tag{22}$$

where $\mu_{I_{gt}}$ and $\mu_{I_{pred}}$ are the mean intensities of $I_{gt}$ and $I_{pred}$, $\sigma_{I_{gt}}^2$ and $\sigma_{I_{pred}}^2$ are their variances, $\sigma_{I_{gt}I_{pred}}$ is the covariance between $I_{gt}$ and $I_{pred}$, and $C_1$ and $C_2$ are small constants that stabilize the division.
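Equation (22) can be evaluated directly over a whole image, as in the NumPy sketch below. This single-window form is a simplification: the widely used SSIM formulation computes the statistics in local (typically Gaussian) windows and averages the resulting map. The constants $C_1 = (0.01R)^2$ and $C_2 = (0.03R)^2$, with $R$ the data range, are common defaults and are assumed here.

```python
import numpy as np


def ssim_global(gt: np.ndarray, pred: np.ndarray, data_range: float = 1.0) -> float:
    """Eq. (22) evaluated once over the whole image (single-window simplification)."""
    g = gt.astype(np.float64)
    p = pred.astype(np.float64)
    c1 = (0.01 * data_range) ** 2                    # assumed default constants
    c2 = (0.03 * data_range) ** 2
    mu_g, mu_p = g.mean(), p.mean()
    var_g, var_p = g.var(), p.var()
    cov = np.mean((g - mu_g) * (p - mu_p))
    num = (2 * mu_g * mu_p + c1) * (2 * cov + c2)
    den = (mu_g ** 2 + mu_p ** 2 + c1) * (var_g + var_p + c2)
    return float(num / den)


rng = np.random.default_rng(0)
img = rng.random((64, 64))
noisy = np.clip(img + rng.normal(0.0, 0.1, img.shape), 0.0, 1.0)
print(f"{ssim_global(img, img):.4f}")    # 1.0000
print(f"{ssim_global(img, noisy):.4f}")  # below 1
```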

The spectral angle mapper (SAM) is defined as the average angle between the reconstructed and ground-truth spectral vectors over all pixels and evaluates the spectral fidelity of the hyperspectral reconstruction:

$$\mathrm{SAM} = \frac{1}{N}\sum_{i=1}^{N}\arccos\left(\frac{v_{pred}(i) \cdot v_{gt}(i)}{\left\|v_{pred}(i)\right\| \, \left\|v_{gt}(i)\right\|}\right) \tag{23}$$

where, here, $N$ is the number of spatial pixels, $v_{pred}(i)$ and $v_{gt}(i)$ are the spectral vectors (across all $L$ spectral bands) at pixel $i$ of the reconstructed and ground-truth hyperspectral images, respectively, and $\|\cdot\|$ denotes the Euclidean norm. SAM is typically expressed in degrees or radians; smaller SAM values indicate higher spectral similarity.
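A vectorized NumPy sketch of Equation (23) for cubes stored as $(H, W, L)$ arrays follows. The small `eps` guard against zero-norm spectra and the clipping of the cosine to $[-1, 1]$ are numerical safeguards added here, not part of the definition.

```python
import numpy as np


def sam(pred: np.ndarray, gt: np.ndarray, degrees: bool = True, eps: float = 1e-12) -> float:
    """Eq. (23): mean spectral angle over all pixels of (H, W, L) cubes."""
    p = pred.reshape(-1, pred.shape[-1]).astype(np.float64)   # one spectrum per row
    g = gt.reshape(-1, gt.shape[-1]).astype(np.float64)
    norms = np.maximum(np.linalg.norm(p, axis=1) * np.linalg.norm(g, axis=1), eps)
    cos = np.clip(np.sum(p * g, axis=1) / norms, -1.0, 1.0)   # clip rounding outside [-1, 1]
    angle = np.arccos(cos).mean()
    return float(np.degrees(angle)) if degrees else float(angle)


rng = np.random.default_rng(0)
gt = rng.random((32, 32, 28)) + 0.1
print(f"{sam(2.0 * gt, gt):.4f}")   # 0.0000: scaling a spectrum does not change its angle
print(f"{sam(np.array([[[1.0, 0.0]]]), np.array([[[0.0, 1.0]]])):.1f}")   # 90.0
```

The first usage line illustrates why SAM complements PSNR: a reconstruction with a global intensity error can still have perfect spectral shape.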

3.2. Evaluation of Multi-Exposure Strategies

The performance of the proposed multi-exposure HDR measurement reconstruction framework is evaluated under different exposure combinations and interval settings.

3.2.1. Comparison of Single- and Multi-Exposure Strategies

To our knowledge, no HDR fusion algorithms currently exist for CASSI; traditional methods and existing deep learning-based HDR fusion approaches are designed for RGB images. For a comprehensive and fair comparison, we compare the hyperspectral reconstruction performance obtained with three single-exposure LDR measurements (low-LDR ($EV = -2$), mid-LDR ($EV = 0$), and high-LDR ($EV = +1$)), HDR measurements from five existing HDR fusion methods (PFM [41], Cen-HDR [21], DRHDR [22], HDRFlow [23], and SAFNet [20]), the Pred-HDR measurement from HCNet ($EV = [-2, 0, +1]$), and the GT-HDR measurement, all using the same pretrained hyperspectral reconstruction network.

As shown in Table 1, Pred-HDR significantly outperforms the three single-exposure strategies and the five HDR fusion methods in terms of PSNR, $\mu$-PSNR, SAM, and SSIM, approaching the performance of GT-HDR. This demonstrates that multi-exposure fusion effectively mitigates the information loss caused by overexposure, underexposure, and low dynamic range. Furthermore, compared with existing HDR fusion networks designed for RGB HDR, HCNet significantly improves the recovery of spatial and spectral information. Introducing HCNet adds 15.53 GFLOPs of computation and 0.63 M parameters to the multi-exposure fusion process.

To visualize the effectiveness of multi-exposure CASSI, reconstruction results for the fifth test scene of the KAIST dataset are shown in Figure 3. Figure 3a shows the CASSI compressed measurements used for hyperspectral reconstruction and an RGB reference of the scene. Images of four spectral channels are presented in Figure 3b, and the spectral density curves of three labeled regions are shown in Figure 3c together with their spectral consistency metric (corr). Figure 3d shows magnified views of the 636.5 nm channel.

In the three regions (Regions 1–3, labeled with white, green, and yellow boxes, respectively), the results obtained with low-LDR exhibit severe detail loss and low contrast. Although more details are recovered with mid-LDR and high-LDR, these results suffer from reconstruction distortion and artifacts, which are evident in the spectral density curves. This reveals that the reconstruction performance of single-exposure measurements is sensitive to the exposure level and deteriorates significantly when overexposure occurs. Among all compared methods, the results obtained with Pred-HDR are closest to those with GT-HDR and to the ground truth, maintaining high spatial and spectral fidelity and giving the best recovery of spatial details in Regions 1–3. This indicates the superior adaptability and effectiveness of the proposed method in CASSI reconstruction tasks.

To further evaluate the recovery of local details, Table 2 compares the PSNR and SSIM values obtained with the different methods for Regions 1–3 (corresponding to Figure 3d). Pred-HDR with HCNet achieves the best performance in all three regions.

3.2.2. Hyperspectral Reconstruction Performances with Different Exposure Intervals

The CASSI system has a limited dynamic range: lower exposures cause more severe underexposure in dark areas, while higher exposures cause overexposure in bright areas. To evaluate the reconstruction performance of our method across different exposure intervals, we adopted and extended the exposure interval combinations used in the RGB HDR studies Cen-HDR [21] and HDRFlow [23]. Figure 4 shows the PSNR, $\mu$-PSNR, SAM, and SSIM of the hyperspectral reconstruction results for three single-exposure ($EV = -2, 0, +1$) and six multi-exposure ($EV = [-1, 0, +1]$, $[-1.3, 0, +1.3]$, $[-2, 0, +2]$, $[-3, 0, +3]$, $[-1, 0, +2]$, $[-2, 0, +1]$) measurements.
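The exposure trade-off stated above can be illustrated with a toy model. Assumptions, not taken from the paper's Equation (4): an 8-bit detector, scene values normalized so that 1.0 reaches full scale at $EV = 0$, exposure scaling by $2^{EV}$, overexposure modeled as clipping, and underexposure as quantization to gray level 0. The scene values and function names are hypothetical.

```python
def ldr_measure(values, ev, bits=8):
    """Simulate an LDR measurement: scale by 2**ev, clip to [0, 1], quantize."""
    levels = 2 ** bits - 1
    out = []
    for v in values:
        x = min(max(v * 2.0 ** ev, 0.0), 1.0)   # overexposure -> clipping
        out.append(round(x * levels))           # integer gray level
    return out

def exposure_stats(values, ev, bits=8):
    """Fractions of saturated pixels and of dark pixels quantized to zero."""
    levels = 2 ** bits - 1
    q = ldr_measure(values, ev, bits)
    saturated = sum(1 for x in q if x == levels) / len(q)
    dark = sum(1 for x in q if x == 0) / len(q)
    return saturated, dark

# Toy HDR scene spanning several orders of magnitude (hypothetical values).
scene = [1e-4, 5e-4, 2e-3, 1e-2, 0.1, 0.4, 0.8, 1.5, 3.0]
for ev in (-2, 0, 1):
    sat, dark = exposure_stats(scene, ev)
    print(f"EV={ev:+d}: saturated={sat:.2f}, dark={dark:.2f}")
```

Raising the exposure trades lost dark pixels for saturated bright ones. Combining several exposures lets each pixel be recovered from an exposure in which it is neither clipped nor quantized to zero, which is the complementary information a fusion network can exploit.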

Multi-exposure strategies outperform single-exposure strategies and are relatively insensitive to the exposure interval configuration, demonstrating robust fusion and reconstruction performance for practical experiments with exposure fluctuations.

3.3. Loss Functions for HDR CASSI Measurement Prediction

For HDR CASSI measurement prediction, Table 3 compares the reconstruction performance of HCNet trained with the $L_{1}$, $L_{2}$, and $L_{1}$-$\mu$-law loss functions. The two-stage DAUHST network [9] serves as the pre-trained HSI reconstruction network, and the multi-exposure combination is $EV = [-1.3, 0, +1.3]$. Measurements recovered with the $L_{1}$ loss yield the best hyperspectral reconstruction, achieving the best PSNR, $\mu$-PSNR, SAM, and SSIM. The $L_{2}$ loss performs slightly worse, while the $\mu$-law-based loss degrades PSNR, SAM, and SSIM further because of dynamic range compression, although its $\mu$-PSNR remains higher than that of $L_{2}$. Therefore, the $L_{1}$ loss is the most suitable choice for HDR CASSI measurement prediction and subsequent high-quality hyperspectral reconstruction.
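The three losses compared in Table 3 can be sketched as follows. The $\mu$-law tonemapping $T(x) = \log(1 + \mu x) / \log(1 + \mu)$ with $\mu = 5000$ is the form commonly used in HDR imaging work and in $\mu$-PSNR. The exact constant and normalization used for HCNet are assumptions here, and inputs are assumed to be normalized to [0, 1].

```python
import math

MU = 5000.0  # assumed mu-law constant (a common choice in HDR literature)

def mu_law(x, mu=MU):
    """mu-law tonemapping: compresses bright values, expands dark ones."""
    return math.log1p(mu * x) / math.log1p(mu)

def l1_loss(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def l2_loss(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def l1_mu_law_loss(pred, target, mu=MU):
    """L1 distance computed in the tonemapped (compressed) domain."""
    return l1_loss([mu_law(p, mu) for p in pred], [mu_law(t, mu) for t in target])

# Same absolute error (0.05) on a dark pixel vs. a bright pixel.
target = [0.01, 0.90]
err_dark = l1_mu_law_loss([0.06, 0.90], target)
err_bright = l1_mu_law_loss([0.01, 0.95], target)
print(err_dark > err_bright)
```

In this toy case both predictions have the same linear $L_{1}$ error, but the $\mu$-law loss penalizes the dark-pixel error far more. Such reweighting suits perceptual HDR display; one plausible reading of Table 3 is that it conflicts with the linear, physically meaningful intensities that CASSI reconstruction relies on.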

3.4. Ablation Experiments

To validate the effectiveness of each key module in HCNet and its suitability for CASSI, ablation experiments are conducted by adding the modules stepwise to a baseline (Table 4). The pre-trained HSI reconstruction network is the two-stage DAUHST [9]. Here, CA and SA represent the channel branch and the spatial branch in PAFCA (Figure 2b), respectively; CA w GP denotes the use of global pooling (GP) in CA; CA w AP denotes the use of adaptive pooling (AP) in CA; Enhance represents HCNet’s enhancement module; Sigmoid indicates the Sigmoid activation at HCNet’s output layer; and Act signifies the nonlinear activation after HCNet’s initial feature extraction.
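The two channel-attention variants differ in how spatial information is summarized before channel weights are computed. Global pooling (GP) reduces each channel to a single value, whereas adaptive pooling (AP) keeps a coarse $k \times k$ grid. The sketch below implements adaptive average pooling with the floor/ceil bin convention used by `torch.nn.AdaptiveAvgPool2d`. The grid size and how PAFCA consumes the pooled descriptor are assumptions, not details given in this section.

```python
def adaptive_avg_pool2d(channel, out_size):
    """Average-pool a 2D map (list of lists) to out_size x out_size bins.

    Bin i spans rows floor(i*H/k) .. ceil((i+1)*H/k) - 1 (same for columns),
    so bins may overlap when H is not divisible by k.
    """
    h, w, k = len(channel), len(channel[0]), out_size
    pooled = []
    for i in range(k):
        r0, r1 = (i * h) // k, ((i + 1) * h + k - 1) // k   # floor start, ceil end
        row = []
        for j in range(k):
            c0, c1 = (j * w) // k, ((j + 1) * w + k - 1) // k
            vals = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        pooled.append(row)
    return pooled

def global_avg_pool2d(channel):
    """Global pooling is the special case out_size = 1."""
    return adaptive_avg_pool2d(channel, 1)[0][0]

# A feature channel that is bright on the left and dark on the right.
feat = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
print(global_avg_pool2d(feat))        # 0.5: spatial layout is lost
print(adaptive_avg_pool2d(feat, 2))   # [[1.0, 0.0], [1.0, 0.0]]: contrast kept
```

Keeping a coarse spatial layout in the channel descriptor is one plausible reason why CA w AP outperforms CA w GP in Table 4.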

As Table 4 shows, introducing either the channel or the spatial branch alone (Entries 2, 3, and 4) improves performance, with adaptive pooling outperforming global pooling in the channel branch (Entry 3 vs. Entry 2), and combining both branches (Entry 5) yields better results. Adding the residual enhancement module (Enhance, Entry 6) further improves PSNR and $\mu$-PSNR, demonstrating that residual enhancement effectively refines the structure and detail of the predicted measurements. Using a Sigmoid activation at the output layer (Entry 7) or introducing an activation in the initial stage (Entry 8) imposes unnecessary nonlinear compression or disturbance on the feature distribution. This hinders physically consistent measurement prediction and degrades reconstruction performance.
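The nonlinear compression attributed to an output Sigmoid can be seen in a toy example: equally spaced linear outputs become increasingly squeezed after the Sigmoid, so intensity differences in bright regions, which HDR measurements are meant to preserve, shrink toward zero. This is an illustration only; the pre-activation values are hypothetical and do not model HCNet's actual output layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Equally spaced pre-activations, e.g. linear intensity predictions.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
linear_steps = [b - a for a, b in zip(z, z[1:])]
sig_steps = [sigmoid(b) - sigmoid(a) for a, b in zip(z, z[1:])]
print(linear_steps)                      # uniform steps of 1.0
print([round(s, 4) for s in sig_steps])  # steps shrink as intensity grows
```

A linear (identity) output keeps equal intensity steps equal, which matches the linear image formation of CASSI measurements.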

3.5. Evaluation of Noise Robustness

To assess the robustness of HCNet under noisy conditions, shot noise is added based on the multi-exposure CASSI imaging model (Equation (4)). The exposure intervals are set to $EV = [-2, 0, +1]$, corresponding to exposure coefficients $G = [0.5, 2, 4]$.

The single-exposure LDR measurements are corrupted with 8-bit shot noise and denoted as LDR-Low/Mid/High (noise). The HDR measurements predicted by the proposed HCNet are denoted as Pred-HDR (noise). The GT-HDR measurements include 11-bit shot noise and are denoted as GT-HDR (noise).
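The noise protocol above can be sketched as follows. Assumptions: the exposure coefficient follows $G = 2 \cdot 2^{EV}$, which reproduces the stated mapping of $EV = [-2, 0, +1]$ to $G = [0.5, 2, 4]$; "b-bit shot noise" is modeled as Poisson noise on a photon count equal to the clipped signal times the full-scale level $2^b - 1$. Equation (4) may differ in detail, and all function names are hypothetical.

```python
import math
import random

def exposure_coefficients(evs, g0=2.0):
    """Assumed mapping G = g0 * 2**EV (g0 = 2 gives EV [-2, 0, +1] -> G [0.5, 2, 4])."""
    return [g0 * 2.0 ** ev for ev in evs]

def poisson(lam, rng):
    """Poisson sample: Knuth's method for small lam, Gaussian approximation
    for large lam (avoids underflow of exp(-lam))."""
    if lam > 30:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def noisy_ldr(signal, gain, bits, rng):
    """Scale, clip to full scale, add shot noise, quantize; returns values in [0, 1]."""
    full = 2 ** bits - 1
    out = []
    for s in signal:
        counts = min(max(s * gain, 0.0), 1.0) * full   # expected photon count
        n = min(poisson(counts, rng), full)            # shot noise, re-clipped
        out.append(n / full)
    return out

rng = random.Random(0)
G = exposure_coefficients([-2, 0, 1])
print(G)  # [0.5, 2.0, 4.0]
scene = [0.01, 0.05, 0.2, 0.4]  # hypothetical normalized measurement values
low, mid, high = (noisy_ldr(scene, g, 8, rng) for g in G)
```

Under this model, a higher bit depth corresponds to larger photon counts and therefore a higher shot-noise SNR (proportional to the square root of the count), which is consistent with GT-HDR (noise) remaining cleaner than the 8-bit LDR inputs.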

The experimental results (Table 5) show that Pred-HDR (noise) achieves significantly better reconstruction performance than LDR-Low/Mid/High (noise) (a PSNR of 33.44 dB versus 21.51 dB for the best single-exposure case), approaching that of GT-HDR (noise) (34.15 dB). This indicates that the complementary information in multiple exposures partially suppresses noise and improves hyperspectral image reconstruction quality.

Figure 5 displays the difference maps $|\hat{Y} - Y|$ between the noisy measurements ($\hat{Y}$) and the true noise-free HDR measurements ($Y$) for Scenes 1, 3, and 6 of the KAIST test set under various exposure strategies. Pred-HDR (noise) clearly exhibits less and weaker noise than the noisy single-exposure LDR measurements. Figure 6 shows the difference maps $|\hat{X} - X|$ between the hyperspectral reconstruction results ($\hat{X}$) obtained from the CASSI-compressed measurements of the different exposure strategies and the ground truth hyperspectral images ($X$) for the 594.5 nm channel in Scene 1, the 567.5 nm channel in Scene 3, and the 529.5 nm channel in Scene 6. The results with Pred-HDR (noise) exhibit clearer texture details and more faithful glossiness than those from LDR-Low/Mid/High (noise). Reconstructions from the low-exposure measurements deviate significantly from the ground truth with unrealistic brightness, whereas those from the medium- and high-exposure measurements suffer severe loss of spatial information. These results validate the noise robustness and generalization capability of the proposed method for practical CASSI systems with noise.
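The difference maps in Figures 5 and 6 are element-wise absolute errors, evaluated on a selected spectral channel for the reconstructions. A minimal sketch follows, assuming a cube stored as a list of 2D channels with a matching list of center wavelengths. The three wavelengths below are the ones named in the text; the tiny cubes are hypothetical, and a real KAIST cube has more channels.

```python
def abs_diff_map(est, ref):
    """Element-wise |est - ref| for two equally sized 2D maps."""
    return [[abs(e - r) for e, r in zip(er, rr)] for er, rr in zip(est, ref)]

def channel_at(cube, wavelengths, target_nm):
    """Return the channel whose center wavelength is closest to target_nm."""
    idx = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))
    return cube[idx]

# Hypothetical 3-channel cubes (1 x 2 pixels per channel).
wavelengths = [529.5, 567.5, 594.5]
x_true = [[[0.2, 0.4]], [[0.5, 0.5]], [[0.9, 0.1]]]
x_hat = [[[0.25, 0.4]], [[0.5, 0.3]], [[0.7, 0.1]]]
diff = abs_diff_map(channel_at(x_hat, wavelengths, 594.5),
                    channel_at(x_true, wavelengths, 594.5))
print(diff)
```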

4. Real-World Experiments

A practical CASSI system was built, as shown in Figure 7, to validate the effectiveness of HCNet for real-world hyperspectral reconstruction. The object is imaged onto the spatial coding mask by a coupling lens $L_1$. After spatial modulation, the coded image of the object passes through a 4f system formed by lenses $L_2$ and $L_3$ (focal length $f = 50$ mm), between which a dispersion prism with an apex angle of 30° and two filters are placed. The two filters limit the valid detected spectral range to 450–650 nm. After spectral dispersion and spatial integration, a grayscale camera captures 8-bit CASSI-compressed measurements with a valid area of 760 × 814 pixels. Two scenes are constructed, and LDR measurements are captured under low, medium, and high exposures with $EV = [-2, 0, +1]$. To capture as much of the object’s information as possible, the medium exposure serves as the prediction reference, the low exposure is kept free of overexposure to preserve bright details, and the high exposure is allowed to overexpose to enhance dark details. Training for the real system uses the CAVE [33] dataset, with training samples randomly cropped into 380 × 380 pixel blocks for data augmentation.
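Random cropping of training cubes, as described above, can be sketched as below. The sketch assumes a cube stored as a list of H × W channels and applies the same offset to every channel, so that the spectral signature at each pixel is preserved. Any flips or rotations are not stated in this section and are omitted; the helper name is hypothetical.

```python
import random

def random_crop(cube, size, rng):
    """Crop the same size x size window from every channel of a cube."""
    h, w = len(cube[0]), len(cube[0][0])
    if size > h or size > w:
        raise ValueError("crop larger than image")
    top = rng.randint(0, h - size)    # inclusive bounds
    left = rng.randint(0, w - size)
    return [[row[left:left + size] for row in ch[top:top + size]] for ch in cube]

rng = random.Random(0)
# Small hypothetical 2-channel cube; value encodes (channel, row, col).
cube = [[[c * 100 + r * 10 + k for k in range(6)] for r in range(6)] for c in range(2)]
patch = random_crop(cube, 4, rng)  # in training: size=380 on 512 x 512 CAVE images
```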

The reconstruction results for the two scenes are shown in Figures 8 and 9, respectively. Figure 8a shows the three LDR measurements and the Pred-HDR measurement predicted with HCNet for Scene 1, with the RGB reference serving as the ground truth. Figure 8b shows the recovered hyperspectral images at seven wavelengths for the different exposure strategies. The reconstructions with Pred-HDR are visually better than those with the single-exposure strategies, restoring finer details in both bright and dark areas. Although low-LDR preserves details better than mid-LDR and high-LDR, its spatial contrast is lower than that of Pred-HDR, and it suffers from more noise and artifacts. This indicates that avoiding overexposure is important for traditional single-exposure systems and that the multi-exposure fusion strategy can effectively improve reconstruction quality through HDR measurements.

To validate robustness in higher-dynamic-range scenarios, an HDR scene is constructed by illuminating the green pattern of Scene 1 with a high-power 516 nm laser, as shown in Figure 9. Figure 9a shows the three LDR measurements and the Pred-HDR measurement predicted with HCNet for Scene 2, whose RGB reference without the laser serves as the ground truth. To cover the intensity range from the bright laser spot to the darker patterns, the exposures are adjusted so that the low-LDR measurement is not overexposed while the three LDR measurements still follow the optimized exposure interval scheme. Because no comparable HDR hyperspectral dataset exists, the constructed HDR scene is simulated during training by adding a high-intensity Gaussian spot with random position and size to the 516 nm band of each training sample. Figure 9b shows the recovered hyperspectral images at six wavelengths for the different exposure strategies. To clearly demonstrate HDR imaging performance, the 516.0 nm channel containing the bright laser spot is displayed with two gray-level ranges, 516.0 nm (1) and 516.0 nm (2), which show the bright and dark details, respectively. Figure 9c shows the spectral density curves for the selected regions (R1: blue, R2: red, R3: brown, R4: purple, R5: green) labeled in the RGB reference in Figure 9a.
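The laser-spot augmentation described above can be sketched as follows. The section states only that a randomly positioned and sized high-intensity Gaussian spot is added at the 516 nm band. The amplitude and width ranges, the choice of the band nearest 516 nm, and additive injection are assumptions for illustration, and the function name is hypothetical.

```python
import math
import random

def add_laser_spot(cube, wavelengths, rng, laser_nm=516.0,
                   amp_range=(5.0, 20.0), sigma_range=(3.0, 15.0)):
    """Add a random 2D Gaussian spot to the band nearest laser_nm (in place).

    amp_range and sigma_range are hypothetical; amplitudes above 1.0 make the
    spot brighter than the normalized scene, emulating an HDR laser highlight.
    """
    band = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - laser_nm))
    ch = cube[band]
    h, w = len(ch), len(ch[0])
    cy, cx = rng.uniform(0, h - 1), rng.uniform(0, w - 1)     # random position
    amp, sigma = rng.uniform(*amp_range), rng.uniform(*sigma_range)  # random size
    for r in range(h):
        for c in range(w):
            d2 = (r - cy) ** 2 + (c - cx) ** 2
            ch[r][c] += amp * math.exp(-d2 / (2 * sigma ** 2))
    return band, (cy, cx, amp, sigma)

rng = random.Random(0)
wavelengths = [500.0, 516.0, 530.0]   # placeholder band centers (nm)
cube = [[[0.1] * 32 for _ in range(32)] for _ in wavelengths]
band, params = add_laser_spot(cube, wavelengths, rng)
```

Injecting the spot into a single band mimics the narrowband laser, so the network sees training samples whose intensity in one channel far exceeds the rest of the scene.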

For the laser-illuminated bright region, mid-LDR and high-LDR lose spatial details and exhibit overexposed flat tops, while Pred-HDR restores a Gaussian-shaped spot consistent with the true spatial distribution of the laser (Figure 9b, 516.0 nm (1)). For the darker background patterns, Pred-HDR performs best, with higher SNR, less distortion, and fewer artifacts, demonstrating the effectiveness of HDR reconstruction in the spatial domain.

In the spectral domain, for the regions without laser illumination (R1, R2, R3, and R4), the spectral density curves obtained with Pred-HDR (green curves in Figure 9c) show the highest spectral consistency with the ground truth (red curves) measured with a spectrometer. For the laser-illuminated region, R5(1) and R5(2) in Figure 9c show the spectral curves with and without the laser, respectively, revealing that Pred-HDR can recover reliable underlying spectral information even under strong laser interference.

5. Conclusions

To address the limitations of reconstructing hyperspectral images in real CASSI systems constrained by the detector’s finite dynamic range, this paper proposes a high-quality CASSI hyperspectral image reconstruction method based on multi-exposure fusion. The method acquires multiple compressed measurements under varying exposure conditions with the CASSI system to capture both brighter and darker scene information with a higher dynamic range. In addition, a multi-exposure compressed measurement fusion network (HCNet) is introduced to generate HDR-compressed measurements suitable for CASSI reconstruction tasks, enabling high-quality hyperspectral reconstruction. Unlike traditional HDR reconstruction algorithms focused on visual enhancement, HCNet takes physical consistency as its core design objective, ensuring high fidelity in both the spatial and spectral dimensions. Considering real-world system factors such as exposure levels, overexposure clipping, and grayscale quantization, a data acquisition model for the multi-exposure fusion CASSI system is constructed. To validate the proposed method, both simulated and real-world experiments were conducted to compare the hyperspectral image reconstruction quality of multi-exposure and traditional single-exposure strategies. In simulations, HCNet based on multi-exposure fusion effectively generates HDR-compressed measurements, yielding higher-quality hyperspectral reconstructions, and it shows high robustness against small exposure interval shifts and shot noise, making it suitable for real experimental systems. In real-world experiments, a CASSI experimental system was built to capture two scenes with relatively low and high dynamic ranges. The proposed method consistently achieves the best reconstruction quality, including higher-contrast spatial details and more coherent spectral information, validating its measurement capability in real-world HDR scenarios.

Compared with single-exposure CASSI reconstruction pipelines, the proposed HCNet increases computational complexity because of the added multi-exposure fusion. As Table 1 shows, embedding HCNet or existing HDR fusion algorithms increases GFLOPs and parameter counts (e.g., from 20.105 GFLOPs and 1.379 M parameters for single-exposure reconstruction to 35.632 GFLOPs and 2.013 M parameters for Pred-HDR). Although this entails additional computational cost during training, experiments show that the testing time with the pre-trained HDR fusion network increases by only about 0.1 s compared with single-exposure strategies. This allows direct deployment in real-time imaging applications, delivering superior HDR-compressed measurements and significantly improving hyperspectral reconstruction quality for HDR scenes. Future work will explore lightweight network architectures to reduce computational complexity during training while maintaining high reconstruction performance.

Furthermore, the current framework assumes that the multi-exposure measurements are acquired from static or quasi-static scenes with negligible spatial movement during data acquisition. Dynamic scenes with significant spatial displacement remain a challenge that can be explored in future work through network improvements and system optimization.

Author Contributions

Conceptualization, H.S., J.C. and Y.L.; methodology, H.S. and Y.L.; software, Y.L.; validation, H.S. and Y.L.; formal analysis, H.S. and Y.L.; writing—original draft preparation, H.S.; writing—review and editing, J.C., Y.L., P.Z. and J.T. All authors have read and agreed to the published version of the manuscript.

Funding

National Natural Science Foundation of China (12204529).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Turner, K.J.; Tzortziou, M.; Grunert, B.K.; Goes, J.; Sherman, J. Optical classification of an urbanized estuary using hyperspectral remote sensing reflectance. Opt. Express 2022, 30, 41590–41612.
[2] Xie, C.; Yang, C. A review on plant high-throughput phenotyping traits using UAV-based sensors. Comput. Electron. Agric. 2020, 178, 105731.
[3] Ishida, T.; Kurihara, J.; Viray, F.A.; Namuco, S.B.; Paringit, E.C.; Perez, G.J.; Takahashi, Y.; Marciano, J.J., Jr. A novel approach for vegetation classification using UAV-based hyperspectral imaging. Comput. Electron. Agric. 2018, 144, 80–85.
[4] Fei, B. Hyperspectral imaging in medical applications. In Data Handling in Science and Technology; Elsevier: Amsterdam, The Netherlands, 2019; Volume 32, pp. 523–565.
[5] Sandak, J.; Sandak, A.; Legan, L.; Retko, K.; Kavčič, M.; Kosel, J.; Poohphajai, F.; Diaz, R.H.; Ponnuchamy, V.; Sajinčič, N.; et al. Nondestructive evaluation of heritage object coatings with four hyperspectral imaging systems. Coatings 2021, 11, 244.
[6] Wagadarikar, A.; John, R.; Willett, R.; Brady, D. Single disperser design for coded aperture snapshot spectral imaging. Appl. Opt. 2008, 47, B44–B51.
[7] Bioucas-Dias, J.M.; Figueiredo, M.A. A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans. Image Process. 2007, 16, 2992–3004.
[8] Figueiredo, M.A.; Nowak, R.D.; Wright, S.J. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 2008, 1, 586–597.
[9] Cai, Y.; Lin, J.; Wang, H.; Yuan, X.; Ding, H.; Zhang, Y.; Timofte, R.; Van Gool, L. Degradation-aware unfolding half-shuffle transformer for spectral compressive imaging. Adv. Neural Inf. Process. Syst. 2022, 35, 37749–37761.
[10] Ying, Y.; Wang, J.; Shi, Y.; Ling, N. Hybrid sparse transformer and wavelet fusion-based deep unfolding network for hyperspectral snapshot compressive imaging. Sensors 2024, 24, 6184.
[11] Wang, J.; Li, K.; Zhang, Y.; Yuan, X.; Tao, Z. S²-transformer for mask-aware hyperspectral image reconstruction. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 4299–4316.
[12] Meng, Z.; Ma, J.; Yuan, X. End-to-end low cost compressive spectral imaging with spatial-spectral self-attention. In Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 187–204.
[13] Cai, Y.; Lin, J.; Hu, X.; Wang, H.; Yuan, X.; Zhang, Y.; Timofte, R.; Van Gool, L. Mask-guided spectral-wise transformer for efficient hyperspectral image reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17502–17511.
[14] Dong, Y.; Gao, D.; Li, Y.; Shi, G.; Liu, D. Degradation estimation recurrent neural network with local and non-local priors for compressive spectral imaging. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5520115.
[15] Hu, X.; Cai, Y.; Lin, J.; Wang, H.; Yuan, X.; Zhang, Y.; Timofte, R.; Van Gool, L. HDNet: High-resolution dual-domain learning for spectral compressive imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17542–17551.
[16] Debevec, P.E.; Malik, J. Recovering high dynamic range radiance maps from photographs. In Seminal Graphics Papers: Pushing the Boundaries; Association for Computing Machinery: New York, NY, USA, 2023; Volume 2, pp. 643–652.
[17] Granados, M.; Ajdin, B.; Wand, M.; Theobalt, C.; Seidel, H.P.; Lensch, H.P. Optimal HDR reconstruction with linear digital cameras. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 215–222.
[18] Niu, Y.; Wu, J.; Liu, W.; Guo, W.; Lau, R.W. HDR-GAN: HDR image reconstruction from multi-exposed LDR images with large motions. IEEE Trans. Image Process. 2021, 30, 3885–3896.
[19] Zhang, Z.; Wang, H.; Liu, S.; Wang, X.; Lei, L.; Zuo, W. Self-supervised high dynamic range imaging with multi-exposure images in dynamic scenes. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024.
[20] Kong, L.; Li, B.; Xiong, Y.; Zhang, H.; Gu, H.; Chen, J. SAFNet: Selective alignment fusion network for efficient HDR imaging. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; Springer: Cham, Switzerland, 2024; pp. 256–273.
[21] Tel, S.; Heyrman, B.; Ginhac, D. CEN-HDR: Computationally efficient neural network for real-time high dynamic range imaging. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 378–394.
[22] Marín-Vega, J.; Sloth, M.; Schneider-Kamp, P.; Röttger, R. DRHDR: A dual branch residual network for multi-bracket high dynamic range imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 844–852.
[23] Xu, G.; Wang, Y.; Gu, J.; Xue, T.; Yang, X. HDRFlow: Real-time HDR video reconstruction with large motions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 24851–24860.
[24] Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck attention module. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018; pp. 1–13.
[25] Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
[26] Hu, T.; Yan, Q.; Qi, Y.; Zhang, Y. Generating content for HDR deghosting from frequency view. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 25732–25741.
[27] Chen, G.; Dai, K.; Yang, K.; Hu, T.; Chen, X.; Yang, Y.; Dong, W.; Wu, P.; Zhang, Y.; Yan, Q. Bracketing image restoration and enhancement with high-low frequency decomposition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 6097–6107.
[28] He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
[29] Shu, Y.; Shen, L.; Hu, X.; Li, M.; Zhou, Z. Towards real-world HDR video reconstruction: A large-scale benchmark dataset and a two-stage alignment network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 2879–2888.
[30] Vien, A.G.; Park, S.; Mai, T.T.N.; Kim, G.; Lee, C. Bidirectional motion estimation with cyclic cost volume for high dynamic range imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1183–1190.
[31] Li, H.; Yang, Z.; Zhang, Y.; Tao, D.; Yu, Z. Single-image HDR reconstruction assisted ghost suppression and detail preservation network for multi-exposure HDR imaging. IEEE Trans. Comput. Imaging 2024, 10, 429–445.
[32] Yang, K.; Hu, T.; Dai, K.; Chen, G.; Cao, Y.; Dong, W.; Wu, P.; Zhang, Y.; Yan, Q. CRNet: A detail-preserving network for unified image restoration and enhancement task. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 6086–6096.
[33] Yasuma, F.; Mitsunaga, T.; Iso, D.; Nayar, S.K. Generalized assorted pixel camera: Postcapture control of resolution, dynamic range, and spectrum. IEEE Trans. Image Process. 2010, 19, 2241–2253.
[34] Choi, I.; Jeon, D.S.; Nam, G.; Gutierrez, D.; Kim, M.H. High-quality hyperspectral reconstruction using a spectral prior. ACM Trans. Graph. 2017, 36, 218.
[35] Yu, Z.; Liu, D.; Cheng, L.; Meng, Z.; Zhao, Z.; Yuan, X.; Xu, K. Deep learning enabled reflective coded aperture snapshot spectral imaging. Opt. Express 2022, 30, 46822–46837.
[36] Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
[37] Pérez-Pellitero, E.; Catley-Chandar, S.; Leonardis, A.; Timofte, R. NTIRE 2021 challenge on high dynamic range imaging: Dataset, methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 691–700.
[38] Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
[39] Yan, Q.; Chen, W.; Zhang, S.; Zhu, Y.; Sun, J.; Zhang, Y. A unified HDR imaging method with pixel and patch level. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 22211–22220.
[40] Liu, Z.; Wang, Y.; Zeng, B.; Liu, S. Ghost-free high dynamic range imaging with context-aware transformer. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 344–360.
[41] Mertens, T.; Kautz, J.; Van Reeth, F. Exposure fusion: A simple and practical alternative to high dynamic range photography. Comput. Graph. Forum 2009, 28, 161–171.

Figure 1. Schematic diagram of single- and multi-exposure CASSI data acquisition strategies.

Figure 2. Overview of HDR CASSI-compressed measurement estimation framework for high-quality hyperspectral reconstruction.

Figure 3. Visualization results of hyperspectral reconstruction using different exposure strategies for the fifth test scene in the KAIST dataset. (a) CASSI measurements for the single-exposure strategies (low-LDR, mid-LDR, and high-LDR), the multi-exposure strategies (PFM, Cen-HDR, DRHDR, HDRFlow, SAFNet, Pred-HDR), and the HDR ground truth (GT-HDR), together with the RGB reference image of the scene. (b) Four spectral channel images from the reconstruction results. (c) Spectral density curves for the three labeled regions. (d) Magnified images of the three labeled regions at 636.5 nm. The white, green, and yellow boxes indicate the selected Regions 1, 2, and 3, respectively.

Figure 4. (a) PSNR, $\mu$-PSNR, (b) SAM, and (c) SSIM diagrams of hyperspectral reconstruction results for three single-exposure ($EV = -2, 0, +1$) and six multi-exposure ($EV = [-1, 0, +1]$, $[-1.3, 0, +1.3]$, $[-2, 0, +2]$, $[-3, 0, +3]$, $[-1, 0, +2]$, $[-2, 0, +1]$) CASSI measurements.

Figure 5. Difference maps ($|\hat{Y} - Y|$) between noisy measurements ($\hat{Y}$) under various exposure strategies and true noise-free HDR measurements ($Y$) for Scenes 1, 3, and 6 in the KAIST test set.

Figure 6. Difference maps $|\hat{X} - X|$ between the hyperspectral reconstruction results ($\hat{X}$) using the CASSI-compressed measurements based on different exposure strategies and the ground truth hyperspectral images ($X$) for the 594.5 nm channel in Scene 1, the 567.5 nm channel in Scene 3, and the 529.5 nm channel in Scene 6.

Figure 7. Coded aperture snapshot spectral imaging system. The green arrow indicates the 516 nm laser beam.

Figure 8. Reconstruction results for Real-World Scene 1. (a) Single-exposure measurements (low-LDR, mid-LDR, and high-LDR), the multi-exposure measurement Pred-HDR predicted with HCNet, and the RGB reference for Scene 1. (b) Recovered hyperspectral images at 7 wavelengths for different exposure strategies. The displayed colors correspond to the RGB visualization of the reconstructed hyperspectral data.

Figure 9. Reconstruction results for Real-World Scene 2. (a) Single-exposure measurements (low-LDR, mid-LDR, and high-LDR), the multi-exposure measurement Pred-HDR predicted with HCNet, and the RGB reference for Scene 2. (b) Recovered hyperspectral images at 6 wavelengths for different exposure strategies. The displayed colors correspond to the RGB visualization of the reconstructed hyperspectral data. The 516.0 nm channel containing the bright laser spot is presented with two gray-level ranges, 516.0 nm (1) and 516.0 nm (2). (c) Spectral density curves of the selected regions R1–R5; R5(1) and R5(2) are the spectral curves with and without the laser, respectively.

Table 1. Comparison of the hyperspectral reconstruction performance (PSNR, $\mu$-PSNR, SSIM, SAM, GFLOPs, Params) with three single-exposure LDR measurements (low-LDR ($EV = -2$), mid-LDR ($EV = 0$), high-LDR ($EV = +1$)), HDR measurements from five HDR fusion methods (PFM, Cen-HDR, DRHDR, HDRFlow, SAFNet), the Pred-HDR measurement with HCNet ($EV = [-2, 0, +1]$), and the GT-HDR measurement. Boldface indicates the best results in each column except for GT-HDR.

Method	PSNR (dB)	$\mu$-PSNR (dB)	SSIM	SAM	GFLOPs	Params (M)
LDR-Low	24.583	19.124	0.789	19.145	20.105	1.379
LDR-Mid	15.291	19.393	0.635	14.113	20.105	1.379
LDR-High	8.466	12.902	0.181	18.352	20.105	1.379
PFM	33.901	27.741	0.930	9.893	21.917	1.425
Cen-HDR	36.753	29.021	0.953	8.754	24.958	1.582
DRHDR	37.472	33.397	0.964	6.915	86.711	2.567
HDRFlow	37.383	32.936	0.964	7.052	31.552	1.644
SAFNet	37.487	33.413	0.966	6.987	71.291	2.492
Pred-HDR	37.585	34.224	0.967	6.411	35.632	2.013
GT-HDR	37.734	34.513	0.969	6.185	20.105	1.379

Table 2. Comparison of local PSNR (dB) and SSIM (in parentheses) for different methods at 636.5 nm in Regions 1–3 (Figure 3d) of the fifth scene of the KAIST dataset. Boldface denotes the best results in each column except for GT-HDR.

Method	Region 1	Region 2	Region 3
LDR-Low	18.99 (0.652)	21.19 (0.668)	16.21 (0.781)
LDR-Mid	8.86 (0.552)	11.89 (0.588)	11.23 (0.361)
LDR-High	2.57 (0.173)	5.05 (0.153)	7.64 (0.006)
PFM	30.68 (0.858)	30.91 (0.940)	20.08 (0.718)
Cen-HDR	35.31 (0.936)	33.96 (0.969)	31.26 (0.949)
DRHDR	36.89 (0.955)	35.49 (0.977)	30.92 (0.950)
HDRFlow	36.64 (0.952)	35.01 (0.975)	30.96 (0.947)
SAFNet	37.27 (0.957)	35.56 (0.977)	31.01 (0.951)
Pred-HDR	37.28 (0.957)	35.57 (0.977)	32.11 (0.958)
GT-HDR	37.58 (0.961)	36.42 (0.981)	31.94 (0.958)

Table 3. Hyperspectral reconstruction performances with different loss functions for HCNet. Boldface denotes the best results in each column.

Loss Function	PSNR	$\mu$-PSNR	SAM	SSIM
$L_{1}$	36.68	33.19	8.85	0.961
$L_{2}$	36.37	31.55	8.98	0.958
$L_{1}$-$\mu$-law	34.84	32.38	9.51	0.947

Table 4. Hyperspectral reconstruction performances for ablation experiments. Boldface denotes the best results in each column.

No.	Configuration	PSNR	SSIM	$\mu$-PSNR	SAM	GFLOPs	Params
1	Baseline	36.15	0.954	31.57	9.21	15.15	0.942
2	1 + CA w GP	36.35	0.957	32.36	9.02	23.01	1.423
3	1 + CA w AP	36.51	0.959	32.89	8.99	23.09	1.425
4	1 + SA	36.61	0.960	32.99	8.91	26.38	1.228
5	3 + 4	36.63	0.961	33.07	8.86	28.10	1.527
6	5 + Enhance	36.68	0.961	33.19	8.81	30.68	1.573
7	6 + Sigmoid	36.22	0.956	32.17	9.33	30.97	1.585
8	6 + Act	36.18	0.955	31.85	9.55	31.08	1.601
9	7 + 8	36.27	0.955	32.35	9.54	31.38	1.616

Table 5. Comparison of hyperspectral reconstruction performances for noise robustness evaluation. Boldface denotes the best results in each column except for GT-HDR.

Method	PSNR	$μ$ -PSNR	SAM	SSIM
LDR-Low (noise)	21.51	16.08	16.03	0.578
LDR-Mid (noise)	12.14	14.58	18.01	0.483
LDR-High (noise)	8.75	13.51	18.35	0.241
Pred-HDR (noise)	33.44	30.98	9.36	0.926
GT-HDR (noise)	34.15	31.31	9.08	0.933

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Shi, H.; Chen, J.; Li, Y.; Zhang, P.; Tian, J. HCNet: Multi-Exposure High-Dynamic-Range Reconstruction Network for Coded Aperture Snapshot Spectral Imaging. Sensors 2026, 26, 337. https://doi.org/10.3390/s26010337
