DDSR: Degradation-Aware Diffusion Model for Spectral Reconstruction from RGB Images

Chen, Yunlai; Zhang, Xiaoyan

doi:10.3390/rs16152692


by Yunlai Chen and Xiaoyan Zhang *

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(15), 2692; https://doi.org/10.3390/rs16152692

Submission received: 11 June 2024 / Revised: 12 July 2024 / Accepted: 17 July 2024 / Published: 23 July 2024

(This article belongs to the Special Issue Deep Neural Networks for Hyperspectral Remote Sensing Image Processing)


Abstract

The reconstruction of hyperspectral images (HSIs) from RGB images is an attractive low-cost approach to recovering hyperspectral information. However, existing approaches focus on learning an end-to-end mapping between RGB images and their corresponding HSIs with neural networks, which makes it difficult to ensure generalization because they are trained on data with a specific degradation process. As a new paradigm of generative models, the diffusion model has shown great potential in image restoration, especially in noisy contexts. To address the unstable generalization ability of end-to-end models while exploiting the powerful ability of the diffusion model, we propose a degradation-aware diffusion model. The degradation process from HSI to RGB is modeled as a combination of multiple degradation operators, which are used to guide the inverse process of the diffusion model through a degradation-aware correction. By integrating the degradation-aware correction into the diffusion model, we obtain an efficient solver for spectral reconstruction that is robust to different degradation patterns. Experimental results on various public datasets demonstrate that our method achieves competitive performance and shows a promising generalization ability.

Keywords: diffusion model; degradation-aware; spectral reconstruction; hyperspectral image

1. Introduction

A hyperspectral image (HSI) records the spectrum of a real-world scene in multiple bands, each reflecting information at a specific spectral wavelength. This makes it possible to detect the unique spectral signatures of individual objects at different spatial locations, and thus to detect substances that are indistinguishable to the human eye. Compared with a traditional RGB image, a hyperspectral image has more spectral bands, so it can store richer information and provide more details of the scene. Owing to these advantages, HSIs are very useful in many applications, such as medical image processing [1], remote sensing [2], anomaly detection [3], autonomous driving [4], and other fields [5,6]. However, acquiring HSIs with rich spectral information is costly and complicated, which significantly limits the development of various applications, particularly in dynamic scenes or real-time scenarios. In contrast, acquiring RGB images is easy and cheap [6], so extracting spectral information from RGB images, known as spectral reconstruction (SR), has become a recent research hotspot. Reconstructed HSIs have recently shown potential in real-world tasks [7,8]; for example, they have been used for field disease detection, with a 6.14% improvement over the baseline methods. However, collecting a large amount of well-calibrated paired data, for example with dual cameras, is not trivial. Mapping existing HSIs onto RGB images, also known as the degradation of HSIs, is currently an effective way to obtain training data.

Early SR methods are mainly prior-based and explore priors in HSIs, such as sparsity or spectral correlation [9,10,11]. However, due to the poor representation ability of these priors, these methods only perform well on data in specific domains. With the development of deep learning [12,13,14], deep neural networks have become a powerful tool for solving the SR problem. These data-driven methods, such as AWAN [15], HSCNN+ [16] and DRCRNet [17], learn an end-to-end mapping from RGB images to HSIs. Transformers have also been used to solve SR tasks [18,19], achieving impressive results with the multi-head self-attention (MSA) mechanism [20,21]. However, these end-to-end methods are limited to a specific degradation process; data augmentation or ensemble learning could alleviate this problem, but neither is a flexible approach. Moreover, implicit mappings learned in an unconstrained space are often not optimal, and the models are prone to overfitting, especially when the data are insufficient.

The diffusion model is a highly flexible and easy to train generative model, which consists of a forward and an inverse process. The basic idea is that the forward process sequentially perturbs the distribution of the data, and the inverse process is to restore the data distribution gradually. Previous works have demonstrated the powerful data modeling capabilities of the diffusion model to enable flexible mapping, i.e., from randomly sampled Gaussian noise to complex target distributions, such as text, image, and speech [22,23,24,25,26]. Therefore, we propose to use the diffusion model to perform efficient spectral reconstruction from an RGB image. However, for the SR task, the inverse process recovers hyperspectral information starting from random noise with the same size as the HSI; therefore, the noise space is large, which increases the prediction uncertainty in the diffusion inverse process. Fortunately, we find that the core part of the degradation process from HSI to RGB lies in the linear spectral downsampling, also known as the spectral response function, which is useful for spectral reconstruction [9,12,27,28]. The spectral response function provides important prior information for modeling the degradation process. Based on the degradation process, we could reduce the error caused by the prediction uncertainty in the inverse diffusion process. According to the above observation, we propose a degradation-aware diffusion model (namely, DDSR) for spectral reconstruction from an RGB image.

Specifically, the real-world degradation process from HSI to RGB is modeled as a combination of multiple degradation operators: linear spectral downsampling, random noise, and nonlinear JPEG compression. These degradation operators are used to guide the inverse process of diffusion; the key is to reduce the error caused by the prediction uncertainty in each inverse prediction step by utilizing our degradation-aware correction. Thus, we obtain a degradation-aware diffusion model that is easily adapted to different degradation processes while ensuring generalization ability.

To the best of our knowledge, this is the first degradation-based diffusion model for spectral reconstruction. By integrating the degradation-aware correction into the diffusion model, we obtain an efficient solver for spectral reconstruction. The main contributions of this work are summarized below:

We propose DDSR, a diffusion-based spectral reconstruction architecture that utilizes a degradation-aware correction to reduce the error caused by the prediction uncertainty in each inverse step.
Since realistic scenarios are usually noisy, we propose a noise-related correction method that adapts the correction process to the noise level of the current image in order to reduce the effect of the noise.
JPEG compression is a common nonlinear degradation, and we propose to extend the correction further for JPEG-related scenarios.
Quantitative experiments on various public datasets demonstrate that our method can achieve competitive performance and shows promising generalization ability.

2. Related Work

Recovering HSIs from RGB images is a highly ill-posed problem. Previous methods, which have achieved relatively impressive results, can usually be categorized into two types: prior-based methods and data-driven methods.

2.1. Prior-Based Methods

Prior-based methods explore statistical information (such as sparsity, spatial structural similarity, and spectral correlation) as a prior in an HSI dataset to learn the mathematical model to explain the correlation between an RGB image and an HSI in a subspace. For example, Arad et al. [9] proposed a dictionary learning method based on sparse coding, but its computational complexity becomes higher as the dataset expands, which limits its application. Nguyen et al. [11] proposed a learning method based on a radial basis function network, which uses RGB white balance to normalize the scene illumination and restore the reflectance of the scenes. Jia et al. [10] analyzed a large set of datasets based on nonlinear dimensionality reduction techniques and proposed that the spectra of natural scenes lie in an intrinsically low-dimensional manifold, and proposed a manifold-based reconstruction pipeline for RGB images to be accurately mapped to their corresponding HSIs.

2.2. Data-Driven Methods

Data-driven methods take advantage of the ability of neural networks to extract features from RGB and HSI datasets to fit optimal solutions, and various neural network architectures have recently been proposed to improve reconstruction accuracy. HSCNN+ [16] is a distinct SR architecture that replaces the residual block with a dense block and a novel fusion scheme. The Adaptive Weighted Attention Network (AWAN [15]) was proposed to better capture channel dependencies through a trainable Adaptive Weighted Channel Attention (AWCA) module. Zhao et al. [29] proposed a four-level Hierarchical Regression Network (HRNet) with a PixelShuffle layer as an inter-level interaction to recover HSIs. Zhang et al. [28] proposed an unsupervised framework that first estimates the degradation process from the HSI to the RGB image by progressively capturing the difference between the input RGB image and the RGB image reprojected from the recovered HSI. To enable training without pairs of HSI and RGB images, the framework employs adversarial learning. However, the training of an adversarial learning model may suffer from mode collapse. MST++ [19] is a Transformer-based model that exploits the self-similarity between HSI spectra, using the channels as tokens to perform multi-head self-attention along the spectral dimension and subsequently refining the reconstruction result step by step. The above data-driven methods all learn implicit mappings from RGB images to HSIs, which are often not optimal, and the models are prone to overfitting, especially when the data are insufficient.

2.3. Diffusion Model for Image Restoration

The diffusion model is a hot topic in computer vision, and many excellent diffusion-based image restoration methods have appeared recently. Specifically, Wang et al. [30] proposed an effective image restoration method based on denoising diffusion that removes noise and defects from an image while preserving its details and structure. SR3 [22] performs image restoration through a stochastic iterative denoising process using a conditional diffusion model. EDiffSR [31] introduced a novel Conditional Prior Enhancement Module (CPEM) that efficiently utilizes prior knowledge from a low-resolution image, which is subsequently provided to the diffusion model for efficient hyperspectral image restoration. HSR-Diff [32] was proposed as a conditional diffusion model, and Dong et al. [33] proposed an interpretable scale-propelled diffusion (ISPDiff) model; both generate a high-resolution HSI by fusing a high-resolution multispectral image (MSI) with a corresponding low-resolution HSI, a task known as HSI fusion. However, these methods recover HSI information from low-resolution HSIs, and HSI fusion methods additionally require high-resolution MSIs. They are not suitable for spectral reconstruction from RGB images, which provide very limited information compared to low-resolution HSIs, making model learning more difficult. In our work, we combine the diffusion model with a degradation-aware correction to obtain an efficient solver for the SR task, shown in Figure 1, that is robust to different degradation patterns. To the best of our knowledge, the diffusion model has been little explored for spectral reconstruction, and our work aims to fill this research gap.

3. Method

3.1. Background

A diffusion model [30,34] is a type of probabilistic generative model that typically consists of two stages: a forward process and an inverse process. In short, the forward process perturbs the data distribution stochastically, usually by adding random noise, while the inverse process learns to recover the data distribution gradually. The forward process is composed of multiple perturbation steps in the form of a Markov chain, in which a small amount of noise is added to the input image $x_{0}$ at each timestep. At timestep t, the perturbation process can be formulated as follows:

\begin{matrix} x_{t} = \sqrt{1 - β_{t}} x_{t - 1} + \sqrt{β_{t}} ϵ, \quad ϵ \sim N (0, I), \\ \text{i.e.,} \quad q (x_{t} | x_{t - 1}) = N (x_{t}; \sqrt{1 - β_{t}} x_{t - 1}, β_{t} I) . \end{matrix}

(1)

where $β_{t}$ is a hyperparameter that represents the noise level, $ϵ$ is sampled from a standard Gaussian distribution, and $x_{t}$ is the noised image at timestep t, which can be written in closed form as

\begin{matrix} x_{t} = \sqrt{{\bar{α}}_{t}} x_{0} + \sqrt{1 - {\bar{α}}_{t}} ϵ, \\ \text{i.e.,} \quad q (x_{t} | x_{0}) := N (x_{t}; \sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I) . \end{matrix}

(2)

where $α_{t} = 1 - β_{t}$ and ${\bar{α}}_{t} = \prod_{s = 1}^{t} α_{s}$. Thus, $x_{t}$ can be seen as a weighted sum of $x_{0}$ and $ϵ$; as the number of noise injections increases, the input image is gradually corrupted until it approaches standard Gaussian noise. The inverse process aims to estimate $q (x_{t - 1} | x_{t})$ from the noised image, which is difficult to do directly. However, the posterior distribution $p (x_{t - 1} | x_{t}, x_{0})$ can be derived using Bayes' theorem (see Appendix A.1 for details) and expressed as

p (x_{t - 1} | x_{t}, x_{0}) = N (x_{t - 1}; μ_{t} (x_{t}, x_{0}), σ_{t}^{2} I) .

(3)

where $σ_{t}^{2} = \frac{1 - {\bar{α}}_{t - 1}}{1 - {\bar{α}}_{t}} β_{t}$ is the variance and $μ_{t} (x_{t}, x_{0}) = \frac{\sqrt{{\bar{α}}_{t - 1}} β_{t}}{1 - {\bar{α}}_{t}} x_{0} + \frac{\sqrt{α_{t}} (1 - {\bar{α}}_{t - 1})}{1 - {\bar{α}}_{t}} x_{t}$ is the mean [35]. Based on Equation (2), $x_{0}$ can be reparameterized as

x_{0 | t} = \frac{1}{\sqrt{{\bar{α}}_{t}}} (x_{t} - ϵ_{θ} (x_{t}, t) \sqrt{1 - {\bar{α}}_{t}}) .

(4)

where $ϵ_{θ}$ denotes the noise predicted by the neural network with parameters $θ$ (described in Figure 2), and $x_{0 | t}$ is the prediction of $x_{0}$ at timestep t. Finally, the inverse step to obtain $x_{t - 1}$ can be expressed as

x_{t - 1} = \frac{1}{\sqrt{α_{t}}} (x_{t} - \frac{1 - α_{t}}{\sqrt{1 - {\bar{α}}_{t}}} ϵ_{θ} (x_{t}, t)) + σ_{t} ϵ, \quad ϵ \sim N (0, I) .

(5)

The forward and inverse processes are illustrated in Figure 1: the forward process adds noise according to Equation (1), while the inverse process predicts $x_{0 | t}$ using a neural network and then repeatedly applies Equation (3) to recover the HSI from random noise.
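The quantities in Equations (2) to (4) can be illustrated with a short NumPy sketch. It is only a minimal illustration, not the authors' implementation: the function names, the 31 × 8 × 8 toy cube, and the 1-indexed timestep convention are assumptions made here, and the linear schedule uses the $β_{1}$, $β_{T}$ and T values reported in the implementation details. The true noise stands in for the network prediction $ϵ_{θ}$.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_1=1e-4, beta_T=1e-2):
    """Linearly increasing noise levels; index 0 holds beta_1."""
    return np.linspace(beta_1, beta_T, T)

def q_sample(x0, t, alpha_bar, rng):
    """Closed-form forward sample x_t ~ q(x_t | x_0), Equation (2); t is 1-indexed."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return x_t, eps

def predict_x0(x_t, eps_hat, t, alpha_bar):
    """Equation (4): estimate x_0 from x_t and a noise prediction eps_hat."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat) / np.sqrt(alpha_bar[t - 1])

def posterior_mean_var(x_t, x0, t, betas, alpha_bar):
    """Mean and variance of p(x_{t-1} | x_t, x_0), Equation (3); t is 1-indexed."""
    a_bar_t = alpha_bar[t - 1]
    a_bar_prev = alpha_bar[t - 2] if t > 1 else 1.0  # bar-alpha_0 is taken as 1
    beta_t = betas[t - 1]
    coef_x0 = np.sqrt(a_bar_prev) * beta_t / (1.0 - a_bar_t)
    coef_xt = np.sqrt(1.0 - beta_t) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    var = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t
    return coef_x0 * x0 + coef_xt * x_t, var

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(31, 8, 8))   # toy 31-band cube (assumed shape)
x_t, eps = q_sample(x0, 500, alpha_bar, rng)
# With the exact noise, Equation (4) inverts Equation (2).
x0_rec = predict_x0(x_t, eps, 500, alpha_bar)
```

In an actual sampler, `eps` would be replaced by the output of the trained network, so `x0_rec` would only approximate the true image.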

3.2. Architecture of Neural Network

The neural network is an important part of the inverse process; specifically, the network takes the noisy image $x_{t}$ and the timestep t as inputs to predict the noise $ϵ_{θ}$, which is subsequently used to compute $x_{t - 1}$ according to Equation (5). The overall structure of the neural network is depicted in Figure 2; it uses U-Net as a backbone and consists of a number of ResBlocks and attention blocks [20]. The network extracts features at different scales of the image, thus providing richer contextual information. We also used skip connections to convey feature information between different levels, which helps to fully utilize the local and global information of the image and allows the network to learn the noise distribution more efficiently.

Figure 3 depicts the structure of the ResBlock, which contains batch normalization, SiLU activation, 2D convolution layers with a 3 × 3 kernel, and a residual connection to the output. Because training data for hyperspectral images are limited, we additionally used dropout to avoid overfitting. A time embedding was added to the intermediate features to allow the network to learn different levels of noise.

3.3. Degradation-Aware Diffusion Model

For the SR task, the inverse process recovers an HSI starting from random noise with the same size as the HSI; therefore, the noise space is large for conventional diffusion-based methods, which increases the uncertainty in predicting $x_{t - 1}$ in the inverse process. Thus, we leverage a degradation-aware correction, based on Range–Null Space (RNS) Decomposition [36], to rectify the predicted $x_{t - 1}$. By integrating the degradation-aware correction into the diffusion model, we obtain an efficient solver for spectral reconstruction that is robust to different degradation patterns under the guidance of the degradation process. The overall architecture of our proposed DDSR is shown in Figure 1; a degradation-aware correction module is added to each inverse step. Note that, unlike conditional diffusion models [22,31], the neural network in DDSR does not require an RGB image as input; the RGB image is only provided as prior information for the correction, so the model can adapt to various degradation processes flexibly.

3.3.1. Degradation-Aware Correction

In general, the core part of degradation from HSI to RGB can be formulated as linear spectral downsampling:

y = H x,

(6)

where $x \in R^{D \times 1}$ denotes the HSI, $y \in R^{d \times 1}$ denotes the degraded RGB image, and $H \in R^{d \times D}$ denotes the spectral downsampling operator, also known as the spectral response function; d and D are the numbers of channels of the RGB image and the HSI, respectively. The goal of HSI reconstruction here is to recover x from $y$ given $H$. An ideal recovered $x_{r}$ needs to satisfy the consistency constraint $y = H x_{r}$. However, the HSI reconstructed by end-to-end models optimized with a pixel loss against the ground-truth x usually fails to satisfy the consistency constraint.

Therefore, we propose a degradation-aware correction to rectify $x_{t - 1}$ in each inverse step, so that $x_{0 | t}$ satisfies the consistency constraint at every step. Specifically, given $H$ and $y$, RNS decomposition [36] lets us decompose x into two parts:

\begin{aligned} x & \equiv x + H^{†} H x - H^{†} H x \\ & \equiv H^{†} H x + (I - H^{†} H) x \\ & \equiv H^{†} y + (I - H^{†} H) x . \end{aligned}

(7)

where $H^{†}$ is the pseudo-inverse of the spectral downsampling operator $H$, which satisfies $H H^{†} H = H$; $H^{†} y$ lies in the range space and $(I - H^{†} H) x$ lies in the null space. There are two noteworthy points: (i) the range-space part of x is known given $H$ and $y$; and (ii) if we replace x with $x_{0 | t}$ in the null-space part of Equation (7), the resulting ${\hat{x}}_{0 | t}$ still satisfies the consistency constraint, since $H (I - H^{†} H) = 0$. This can be formulated as

{\hat{x}}_{0 | t} = H^{†} y + (I - H^{†} H) x_{0 | t} .

(8)

where ${\hat{x}}_{0 | t}$ is the result after the degradation-aware correction: the error in the range-space part of the generated estimate is removed, and ${\hat{x}}_{0 | t}$ always satisfies the consistency constraint. The architecture of the degradation-aware correction is shown in Figure 1. We rectified $x_{t - 1}$ in each inverse step to achieve efficient spectral reconstruction; the whole sampling process is given in Algorithm 1.

Algorithm 1 Simple sampling
1: Input: $x_{T} \sim N (0, I)$, degraded image $y = H x$
2: Output: Reconstructed HSI $x_{r}$
3: for $t = T, \ldots, 1$ do
4: $x_{0 | t} = \frac{1}{\sqrt{{\bar{α}}_{t}}} (x_{t} - ϵ_{θ} (x_{t}, t) \sqrt{1 - {\bar{α}}_{t}})$	▹ Equation (4)
5: ${\hat{x}}_{0 | t} = x_{0 | t} - H^{†} (H x_{0 | t} - y)$	▹ update ${\hat{x}}_{0 | t}$ via Equation (8)
6: $x_{t - 1} \sim p (x_{t - 1} | x_{t}, {\hat{x}}_{0 | t})$	▹ Equation (3)
7: end for
8: return $x_{r} = x_{0}$
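The sampling loop of Algorithm 1 can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' code: the random row-normalized `H` is a stand-in for a real spectral response function, `dummy_eps` is a placeholder for the trained noise-prediction network (it simply returns zeros), spectra are stored as columns of a (D, N) matrix, and a short 50-step schedule is used to keep the example fast. All of these names and values are assumptions.

```python
import numpy as np

def correct(x0_t, y, H, H_pinv):
    """Degradation-aware correction, Equation (8): x0_t - H^+ (H x0_t - y)."""
    return x0_t - H_pinv @ (H @ x0_t - y)

def simple_sampling(y, H, eps_model, betas, rng):
    """Sketch of Algorithm 1; y has shape (d, N), the result has shape (D, N)."""
    H_pinv = np.linalg.pinv(H)
    alpha_bar = np.cumprod(1.0 - betas)
    x_t = rng.standard_normal((H.shape[1], y.shape[1]))   # x_T ~ N(0, I)
    for t in range(len(betas), 0, -1):
        beta_t, a_bar_t = betas[t - 1], alpha_bar[t - 1]
        a_bar_prev = alpha_bar[t - 2] if t > 1 else 1.0
        # Equation (4): predict x_0 from the current noisy sample.
        x0_t = (x_t - np.sqrt(1.0 - a_bar_t) * eps_model(x_t, t)) / np.sqrt(a_bar_t)
        # Equation (8): enforce consistency with the observed RGB values.
        x0_hat = correct(x0_t, y, H, H_pinv)
        # Equation (3): sample x_{t-1} from the posterior given x_t and x0_hat.
        mean = (np.sqrt(a_bar_prev) * beta_t * x0_hat
                + np.sqrt(1.0 - beta_t) * (1.0 - a_bar_prev) * x_t) / (1.0 - a_bar_t)
        sigma_t = np.sqrt((1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t)
        x_t = mean + sigma_t * rng.standard_normal(x_t.shape)
    return x_t

rng = np.random.default_rng(0)
H = rng.uniform(0.0, 1.0, size=(3, 31))        # hypothetical 3 x 31 response function
H /= H.sum(axis=1, keepdims=True)
x_true = rng.uniform(0.0, 1.0, size=(31, 16))  # 16 toy pixels with 31 bands each
y = H @ x_true                                 # noise-free degradation, Equation (6)
dummy_eps = lambda x_t, t: np.zeros_like(x_t)  # placeholder for the trained network
betas = np.linspace(1e-4, 1e-2, 50)            # shortened schedule for illustration
x_rec = simple_sampling(y, H, dummy_eps, betas, rng)
```

Because the posterior variance vanishes at the last step, the returned sample equals the final corrected estimate, so `H @ x_rec` reproduces `y` up to floating-point error even with the placeholder network. The null-space content, which determines how close `x_rec` is to `x_true`, depends entirely on the quality of the noise predictor.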

3.3.2. Noise-Related Correction

Degradation in realistic scenarios is usually noisy due to the effects of the sensor, lighting, and exposure time, so the correction requires special modifications. In the noisy case, $y = H x + n$, where $n \sim N (0, σ_{y}^{2} I)$ is Gaussian noise; substituting $y$ into Equation (8), we obtain

{\hat{x}}_{0 | t} = x_{0 | t} - H^{†} (H x_{0 | t} - H x) + H^{†} n

(9)

where $H^{†} n$ is the noise bias, which is amplified along the Markov chain during the inverse process. Given that $n$ is sampled from a Gaussian distribution, $H^{†} n$ also follows a Gaussian distribution with variance $ω^{2} σ_{y}^{2}$, where $ω$ can be estimated given $H$. To reduce the impact of the noise bias, the sampling is modified as

{\hat{x}}_{0 | t} = x_{0 | t} - λ_{t} H^{†} (H x_{0 | t} - y)

(10)

\hat{p} (x_{t - 1} | x_{t}, {\hat{x}}_{0 | t}) = N (x_{t - 1}; μ_{t} (x_{t}, {\hat{x}}_{0 | t}), (1 - κ_{t}) σ_{t}^{2} I)

(11)

where $λ_{t}$ and $κ_{t}$ are subject to two constraints: (i) $λ_{t}$ needs to be as close to 1 as possible to ensure consistency; and (ii) the $x_{t - 1}$ sampled from Equation (11) should keep the same variance $σ_{t}^{2}$ as in Equation (3). If we substitute the modified ${\hat{x}}_{0 | t}$ from Equation (10) into $μ_{t}$ in Equation (3), the bias $λ_{t} H^{†} n$ introduces an additional variance ${(ω a_{t} λ_{t} σ_{y})}^{2}$, where $a_{t} = \frac{\sqrt{{\bar{α}}_{t - 1}} β_{t}}{1 - {\bar{α}}_{t}}$ is the coefficient of $x_{0 | t}$ in $μ_{t}$. Constraint (ii) can therefore be formulated as $σ_{t}^{2} = (1 - κ_{t}) σ_{t}^{2} + {(ω a_{t} λ_{t} σ_{y})}^{2}$. Combining the two constraints, we set $λ_{t}$ and $κ_{t}$ as follows:

\begin{cases} λ_{t} = 1, \quad κ_{t} = {(ω a_{t} σ_{y} / σ_{t})}^{2}, & σ_{t} \geq ω a_{t} σ_{y} \\ κ_{t} = 1, \quad λ_{t} = σ_{t} / (ω a_{t} σ_{y}), & σ_{t} < ω a_{t} σ_{y} \end{cases}

(12)

The whole process of noise-related sampling is given in Algorithm 2. The motivation is to preserve consistency ($λ_{t} = 1$) when the current noise level $σ_{t}$ is at least as large as the bias term $ω a_{t} σ_{y}$, and otherwise to scale the correction down so that the variance of Equation (11) still matches that of Equation (3).

Algorithm 2 Noise-related sampling
1: Input: $x_{T} \sim N (0, I)$, noisy degraded image $y = H x + n$
2: Output: Reconstructed HSI $x_{r}$
3: for $t = T, \ldots, 1$ do
4: Update $κ_{t}, λ_{t}$ via Equation (12)
5: $x_{0 | t} = \frac{1}{\sqrt{{\bar{α}}_{t}}} (x_{t} - ϵ_{θ} (x_{t}, t) \sqrt{1 - {\bar{α}}_{t}})$	▹ Equation (4)
6: ${\hat{x}}_{0 | t} = x_{0 | t} - λ_{t} H^{†} (H x_{0 | t} - y)$	▹ update ${\hat{x}}_{0 | t}$ via Equation (10)
7: $x_{t - 1} \sim \hat{p} (x_{t - 1} | x_{t}, {\hat{x}}_{0 | t})$	▹ Equation (11)
8: end for
9: return $x_{r} = x_{0}$
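The case analysis of Equation (12) is easy to check numerically. The sketch below is an illustration under assumptions: the function name is invented here, $ω$ is passed in as a given scalar because its estimation from $H$ is not spelled out in this section, and the guard for a zero bias term is an added convenience rather than part of the equation.

```python
import numpy as np

def noise_related_weights(sigma_t, a_t, sigma_y, omega):
    """Return (lambda_t, kappa_t) following Equation (12).

    sigma_t : posterior standard deviation at step t (Equation (3))
    a_t     : coefficient of x_{0|t} in the posterior mean
    sigma_y : standard deviation of the measurement noise n
    omega   : scale of H^+ n relative to n (assumed to be given)
    """
    bias_std = omega * a_t * sigma_y
    if bias_std == 0.0:
        # No measurement noise: plain correction, no variance adjustment.
        return 1.0, 0.0
    if sigma_t >= bias_std:
        # Enough sampling noise: keep full consistency and shrink the added variance.
        return 1.0, (bias_std / sigma_t) ** 2
    # Bias dominates: add no extra noise and scale the correction down.
    return sigma_t / bias_std, 1.0

def total_variance(sigma_t, a_t, sigma_y, omega, lam, kap):
    """Right-hand side of constraint (ii)."""
    return (1.0 - kap) * sigma_t ** 2 + (omega * a_t * lam * sigma_y) ** 2

lam_hi, kap_hi = noise_related_weights(sigma_t=0.1, a_t=0.5, sigma_y=0.01, omega=2.0)
lam_lo, kap_lo = noise_related_weights(sigma_t=0.005, a_t=0.5, sigma_y=0.01, omega=2.0)
```

In both branches, `total_variance` evaluates to `sigma_t ** 2`, which is exactly constraint (ii).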

3.3.3. JPEG-Related Correction

JPEG compression [37] is a common nonlinear operator in realistic scenarios, as shown in Figure 4. The degradation process in this paper can be extended from the noisy case as follows:

y = \mathrm{decoder} (\mathrm{encoder} (H x)) + n

(13)

\mathrm{encoder} (\mathrm{decoder} (\mathrm{encoder} (x))) \approx \mathrm{encoder} (x)

(14)

where $\mathrm{encoder} (\cdot)$ and $\mathrm{decoder} (\cdot)$ denote the encoding and decoding processes of JPEG compression; as Equation (14) shows, they have a property similar to that of the linear operator $H$, for which $H H^{†} H = H$. We propose to extend the correction further for JPEG-related scenarios by replacing ${\hat{x}}_{0 | t}$ in Equation (10) with

{\hat{x}}_{0 | t} = x_{0 | t} - λ_{t} H^{†} (\mathrm{decoder} (\mathrm{encoder} (H x_{0 | t}))) + H^{†} y

(15)

In the following sections, the JPEG compression quality factor, which controls the compression ratio of the image, is denoted as $q$.

4. Experiments

4.1. Dataset

In this work, we trained our DDSR on the ARAD-1K dataset [27], which provides images measuring 482 × 512 and each HSI consists of 31 channels corresponding to wavelengths ranging from 400 nm to 700 nm. The ARAD-1K dataset contains 1000 HSI and RGB pairs, which are divided into sizes of 900, 50, and 50 for training, testing, and validating, respectively. We further validated the effectiveness of our method on five additional publicly available datasets, namely CAVE [38], Foster [39], KAUST [40], NUS [11], and ICVL [9]. For each dataset, we chose the channels corresponding to 400–700 nm. We used all the HSIs of the CAVE and Foster datasets, and selected 12, 25, and 100 HSIs from the NUS, ICVL, and KAUST datasets, respectively. For a fair comparison, the same degradation process was adopted to obtain training and testing RGB images. In order to further demonstrate the generalization capability of our method, we adopted two spectral response functions, one provided by the 2022 NTIRE Spectral Recovery Challenge and the other being the CIE 1931 color matching function.

4.2. Implementation Detail

The total number of timesteps in the diffusion model was set to T = 1000. $β_{1}$ and $β_{T}$ in Equation (1) were set to $1 \times 10^{- 4}$ and $1 \times 10^{- 2}$, respectively, and $β_{1 : T}$ increased linearly with the timestep in both the training and testing phases. The number of ResBlocks in each layer of the U-Net was set to two, and the numbers of channels in the U-Net were set to [64, 128, 256, 512]. We used Adam as the optimizer with $β$ = (0.9, 0.999); the initial learning rate was set to $1 \times 10^{- 4}$ and was decayed with a cosine annealing schedule until it reached $1 \times 10^{- 6}$. We used four NVIDIA Tesla P100 GPUs as the hardware platform for all experiments, and the software was implemented using PyTorch 1.8.0. The model was first trained on images measuring 256 × 256 with a batch size of 16 and then fine-tuned on images measuring 482 × 482 with a batch size of four. Additionally, each RGB–HSI pair was rescaled to the range [−1, 1] for training.

During the testing phase, the ARAD, CAVE, and KAUST datasets were tested using the full image size, and the images of the remaining datasets were cropped to 512 × 512. To objectively evaluate the performance of our proposed method, we adopted the three quantitative metrics used in the 2022 NTIRE Spectral Recovery Challenge. The first metric is the mean relative absolute error (MRAE), which computes the pixel-wise relative distance over all channels between the reconstructed and ground-truth HSIs and can be formulated as

MRAE (x, \hat{x}) = \frac{1}{N} \sum_{i = 1}^{N} \frac{| x [i] - \hat{x} [i] |}{x [i]},

(16)

where $\hat{x} \in R^{H \times W \times C_{λ}}$ denotes the reconstructed HSI cube and $N = H \times W \times C_{λ}$ denotes the total number of elements in the cube. x is the ground truth and has the same size as $\hat{x}$. The second metric is the root mean square error (RMSE), which is defined as

RMSE (x, \hat{x}) = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(x [i] - \hat{x} [i])}^{2}} .

(17)

The last metric is the Peak Signal-to-Noise Ratio (PSNR):

PSNR (x, \hat{x}) = 20 \cdot {log}_{10} (\frac{1}{RMSE (x, \hat{x})})

(18)

If the value of x is too small (in sparse or dark regions), the MRAE can become very large, which causes gradient problems during training. Therefore, we chose the $L_{2}$ loss to train the neural network (see Appendix A.2 for details).
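The three metrics in Equations (16) to (18) translate directly into NumPy. The sketch below assumes that both cubes are arrays of the same shape, that the ground truth is strictly positive wherever MRAE is evaluated, and that intensities lie in [0, 1] so that the peak value in the PSNR is 1; the function names and the toy cube are chosen here for illustration.

```python
import numpy as np

def mrae(x, x_hat):
    """Mean relative absolute error, Equation (16); x must be nonzero."""
    return np.mean(np.abs(x - x_hat) / x)

def rmse(x, x_hat):
    """Root mean square error, Equation (17)."""
    return np.sqrt(np.mean((x - x_hat) ** 2))

def psnr(x, x_hat):
    """Peak signal-to-noise ratio with peak value 1, Equation (18)."""
    return 20.0 * np.log10(1.0 / rmse(x, x_hat))

# Toy cube: every ground-truth value is 0.5 and every prediction is 0.4.
x = np.full((4, 4, 31), 0.5)
x_hat = np.full((4, 4, 31), 0.4)
scores = (mrae(x, x_hat), rmse(x, x_hat), psnr(x, x_hat))

# The same absolute error of 0.1 on a dark pixel of value 0.01 gives a
# relative error of 10, which illustrates why MRAE is sensitive to dark regions.
dark_mrae = mrae(np.array([0.01]), np.array([0.11]))
```

For the toy cube, the relative error is 0.1 / 0.5 = 0.2, the RMSE is 0.1, and the PSNR is 20 dB.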

4.3. Baseline

We compared our proposed DDSR method with five state-of-the-art methods, including four deep learning-based spectral reconstruction algorithms (AWAN [15], HDNet [41], HRNet [29], and MST++ [19]) and an efficient image restoration model, Restormer [18]; MST++ and AWAN were the winners of the NTIRE 2022 and 2020 Spectral Reconstruction Challenges, respectively. To evaluate our method more fully, we also compared it with a conditional diffusion model [22] and a SOTA diffusion model for hyperspectral image restoration [31]. For a fair comparison, all the methods were trained using the same data and settings as our method.

4.4. Experiment Results

4.4.1. Quantitative Result

We compared our method with five state-of-the-art methods on six datasets. In testing, the RGB images were generated using the spectral response function provided by the 2022 NTIRE Spectral Recovery Challenge, the same as in training. Table 1, Table 2 and Table 3 present the quantitative results of our DDSR and the compared models on all the datasets. All metrics are averaged over each dataset; our method achieves the best or second-best RMSE and PSNR results on most of the datasets, which demonstrates its effectiveness.

The quantitative results for the KAUST and NUS datasets are shown in Table 1; our method outperforms the compared methods. Specifically, compared with the second-best method on the KAUST dataset, our DDSR achieves improvements of 7.2% and 2.1% in RMSE and PSNR, respectively. The results in Table 2 and Table 3 show that, in terms of RMSE and PSNR, our method achieves the best results on the CAVE and ICVL datasets and the second-best results on the ARAD-1K and Foster datasets, where it remains competitive with the best results.

However, the MRAE performance is slightly worse than that of other methods. This is because our method has a relatively larger error in sparse or dark regions, which leads to a larger MRAE when the error is divided by small pixel values. The reason is as follows: the pixel values of $y$ in sparse and dark regions tend to 0, which means that the information provided by $H^{†} y$ for the correction is limited, so the reconstruction results are relatively worse. This is further illustrated in Figure 5, where the trees in the first and second columns and the windows in the fifth column, marked with red boxes, are regions with relatively large MRAE values; however, the $L_{1}$ loss in these regions is relatively small.

We also compared the results with two diffusion-based methods; the results are shown in Table 4. These diffusion methods for image restoration do not perform well in hyperspectral reconstruction. This is because the lack of training data leads to insufficient model learning, while the input RGB image provides only limited information, which further increases the difficulty of recovering hyperspectral information from random noise.

4.4.2. Qualitative Results

In this section, we compare the reconstruction quality of different methods visually. In Figure 6, the curve of our result shows the highest correlation and overlap with the ground-truth curve, which validates the effectiveness of our method. Figure 7 shows five slices of a reconstructed image from the KAUST dataset. The results of our method are perceptually competitive and pleasing, whereas some of the compared methods produce unpleasant artifacts, and the contrast of their reconstructed images deviates from the ground truth. The curves in Figure 7 correspond to the average RMSE values for each channel at the green box position of the RGB image. Overall, our method achieves better results.

The error maps of the reconstructed HSIs of all methods with respect to the ground truth are presented in Figure 8, Figure 9 and Figure 10. The compared methods show limitations in fine-grained reconstruction, with the error mainly concentrated in the detailed parts of the scene, while the reconstruction error of our method in these detailed regions remains low. We infer that this is because the degradation-based correction helps the model to better capture image details while ensuring the reconstruction quality.

4.5. Ablation Study

We conducted various ablation studies to verify the effect of different modules in the proposed method, including the effect of the noise-related sample, different amounts and ranges of correction, and the JPEG-related sample.

Table 5 shows the effectiveness of our noise-related method under different noise levels. Compared with simple sampling, the noise-related method improves the performance at the same noise level and maintains a similar level of MRAE and PSNR under larger noise. Even at $σ_{y}$ = 0.01, our noise-related method achieves results comparable to those of simple sampling at $σ_{y}$ = 0.005. This is because ${\hat{x}}_{0 | t}$ is adjusted according to the noise level, which helps to reduce the impact of noise.

Table 6 presents the results of our method under different amounts and ranges of correction, indicating that the more corrections are performed, the better the results. In addition, for the same number of corrections, the earlier they are performed (that is, at larger $t$ in the reverse process), the better the results. We infer that our approach helps the diffusion model to learn the distribution of HSIs more efficiently, while early correction provides a more accurate $x_{t - 1}$ for the subsequent sampling steps and ultimately yields a more reasonable $x_{0}$.

Table 7 shows that our JPEG-related method reduces MRAE (from 0.558 to 0.521 at $q = 40$ and from 0.477 to 0.456 at $q = 60$), with a slight improvement in RMSE and PSNR. We infer that the proposed JPEG-related method handles sparse or dark areas more effectively without affecting the accuracy of other regions.

4.6. Generalization Ability

To demonstrate the generalization ability of DDSR to different degradation patterns, in this section we generated test images with a different degradation pattern, namely a different response function based on the CIE 1931 color matching function. The quantitative results on various datasets are shown in Table 8 and Table 9. Our method outperforms the compared data-driven approaches and maintains a similar level of performance when a different degradation pattern is employed, demonstrating a promising generalization ability. We attribute this to the fact that the neural network in our method does not take RGB images as inputs; they are only provided as prior information for the correction, so the model can adapt flexibly to various degradation processes. In contrast, the end-to-end trained methods have difficulty maintaining stable performance.
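To make the notion of a degradation pattern concrete, the following sketch projects the same pixel spectrum to RGB with two different spectral response matrices. It assumes a linear model y = Hx + n with Gaussian noise of standard deviation σ_y and omits JPEG compression. The Gaussian-shaped response curves, the 31-band spectrum, and the names `gaussian_response` and `degrade` are hypothetical stand-ins; they are neither the CIE 1931 functions nor the camera response used in the experiments.

```python
import math
import random


def gaussian_response(num_bands, centers, width):
    """Build a 3 x num_bands response matrix H from row-normalized Gaussian curves."""
    H = []
    for c in centers:
        row = [math.exp(-0.5 * ((b - c) / width) ** 2) for b in range(num_bands)]
        total = sum(row)
        H.append([v / total for v in row])
    return H


def degrade(H, spectrum, sigma_y, rng):
    """Project one pixel spectrum to RGB: y = H x + n, with n ~ N(0, sigma_y^2)."""
    return [
        sum(h * x for h, x in zip(row, spectrum)) + rng.gauss(0.0, sigma_y)
        for row in H
    ]


rng = random.Random(0)
num_bands = 31  # assumed band count for this toy example
spectrum = [0.2 + 0.5 * b / (num_bands - 1) for b in range(num_bands)]  # toy ramp

# Two hypothetical degradation patterns: one "seen" and one "unseen".
H_train = gaussian_response(num_bands, centers=(25, 15, 5), width=4.0)
H_test = gaussian_response(num_bands, centers=(22, 13, 7), width=6.0)

rgb_train = degrade(H_train, spectrum, sigma_y=0.005, rng=rng)
rgb_test = degrade(H_test, spectrum, sigma_y=0.005, rng=rng)
print(rgb_train)
print(rgb_test)
```

The same spectrum yields different RGB values under the two matrices. A method that uses H only inside a correction step could, in principle, swap `H_train` for `H_test` at inference time without retraining, whereas an end-to-end mapping has the training response baked into its weights.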

5. Conclusions

In this paper, we proposed a degradation-aware diffusion model for spectral reconstruction from RGB images, addressing the unstable generalization ability of end-to-end models, while exploiting the powerful ability of diffusion models. The degradation process from HSI to RGB was modeled as a combination of multiple degradation operators, which were used to guide the inverse process of diffusion; specifically, the key was to reduce the error caused by the prediction uncertainty in each inverse prediction step by utilizing our proposed degradation-aware correction. Based on the complex degradation of realistic scenarios, we further proposed correction methods related to noise and JPEG compression. Finally, we obtained an efficient solver for spectral reconstruction, which is robust to different degradation patterns. We conducted quantitative and qualitative experiments to demonstrate the performance and generalization ability of our method, as well as ablation experiments to validate the effectiveness of the proposed method. In the future, we will focus on the following aspects: (1) how to deal with more complex degraded environments; and (2) how to improve the reconstruction accuracy for dark regions.

Author Contributions

Methodology, Y.C.; Resources, X.Z.; Writing—original draft, Y.C.; Writing—review & editing, X.Z.; Visualization, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available in [ARAD-1K, CAVE, Foster, KAUST, NUS, ICVL] at [https://www1.cs.columbia.edu/CAVE/databases/multispectral, https://doi.org/10.48420/14877285, https://repository.kaust.edu.sa/items/891485b4-11d2-4dfc-a4a6-69a4912c05f1, https://doi.org/10.1007/978-3-319-10584-0_13, https://doi.org/10.1007/978-3-319-46478-7_2, all accessed on 11 July 2024], reference number [8,26,36,37,38,39].

Acknowledgments

The authors acknowledge support in part from the Ministry of Science and Technology of China (MOST) Major Program on New Generation of Artificial Intelligence 2030 (No. 2018AAA0102200), in part from the National Natural Science Foundation of China under Grant 91850202, and in part from the State Key Laboratory of Radio Frequency Heterogeneous Integration (Independent Scientific Research Program No. 2024002).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Background

First of all, the probability density function of the Gaussian distribution can be expressed as

\begin{matrix} f (x) = \frac{1}{σ \sqrt{2 π}} e^{- \frac{{(x - μ)}^{2}}{2 σ^{2}}} \end{matrix}

(A1)

where $μ$ and $σ^{2}$ denote the mean and variance, respectively. Given Equation (A1), the posterior distribution in Equation (3) can be expanded as

\begin{matrix} q (x_{t - 1} ∣ x_{t}, x_{0}) & = \frac{q (x_{t} ∣ x_{t - 1}, x_{0}) q (x_{t - 1} ∣ x_{0})}{q (x_{t} ∣ x_{0})} \\ & = \frac{N (x_{t}; \sqrt{α_{t}} x_{t - 1}, (1 - α_{t}) I) N (x_{t - 1}; \sqrt{{\bar{α}}_{t - 1}} x_{0}, (1 - {\bar{α}}_{t - 1}) I)}{N (x_{t}; \sqrt{{\bar{α}}_{t}} x_{0}, (1 - {\bar{α}}_{t}) I)} \\ & \propto \exp (- \frac{1}{2} (\frac{{(x_{t} - \sqrt{α_{t}} x_{t - 1})}^{2}}{β_{t}} + \frac{{(x_{t - 1} - \sqrt{{\bar{α}}_{t - 1}} x_{0})}^{2}}{1 - {\bar{α}}_{t - 1}} - \frac{{(x_{t} - \sqrt{{\bar{α}}_{t}} x_{0})}^{2}}{1 - {\bar{α}}_{t}})) \\ & \propto \exp (- \frac{1}{2} ((\frac{α_{t}}{β_{t}} + \frac{1}{1 - {\bar{α}}_{t - 1}}) x_{t - 1}^{2} - (\frac{2 \sqrt{α_{t}}}{β_{t}} x_{t} + \frac{2 \sqrt{{\bar{α}}_{t - 1}}}{1 - {\bar{α}}_{t - 1}} x_{0}) x_{t - 1})) \end{matrix}

(A2)

Clearly, the posterior distribution is still Gaussian, denoted as $p (x_{t - 1} | x_{t}, x_{0}) = N (x_{t - 1}; μ_{t} (x_{t}, x_{0}), σ_{t}^{2} I)$. Matching the coefficients of $x_{t - 1}^{2}$ and $x_{t - 1}$ in Equation (A2) with those of the Gaussian density in Equation (A1), $μ_{t} (x_{t}, x_{0})$ and $σ_{t}^{2}$ can be obtained as follows:

\begin{matrix} \frac{1}{σ_{t}^{2}} & = \frac{α_{t}}{β_{t}} + \frac{1}{1 - {\bar{α}}_{t - 1}} \\ σ_{t}^{2} & = \frac{1 - {\bar{α}}_{t - 1}}{1 - {\bar{α}}_{t}} \cdot β_{t} \end{matrix}

(A3)

and

\begin{matrix} \frac{2 μ_{t} (x_{t}, x_{0})}{σ_{t}^{2}} & = \frac{2 \sqrt{α_{t}}}{β_{t}} x_{t} + \frac{2 \sqrt{{\bar{α}}_{t - 1}}}{1 - {\bar{α}}_{t - 1}} x_{0} \\ μ_{t} (x_{t}, x_{0}) & = \frac{\sqrt{α_{t}} (1 - {\bar{α}}_{t - 1})}{1 - {\bar{α}}_{t}} x_{t} + \frac{\sqrt{{\bar{α}}_{t - 1}} β_{t}}{1 - {\bar{α}}_{t}} x_{0} \end{matrix}

(A4)
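The closed forms in Equations (A3) and (A4) can be checked numerically against the quadratic coefficients in the last line of Equation (A2). The sketch below does this for scalar inputs, assuming the usual relations β_t = 1 − α_t and ᾱ_t = α_t ᾱ_{t−1}; the function names and the schedule values are hypothetical and chosen only for illustration.

```python
import math


def posterior_closed_form(alpha_t, alpha_bar_prev, x_t, x_0):
    """Posterior mean and variance from the closed forms in (A3) and (A4)."""
    beta_t = 1.0 - alpha_t                    # assumed: beta_t = 1 - alpha_t
    alpha_bar_t = alpha_t * alpha_bar_prev    # assumed: cumulative product
    var = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * beta_t
    mean = (
        math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * x_t
        + math.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t) * x_0
    )
    return mean, var


def posterior_from_quadratic(alpha_t, alpha_bar_prev, x_t, x_0):
    """Same quantities read off the quadratic in x_{t-1} in (A2)."""
    beta_t = 1.0 - alpha_t
    # Coefficient of x_{t-1}^2 equals 1 / sigma_t^2.
    a = alpha_t / beta_t + 1.0 / (1.0 - alpha_bar_prev)
    # Coefficient of x_{t-1} equals 2 * mu_t / sigma_t^2.
    b = (
        2.0 * math.sqrt(alpha_t) / beta_t * x_t
        + 2.0 * math.sqrt(alpha_bar_prev) / (1.0 - alpha_bar_prev) * x_0
    )
    return b / (2.0 * a), 1.0 / a


# Hypothetical values for one reverse step.
alpha_t, alpha_bar_prev, x_t, x_0 = 0.98, 0.60, 0.35, 0.50
closed = posterior_closed_form(alpha_t, alpha_bar_prev, x_t, x_0)
quad = posterior_from_quadratic(alpha_t, alpha_bar_prev, x_t, x_0)
print("closed form:", closed)
print("quadratic:  ", quad)
```

Both routes give the same mean and variance up to floating-point error, which is a quick sanity check on the algebra between the first and second lines of Equations (A3) and (A4).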

Appendix A.2. Training the Neural Network

The training process uses only HSIs. Specifically, we uniformly sample a timestep $t \sim \{0, 1, \ldots, T\}$, draw $x_{t}$ according to Equation (2), and feed it to the network, which predicts the noise added to the image. The training objective can be expressed as

L_{2} = {∥ϵ_{θ} (x_{t}, t) - ϵ∥}_{2}

(A5)

where ${∥ \cdot ∥}_{2}$ denotes the $L_{2}$ loss. The training process of the neural network is summarized in Algorithm A1.

Algorithm A1 Training process

1: Input: $x_{0}$
2: repeat
3: $t \sim \{0, \ldots, T\}$
4: $x_{t} = \sqrt{{\bar{α}}_{t}} x_{0} + \sqrt{1 - {\bar{α}}_{t}} ϵ, ϵ \sim N (0, I)$
5: Take a gradient descent step on
6: $\nabla_{θ} {∥ϵ_{θ} (x_{t}, t) - ϵ∥}_{2}$
7: until converged
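The loop in Algorithm A1 can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: a two-parameter linear predictor stands in for the U-Net $ϵ_{θ}$ and ignores the timestep input, the data are scalars rather than HSIs, the noise schedule is a linear β schedule, and the squared error is used as the loss. The names `make_alpha_bar` and `train_step` are hypothetical.

```python
import math
import random


def make_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of alpha_t = 1 - beta_t for an assumed linear schedule."""
    alpha_bar, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def train_step(params, x0, alpha_bar, rng, lr=0.01):
    """One loop iteration: sample t and eps, form x_t, take a gradient step."""
    w, b = params
    t = rng.randrange(len(alpha_bar))          # uniform timestep index
    eps = rng.gauss(0.0, 1.0)                  # eps ~ N(0, 1)
    x_t = math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1.0 - alpha_bar[t]) * eps
    residual = (w * x_t + b) - eps             # toy eps_theta(x_t) minus eps
    loss = residual ** 2
    # Analytic gradient of the squared error for the toy linear predictor.
    w -= lr * 2.0 * residual * x_t
    b -= lr * 2.0 * residual
    return (w, b), loss


rng = random.Random(0)
alpha_bar = make_alpha_bar(T=1000)
params = (0.0, 0.0)
losses = []
for _ in range(3000):
    x0 = rng.uniform(0.0, 1.0)                 # toy scalar stand-in for an HSI sample
    params, loss = train_step(params, x0, alpha_bar, rng)
    losses.append(loss)

print("mean loss, last 500 steps:", sum(losses[-500:]) / 500)
```

Note that only clean samples $x_{0}$ enter the loop; no RGB image is used at training time, which matches the statement that the training process uses only HSIs.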

References

1. Khan, U.; Paheding, S.; Elkin, C.P.; Devabhaktuni, V.K. Trends in Deep Learning for Medical Hyperspectral Image Analysis. IEEE Access 2021, 9, 79534–79548.
2. Zhao, H.; Wang, X.; Li, J.; Zhong, Y. Class Prior-Free Positive-Unlabeled Learning with Taylor Variational Loss for Hyperspectral Remote Sensing Imagery. In Proceedings of the International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023.
3. Chen, S.; Li, X.; Yan, Y. Hyperspectral Anomaly Detection with Auto-Encoder and Independent Target. Remote Sens. 2023, 15, 5266.
4. Basterretxea, K.; Martinez, V.; Echanobe, J.; Gutierrez-Zaballa, J.; Del Campo, I. HSI-Drive: A Dataset for the Research of Hyperspectral Image Processing Applied to Autonomous Driving Systems. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–17 July 2021.
5. Fu, Y.; Zhang, T.; Zheng, Y.; Zhang, D.; Huang, H. Joint Camera Spectral Sensitivity Selection and Hyperspectral Image Recovery. In Proceedings of the Computer Vision–ECCV 2018, Lecture Notes in Computer Science, Munich, Germany, 8–14 September 2018.
6. Zhang, J.; Su, R.; Fu, Q.; Ren, W.; Heide, F.; Nie, Y. A survey on computational spectral reconstruction methods from RGB to hyperspectral imaging. Sci. Rep. 2022, 12, 11905.
7. Fu, J.; Liu, J.; Zhao, R.; Chen, Z.; Qiao, Y.; Li, D. Maize disease detection based on spectral recovery from RGB images. Front. Plant Sci. 2023, 13, 1056842.
8. Kong, L.; Li, L.; Yuan, J.; Zhao, Y.; Dong, L.; Liu, M.; Zhao, Y.; Lu, T.; Chu, X. High-precision hemoglobin detection based on hyperspectral reconstruction of RGB images. Biomed. Signal Process. Control 2024, 91, 105904.
9. Arad, B.; Ben-Shahar, O. Sparse Recovery of Hyperspectral Signal from Natural RGB Images. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016.
10. Jia, Y.; Zheng, Y.; Gu, L.; Subpa-Asa, A.; Lam, A.; Sato, Y.; Sato, I. From RGB to Spectrum for Natural Scenes via Manifold-Based Mapping. In Proceedings of the International Conference on Computer Vision (ICCV), Zurich, Switzerland, 6–12 September 2014.
11. Nguyen, R.M.H.; Prasad, D.K.; Brown, M.S. Training-Based Spectral Reconstruction from a Single RGB Image. In Proceedings of the International Conference on Computer Vision (ICCV), Zurich, Switzerland, 6–12 September 2014.
12. Liao, X.; He, L.; Mao, J.; Xu, M. Spectral Superresolution Using Transformer with Convolutional Spectral Self-Attention. Remote Sens. 2024, 16, 1688.
13. Zhang, Y.; Zhang, L.; Song, R.; Tong, Q. A General Deep Learning Point–Surface Fusion Framework for RGB Image Super-Resolution. Remote Sens. 2024, 16, 139.
14. Wang, L.; Sole, A.; Hardeberg, J.Y. Densely Residual Network with Dual Attention for Hyperspectral Reconstruction from RGB Images. Remote Sens. 2022, 14, 3128.
15. Li, J.; Wu, C.; Song, R.; Li, Y.; Liu, F. Adaptive Weighted Attention Network with Camera Spectral Sensitivity Prior for Spectral Reconstruction from RGB Images. In Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020.
16. Shi, Z.; Chen, C.; Xiong, Z.; Liu, D.; Wu, F. HSCNN+: Advanced CNN-Based Hyperspectral Recovery from RGB Images. In Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020.
17. Li, J.; Du, S.; Wu, C.; Leng, Y.; Song, R.; Li, Y. DRCR Net: Dense Residual Channel Re-calibration Network with Non-local Purification for Spectral Super Resolution. In Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022.
18. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient Transformer for High-Resolution Image Restoration. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
19. Cai, Y.; Lin, J.; Lin, Z.; Wang, H.; Zhang, Y.; Pfister, H.; Timofte, R.; Gool, L. MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction. In Proceedings of the Computer Vision and Pattern Recognition (CVPR) Workshops, New Orleans, LA, USA, 19–20 June 2022.
20. Niu, B.; Wen, W.; Ren, W.; Zhang, X.; Yang, L.; Wang, S.; Zhang, K.; Cao, X.; Shen, H. Single Image Super-Resolution via a Holistic Attention Network. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020.
21. Yao, J.; Hong, D.; Chanussot, J.; Meng, D.; Zhu, X.; Xu, Z. Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020.
22. Saharia, C.; Ho, J.; Chan, W.; Salimans, T.; Fleet, D.J.; Norouzi, M. Image Super-Resolution via Iterative Refinement. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 2022, 45, 4713–4726.
23. Han, L.; Zhao, Y.; Lv, H.; Zhang, Y.; Liu, H.; Bi, G.; Han, Q. Enhancing Remote Sensing Image Super-Resolution with Efficient Hybrid Conditional Diffusion Model. Remote Sens. 2023, 15, 3452.
24. Zhang, H.; Chen, N.; Li, M.; Mao, S. The Crack Diffusion Model: An Innovative Diffusion-Based Method for Pavement Crack Detection. Remote Sens. 2024, 16, 986.
25. Chen, J.; Luo, Y.; Wang, J.; Tang, H.; Tang, Y.; Li, J. Elimination of Irregular Boundaries and Seams for UAV Image Stitching with a Diffusion Model. Remote Sens. 2024, 16, 1483.
26. Chen, J.; Jia, L.; Zhang, J.; Feng, Y.; Zhao, X.; Tao, R. Super-Resolution for Land Surface Temperature Retrieval Images via Cross-Scale Diffusion Model Using Reference Images. Remote Sens. 2024, 16, 1356.
27. Arad, B.; Timofte, R.; Yahel, R.; Morag, N.; Bernat, A.; Cai, Y.; Lin, J.; Lin, Z.; Wang, H.; Zhang, Y.; et al. NTIRE 2022 Spectral Recovery Challenge and Data Set. In Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022.
28. Zhu, Z.; Liu, H.; Hou, J.; Zeng, H.; Zhang, Q. Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild. In Proceedings of the International Conference on Computer Vision (ICCV), Virtual Conference, 11–17 October 2021.
29. Zhao, Y.; Po, L.M.; Yan, Q.; Liu, W.; Lin, T. Hierarchical Regression Network for Spectral Reconstruction from RGB Images. In Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020.
30. Wang, Y.; Yu, J.; Zhang, J. Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 25–29 April 2022.
31. Xiao, Y.; Yuan, Q.; Jiang, K.; He, J.; Jin, X.; Zhang, L. EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. (TGRS) 2023, 62, 5601514.
32. Wu, C.; Wang, D.; Mao, H.; Li, Y. HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models. In Proceedings of the International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023.
33. Dong, W.; Liu, S.; Xiao, S.; Qu, J.; Li, Y. ISPDiff: Interpretable Scale-Propelled Diffusion Model for Hyperspectral Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5519614.
34. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Neural Information Processing Systems (NIPS), Online Conference, 6–12 December 2020.
35. Sohl-Dickstein, J.; Weiss, E.A.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015.
36. Schwab, J.; Antholzer, S.; Haltmeier, M. Deep Null Space Learning for Inverse Problems: Convergence Analysis and Rates. Inverse Probl. 2019, 35, 025008.
37. Wallace, G. The JPEG still picture compression standard. IEEE Trans. Consum. Electron. 1992, 38, xviii–xxxiv.
38. Yasuma, F.; Mitsunaga, T.; Iso, D.; Nayar, S. Generalized Assorted Pixel Camera: Postcapture Control of Resolution, Dynamic Range, and Spectrum. IEEE Trans. Image Process. 2010, 19, 2241–2253.
39. Nascimento, S.M.; Amano, K.; Foster, D.H. Spatial distributions of local illumination color in natural scenes. Vis. Res. 2016, 120, 39–44.
40. Li, Y.; Fu, Q.; Heidrich, W. Multispectral illumination estimation using deep unrolling network. In Proceedings of the International Conference on Computer Vision (ICCV), Virtual Conference, 11–17 October 2021.
41. Hu, X.; Cai, Y.; Lin, J.; Wang, H.; Yuan, X.; Zhang, Y.; Timofte, R.; Gool, L. HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging. In Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022.

Figure 1. Description of the overall architecture of our proposed DDSR and degradation-aware correction.

Figure 2. The architecture of the neural network with parameter $θ$, which uses U-Net as a backbone and consists of a number of ResBlocks and attention blocks. We uniformly sample the timestep $t \sim \{0, 1, \ldots, T\}$, encode it sinusoidally, and pass it through fully connected layers to obtain the time embedding.

Figure 3. Structure of the ResBlock of the neural network; C, H, and W represent the number of channels, height, and width of the input, respectively.

Figure 4. Description of the degradation process in this paper, where $H x$ is the linear spectral downsampling, and JPEG denotes the JPEG compression.

Figure 5. Visualization of the MRAE (second and third rows) and $L_{1}$ loss (fourth and fifth rows) heatmaps of the reconstruction results of our method at 460 and 500 nm, sampled from the ARAD-1K dataset. The red boxes mark the regions where the MRAE is large but the $L_{1}$ loss is small.

Figure 6. Visualization of spectral density curve of selected samples corresponding to the green box from the Foster and CAVE datasets; corr represents the correlation with the ground-truth curve.

Figure 7. Visualization of slices of reconstructed image and RMSE curves for all methods, corresponding to the green box position in the RGB image on the KAUST dataset.

Figure 8. Visualization of the error heatmap of generated results on the NUS dataset. Please follow the color bar to find areas of large losses and zoom in for a better view.

Figure 9. Visualization of the error heatmap of generated results on the CAVE dataset. Please follow the color bar to find areas of large losses and zoom in for a better view.

Figure 10. Visualization of the error heatmap of the generated results on the ICVL dataset. Please follow the color bar to find areas of large losses and zoom in for a better view.

Table 1. The average quantitative results on the KAUST and NUS datasets. The degradation is set to $σ_{y}$ = 0.005 and $q = 75$. The best and second-best values are highlighted.

Method	KAUST MRAE	KAUST RMSE	KAUST PSNR	NUS MRAE	NUS RMSE	NUS PSNR
AWAN [15]	0.279	0.0894	22.435	0.192	0.0239	34.215
HDNet [29]	0.302	0.0878	22.639	0.201	0.0249	33.347
HRNet [41]	0.287	0.0867	22.752	0.204	0.0251	33.469
Restormer [18]	0.291	0.0856	22.936	0.212	0.0251	33.470
MST++ [19]	0.279	0.0855	23.592	0.205	0.0237	33.433
Ours	0.277	0.0797	24.102	0.191	0.0236	34.274

Table 2. The average quantitative results on the CAVE and Foster datasets. The degradation is set to $σ_{y}$ = 0.005 and $q = 75$. The best and second-best values are highlighted.

Method	CAVE MRAE	CAVE RMSE	CAVE PSNR	Foster MRAE	Foster RMSE	Foster PSNR
AWAN [15]	0.409	0.0389	29.333	0.331	0.0226	34.620
HDNet [29]	0.425	0.0352	30.132	0.339	0.0217	35.347
HRNet [41]	0.398	0.0360	29.757	0.300	0.0223	35.128
Restormer [18]	0.447	0.0347	29.867	0.294	0.0205	35.541
MST++ [19]	0.390	0.0332	30.485	0.311	0.0195	36.163
Ours	0.446	0.0331	30.496	0.389	0.0199	35.644

Table 3. The average quantitative results on the ARAD-1K and ICVL datasets. The degradation is set to $σ_{y}$ = 0.005 and $q = 75$. The best and second-best values are highlighted.

Method	ARAD-1K MRAE	ARAD-1K RMSE	ARAD-1K PSNR	ICVL MRAE	ICVL RMSE	ICVL PSNR
AWAN [15]	0.158	0.0222	34.950	0.201	0.0214	34.505
HDNet [29]	0.167	0.0224	35.542	0.196	0.0217	34.218
HRNet [41]	0.154	0.0224	34.918	0.188	0.0244	33.675
Restormer [18]	0.159	0.0207	36.258	0.215	0.0216	34.378
MST++ [19]	0.148	0.0219	35.305	0.198	0.0203	34.825
Ours	0.198	0.0215	35.586	0.213	0.0202	34.886

Table 4. The quantitative ablation results of our method and two diffusion-based methods on the ARAD-1K dataset. The degradation is set to $σ_{y} = 0.005$ and $q = 100$. The best and second-best values are highlighted.

Method	MRAE	RMSE	PSNR
SR3 [22]	1.442	0.0698	23.456
EDiffSR [31]	0.840	0.0861	21.736
Ours	0.198	0.0220	35.586

Table 5. Ablation study of our noise-related method with different noise cases on the ARAD-1K dataset, where $q = 100$. The best and second-best values are highlighted.

Sample Method	MRAE	RMSE	PSNR
Simple ( $σ_{y}$ = 0.01)	0.300	0.0219	34.642
Simple ( $σ_{y}$ = 0.005)	0.211	0.0195	36.385
Noise-related ( $σ_{y}$ = 0.01)	0.215	0.0191	36.699
Noise-related ( $σ_{y}$ = 0.005)	0.178	0.0192	36.914

Table 6. Ablation study of our method with different amounts and ranges of correction on the ARAD-1K dataset, where $σ_{y}$ = 0.005 and $q = 100$. The best and second-best values are highlighted. The $(a, b)$ in the Range column means that the correction is performed when $a \leq t < b$.

Amount	Range	MRAE	RMSE	PSNR	Range	MRAE	RMSE	PSNR
100	(400,500)	1.263	0.0792	22.367	(900,1000)	1.003	0.0820	22.965
250	(250,500)	0.918	0.0659	24.082	(750,1000)	0.770	0.0750	23.745
500	(0,500)	0.854	0.0597	25.802	(500,1000)	0.536	0.0529	26.403

Table 7. Ablation study of our JPEG-related method on the CAVE dataset, where $σ_{y}$ = 0.005. The best and second-best values are highlighted.

Sample Method	MRAE	RMSE	PSNR
Simple ( $q = 40$ )	0.558	0.0460	27.549
JPEG-related ( $q = 40$ )	0.521	0.0452	27.702
Simple ( $q = 60$ )	0.477	0.0420	28.327
JPEG-related ( $q = 60$ )	0.456	0.0414	28.594

Table 8. The average quantitative results on the KAUST and NUS datasets. The degradation is set to $σ_{y}$ = 0.005 and $q = 75$. The best and second-best values are highlighted.

Method	KAUST MRAE	KAUST RMSE	KAUST PSNR	ICVL MRAE	ICVL RMSE	ICVL PSNR
AWAN [15]	0.301	0.1351	18.371	0.328	0.0523	26.650
HDNet [29]	0.304	0.1304	18.631	0.361	0.0532	26.534
HRNet [41]	0.324	0.1378	18.139	0.354	0.0540	26.348
Restormer [18]	0.303	0.1309	18.285	0.352	0.0539	26.312
MST++ [19]	0.296	0.1313	18.573	0.342	0.0531	26.485
Ours	0.204	0.0689	24.979	0.236	0.0225	33.936

Table 9. The average quantitative results on the CAVE and Foster datasets. The degradation is set to $σ_{y}$ = 0.005 and $q = 75$. The best and second-best values are highlighted.

Method	CAVE MRAE	CAVE RMSE	CAVE PSNR	Foster MRAE	Foster RMSE	Foster PSNR
AWAN [15]	0.554	0.0791	22.836	0.427	0.0393	30.738
HDNet [29]	0.551	0.0674	24.067	0.452	0.0388	30.924
HRNet [41]	0.548	0.0770	22.927	0.424	0.0406	30.536
Restormer [18]	0.523	0.0690	23.815	0.417	0.0404	30.288
MST++ [19]	0.509	0.0666	24.114	0.414	0.0391	31.001
Ours	0.443	0.0354	29.771	0.390	0.0217	35.454

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

