Article

Unsupervised Hyperspectral Image Denoising via Spectral Learning Preference of Neural Networks

1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
3 Department of Mathematics, Hong Kong Baptist University, Hong Kong, China
4 BioSense Institute, University of Novi Sad, 21000 Novi Sad, Serbia
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(5), 742; https://doi.org/10.3390/rs18050742
Submission received: 29 December 2025 / Revised: 22 January 2026 / Accepted: 26 January 2026 / Published: 28 February 2026

Highlights

What are the main findings?
  • An unsupervised hyperspectral image denoising framework is proposed that exploits the inherent spectral learning preference of deep neural networks.
  • An adaptive early stopping strategy is designed to align with this learning behavior, enabling the model to fit only clean spectral signals, preventing overfitting to noise, and significantly improving denoising performance.
What are the implications of the main findings?
  • Without requiring clean reference data, the proposed approach provides a practical solution for real-world scenarios where noise-free hyperspectral observations are unavailable.
  • By leveraging intrinsic network priors rather than explicit noise modeling, the method offers strong adaptability across different noise types. This suggests a promising direction for robust and label-free hyperspectral image restoration.

Abstract

Existing hyperspectral denoising networks typically rely on large amounts of high-quality paired noisy–clean images for training, which are often unavailable. Moreover, the noise distribution in real hyperspectral images (HSIs) is complex and variable, making it challenging for existing networks to handle noise distributions not present in the training dataset, resulting in poor generalization. To address these issues, this paper proposes an unsupervised Hyperspectral image Denoising approach exploiting the spectral learning preference of neural networks with an adaptive early stopping strategy (termed HyDePre). Inspired by the Deep Image Prior, which reveals that neural networks tend to capture natural image structures before fitting noise, we observe that deep neural networks exhibit a similar learning preference in the spectral domain. Specifically, as training progresses, the network first fits smooth spectral feature curves and only later adapts to Gaussian noise and complex impulse noise. This observation provides an opportunity to use an early stopping strategy, allowing the network to fit only the clean spectral signals and thus achieve denoising. Our method does not require clean images for training, but instead optimizes network parameters to automatically learn prior spectral information from a single noisy image, modeling the intrinsic structure of the input data to uncover its underlying patterns. However, finding the optimal stopping point is challenging without access to clean images as sources of prior information. To tackle this challenge, we introduce an adaptive early stopping strategy based on the average spectral maximum variation of the reconstructed image, effectively preventing overfitting. The experimental results demonstrate that HyDePre outperforms existing methods in terms of both visual quality and quantitative metrics.

1. Introduction

Hyperspectral images (HSIs) have demonstrated enormous application potential in fields such as precision agriculture, environmental monitoring [1], and military reconnaissance, due to their unique advantage of integrating spectral and spatial information. However, limited by the complexity of imaging systems and external environments, raw HSI data inevitably suffers from various types of noise contamination, such as Gaussian noise, impulse noise, stripe noise, and deadlines. These noises not only seriously degrade the visual quality of the images but, more critically, significantly affect the accuracy of subsequent advanced tasks such as classification and unmixing. Therefore, researching efficient and robust algorithms for removing noise in hyperspectral images is of crucial theoretical and practical value.
Current hyperspectral image denoising methods can be roughly divided into two categories: model-based methods and deep learning-based methods. Model-based methods constrain the solution space by encoding the prior knowledge of HSIs in the form of regularization terms. Commonly used priors include local smoothness [2], non-local self-similarity [3], and the low-rankness in the spectral dimension [4]. Many works use total variation (TV) regularization to model the local smoothness prior, such as the nonconvex correlated total variation method (NCTV [5]), enhanced 3DTV method [6], and many other methods that integrate TV regularization, and the low-rank model, such as total-variation-regularized low-rank matrix factorization (LRTV [7]), the decomposition of joint low rankness and local smoothness plus sparse matrices (DCTV-RPCA [8]), representative coefficient total variation (RCTV [9]), and low-rank tensor decomposition with total variation (LRTDTV [10]) or weighted group sparsity regularization (LRSTV [11]). To utilize the non-local self-similarity prior, existing methods group similar blocks and then perform collaborative filtering (BM3D [3], BM4D [12], VBM4D [13], and NSP [14]) or low-rank approximation (GLF [15]) on them. Due to the high correlation in the spectral dimension, HSI data cubes have strong low-rank characteristics. Exploiting this characteristic, a significant amount of Gaussian noise can be removed using techniques such as double-factor regularized low-rank tensor factorization (LRTF-DFR) [16], low-rank matrix recovery (LRMR) [4], low-rank and sparse representations [17], nonconvex regularized low-rank and sparse matrix decomposition (NonRLRS) [18], and sparse representation of the underlying HSI, e.g., Kronecker-basis-representation (KBR) [19].
As for sparse noise, LRTF-DFR [16] minimizes the $\ell_1$ norm to promote sparsity, LRMR constrains the upper bound of the cardinality of the sparse noise component, and NonRLRS [18] introduces the normalized $\varepsilon$-penalty as a nonconvex regularizer. While these methods are effective in certain scenarios, they often struggle with complex noise patterns and are highly sensitive to parameter settings. Additionally, many of these methods are time-consuming due to the large sizes of HSIs and the iterative estimation procedures.
Recently, deep learning-based methods have gained popularity due to their ability to learn complex representations of noise and signals [20,21,22,23,24,25,26,27,28,29,30,31,32,33]. Deep learning-based methods include supervised learning and self-supervised/unsupervised learning. Supervised learning methods train deep neural networks on large-scale datasets to learn an end-to-end mapping from noisy images to clean images. Notable examples include the spatial–spectral gradient network (SSGN) [21], the CNN-based HSI denoising method (HSI-DeNet [22], HSID-CNN [20], 3DADCNN [23], and QRNN3D [24]), and the transformer-based HSI denoising method (Hider [27], TRQ3DNet [28], SST [29], HSDT [30], and SERT [31]). The performance of supervised learning models heavily relies on large-scale, high-quality paired training data. However, acquiring noise-free hyperspectral images in the real world is extremely challenging. To overcome the reliance on paired data, self-supervised and unsupervised denoising approaches have been developed [34,35,36,37,38,39,40,41,42]. Representative methods include using blind spot networks [35,43], using deep image prior [34], using implicit neural representation [39], using diffusion models [38], and so on. Despite their effectiveness, most existing self-supervised/unsupervised hyperspectral denoising methods rely on specific noise statistics assumptions or handcrafted internal priors, making them prone to underfitting or overfitting when confronted with complex or non-independent noise, and often leading to unstable denoising performance and the loss of spectral or spatial details. Among them, a representative and influential line of research is the Deep Image Prior (DIP) [34,44], which reveals that neural networks themselves encode implicit image priors and tend to fit structured image content before noise, even without external training data. 
From this perspective, the DIP mechanism can be understood as the network implicitly imposing a form of prior during optimization. In this work, we analyze the interaction between the learning behavior of neural networks and the intrinsic spectral characteristics of hyperspectral images, and argue that extending the DIP mechanism to the spectral domain enables more effective exploitation of spectral information. Under this formulation, during network optimization, clean and smooth spectral signatures are preferentially reconstructed, while noise is fitted only at later stages of training. Such a learning behavior reflects an inherent spectral learning preference of neural networks, which can be conceptually interpreted as a deep spectral prior in the wavelength domain and which constitutes the foundation of the proposed method in this paper.

1.1. Related Work

While traditional model-based methods remain highly effective and interpretable, deep learning–based approaches have attracted increasing attention for hyperspectral image denoising due to their ability to capture complex data characteristics. However, their efficacy typically relies on extensive datasets of paired noisy–clean training data, which are often impractical to acquire in hyperspectral scenarios. To circumvent this limitation, self-supervised and unsupervised learning approaches have emerged as promising alternatives. The core idea of self-supervised learning methods is to generate supervisory signals directly from the noisy observations and then use these signals for training. The pioneering Noise2Noise framework [45] demonstrated that denoising networks can be trained using image pairs sharing the same underlying signal but corrupted by independent noise. Yet, acquiring such paired noisy data remains a significant challenge in practical HSI applications. To address this, Zhu et al. [35] developed blind-spot networks utilizing per-band 2D convolutions (N2V-2D), 3D convolutions (N2V-3D), and spatial–spectral feature separation (N2V-Sep). These methods randomly mask a subset of pixels or spectral bands during training, compelling the network to reconstruct missing values via the surrounding spatial–spectral context [43,46]. Similarly, Zhuang et al. proposed the Eigenimage2Eigenimage (E2E) framework [36], which generates training pairs from a single noisy image via random neighbor subsampling for eigenimage denoising. Nevertheless, these self-supervised approaches are often predicated on certain noise assumptions. For instance, N2V-2D, N2V-3D, and N2V-Sep presuppose that the image signal is spatially correlated while the noise is independent, whereas E2E assumes zero-mean noise. Such assumptions, however, frequently fail to hold under the complex noise conditions inherent to real-world HSIs.
Deep Image Prior (DIP) [44] introduces a novel paradigm for unsupervised image denoising. The fundamental premise of DIP is that the architecture of an untrained, randomly initialized CNN acts as an implicit image prior. During the optimization process for a single noisy image, the network naturally captures low-frequency structural information before modeling high-frequency noise. Consequently, applying early stopping yields a denoised result before the network overfits the noise. The DIP paradigm requires no external training data and relies solely on the single noisy image, making it particularly advantageous for HSI denoising where clean data are scarce. Sidorov and Hardeberg were the first to apply DIP to HSI inverse problems and proposed the DIP2D and DIP3D models [34]. Then, Miao et al. integrated DIP into linear mixed models, employing U-Net and fully connected networks to represent abundance maps and endmembers, respectively [47]. Saragadam et al. further extended this approach to address general low-rank tensor and matrix factorization problems (DeepTensor, [41]), thereby establishing a unified framework for various HSI inverse tasks. However, existing DIP-based methods depend heavily on CNNs; while adept at capturing spatial locality and translation invariance, they often fail to fully exploit the high correlation and smoothness inherent in the spectral dimension. Furthermore, Chakrabarty et al. pointed out some failure cases for the traditional DIP [48]. Specifically, when an image is corrupted by periodic stripe noise with regular intervals and widths, the DIP model often struggles to distinguish true signal from artifacts. This limitation stems from the inherent inductive bias of the network architecture, which prioritizes the generation of low-frequency structures. 
Since such structured stripe noise exhibits low-frequency characteristics that are similar to natural spatial features, the network inevitably assimilates these artifacts during the early optimization stages, rendering standard early stopping ineffective. This poses a fundamental challenge for DIP-based methods operating solely in the spatial domain. Conversely, in the spectral dimension, the disparity between the signal and noise is significantly more pronounced: while noise typically manifests as high-frequency fluctuations, clean spectral signatures are characterized by inherent smoothness. This distinction motivates us to explore spectral correlations as a more robust prior for effective denoising. Another challenge is that their training process is very time-consuming [38,44], and it is difficult to accurately determine the optimal stopping point. To solve this problem, we introduce an adaptive early stopping strategy to prevent the network from overfitting to noise, significantly improving the denoising performance.

1.2. Contributions

This study investigates whether a similar learning preference exists in one-dimensional natural signals, i.e., the spectral vectors in HSIs. Leveraging this learning preference, we propose a straightforward yet effective method for removing mixed noise in hyperspectral images. The main contributions of this paper can be summarized as follows:
  • Existing HSI denoising networks typically rely on large amounts of high-quality paired noisy–clean images, which are difficult to obtain in practice. To overcome this limitation, we propose an unsupervised denoising method that requires no clean images and optimizes network parameters to automatically learn spectral priors from a single noisy image, addressing the issue of insufficient training data.
  • We observe that deep networks tend to first fit smooth spectral features before addressing the noise. Based on this observation, we introduce an adaptive early stopping strategy, which leverages the network’s prior preference to fit only clean spectral signals, preventing overfitting to noise and significantly improving denoising performance.
  • Our work addresses the limitations of traditional denoising networks in generalizing to complex noise types. By modeling the intrinsic structure of the data, our approach enhances the network’s adaptability to various noise types, such as Gaussian noise, stripe noise, impulse noise, and deadlines. This significantly improves the robustness and stability of the denoising process, making it more effective in real-world scenarios.
This paper is organized as follows: Section 2 provides a detailed description of our proposed denoising method, HyDePre. Section 3 and Section 4 present the experimental results, including comparisons with state-of-the-art algorithms. Section 5 discusses the findings, and Section 6 concludes the paper.

2. Methodology

2.1. Problem Formulation

Let $\mathbf{X} \in \mathbb{R}^{n_b \times mn}$ denote a hyperspectral image matrix with $m \times n$ spectral vectors (the columns of $\mathbf{X}$, whose number equals the number of spatial pixels) of size $n_b$ (the number of bands). Under the assumption of additive noise, the observation model can be written as
$$\mathbf{Y} = \mathbf{X} + \mathbf{S} + \mathbf{N}, \tag{1}$$
where $\mathbf{Y}, \mathbf{S}, \mathbf{N} \in \mathbb{R}^{n_b \times mn}$ represent the observed HSI data, sparse noise (e.g., impulse noise and stripes), and Gaussian noise, respectively. Traditionally, the denoising problem is formulated as
$$\min_{\mathbf{X}} \; \mathcal{L}(\mathbf{Y}, f(\mathbf{X})) + \lambda R(\mathbf{X}), \tag{2}$$
where the loss function $\mathcal{L}$ represents the data fidelity and $f$ models the forward physical process to obtain the observation $\mathbf{Y}$. The regularizer $R$ encodes priors on $\mathbf{X}$, and $\lambda > 0$ is the regularization parameter.

2.2. Network Learning Preference for HSI Denoising

Deep neural networks exhibit a well-documented learning preference toward low-frequency signal components during training, a phenomenon commonly referred to as “spectral bias” or the “frequency principle”. Numerous studies in machine learning theory have shown that neural networks tend to fit smooth and slowly varying structures at early training stages, while high-frequency components are learned progressively later [48,49,50,51,52]. This behavior has also been linked to implicit regularization effects arising from network architectures and gradient-based optimization.
In the context of hyperspectral imagery, this learning preference aligns naturally with the intrinsic properties of spectral signals. Clean hyperspectral spectra typically exhibit strong spectral correlation and smooth variation across adjacent bands, reflecting material-specific reflectance characteristics. In contrast, common noise sources, such as Gaussian noise, impulse noise, and stripe noise, are largely spectrally uncorrelated and manifest as high-frequency perturbations along the spectral dimension.
When a neural network is trained directly on spectral vectors, this interaction between the network’s frequency bias and the spectral characteristics of hyperspectral data causes the network to prioritize fitting smooth spectral curves before adapting to noisy fluctuations. As training continues, the network gradually begins to absorb high-frequency noise components, leading to overfitting. This observation motivates the adoption of an early stopping strategy in the spectral domain. By terminating training at an appropriate stage, the network is encouraged to capture the underlying clean spectral structure while avoiding excessive fitting to noise.
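As a toy illustration of this frequency bias (our own sketch, not an experiment from this paper), a small one-hidden-layer network trained by plain gradient descent on a noisy one-dimensional "spectrum" first reduces its error to the smooth underlying curve, and only later begins to absorb the noise:

```python
import numpy as np

# Toy illustration (our own sketch, not an experiment from the paper):
# a small network trained on a noisy 1-D "spectrum" fits the smooth
# component first; noise is absorbed only later in training.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 128)[:, None]        # normalized band index
clean = np.sin(2.0 * np.pi * t)                # smooth spectral curve
noisy = clean + 0.2 * rng.standard_normal(t.shape)

# One-hidden-layer tanh network trained by gradient descent on the MSE
# against the NOISY observation (no clean target is used for training).
W1 = 0.5 * rng.standard_normal((1, 64)); b1 = np.zeros(64)
W2 = 0.5 * rng.standard_normal((64, 1)); b2 = np.zeros(1)
lr = 0.05

err_to_clean = []
for step in range(2000):
    h = np.tanh(t @ W1 + b1)                   # forward pass
    pred = h @ W2 + b2
    grad = 2.0 * (pred - noisy) / len(t)       # d(MSE)/d(pred), noisy target
    gW2 = h.T @ grad; gb2 = grad.sum(0)        # backprop: output layer
    gh = (grad @ W2.T) * (1.0 - h ** 2)        # backprop: tanh layer
    gW1 = t.T @ gh; gb1 = gh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
    # Track error against the clean curve (available only in simulation).
    err_to_clean.append(float(np.mean((pred - clean) ** 2)))
```

In such runs, `err_to_clean` typically drops quickly and later rises again once the network starts absorbing the noise; that intermediate minimum is the window an early stopping rule aims to hit.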
Let $G_{\Theta}$ denote a trainable deep neural network (DNN) parameterized by $\Theta$. Introducing this reparametrization into (2), we obtain
$$\min_{\Theta} \; \mathcal{L}\big(\mathbf{Y}, f \circ G_{\Theta}(\mathbf{Y})\big) + \lambda R\big(G_{\Theta}(\mathbf{Y})\big), \tag{3}$$
where the symbol $\circ$ represents the function composition operator.
The flowchart of our proposed HyDePre is shown in Figure 1, where each pixel is treated as a sequence of spectral values, enabling the network to model the inter-band dependencies effectively. In contrast to traditional DIP applications [34], we focus on spectral features rather than spatial features, because spectral correlations in hyperspectral data are much stronger than spatial correlations [53]. Prioritizing spectral correlations enables more effective denoising (see Section 2.3) while simplifying the network, reducing parameters, and easing training. Moreover, HyDePre adopts an encoder–decoder architecture, which consists of the following three key components:
(1) Encoder: The encoder compresses the input spectral sequence for each pixel $\mathbf{y}_i \in \mathbb{R}^{n_b}$ into a low-dimensional representation $\mathbf{y}_i^{\text{hidden}} \in \mathbb{R}^{d}$ using a stacked Long Short-Term Memory (LSTM) [54] network with three layers. This process effectively preserves the critical spectral features while mitigating redundancy and noise. The selection of an LSTM network as the backbone of the encoder framework is guided by the inherent characteristics of HSIs: (1) HSIs typically comprise hundreds of spectral bands, resulting in a long sequential spectral vector for each pixel; (2) the strong inter-band correlations within HSIs introduce complex dependencies across the spectral sequence, which the LSTM is well-suited to model and exploit. The LSTM operates on the input sequence in a layer-wise manner:
$$\mathbf{h}_t^{(l)} = \mathrm{LSTM}_l\big(\mathbf{h}_t^{(l-1)}, \mathbf{c}_t^{(l-1)}\big), \quad l = 1, 2, 3, \quad t = 1, \ldots, T, \tag{4}$$
where $\mathbf{h}_t^{(l)}$ and $\mathbf{c}_t^{(l)}$ are the hidden state and the cell state at time step $t$ for the $l$-th LSTM layer, respectively. The LSTM encoder outputs a sequence of hidden states for each layer, and the final hidden state from the last layer $\mathbf{h}_T^{(3)}$ serves as a comprehensive representation of the entire spectral sequence:
$$\mathbf{y}_i^{\text{hidden}} = \mathbf{h}_T^{(3)} = \mathrm{LSTM}_{\text{encoder}}(\mathbf{y}_i). \tag{5}$$
(2) Latent representation: After processing the input spectral sequence $\mathbf{y}_i$ through the LSTM encoder, the final hidden state $\mathbf{y}_i^{\text{hidden}}$ is then passed through a linear transformation without a bias term to project it into a lower-dimensional latent space of dimension $p$ (where $p \ll n_b$):
$$\tilde{\mathbf{y}}_i = \mathbf{W}_L \mathbf{y}_i^{\text{hidden}}, \tag{6}$$
where $\mathbf{W}_L \in \mathbb{R}^{p \times d}$ is the weight matrix.
The resulting latent representation $\tilde{\mathbf{y}}_i \in \mathbb{R}^{p}$ provides a more compact encoding of the spectral information and acts as a learned spectral prior. This latent representation encapsulates the essential spectral features of the input sequence, serving as a compressed form that retains the most critical spectral dependencies and correlations.
(3) Decoder: The decoder is responsible for reconstructing the input sequence from its latent representation. This process begins with a latent-to-hidden transformation, wherein the latent vector is projected back into the hidden space via a linear layer followed by a ReLU activation function. This transformation can be expressed as follows:
$$\mathbf{h}_0 = \mathrm{ReLU}\big(\mathbf{W}_L' \tilde{\mathbf{y}}_i\big), \tag{7}$$
where $\mathbf{W}_L' \in \mathbb{R}^{d \times p}$ is the weight matrix. Next, the transformed vector $\mathbf{h}_0$ is reshaped into a form compatible with the decoder’s LSTM layer, denoted as $\mathbf{h}_0^{(3)}$. The decoder employs an LSTM structure identical to the encoder, with the same number of layers, to process the reshaped hidden vector and generate a sequence $\mathbf{y}_i^{\text{decoded}} \in \mathbb{R}^{d}$. This process can be expressed as follows:
$$\mathbf{y}_i^{\text{decoded}} = \mathrm{LSTM}_{\text{decoder}}\big(\mathbf{h}_0^{(3)}\big). \tag{8}$$
Finally, the output sequence is passed through a linear transformation (output layer) to obtain the reconstructed sequence
$$\hat{\mathbf{x}}_i = \mathbf{W}_O \mathbf{y}_i^{\text{decoded}}, \tag{9}$$
where $\mathbf{W}_O \in \mathbb{R}^{n_b \times d}$ is the weight matrix. The $\ell_1$ norm between the reconstructed sequence $\hat{\mathbf{x}}_i$ and the original input sequence $\mathbf{y}_i$ is employed as the reconstruction loss. In addition, an $\ell_1$-norm regularization is incorporated to promote parameter sparsity and stabilize the optimization process. Thus, the final loss function, consisting of the reconstruction loss and an $\ell_1$ regularization term, is formulated as follows:
$$\mathcal{L}(\Theta) = \frac{1}{n} \sum_{i=1}^{n} \big\| \hat{\mathbf{x}}_i - \mathbf{y}_i \big\|_1 + \lambda \|\Theta\|_1 = \frac{1}{n} \sum_{i=1}^{n} \big\| G_{\Theta}(\mathbf{y}_i) - \mathbf{y}_i \big\|_1 + \lambda \|\Theta\|_1. \tag{10}$$
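A minimal PyTorch sketch of the encoder–latent–decoder pipeline described above follows. The layer sizes, and the way the latent vector drives the decoder, are our illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class HyDePreNet(nn.Module):
    """Sketch of the encoder-latent-decoder pipeline described above.
    Layer sizes are illustrative assumptions, not the paper's settings."""
    def __init__(self, n_bands: int, hidden: int = 64, latent: int = 8):
        super().__init__()
        # Each spectral band is one time step with a scalar feature.
        self.encoder = nn.LSTM(1, hidden, num_layers=3, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent, bias=False)  # no bias, as in the text
        self.from_latent = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU())
        self.decoder = nn.LSTM(hidden, hidden, num_layers=3, batch_first=True)
        self.out = nn.Linear(hidden, 1)                         # output layer
        self.n_bands = n_bands

    def forward(self, y):                      # y: (batch, n_bands)
        h, _ = self.encoder(y.unsqueeze(-1))   # (batch, n_bands, hidden)
        z = self.to_latent(h[:, -1, :])        # final hidden state -> latent
        h0 = self.from_latent(z)               # latent -> hidden space
        # The paper reshapes h0 to feed the decoder LSTM; here we simply
        # repeat it across bands as decoder input (an assumption of this sketch).
        dec, _ = self.decoder(h0.unsqueeze(1).repeat(1, self.n_bands, 1))
        return self.out(dec).squeeze(-1)       # reconstruction: (batch, n_bands)

def hydepre_loss(model, x_hat, y, lam=1e-5):
    """L1 reconstruction loss plus L1 regularization on the parameters."""
    reg = sum(p.abs().sum() for p in model.parameters())
    return (x_hat - y).abs().mean() + lam * reg
```

In use, this loss would be optimized repeatedly on the spectra of a single noisy HSI while the reconstruction is monitored for early stopping.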
An important attribute of the deep neural network is its inherent bias towards capturing meaningful visual content more rapidly than fitting to noise during training, which leads to the phenomenon known as “early-learning-then-overfitting” [55]. This characteristic enables the framework to achieve effective denoising by stopping the training process at an optimal point before the model begins to overfit to noise.
In this work, we further investigate how this learning preference manifests in hyperspectral images and argue that transferring the DIP paradigm from the spatial domain to the spectral (wavelength) dimension allows the network to more effectively exploit intrinsic spectral correlations. The rationale and empirical evidence supporting the effectiveness of spectral domain DIP are discussed in the following subsection (Section 2.3). Based on this spectral learning behavior, effective denoising can be achieved by stopping the optimization process before the network starts fitting noise. However, the critical challenge lies in identifying this optimal stopping point without access to the ground truth clean image. To address this challenge, we introduce an adaptive early stopping strategy, requiring no clean images, which balances the trade-off between noise suppression and overfitting. The details of this strategy are elaborated in the subsequent subsection (Section 2.4).

2.3. Why Deep Image Prior Works Better in the Spectral Domain

Structured noise, such as stripe noise and deadline noise, is a common degradation in hyperspectral imagery and poses a challenge for DIP-based denoising methods. When such noise is spatially coherent, spatial domain modeling may inadvertently treat it as valid image structure and thus fit it during optimization. In this subsection, we provide an illustrative analysis to explain why modeling the deep prior from the spectral viewpoint is more effective in handling structured noise.
We consider Case 1 on the Washington DC Mall dataset, which simulated Gaussian noise and stripe noise. Figure 2a shows the clean spectral curve, the noisy observation, and the spectral vector recovered by the proposed HyDePre at the pixel point (20, 199). The clean image exhibits smooth variation along the spectral dimension, whereas the structured noise introduces sharp, localized perturbations at specific bands. During optimization, the proposed method based on spectral learning preference prioritizes fitting the smooth spectral trend while suppressing these abrupt fluctuations, resulting in a reconstructed spectrum that closely follows the clean signal without reproducing noise-induced spikes.
To further analyze how structured noise is perceived under different modeling viewpoints, we select a representative band (Band 32) where the spectral fluctuation is most pronounced. Figure 2b,c show the noisy and clean images of this band, respectively. The noise appears as strong spatially structured artifacts, which are highly coherent in the spatial domain. When a conventional spatial DIP [44] is applied directly to this single band, as shown in Figure 2d, the structured noise is partially preserved, indicating that the spatial model tends to interpret such patterns as image structures. This phenomenon is aligned with the failure case reported by Chakrabarty et al. in [48], which pointed out that when an image is corrupted by periodic stripe noise with regular intervals and widths, the DIP model often struggles to distinguish true signal from artifacts.
In contrast, Figure 2e shows the Band 32 image extracted from the denoised hyperspectral cube produced by the proposed HyDePre. Since the optimization is performed on pixel-wise spectral vectors rather than spatial patches, the structured spatial noise is implicitly transformed into localized spectral irregularities. These irregularities correspond to high-frequency components along the spectral dimension and are therefore less favored during the early stages of neural network optimization. As a result, the structured noise is effectively suppressed without being explicitly modeled or detected.
This analysis demonstrates that the advantage of spectral domain modeling lies in its ability to reshape the representation of structured noise in a way that aligns with the learning preference of neural networks. By transferring the deep prior from the spatial domain to the spectral domain, structured spatial noise becomes less dominant during optimization, leading to more robust denoising behavior.

2.4. Adaptive Early Stopping Strategy

By exploiting the fact that the spectral curves of clean HSI pixels are smooth, while noisy spectral curves exhibit irregularities, we design an early stopping strategy based on the average spectral maximum variation (ES-ASMV). The approach involves stopping the training process when significant variability is detected in the reconstructed spectral curves, signaling that the network has started to learn noise rather than the desired visual content.
We use superscripts to denote the number of training iterations, where $\Theta^t$ represents the network parameters after the $t$-th training iteration, and $\hat{\mathbf{X}}^t = [\hat{\mathbf{x}}_1^t, \ldots, \hat{\mathbf{x}}_n^t] = G_{\Theta^t}(\mathbf{Y})$ denotes the reconstructed image produced by the network at iteration $t$. For the $i$-th pixel of the $t$-th reconstructed image, its spectral maximum variation (SMV) is defined as the maximum absolute value of the variations computed along the spectral dimension:
$$v_i^t = \big\| \mathbf{D} \hat{\mathbf{x}}_i^t \big\|_{\infty} = \big\| [\hat{\mathbf{x}}_i^t]_{2:n_b} - [\hat{\mathbf{x}}_i^t]_{1:n_b-1} \big\|_{\infty}, \tag{11}$$
where $\|\mathbf{x}\|_{\infty} = \max_j |x_j|$ is the infinity norm of vector $\mathbf{x}$, and the subscript $[\mathbf{x}]_{j_1:j_2}$ represents the subvector of $\mathbf{x}$ consisting of elements from the $j_1$-th to the $j_2$-th element.
As shown in Figure 3, the SMV values of the clean image are much smaller than those of the noisy image. This indicates that the average value of SMV is an effective indicator for distinguishing between clean images and noisy images. Based on this, the average spectral maximum variation (ASMV) of the entire reconstructed HSI at the $t$-th iteration is defined as the mean SMV of all pixels:
$$\mathrm{ASMV}^t = \operatorname{mean}\big(v_i^t\big) = \frac{1}{n} \sum_{i=1}^{n} v_i^t. \tag{12}$$
Given $\mathrm{ASMV}^t$, the rate of change in ASMV at the $t$-th iteration is defined as follows:
$$\Delta \mathrm{ASMV}^t = \frac{\mathrm{ASMV}^t - \mathrm{ASMV}^{t-1}}{\mathrm{ASMV}^{t-1}}. \tag{13}$$
When this rate ($\Delta \mathrm{ASMV}$) exceeds a predefined threshold $\gamma$, it indicates that the network has begun to learn high-frequency noise components, and training should be terminated to prevent the network from overfitting. However, because the network parameters update drastically and the rate of change in ASMV is unstable in the early stages of training, an early exceedance of the threshold does not truly reflect an overfitting trend. Therefore, we introduce a warm-up period. Specifically, a starting epoch $E_S$ is defined, during which the early stopping mechanism remains inactive. Once the training epoch surpasses $E_S$, early stopping is triggered if $\Delta \mathrm{ASMV}$ exceeds the threshold $\gamma$. This strategy not only avoids misjudgments in the initial training stage but also ensures that training halts when significant spectral variability is observed, improving computational efficiency while preserving the quality of the reconstruction results. The proposed HyDePre algorithm for HSI mixed denoising is summarized in Algorithm 1.
Algorithm 1 HyDePre: Hyperspectral Image Denoising via Spectral Learning Preference
Require: Observation $\mathbf{Y}$; randomly initialized parameters $\Theta^0$; iteration counter $t = 0$; $\mathrm{ASMV}^0 = 0$; threshold $\gamma$; and starting epoch $E_S$
Ensure: Reconstruction $\hat{\mathbf{X}}$
 1: while not stopped do
 2:    Update $\Theta$ via (3) to obtain $\Theta^{t+1}$.
 3:    Reconstruct $\hat{\mathbf{X}}^{t+1} = G_{\Theta^{t+1}}(\mathbf{Y})$.
 4:    Compute $\mathrm{ASMV}^{t+1}$ for $\hat{\mathbf{X}}^{t+1}$ via (12).
 5:    Compute $\Delta \mathrm{ASMV}^{t+1}$ for $\hat{\mathbf{X}}^{t+1}$ via (13).
 6:    if $\Delta \mathrm{ASMV}^{t+1} > \gamma$ and $t > E_S$ then
 7:        return $\hat{\mathbf{X}}^t$.
 8:    end if
 9:    $t \leftarrow t + 1$
10: end while
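The stopping rule of Algorithm 1 (per-pixel SMV, the ASMV average, and the thresholded relative change after a warm-up) can be sketched in NumPy as follows; the function and class names are ours:

```python
import numpy as np

def asmv(x_hat):
    """Average spectral maximum variation of a reconstruction.
    x_hat: (n_bands, n_pixels) array whose columns are per-pixel spectra."""
    smv = np.abs(np.diff(x_hat, axis=0)).max(axis=0)  # per-pixel SMV
    return float(smv.mean())                          # ASMV

class EarlyStopper:
    """Stop when the relative ASMV increase exceeds gamma after a warm-up."""
    def __init__(self, gamma=0.1, warmup=6):
        self.gamma, self.warmup = gamma, warmup
        self.prev = None   # ASMV of the previous iteration
        self.epoch = 0

    def should_stop(self, x_hat):
        cur = asmv(x_hat)
        stop = False
        if self.prev is not None and self.epoch > self.warmup:
            # Relative change in ASMV; a sharp rise signals noise fitting.
            stop = (cur - self.prev) / self.prev > self.gamma
        self.prev = cur
        self.epoch += 1
        return stop
```

When `should_stop` first returns `True`, the reconstruction from the previous iteration is returned, matching step 7 of Algorithm 1.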

2.5. Comparative Analysis of Deep 1D Spectral Prior and DIP

The core finding of this study corroborates the hypothesis that the “Deep Image Prior” (DIP) effect, originally characterized in spatial domains by Ulyanov et al. [44], is also intrinsic to one-dimensional spectral vectors in hyperspectral images (HSIs). Our experiments demonstrate that neural networks exhibit a distinct reluctance to fit high-frequency components in the spectral domain, fitting smooth spectral signatures significantly faster than noise components. This work is best contextualized against two related approaches: DHP [34] and DS2DP [47]. DHP represents a direct extension of DIP to the spatial domain of HSIs using CNNs; however, it incurs substantial computational burdens due to the requirement for large network capacity and excessive iteration overhead. To address efficiency, DS2DP employs a linear mixture model framework, utilizing a U-Net and fully connected networks to model abundance maps and endmembers, respectively. While DS2DP shares a conceptual similarity with our work in applying deep priors to 1D vectors (specifically endmembers), our approach diverges fundamentally by targeting the raw spectral vectors directly. The dual-network architecture of DS2DP, which jointly optimizes abundance and endmembers in an unsupervised manner, is susceptible to error propagation, potentially compromising global reconstruction quality. In contrast, our method directly exploits the network’s inherent learning preference for 1D spectral vectors. This paradigm avoids the complexities of unmixing-based decomposition, offering a straightforward yet highly effective solution for HSI denoising.

3. Experiments on Simulated Images

3.1. Noise Simulation

To assess the performance of the proposed method in comparison with state-of-the-art denoising algorithms, experiments were conducted on simulated data. A subimage from the Washington DC Mall dataset (https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html, accessed on 1 September 2024) (of size 150 × 200 × 100 ) and a subimage from the Pavia University dataset (https://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes, accessed on 1 September 2024) (of size 310 × 250 × 87 ) were selected for evaluation. Both subimages are of high quality and are commonly regarded as clean reference images in the literature [16,53]. To simulate noisy images, we introduced five mixed-noise cases, as follows: Cases 1∼3 combine Gaussian noise with stripe noise, impulse noise, and deadlines, respectively. Case 4 combines Gaussian noise, stripe noise, and impulse noise. Case 5 includes Gaussian noise, stripe noise, impulse noise, and deadlines simultaneously. These configurations allow the denoising performance to be assessed under increasingly challenging and realistic mixed-noise scenarios.
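The noise cases can be emulated along the following lines; the noise magnitudes and corruption ratios below are illustrative placeholders, not the paper's exact simulation settings.

```python
import numpy as np

def add_mixed_noise(clean, sigma=0.1, stripe_ratio=0.2,
                    impulse_ratio=0.05, deadline_ratio=0.05, seed=0):
    """Add Case-5-style mixed noise (Gaussian + stripes + impulse +
    deadlines) to an HSI of shape (H, W, B). All parameter values are
    illustrative, not the paper's settings."""
    rng = np.random.default_rng(seed)
    H, W, B = clean.shape
    noisy = clean + rng.normal(0.0, sigma, clean.shape)   # Gaussian noise
    for b in range(B):
        # Stripe noise: constant offset added to randomly chosen columns.
        cols = rng.choice(W, int(stripe_ratio * W), replace=False)
        noisy[:, cols, b] += rng.uniform(-0.25, 0.25, len(cols))
        # Impulse (salt-and-pepper) noise on random pixels.
        mask = rng.random((H, W)) < impulse_ratio
        noisy[:, :, b][mask] = rng.choice([0.0, 1.0], int(mask.sum()))
        # Deadlines: zero out a few whole columns.
        dead = rng.choice(W, int(deadline_ratio * W), replace=False)
        noisy[:, dead, b] = 0.0
    return noisy
```

Dropping individual corruption steps (or setting their ratios to zero) yields the milder Cases 1∼4.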

3.2. Network Architecture and Training Configuration

The proposed HyDePre framework adopts an encoder–latent–decoder architecture to model pixel-wise spectral vectors in an unsupervised manner. Both the encoder and decoder are implemented using stacked LSTM networks with three layers. The encoder maps the input spectral sequence into a compact latent representation, which serves as a bottleneck to suppress noise while preserving essential spectral structures. The decoder then reconstructs the spectral signal from this latent representation. All experiments share the same network architecture and training configuration. The detailed architectural and optimization parameters are summarized in Table 1. In addition, the Warm-up Period E_S and the threshold γ are set to {(6, 0.07), (10, 0.03), (6, 0.1), (6, 0.1), (6, 0.1)} for the five cases of the Washington DC Mall dataset and to {(25, 0.06), (25, 0.06), (45, 0.06), (10, 0.07), (30, 0.06)} for the Pavia University dataset.
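The described encoder–latent–decoder can be sketched in PyTorch (the framework used in the experiments) as follows. The three stacked LSTM layers on each side follow the text; the hidden size, latent size, and the linear bottleneck and output layers are illustrative assumptions, not the paper's exact configuration (which is given in Table 1).

```python
import torch
import torch.nn as nn

class SpectralAE(nn.Module):
    """Minimal sketch of the encoder-latent-decoder on per-pixel
    spectral sequences. Sizes are illustrative assumptions."""
    def __init__(self, hidden=64, latent=16, layers=3):
        super().__init__()
        self.encoder = nn.LSTM(1, hidden, num_layers=layers, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent)    # compact bottleneck
        self.from_latent = nn.Linear(latent, hidden)
        self.decoder = nn.LSTM(hidden, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, y):                  # y: (pixels, bands, 1)
        h, _ = self.encoder(y)
        z = self.to_latent(h)              # latent spectral representation
        h, _ = self.decoder(self.from_latent(z))
        return self.out(h)                 # reconstructed spectra
```

Each pixel's spectrum is treated as a length-n_b sequence, so a batch of pixels of shape (pixels, bands, 1) maps to a reconstruction of the same shape.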

3.3. Comparisons and Evaluation Metrics

The proposed HyDePre is compared with eight state-of-the-art methods including FastHyMix [53] (https://github.com/LinaZhuang, accessed on 10 September 2024), KBR [19] (https://github.com/XieQi2015/KBR-TC-and-RPCA, accessed on 10 September 2024), FallHyDe [56] (https://github.com/ChenYong1993/FallHyDe, accessed on 15 December 2025), NonRLRS [18], DDS2M [38] (https://github.com/miaoyuchun/DDS2M, accessed on 1 December 2025), HLRTF [37] (https://github.com/YisiLuo/HLRTF, accessed on 1 December 2025), LRTFR [39] (https://github.com/YisiLuo/Continuous-Tensor-Toolbox, accessed on 1 December 2025), and EGD-Net. Among them, KBR, FallHyDe, and NonRLRS are model-based methods; DDS2M, FastHyMix, HLRTF, LRTFR, and EGD-Net are deep-learning-based or hybrid methods. NonRLRS utilizes the idea of low-rank matrix recovery by introducing a normalized ε-penalty term to constrain the low-rank potential clean image and sparse noise components. KBR utilizes the idea of sparse representation, measuring the sparsity of a tensor through Kronecker-basis-representation. DDS2M is based on DIP and linear mixed models, using U-Net and fully connected networks to model abundance maps and endmembers, respectively. HLRTF and LRTFR both use low-rank tensor factorization. HLRTF embeds a deep neural network into tensor singular value decomposition, while LRTFR proposes a low-rank tensor function representation which can continuously represent data beyond meshgrid with infinite resolution. FastHyMix is a two-stage hybrid method, involving signal subspace estimation followed by subspace coefficient regularization. It uses a Gaussian mixture model and a plug-and-play technique. Similarly, FallHyDe is also based on low-rank subspace representation. Its primary innovation lies in a non-iterative estimation of spatial representation coefficients, which leverages the high signal-to-noise ratio bands inherent in real HSIs.
This design renders FallHyDe exceptionally fast and makes it highly suitable for large-scale HSI denoising tasks. EGD-Net is an eigenimage-guided diffusion network for hyperspectral mixed-noise removal. In the experiments, the hyperparameters of all algorithms were manually tuned to the optimal values recommended by the authors in their papers. The experiments were conducted on a laptop equipped with a 12th Gen Intel(R) Core(TM) i7-12700H processor (2.30 GHz) and 16 GB of RAM. All the model-based methods were implemented in MATLAB R2022b. All the deep-learning-based methods were implemented using the PyTorch 2.4.1 framework, accelerated by an NVIDIA GeForce RTX 3060 GPU.
To evaluate the denoising performance of different algorithms, in addition to visual effects, we use the Mean Peak Signal-to-Noise Ratio (MPSNR), Mean Structural Similarity Index (MSSIM), and Mean Feature Similarity Index (MFSIM) as quantitative metrics. These metrics are obtained by calculating the corresponding PSNR, SSIM [57], and FSIM [58] band by band and then averaging. Higher PSNR, SSIM, and FSIM values indicate better denoising performance. Let $X \in \mathbb{R}^{m \times n \times n_b}$ denote the clean hyperspectral image and $\hat{X} \in \mathbb{R}^{m \times n \times n_b}$ the denoised result, where $n_b$ represents the number of spectral bands. The metrics are computed by averaging the values obtained from each band b as follows:
$\mathrm{MPSNR} = \frac{1}{n_b} \sum_{b=1}^{n_b} 10 \log_{10} \frac{\mathrm{MAX}_b^2}{\mathrm{MSE}_b},$
where $\mathrm{MAX}_b$ is the maximum pixel intensity in the b-th band, and $\mathrm{MSE}_b$ represents the mean squared error between the reference and denoised bands.
$\mathrm{MSSIM} = \frac{1}{n_b} \sum_{b=1}^{n_b} \frac{(2\mu_{X_b}\mu_{\hat{X}_b} + C_1)(2\sigma_{X_b\hat{X}_b} + C_2)}{(\mu_{X_b}^2 + \mu_{\hat{X}_b}^2 + C_1)(\sigma_{X_b}^2 + \sigma_{\hat{X}_b}^2 + C_2)},$
where $\mu$ and $\sigma^2$ denote the mean and variance of the image intensity, $\sigma_{X_b\hat{X}_b}$ is the covariance, and $C_1$ and $C_2$ are constants that ensure numerical stability. $X_b$ and $\hat{X}_b$ denote the b-th band of the clean and reconstructed HSI, respectively.
$\mathrm{MFSIM} = \frac{1}{n_b} \sum_{b=1}^{n_b} \mathrm{FSIM}(X_b, \hat{X}_b),$
where FSIM is defined in [58]; it utilizes phase congruency and gradient magnitude to capture the preservation of image features.
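The MPSNR computation can be sketched directly from the formula above (MSSIM and MFSIM follow the same band-wise averaging pattern). The small clamp on the MSE is an implementation detail added here to keep the perfect-reconstruction case finite.

```python
import numpy as np

def mpsnr(clean, denoised):
    """Band-wise PSNR averaged over all bands, following the MPSNR
    formula. Inputs are (H, W, B) arrays; MAX_b is the maximum pixel
    intensity of the clean b-th band."""
    vals = []
    for b in range(clean.shape[2]):
        mse = np.mean((clean[:, :, b] - denoised[:, :, b]) ** 2)
        peak = clean[:, :, b].max()
        # Clamp MSE to avoid division by zero for a perfect reconstruction.
        vals.append(10 * np.log10(peak ** 2 / max(mse, 1e-12)))
    return float(np.mean(vals))
```

As a sanity check, an identical pair of images yields a very large MPSNR, while adding a constant offset of 0.1 to unit-range data drops it to roughly 20 dB.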

3.4. Experimental Results

We present the quantitative assessment of the simulated experimental results in Table 2 and Table 3. To rigorously assess the effectiveness of our proposed adaptive early stopping strategy, we also include a variant denoted as HyDePre*, where the stopping iteration was manually selected to maximize the MPSNR for each specific case (serving as the ideal performance that our framework can achieve). As shown in Table 2, our proposed HyDePre and HyDePre* outperform the compared algorithms in nearly all cases on the Washington DC Mall dataset. Although FastHyMix surpasses HyDePre and HyDePre* in terms of MPSNR in Case 2, it shows a significant performance decline in Cases 4 and 5, indicating that it cannot handle complex scenarios in which images simultaneously contain multiple types of noise (including Gaussian noise, stripe noise, impulse noise, and deadlines).
As shown in Table 3, our HyDePre and HyDePre* rank first or second in nearly all individual cases on the Pavia University dataset. While FastHyMix demonstrates exceptional performance in Case 3 (Gaussian noise and deadlines), its efficacy drops noticeably in complex mixed-noise scenarios (e.g., Case 5, which contains Gaussian noise, stripe noise, impulse noise, and deadlines), whereas HyDePre maintains robust performance across all degradation types. Compared to the DIP-based method DDS2M, our proposed approach achieves superior denoising performance with significantly less computational cost. This validates the efficacy of our design principle: prioritizing spectral correlation over spatial features enables a straightforward yet more powerful deep image prior for hyperspectral image denoising. Figure 4 and Figure 5 show the visual effects of different denoising algorithms. For the visual comparison of the Washington DC Mall dataset, we present the results from the five top-performing algorithms to ensure clarity within the paper’s space constraints. The selection was based on the highest mean MPSNR achieved across all cases. These algorithms are KBR, HLRTF, LRTFR, EGD-Net, and the proposed HyDePre with the adaptive early stopping strategy. Additionally, for the Pavia University dataset, we present the results of all algorithms except FallHyDe. As we can see, with the exception of the proposed HyDePre, all compared methods encountered significant challenges when handling the complex mixed-noise scenarios in Cases 4 and 5.
The comparison between HyDePre (with the proposed adaptive early stopping) and HyDePre* (with manual stopping) indicates that the performance of HyDePre is remarkably close to that of HyDePre*. For instance, in Cases 1∼4 on the Pavia University dataset, the difference in MPSNR is negligible (<0.2 dB), and in Case 5, the metrics are identical. This strongly validates that our proposed adaptive early stopping criterion can accurately identify the optimal trade-off point between noise removal and signal preservation without requiring ground truth references. In summary, HyDePre offers a superior balance of high-quality restoration, robust generalization across varying noise intensities, and practical automation via the adaptive stopping mechanism.

3.5. Effectiveness of the Proposed Early Stopping Strategy

We proposed an adaptive early stopping strategy which aims to stop training at an optimal point to avoid overfitting to noise. To validate the effectiveness of this strategy, experiments were conducted on two simulated datasets, the Washington DC Mall dataset and the Pavia University dataset. The effectiveness of the proposed early stopping strategy can be observed in Figure 6. The figure depicts the relationship between the MPSNR and Δ ASMV across training epochs for two cases: Case 1 (Washington DC Mall dataset) and Case 4 (Pavia University dataset). In these experiments, the Warm-up Period was configured to ten epochs with a threshold γ = 0.07 . Notably, the Δ ASMV value surpasses the predefined threshold γ after the Warm-up Period, coinciding with the point where the MPSNR value peaks and subsequently begins to decline. This result validates that the proposed early stopping strategy can effectively identify the optimal stopping point for training.

3.6. Sensitivity Analysis of the Regularization Coefficient λ

The proposed loss function contains a regularization term weighted by a coefficient λ, which controls the strength of the prior imposed during optimization. In all experiments, λ is set to 10^{-7}, and this value is used consistently across different datasets and noise settings. To examine the sensitivity of the proposed method to this hyperparameter, we conduct a simple sensitivity analysis on simulated data, specifically Case 1 of the Washington DC Mall dataset. We evaluate three representative values, λ ∈ {10^{-6}, 10^{-7}, 10^{-8}}. Figure 7 reports the corresponding training loss and MPSNR curves during training. As shown in Figure 7, when λ = 10^{-6}, the training loss is high and the MPSNR remains significantly low, indicating that excessive regularization persistently degrades the reconstruction; when λ = 10^{-8}, the training loss is the lowest but the MPSNR decreases as training progresses, suggesting that the network overfits to noise. In contrast, λ = 10^{-7} achieves a favorable trade-off between training stability and denoising performance, yielding the highest MPSNR and a more stable optimization trajectory. Based on this analysis, λ = 10^{-7} is adopted in all experiments, as it effectively balances bias and variance under the current model architecture and data scale, avoids extreme degradation, and leads to more reliable denoising results.

3.7. Ablation Experiments

In addition to the reconstruction loss, our network incorporates an ℓ1 regularization term, which facilitates implicit feature selection by promoting parameter sparsity and guiding the network toward targeted learning. To evaluate the effectiveness of this regularization term, we conducted ablation experiments by excluding the ℓ1 regularization term while retaining all other model configurations. The experimental results are reported in Table 4. As shown, removing the ℓ1 regularization degrades performance across nearly all scenarios, indicating its positive contribution to the overall effectiveness of our method.
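A minimal sketch of the regularized objective is given below; the mean-squared reconstruction term is an assumption for illustration (the paper's exact fidelity term may differ), while the λ = 10^{-7} weight follows Section 3.6.

```python
import numpy as np

def l1_regularized_loss(residual, params, lam=1e-7):
    """Reconstruction loss plus an l1 penalty on the network parameters.
    `residual` is the difference between the noisy input and the
    reconstruction; `params` is a list of parameter arrays. The squared
    reconstruction term is an illustrative assumption."""
    recon = np.mean(residual ** 2)
    l1 = sum(np.abs(p).sum() for p in params)   # sparsity-promoting penalty
    return recon + lam * l1
```

Setting `lam=0.0` reproduces the ablated variant without ℓ1 regularization.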

4. Experiments on Real Images

4.1. Mars Image

The first real image we used is the Mars image, which was captured over the McLaughlin Crater region on Mars by the CRISM sensor on 10 March 2008. We selected a sub-region of it with a spatial size of 120 × 120 and 418 spectral bands for the experiment. Figure 8 presents the restoration results of four selected bands (bands 2, 262, 320, and 350). As observed in the comparison, obvious residual vertical stripes remain visible in the restored images of FastHyMix, KBR, FallHyDe, NonRLRS, and HLRTF (especially in bands 262 and 320). DDS2M effectively suppresses the stripe noise but suffers from severe over-smoothing, causing significant loss of spatial details and blurring the sharp edges of the crater. LRTFR introduces noticeable blocking artifacts. In contrast, the proposed HyDePre and EGD-Net achieve the most favorable visual quality, striking a superior balance between noise removal and detail preservation.

4.2. Hyperion Cuprite Image

The second real image we used is the Hyperion Cuprite image, which was captured by the Hyperion sensor in the Cuprite area of Nevada, USA. It contains 242 bands, with a 10 nm spectral resolution and a 30 m spatial resolution. We selected a sub-region of it with a spatial size of 200 × 160 and 177 spectral bands for the experiment. Figure 9 presents the restoration results of four selected bands (bands 52, 98, 166, and 177). As illustrated in Figure 9, the reconstruction results of DDS2M suffer from excessive smoothing and blurring, while LRTFR introduces distinct blocking artifacts. KBR, FallHyDe, NonRLRS, and EGD-Net fail to fully eliminate stripe noise, with noticeable stripes remaining in bands 52, 166, and 177. While FastHyMix and HLRTF effectively suppress most noise, they compromise fine structural details within the regions highlighted by red dashed ovals (bands 166 and 177). Collectively, this visual evidence validates the robustness of HyDePre in handling complex, real-world hyperspectral degradation.

5. Discussion

Analysis of the Adaptive Early Stopping Strategy

The success of the proposed unsupervised learning framework hinges on the precise termination of the optimization process. To achieve this without ground truth references, we introduce the rate of change in average spectral maximum variation ( Δ ASMV) as a pseudo-metric to monitor the noise level of the generated HSI (Section 2.4). The ASMV is defined as the mean of the maximum absolute gradients along the spectral dimension for all pixels.
The rationale behind using Δ ASMV lies in the intrinsic difference between the spectral properties of natural signals and mixed noise. Natural spectral signatures generally exhibit high autocorrelation and continuity, meaning their first-order differences (gradients) are relatively small and bounded. Conversely, mixed noise—particularly impulse noise and stripes—manifests as high-frequency singularities in the spectral domain. By calculating the maximum variation rather than the average variation per pixel, the ASMV becomes highly sensitive to these anomalous spikes, effectively acting as a detector for high-frequency spectral artifacts. The effectiveness of using Δ ASMV can be confirmed by Figure 10.
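Under this definition, the ASMV can be computed in a few lines of NumPy: a per-pixel maximum of the absolute spectral first-order differences, averaged over all pixels.

```python
import numpy as np

def asmv(hsi):
    """Average Spectral Maximum Variation: mean over all pixels of the
    maximum absolute first-order difference along the spectral axis.
    `hsi` has shape (H, W, B)."""
    grad = np.abs(np.diff(hsi, axis=2))      # spectral gradients, (H, W, B-1)
    return float(np.mean(grad.max(axis=2)))  # per-pixel maximum, then mean
```

A smooth spectral ramp yields a small ASMV, while a single impulse spike in one pixel's spectrum raises it sharply, which is exactly the sensitivity to high-frequency artifacts exploited by the stopping criterion.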

6. Conclusions

In this paper, we propose a new denoising method for HSIs, HyDePre, which takes advantage of the intrinsic learning preference of neural networks for spectral vectors. At its core, HyDePre utilizes an encoder–decoder architecture to learn the deep 1D spectral prior directly from the noisy data. The elegance of this approach lies in its conceptual simplicity: it requires no pre-trained models or complex parameter tuning, yet demonstrates remarkable effectiveness in separating signal from noise. Experiments on simulated and real datasets demonstrate the effectiveness and robustness of the proposed method, which achieves significant improvements in HSI denoising performance. Beyond denoising, the core principle of HyDePre offers a promising paradigm for a broader range of HSI inverse problems such as inpainting, destriping, and super-resolution.
By focusing on spectral vectors, HyDePre establishes a concise and fully unsupervised denoising framework that effectively exploits the abundance of spectral samples in hyperspectral data. This design choice leads to a transparent and reproducible methodology with stable optimization behavior and denoising performance that is competitive with most state-of-the-art approaches. As HyDePre is formulated as a single-image optimization problem, the inference stage involves iterative updates for each test image. From an engineering perspective, the resulting computational cost may be mitigated by incorporating meta-learning or transfer learning strategies to provide favorable initialization, while preserving the core architecture and underlying principle of the proposed method. Furthermore, although the present work deliberately emphasizes 1D spectral modeling to maintain conceptual simplicity and robustness, hyperspectral images inherently exhibit rich spatial–spectral structures. Extending HyDePre to integrate spatial context in a controlled and principled manner constitutes a natural direction for future research, which is expected to further enhance denoising performance at the expense of increased model complexity and computational overhead.

Author Contributions

Conceptualization, M.K.N. and L.Z.; methodology, R.Z., M.K.N., M.L. and L.Z.; software, R.Z. and L.Z.; validation, R.Z.; formal analysis, M.K.N.; investigation, R.Z.; data curation, R.Z.; writing—original draft preparation, R.Z. and L.Z.; writing—review and editing, M.K.N. and M.L.; visualization, R.Z. and L.Z.; supervision, M.K.N. and L.Z.; and funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 42571420. M. Ng’s research was supported by the GDSTC: Guangdong and Hong Kong Universities “1+1+1” Joint Research Collaboration Scheme UICR0800008-24, National Key Research and Development Program of China under Grant 2024YFE0202900, RGC GRF 12300125 and Joint NSFC and RGC N-HKU769/21. M. Ljubenovic’s research was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia (Grant No. 451-03-136/2025-03/200358).

Data Availability Statement

The datasets generated during the study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, J.; Pei, Y.; Zhao, S.; Xiao, R.; Sang, X.; Zhang, C. A review of remote sensing for environmental monitoring in China. Remote Sens. 2020, 12, 1130. [Google Scholar] [CrossRef]
  2. Peng, J.; Wang, H.; Cao, X.; Zhao, Q.; Yao, J.; Zhang, H.; Meng, D. Learnable representative coefficient image denoiser for hyperspectral image. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5506516. [Google Scholar] [CrossRef]
  3. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef]
  4. Zhang, H.; He, W.; Zhang, L.; Shen, H.; Yuan, Q. Hyperspectral image restoration using low-rank matrix recovery. IEEE Trans. Geosci. Remote Sens. 2013, 52, 4729–4743. [Google Scholar] [CrossRef]
  5. Sun, J.; Mao, C.; Yang, Y.; Wang, S.; Xu, S. Hyperspectral Image Denoising Based on Non-Convex Correlated Total Variation. Remote Sens. 2025, 17, 2024. [Google Scholar] [CrossRef]
  6. Peng, J.; Xie, Q.; Zhao, Q.; Wang, Y.; Yee, L.; Meng, D. Enhanced 3DTV regularization and its applications on HSI denoising and compressed sensing. IEEE Trans. Image Process. 2020, 29, 7889–7903. [Google Scholar] [CrossRef]
  7. He, W.; Zhang, H.; Zhang, L.; Shen, H. Total-variation-regularized low-rank matrix factorization for hyperspectral image restoration. IEEE Trans. Geosci. Remote Sens. 2015, 54, 178–188. [Google Scholar] [CrossRef]
  8. Peng, J.; Wang, Y.; Zhang, H.; Wang, J.; Meng, D. Exact decomposition of joint low rankness and local smoothness plus sparse matrices. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 5766–5781. [Google Scholar] [CrossRef]
  9. Peng, J.; Wang, H.; Cao, X.; Liu, X.; Rui, X.; Meng, D. Fast noise removal in hyperspectral images via representative coefficient total variation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5546017. [Google Scholar] [CrossRef]
  10. Wang, Y.; Peng, J.; Zhao, Q.; Leung, Y.; Zhao, X.L.; Meng, D. Hyperspectral image restoration via total variation regularized low-rank tensor decomposition. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 11, 1227–1243. [Google Scholar] [CrossRef]
  11. Zeng, H.; Huang, S.; Chen, Y.; Luong, H.; Philips, W. All of low-rank and sparse: A recast total variation approach to hyperspectral denoising. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 7357–7373. [Google Scholar] [CrossRef]
  12. Maggioni, M.; Katkovnik, V.; Egiazarian, K.; Foi, A. Nonlocal transform-domain filter for volumetric data denoising and reconstruction. IEEE Trans. Image Process. 2012, 22, 119–133. [Google Scholar] [CrossRef]
  13. Maggioni, M.; Boracchi, G.; Foi, A.; Egiazarian, K. Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms. IEEE Trans. Image Process. 2012, 21, 3952–3966. [Google Scholar] [CrossRef]
  14. Wang, S.; Zhang, L.; Liang, Y. Nonlocal spectral prior model for low-level vision. In Asian Conference on Computer Vision (ACCV 2012); Springer: Berlin/Heidelberg, Germany, 2012; pp. 231–244. [Google Scholar] [CrossRef]
  15. Zhuang, L.; Fu, X.; Ng, M.K.; Bioucas-Dias, J.M. Hyperspectral image denoising based on global and nonlocal low-rank factorizations. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10438–10454. [Google Scholar] [CrossRef]
  16. Zheng, Y.B.; Huang, T.Z.; Zhao, X.L.; Chen, Y.; He, W. Double-factor-regularized low-rank tensor factorization for mixed noise removal in hyperspectral image. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8450–8464. [Google Scholar] [CrossRef]
  17. Zhuang, L.; Bioucas-Dias, J.M. Fast hyperspectral image denoising and inpainting based on low-rank and sparse representations. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 730–742. [Google Scholar] [CrossRef]
  18. Xie, T.; Li, S.; Sun, B. Hyperspectral images denoising via nonconvex regularized low-rank and sparse matrix decomposition. IEEE Trans. Image Process. 2019, 29, 44–56. [Google Scholar] [CrossRef] [PubMed]
  19. Xie, Q.; Zhao, Q.; Meng, D.; Xu, Z. Kronecker-basis-representation based tensor sparsity and its applications to tensor recovery. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1888–1902. [Google Scholar] [CrossRef]
  20. Yuan, Q.; Zhang, Q.; Li, J.; Shen, H.; Zhang, L. Hyperspectral image denoising employing a spatial–spectral deep residual convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2018, 57, 1205–1218. [Google Scholar] [CrossRef]
  21. Zhang, Q.; Yuan, Q.; Li, J.; Liu, X.; Shen, H.; Zhang, L. Hybrid Noise Removal in Hyperspectral Imagery With a Spatial–Spectral Gradient Network. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7317–7329. [Google Scholar] [CrossRef]
  22. Chang, Y.; Yan, L.; Fang, H.; Zhong, S.; Liao, W. HSI-DeNet: Hyperspectral Image Restoration via Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2019, 57, 667–682. [Google Scholar] [CrossRef]
  23. Liu, W.; Lee, J. A 3-D atrous convolution neural network for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5701–5715. [Google Scholar] [CrossRef]
  24. Wei, K.; Fu, Y.; Huang, H. 3-D quasi-recurrent neural network for hyperspectral image denoising. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 363–375. [Google Scholar] [CrossRef]
  25. Xiong, F.; Zhou, J.; Zhao, Q.; Lu, J.; Qian, Y. MAC-Net: Model-aided nonlocal neural network for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5519414. [Google Scholar] [CrossRef]
  26. Bodrito, T.; Zouaoui, A.; Chanussot, J.; Mairal, J. A trainable spectral-spatial sparse coding model for hyperspectral image restoration. Adv. Neural Inf. Process. Syst. 2021, 34, 5430–5442. [Google Scholar]
  27. Chen, H.; Yang, G.; Zhang, H. Hider: A hyperspectral image denoising transformer with spatial–spectral constraints for hybrid noise removal. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 8797–8811. [Google Scholar] [CrossRef]
  28. Pang, L.; Gu, W.; Cao, X. TRQ3DNet: A 3D quasi-recurrent and transformer based network for hyperspectral image denoising. Remote Sens. 2022, 14, 4598. [Google Scholar] [CrossRef]
  29. Li, M.; Fu, Y.; Zhang, Y. Spatial-spectral transformer for hyperspectral image denoising. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2023), Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 1368–1376. [Google Scholar] [CrossRef]
  30. Lai, Z.; Yan, C.; Fu, Y. Hybrid spectral denoising transformer with guided attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 1–6 October 2023; pp. 13065–13075. [Google Scholar] [CrossRef]
  31. Li, M.; Liu, J.; Fu, Y.; Zhang, Y.; Dou, D. Spectral enhanced rectangle transformer for hyperspectral image denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023), Vancouver, BC, Canada, 18–22 June 2023; pp. 5805–5814. [Google Scholar] [CrossRef]
  32. Fu, G.; Xiong, F.; Lu, J.; Zhou, J. SSUMamba: Spatial-spectral selective state space model for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5527714. [Google Scholar] [CrossRef]
  33. Wu, Y.; Liu, D.; Zhang, J. SS3L: Self-Supervised Spectral–Spatial Subspace Learning for Hyperspectral Image Denoising. Remote Sens. 2025, 17, 3348. [Google Scholar] [CrossRef]
  34. Sidorov, O.; Yngve Hardeberg, J. Deep hyperspectral prior: Single-image denoising, inpainting, super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3844–3851. [Google Scholar] [CrossRef]
  35. Zhu, H.; Ye, M.; Qiu, Y.; Qian, Y. Self-supervised learning hyperspectral image denoiser with separated spectral-spatial feature extraction. In Proceedings of the IGARSS 2022-2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: New York, NY, USA, 2022; pp. 1748–1751. [Google Scholar] [CrossRef]
  36. Zhuang, L.; Ng, M.K.; Gao, L.; Michalski, J.; Wang, Z. Eigenimage2Eigenimage (E2E): A self-supervised deep learning network for hyperspectral image denoising. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 16262–16276. [Google Scholar] [CrossRef] [PubMed]
  37. Luo, Y.; Zhao, X.; Meng, D.; Jiang, T. HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), New Orleans, LA, USA, 19–24 June 2022; pp. 19281–19290. [Google Scholar] [CrossRef]
  38. Miao, Y.; Zhang, L.; Zhang, L.; Tao, D. Dds2m: Self-supervised denoising diffusion spatio-spectral model for hyperspectral image restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France, 1–6 October 2023; pp. 12086–12096. [Google Scholar] [CrossRef]
  39. Luo, Y.; Zhao, X.; Li, Z.; Ng, M.K.; Meng, D. Low-Rank Tensor Function Representation for Multi-Dimensional Data Recovery. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 3351–3369. [Google Scholar] [CrossRef]
  40. Pang, L.; Rui, X.; Cui, L.; Wang, H.; Meng, D.; Cao, X. HIR-Diff: Unsupervised hyperspectral image restoration via improved diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), Seattle, WA, USA, 17–21 June 2024; pp. 3005–3014. [Google Scholar] [CrossRef]
  41. Saragadam, V.; Balestriero, R.; Veeraraghavan, A.; Baraniuk, R.G. DeepTensor: Low-rank tensor decomposition with deep network priors. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10337–10348. [Google Scholar] [CrossRef]
  42. Zhuang, L.; Ng, M.K.; Gao, L.; Wang, Z. Eigen-CNN: Eigenimages plus eigennoise level maps guided network for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5512018. [Google Scholar] [CrossRef]
  43. Gao, L.; Wang, D.; Zhuang, L.; Sun, X.; Huang, M.; Plaza, A. BS3LNet: A new blind-spot self-supervised learning network for hyperspectral anomaly detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5504218. [Google Scholar] [CrossRef]
  44. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, 18–22 June 2018; pp. 9446–9454. [Google Scholar]
  45. Lehtinen, J.; Munkberg, J.; Hasselgren, J.; Laine, S.; Karras, T.; Aittala, M.; Aila, T. Noise2Noise: Learning Image Restoration without Clean Data. In Proceedings of the 35th International Conference on Machine Learning (PMLR 2018), Stockholm, Sweden, 10–15 July 2018; Dy, J., Krause, A., Eds.; Stockholmsmässan: Stockholm, Sweden, 2018; Volume 80, pp. 2965–2974. [Google Scholar]
  46. Wang, D.; Zhuang, L.; Gao, L.; Sun, X.; Huang, M.; Plaza, A. BockNet: Blind-Block Reconstruction Network With a Guard Window for Hyperspectral Anomaly Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5531916. [Google Scholar] [CrossRef]
  47. Miao, Y.C.; Zhao, X.L.; Fu, X.; Wang, J.L.; Zheng, Y.B. Hyperspectral Denoising Using Unsupervised Disentangled Spatiospectral Deep Priors. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5513916. [Google Scholar] [CrossRef]
  48. Chakrabarty, P.; Maji, S. The spectral bias of the deep image prior. arXiv 2019, arXiv:1912.08905. [Google Scholar] [CrossRef]
  49. Rahaman, N.; Baratin, A.; Arpit, D.; Draxler, F.; Lin, M.; Hamprecht, F.; Bengio, Y.; Courville, A. On the Spectral Bias of Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (PMLR 2019), Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; Volume 97, pp. 5301–5310. [Google Scholar]
  50. John Xu, Z.Q.; Zhang, Y.; Luo, T.; Xiao, Y.; Ma, Z. Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks. Commun. Comput. Phys. 2020, 28, 1746–1767. [Google Scholar] [CrossRef]
  51. Shi, Z.; Mettes, P.; Maji, S.; Snoek, C.G. On measuring and controlling the spectral bias of the deep image prior. Int. J. Comput. Vis. 2022, 130, 885–908. [Google Scholar] [CrossRef]
  52. Fridovich-Keil, S.; Gontijo Lopes, R.; Roelofs, R. Spectral bias in practice: The role of function frequency in generalization. Adv. Neural Inform. Process. Syst. 2022, 35, 7368–7382. [Google Scholar]
  53. Zhuang, L.; Ng, M.K. FastHyMix: Fast and parameter-free hyperspectral image mixed noise removal. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 4702–4716. [Google Scholar] [CrossRef] [PubMed]
  54. Hochreiter, S.; Schmidhuber, J. Long Short-term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  55. Heckel, R.; Soltanolkotabi, M. Denoising and regularization via exploiting the structural bias of convolutional generators. arXiv 2019, arXiv:1910.14634. [Google Scholar]
  56. Chen, Y.; Zeng, J.; He, W.; Zhao, X.L.; Jiang, T.X.; Huang, Q. Fast large-scale hyperspectral image denoising via non-iterative low-rank subspace representation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5530714. [Google Scholar] [CrossRef]
  57. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  58. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef]
  59. Wang, Z.; Zhuang, L.; Michalski, J.R.; Ng, M.K. EGD-Net: Eigenimage Guided Diffusion Network for Hyperspectral Mixed Noise Removal. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 17197–17213. [Google Scholar] [CrossRef]
Figure 1. The encoder–decoder architecture of the proposed noise removal method, HyDePre.
Figure 2. Illustration of why spectral domain modeling better handles structured noise. (a) Clean, noisy, and reconstructed spectral curves of a representative pixel, showing that the proposed method prioritizes smooth spectral trends while suppressing sharp noise-induced fluctuations. (b) Noisy image of Band 32 exhibiting strong spatially structured noise. (c) Corresponding clean image of Band 32. (d) Denoising result of Band 32 using a spatial Deep Image Prior, where structured noise is partially preserved. (e) Band 32 extracted from the hyperspectral image denoised by the proposed HyDePre method, showing effective suppression of structured noise.
Figure 3. Spectral maximum variation (defined in (11)) histograms corresponding to the clean image (a) and the noisy image (b), under Case 1 on the Washington DC Mall dataset.
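Equation (11) is not reproduced in this excerpt; assuming the spectral maximum variation (SMV) of a pixel is the largest absolute difference between adjacent bands along its spectrum, the histograms in Figure 3 could be built from a map like the following sketch (function names are illustrative, not the authors' code):

```python
import numpy as np

def spectral_max_variation(hsi):
    """Per-pixel spectral maximum variation (SMV) map.

    hsi: array of shape (rows, cols, bands). SMV is taken here as the
    maximum absolute band-to-band difference, a stated proxy for Eq. (11).
    Returns a (rows, cols) map.
    """
    diffs = np.abs(np.diff(hsi, axis=-1))  # adjacent-band variations
    return diffs.max(axis=-1)

def asmv(hsi):
    """Average SMV over all pixels (a single scalar per image)."""
    return spectral_max_variation(hsi).mean()
```

Under this definition, noise-corrupted pixels exhibit larger SMV values, which is consistent with the rightward shift of the noisy histogram in Figure 3b.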
Figure 4. Denoising results of Washington DC Mall dataset in Cases 1∼5. False-color image is synthesized from bands 92, 52, and 1 (R: 92, G: 52, and B: 1).
Figure 5. Denoising results of Pavia University dataset in Cases 1∼5. False-color image is synthesized from bands 87, 32, and 18 (R: 87, G: 32, and B: 18).
Figure 6. Illustration of the core principle behind our early stopping strategy. The figure depicts the relationship between the MPSNR and Δ ASMV across training epochs for two cases: (a) Case 1 on Washington DC Mall dataset and (b) Case 4 on Pavia University dataset. Crucially, the Δ ASMV value surpasses the predefined threshold γ after the Warm-up Period, coinciding with the point where the MPSNR value peaks and subsequently begins to decline. This temporal alignment validates that Δ ASMV effectively signals the onset of overfitting, functioning as a robust, reference-free proxy for determining the optimal training termination point.
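Figure 6 suggests a simple reference-free stopping rule: terminate training once Δ ASMV exceeds the threshold γ after the warm-up period. A minimal sketch of such a check follows; the warm-up length and the use of the last two epochs to form Δ ASMV are assumptions for illustration, not the paper's exact rule:

```python
def should_stop(asmv_history, gamma, warmup=10):
    """Adaptive early stopping on the ASMV trajectory (illustrative).

    asmv_history: ASMV of the reconstruction at each epoch so far.
    gamma: threshold on the epoch-to-epoch ASMV change (Delta ASMV).
    warmup: number of initial epochs during which no stop is issued.
    """
    epoch = len(asmv_history)
    if epoch <= warmup:
        return False  # still in the warm-up period
    delta = abs(asmv_history[-1] - asmv_history[-2])
    return delta > gamma  # large jump signals onset of noise overfitting
```

In a training loop, `should_stop` would be evaluated after each epoch on the ASMV of the current reconstruction, mirroring how the Δ ASMV curve in Figure 6 crosses γ near the MPSNR peak.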
Figure 7. Sensitivity analysis of λ in terms of (a) training loss and (b) MPSNR.
Figure 8. Reconstruction results of different methods on real image: Mars. The time cost of each is as follows: FastHyMix (23 s), KBR (158 s), FallHyDe (0.2 s), NonRLRS (43 s), DDS2M (1128 s), HLRTF (80 s), LRTFR (96 s), EGD-Net (10 s), and HyDePre (31 s).
Figure 9. Reconstruction results of different methods on real image: Hyperion Cuprite. The time cost of each is as follows: FastHyMix (5 s), KBR (152 s), FallHyDe (0.2 s), NonRLRS (92 s), DDS2M (1128 s), HLRTF (51 s), LRTFR (113 s), EGD-Net (8 s), and HyDePre (32 s).
Figure 10. Evolution of the average spectral maximum variation (ASMV) during unsupervised training (Case 1, Washington DC Mall Dataset). (a) The Δ ASMV trajectory across iterations. SMV heatmaps visualize two critical checkpoints: (c) the proposed adaptive stopping point and (d) the overfitting stage. The clean image’s SMV map (b) and the noisy input’s SMV map (e) are provided for reference. (f–i) Mean residual maps of the clean image, the reconstructions at epochs 15 and 60, and the noisy input. Residual maps are computed as $\frac{1}{n_b}\sum_{b=1}^{n_b}|\hat{X}(:,:,b)-X(:,:,b)|$ and $\frac{1}{n_b}\sum_{b=1}^{n_b}|Y(:,:,b)-X(:,:,b)|$, where $\hat{X}$, $Y$, and $X$ denote the reconstructed HSI, the noisy image, and the clean image, respectively. The scalar residual is obtained by summing over the entire residual map. The ASMV at the stopping point (see (c,g)) effectively captures the spectral edges while suppressing the high-frequency variations caused by noise, whereas prolonged training (see (d,h)) results in overfitting to noise patterns.
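The residual maps described in the Figure 10 caption follow directly from the formula; a small NumPy sketch (array names are illustrative):

```python
import numpy as np

def mean_residual_map(a, b):
    """Band-averaged absolute residual: (1/n_b) * sum_b |a(:,:,b) - b(:,:,b)|.

    a, b: arrays of shape (rows, cols, bands), e.g. a reconstruction and
    the clean image. Returns a (rows, cols) residual map.
    """
    return np.abs(a - b).mean(axis=-1)

def total_residual(a, b):
    """Scalar residual: sum of the mean residual map over all pixels."""
    return mean_residual_map(a, b).sum()
```

Applied to the epoch-15 and epoch-60 reconstructions against the clean image, this yields the maps shown in panels (g) and (h).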
Table 1. Network architecture and training configuration of HyDePre.

| d | p | Optimizer | Learning Rate | Batch Size | λ |
|---|---|-----------|---------------|------------|---|
| 128 | 8 | Adam | 0.001 | 64 | 10⁻⁷ |

d: hidden state dimensionality of each LSTM layer in both encoder and decoder; p: dimensionality of the latent representation; and λ: regularization coefficient used in the loss function.
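Collected as code, the Table 1 settings might look as follows. The loss sketch pairs an MSE reconstruction term with a λ-weighted ℓ1 penalty; penalizing the latent code is an assumption for illustration, since the exact penalized quantity is not shown in this excerpt.

```python
import numpy as np

# Hyperparameters from Table 1.
CONFIG = {
    "d": 128,            # LSTM hidden state size (encoder and decoder)
    "p": 8,              # latent representation dimensionality
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "batch_size": 64,
    "lambda": 1e-7,      # regularization coefficient in the loss
}

def regularized_loss(recon, target, latent, lam=CONFIG["lambda"]):
    """Illustrative loss: MSE reconstruction term plus an l1 penalty.

    Penalizing the latent code is an assumption, not the paper's
    stated formulation.
    """
    mse = np.mean((recon - target) ** 2)
    return mse + lam * np.sum(np.abs(latent))
```

With λ = 10⁻⁷, the penalty is a gentle bias toward sparse codes rather than a dominant term, consistent with the small ablation gaps reported in Table 4.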
Table 2. Quantitative assessment of the proposed and comparison methods on Washington DC Mall Dataset in different cases.

| Case | Metric | Noisy | FastHyMix (TNNLS, 2021) [53] | KBR (TPAMI, 2017) [19] | FallHyDe (TGRS, 2024) [56] | NonRLRS (TIP, 2019) [18] | DDS2M (ICCV, 2023) [38] | HLRTF (CVPR, 2022) [37] | LRTFR (TPAMI, 2024) [39] | EGD-Net (JSTARS, 2025) [59] | HyDePre* (Proposed) | HyDePre (Proposed) |
|------|--------|-------|-----------|-----|----------|---------|-------|-------|-------|---------|----------|---------|
| Case 1 | MPSNR | 20.10 | 32.72 | 33.70 | 30.63 | 32.08 | 32.06 | 34.03 | 31.78 | 33.55 | 34.74 | 34.74 |
| | MSSIM | 0.5673 | 0.9377 | 0.9600 | 0.8915 | 0.9529 | 0.8670 | 0.9642 | 0.8493 | 0.9572 | 0.9737 | 0.9737 |
| | MFSIM | 0.7913 | 0.9645 | 0.9791 | 0.9535 | 0.9750 | 0.9277 | 0.9836 | 0.9212 | 0.9753 | 0.9865 | 0.9865 |
| | Time (s) | - | 5 | 63 | 0.2 | 202 | 1194 | 21 | 64 | 7 | 89 | 89 |
| Case 2 | MPSNR | 16.38 | 30.56 | 29.56 | 25.53 | 28.58 | 29.27 | 29.03 | 30.26 | 30.65 | 30.39 | 30.39 |
| | MSSIM | 0.3705 | 0.9280 | 0.9103 | 0.7448 | 0.8980 | 0.7687 | 0.9152 | 0.8064 | 0.9258 | 0.9343 | 0.9343 |
| | MFSIM | 0.6846 | 0.9667 | 0.9557 | 0.8893 | 0.9525 | 0.8730 | 0.9613 | 0.9006 | 0.9663 | 0.9702 | 0.9702 |
| | Time (s) | - | 3 | 47 | 0.2 | 97 | 2507 | 22 | 60 | 7 | 183 | 183 |
| Case 3 | MPSNR | 17.68 | 31.83 | 31.72 | 27.33 | 29.86 | 29.81 | 32.22 | 30.59 | 31.69 | 32.53 | 32.44 |
| | MSSIM | 0.4637 | 0.9182 | 0.9403 | 0.8237 | 0.9290 | 0.7842 | 0.9482 | 0.8026 | 0.9497 | 0.9583 | 0.9601 |
| | MFSIM | 0.7395 | 0.9585 | 0.9701 | 0.9255 | 0.9638 | 0.8823 | 0.9763 | 0.8960 | 0.9774 | 0.9803 | 0.9801 |
| | Time (s) | - | 3 | 76 | 0.2 | 52 | 1202 | 20 | 61 | 7 | 68 | 51 |
| Case 4 | MPSNR | 17.28 | 28.04 | 30.36 | 26.74 | 29.62 | 28.56 | 30.13 | 30.11 | 28.13 | 30.93 | 30.62 |
| | MSSIM | 0.4420 | 0.8795 | 0.9299 | 0.8075 | 0.9253 | 0.7992 | 0.9421 | 0.8185 | 0.8771 | 0.9509 | 0.9488 |
| | MFSIM | 0.7106 | 0.9380 | 0.9696 | 0.9196 | 0.9647 | 0.8961 | 0.9762 | 0.9058 | 0.9388 | 0.9782 | 0.9776 |
| | Time (s) | - | 3 | 90 | 0.1 | 90 | 2389 | 20 | 61 | 7 | 198 | 214 |
| Case 5 | MPSNR | 17.25 | 29.29 | 32.08 | 27.58 | 31.12 | 29.13 | 31.57 | 31.29 | 28.66 | 32.53 | 32.53 |
| | MSSIM | 0.4538 | 0.9164 | 0.9500 | 0.8625 | 0.9447 | 0.8074 | 0.9539 | 0.8380 | 0.9106 | 0.9615 | 0.9615 |
| | MFSIM | 0.7219 | 0.9577 | 0.9754 | 0.9438 | 0.9725 | 0.9000 | 0.9805 | 0.9157 | 0.9548 | 0.9808 | 0.9808 |
| | Time (s) | - | 3 | 87 | 0.2 | 45 | 1203 | 37 | 60 | 7 | 59 | 59 |
| Mean | MPSNR | 17.74 | 30.49 | 31.48 | 27.56 | 30.25 | 29.77 | 31.40 | 30.81 | 30.54 | 32.22 | 32.14 |
| | MSSIM | 0.4595 | 0.9160 | 0.9381 | 0.8260 | 0.9300 | 0.8053 | 0.9447 | 0.8230 | 0.9241 | 0.9557 | 0.9557 |
| | MFSIM | 0.7296 | 0.9571 | 0.9700 | 0.9257 | 0.9657 | 0.8958 | 0.9756 | 0.9079 | 0.9625 | 0.9792 | 0.9790 |
| | Time (s) | - | 3 | 73 | 0.2 | 97 | 1699 | 24 | 61 | 7 | 119 | 119 |

Note: The best and the second best results are highlighted in bold and underline, respectively.
Table 3. Quantitative assessment of the proposed and comparison methods on Pavia University dataset in different cases.

| Case | Metric | Noisy | FastHyMix (TNNLS, 2021) [53] | KBR (TPAMI, 2017) [19] | FallHyDe (TGRS, 2024) [56] | NonRLRS (TIP, 2019) [18] | DDS2M (ICCV, 2023) [38] | HLRTF (CVPR, 2022) [37] | LRTFR (TPAMI, 2024) [39] | EGD-Net (JSTARS, 2025) [59] | HyDePre* (Proposed) | HyDePre (Proposed) |
|------|--------|-------|-----------|-----|----------|---------|-------|-------|-------|---------|----------|---------|
| Case 1 | MPSNR | 24.30 | 34.93 | 35.70 | 27.14 | 34.09 | 30.85 | 33.64 | 30.38 | 35.37 | 35.56 | 35.44 |
| | MSSIM | 0.4448 | 0.9291 | 0.9327 | 0.6432 | 0.9302 | 0.8345 | 0.9158 | 0.8270 | 0.9411 | 0.9357 | 0.9358 |
| | MFSIM | 0.7357 | 0.9617 | 0.9716 | 0.8043 | 0.9680 | 0.8846 | 0.9617 | 0.8914 | 0.9686 | 0.9735 | 0.9736 |
| | Time (s) | - | 7 | 161 | 0.3 | 56 | 2009 | 55 | 161 | 8 | 349 | 334 |
| Case 2 | MPSNR | 22.95 | 34.16 | 33.86 | 24.41 | 33.32 | 28.23 | 32.15 | 29.97 | 33.96 | 33.98 | 33.95 |
| | MSSIM | 0.3992 | 0.9091 | 0.9166 | 0.5417 | 0.9005 | 0.7692 | 0.9048 | 0.8150 | 0.9018 | 0.9219 | 0.9220 |
| | MFSIM | 0.6963 | 0.9673 | 0.9672 | 0.7445 | 0.9580 | 0.8596 | 0.9592 | 0.8841 | 0.9655 | 0.9729 | 0.9730 |
| | Time (s) | - | 7 | 162 | 0.4 | 74 | 2117 | 55 | 158 | 8 | 374 | 358 |
| Case 3 | MPSNR | 26.44 | 40.81 | 35.78 | 28.88 | 36.37 | 29.56 | 37.28 | 31.55 | 37.19 | 39.02 | 38.99 |
| | MSSIM | 0.6031 | 0.9786 | 0.9330 | 0.6735 | 0.9520 | 0.7907 | 0.9564 | 0.8621 | 0.9504 | 0.9680 | 0.9678 |
| | MFSIM | 0.8281 | 0.9895 | 0.9673 | 0.8357 | 0.9756 | 0.8687 | 0.9799 | 0.9097 | 0.9788 | 0.9883 | 0.9881 |
| | Time (s) | - | 10 | 163 | 0.3 | 76 | 2027 | 55 | 157 | 8 | 646 | 661 |
| Case 4 | MPSNR | 21.43 | 29.40 | 30.54 | 23.33 | 30.13 | 26.74 | 29.00 | 27.62 | 29.69 | 30.94 | 30.78 |
| | MSSIM | 0.3752 | 0.8195 | 0.8154 | 0.4999 | 0.8294 | 0.7460 | 0.7687 | 0.7481 | 0.8185 | 0.8716 | 0.8753 |
| | MFSIM | 0.6549 | 0.9298 | 0.9169 | 0.7142 | 0.9260 | 0.8695 | 0.8827 | 0.8525 | 0.9295 | 0.9480 | 0.9560 |
| | Time (s) | - | 7 | 158 | 0.5 | 71 | 2037 | 55 | 156 | 8 | 219 | 168 |
| Case 5 | MPSNR | 22.14 | 32.53 | 36.50 | 29.22 | 35.85 | 28.20 | 35.27 | 31.02 | 31.32 | 36.76 | 36.76 |
| | MSSIM | 0.4428 | 0.8888 | 0.9581 | 0.7984 | 0.9535 | 0.8061 | 0.9501 | 0.8485 | 0.8910 | 0.9534 | 0.9534 |
| | MFSIM | 0.7037 | 0.9606 | 0.9833 | 0.9243 | 0.9790 | 0.9054 | 0.9802 | 0.9031 | 0.9652 | 0.9837 | 0.9837 |
| | Time (s) | - | 21 | 245 | 0.4 | 63 | 2056 | 55 | 157 | 8 | 486 | 486 |
| Mean | MPSNR | 23.45 | 34.37 | 34.48 | 26.60 | 33.95 | 28.72 | 33.47 | 30.11 | 33.51 | 35.25 | 35.18 |
| | MSSIM | 0.4530 | 0.9050 | 0.9112 | 0.6313 | 0.9131 | 0.7893 | 0.8992 | 0.8201 | 0.9006 | 0.9301 | 0.9309 |
| | MFSIM | 0.7237 | 0.9618 | 0.9613 | 0.8046 | 0.9613 | 0.8776 | 0.9527 | 0.8882 | 0.9615 | 0.9733 | 0.9749 |
| | Time (s) | - | 10 | 178 | 0.4 | 68 | 2049 | 55 | 158 | 8 | 415 | 401 |

Note: The best and the second best results are highlighted in bold and underline, respectively.
Table 4. Performance comparison of the proposed HyDePre* with and without ℓ1 regularization in different cases.

Washington DC Mall Dataset

| Case | MPSNR (w/o ℓ1) | MSSIM (w/o ℓ1) | MFSIM (w/o ℓ1) | MPSNR | MSSIM | MFSIM |
|------|----------------|----------------|----------------|-------|-------|-------|
| Case 1 | 34.58 | 0.9742 | 0.9860 | 34.74 | 0.9737 | 0.9865 |
| Case 2 | 30.07 | 0.9353 | 0.9708 | 30.39 | 0.9343 | 0.9702 |
| Case 3 | 32.30 | 0.9582 | 0.9797 | 32.53 | 0.9583 | 0.9803 |
| Case 4 | 30.78 | 0.9418 | 0.9728 | 30.93 | 0.9509 | 0.9782 |
| Case 5 | 32.21 | 0.9602 | 0.9807 | 32.53 | 0.9615 | 0.9808 |
| Mean | 31.99 | 0.9539 | 0.9780 | 32.22 | 0.9557 | 0.9792 |

Pavia University dataset

| Case | MPSNR (w/o ℓ1) | MSSIM (w/o ℓ1) | MFSIM (w/o ℓ1) | MPSNR | MSSIM | MFSIM |
|------|----------------|----------------|----------------|-------|-------|-------|
| Case 1 | 35.10 | 0.9182 | 0.9688 | 35.56 | 0.9357 | 0.9735 |
| Case 2 | 33.56 | 0.9188 | 0.9723 | 33.98 | 0.9219 | 0.9729 |
| Case 3 | 38.70 | 0.9672 | 0.9880 | 39.02 | 0.9680 | 0.9883 |
| Case 4 | 30.69 | 0.8739 | 0.9537 | 30.94 | 0.8716 | 0.9480 |
| Case 5 | 36.18 | 0.9501 | 0.9830 | 36.76 | 0.9534 | 0.9837 |
| Mean | 34.85 | 0.9256 | 0.9732 | 35.25 | 0.9301 | 0.9733 |

Note: The best results are highlighted in bold. Columns 2–4 report HyDePre* w/o ℓ1 Regularization; columns 5–7 report HyDePre*.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, R.; Ng, M.K.; Ljubenovic, M.; Zhuang, L. Unsupervised Hyperspectral Image Denoising via Spectral Learning Preference of Neural Networks. Remote Sens. 2026, 18, 742. https://doi.org/10.3390/rs18050742

