DNMF-AG: A Sparse Deep NMF Model with Adversarial Graph Regularization for Hyperspectral Unmixing

Qu, Kewen; Luo, Xiaojuan; Bao, Wenxing

doi:10.3390/rs18010155


by Kewen Qu ^1,2,*, Xiaojuan Luo ^1,2 and Wenxing Bao ^1,2

¹ The School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
² The Image and Intelligence Information Processing Innovation Team of the National Ethnic Affairs Commission of China, Yinchuan 750021, China
* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(1), 155; https://doi.org/10.3390/rs18010155

Submission received: 26 November 2025 / Revised: 29 December 2025 / Accepted: 31 December 2025 / Published: 3 January 2026

(This article belongs to the Special Issue Recent Advances in Hyperspectral Remote Sensing: Theories, Technologies and Applications)


Highlights

What are the main findings?

DNMF-AG integrates sparse deep decomposition with adversarial graph regularization, achieving stronger spatial structural modeling and improved endmember–abundance separation.
The combination of adversarial graphs, sparsity constraints, and a truncated activation function enhances noise robustness and yields more accurate abundance estimates.

What are the implications of the main findings?

The proposed method addresses key limitations of deep NMF in noisy environments, including insufficient structural exploitation and weak sparsity, resulting in a more robust and interpretable unmixing framework.
The improved reliability of endmember and abundance estimation provides a more accurate foundation for subsequent hyperspectral tasks such as target detection, classification, and change detection.

Abstract

Hyperspectral unmixing (HU) aims to extract constituent information from mixed pixels and is a fundamental task in hyperspectral remote sensing. Deep non-negative matrix factorization (DNMF) has recently attracted attention for HU due to its hierarchical representation capability. However, existing DNMF-based methods are often sensitive to noise and outliers, and face limitations in incorporating prior knowledge, modeling feature structures, and enforcing sparsity constraints, which restrict their robustness, accuracy, and interpretability. To address these challenges, we propose a sparse deep NMF model with adversarial graph regularization for hyperspectral unmixing, termed DNMF-AG. Specifically, we design an adversarial graph regularizer that integrates local similarity and dissimilarity graphs to promote intraclass consistency and interclass separability in the spatial domain, thereby enhancing structural modeling and robustness. In addition, a Gram-based sparsity constraint is introduced to encourage sparse abundance representations by penalizing inner product correlations. To further improve robustness and computational efficiency, a truncated activation function is incorporated into the iterative update process, suppressing low-amplitude components and promoting zero entries in the abundance matrix. The overall model is optimized using the alternating direction method of multipliers (ADMM). Experimental results on multiple synthetic and real datasets demonstrate that the proposed method outperforms state-of-the-art approaches in terms of estimation accuracy and robustness.

Keywords:

spectral image; hyperspectral unmixing; deep non-negative matrix factorization; adversarial graph learning; gram sparsity

1. Introduction

With the rapid development of hyperspectral imaging technology, hyperspectral sensors can capture narrow and continuous spectral information from ground objects, which not only yields improved ground observation accuracy but also broadens the application potential of hyperspectral remote sensing images (HSI) in fields such as agriculture, medical analysis, and environmental monitoring [1]. However, owing to the limited spatial resolutions of sensors, the phenomenon of “mixed pixels” is common in HSI, where a single pixel often contains mixed spectral information derived from numerous distinct materials. This phenomenon severely limits the precise applicability of HSI in tasks such as clustering [2], fine classification [3], and target detection [4]. Therefore, HU, whose main purpose is to identify pure objects (i.e., endmembers) and their corresponding proportions (i.e., abundances) [5] from mixed pixels, has become an important step for processing HSI.

To date, researchers have proposed many methods for solving the HU problem. These methods can be divided into two main categories: linear mixture models (LMM) [6] and nonlinear mixture models (NLMM) [7]. An LMM assumes that the mixing of ground materials occurs at a macroscopic scale, where each incident solar photon interacts with only a single type of material. In contrast, an NLMM holds that after the incident solar light reaches the ground, it refracts with various ground objects and then reflects to the sensor. Owing to the advantages of LMMs, such as their simple expression forms, strong operability, and clear physical meaning, they are still some of the most widely used models in HU research.

In recent years, many methods based on LMMs have been proposed; these approaches can be classified into geometric, statistical, sparse regression, tensor, and neural network (NN) methods. Among them, the sparse regression (SR) method [8] is a semisupervised method based on the theory of compressed sensing. It assumes that the observed hyperspectral data can be represented by a linear combination of atoms in a preknown spectral library, thereby transforming the unmixing problem into the process of finding the optimal endmember subset in the spectral library for modeling the hyperspectral data. Since the SR method does not require endmember extraction or endmember quantity estimation and can efficiently utilize the rich atomic information contained in the spectral library, it has received extensive attention in practical applications [9]. Ref. [10] first integrated SR into HU (SUnSAL) and used $L_{1, 1}$ regularization to constrain the abundance matrix, obtaining ideal abundance estimates. On this basis, a total variation (TV) regularization term was incorporated into SUnSAL, resulting in piecewise smooth abundance maps and significantly improving the robustness of the constructed model [11]. Inspired by this, many SR methods with manually set abundance regularization have emerged, and all of them have produced ideal results [12,13,14]. However, these methods have problems such as high spectral library coherence and high computational costs. Ref. [15] proposed a two-stage optimization algorithm, NeSU-LP, that combines adaptive spectral library pruning and the Nesterov acceleration strategy, providing significantly improved unmixing efficiency. Overall, although the SR method avoids the complexity of endmember extraction, its performance is limited by the large scale and high coherence of the employed spectral library.

Geometric methods are based on convex geometry theory, which holds that the vertices of the simplex formed by hyperspectral data in the given feature space are the endmembers. Therefore, the process of endmember extraction is transformed into the process of identifying the vertices of the simplex. This category mainly includes pure pixel methods and nonpure pixel methods. Among them, a pure pixel algorithm assumes that each endmember has at least one pure pixel in the associated hyperspectral image. Typical algorithms include pixel purity index (PPI) [16], N-finder algorithm (N-FINDR) [17], vertex component analysis (VCA) [18], etc. Nonpure pixel methods are based on the minimum simplex theory [19] and generate endmembers by estimating the vertices of the simplex; thus, they can handle highly mixed data.

Statistical methods are rooted in probability theory and mathematical statistics theory, and they have the advantage of not relying on pure pixel assumptions. Among them, non-negative matrix factorization (NMF) is a typical representative [20]. NMF decomposes a non-negative data matrix into the product of two non-negative low-rank submatrices that correspond to the basis matrix and the coefficient matrix [21,22,23]. However, owing to the nonconvexity of its objective function, the algorithm easily falls into local minima during the optimization process. To address this issue, researchers have integrated many prior constraints into NMF to enhance the physical interpretability and robustness of the model; these constraints include abundance sparsity constraints [24], piecewise smooth constraints [25], and manifold regularization [26]. On the basis of the sparse prior, ref. [27] proposed sparse-constraint NMF by enhancing the abundance sparsity of the model through $L_{1 / 2}$ regularization and proved that $L_{1 / 2}$ regularization is easier to solve than $L_{0}$ regularization and induces stronger sparsity than $L_{1}$ regularization, making it a better alternative to $L_{p}$ $(0 < p < 1)$ regularization. In addition, Yuan and Li et al. proposed a region-space adaptive total variation (RSATV) model [28] to address the problem that TV regularization is prone to introducing false edges in flat image regions. By combining regional k-means clustering and dual filtering mechanisms, the spatial consistency and robustness of the constructed abundance map can be improved while preserving edge information. Furthermore, Qu and Bao [29] proposed a multiprior set constraint-based NMF unmixing method, which integrates the minimum volume constraint (MVC), a reweighted $L_{1 / 2}$ norm, and TV regularization into the NMF framework and imposes constraints on the endmember and abundance matrices in the NMF model, thereby overcoming the limitations of traditional single-matrix constraint methods.

Since NMF usually expands three-dimensional data into a two-dimensional matrix when processing HSIs, this expansion process inevitably leads to the loss of the original spatial structure information. To solve this problem, researchers have proposed non-negative tensor factorization (NTF). NTF methods are based mainly on tensor algebra theory and can be roughly classified into four categories according to their different decomposition strategies. First, canonical polyadic decomposition (CPD) [30], which represents a tensor as the sum of several rank-one tensors, is suitable for depicting simple structures. Second, Tucker decomposition [31], which uses a core tensor and multiple factor matrices to achieve low-rank approximation, has stronger representational capabilities. Third, block tensor decomposition (BTD) [32] retains the local structure of the target tensor while enhancing the interpretability of the model. Fourth, matrix-vector NTF (MV-NTF) [33] effectively improves the computational efficiency and adaptability of the model by combining tensor decomposition and matrix decomposition strategies. Compared with matrix decomposition, NTF can more effectively preserve the original intrinsic spatial structure of HSIs. However, how to reasonably determine the rank of the target tensor is still an important challenge for this method.

With the rapid development of deep learning, especially the outstanding achievements of convolutional NNs in tasks such as hyperspectral image classification and HU [34,35], researchers have begun to integrate multilayer deep structures into the NMF framework. They aim to construct NMF unmixing methods with multiple levels and high representational capabilities. Ref. [36] proposed multilayer NMF (MLNMF), which iteratively decomposes the target observation matrix into multiple layers to achieve more levels of feature extraction. However, MLNMF is essentially only a series of sequential single-layer decompositions; it lacks cross-layer information feedback and fusion and fails to give full play to the hierarchical modeling advantages of deep structures. To enhance the modeling capability of data structures in HU, ref. [37] proposed a DNMF method incorporating an $L_{1 / 2}$ sparsity constraint and TV regularization, referred to as SDNMF-TV. By leveraging a hierarchical pretraining and fine-tuning strategy, this method effectively captures the layered structural information of hyperspectral data, thereby improving the estimation accuracy of both endmembers and abundances. Building upon this, ref. [38] addressed the performance degradation of deep NMF methods under noise interference by introducing an $L_{1 / 2}$-norm constraint. The resulting robust DNMF model significantly enhances the model’s resilience to outlier noise, confirming its superior unmixing performance. Moreover, the self-supervised robust deep matrix decomposition model (SSRDMF) jointly optimizes the endmember and abundance estimation processes through an encoder–decoder architecture in combination with sparse noise modeling and self-supervised constraints, significantly enhancing the robustness of the unmixing model against Gaussian noise and sparse noise and further improving its unmixing performance [39]. In summary, the DNMF method has significant advantages in terms of achieving improved HU accuracy and antinoise performance because of its hierarchical modeling capabilities, adaptive optimization mechanism, and regularization strategy.

On the basis of the above analysis, considering the superior characteristics of DNMF in hierarchical feature learning tasks and drawing on the ideas of deep learning, an adversarial graph regularization and Gram sparsity-constrained DNMF method for spectral unmixing is proposed in this paper. Specifically, our method improves and enhances the existing model from three perspectives—spatial structure preservation, sparse representation, and robustness—by introducing graph structure priors and sparsity mechanisms (as shown in Figure 1). First, we utilize the manifold graph learning and adversarial learning concepts to design an adversarial graph regularization method composed of a local similarity graph and a heterogeneous repulsion graph, which effectively enhances the continuity within classes and the separability between classes during abundance estimation, thereby improving the ability of the model to identify complex spatial structures and class boundaries. Second, to effectively characterize the sparsity of the abundance components contained in multilayer structures, we introduce a novel sparsity constraint based on inner product penalties, which guides the task of generating unmixing results by penalizing the Gram matrix of abundances, enhancing the interpretability and discrimination of the model. Finally, the designed adversarial graph regularization scheme and Gram sparse constraint are uniformly integrated into the deep robust NMF model to further increase its robustness and accuracy. Additionally, during the iterative process of deep robust NMF, we innovatively introduce a truncated activation function, which truncates the low-amplitude terms in the hierarchical iterative matrix to further promote the generation of zero values in the abundance matrix, thereby enhancing the robustness of the model to noise and outliers while improving its computational efficiency and interpretability.

The main contributions of this study are summarized as follows.

(1): A robust DNMF framework is proposed, integrating adversarial graph regularization, a Gram sparsity constraint, and a reconstruction strategy that incorporates an improved ReLU function and the $L_{2, 1}$-norm, thereby enhancing the expressive capacity, sparsity, nonlinearity modeling, and decomposition robustness of the multilayer unmixing model when dealing with complex hyperspectral data.
(2): An adversarial graph regularization method that considers a local similarity graph and a heterogeneous repulsion graph is designed; this approach dynamically balances the intraclass smoothness and interclass distinctiveness of the abundance matrix. Moreover, a sparse constraint based on the Gram inner product is introduced to guide the process of generating abundance components in each layer, further enhancing the sparse representation ability of the model and improving the interpretability of multilayer factor decomposition.
(3): The proposed model is solved using the alternating direction method of multipliers (ADMM), which is adopted as a standard numerical optimization technique to efficiently decompose and solve the resulting constrained optimization problem.

The remainder of this article is organized as follows. Section 2 introduces the relevant basic models. Section 3 describes the model of the proposed algorithm and its optimization algorithm. Section 4 describes and analyzes the results obtained on synthetic datasets and real datasets. Section 5 provides some conclusions and directions for future research.

2. Background

2.1. Linear Mixing Model (LMM)

In this work, all matrices are represented by capital bold letters, vectors are denoted by lowercase letters, and scalars are signified by capital letters.

The classic LMM assumes that the spectra of distinct materials do not interfere with each other. Therefore, a mixed pixel can be represented by a linear combination of the endmember spectra and the associated coefficient (representation abundance). The LMM is defined as follows:

X = A S + N

(1)

where $X \in R^{B \times N}$ denotes an observation matrix with B bands and N pixels, $A \in R^{B \times P}$ denotes the endmember matrix, P represents the number of endmembers, $S \in R^{P \times N}$ denotes the abundance matrix, and $N \in R^{B \times N}$ denotes the noise matrix. In the LMM, the abundance matrix must satisfy the abundance non-negativity constraint (ANC) and the abundance sum-to-one constraint (ASC):

\begin{matrix} ANC: & S_{i j} \geq 0 & (i = 1, 2, \dots, P; \; j = 1, 2, \dots, N) \\ ASC: & \sum_{i = 1}^{P} S_{i j} = 1 & (j = 1, 2, \dots, N) \end{matrix}

(2)
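As an illustration of Equations (1) and (2), the following minimal sketch simulates data under the LMM. It is not taken from the paper: the function name `simulate_lmm`, the uniform random endmember spectra, the Dirichlet abundances, and the Gaussian noise level are all assumptions chosen only so that the ANC and ASC hold by construction.

```python
import numpy as np

def simulate_lmm(B=50, P=3, N=200, noise_std=0.01, seed=0):
    """Generate X = A S + noise with abundances that satisfy ANC and ASC.

    All distributions below are illustrative assumptions, not the paper's data.
    """
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.0, 1.0, size=(B, P))           # hypothetical endmember spectra (B x P)
    S = rng.dirichlet(np.ones(P), size=N).T          # (P x N); each column is >= 0 and sums to 1
    noise = noise_std * rng.standard_normal((B, N))  # additive noise matrix (B x N)
    X = A @ S + noise
    return X, A, S

X, A, S = simulate_lmm()
```

Drawing each abundance column from a Dirichlet distribution is one convenient way to satisfy both constraints at once; any other sampler on the probability simplex would serve equally well.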

2.2. Non-Negative Matrix Factorization (NMF)

NMF is a widely used method in data analysis tasks, especially when the non-negative constraint is considered. The main purpose of NMF is to decompose a non-negative HSI matrix $X \in R^{B \times N}$ into the product of two non-negative low-rank submatrices. NMF can be written as follows:

X \approx AS

(3)

To measure the quality of the approximation results, the objective function usually minimizes the Euclidean distance between $X$ and $AS$; i.e.,

\min_{A \geq 0, S \geq 0} F_{NMF} (A, S) = \frac{1}{2} {∥ X - AS ∥}_{F}^{2}

(4)

where the operator ${∥\cdot∥}_{F}$ denotes the Frobenius norm and $\geq$ indicates an elementwise non-negativity constraint.

When exploring various optimization algorithms, the multiplicative update rule (MUR) is widely used because of its computational simplicity. Minimizing the loss function in (4) via the MUR yields the following update rule:

A \leftarrow A \cdot * (X S^{T}) \cdot / (A S S^{T})

(5)

S \leftarrow S \cdot * (A^{T} X) \cdot / (A^{T} A S)

(6)

where the operators $\cdot *$ and $\cdot /$ represent elementwise multiplication and division, respectively, and ${(\cdot)}^{T}$ represents the transpose of a matrix.
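The multiplicative updates in Equations (5) and (6) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `nmf_mur`, the random initialization, the iteration count, and the small constant `eps` added to the denominators to avoid division by zero are assumptions.

```python
import numpy as np

def nmf_mur(X, P, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF with multiplicative updates for 0.5 * ||X - A S||_F^2."""
    rng = np.random.default_rng(seed)
    B, N = X.shape
    A = rng.uniform(0.1, 1.0, size=(B, P))   # positive random initialization (assumption)
    S = rng.uniform(0.1, 1.0, size=(P, N))
    errors = []
    for _ in range(n_iter):
        A *= (X @ S.T) / (A @ S @ S.T + eps)   # Eq. (5), elementwise
        S *= (A.T @ X) / (A.T @ A @ S + eps)   # Eq. (6), elementwise
        errors.append(0.5 * np.linalg.norm(X - A @ S, "fro") ** 2)
    return A, S, errors

# Toy data with an exact rank-4 non-negative factorization.
rng = np.random.default_rng(1)
X_demo = rng.uniform(size=(30, 4)) @ rng.uniform(size=(4, 100))
A_hat, S_hat, errors = nmf_mur(X_demo, P=4)
```

Because every factor in the updates is non-negative, `A_hat` and `S_hat` stay non-negative without any explicit projection step.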

2.3. Deep Non-Negative Matrix Factorization (DNMF)

The traditional NMF method decomposes mixed pixels into a linear combination of an endmember spectral matrix and an abundance matrix by constructing a low-dimensional representation. However, in hyperspectral unmixing, the complexity of mixed pixels also arises from nonlinear mixing, endmember spectral variability (e.g., due to illumination or terrain effects), and noise interference. Shallow NMF struggles to capture these complex relationships, limiting the accuracy of endmember identification and abundance estimation. To overcome this limitation, ref. [38] proposed the DNMF framework, which decomposes the given data matrix $X$ into $l + 1$ non-negative matrices through the following steps:

\begin{matrix} X & \approx A_{1} S_{1} \\ X & \approx A_{1} A_{2} S_{2} \\ ⋮ \\ X & \approx A_{1} A_{2} \dots A_{l} S_{l} \end{matrix}

(7)

where l denotes the total number of layers, $A_{i} \in R^{P_{i - 1} \times P_{i}}$ represents the endmember matrix at the i-th layer, and $S_{i} \in R^{P_{i} \times N}$ denotes the corresponding abundance matrix. Moreover, $P_{0}$ is the number of spectral bands B, and N is the number of pixels. The size of each layer is given by $P_{i}$, where $i = 1, 2, \dots, l$. This hierarchical structure allows the dimensionality to gradually decrease across successive layers:

B = P_{0} \geq P_{1} \geq \dots \geq P_{l - 1} \geq P_{l}

(8)

where P denotes the number of endmembers. The optimization objective for the HU process based on DNMF is defined as follows:

F_{Deep} = \min_{A_{i}, S_{l}} {∥X - A_{1} A_{2} \dots A_{l} S_{l}∥}_{F}^{2}

(9)

We perform NMF-based approximation for all layers.

A_{l} \leftarrow A_{l} \frac{X S_{l}^{T}}{A_{l} S_{l} S_{l}^{T}}

(10)

S_{l} \leftarrow S_{l} \frac{A_{l}^{T} X}{A_{l}^{T} A_{l} S_{l}}

(11)

Specifically, in each layer of the DNMF framework, the endmember matrix $A_{i}$ and the final abundance matrix $S_{l}$ are explicitly considered, whereas the intermediate abundance matrices $S_{i}$, $i = 1, 2, \dots, l - 1$, are implicitly represented. To facilitate a better understanding of the DNMF framework, Figure 2 illustrates a three-layer DNMF architecture.
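The layered decomposition in Equation (7) can be illustrated with a simple layer-wise scheme in which each abundance matrix is factorized again. The sketch below is an assumption about how such a pretraining pass might look, not the authors' procedure: the helper names `nmf` and `deep_nmf_pretrain`, the layer sizes, and the `eps` safeguard are hypothetical.

```python
import numpy as np

def nmf(X, P, n_iter=100, eps=1e-9, rng=None):
    """Single-layer NMF with multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(0) if rng is None else rng
    A = rng.uniform(0.1, 1.0, size=(X.shape[0], P))
    S = rng.uniform(0.1, 1.0, size=(P, X.shape[1]))
    for _ in range(n_iter):
        A *= (X @ S.T) / (A @ S @ S.T + eps)
        S *= (A.T @ X) / (A.T @ A @ S + eps)
    return A, S

def deep_nmf_pretrain(X, layer_sizes, n_iter=100, seed=0):
    """Factorize X ~ A_1 S_1, then S_1 ~ A_2 S_2, and so on, layer by layer."""
    rng = np.random.default_rng(seed)
    factors, S = [], X
    for P_i in layer_sizes:                 # P_1 >= P_2 >= ... >= P_l, as in Eq. (8)
        A_i, S = nmf(S, P_i, n_iter=n_iter, rng=rng)
        factors.append(A_i)                 # A_i has shape (P_{i-1}, P_i)
    return factors, S                       # S is the last-layer abundance S_l

rng = np.random.default_rng(2)
X_demo = rng.uniform(size=(40, 120))        # B = 40 bands, N = 120 pixels (toy values)
factors, S_l = deep_nmf_pretrain(X_demo, layer_sizes=[10, 6, 4])
X_hat = factors[0] @ factors[1] @ factors[2] @ S_l
```

With three layers, the product `factors[0] @ factors[1] @ factors[2]` plays the role of the overall endmember matrix, and `S_l` has one row per final-layer component.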

3. Proposed Model

In this section, we first delve deeply into the mathematical modeling process of DNMF-AG. The optimization process and time complexity analysis of the algorithm are subsequently discussed in detail. Finally, the specific implementation details of the model are discussed.

3.1. Robust DNMF with Truncated Activation Functions

The traditional NMF method relies on linear decomposition and non-negativity constraints to perform matrix decomposition on data. However, in practical HU tasks, the abundance matrix is often disturbed by noise and outliers, causing some elements to deviate from their physically feasible domain, which in turn affects the unmixing accuracy and the stability of the constructed model. To address this issue, truncated activation functions (such as variants of the ReLU) are introduced in this paper; these functions map negative values or minimal values to zero through truncation, thereby achieving a nonlinear mapping that enhances the expressive power of the model while suppressing the interference of outliers during the optimization process. Additionally, this strategy can improve the stability and physical feasibility of abundance estimation while maintaining data sparsity. Its mathematical formalization is as follows:

F_{NMF} = {∥X - A σ (S)∥}_{F}^{2}

(12)

where $σ (\cdot)$ denotes the element-wise truncated nonlinear activation function, which is employed to suppress unreasonably small values in the abundance matrix caused by noise or outliers. For each element $s_{i j}$ of the abundance matrix $S$, with t denoting the truncation threshold, the truncated activation function is defined as:

σ (s_{i j}) = \begin{cases} 0, & s_{i j} \leq t \\ s_{i j}, & s_{i j} > t \end{cases}

(13)
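Equation (13) translates directly into an elementwise operation. The following sketch is only an illustration; the function name `truncated_activation` and the threshold value used in the example are assumptions, since the text does not state how t is chosen.

```python
import numpy as np

def truncated_activation(S, t=1e-3):
    """Eq. (13): entries with s_ij <= t are set to 0, all others are kept."""
    S = np.asarray(S, dtype=float)
    return np.where(S > t, S, 0.0)

S_demo = np.array([[0.70, 0.0005, -0.02],
                   [0.30, 0.9995, 0.001]])
S_trunc = truncated_activation(S_demo, t=1e-3)
# Negative entries and entries at or below the threshold become exactly zero.
```

With `t = 0` the function reduces to the ordinary ReLU, which is consistent with the text describing it as a ReLU variant.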

The traditional NMF method usually adopts a loss function that is based on the Frobenius norm. However, when addressing noise and outliers, the Frobenius norm exhibits strong instability. The fundamental reason is that the error term in (12) is expressed as the sum of the squared Euclidean norms of the residuals. When extreme values exist, the squared terms significantly amplify the error, causing a few outliers to disproportionately impact the decomposition results, thereby reducing the stability and unmixing accuracy of the model. To overcome this problem, ref. [40] proposed a robust NMF method by introducing an $L_{2, 1}$ norm to redefine the optimization objective, effectively weakening the interference caused by outliers.

F_{NMF} = {∥X - A σ (S)∥}_{2, 1}

(14)

In terms of norm selection, the $L_{2, 1}$ norm and the Frobenius norm differ significantly. For a given matrix $X \in R^{B \times N}$, the $L_{2, 1}$ norm is defined as ${∥X∥}_{2, 1} = \sum_{i = 1}^{B} {∥x_{i}∥}_{2}$, whereas the squared Frobenius norm is ${∥X∥}_{F}^{2} = \sum_{i = 1}^{B} {∥x_{i}∥}_{2}^{2}$. Since the $L_{2, 1}$ norm does not square the residual norms, a single large residual contributes linearly rather than quadratically, which makes it less sensitive to outliers than the Frobenius norm when handling noisy data.
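The different sensitivity of the two norms to outliers can be checked on a toy residual matrix. In this sketch, $x_{i}$ is assumed to be the i-th row of the matrix, as the index range $1, \dots, B$ in the definition suggests; the function names and the toy values are hypothetical.

```python
import numpy as np

def l21_norm(E):
    """Sum of the Euclidean norms of the rows of E (row-wise reading, an assumption)."""
    return np.sqrt((E ** 2).sum(axis=1)).sum()

def frobenius_sq(E):
    """Squared Frobenius norm: sum of the squared Euclidean norms of the rows."""
    return (E ** 2).sum()

E = np.ones((4, 9))       # four residual rows, each with Euclidean norm 3
E_out = E.copy()
E_out[0] *= 10.0          # one corrupted row, now with norm 30

ratio_l21 = l21_norm(E_out) / l21_norm(E)          # 39 / 12
ratio_fro = frobenius_sq(E_out) / frobenius_sq(E)  # 927 / 36
```

Scaling one row by a factor of 10 multiplies the $L_{2, 1}$ value by 3.25 but the squared Frobenius value by 25.75, which is the amplification effect described above.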

Inspired by the theoretical advantages of truncated activation functions and the $L_{2, 1}$ norm in terms of improving the performance of NMF, we formulate the DNMF problem as follows:

F_{Deep} = \min_{A_{i}, S_{l}} {∥X - A_{1} σ (A_{2} σ (\dots σ (A_{l} S_{l})))∥}_{2, 1}

(15)
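To make the nested structure of Equation (15) concrete, the sketch below evaluates the reconstruction from the innermost layer outward and then computes the robust loss. The function names, the default threshold, and the row-wise evaluation of the $L_{2, 1}$ norm are assumptions made for illustration.

```python
import numpy as np

def sigma(Z, t=0.0):
    """Truncated activation of Eq. (13), applied elementwise."""
    return np.where(Z > t, Z, 0.0)

def deep_reconstruct(factors, S_l, t=0.0):
    """Evaluate A_1 sigma(A_2 sigma(... sigma(A_l S_l))) for factors = [A_1, ..., A_l]."""
    H = S_l
    for A_i in reversed(factors[1:]):    # layers l, l-1, ..., 2
        H = sigma(A_i @ H, t)
    return factors[0] @ H                # no activation after the first layer

def robust_loss(X, factors, S_l, t=0.0):
    """L_{2,1} norm of the residual, summed over rows (row-wise reading is an assumption)."""
    R = X - deep_reconstruct(factors, S_l, t)
    return np.sqrt((R ** 2).sum(axis=1)).sum()

rng = np.random.default_rng(0)
A1 = rng.uniform(0.1, 1.0, size=(20, 6))
A2 = rng.uniform(0.1, 1.0, size=(6, 3))
S_l = rng.dirichlet(np.ones(3), size=50).T     # (3 x 50), columns sum to one
X_demo = A1 @ A2 @ S_l
loss_clean = robust_loss(X_demo, [A1, A2], S_l, t=0.0)   # activation leaves positive entries untouched
loss_trunc = robust_loss(X_demo, [A1, A2], S_l, t=0.5)   # an aggressive threshold removes real signal
```

The second call shows why the threshold has to stay small relative to genuine abundance values: truncating too much turns signal into reconstruction error.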

3.2. Adversarial Graph Regularization

Graph theory [41] and manifold learning theory [42] indicate that constructing a nearest-neighbor graph can effectively preserve the local geometric structure of data. However, the existing graph regularization methods (such as the Laplacian regularization term) often overly rely on the local manifold assumption, focusing only on the similarity between samples and neglecting their differences. This limitation leads to overly strong continuity in the abundance matrix space, thereby damaging the detailed information contained within it. To address this issue, a dual-graph adversarial learning mechanism (Figure 1) is proposed in this paper; this mechanism effectively optimizes the intraclass compactness and interclass separability of the target abundance matrix by dynamically balancing the reward graph and the penalty graph, overcoming the shortcomings of the traditional graph regularization methods.

The reward graph aims to enhance the consistency of the abundance distributions of pixels within a local neighborhood. Specifically, given a data matrix $X \in R^{B \times N}$, if a pixel $x_{i}$ is one of the K-nearest neighbors of $x_{j}$, a similarity edge is established between the two pixels according to the K-nearest neighbors (KNN) criterion:

W_{R} (i, j) = \begin{cases} \exp (- \frac{{∥x_{i} - x_{j}∥}_{2}^{2}}{τ}), & \text{if } x_{i} \in N_{K} (x_{j}) \\ 0, & \text{otherwise} \end{cases}

(16)

where $N_{K} (x_{j})$ represents the set of K-nearest neighbors of pixel $x_{j}$ and $τ$ is a temperature parameter that controls the rate of similarity decay. The corresponding Laplacian matrix is $L_{R} = D_{R} - W_{R}$, where $D_{R}$ is the degree matrix with $D_{R} (i, i) = \sum_{j} W_{R} (i, j)$. The reward graph restricts the abundance vectors of pixels belonging to the same class to be as close as possible in the manifold space, which suppresses the loose intraclass distribution caused by noise or gradual changes in the mixing ratio.

A penalty graph is used to enhance the differences between non-neighboring pixels, preventing the potential overlapping of heterogeneous samples in the abundance space. If pixel $x_{i}$ is not among the KNNs of pixel $x_{j}$, a connecting edge is constructed and assigned a weight:

W_{P} (i, j) = \begin{cases} \exp (- \frac{{∥x_{i} - x_{j}∥}_{2}^{2}}{τ}), & \text{if } x_{i} \notin N_{K} (x_{j}) \\ 0, & \text{otherwise} \end{cases}

(17)

Here, the corresponding Laplacian matrix $L_{P} = D_{P} - W_{P}$ and its associated regularization term enhance interclass distinguishability by maximizing the abundance differences between non-neighboring samples.
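The construction of the two graphs in Equations (16) and (17) can be sketched as follows. Several details are assumptions that the text does not specify: the neighbor relation is symmetrized so that both weight matrices are symmetric, self-loops are excluded from both graphs, and the function name `adversarial_graphs` and the toy sizes are hypothetical.

```python
import numpy as np

def adversarial_graphs(X, K=5, tau=1.0):
    """Build reward/penalty weights and Laplacians; X is (B, N) with pixels as columns."""
    N = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)  # squared distances
    masked = D2.copy()
    np.fill_diagonal(masked, np.inf)             # a pixel is not its own neighbor (assumption)
    nearest = np.argsort(masked, axis=1)[:, :K]  # indices of the K nearest pixels per row
    knn = np.zeros((N, N), dtype=bool)
    knn[np.repeat(np.arange(N), K), nearest.ravel()] = True
    knn = knn | knn.T                            # symmetrized neighborhood (assumption)
    affinity = np.exp(-D2 / tau)                 # heat-kernel weight with temperature tau
    off_diag = ~np.eye(N, dtype=bool)
    W_R = np.where(knn, affinity, 0.0)               # Eq. (16): neighbors only
    W_P = np.where(~knn & off_diag, affinity, 0.0)   # Eq. (17): non-neighbors only
    L_R = np.diag(W_R.sum(axis=1)) - W_R
    L_P = np.diag(W_P.sum(axis=1)) - W_P
    return W_R, L_R, W_P, L_P

rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(10, 30))              # 10 bands, 30 pixels (toy values)
W_R, L_R, W_P, L_P = adversarial_graphs(X_demo, K=4, tau=1.0)
```

This dense version stores two N × N matrices and is only practical for small images; a real implementation would likely use sparse storage and a spatial or approximate neighbor search.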

It is worth noting that the parameter K is a crucial one in the construction of the reward graph and the penalty graph. It defines the local neighborhood range of each pixel, which directly affects the connectivity and discriminability of the graph structure. In the reward graph, K determines the connection strength between pixels of the same type. If it is too small, it will lead to insufficient intra-class information, affecting the consistency modeling of abundance distribution. In the penalty graph, the setting of K affects the degree of separation between different types of samples. If it is too large, it may introduce a large number of non-similar neighbors, causing the graph structure to be overly smoothed and weakening the discriminability between classes. Therefore, the reasonable setting of K requires a balance between intra-class compactness and inter-class separability.

By jointly optimizing the Laplacian constraints of the reward graph and the penalty graph, the objective function is defined as follows in each layer of the DNMF model:

G_{Deep} = α \, tr (S_{i} L_{R} S_{i}^{T}) - β \, tr (S_{i} L_{P} S_{i}^{T})

(18)

where $i = 1, 2, \dots, l$, and $α$ and $β$ are weighting parameters that control the balance between intraclass compactness and interclass separability.
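A small worked example may help to read Equation (18). For a symmetric weight matrix W with Laplacian L, the standard identity $tr (S L S^{T}) = \frac{1}{2} \sum_{p, q} W (p, q) {∥s^{p} - s^{q}∥}_{2}^{2}$ holds, so the reward term penalizes differences between neighboring abundance vectors while the penalty term rewards differences between non-neighbors. The toy graph, the abundance values, and the function names below are assumptions for illustration.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def adversarial_graph_term(S, L_R, L_P, alpha, beta):
    """Eq. (18) for one layer: alpha * tr(S L_R S^T) - beta * tr(S L_P S^T)."""
    return alpha * np.trace(S @ L_R @ S.T) - beta * np.trace(S @ L_P @ S.T)

# Toy graph on three pixels: pixels 0 and 1 are neighbors, pixel 2 is not a neighbor of either.
W_R = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
W_P = np.array([[0.0, 0.0, 0.5],
                [0.0, 0.0, 0.5],
                [0.5, 0.5, 0.0]])
S = np.array([[0.9, 0.7, 0.1],      # abundances of two endmembers for the three pixels
              [0.1, 0.3, 0.9]])

reward = np.trace(S @ laplacian(W_R) @ S.T)    # 0.08: neighbors are already similar
penalty = np.trace(S @ laplacian(W_P) @ S.T)   # 1.00: non-neighbors are well separated
G = adversarial_graph_term(S, laplacian(W_R), laplacian(W_P), alpha=1.0, beta=1.0)
```

Here the combined value is negative because the separation between non-neighbors outweighs the small disagreement between neighbors, which is the configuration the regularizer favors.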

3.3. Gram Sparsity Constraint

From a statistical perspective, each pixel in a hyperspectral image is typically composed of only a few endmembers, so the column vectors of the abundance matrix naturally exhibit sparsity. Traditional element-wise sparsity constraints based on the $L_{0}$ or $L_{1}$ norm can directly control the sparsity of individual elements; however, minimizing the $L_{0}$ norm is an NP-hard problem, and the $L_{1}$ norm often provides limited sparsity in practical unmixing applications. Moreover, these methods generally overlook the redundancy and overlap between column vectors, making it difficult to effectively distinguish the endmember compositions of different pixels. To address this limitation, this study introduces an inner-product-based sparsity regularization method within the proposed DNMF framework, namely the Gram sparsity constraint. This constraint effectively suppresses high correlations between columns, thereby highlighting the primary endmember contributions of each pixel, indirectly enhancing sparsity while reducing the influence of noise on unmixing. Specifically, the constraint penalizes the off-diagonal elements of the Gram matrix, preventing the same endmember from exhibiting similar distributions across different pixel columns, thus reducing inter-column redundancy and avoiding potential mixing ambiguities during the unmixing process [43].

Specifically, considering the abundance matrix $S_{i} = [s_{i}^{1}, s_{i}^{2}, \dots, s_{i}^{N}]$ at the i-th layer, where $i = 1, 2, \dots, l$, the corresponding Gram matrix $S_{i}^{T} S_{i}$ can be expressed as follows:

S_{i}^{T} S_{i} = (\begin{matrix} 〈s_{i}^{1}, s_{i}^{1}〉 & 〈s_{i}^{1}, s_{i}^{2}〉 & \dots & 〈s_{i}^{1}, s_{i}^{N}〉 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 〈s_{i}^{N}, s_{i}^{1}〉 & 〈s_{i}^{N}, s_{i}^{2}〉 & \dots & 〈s_{i}^{N}, s_{i}^{N}〉 \end{matrix})

(19)

where $〈s_{i}^{p}, s_{i}^{q}〉$ represents the inner product of the abundance vectors in columns p and q. The off-diagonal elements $〈s_{i}^{p}, s_{i}^{q}〉$ $(p \neq q)$ of the Gram matrix reflect the cooperative or competitive relationships among different endmembers during the mixing process. The expression $tr (S_{i}^{T} S_{i} 1_{N}) - tr (S_{i}^{T} S_{i})$ corresponds to the sum of the off-diagonal elements in the Gram matrix $S_{i}^{T} S_{i}$; that is,

tr (S_{i}^{T} S_{i} 1_{N}) - tr (S_{i}^{T} S_{i}) = \sum_{p \neq q} 〈s_{i}^{p}, s_{i}^{q}〉

(20)

where $1_{N}$ represents the $N \times N$ matrix of all ones. Because the abundance vectors are non-negative, the inner product $〈s_{i}^{p}, s_{i}^{q}〉$ approaches zero only when, for every endmember, the corresponding entry of $s_{i}^{p}$ or $s_{i}^{q}$ approaches zero. Therefore, this optimization process can induce sparsity in the abundance matrix along the column dimension, suppressing the global participation of redundant endmembers.
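The Gram sparsity term can be verified numerically on small matrices. The sketch below evaluates $tr (S^{T} S 1_{N}) - tr (S^{T} S)$ literally and through an equivalent form that avoids building the N × N Gram matrix, using the fact that the sum of all Gram entries equals the squared norm of the row sums of S. The function names and the toy abundance matrices are assumptions.

```python
import numpy as np

def gram_sparsity(S):
    """Literal evaluation: tr(S^T S 1_N) - tr(S^T S), with 1_N the N x N all-ones matrix."""
    N = S.shape[1]
    G = S.T @ S
    return np.trace(G @ np.ones((N, N))) - np.trace(G)

def gram_sparsity_fast(S):
    """Equivalent form: ||S 1||^2 - ||S||_F^2, which never forms the N x N Gram matrix."""
    row_sums = S.sum(axis=1)             # sum over pixels for each endmember
    return row_sums @ row_sums - (S ** 2).sum()

S_overlap = np.array([[0.5, 0.5],        # both pixels share both endmembers
                      [0.5, 0.5]])
S_disjoint = np.array([[1.0, 0.0],       # each pixel uses a different endmember
                       [0.0, 1.0]])

penalty_overlap = gram_sparsity(S_overlap)     # 1.0
penalty_disjoint = gram_sparsity(S_disjoint)   # 0.0
```

The penalty vanishes for the pair of pixels with disjoint supports and is positive when the two abundance vectors overlap, which matches the role the text assigns to the off-diagonal Gram entries.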

3.4. Proposed DNMF-AG Model

Considering the previous discussion, the structure of DNMF-AG consists of three main parts: (1) robust DNMF with a truncated activation function; (2) adversarial graph regularization; and (3) Gram sparsity regularization. On this basis, the optimization problem of the proposed model is formulated as follows:

\begin{matrix} \min_{A_{i}, S_{i}} & {∥X - A_{1} σ (A_{2} σ (\dots σ (A_{l} S_{l})))∥}_{2, 1} + α \, tr (S_{i} L_{R} S_{i}^{T}) - β \, tr (S_{i} L_{P} S_{i}^{T}) \\ & + γ (tr (S_{i}^{T} S_{i} 1_{N}) - tr (S_{i}^{T} S_{i})) \end{matrix}

(21)

where the first term is a robust DNMF reconstruction with a truncated activation; the second and third terms form the adversarial graph regularization (reward and penalty graphs); and the fourth term is the Gram sparsity regularization. Here, l is the number of layers,

σ (\cdot)

the improved nonlinear activation, and

α

,

β

, and

γ

are the parameters for reward graph, penalty graph, and Gram sparsity, respectively.
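To make the structure of the objective concrete, the following NumPy sketch evaluates it with the regularizers summed over layers, as in the Lagrangian form (22) without the multiplier terms. Several points are assumptions rather than statements from the paper: the truncated activation is taken to zero every entry below a threshold `tau`, the L2,1 norm is taken as the sum of column-wise Euclidean norms, and the Laplacians `L_R` and `L_P` are supplied by the caller.

```python
import numpy as np

def truncate(Z, tau=1e-5):
    # Assumed form of sigma: keep entries >= tau, set all others to zero.
    return np.where(Z >= tau, Z, 0.0)

def l21_norm(E):
    # Assumed convention: sum of the Euclidean norms of the columns (pixels).
    return float(np.linalg.norm(E, axis=0).sum())

def gram_penalty(S):
    # tr(S^T S 1_N) - tr(S^T S), computed without the N x N Gram matrix.
    r = S.sum(axis=1)
    return float(r @ r - np.sum(S * S))

def objective(X, A_list, S_list, L_R, L_P, alpha, beta, gamma, tau=1e-5):
    # Nested reconstruction A_1 sigma(A_2 sigma(... sigma(A_l S_l))).
    Z = S_list[-1]
    for A in reversed(A_list[1:]):
        Z = truncate(A @ Z, tau)
    value = l21_norm(X - A_list[0] @ Z)
    # Reward graph, penalty graph, and Gram sparsity terms for every layer.
    for S in S_list:
        value += alpha * np.trace(S @ L_R @ S.T)
        value -= beta * np.trace(S @ L_P @ S.T)
        value += gamma * gram_penalty(S)
    return float(value)
```

With all three weights set to zero and an exactly reconstructible `X`, the function returns zero, which is a convenient sanity check before plugging in real graphs.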

3.5. Optimization

To efficiently solve the optimization problem (21), this paper adopts the Lagrange multiplier method as a standard numerical optimization tool. Using this method, the constrained optimization problem (21) is transformed into an equivalent unconstrained formulation, which facilitates subsequent numerical solution.

\begin{matrix} F ({A_{i}}_{i = 1}^{l}, {S_{i}}_{i = 1}^{l}) & = ∥ X - A_{1} σ (A_{2} σ (\dots σ (A_{l} S_{l}) \dots)) ∥_{2, 1} + α \sum_{i = 1}^{l} tr (S_{i} L_{R} S_{i}^{T}) \\ - β \sum_{i = 1}^{l} tr (S_{i} L_{P} S_{i}^{T}) + γ \sum_{i = 1}^{l} (tr (S_{i}^{T} S_{i} 1_{N}) - tr (S_{i}^{T} S_{i})) \\ + \sum_{i = 1}^{l} tr (Θ_{i} A_{i}^{T}) + \sum_{i = 1}^{l} tr (Δ_{i} S_{i}^{T}) . \end{matrix}

(22)

where

i = 1, 2, \dots, l

and

Θ_{i}

and

Δ_{i}

are the Lagrange multipliers for

A_{i} \in R^{P_{i - 1} \times P_{i}}

and

S_{i} \in R^{P_{i} \times N}

, respectively.

(1) Calculate

A_{i}

:

In subproblem

A_{i}

, the following two situations may occur.

Case 1: $i = 1$. Let $ψ_{2} = σ (A_{2} σ (\dots σ (A_{l} S_{l})))$.

\begin{matrix} F (A_{1}) & = {∥X - A_{1} ψ_{2}∥}_{2, 1} + t r (Θ_{1} A_{1}^{T}) \\ = - 2 t r (X^{T} A_{1} ψ_{2}) + t r (ψ_{2}^{T} A_{1}^{T} A_{1} ψ_{2}) + t r (Θ_{1} A_{1}^{T}) \end{matrix}

(23)

This means that

\frac{\partial F}{\partial A_{1}} = - 2 X ψ_{2}^{T} + 2 A_{1} ψ_{2} ψ_{2}^{T} + Θ_{1}

(24)

Based on the KKT condition

Θ_{1} \odot A_{1} = 0

and assuming

\partial F / \partial A_{1} = 0

, the update rule for

A_{1}

is:

A_{1} \leftarrow A_{1} \sqrt{\frac{X ψ_{2}^{T}}{A_{1} ψ_{2} ψ_{2}^{T}}}

(25)

Case 2: $i = 2, 3, \dots, l$. Let $φ_{i - 1} = A_{1} σ (A_{2} σ (\dots σ (A_{i - 1})))$ and $ψ_{i + 1} = σ (A_{i + 1} σ (\dots σ (A_{l} S_{l})))$. Ignoring the terms in $F$ that do not depend on $A_{i}$, we obtain:

\begin{matrix} F (A_{i}) & = {∥X - φ_{i - 1} A_{i} ψ_{i + 1}∥}_{2, 1} + tr (Θ_{i} A_{i}^{T}) \\ = - 2 tr (X^{T} φ_{i - 1} A_{i} ψ_{i + 1}) + tr (ψ_{i + 1}^{T} A_{i}^{T} φ_{i - 1}^{T} φ_{i - 1} A_{i} ψ_{i + 1}) + tr (Θ_{i} A_{i}^{T}) \end{matrix}

(26)

Hence,

\frac{\partial F}{\partial A_{i}} = - 2 φ_{i - 1}^{T} X ψ_{i + 1}^{T} + 2 φ_{i - 1}^{T} φ_{i - 1} A_{i} ψ_{i + 1} ψ_{i + 1}^{T} + Θ_{i}

(27)

According to the KKT condition, the update rule for

A_{i}

is as follows:

A_{i} \leftarrow A_{i} \sqrt{\frac{φ_{i - 1}^{T} X ψ_{i + 1}^{T}}{φ_{i - 1}^{T} φ_{i - 1} A_{i} ψ_{i + 1} ψ_{i + 1}^{T}}}

(28)
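The update rules (25) and (28) share one form and can be sketched in a single NumPy function. This is an illustrative sketch, not the authors' implementation: the division and square root are taken element-wise, `eps` is a small constant added here to avoid division by zero, and passing `phi=None` corresponds to the first layer, where no left factor is present.

```python
import numpy as np

def update_A(X, A, psi, phi=None, eps=1e-12):
    """One multiplicative update of A_i following (25) (phi=None) or (28)."""
    if phi is None:
        # Case i = 1: X is approximated by A_1 psi_2.
        numerator = X @ psi.T
        denominator = A @ psi @ psi.T
    else:
        # Case i >= 2: X is approximated by phi_{i-1} A_i psi_{i+1}.
        numerator = phi.T @ X @ psi.T
        denominator = phi.T @ phi @ A @ psi @ psi.T
    # Element-wise ratio; a non-negative A stays non-negative.
    return A * np.sqrt(numerator / (denominator + eps))
```

When `X` equals the current reconstruction exactly, numerator and denominator coincide and the update leaves `A` unchanged, which is the expected fixed-point behaviour.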

(2) Calculate

S_{i}

: In subproblem

S_{i}

, the following two situations may occur:

Case 1: For $i = l$, let $φ_{l} = A_{1} σ (A_{2} σ (\dots σ (A_{l})))$. Ignoring the terms in $F$ that do not depend on $S_{l}$, we have:

\begin{matrix} F (S_{l}) = {∥X - φ_{l} S_{l}∥}_{2, 1} + α t r (S_{l} L_{R} S_{l}^{T}) - β t r (S_{l} L_{P} S_{l}^{T}) + γ (t r (S_{l}^{T} S_{l} 1_{N}) - t r (S_{l}^{T} S_{l})) + t r (Δ_{l} S_{l}^{T}) \\ = - 2 t r (X^{T} φ_{l} S_{l}) + t r (S_{l}^{T} φ_{l}^{T} φ_{l} S_{l}) + α t r (S_{l} L_{R} S_{l}^{T}) - β t r (S_{l} L_{P} S_{l}^{T}) + γ (t r (S_{l}^{T} S_{l} 1_{N}) - t r (S_{l}^{T} S_{l})) + t r (Δ_{l} S_{l}^{T}) \end{matrix}

(29)

Hence,

\begin{matrix} \frac{\partial F}{\partial S_{l}} = & - 2 φ_{l}^{T} X + 2 φ_{l}^{T} φ_{l} S_{l} + 2 α S_{l} L_{R} - 2 β S_{l} L_{P} + 2 γ (S_{l} 1_{N} - S_{l}) + Δ_{l} \end{matrix}

(30)

According to the KKT condition, the update rule for

S_{l}

is as follows:

S_{l} \leftarrow S_{l} \sqrt{\frac{φ_{l}^{T} X + β S_{l} L_{P} + γ S_{l}}{φ_{l}^{T} φ_{l} S_{l} + α S_{l} L_{R} + γ S_{l} 1_{N}}}

(31)

Case 2: For $i = 1, 2, \dots, l - 1$, let $φ_{i} = A_{1} σ (A_{2} σ (\dots σ (A_{i})))$. Ignoring the terms in $F$ that do not depend on $S_{i}$, we have:

\begin{matrix} F (S_{i}) = {∥X - φ_{i} S_{i}∥}_{2, 1} + α t r (S_{i} L_{R} S_{i}^{T}) - β t r (S_{i} L_{P} S_{i}^{T}) + γ (t r (S_{i}^{T} S_{i} 1_{N}) - t r (S_{i}^{T} S_{i})) + t r (Δ_{i} S_{i}^{T}) \\ = - 2 t r (X^{T} φ_{i} S_{i}) + t r (S_{i}^{T} {φ_{i}}^{T} φ_{i} S_{i}) + α t r (S_{i} L_{R} S_{i}^{T}) - β t r (S_{i} L_{P} S_{i}^{T}) + γ (t r (S_{i}^{T} S_{i} 1_{N}) - t r (S_{i}^{T} S_{i})) + t r (Δ_{i} S_{i}^{T}) \end{matrix}

(32)

The following can then be obtained:

\begin{matrix} \frac{\partial F}{\partial S_{i}} & = - 2 φ_{i}^{T} X + 2 φ_{i}^{T} φ_{i} S_{i} + 2 α S_{i} L_{R} - 2 β S_{i} L_{P} + 2 γ (S_{i} 1_{N} - S_{i}) + Δ_{i} \end{matrix}

(33)

According to the KKT condition, the update rule for

S_{i}

is as follows:

S_{i} \leftarrow S_{i} \sqrt{\frac{φ_{i}^{T} X + β S_{i} L_{P} + γ S_{i}}{φ_{i}^{T} φ_{i} S_{i} + α S_{i} L_{R} + γ S_{i} 1_{N}}}

(34)
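The abundance updates (31) and (34) have the same form and can be sketched as follows. This is a hedged illustration, not the authors' code. Because a graph Laplacian has negative off-diagonal entries, the numerator or denominator as printed can become negative, so the sketch clips them before taking the square root; that guard, the constant `eps`, and the function name are assumptions introduced here.

```python
import numpy as np

def update_S(X, S, phi, L_R, L_P, alpha, beta, gamma, eps=1e-12):
    """One multiplicative update of S_i following the form of (31) and (34)."""
    # S 1_N repeats each row sum of S across all N columns.
    S_ones = np.broadcast_to(S.sum(axis=1, keepdims=True), S.shape)
    numerator = phi.T @ X + beta * (S @ L_P) + gamma * S
    denominator = phi.T @ phi @ S + alpha * (S @ L_R) + gamma * S_ones
    # Guard (assumption): keep the ratio non-negative and finite.
    ratio = np.maximum(numerator, 0.0) / np.maximum(denominator, eps)
    return S * np.sqrt(ratio)
```

A common alternative in graph-regularized NMF is to split each Laplacian into its degree and adjacency parts so that both sides stay non-negative by construction; the source does not say which variant is used.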

The complete procedure is summarized in Algorithm 1.

3.6. Complexity Analysis

This section analyzes the computational complexity of the proposed method, which consists of hierarchical pretraining and global fine-tuning.

During the hierarchical pretraining stage, the classic NMF algorithm is adopted to initialize the decomposition layers at each level. Therefore, the computational complexity of this step can be expressed as

O (I t_{p r e} (B N P_{i} + B {P_{i}}^{2} + N {P_{i}}^{2}))

, where

P_{i}

represents the size of each layer, and

I t_{p r e}

represents the number of iterations required by the NMF process. In the fine-tuning stage, each layer involves three main tasks: (1) constructing auxiliary matrices

φ_{i - 1}

and

ψ_{i + 1}

, (2) updating the matrix

A_{i}

, and (3) updating the matrix

S_{i}

. Constructing

φ_{i - 1}

and

ψ_{i + 1}

requires at most

O (I t_{f i n e} ({P_{i}}^{3} + B {P_{i}}^{2}))

computational complexity. Moreover, the computational complexities for updating matrices

A_{i}

and

S_{i}

are

O (I t_{f i n e} (B^{2} P_{i - 1} + B P_{i - 1}^{2}))

and

O (I t_{f i n e} (N^{2} P_{i} + N P_{i}^{2}))

, respectively, where

I t_{f i n e}

is the maximum number of iterations involved in the fine-tuning stage of the proposed method. The comprehensive analysis indicates that when the size of each layer

P_{i}

is less than

\min \{B, N\}

and

P_{i} \leq P_{i - 1}

, the overall computational complexity of a single layer can be approximated as

O (I t_{f i n e} (N^{2} P_{i} + N P_{i}^{2} + B^{2} P_{i - 1} + B P_{i - 1}^{2}))

. Therefore, by summing the costs of both the hierarchical pretraining and global fine-tuning stages across all layers, the total computational complexity of the proposed method is approximately

O (l \cdot I t_{pre} (B N P + B P^{2} + N P^{2}) + l \cdot I t_{fine} (N^{2} P + N P^{2} + B^{2} P + B P^{2} + P^{3}))

, where l is the number of layers.

Algorithm 1: Algorithm of the proposed method.

1: Input: A hyperspectral image $X \in R^{B \times N}$; the number of layers l; the size of each layer $P_{i} (i = 1, 2, \dots, l)$; the parameters $α, β, γ$.
2: Pretraining all layers:
3: for $i = 1, 2, \dots, l$ do
4:     $(A_{i}, S_{i}) \leftarrow NMF (X, P_{i})$.
5:     Set $X = S_{i}$.
6: end for
7: Fine-tuning all layers:
8: repeat
9:     for all layers do
10:         Define $φ_{i - 1} = A_{1} A_{2} \dots A_{i - 1}$, and set $φ_{0} = I$.
11:         Define $ψ_{i + 1} = A_{i + 1} \dots A_{l}$.
12:         Update $A_{i}$ by using rule (25) for $i = 1$ or rule (28) otherwise.
13:         Update $S_{i}$ by using rule (31) for $i = l$ or rule (34) otherwise.
14:     end for
15: until convergence is achieved
16: Output: Set $A = A_{1} A_{2} \dots A_{l}$ and $S = S_{l}$.
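The pretraining stage of Algorithm 1 (steps 2 to 6) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions: the inner `nmf` routine uses the classic Lee and Seung multiplicative updates with random initialization, whereas the paper initializes the first layer with VCA and FCLS; the function names, iteration count, and `eps` are hypothetical.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-12, seed=0):
    """Plain Frobenius-norm NMF, X ~ A S, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], rank))
    S = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        A *= (X @ S.T) / (A @ S @ S.T + eps)
        S *= (A.T @ X) / (A.T @ A @ S + eps)
    return A, S

def pretrain(X, layer_sizes, n_iter=200):
    """Layer-wise pretraining: factorize X, then factorize each S_i in turn."""
    A_list, S_list = [], []
    current = X
    for P_i in layer_sizes:
        A_i, S_i = nmf(current, P_i, n_iter=n_iter)
        A_list.append(A_i)
        S_list.append(S_i)
        current = S_i          # step 5 of Algorithm 1: set X = S_i
    return A_list, S_list
```

The fine-tuning loop (steps 8 to 15) would then alternate the multiplicative updates for each layer starting from these factors.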

3.7. Implementation Details

To effectively implement the proposed algorithm, several issues need to be considered.

The first issue concerns the initialization of the endmember and abundance matrices. In our implementation, vertex component analysis (VCA) is used only to initialize the matrix

A_{1}

, and the fully constrained least-squares (FCLS) algorithm is employed to initialize the matrix

S_{1}

; the subsequent decomposition process follows the standard layer-wise structure of deep non-negative matrix factorization, without introducing any additional initialization strategies.

The second issue is how to ensure that the two LMM constraints are satisfied during the unmixing process. According to the MUR, we conclude that as long as the initial

A_{i}

and

S_{i}

are non-negative, the non-negativity constraint is preserved. To enforce the abundance sum-to-one constraint (ASC), in each iteration we replace

X

and

A

with their augmented forms:

\bar{X} = (\begin{matrix} X \\ δ 1_{N}^{T} \end{matrix}), \bar{A} = (\begin{matrix} A \\ δ 1_{P}^{T} \end{matrix})

(35)

where

δ

controls the weight of the ASC constraint in the objective function. Larger values improve ASC accuracy but slow convergence; therefore, we set

δ = 25

to balance accuracy and convergence speed.
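The augmentation in (35) amounts to appending one constant row to each matrix. The following NumPy sketch assumes `X` has shape (B, N) and `A` has shape (B, P); the function name is illustrative.

```python
import numpy as np

def augment_for_asc(X, A, delta=25.0):
    """Append a row of delta to X and A so the fit also rewards sum-to-one."""
    N = X.shape[1]
    P = A.shape[1]
    X_bar = np.vstack([X, delta * np.ones((1, N))])
    A_bar = np.vstack([A, delta * np.ones((1, P))])
    return X_bar, A_bar
```

The last row of `A_bar @ S` equals `delta` times the column sums of `S`, so it matches the last row of `X_bar` exactly when every abundance column sums to one. This is why a larger `delta` weights the ASC more heavily in the reconstruction term.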

The third issue is choosing the number of layers in DNMF. Too few layers may fail to capture the complexity of the data, while too many can cause overfitting or higher computational cost. We conducted experiments on simulated datasets to evaluate different layer numbers and determined the optimal decomposition depth for subsequent experiments.

The fourth consideration is the use of the truncated activation function. This function enhances the sparsity of the abundance matrix and improves the interpretability of the unmixing results. However, different truncation thresholds can impact sparsity, accuracy, and computational efficiency. Based on ablation experiments on SC1, and taking SAD, RMSE, and runtime into account, a threshold of

1 \times 10^{- 5}

was selected as the final value.

Finally, two stopping criteria are defined. The first is an error threshold: if the reconstruction error does not exceed this threshold for ten consecutive iterations, the iteration process is terminated. We set this threshold as

1 \times 10^{- 4}

. The second criterion is the maximum number of iterations. In this model, we set the maximum number of iterations to 3000 to ensure full algorithmic convergence.
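The two stopping criteria can be sketched as a small helper. The text does not specify whether the monitored quantity is the reconstruction error itself or its change between iterations, so the sketch leaves that choice to the caller; the class name and interface are hypothetical.

```python
class StoppingRule:
    """Stop after `patience` consecutive values <= tol, or at max_iter."""

    def __init__(self, tol=1e-4, patience=10, max_iter=3000):
        self.tol = tol
        self.patience = patience
        self.max_iter = max_iter
        self.streak = 0        # consecutive iterations at or below tol
        self.iteration = 0     # total iterations seen so far

    def should_stop(self, value):
        # `value` is the monitored quantity for the current iteration.
        self.iteration += 1
        self.streak = self.streak + 1 if value <= self.tol else 0
        return self.streak >= self.patience or self.iteration >= self.max_iter
```

In a fine-tuning loop, `should_stop` would be called once per iteration with the monitored value, and the loop exits as soon as it returns `True`.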

4. Experimental Results and Discussion

To comprehensively validate the effectiveness of the proposed algorithm, we conducted systematic experiments on two simulated datasets and four publicly available real hyperspectral datasets, comparing the results with several methods, including VCAFCLS (VCA [18] + FCLS [44]), L1/2NMF [27], GLNMF [45], MVNTFTV [33], MLNMF [36], SDNMFTV [37], HGNMFFS [46], BUDDIP [47], FaSUn [48], SGLRWRSU [49] and MSGACD [50]. All the experiments were conducted on a Windows™ 11 system with a 3.80-GHz AMD® Ryzen™ 7 7840H CPU and 16 GB of RAM using MATLAB® 2019a.

For quantitative evaluation purposes, the SAD and RMSE were used as metrics. They are defined as follows:

S A D = \sum_{i = 1}^{P} arccos (\frac{a_{i}^{T} \tilde{a_{i}}}{∥a_{i}∥ \cdot ∥\tilde{a_{i}}∥})

(36)

RMSE = {(\frac{1}{N} \sum_{i = 1}^{N} {∥s_{i} - \tilde{s_{i}}∥}_{2}^{2})}^{1 / 2}

(37)

where

a_{i}

and

\tilde{a_{i}}

denote the i-th estimated and true endmembers, and

s_{i}

and

\tilde{s_{i}}

denote the i-th estimated and true abundances.
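The two metrics in (36) and (37) can be computed as follows. This NumPy sketch assumes that endmembers are stored as columns of `A` (shape B × P), that abundances are stored as columns of `S` (shape P × N), and that estimated and reference endmembers have already been matched in order; the clipping guards `arccos` against rounding slightly outside [-1, 1].

```python
import numpy as np

def sad(A_est, A_true):
    """Spectral angle distance summed over endmembers, as in (36)."""
    num = np.sum(A_est * A_true, axis=0)
    den = np.linalg.norm(A_est, axis=0) * np.linalg.norm(A_true, axis=0)
    return float(np.sum(np.arccos(np.clip(num / den, -1.0, 1.0))))

def rmse(S_est, S_true):
    """Root-mean-square abundance error over the N pixels, as in (37)."""
    N = S_true.shape[1]
    return float(np.sqrt(np.sum((S_est - S_true) ** 2) / N))
```

SAD is invariant to a positive rescaling of either spectrum, so it measures shape agreement only, whereas RMSE is sensitive to the absolute abundance values.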

4.1. Experiments Conducted on Synthetic Dataset

The generation of the experimental data involves two steps: endmember selection and abundance generation. To evaluate the effectiveness of the proposed algorithm in HU, two simulated datasets, SC1 and SC2, are employed. Specifically, SC1 is constructed by randomly selecting six endmembers from the USGS spectral library (see Figure 3), generating abundances using the method in [51], and synthesizing hyperspectral data via the LMM, resulting in an image with a spatial size of 57 × 57 pixels and 188 spectral bands. Similarly, SC2 is generated by randomly selecting five endmembers from the USGS spectral library (see Figure 4), producing abundances with the method in [11], and applying the LMM to obtain a simulated dataset with 75 × 75 pixels and 188 bands.
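The LMM synthesis step can be illustrated with a short NumPy sketch that matches the stated SC1 dimensions (188 bands, 57 × 57 pixels, six endmembers). The data here are stand-ins and not the paper's: random spectra replace the USGS signatures, Dirichlet samples replace the abundance generators of [51] and [11], and the SNR is assumed to be the ratio of mean signal power to noise variance with zero-mean Gaussian noise.

```python
import numpy as np

def synthesize_lmm(A, S, snr_db, seed=0):
    """Return X = A S + Gaussian noise scaled to the requested SNR in dB."""
    rng = np.random.default_rng(seed)
    clean = A @ S
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(scale=np.sqrt(noise_power), size=clean.shape)
    return clean + noise

# Hypothetical stand-in data with the SC1 dimensions.
rng = np.random.default_rng(42)
A_syn = rng.random((188, 6))                          # placeholder spectra
S_syn = rng.dirichlet(np.ones(6), size=57 * 57).T     # columns sum to one
X_syn = synthesize_lmm(A_syn, S_syn, snr_db=30)
```

Drawing abundances from a Dirichlet distribution makes them non-negative with unit column sums, so the stand-in data satisfy both LMM constraints by construction.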

To ensure a reliable and fair comparison, all competing NMF-based methods were uniformly initialized using the VCA and FCLS algorithms, while other baseline methods adopted the default initialization strategies recommended in their original publications. To mitigate the influence of randomness, all experiments were independently repeated twenty times, and the mean and standard deviation of each evaluation metric were reported.

Furthermore, all comparison algorithms were configured with their optimal parameter settings (see Table 1). VCAFCLS is used as a baseline method and does not involve any parameters. In L1/2NMF,

λ

controls the weight of the sparsity term. GLNMF introduces graph regularization into sparse NMF, where

λ

,

μ

, and

δ

correspond to the weights of the sparsity, graph, and ASC constraints, respectively. MVNTFTV incorporates TV regularization, with

λ

regulating the strength of the TV term. MLNMF adopts a multilayer decomposition structure, where L denotes the number of decomposition layers and

α_{0}

is the sparsity parameter. SDNMFTV combines deep NMF with TV (

α

) and sparsity regularization (

λ

), in which L represents the number of layers. HGNMFFS is based on high-order graph regularization, where

λ

and

μ

denote regularization weights, and

β_{1}

and

β_{2}

balance the first and second-order neighborhood constraints. BUDDIP employs a unified framework to address both linear and nonlinear blind unmixing problems, where

α_{1} - α_{4}

regulate the consistency between the network output and the initialization guidance, while

α_{5}

and

α_{6}

control the data fitting term. In FaSUn,

T_{A}

and

T_{B}

denote the numbers of inner-loop iterations in the ADMM optimization for abundance and endmember estimation, respectively, while

μ_{1} - μ_{3}

are penalty parameters associated with the corresponding constraints. In SGLRWRSU,

λ_{s}

and

λ_{g}

control the sparsity and graph Laplacian regularization, respectively. In MSGACD,

λ_{1}

and

λ_{2}

are used to balance the reconstruction error and the abundance regularization term.

4.1.1. Experiments on SC1

Experiment 1 (Parameter Selection): In this study, the parameters

α

,

β

, and

γ

respectively control the contribution weights of the reward graph, the penalty graph, and the Gram sparsity regularization term. To determine the optimal parameter combination, we conducted a systematic grid search on simulated data with an SNR of 30 dB under the DNMF framework. The candidate set for

α

was

\{0, 5 \times 10^{- 3}, 1 \times 10^{- 2}, 5 \times 10^{- 2}, 1 \times 10^{- 1}, 2 \times 10^{- 1}, 3 \times 10^{- 1}\}

, that for

β

was

\{0, 5 \times 10^{- 3}, 1 \times 10^{- 2}, 2 \times 10^{- 2}, 3 \times 10^{- 2}, 5 \times 10^{- 2}, 1 \times 10^{- 1}\}

, and that for

γ

was

\{0, 1 \times 10^{- 4}, 3 \times 10^{- 4}, 1 \times 10^{- 3}, 3 \times 10^{- 3}, 1 \times 10^{- 2}, 1 \times 10^{- 1}\}

. As shown in Figure 5, the grid search results indicate that the combination

α = 5 \times 10^{- 2}

,

β = 2 \times 10^{- 2}

, and

γ = 3 \times 10^{- 3}

achieves the best performance, and was thus selected as the global optimal parameter set.
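The grid search over the three candidate sets can be sketched as below. The candidate values are those listed in the text; `score_fn` is a hypothetical callable that runs one unmixing experiment for a given parameter triple and returns an error to minimize (for example RMSE), and it is not part of the paper.

```python
from itertools import product

ALPHAS = [0, 5e-3, 1e-2, 5e-2, 1e-1, 2e-1, 3e-1]
BETAS = [0, 5e-3, 1e-2, 2e-2, 3e-2, 5e-2, 1e-1]
GAMMAS = [0, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 1e-1]

def grid_search(score_fn):
    """Return the (alpha, beta, gamma) triple with the lowest score."""
    candidates = product(ALPHAS, BETAS, GAMMAS)   # 7 x 7 x 7 = 343 triples
    return min(candidates, key=lambda params: score_fn(*params))
```

Each of the 343 triples requires a full unmixing run, so in practice the search is usually done once on a simulated dataset and the selected values are then reused.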

Experiment 2 (Sensitivity Analysis of the Parameter K): To systematically evaluate the impact of the parameter K on the effectiveness of adversarial graph regularization, all other parameters were fixed, and K was set to 3, 5, 10, 15, and 20 (Figure 6). The value of K controls the size of the local neighborhood in the graph structure, directly affecting its connectivity and discriminative capability. A smaller K results in an overly sparse graph that is sensitive to local noise and lacks sufficient intra-class information, while a larger K introduces too many inter-class neighbors, blurring class boundaries and weakening the discriminative power. In contrast,

K = 5

strikes a good balance between intra-class consistency and inter-class separability, more accurately reflecting the actual distribution of mixed pixels and thus improving the accuracy and robustness of the unmixing model.
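The role of K can be illustrated with a generic K-nearest-neighbor graph. The sketch below is not the paper's adversarial graph construction, which is defined in an earlier section; it assumes binary weights, Euclidean distance between pixel spectra, and symmetrization by taking the union of neighbor relations.

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Binary symmetric kNN graph over the columns of X and its Laplacian."""
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # squared distances
    np.fill_diagonal(d2, np.inf)                        # exclude self-matches
    idx = np.argsort(d2, axis=1)[:, :k]                 # k nearest per pixel
    W = np.zeros((N, N))
    W[np.repeat(np.arange(N), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                              # symmetrize
    L = np.diag(W.sum(axis=1)) - W                      # L = D - W
    return W, L
```

With a small `k` many rows of `W` have only a handful of non-zeros, while a large `k` connects pixels that may belong to different materials, which mirrors the trade-off described above.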

Experiment 3 (Layer Analysis): The proposed model hierarchically represents hyperspectral data through a cascaded structure of multiple factor matrices, with the number of layers l controlling its complexity and representational capacity. Experiments on the SC1 dataset with an SNR of 30 dB (Figure 7) show that when

l = 1

(NMF), SAD and RMSE reach their maximum, while deep models (

l \geq 2

) clearly outperform the shallow structure. As l increases to 3, the model achieves the minimum SAD and RMSE; therefore, in this study, the training depth is set to

l = 3

to balance performance and computational complexity.

Experiment 4 (Robustness Analysis): This experiment evaluated different noise scenarios by setting the SNR to 20, 30, and 40 dB. Table 2 shows the SAD and RMSE values of the proposed and comparison methods at these noise levels. The results indicate that the proposed method consistently achieves the lowest SAD and RMSE, significantly outperforming existing methods. Figure 8 presents the abundance maps for six endmembers at SNR = 30 dB, showing that our method accurately estimates ground object distributions. The last row of Table 2 reports the average running time on the SC1 dataset, indicating that VCAFCLS, being non-iterative, is the fastest, while the DNMF-based algorithm remains less time-efficient, with potential for future improvement.

Experiment 5 (Different Mixing Levels): This experiment systematically evaluated the impact of different mixing levels on HU algorithms. In SC1 with SNR = 30 dB, hyperspectral data with varying purities were generated by adjusting the mixing parameter

θ

(0.5–0.9), where higher

θ

indicates a lower level of data mixing. As shown in Figure 9, the proposed deep unmixing model consistently outperformed others across all mixing levels, maintaining the lowest errors even at

θ = 0.9

. This demonstrates the algorithm’s robustness to mixing-level variations and the multilevel analysis advantage of its deep architecture.

Experiment 6 (Convergence Analysis): This experiment was conducted to analyze the convergence of the proposed algorithm. We set the number of endmembers to 6 and the SNR to 30 dB. Figure 10a,b shows the curves of the objective function value and the main residual term, respectively. The objective function value decreased rapidly after several iterations and finally converged to a stable value; the main residual term also decreased gradually, which was consistent with the expected result. Furthermore, Figure 10c presents the iterative residual curves of the endmember and abundance matrices. As the number of iterations increased, the residuals of these two matrices gradually approached zero, further validating the good convergence ability of the proposed algorithm.

Experiment 7 (Truncated Activation Function Analysis): This experiment investigated the effect of the truncation threshold of the activation function (

10^{- 3}

to

10^{- 9}

) on HU, analyzing the abundance matrix sparsity

η

, SAD, RMSE, and running time. The results (Table 3) show that decreasing the threshold reduces sparsity, especially between

10^{- 3}

and

10^{- 5}

, where the number of zero elements significantly decreases; below

10^{- 6}

, sparsity stabilizes. SAD and RMSE are optimal at

10^{- 5}

, indicating the highest unmixing accuracy; a too-large threshold truncates too many abundance values, increasing errors, while a smaller threshold increases running time. These results demonstrate that the truncation threshold significantly affects model performance and requires a tradeoff between accuracy and efficiency, with

10^{- 5}

being the optimal choice. According to [30],

η

is defined as follows:

η = \frac{1}{\sqrt{B}} \sum_{b} \frac{\sqrt{N} - {∥x_{b}∥}_{1} / {∥x_{b}∥}_{2}}{\sqrt{N - 1}}

(38)

where

x_{b}

represents the given hyperspectral data at band b.
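The sparsity measure in (38) can be computed row by row. The NumPy sketch below implements the formula exactly as printed, treating each row of the input matrix as one vector x_b; rows that are entirely zero would cause a division by zero and are assumed not to occur.

```python
import numpy as np

def sparsity_eta(X):
    """Sparsity measure of (38) for a matrix X with B rows of length N."""
    B, N = X.shape
    l1 = np.abs(X).sum(axis=1)            # ||x_b||_1 for each row
    l2 = np.linalg.norm(X, axis=1)        # ||x_b||_2 for each row
    per_row = (np.sqrt(N) - l1 / l2) / np.sqrt(N - 1)
    return float(per_row.sum() / np.sqrt(B))
```

A row with all entries equal has an L1-to-L2 ratio of exactly the square root of N and therefore contributes zero, while a row with a single non-zero entry gives the largest contribution, so larger values of η indicate a sparser matrix.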

Experiment 8 (Ablation Analysis): To evaluate the effectiveness of truncated robust DNMF, adversarial graph regularization, and Gram sparsity regularization, we designed eight ablation experiments. First, within the DNMF framework, the traditional Frobenius norm reconstruction term was replaced with the

L_{2, 1}

norm to enhance robustness against noise and outliers (39) and (40). On this basis, a truncated activation function was introduced to form the truncated robust DNMF model (41). The loss functions of the three models are as follows:

\begin{matrix} min_{A_{i}, S_{l}} {∥X - A_{1} A_{2} \dots A_{l} S_{l}∥}_{F}^{2} \end{matrix}

(39)

\begin{matrix} min_{A_{i}, S_{l}} {∥X - A_{1} A_{2} \dots A_{l} S_{l}∥}_{2, 1} \end{matrix}

(40)

\begin{matrix} min_{A_{i}, S_{l}} {∥X - A_{1} σ (A_{2} σ (\dots σ (A_{l} S_{l})))∥}_{2, 1} \end{matrix}

(41)

By establishing a dynamic mechanism that balances the reward graph against the penalty graph, the proposed dual-graph adversarial regularization module, given in (42) and (43), makes dissimilar abundances easier to distinguish while preserving the continuity among similar samples. Furthermore, we conducted a systematic comparison between the Gram sparsity regularization proposed in this paper and the traditional

L_{1 / 2}

regularization technique (44) and (45) and ultimately formed the model presented in this paper (46).

\begin{matrix} min_{A_{i}, S_{i}} {∥X - A_{1} σ (A_{2} σ (\dots σ (A_{l} S_{l})))∥}_{2, 1} + α tr (S_{i} L_{R} {S_{i}}^{T}) \end{matrix}

(42)

\begin{matrix} min_{A_{i}, S_{i}} {∥X - A_{1} σ (A_{2} σ (\dots σ (A_{l} S_{l})))∥}_{2, 1} + α tr (S_{i} L_{R} {S_{i}}^{T}) - β tr (S_{i} L_{P} {S_{i}}^{T}) \end{matrix}

(43)

\begin{matrix} min_{A_{i}, S_{i}} {∥X - A_{1} σ (A_{2} σ (\dots σ (A_{l} S_{l})))∥}_{2, 1} + λ {∥S_{i}∥}_{\frac{1}{2}} \end{matrix}

(44)

\begin{matrix} min_{A_{i}, S_{i}} {∥X - A_{1} σ (A_{2} σ (\dots σ (A_{l} S_{l})))∥}_{2, 1} + γ (tr ({S_{i}}^{T} S_{i} 1_{N}) - tr ({S_{i}}^{T} S_{i})) \end{matrix}

(45)

\begin{matrix} min_{A_{i}, S_{i}} {∥X - A_{1} σ (A_{2} σ (\dots σ (A_{l} S_{l})))∥}_{2, 1} + α tr (S_{i} L_{R} {S_{i}}^{T}) - β tr (S_{i} L_{P} {S_{i}}^{T}) + γ (tr ({S_{i}}^{T} S_{i} 1_{N}) - tr ({S_{i}}^{T} S_{i})) \end{matrix}

(46)

To evaluate the performance of each model, experiments were conducted on synthetic data with six endmembers and an SNR of 30 dB (Figure 11). The results show that replacing the traditional Frobenius norm with the

L_{2, 1}

norm effectively suppresses interference from noise and outliers, significantly reducing SAD and RMSE. The robust DNMF model with the truncated activation function not only preserves the non-negativity of the abundance matrix but also truncates elements smaller than

10^{- 5}

to 0, enhancing sparsity and improving computational efficiency. On this basis, the dual-graph adversarial regularization module (43) outperforms the single-graph model (42) by simultaneously constructing a local similarity graph and a non-neighborhood repulsion graph, better capturing detailed features. Finally, the proposed Gram sparsity constraint achieves the best results compared to the traditional

L_{1 / 2}

regularization, further improving unmixing performance.

Experiment 9 (Outlier Robustness Analysis): To verify the algorithm’s robustness against outliers, a two-dimensional experiment was designed. In the spectral dimension, abnormal pixels with full bands, 1/2 bands, and 1/3 bands (random negative values) were constructed (Figure 12a,b) to test adaptability to varying spectral anomalies. In the spatial dimension, with 1/3 abnormal bands fixed, the number of abnormal pixels (3, 5, and 10) was increased to evaluate spatial sensitivity (Figure 12c,d). Results show that in single-pixel tests, the algorithm maintains optimal endmember extraction (SAD) and abundance inversion (RMSE); in multi-pixel tests, thanks to the synergy of the activation function,

L_{2, 1}

norm constraint, and regularization term, the algorithm significantly outperforms comparison methods, demonstrating excellent robustness.

4.1.2. Experiments on SC2

This set of experiments evaluated the unmixing performance of the proposed algorithm on SC2 under SNRs of 20, 30, and 40 dB. As shown in Table 4, it achieves the best performance under most noise conditions. The abundance maps at 30 dB (Figure 13) show that the algorithm preserves fine spatial structures in transition zones. These results indicate improved spectral separation and spatial resolution under varying noise levels.

4.2. Experiments Conducted on Real Datasets

4.2.1. Samson Dataset

This experiment validated the proposed algorithm on the Samson dataset (95 × 95 pixels, 156 bands, containing trees, soil, and water). As shown in Figure 14, the extracted endmember spectra closely match the reference spectra. Figure 15 and Table 5 present the abundance maps and SAD values, indicating that the algorithm performs well on most materials and achieves the best overall performance.

4.2.2. Jasper Ridge Dataset

This experiment was conducted on a 100 × 100-pixel subregion of the Jasper dataset (198 bands), containing four main land-cover types: tree, water, soil, and road. The results show that the endmember spectra estimated by the proposed algorithm closely match the USGS reference spectra (Figure 16), and the abundance maps (Figure 17) demonstrate strong spatial resolution. Table 6 indicates that the proposed method outperforms other algorithms in most land-cover categories and in average SAD, confirming its advantage in hyperspectral unmixing tasks.

4.2.3. AVIRIS Cuprite Dataset

On the AVIRIS Cuprite dataset, a 250 × 191 × 188 subregion was selected for analysis, with the number of endmembers set to 12. Figure 18 shows the reference endmembers and the endmember spectra estimated by each algorithm, indicating that the endmembers extracted by the proposed method are highly consistent with the references. Figure 19 presents the abundance maps, where brighter pixels represent higher abundances. Table 7 lists the average SAD values for each algorithm, showing that the proposed method outperforms the comparison methods for most mineral categories and overall.

4.2.4. Urban Dataset

The Urban dataset contains 307 × 307 pixels and 162 effective bands, covering six types of ground features: asphalt road, grass, roof1, tree, roof2, and concrete road. As shown in Figure 20, the endmember spectra estimated by the proposed algorithm closely match the reference endmembers. Figure 21 shows that the abundance maps are consistent with the ground-truth distribution, and Table 8 indicates that the proposed algorithm achieves the lowest SAD values.

5. Conclusions

This study proposed a deep non-negative matrix factorization method (DNMF-AG) for hyperspectral unmixing, incorporating adversarial graph regularization and Gram sparsity constraints. The adversarial graph integrates similarity and dissimilarity graphs to enhance intraclass continuity and interclass separability, while the Gram inner-product penalty constrains the abundance matrix to encourage structured sparsity and improve endmember distinguishability. These two regularizations are jointly embedded into the DNMF framework to enhance robustness and unmixing accuracy. A truncated activation function is introduced during iterative optimization to suppress low-amplitude noise, promote sparsity, and improve computational efficiency.

In future work, we will further explore the use of more spatial–spectral prior information to enhance the generalization ability of the proposed model in complex scenarios.

Author Contributions

Methodology, K.Q., X.L., and W.B.; Writing—original draft, X.L.; Writing—review and editing, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 62461001), Natural Science Project Fund for High-level Talents of North Minzu University (Grant No.: 2025BG239), and in part by the North Minzu University Postgraduate Innovation Project (Grant No. YCX24365).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Bhargava, A.; Sachdeva, A.; Sharma, K.; Alsharif, M.H.; Uthansakul, P.; Uthansakul, M. Hyperspectral imaging and its applications: A review. Heliyon 2024, 10, e33208. [Google Scholar] [CrossRef]
Sun, L.; Wang, X.; Zheng, Y.; Wu, Z.; Fu, L. Multiscale 3-D–2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 2100116. [Google Scholar] [CrossRef]
Cheng, S.; Chan, R.; Du, A. CACFTNet: A Hybrid Cov-Attention and Cross-Layer Fusion Transformer Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–17. [Google Scholar] [CrossRef]
He, X.; Tang, C.; Liu, X.; Zhang, W.; Sun, K.; Xu, J. Object Detection in Hyperspectral Image via Unified Spectral–Spatial Feature Aggregation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5521213. [Google Scholar] [CrossRef]
Rasti, B.; Zouaoui, A.; Mairal, J.; Chanussot, J. Image Processing and Machine Learning for Hyperspectral Unmixing: An Overview and the HySUPP Python Package. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5517631. [Google Scholar] [CrossRef]
Wei, J.; Wang, X. An Overview on Linear Unmixing of Hyperspectral Data. Math. Probl. Eng. 2020, 2020, 3735403. [Google Scholar] [CrossRef]
Li, H.C.; Feng, X.R.; Wang, R.; Gao, L.; Du, Q. Superpixel-Based Low-Rank Tensor Factorization for Blind Nonlinear Hyperspectral Unmixing. IEEE Sens. J. 2024, 24, 13055–13072. [Google Scholar] [CrossRef]
Su, H.; Jia, C.; Zheng, P.; Du, Q. Superpixel-Based Weighted Collaborative Sparse Regression and Reweighted Low-Rank Representation for Hyperspectral Image Unmixing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 393–408. [Google Scholar] [CrossRef]
Shen, X.; Chen, L.; Liu, H.; Su, X.; Wei, W.; Zhu, X.; Zhou, X. Efficient Hyperspectral Sparse Regression Unmixing With Multilayers. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5522614. [Google Scholar] [CrossRef]
Iordache, M.D.; Bioucas-Dias, J.M.; Plaza, A. Sparse Unmixing of Hyperspectral Data. IEEE Trans. Geosci. Remote Sens. 2011, 49, 2014–2039. [Google Scholar] [CrossRef]
Iordache, M.D.; Bioucas-Dias, J.M.; Plaza, A. Total Variation Spatial Regularization for Sparse Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4484–4502. [Google Scholar] [CrossRef]
Liang, Y.; Zheng, H.; Yang, G.; Du, Q.; Su, H. Superpixel-Based Weighted Sparse Regression and Spectral Similarity Constrained for Hyperspectral Unmixing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 6825–6842. [Google Scholar] [CrossRef]
Ayres, L.C.; Borsoi, R.A.; Bermudez, J.C.M.; de Almeida, S.J.M. A Generalized Multiscale Bundle-Based Hyperspectral Sparse Unmixing Algorithm. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5502505. [Google Scholar] [CrossRef]
Xu, C. Spectral Weighted Sparse Unmixing Based on Adaptive Total Variation and Low-Rank Constraints. Sci. Rep. 2024, 14, 23705. [Google Scholar] [CrossRef]
Qu, K.; Luo, F.; Wang, H.; Bao, W. A New Fast Sparse Unmixing Algorithm Based on Adaptive Spectral Library Pruning and Nesterov Optimization. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 6134–6151. [Google Scholar] [CrossRef]
Boardman, J.W.; Kruscl, F.A.; Grccn, R.O. Mapping target signatures via partial unmixing of AVIRIS data. In Proceedings of the Fifth JPL Airborne Earth Science Workshop, Pasadena, CA, USA, 23–26 January 1995. [Google Scholar]
Li, H.C. An Algorithm for Fast Spectral Endmember Determination in Hyperspectral Data. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 2689–2692.
Nascimento, J.; Dias, J. Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 898–910.
Zhuang, L.; Lin, C.H.; Figueiredo, M.A.T.; Bioucas-Dias, J.M. Regularization Parameter Selection in Minimum Volume Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9858–9877.
Feng, X.R.; Li, H.C.; Wang, R.; Du, Q.; Jia, X.; Plaza, A. Hyperspectral Unmixing Based on Nonnegative Matrix Factorization: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4414–4436.
Li, X.; Zhang, X.; Yuan, Y.; Dong, Y. Adaptive Relationship Preserving Sparse NMF for Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5504516.
Dong, L.; Lu, X.; Liu, G.; Yuan, Y. A Novel NMF Guided for Hyperspectral Unmixing From Incomplete and Noisy Data. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5513515.
Qu, K.; Li, Z. A Fast Sparse NMF Optimization Algorithm for Hyperspectral Unmixing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1885–1902.
Chetia, G.S. Hyperspectral Unmixing for Highly Correlated Endmembers using Scaled Endmembers and Abundance Sparsity Constraint NMF. In Proceedings of the 2024 6th International Conference on Energy, Power and Environment (ICEPE), Shillong, India, 20–22 June 2024; pp. 1–6.
Jia, S.; Qian, Y.-T.; Ji, Z. Nonnegative matrix factorization with piecewise smoothness constraint for hyperspectral unmixing. In Proceedings of the 2008 International Conference on Wavelet Analysis and Pattern Recognition, Hong Kong, China, 30–31 August 2008; Volume 2, pp. 815–820.
Li, D.; Zhang, X.; Zhang, Y.; Liang, L.; Chen, X.; Jia, L. Superpixel-guided manifold sparse nonnegative matrix factorization for hyperspectral unmixing. J. Appl. Remote Sens. 2025, 19, 20.
Qian, Y.; Jia, S.; Zhou, J.; Robles-Kelly, A. Hyperspectral Unmixing via L_1/2 Sparsity-Constrained Nonnegative Matrix Factorization. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4282–4297.
Yuan, Q.; Yan, L.; Li, J.; Zhang, L. Remote sensing image super-resolution via regional spatially adaptive total variation model. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 3073–3076.
Qu, K.; Bao, W. Multiple-Priors Ensemble Constrained Nonnegative Matrix Factorization for Spectral Unmixing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 963–975.
Liu, X.Y.; Gong, X.F.; Wang, L.; Feng, W.; Lin, Q.H. A Parametric Non-Negative Coupled Canonical Polyadic Decomposition Algorithm for Hyperspectral Super-Resolution. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; pp. 1–5.
Jimena, G.M.; De Ketelaere, B.; Saeys, W. Shared subspace learning via partial Tucker decomposition for hyperspectral image classification. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2025, 343, 126584.
Jiang, X.; Sun, L.; Lin, P. Local Sparsity Blocks and Tensor Low Rank Regularized Sparse Unmixing. In Proceedings of the 2023 13th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Athens, Greece, 31 October–2 November 2023; pp. 1–5.
Xiong, F.; Qian, Y.; Zhou, J.; Tang, Y.Y. Hyperspectral Unmixing via Total Variation Regularized Nonnegative Tensor Factorization. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2341–2357.
Li, H.; Xiong, X.; Liu, C.; Ma, Y.; Zeng, S.; Li, Y. SFFNet: Staged Feature Fusion Network of Connecting Convolutional Neural Networks and Graph Convolutional Neural Networks for Hyperspectral Image Classification. Appl. Sci. 2024, 14, 2327.
Dong, H.; Zhang, X.; Meng, H.; Jiao, L. A multistage graph-based autoencoder network with global-local features for hyperspectral unmixing. Int. J. Remote Sens. 2025, 46, 3709–3735.
Rajabi, R.; Ghassemian, H. Spectral Unmixing of Hyperspectral Imagery Using Multilayer NMF. IEEE Geosci. Remote Sens. Lett. 2015, 12, 38–42.
Feng, X.R.; Li, H.C.; Li, J.; Du, Q.; Plaza, A.; Emery, W.J. Hyperspectral Unmixing Using Sparsity-Constrained Deep Nonnegative Matrix Factorization With Total Variation. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6245–6257.
Huang, R.; Jiao, H.; Li, X.; Chen, S.; Xia, C. Hyperspectral Unmixing Using Robust Deep Nonnegative Matrix Factorization. Remote Sens. 2023, 15, 2900.
Li, H.C.; Feng, X.R.; Zhai, D.H.; Du, Q.; Plaza, A. Self-Supervised Robust Deep Matrix Factorization for Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5513214.
Kong, D.; Ding, C.; Huang, H. Robust nonnegative matrix factorization using L21-norm. In Proceedings of the CIKM ’11: 20th ACM International Conference on Information and Knowledge Management, Glasgow, UK, 24–28 October 2011; pp. 673–682.
Prokhorenkova, L.; Shekhovtsov, A. Graph-based nearest neighbor search: From practice to theory. In Proceedings of the ICML’20: 37th International Conference on Machine Learning, Virtual, 13–18 July 2020.
Xu, X.; Huang, Z.; Zuo, L.; He, H. Manifold-Based Reinforcement Learning via Locally Linear Reconstruction. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 934–947.
Han, J.; Sun, Z.; Hao, H. Selecting feature subset with sparsity and low redundancy for unsupervised learning. Knowl.-Based Syst. 2015, 86, 210–223.
Heinz, D.C.; Chang, C.-I. Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 2001, 39, 529–545.
Lu, X.; Wu, H.; Yuan, Y.; Yan, P.; Li, X. Manifold Regularized Sparse NMF for Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2013, 51, 2815–2826.
Qu, K.; Li, Z.; Wang, C.; Luo, F.; Bao, W. Hyperspectral Unmixing Using Higher-Order Graph Regularized NMF With Adaptive Feature Selection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5511815.
Zhou, C.; Rodrigues, M.R.D. Hyperspectral Blind Unmixing Using a Double Deep Image Prior. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 16478–16492.
Rasti, B.; Zouaoui, A.; Mairal, J.; Chanussot, J. Fast Semisupervised Unmixing Using Nonconvex Optimization. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5526713.
Zou, X.; Xu, M.; Liu, S.; Sheng, H. Superpixel-Based Graph Laplacian Regularized and Weighted Robust Sparse Unmixing. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5526415.
Cao, F.; Situ, Y.; Ye, H. A Joint Multiscale Graph Attention and Classify-Driven Autoencoder Framework for Hyperspectral Unmixing. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5504514.
Miao, L.; Qi, H. Endmember Extraction From Highly Mixed Data Using Minimum Volume Constrained Nonnegative Matrix Factorization. IEEE Trans. Geosci. Remote Sens. 2007, 45, 765–777.

Figure 1. Schematic diagram of the proposed DNMF-AG model.

Figure 2. Demonstration of a three-layer DNMF framework. A hyperspectral data matrix $X \in \mathbb{R}^{B \times N}$ is given, where $B$ is the number of spectral bands and $N$ is the number of pixels. In the first layer of DNMF, $X$ is decomposed into $A_{1} \in \mathbb{R}^{B \times P_{1}}$ and $S_{1} \in \mathbb{R}^{P_{1} \times N}$, where $P_{1}$ denotes the dimensionality of the first layer and $P_{1} \leq B$. Next, the abundance matrix $S_{1}$ is further decomposed into $A_{2} \in \mathbb{R}^{P_{1} \times P_{2}}$ and $S_{2} \in \mathbb{R}^{P_{2} \times N}$, where $P_{2}$ is the dimensionality of the second layer and $P_{2} \leq P_{1}$. This process continues to the third layer, where $S_{2}$ is decomposed into $A_{3} \in \mathbb{R}^{P_{2} \times P_{3}}$ and $S_{3} \in \mathbb{R}^{P_{3} \times N}$, with $P_{3}$ being the dimensionality of the third layer and $P_{3} \leq P_{2}$. In summary, the three-layer DNMF process can be written as $X \approx A_{1} S_{1}$, $S_{1} \approx A_{2} S_{2}$, and $S_{2} \approx A_{3} S_{3}$. Finally, the reconstructed matrix is given by $\tilde{X} = A_{1} A_{2} A_{3} S_{3}$.
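The layer-wise factorization described in the Figure 2 caption can be illustrated with a minimal NumPy sketch. It only reproduces the shape bookkeeping $X \approx A_{1} S_{1}$, $S_{1} \approx A_{2} S_{2}$, $S_{2} \approx A_{3} S_{3}$ and uses plain multiplicative-update NMF at each layer as a stand-in; it is not the ADMM-based DNMF-AG solver and omits the adversarial graph, the Gram-based sparsity term, and the truncated activation. The function names `nmf` and `deep_nmf`, the toy sizes, and the layer ranks are assumptions made for illustration.

```python
import numpy as np


def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF, X ~= A @ S, via multiplicative updates (Frobenius loss).

    Stand-in solver for illustration only, not the paper's ADMM scheme.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], rank))
    S = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        # Multiplicative updates keep both factors nonnegative.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S


def deep_nmf(X, ranks):
    """Layer-wise factorization: X ~= A1 S1, S1 ~= A2 S2, S2 ~= A3 S3, ..."""
    factors, S = [], X
    for p in ranks:
        A, S = nmf(S, p)  # factor the previous layer's abundance matrix
        factors.append(A)
    return factors, S


# Toy data with hypothetical sizes: B = 20 bands, N = 50 pixels.
rng = np.random.default_rng(1)
X = rng.random((20, 50))

# Layer dimensionalities satisfy P3 <= P2 <= P1 <= B.
factors, S3 = deep_nmf(X, ranks=(8, 6, 4))

# Reconstruction X_tilde = A1 A2 A3 S3.
X_tilde = np.linalg.multi_dot(factors + [S3])
print([A.shape for A in factors], S3.shape, X_tilde.shape)
```

In this sketch each layer is fitted once, greedily, which corresponds to a pretraining-style pass; a full deep NMF method would typically refine all factors jointly afterwards.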

Figure 3. Synthetic data (SC1): (a) endmember spectra and (b) the simulated image.

Figure 4. Synthetic data (SC2): (a) endmember spectra and (b) the simulated image.

Figure 5. Parameter analysis: (a) SAD values corresponding to parameters $α$ and $β$; (b) RMSE values corresponding to parameters $α$ and $β$; (c) SAD values corresponding to parameter $γ$; (d) RMSE values corresponding to parameter $γ$.

Figure 6. The impact of different K values on unmixing performance.

Figure 7. Results of a performance analysis conducted with different numbers of layers: (a) SAD and (b) RMSE.

Figure 8. Abundance maps produced by all algorithms for SC1.

Figure 9. Unmixing results produced by different methods under different mixing levels: (a) SAD and (b) RMSE.

Figure 10. Convergence curves: (a) objective function value, (b) reconstruction residual, and (c) endmember and abundance residuals.

Figure 11. Results of ablation experiments conducted under different SNRs: (a) SAD and (b) RMSE.

Figure 12. Outlier robustness analysis. (a) SAD under different spectral anomaly scenarios; (b) RMSE under different spectral anomaly scenarios; (c) SAD with varying numbers of anomalous pixels under the fixed 1/3 anomalous-band condition; (d) RMSE with varying numbers of anomalous pixels under the fixed 1/3 anomalous-band condition.

Figure 13. Abundance maps produced by all algorithms for SC2.

Figure 14. Endmember spectra from the Samson dataset (from left to right: soil, trees, and water).

Figure 15. Abundance maps estimated by different algorithms for the Samson dataset.

Figure 16. Endmember spectra from the Jasper dataset (from left to right: soil, tree, water, and road).

Figure 17. Abundance maps estimated by different algorithms for the Jasper dataset.

Figure 18. Endmember spectra from the AVIRIS Cuprite dataset.

Figure 19. Abundance maps estimated by different algorithms for the AVIRIS Cuprite dataset.

Figure 20. Comparison among the endmember spectra extracted by different algorithms from the Urban dataset.

Figure 21. Abundance maps estimated by different algorithms for the Urban dataset.

Table 1. Parameter settings of the comparison algorithms.

Algorithm	Parameters
VCAFCLS	None
L1/2NMF	$λ = 0.1$
GLNMF	$λ = 0.1$, $μ = 0.1$, $δ = 15$
MVNTFTV	$λ = 0.1$, $δ = 15$
MLNMF	$L = 10$, $α_{0} = 0.5$
SDNMFTV	$L = 3$, $λ = 0.02$, $α = 0.005$, $μ = 1000$
HGNMFFS	$λ = 0.01$, $μ = 0.05$, $β_{1} = 0.3$, $β_{2} = 0.7$
BUDDIP	$α_{1,3,5} = 1.0$, $α_{2} = 0.001$, $α_{4} = 0.01$, $α_{6} = 0.1$
FaSUn	$T_{A} = T_{B} = 5$, $μ_{1} = 50$, $μ_{2} = 2$, $μ_{3} = 1$
SGLRWRSU	$λ_{s} = 1$, $λ_{g} = 1 \times 10^{-3}$
MSGACD	$λ_{1} = 0.1$, $λ_{2} = 0.002$

Table 2. Results of SAD, RMSE, and Time under different noise levels on SC1 (best values in bold).

SNR	Metric	Statistic	VCAFCLS [2005]	L1/2NMF [2011]	GLNMF [2013]	MVNTFTV [2019]	MLNMF [2015]	SDNMFTV [2018]	HGNMFFS [2023]	BUDDIP [2024]	FaSUn [2024]	SGLRWRSU [2024]	MSGACD [2025]	Proposed
SNR = 20	SAD	Mean	$0.1349$	$0.0851$	$0.0714$	$0.1135$	$0.0725$	$0.1079$	$0.0927$	$0.0539$	/	/	$0.0805$	$0.0397$
	SAD	Std	$\pm 2.04\%$	$\pm 2.78\%$	$\pm 2.61\%$	$\pm 2.42\%$	$\pm 1.47\%$	$\pm 2.81\%$	$\pm 3.02\%$	$\pm 1.73\%$	/	/	$\pm 3.05\%$	$\pm 2.83\%$
	RMSE	Mean	$0.1187$	$0.0984$	$0.0899$	$0.1022$	$0.0691$	$0.0966$	$0.0976$	$0.0590$	$0.0996$	$0.0661$	$0.0752$	$0.0446$
	RMSE	Std	$\pm 2.64\%$	$\pm 2.22\%$	$\pm 2.93\%$	$\pm 2.52\%$	$\pm 1.09\%$	$\pm 2.34\%$	$\pm 3.77\%$	$\pm 1.78\%$	$\pm 1.03\%$	$\pm 1.72\%$	$\pm 2.48\%$	$\pm 2.52\%$
SNR = 30	SAD	Mean	$0.1273$	$0.0737$	$0.0694$	$0.0801$	$0.0518$	$0.0766$	$0.0625$	$0.0358$	/	/	$0.0328$	$0.0267$
	SAD	Std	$\pm 1.81\%$	$\pm 1.11\%$	$\pm 1.53\%$	$\pm 3.01\%$	$\pm 1.10\%$	$\pm 0.98\%$	$\pm 2.85\%$	$\pm 1.18\%$	/	/	$\pm 0.82\%$	$\pm 0.41\%$
	RMSE	Mean	$0.1088$	$0.0910$	$0.0884$	$0.0962$	$0.0427$	$0.0946$	$0.0808$	$0.0557$	$0.0428$	$0.0495$	$0.0394$	$0.0344$
	RMSE	Std	$\pm 2.00\%$	$\pm 0.95\%$	$\pm 1.04\%$	$\pm 2.68\%$	$\pm 1.57\%$	$\pm 0.64\%$	$\pm 3.40\%$	$\pm 1.73\%$	$\pm 0.90\%$	$\pm 0.56\%$	$\pm 1.28\%$	$\pm 0.37\%$
SNR = 40	SAD	Mean	$0.1058$	$0.0702$	$0.0404$	$0.0614$	$0.0581$	$0.0737$	$0.0248$	$0.0212$	/	/	$0.0332$	$0.0166$
	SAD	Std	$\pm 1.08\%$	$\pm 0.91\%$	$\pm 1.81\%$	$\pm 3.30\%$	$\pm 0.83\%$	$\pm 1.20\%$	$\pm 1.13\%$	$\pm 0.96\%$	/	/	$\pm 0.31\%$	$\pm 0.47\%$
	RMSE	Mean	$0.0915$	$0.0713$	$0.0487$	$0.0639$	$0.0410$	$0.0849$	$0.0493$	$0.0298$	$0.0275$	$0.0258$	$0.0377$	$0.0204$
	RMSE	Std	$\pm 1.04\%$	$\pm 0.94\%$	$\pm 2.25\%$	$\pm 1.42\%$	$\pm 0.63\%$	$\pm 0.61\%$	$\pm 0.90\%$	$\pm 1.04\%$	$\pm 0.76\%$	$\pm 1.06\%$	$\pm 0.32\%$	$\pm 0.46\%$
Time (s)			$1.924$	$15.48$	$39.87$	$133.89$	$17.51$	$67.17$	$30.51$	$600.23$	$90.60$	$28.37$	$869.35$	$30.22$
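
For readers who want to reproduce the kind of scores reported in Table 2, the sketch below computes the two metrics in their commonly used forms: the spectral angle distance (SAD) between a reference and an estimated endmember spectrum, and the root-mean-square error (RMSE) between reference and estimated abundances. The function names `sad` and `rmse` are illustrative, and the exact averaging convention used for the table (for example, per-endmember scores averaged over endmembers and runs) is an assumption, since it is not stated in this part of the article.

```python
import numpy as np


def sad(a_true, a_est):
    """Spectral angle distance (radians) between two endmember spectra."""
    cos = np.dot(a_true, a_est) / (np.linalg.norm(a_true) * np.linalg.norm(a_est))
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def rmse(S_true, S_est):
    """Root-mean-square error over all abundance entries."""
    return float(np.sqrt(np.mean((S_true - S_est) ** 2)))


# SAD is scale invariant: a rescaled spectrum has an angle close to zero.
a = np.array([0.2, 0.4, 0.6])
print(sad(a, 3.0 * a))

# A constant abundance offset of 0.05 gives an RMSE of about 0.05.
S = np.array([[0.7, 0.1], [0.3, 0.9]])
print(rmse(S, S + 0.05))
```

Because SAD ignores scale, it evaluates the shape of the estimated endmember spectra, while RMSE evaluates abundances entry by entry; this is consistent with Table 2 reporting only RMSE for the methods marked "/" in the SAD rows.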