Unmixing approaches can be broadly classified based on their primary objectives. First, we examine hyperspectral unmixing methods that assume the number of endmembers in the scene is known a priori. Next, we discuss techniques that are specifically designed to estimate the number of endmembers from the data. Finally, we address methods that simultaneously estimate the number of endmembers and perform unmixing. The latter category is particularly challenging, as it requires solving two complex tasks jointly: endmember counting and hyperspectral unmixing.
2.1. Hyperspectral Unmixing
Hyperspectral unmixing methods can typically be grouped into two main categories: conventional linear approaches and deep-learning-based approaches. The traditional linear unmixing approaches ignore the complex multiple scattering of electromagnetic radiation and assume that each photon reflected from the scene interacts with a single material/endmember. Hence, each pixel is a linear combination of all endmembers, each weighted by its contribution to the pixel, i.e., its fractional abundance [7]. These methods include geometrical, statistical, and sparse regression methods [10]. The geometrical methods, such as the N-Findr algorithm [11], the Pixel Purity Index [12], the Vertex Component Analysis (VCA) [13], and the Minimum Volume Simplex Analysis (MVSA) [14], rely on the assumption that the observed mixed spectra lie in a simplex (or, more generally, a convex cone) whose vertices correspond to the endmembers, and most of them further assume that every material/endmember in the hyperspectral image is represented by at least one pure pixel.
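To make the linear mixing model underlying these methods concrete, the following minimal NumPy sketch (with a purely synthetic endmember matrix; all values are illustrative) generates a mixed pixel as an abundance-weighted combination of endmember spectra, with non-negative abundances that sum to one.

```python
import numpy as np

# Synthetic example: B spectral bands, R endmembers (all values are illustrative).
B, R = 200, 3
rng = np.random.default_rng(0)

E = rng.uniform(0.0, 1.0, size=(B, R))   # endmember signatures, one per column
a = np.array([0.6, 0.3, 0.1])            # fractional abundances: a >= 0 and sum(a) == 1
assert np.all(a >= 0) and np.isclose(a.sum(), 1.0)

noise = 0.001 * rng.standard_normal(B)   # additive sensor noise
y = E @ a + noise                        # observed mixed pixel under the linear mixing model
```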
Although simple, the geometrical methods fail when the pixels of the hyperspectral image are highly (intimately) mixed, so that too few spectral vectors lie on the simplex facets [15]. In such cases, the statistical methods provide an alternative solution by formulating the unmixing task as a statistical inference problem. In particular, variants or extensions of the non-negative matrix factorization (NMF) [
10] are adopted for blind-separation-based unmixing approaches. Alternatively, approaches that employ the Bayesian framework, such as [
16,
17], provide the ability to impose priors that constrain and regularize the solution space; the posterior distribution of the abundances and the endmembers is then computed. Other nonparametric statistical methods, such as those in [
18,
19], exploit the Independent Component Analysis (ICA) method [
20] to address the unmixing challenge. Although statistical methods are capable of unmixing intimately mixed hyperspectral images well, they suffer from a high computational cost [
21].
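As a rough illustration of the NMF-style formulation mentioned above, the sketch below factorizes a synthetic pixel matrix with scikit-learn's generic NMF solver; this is only a conceptual stand-in for the cited unmixing-specific NMF variants, and the sum-to-one constraint still has to be approximated separately (here by a simple normalization).

```python
import numpy as np
from sklearn.decomposition import NMF

# Y: (num_pixels, num_bands) matrix of non-negative reflectance spectra (synthetic here).
rng = np.random.default_rng(1)
Y = rng.uniform(0.0, 1.0, size=(500, 200))

R = 3                                   # assumed number of endmembers
model = NMF(n_components=R, init="nndsvda", max_iter=500)
A = model.fit_transform(Y)              # (num_pixels, R) non-negative abundances (unnormalized)
E = model.components_                   # (R, num_bands) non-negative endmember spectra

# Approximate the sum-to-one constraint by normalizing each pixel's abundances.
A = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
```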
The sparse regression methods are semi-supervised approaches in which the hyperspectral data are expressed as a linear combination of known spectral signatures predefined in spectral libraries. Unmixing then amounts to finding the (typically sparse) combination of library materials that best models each mixed pixel [10,15].
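A minimal sketch of this library-based formulation, assuming a known spectral library D and using a plain non-negative least-squares fit per pixel (the cited methods additionally promote sparsity, e.g., through an l1 penalty, which is omitted here for brevity):

```python
import numpy as np
from scipy.optimize import nnls

# D: (num_bands, library_size) spectral library; y: one observed mixed pixel (both synthetic).
rng = np.random.default_rng(2)
D = rng.uniform(0.0, 1.0, size=(200, 50))
y = D[:, [3, 17]] @ np.array([0.7, 0.3])   # pixel mixed from two library signatures

# Non-negative least squares: the non-zero coefficients point to the library
# materials, and their proportions, that best explain the pixel.
x, residual = nnls(D, y)
selected = np.flatnonzero(x > 1e-3)
```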
In these traditional linear unmixing methods, each pixel is modeled as a linear combination of fixed endmember signatures and corresponding abundance fractions. While this assumption simplifies the problem, it fails to capture the nonlinear interactions of light with complex surfaces and materials. Moreover, it assumes that each endmember is associated with a single, invariant spectral signature. This assumption rarely holds in real-world scenarios due to spectral variability [
22] caused by illumination changes, atmospheric conditions, or material heterogeneity.
A variety of methods have been proposed to address these limitations. Nonlinear unmixing techniques include kernel-based approaches [
23], polynomial post-nonlinear models [
24], and recent deep-learning-based models [
25,
26] that aim to capture complex interactions in hyperspectral data. In parallel, methods such as the Endmember Bundle approach [
27], Multiple Endmember Spectral Mixture Analysis (MESMA) [
28], and deep generative models [
29] have been developed to tackle spectral variability by allowing each endmember to be represented by a set of spectral signatures.
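To illustrate the kind of nonlinearity such models target, a polynomial post-nonlinear mixture in the spirit of the cited work can be sketched as a linear mixture passed through a second-order polynomial; the exact parameterization in [24] may differ, so this is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
B, R = 200, 3
E = rng.uniform(0.0, 1.0, size=(B, R))   # endmember spectra
a = np.array([0.5, 0.25, 0.25])          # abundances (non-negative, sum to one)

linear_part = E @ a
b = 0.2                                  # nonlinearity coefficient (illustrative value)
y = linear_part + b * linear_part**2     # second-order post-nonlinear distortion of the linear mixture
```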
While these approaches have shown promise, many still rely on prior knowledge of the number of endmembers or operate under supervised settings.
The first attempt to employ deep learning to address unmixing challenges dates back to 2015 [
30], when autoencoders were first considered for this task. A cascade of decision models [
31] consisting of a marginalized denoising autoencoder (mDA) [
32] and a non-negative sparse autoencoder (NNSAE) [
33] marked a successful start for using autoencoders in hyperspectral unmixing. Likewise, the work in [
34] proposed a stacked set of NNSAEs to detect outliers prior to the unmixing step. Similarly, DAEN [
35] is composed of two parts: stacked autoencoders (SAEs) for learning the endmembers, and a variational autoencoder (VAE) for unmixing the hyperspectral image while penalizing the abundance matrix so as to fulfill the two main constraints: (i) the abundance non-negativity constraint (ANC), and (ii) the abundance sum-to-one constraint (ASC). Alternatively, the authors in [36] used the Spectral Information Divergence (SID) [37] as the loss function for the optimization process instead of the Mean Squared Error (MSE), owing to MSE's sensitivity to the magnitude of the observed spectra: MSE may treat occurrences of the same endmember in a scene differently depending on their absolute magnitude, whereas SID is insensitive to the absolute magnitude of the endmember spectrum. Likewise, EndNet [38], an end-to-end deep learning model that used the Spectral Angle Distance (SAD) as its loss function, showed promising results, whereas the Maximum Mean Discrepancy (MMD) was used as a regularization term in the probability metric-based autoencoder (PME) [
39]. Recent works have explored advanced architectural solutions to enhance hyperspectral image unmixing. For instance, CMSDAE [
40] introduced a Channel Multi-scale Processing Block (CMPB) that performs feature extraction across multiple scales at the channel level, avoiding spatial redundancy and enriching feature depth. To fuse multi-level features effectively, the authors designed the Hybrid Attention Fusion Block (HAFB), combining channel and spatial attention (via Criss-Cross attention) for adaptive, focused feature representation. Moreover, the Spectral Information Guidance (SIG) module, employing a Simplified Channel Self-Attention Mechanism (SCAM), enables long-range spectral dependency modeling.
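Most of these autoencoder-based unmixers share a common skeleton: an encoder maps each pixel to abundances that satisfy the ANC and ASC (e.g., via a softmax output), while a single linear decoder, whose weights play the role of the endmember matrix, reconstructs the pixel. The PyTorch sketch below illustrates this generic pattern with a SAD reconstruction loss; it is a simplified illustration with arbitrary layer sizes, not a faithful reimplementation of any of the cited networks.

```python
import torch
import torch.nn as nn

class UnmixingAutoencoder(nn.Module):
    """Generic pixel-wise unmixing autoencoder (illustrative sizes)."""
    def __init__(self, num_bands: int, num_endmembers: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_bands, 64), nn.ReLU(),
            nn.Linear(64, num_endmembers),
            nn.Softmax(dim=-1),            # enforces ANC and ASC on the abundances
        )
        # The decoder weights act as the endmember matrix (bias omitted on purpose).
        self.decoder = nn.Linear(num_endmembers, num_bands, bias=False)

    def forward(self, pixels):
        abundances = self.encoder(pixels)
        reconstruction = self.decoder(abundances)
        return reconstruction, abundances

def sad_loss(y_hat, y, eps=1e-8):
    """Mean Spectral Angle Distance between reconstructed and observed spectra."""
    cos = (y_hat * y).sum(dim=-1) / (y_hat.norm(dim=-1) * y.norm(dim=-1) + eps)
    return torch.arccos(cos.clamp(-1 + eps, 1 - eps)).mean()

# One training step on a synthetic batch of pixels.
model = UnmixingAutoencoder(num_bands=200, num_endmembers=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
pixels = torch.rand(32, 200)
optimizer.zero_grad()
reconstruction, abundances = model(pixels)
loss = sad_loss(reconstruction, pixels)
loss.backward()
optimizer.step()
endmembers = model.decoder.weight.detach()   # shape: (num_bands, num_endmembers)
```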
On the other hand, CNNs were considered in the Convolutional Autoencoder for Spatial–Spectral Hyperspectral Unmixing (CNNAEU) [
25] to preserve the spatial structure of the data. Similarly, the work in [
41] relied on a CNN; however, it used a three-dimensional CNN instead of a two-dimensional CNN to capture spectral and spatial information simultaneously. EGU-Net [
42] performs self-supervised two-stream endmember-guided unmixing. It utilizes HySime [
43] to estimate the number of endmembers and uses VCA [
14] as an endmember extraction method. Then, a clustering algorithm, such as K-means [
44], is used to reduce the redundancy in the extracted endmembers. A similar two-stream autoencoder network was proposed in [
45], employing logarithmic SAD as its loss function. This provides anomaly-based guidance that enhances the robustness of the unmixing process.
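The redundancy-reduction step mentioned above (clustering candidate endmember signatures and keeping one representative per cluster) can be sketched as follows; the candidate pool, the number of clusters, and all parameter values are illustrative placeholders rather than the actual EGU-Net settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Candidate endmember signatures pooled from several extraction runs
# (synthetic placeholders here): shape (num_candidates, num_bands).
rng = np.random.default_rng(4)
candidates = rng.uniform(0.0, 1.0, size=(30, 200))

R = 3  # target number of endmembers (e.g., supplied by HySime)
kmeans = KMeans(n_clusters=R, n_init=10, random_state=0).fit(candidates)
endmembers = kmeans.cluster_centers_  # (R, num_bands): one representative signature per cluster
```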
More recent works, such as [46], have attempted to exploit the ability of transformers to capture global feature dependencies while preserving the spatial and spectral information in the hyperspectral image. The work in [
47] combines CNNs with transformers by using the CNNs for encoding the hyperspectral image, and then a transformer with multi-head self-patch attention modules captures the contextual information in the feature dependencies of the image patches. Finally, a single convolutional layer is used to reconstruct the image. Similarly, ref. [
48] combines CNNs with transformers in a comparable way by employing CNNs to extract high-level, discriminative representations, which are then converted into semantic tokens via a tokenization process. The transformer module subsequently learns the dependencies between these tokens to improve the unmixing performance. PICT [
49] is a dual-stream network that utilizes transformers with the addition of prior spectral knowledge to guide the network and enhance the unmixing results. Similarly, UnDAT [
50] introduces a double-aware transformer model with two modules, the score-based homogeneous-aware (SHA) module and the spectral group-aware (SGA) module. The SHA module creates a homogeneous map by splitting the linear feature map, while the SGA module divides the hyperspectral image into multiple spectral groups based on their spectral similarity.
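The ingredient these transformer-based unmixers share is self-attention over patch (or band-group) tokens to capture long-range dependencies. A minimal sketch of that tokenize-and-attend step with PyTorch's built-in multi-head attention is given below; the token count and embedding size are illustrative and not taken from any of the cited models.

```python
import torch
import torch.nn as nn

# 64 image patches, each embedded into a 128-dimensional token (illustrative sizes).
num_tokens, token_dim = 64, 128
tokens = torch.rand(1, num_tokens, token_dim)          # (batch, tokens, embedding_dim)

attention = nn.MultiheadAttention(embed_dim=token_dim, num_heads=8, batch_first=True)
attended, weights = attention(tokens, tokens, tokens)  # self-attention over all patch tokens
```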
2.2. Estimating the Number of Endmembers
Several algorithms have been designed to estimate the number of endmembers in a hyperspectral image. Virtual Dimensionality (VD) [51], which denotes a class of supervised algorithms, along with the Harsanyi–Farrand–Chang (HFC) [52] algorithm, has frequently been used to find the minimum number of spectrally distinct signals in a given spectral matrix. The geometric in-degree distribution (IDD) algorithm [
53] is a supervised algorithm that estimates the Intrinsic Dimension (ID) of the spectral data space in order to determine the number of endmembers. A related approach was introduced in [
49], utilizing the hubness phenomenon, where the IDD is shown to be strongly influenced by the intrinsic dimensionality of the data and becomes increasingly skewed as the dimensionality rises. The HySime algorithm [
15] is an unsupervised, minimum-error-based method that selects the subset of eigenvectors best representing the signal subspace in the minimum root-mean-square-error sense. The dimension of the obtained subspace is then taken as the number of endmembers [
53]. Another method based on the eigen-gap approach was proposed in [
54] to estimate the intrinsic dimensionality of hyperspectral images, which corresponds to the number of endmembers. Alternatively, WGSDM [
55] employs the weight-sequence geometry separation detection method, whereas [
56] deploys clustering approaches to estimate the number of endmembers. On the other hand, instead of using clustering algorithms, the authors of [
57] used collaborative sparsity as a promising alternative to address the overestimation of the number of endmembers in hyperspectral data typically caused by spectral variability. The work in [
58] uses Iterative Error Analysis (IEA) and spectral discrimination measurements to determine the number of endmembers and remove redundancy. Recently, ref. [
59] attempted to estimate the number of endmembers by detecting the intrinsic endmember spectra in the image while also eliminating the effects of the illumination-based spectral variability.
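As a concrete illustration of the eigen-gap idea, the toy estimator below places the signal-subspace dimension at the largest gap in the log-eigenvalue spectrum of the data correlation matrix; the method in [54] is considerably more elaborate, so this sketch only conveys the intuition.

```python
import numpy as np

def estimate_num_endmembers_eigengap(Y, max_candidates=20):
    """Toy estimator: Y is a (num_pixels, num_bands) hyperspectral data matrix."""
    corr = (Y.T @ Y) / Y.shape[0]                        # sample correlation matrix of the bands
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]    # eigenvalues in decreasing order
    eigvals = np.clip(eigvals, 1e-12, None)
    log_gaps = np.diff(-np.log(eigvals[:max_candidates + 1]))  # log-ratio of consecutive eigenvalues
    return int(np.argmax(log_gaps) + 1)                  # dimension where the largest gap occurs

# Synthetic check: 4 endmembers mixed with Dirichlet abundances plus mild noise.
rng = np.random.default_rng(5)
E = rng.uniform(0.0, 1.0, size=(200, 4))
A = rng.dirichlet(np.ones(4), size=1000)
Y = A @ E.T + 0.001 * rng.standard_normal((1000, 200))
print(estimate_num_endmembers_eigengap(Y))               # typically prints 4 for this example
```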
2.3. Simultaneous Unmixing and Number-of-Endmembers Estimation
A few attempts have been made to self-estimate the number of endmembers while performing the unmixing. These attempts can also be divided into two categories: conventional approaches and deep-learning-based approaches.
The Sparsity Promoting Iterated Constrained Endmembers (SPICE) [
60] approach was introduced as an extension of the Iterated Constrained Endmembers (ICE) [
61] detection algorithm that adds a sparsity-promoting term to the ICE objective function. This allows SPICE to estimate the number of endmembers and perform unmixing simultaneously. Alternatively, the Sampling Piece-wise Convex Unmixing and Endmember Extraction (S-PCUE) approach was introduced in [
62]. This fully stochastic method determines the number of endmembers by estimating the number of convex regions in the hyperspectral data. On the other hand, the Mixture Analysis with Self-Estimation of the Number of Endmembers (MASENE) [
63] clusters the spectral data using the Competitive Agglomeration (CA) [
64] clustering algorithm. The clustering outcomes are used to derive the convex geometry unmixing model and estimate the number of endmembers. The Maximum Distance Analysis (MDA) [
65] attempts to estimate the number of endmembers based on the assumption that the endmembers are the points, lines, and affine hulls lying farthest from every other point, line, or affine hull in the simplex formed by the hyperspectral image pixels. MDA also extends MVSA [14] to create MDA-MVSA, which addresses the limitation that MVSA [14] only extracts the endmembers and requires the number of endmembers to be determined beforehand. In practice, however, the method still required the actual number of endmembers to be set in its parameters.
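The mechanism shared by these sparsity-promoting approaches can be sketched as follows: start from a deliberate overestimate of the number of endmembers, let a sparsity penalty drive the abundances of superfluous endmembers toward zero during optimization, and then prune them. The snippet below shows only this pruning logic on synthetic data; it is not the actual SPICE/ICE objective or optimization.

```python
import numpy as np

def prune_unused_endmembers(E, A, min_mean_abundance=1e-2):
    """E: (num_bands, K) candidate endmembers; A: (num_pixels, K) abundances.

    Keep only the candidates whose mean abundance over all pixels exceeds a
    threshold; a sparsity-promoting penalty during optimization is what pushes
    the abundances of superfluous candidates toward zero in the first place.
    """
    usage = A.mean(axis=0)
    keep = usage > min_mean_abundance
    return E[:, keep], A[:, keep]

# Illustrative usage: six candidates are deliberately overestimated, three are real.
rng = np.random.default_rng(6)
E = rng.uniform(0.0, 1.0, size=(200, 6))
A = np.hstack([rng.dirichlet(np.ones(3), size=1000), np.full((1000, 3), 1e-4)])
E_pruned, A_pruned = prune_unused_endmembers(E, A)
print(E_pruned.shape[1])   # 3 candidates survive the pruning
```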
On the other hand, self-estimating the number of endmembers in a hyperspectral image while performing the unmixing task with a deep learning approach has not been sufficiently investigated in the literature. At the time of writing, only one deep-learning-based hyperspectral unmixing approach that self-estimates the number of endmembers had been proposed. The Untied Denoising Autoencoder with Sparsity (uDAS) [21], a three-layer part-based denoising autoencoder proposed as an improvement to [30], addressed the issue of adopting the same regularization terms for both the encoder and decoder, arguing that the tied-weight structure hinders the unmixing process. This design allows it to handle noisy hyperspectral images and to automatically prune the redundant network weights that correspond to the extracted endmembers, thereby determining the optimal number of endmembers in the image.