Hierarchical Sparse Nonnegative Matrix Factorization for Hyperspectral Unmixing with Spectral Variability

: Accounting for endmember variability is a challenging issue when unmixing hyperspectral data. This paper models the variability that is associated with each endmember as a conical hull deﬁned by extremal pixels from the data set. These extremal pixels are considered as so-called prototypal endmember spectra that have meaningful physical interpretation. Capitalizing on this data-driven modeling, the pixels of the hyperspectral image are then described as combinations of these prototypal endmember spectra weighted by bundling coefﬁcients and spatial abundances. The proposed unmixing model not only extracts and clusters the prototypal endmember spectra, but also estimates the abundances of each endmember. The performance of the approach is illustrated thanks to experiments conducted on simulated and real hyperspectral data and it outperforms state-of-the-art methods.


Introduction
Hyperspectral imaging records a collection of two-dimensional (2D) spatial images acquired at several hundred wavelengths. Each pixel is associated with a reflectance or radiance spectrum that enables a variety of materials to be uniquely identified and quantified [1]. Yet, this high spectral resolution usually comes with a reduced spatial resolution and, as a consequence, the presence in the image of mixed pixels (mixels), e.g., pixels at the boundary of two materials. Hence, quantifying materials that are present in mixels is of primary importance for a full analysis of an hyperspectral image.
Spectral unmixing (SU) aims at decomposing a mixed spectrum into a collection of pure spectra (endmembers) and estimating their proportions (abundances) in each pixel of the image [2,3]. The linear mixing model (LMM) is the most widely used model for SU because of its simplicity [4]. LMM assumes that a single spectrum fully characterizes a given class of material. However, the underlying assumption is unrealistic, because the spectral signature of a material is usually affected by illumination variations and it suffers from intrinsic variability of the class [5,6]. The spectral variability affecting a class of material is usually referred to as endmember variability [7,8].
To tackle this issue, a popular approach consists in assuming that a spectral library describing this endmember variability is a priori available [9]. An optimal combination of endmember spectra from a spectral library is selected before estimating their respective abundances. Among this family of approaches, multiple endmember spectral mixture analysis (MESMA) is probably the most popular method and it has been successfully applied to many applications [10]. More recently, several models have been developed in order to achieve better accuracy while reducing the computational complexity of the combinatorial selection procedure. These advanced models depart from MESMA, since they simultaneously handle all of the spectra composing the spectral library and estimates their corresponding abundances, resorting to additional constraints, e.g., sparsity [11][12][13].
All of these methods provide a simple and flexible representation of endmember variability provided that a spectral library is validated by an expert. However, such prior knowledge and the quality of the spectral library may be difficult to assess [6]. For instance, multiple experts provide generally different and often inconsistent description of endmember variability. To overcome this human-based expertise, automated data-driven endmember bundle extraction methods build the spectral library from the measured spectra directly [14][15][16]. Yet they require a careful tuning of parameters that do not have any physical meaning while significantly affecting the unmixing results.
One alternative consists in designing statistical or parametric methods that do not rely on any spectral library. Statistical methods describes the endmember variability through a probability distribution [7,17,18]. The parametric methods explicitly the variability as a deviation of the spectral signature from a nominal behavior [19][20][21][22]. These models have been resorted to handle spatially or temporally varying endmember signature within a given image [20,23] or across multiple images acquired at different time instants [24][25][26][27]. However, inference from these models is greatly impaired by the endmember variability and it becomes too challenging to accurately account for complex physical processes that underlie most observed variabilities.
In our previous work [28], the endmember variability was modeled as a conical hull, including all spectra of a given endmember class. However, these spectra still need to be selected beforehand to build a spectral library. This paper builds on and extends this promising conical hull-based model of endmember variability to design an unmixing method that does not require the prior knowledge of a spectral library. Instead, the proposed framework relies on a mixing model that is similar to archetypal analysis (AA), which allows for the spectral signatures defining the endmember classes to be automatically learned from the data, yielding a fully unsupervised unmixing method. More precisely, the conventional formulation of AA assumes that observations are convex combinations of so-called archetypes, themselves defined as convex combinations of the observations [29]. AA and some variants have been already instanced in the specific context of hyperspectral image analysis [30,31]. In particular, when conducting unsupervised unmixing, the archetypes have shown to provide physically meaningful estimates of the endmember spectral signatures [32][33][34][35][36]. However, very few attempts have capitalized on the AA concept to explicitly tackle the problem of spectral variability [37,38]. In this work, to handle spectral variability, we assume that the measured pixel spectra are convex combinations of pixel-wise endmember spectral signatures. Capitalizing on our past results [28] and exploiting an AA-like formulation, we define this set of endmembers signatures as conical combinations of so-called prototypal endmembers corresponding to extremal pixels that belong to the data set. In other words, each intra-class endmember variability is described by the conical hull spanned by prototypal endmembers, which have the great advantage of having a clear physical interpretation. By constraining the spectral variability to be modeled through these prototypal endmember spectra, it combines the advantages of unsupervised methods that do not require expert knowledge and those of spectral library-based SU methods that offer a physically sound description of the variability. Thus, the main contributions of this paper are threefold:

1.
It introduces the new assumption that endmember variability can be represented by the conical hulls of prototypal endmember spectra.

2.
It derives the resulting mixing model, which, coupled with a hierarchically structured group-sparsity constraint, allow for the prototypal endmember spectra to be automatically extracted and clustered from the data.

3.
It develops the associated unmixing algorithm that jointly estimates the prototypal endmembers and the abundances within an unsupervised framework.
The structure of this paper is organized, as follows. Section 2 describes related works and their inherent drawbacks and the newly proposed assumption for modeling endmember variability is introduced. In Section 3, the unmixing model that incorporates the new assumption is proposed and the estimation of its parameters is discussed in Section 4. Sections 5 and 6 show results obtained from the experiments. Finally, Section 7 concludes the paper.

LMM with Fixed Endmember Spectra
Let y p ∈ R L×1 denote the L-spectrum measured at the pth pixel of an hyperspectral image, M = [m 1 , . . . , m K ] ∈ R L×K is the matrix of the endmember spectra associated with the K endmember classes, a p = a 1p , . . . , a Kp T ∈ R K×1 is the abundance vector of the pth pixel, where a kp is the abundance fraction of the kth endmember class. The linear mixing model (LMM) assumes that the observed spectrum of the pth pixel y p is approximated by a weighted linear combination of endmember spectra and abundance fractions where n p ∈ R L×1 represents the noise and modeling error. Abundance non-negativity constraint (ANC) and the abundance sum-to-one constraint (ASC) are usually imposed, as follows ∀k, ∀p, a kp ≥ 0, and ∀p, As discussed in the introduction, this model is known to be not sufficiently flexible to describe endmember variability.

LMM with Pixel-Wise Endmember Spectra
Many methods have been developed to incorporate endmember variability in LMM [7,8]. State-of-the-art approaches consider endmember spectra that are spatially variable for each pixel, i.e., where M p = m 1p , . . . , m Kp can be interpreted as pixel-wise endmember spectra [39]. It is worth noting that the model in (3) can be straightforwardly rewritten as a conventional LMM (1), i.e., whereñ p n p + (M p − M)a p . This ambiguity shows that distinguishing measurement noise from spectral variability or, more generally any other second-order effects, such as nonlinearity [40] or adjacency effects [41], is a challenging issue. However, one may expect to correctly describe a particular deviation from the conventional LMM if this deviation has a magnitude significantly greater than the measurement noise level, i.e., (M p − M)a p 2 2 n p 2 2 [42]. Conversely, once the level of this noise is larger than the magnitude of spectral variability, correctly estimating the variability becomes much more challenging. However, hyperspectral imaging for Earth observation offers a favorable applicative context where advanced models can be considered, since the typical signal-to-noise ratios (SNRs) are sufficiently high. Moreover, this ambiguity can also be alleviated by imposing additional assumptions on the components of the mixing model. In the literature, these additional assumptions cast the resulting unmixing methods into several families discussed in what follows.

Spectral Library-Free Mixing Models
A wide class of unmixing methods are unsupervised in the sense that they are not driven by any additional prior knowledge. Among these methods, some assume that the spectral signature m kp associated with the kth endmember class in the pth pixel can be described thanks to a statistical model. The most popular, referred to as normal compositional model, considers the Gaussian [17,[43][44][45][46] m kp ∼ N (m k , Σ) (5) wherem k is the nominal signature associated with the kth endmember class and Σ is a covariance matrix encoding endmember variability. In [47], the authors follow the same approach while advocating the use of the beta distribution to enforce the endmember signatures to belong to the set (0,1). Other unsupervised methods elaborates on explicit parametric (i.e., deterministic) modeling of this variability. When this latter results from illumination variation, a natural approach consists in introducing a scaling factor, as in [19]. The pixel-wise endmember matrix is, thus, defined as where S p = diag[s 1p , . . . , s Kp ] gather spatially varying scaling factors. In [20], the authors model the variability as an additive perturbation dM p around the nominal signatures The pixel-wise endmember spectra can also be modeled as a combination of scaling and additive factors [21]. The aforementioned pixel-wise endmember spectra can be very flexible. Another route followed in [48] consists in performing a linear unmixing in a particular lower-dimensional subspace accounting for spectral variability. However, the optimization of the endmember variability terms is challenging without any additional prior knowledge.

Bundle-Based Mixing Models
A wide class of methods capitalizes on the availability of a spectral library as prior knowledge. This library includes multiple spectra, known as endmember bundles, representative of each endmember class. This library is defined as the L × N structured matrix where the submatrix E k = e 1k , . . . , e N k k ∈ R L×N k defines the bundle associated with the kth endmember class, N k is the number of spectra representative of the kth endmember class with N = ∑ K k=1 N k . In particular, the well-known MESMA approach builds on this formalism to form the pixel-wise endmember matrix M p = m 1p , . . . , m Kp by selecting a single spectrum from each endmember bundle E k [10], i.e., ∀k ∈ {1, . . . , K} , m kp ∈ e 1k , . . . , e N k k .
This model imposes that the endmember variability can only be described by the spectra belonging to the endmember bundles. When endmember variability is not fully captured by the design of these endmember bundles, the performance of MESMA is thus expected to severely degrade. Some unmixing models have been developed in order to overcome the problem of MESMA [12,13,49]. Instead of considering the LMM (1), these methods rather build on the formulation under ASC and ASC where the vector r p ∈ R N×1 are multiple abundances allowing several spectra in each bundles to be used to describe the mixture. By considering all spectra, endmember variability can be described by the convex combination of each spectral signatures defining the endmember bundles. However, these models do not impose any structure to independently describe pixel-wise endmember spectra nor abundances of each class. In addition, due to the ASC, this representation prevents any proper modeling of illumination variation by a scaling factor. To cope with these limitations, the multiple endmember mixing model (MEMM) proposes decoupling the pixel-wise abundance a p from the spectral variability modeling by explicitly introducing additional bundling coefficients B p , i.e., by imposing r p = B p a p [28]. Equivalently, the pixel-wise endmember matrix is then defined as M p = EB p and the mixing model writes where B p ∈ R N×K represents bundling coefficients that are associated with the pth pixel. Under the particular structure (8) for the building E, the bundling matrix B p has the following companion structure where each vector b kp ∈ R N k ×1 is the bundling coefficients associated with the kth endmember class describing the pth pixel. To provide a compact and interpretable representation, this model is granted with the following constraints ∀p, B p 0 and where · 0 is the 0 -pseudo norm s is the maximum number of spectra in B p used to explain the pth pixel. Thus, MEMM describes the endmember variability by a conical hull of each endmember bundle. The conical hull enables a larger range of endmember variability to be modeled, even if the bundles are defined by a few spectra.

Towards Library-Free Bundle-Based Models
Interestingly, the taxonomy of the methods conducted above shows that the unsupervised library-free (statistical or parametric) methods do not exploit the appealing formalism of bundles to characterize the spectral variability. Conversely, all of the aforementioned bundle-based methods require these endmember bundles to be fully characterized before estimating the abundances. When no prior knowledge regarding the scene of interest is available, these bundles may be manually extracted by an expert or automatically extracted from the data [9,14,15]. However, unmixing results are dramatically affected by the quality of this two-step process, since they are largely driven by this preliminary extraction. To the authors' knowledge, no existing methods have been proposed to fill the gap between these two families of methods. As a consequence, this work attempts to reconcile both approaches while leveraging their respective advantages. It aims at defining a new mixing model: which is able to propose a bundle-driven description of the spectral variability to ensure physically sound interpretation of this variability; and, 2.
which does not require the prior knowledge of a spectral library, akin to the unsupervised methods.
The proposed model is described in the next section.

A New Paradigm of Endmember Variability
Recent work has shown that conical hulls generated by endmember bundles have great potential to model endmember variability [28]. However, this method does not fully exploit the geometrical structure of these hulls, since, under the MEMM model (11), they are allowed to be formed by any spectra that belong to the bundles. However, the conical hull can be spanned by a set of specific spectra in the bundle, herein referred to as prototypal endmember spectra. We propose to build on this finding to geometrically describe the endmember variability by the conical hull defined by these prototypal endmember spectra, as illustrated in Figure 1. From a physical perspective, this assumption means that an endmember class has a hierarchical structure and the endmember variability can be described by a mixture of prototypal endmember spectra belonging to a lower hierarchy. For example, an endmember class referring to vegetation can be defined as a mixture of several vegetated instances (e.g., dry or green vegetation). A similar concept has been explored in [50]. Figure 2 illustrates the relevance of such modeling. It considers an initial set of tree spectral signatures, highlights 10 prototypal spectra, and depicts generated spectra under this conical formalism. Despite the complexity of the variability affecting the initial set of signatures, it shows that it can be well captured and accurately described by a relevant subset defined by conical combination of these prototypal endmembers.

A New Spectral Library-Free Bundle-Based Model
The above paradigm can be embedded into a generative mixing model to describe the whole set of pixel spectral signatures Y = [y 1 , . . . , y P ] T ∈ R L×P to be unmixed. In our objective of an unsupervised unmixing method similar to the spectral library-free methods that are discussed in Section 2.2.1, we further assume that the prototypal endmembers that are required to describe the variability belong to the observed spectra. This property looks like the pure-pixel or near-separable assumption underlying a wide class of endmember extraction algorithms such as N-FINDR [51], VCA [52] and SPA [53]. However, it offers more flexibility, since the pixel-wise endmember signatures that are actually involved in the mixtures are defined as conical combinations of these prototypal endmembers. It follows that the proposed model generalizes (10) and (11) by replacing the a priori chosen bundle matrix E by the whole set of pixel signatures Y, leading to where Kp ∈ R P×K gathers bundling coefficients (Note that the number of rows of the bundling coefficient matrix B p associated with the pth pixel is the number P of pixels, contrary to the definition of the matrix in (11)). of the pth pixel. Such a formulation has been, for instance, advocated in the unsupervised unmixing method in [54]. It is also intimately related to AA [29], already advocated as a convenient way to implicitly define the endmember signatures as combinations of the measured pixel spectra [31][32][33][34][35][36]. Satisfying the requirements stated in Section 2.2.3, it has the great advantage of jointly performing bundle extraction, bundle clustering, and unmixing within a unifying framework. Obviously, this newly proposed model can be still cast as a LMM with pixel-wise endmember spectra, since it can be rewritten as When considering all the pixels, the model (14) becomes where the P × PK-matrix B = B 1 B 2 . . . B P is the whole set of bundling coefficients (b pj ) of the P pixels. Moreover, the abundance matrix is structured, as follows The key step of the proposed modeling lies in the expected particular structure of the bundling matrices B p . Indeed, departing from a conventional AA, the specific hierarchy that is stated in Section 3.1 can be encoded by group sparsity-enforcing constraints imposed on the entries of these matrices, as shown in the next paragraph.

Hierarchical Group-Sparsity
As discussed above, the pixel-wise signatures that are associated with each endmember class are defined as the conical combinations of so-called prototypal endmembers extracted from the data. This assumption can be formulated by imposing a two-or three-level group sparsity imposed to the bundling matrix B, as illustrated in Figure 3. These constraints aim at selecting a few prototypal endmember spectra and assigning a single endmember class to each selected prototypal endmember spectrum. They are introduced in what follows.  Figure 3. Hierarchical sparsity at different levels. Red squares represent active coefficients while gray ones represents inactive coefficients.
1st level of sparsity-The hyperspectral image is assumed to only contain a few prototypal endmember spectra. This means that only a few nonzero rows are active in the bundling matrix B. This can be encoded by resorting to mixed 0,2 -norm [55] or to a collaborative l 0 -pseudo norm [56,57] imposed to B. By introducing the vector of row-wise energies where S 1 defines the sparsity level. The set of nonnegative bundling matrices satisfying (19) 2nd level of sparsity-at a higher level of the hierarchy, each prototypal endmember spectrum should be associated to a single endmember class. This means that, at most, one group of coefficients belonging to the kth endmember class should be active in the pth row denoted b p of the bundling matrix B (p = 1, . . . , P). This can be expressed as a set of P group-sparsity constraints, each associated with each row of B. More formally, let partition {1, . . . , PK} into K non-overlapping groups G (2,1) , . . . , G (2,K) where G (2,k) gathers the indexes (note that here G (2,k) = {k + p|p = 0, . . . , P − 1}) of the P columns of B associated with the kth endmember class. For each row b p of B, we define the vector of endmember-wise energies Imposing that a given prototypal endmember characterizes a single endmember class is formulated as with S 2 = 1. The subset of B 1 of bundling matrices whose the P rows jointly satisfy the constraints (21) is denoted B 2 , i.e., 3rd level of sparsity (optional)-at this stage of the hierarchy, for a given endmember class k ∈ {1, . . . , K}, the pixel-wise endmember spectra m kp for all pixels p = 1, . . . , P are defined by combinations of the same set of prototypal signatures. At a 3rd level of the hierarchy, one may want to impose further sparsity. In the most general case, this can be achieved by considering the constraint where S 3 is the number of non-zeros values in the whole matrix B. Note that a more structured sparsity could be also imposed, for instance, joint sparsity over a group of pixels [58]. The subset of B 2 of bundling matrices satisfying the constraints (23) is denoted B 3 , i.e., This three-level hierarchical sparsity structure shows that groups of coefficients in each hierarchy are non-overlapping and can be interpreted as a tree-structured sparsity [59]. In addition, to reach interpretable and physically meaningful abundance fractions, the conventional ASC and ANC will be complemented by an 0 -pseudo-norm sparsity constraint, i.e., In the sequel, the set of abundance vectors satisfying the ASC and ANC (2) is denoted A.

Further Relationships with Existing Models
The proposed model (16) can be related to several existing models recently proposed in the literature. First, by denoting R = BA, the model in (16) can be rewritten as where R directly stands for the abundances of prototypal endmember spectra. When this matrix is granted with an 2,1 -norm group sparse regularization, this leads to the group lasso with unit sum and positivity (GLUP) model proposed in [54]. More generally, promoting sparsity over the rows of R is generally referred to as nonnegative sparse regression with self-dictionary and it has been extensively studied in the context of hyperspectral unmixing [57,60,61]. Without imposing further structure on R, akin the 2nd level of sparsity introduced in the previous section, these methods merely extract redundant prototypal endmember spectra when endmember variability is present. Moreover, the proposed model generalizes these approaches by describing endmember variability thanks to the conical hull of prototypal endmember spectra rather than their convex hull. Besides, when the proposed model considers a common P × K bundling matrix for all pixels, i.e., B 1 = . . . = B P B , it can be rewritten as where YB = M can be interpreted as fixed endmember spectra. If only ANC and ASC are considered, this model is equivalent to the robust constrained matrix factorization [36]. When the sum-to-one constraint is additionally imposed for each column ofB, it yields the conventional AA [31][32][33][34][35][36]. This shows that the proposed model generalizes these two families of method by allowing for pixel-wise bundling coefficients that lead to a pixel-wise description of endmember variability.

HSNMF Unmixing Algorithm
We now propose to build on the model introduced in Section 3.2 that is associated with the hierarchical sparsity constraints detailed in Section 3.3 to derive an unmixing algorithm, referred to hierarchical sparse nonnegative matrix factorization (HSNMF). The associated optimization problem can be formulated as the following non-convex minimization problem where ı · (·) denotes the indicator function on the corresponding set and λ a adjusts the level of abundance sparsity. Solving this optimization problem is challenging, since the associated constraints and penalizations involve nonconvex non-smooth functions. Section 4.1 introduces an iterative algorithm converging to a critical point of the objective function (27). Because the problem is non-convex, the resulting estimates highly depends on the initialization of the parameters of interest A and B. Thus, Section 4.2 proposes a dedicated initialization step. The whole process of the unmixing procedure is represented as a flowchart in Figure 4.

PALM Algorithm
We propose resorting to the proximal alternating linearized minimization (PALM), which provides a sequence of iterates that converge to a critical point of the objective function [62]. PALM iteratively updates the parameters A and B by alternatively conducting a gradient descent step and then conducting a proximal step associated with the corresponding constraints. The HSNMF algorithm is summarized in Algorithm 1, whose main steps are detailed in what follows.

Updating B
The updating rule for B is where f (·, ·) is the unconstrained objective function in (27), and its gradient is given by The proximal step consists in a projection onto the set of constraints B 3 . The hierarchical structure of these constraints allow for this proximal mapping to be decomposed (see [59,63] for more details).

Updating A
As in [28], the P columns a p (p = 1, . . . , P) of the abundance matrix A can be updated independently according to the following rule The proximal step can be conducted while using the approach presented in [64].

Initialization and Computational Cost Reduction
The aforementioned optimization approach may raise two crucial issue. First, the optimization can be computationally expensive due to the size of Y. Indeed, without any pre-processing, the self-dictionary approach imposes that the dimension of B corresponds to the number of pixels P. Instead of solving (27), one strategy that is advocated in [54] is to crudely prune the observation matrix to solve where Y ω denotes a column subset of Y indexed by ω ⊂ {1, . . . , P}. Moreover, because the minimization problem is highly non-convex, the reached solution depends on the initialization. In what follows, we provide recipes to conduct relevant initialization of the matrices B and A and to reduce the overall computational complexity. First, capitalizing on the relation between the proposed model and the one recalled in (25), we propose to solve by resorting to well-established methods already proposed in the literature problem [54,57,61,65]. This pre-processing extracts candidate prototypal endmember spectra gathered in the submatrix Y ∈ R L×S 1 and estimates the corresponding abundancesR ∈ R S 1 ×P . Finally, the whole initialization process, as summarized in Algorithm 2, follows the four steps detailed below.
(i) Extraction ofỸ-candidate prototypal endmember spectraỸ are extracted by solving (30). In this work, we resort to the XRAY algorithm [65] since it works on the conical hull and is able to manage rather large dimensional settings. Again, if the number of pixels defining the observation matrix Y is still too large, pruning strategies should be employed, as in [54,61].
(ii) Definition of the bundles-to build consistent bundles, the S 1 extracted prototypal endmembers undergo a clustering step, i.e., they should be assigned a class label c s ∈ {1, . . . , K} (s = 1, . . . , S 1 ) such that c s = k if the prototypal endmemberỹ s is associated with the k endmember class. In this study, the hierarchical clustering based on rank-two NMF (H2NMF) [66] has been used since it has shown to outperform conventional clustering methods (e.g., spherical k-means) with a reasonable computational cost.
(iii) Initialization of A-the abundance matrix is initialized as where G ∈ R K×S 1 is a grouping matrix that computes the sum of multiple abundances inR with respect to the endmember bundles identified in the previous step. More precisely, the elements g ks of G (k = 1, . . . , K, s = 1, . . . , S 1 ) are defined thanks to the class vector c = [c 1 , . . . , c S 1 ] as (iv) Initialization of B-finally, the bundling coefficient matrices B 1 , . . . , B P are initialized through a normalization procedure, i.e., (33) where and are element-wise multiplication and division and g k ∈ R S 1 is the kth row of G. This initialization procedure has the advantage to provide prototypal endmember spectraỸ, whose number is much smaller than the whole set of observed spectra Y. Moreover, by exploiting the estimated clustering defined by the vector c of endmember class, only bundling coefficients that are related toỸ can be used for unmixing. The other coefficients associated with spectra in Y not identified as prototypal endmembers by XRAY can be fixed to 0 before solving the problem (27), explicitly ensuring the two first levels of sparsity. As a consequence, this optimization problem can be split into P optimization problems that can be solved independently for each pixel, i.e., By interpretingỸ as endmember bundles, this problem can be easily related to the MEMM model (11) for which an efficient unmixing algorithm has been proposed in [28]. This finding leads to the following important remarks: (i) the proposed method generalizes MEMM by incorporating automatic endmember bundle extraction and (ii) the MEMM algorithm can be resorted to solve the problem that is based on prototypal endmember spectra extracted beforehand. The performance of this strategy is assessed in the next sections thanks to the experiments conducted on synthetic and real data.

Experiments Using Simulated Data
First, the performance of proposed HSNMF method has been evaluated using two simulated data sets and compared to state-of-the-art methods. These two synthetic hyperspectral images are available online at http://dobigeon.perso.enseeiht.fr/data/HS/2020_HSNMF/datasets.zip.

Simulated Data Sets
SIM1-the first data set, referred to as SIM1, has been generated from K = 10 spectra extracted by XRAY from the USGS spectral library. For each endmember class, N = 3 prototypal endmember spectra have been generated using the approach proposed in [25] such that endmember variability of each class was moderate and does not overlap each other. A set of P = 1000 mixed spectra has been generated using the following five steps. First, the number K p of endmember classes actually involved in the pth pixel has been fixed as K p = 1 + u p where u p is randomly drawn according to the Poisson distribubtion P (1). Second, for each pixel, a random combination of K p endmember classes has been selected. Third, pixel-wise endmember spectra has been generated from the conical hull formed by the prototypal endmember spectra for each class. Forth, the abundances of the selected endmember classes have been randomly generated while using a uniform distribution to jointly ensure ANC and ASC. Finally, the observed spectra have been generated following the LMM as the weighted combination of the generated pixel-wise endmember spectra, corrupted by an additive white Gaussian noise with SNR = 30 dB. An experimental scenario with a lower SNR (i.e., SNR = 20 dB) is considered in the online Supplementary Materials. SIM2-the second data set, referred to as SIM2, considered spatial information has been generated from K = 4 spectra. Compared to SIM1, the number of endmember class has been reduced to allow for larger amounts of endmember variability within each class. In each class, N = 30 prototypal endmember spectra and pixel-wise endmember spectra have been generated similarly as SIM1. The corresponding K = 4 abundance maps of P = 1000 pixels have been randomly generated as in [14] to provide spatially consistent areas, as illustrated in Figure 5. The linearly mixed spectra have been corrupted by an additive white Gaussian noise with SNR = 30 dB. An experimental scenario with a lower SNR (i.e., SNR = 20 dB) is considered in the online Supplementary Materials.

Compared Methods
The following unmixing strategies have been compared with the proposed method.
(i) C-SUnSAL [67]-the constrained sparse unmixing by variable splitting and augmented Lagrangian (C-SUnSAL) can be considered as a state-of-the-art fully constrained least squares algorithm to conduct unmixing under ASC and ANC. The method has been included for comparison to assess the impact of neglecting spatial variability.
(ii) ELMM [19]-the extended linear mixing model (ELMM) models endmember variability as a scaling factor with a unique signature in each endmember class. It has been included in this comparison to evaluate the advantage of the conical hull to model pixel-wise variability.
(iii) MEMMs [28]-as for ELMM, the multiple endmember mixing model uses a single spectrum within each class. It can be interpreted as a less flexible counterpart of the proposed method since it also imposes sparsity for the selection of endmember classes.
(iv) EBE [15]-the automated endmember bundle extraction method (EBE) iteratively extracts endmember spectra from different subsets of randomly selected pixels and clusters them as endmember bundles. In this study, EBE has been complemented with fractional lasso (FL)-based unmixing [12] and MEMM, denoted as EBE-FL and EBE-MEMM, respectively, since they have shown to outperform other existing methods [12,28]. Note that the original implementation of EBE uses a k-means clustering. To provide fair comparison, the clustering step has been conducted while using H2NMF, as for the proposed method. EBE has been included for comparison to show the interest of considering conical hull to model pixel-wise variability.
The endmember spectra required by C-SUnSAL, ELMM, and MEMMs have been extracted by SVMAX [68]. The algorithmic parameters of ELMM, MEMMs, MEMM, FL, and HSNMF have been empirically adjusted for each simulation to reach the best result. Note that, in this study, only a single parameter controls the sparsity of endmember classes for MEMMs, MEMM, and HSNMF. For ELMM, the TV-based spatial regularization has only been considered for SIM2. For FL, an 1,q -norm with q = 0.1 has been considered to promote a higher level of sparsity when selecting the endmember classes. For EBE, the number of subsets has been chosen in {10, 20, . . . , 50} and the percentage of randomly selected pixels in each subset has been chosen in {20, 40, . . . , 100}. For HSNMF, the number S 1 of candidate prototypal endmembers has been determined in the set {50, 100, . . . , 200}, the optional 3rd level of sparsity has been omitted, avoiding to adjust S 3 and the parameter adjusting the abundance sparsity has been set to λ a = 0.1. A detailed analysis of the impact of these parameters is conducted in the online Supplementary Materials.

Performance Criterion
Four criteria are considered for quantitative validation of the methods. The performance in term of abundance estimation has been evaluated thanks to the root mean square error (RMSE) used. RMSE has been defined as where a kp andâ kp are the true and estimated abundance that are associated with the kth endmember class at the pth pixel. The reconstruction error (RE) has been used to quantify the model fitting where y p andŷ p are the actual and reconstructed spectrum of the pth pixel. The spectral angle mapper (SAM) averaged over the P pixels has been computed to evaluate the estimation of the pixel-wise endmember spectra where m kp andm kp are the reference and estimated pixel-wise endmember spectra of the kth class at the pth pixel. Note that C-SUnSAL uses fixed endmember spectra for all pixels. The Jaccard distance (JD) has been used in order to validate whether the compared methods can select a correct combination of endmember classes. It is defined as where S andŜ are the true and estimated support sets (i.e., indexes of nonzero values) of the pth pixel, |X | represents the total number of elements in the set X . Finally, the methods have been compared with respect to their computational times. Table 1 shows the performance with respect to the different criteria for data set SIM1. The performance of C-SUnSAL is worse than other methods, as expected because the method fails to incorporate endmember variability. ELMM performs poorly in terms of RMSE and JD, as compared with other methods that incorporate endmember variability. This can be explained by the fact that ELMM does not take advantage of the sparsity characterizing endmember classes. On the other hand, ELMM produces the lowest RE, since it exploits the higher degree of freedom by allowing for an overestimated number of endmember classes for unmixing each pixel. MEMMs performed better than ELMM and as well as the EBE-based methods in terms of RMSE and JD. The sparsity-promoting regularization used in MEMMs leads to more accurate estimates of abundances than ELMM. However, the estimated pixel-wise endmember spectra are much worse than the EBE-based methods and HSNMF. This shows that a simple scaling factor is not sufficient for accurately describing endmember variability. HSNMF performs better than the EBE-based methods in terms of RMSE, SAM, and JD, and produced comparable results in term of reconstruction. The results obtained on SIM1 show that the performance of HSNMF is comparable with the EBE-based methods when the number of endmember classes is large and endmember variability is relatively limited. The SIM2 data set is more challenging than SIM1, because a larger amount of endmember variability is present in each endmember class. In such a case, abundances estimated by simple nonnegativity and sum-to-one constraints (C-SUnSAL) show large errors, as shown in Table 2. ELMM and MEMMs perform better than C-SUnSAL and show that scaling endmember spectra is able to describe a large part of endmember variability. Unlike SIM1, EBE does not produce accurate endmember bundles. This leads to poor estimates of abundances and pixel-wise endmember spectra (large values of RMSE, RE, and SAM). HSNMF outperforms other compared methods. This shows that the assumption that endmember variability is well described by the conical hull when a large amount of endmember variability affects the observations. Finally, the computational times of all methods are discussed. C-SUnSAL is the fastest, at the price of a poor unmixing performance. HSNMF has an overall computationally complexity that was comparable to ELMM, but it is shown to be more expensive than other compared methods.

Experiments Using Real Data
In this experiments, HSNMF has been evaluated using two real hyperspectral images and compared with the methods considered in Section 5.2. These two real hyperspectral images are available online at http://dobigeon.perso.enseeiht.fr/data/HS/2020_HSNMF/datasets.zip.

MUESLI Image
The first hyperspectral image, referred to as MUESLI image, was acquired in June 2016 over the city of Saint-André, France. In the experiments that were reported in this paper, the image acquired by the VNIR sensor has been used, providing images with a spatial resolution of 1m and composed of L = 140 spectral bands after removing bands affected by noise. In the imaged scene, K = 4 endmember classes (i.e., tree, crop soil, bare soil and grass) are present. In the red-green-blue (RGB) composition that is depicted in Figure 6(left), the white and dark green homogeneous regions are identified as crop soil and tree, respectively. In these regions, only a single endmember class is more likely to be present. Light green and brown regions show grass and bare soil, respectively, with pure and mixed pixels. Abundance estimation can be qualitatively evaluated thanks to these simple homogeneous regions. For each endmember class, pixel-wise endmember variability is expected to be present because of illumination variations. This image is used to compare different strategies of endmember variability modeling.

AVIRIS Image
The second image, referred to as AVIRIS image, was acquired over Moffett Field, CA, USA, by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). It is composed of L = 178 spectral bands after noisy bands were removed. In the scene depicted in Figure 6(right) as a RGB image, K = 3 endmember classes (i.e., vegetation, soil, and water) can be identified. Each region shows a homogeneous pattern and the boundaries of the regions can be considered as mixed pixels. In addition, previous studies exhibit nonlinearly mixed pixels [69,70]. In such case, nonlinearly mixed pixels may be wrongly identified as prototypal endmembers and they are discarded in a preprocessing step (see Section 6.2). This image was used to test whether the proposed method can perform as well as the others even when facing to nonlinearly mixtures.

Initialization
Endmember spectra required by C-SUnSAL, ELMM, and MEMMs have been extracted by SVMAX and their parameters were empirically determined. Specifically, the maximum number of prototypal endmember spectra and the parameters required for EBE have been adjusted, so that the clustering step produce relevant bundles. The parameters controlling the abundance sparsity have been adjusted to produce abundance maps that are consistent with the prior knowledge obtained from the RGB images. Extracting and clustering endmember bundles or prototypal endmember spectra are challenging procedures when analyzing real images. In the MUESLI image, the spectra with low magnitudes have been removed, since they are likely identified as shadow. Similarly, in the AVIRIS image, the spectra with low magnitudes have been classified as "water" beforehand. With such pre-processing, correct clusters are obtained for each image. In addition, the robust principal component analysis (RPCA) [71] has been conducted in order to identify and remove nonlinearly mixed pixels from the AVIRIS image.

Results
For both MUESLI and AVIRIS images, the extracted endmember spectra, endmember bundles, prototypal endmember spectra, and pixel-wise endmember spectra exhibit similar patterns, as shown in Figures 7 and 8.The single set of endmember spectra corresponds to one of the instances of endmember bundles or prototypal endmembers. The endmember bundles extracted by EBE includes a larger number of endmember spectra for each class and a denser distribution than the prototypal endmember spectra. The overall endmember variabilities that are captured by EBE and the prototypal endmember spectra are similar, despite some noticeable differences. The pixel-wise endmember spectra estimated by HSNMF show much larger amounts of endmember variability than those captured by EBE and the prototypal endmember spectra. This showed that HSNMF is able to capture pixel-wise illumination changes and to generate a scaled endmember spectrum for each pixel from the prototypal endmember spectra. The abundance maps show clear differences between the compared methods for the MUESLI, as shown in Figure 9. C-SUnSAL is very sensitive to the illumination variations, especially in crop soil area, because it is unable to account for endmember variability. ELMM seems to be less sensitive to the illumination variations than C-SUnSAL. However, it tends to overestimate the number of endmember classes for each pixel. MEMMs produce sparser abundance maps than C-SUnSAL and ELMM, but it fails to estimate accurate abundances of tree. EBE-FL shows spatially homogeneous abundance maps and does not capture the mixed pixels composed of grass and bare soil. Although EBE-MEMM shows similar abundance maps than HSNMF, abundance maps of crop soil are "noisy". HSNMF estimates abundance maps that are consistent with the spatial distributions of the endmember classes in Figure 6 (left). This shows that the conical hull derived by the prototypal endmember spectra is able to provide accurate estimates of endmember variability. For the AVIRIS image, the overall spatial distributions in the abundance maps of all methods are similar, as depicted in Figure 10. However, the abundances estimated by C-SUnSAL, ELMM, and MEMMs underestimate the presence of vegetation and soil. This shows that a single endmember spectrum within each class could lead to underestimated abundances, especially for the pure pixels. On the other hand, the abundances produced by EBE-FL, EBE-MEMM and HSNMF are visually more consistent with the regions of vegetation and soil in the RGB image. Modeling pixel-wise endmember spectra by bundles or prototypal endmember spectra leads to more accurate abundances when compared to methods that are based on fixed endmember spectra.

Conclusions
This paper proposed a new paradigm for describing endmember variability when conducting hyperspectral unmixing. It assumed that the data simplex is described by the conical hulls of prototypal endmember spectra. Under this assumption, HSNMF introduced hierarchical sparsity and jointly conducted the extraction and clustering of prototypal endmember spectra within a single framework. As a result, HSNMF produced a reliable estimation of pixel-wise endmember spectra. The proposed strategy was tested thanks to experiments conducted on simulated and real data. HSNMF outperformed other methods for simulated data, especially when the amount of endmember variability was large. The results on real data also showed that HSNMF lead to results that are in agreement with those of state-of-the-art methods. This tended to demonstrate that the conical hull of prototypal endmember spectra was a useful concept to describe endmember variability. Future work includes the improvement of the initialization step (extraction and clustering of prototypal endmember spectra) to further reduce the computational burden of the method.