Hyperspectral Unmixing Based on Constrained Bilinear or Linear-Quadratic Matrix Factorization

Unsupervised hyperspectral unmixing methods aim to extract endmember spectra and infer the proportion of each of these spectra in each observed pixel when considering linear mixtures. However, the interaction between sunlight and the Earth’s surface is often very complex, so that observed spectra are then composed of nonlinear mixing terms. This nonlinearity is generally bilinear or linear quadratic. In this work, unsupervised hyperspectral unmixing methods, designed for the bilinear and linear-quadratic mixing models, are proposed. These methods are based on bilinear or linear-quadratic matrix factorization with non-negativity constraints. Two types of algorithms are considered. The first ones only use the projection of the gradient, and are therefore linked to an optimal manual choice of their learning rates, which remains the limitation of these algorithms. The second developed algorithms, which overcome the above drawback, employ multiplicative projective update rules with automatically chosen learning rates. In addition, the endmember proportions estimation, with three alternative approaches, constitutes another contribution of this work. Besides, the reduction of the number of manipulated variables in the optimization processes is also an originality of the proposed methods. Experiments based on realistic synthetic hyperspectral data, generated according to the two considered nonlinear mixing models, and also on two real hyperspectral images, are carried out to evaluate the performance of the proposed approaches. The obtained results show that the best proposed approaches yield a much better performance than various tested literature methods.


Introduction
Following advances in the signal and image processing fields, remote sensing hyperspectral imaging systems are now widely adopted for many Earth observation applications [1][2][3]. Hyperspectral imaging sensors, which have a high spectral resolution, collect narrow and contiguous spectral bands, at once, with wavelengths ranging from the visible spectrum to the infrared region [4]. However, due to the relatively low spatial resolution of these sensors, mixed pixels, characterized by mixed spectra of more than one pure material (also called endmember), may occur in collected data [5][6][7]. Such a case can prevent direct identification of endmembers and lead to inaccuracies in the quantification of the observed areas, and therefore, requires further processing to unmix these mixed spectra.
Unsupervised spectral unmixing (SU) is one of the most used techniques for processing hyperspectral data by achieving the separation of the observed mixed spectra [5][6][7]. These approaches, also known as blind source separation (BSS) techniques in the field of signal processing, typically aim at decomposing observed mixed spectra into a collection of

Motivations and Contributions
As mentioned above, all introduced methods in this work extend the BMF method proposed in [62]. Indeed, in that previous work, the focus was on the development of a new bilinear unmixing principle and an associated cost function that was optimized by using the standard Nelder-Mead algorithm. Here, the introduced approaches, which are developed for the considered mixing models, are based on original optimization algorithms using bilinear or linear-quadratic matrix factorization with non-negativity constraints. This already constitutes a first novelty of this work. Moreover, the work reported here substantially extends the approaches described in [60,61]. Indeed, the latter approaches only use the projection of the gradient and require a "manual" selection of their learning rates. Setting these rates to (nearly) optimal values remains the weak point of this type of algorithm. Here, a second type of algorithm is developed. These new algorithms, which overcome the above drawback, employ multiplicative projective update rules with "automatically" chosen learning rates. This also constitutes one of the novel aspects of this work. Furthermore, and as another novel aspect of this paper, the endmember abundance fractions estimation step, with three alternative approaches, is also considered in this work, whereas the methods described in [60,61] concern only the endmember spectra extraction step. Moreover, it should be noted that the reduction of the number of manipulated variables in the optimization processes constitutes another originality of all proposed methods, as compared to the one reported in [47].
Finally, experiments are carried out to evaluate the performance of the proposed approaches. These experiments are based on realistic synthetic hyperspectral data, generated according to the two considered bilinear and linear-quadratic mixing models, and also on two real hyperspectral images (which also represent an extension from the work reported in [60] and [61]). The obtained results are analyzed in a much more detailed way than in [60,61], and are compared to those obtained using linear [29,30,64] and bilinear or linear-quadratic [47,65] methods from the literature.

Data Models
As introduced and mentioned in [39,47,60,61], every observed spectral vector associated with a pixel in a hyperspectral image is here considered as a bilinear or linear-quadratic mixture of different endmember spectra contained in the considered image. Thus, mathe-Remote Sens. 2021, 13, 2132 4 of 29 matically, the non-negative reflectance spectrum x i (column vector of size L), from pixel i of the considered hyperspectral image, can be written in the following form for LQ mixtures: where s j (column vector of size L) is the non-negative reflectance spectrum of the endmember j ( corresponds to an element-wise multiplication, s j s l is here considered as a "pseudo-endmember" spectrum). The mentioned variables a j (i) and a j,l (i) respectively correspond to the linear and second-order abundance fractions. L and M respectively correspond to the number of spectral bands in the considered hyperspectral image and the number of endmembers contained in the imaged area. In order to better highlight the linear and second-order auto-and cross-terms of the mixing models considered in these investigations, the model presented in Equation (1) is rewritten as follows [61]: a j,l (i)s j s l + M ∑ j = 1 a j,j (i)s j s j (2) where ∑ M j = 1 a j (i)s j represents the linear terms, ∑ M−1 j = 1 ∑ M l = j+1 a j,l (i)s j s l represents the second-order cross-term (s j s l with j = l corresponds to a "cross-pseudo-endmember" spectrum), and ∑ M j = 1 a j,j (i)s j s j represents the second-order auto-terms (s j s j here corresponds to an "auto-pseudo-endmember" spectrum).
As mentioned above, and since the bilinear model is considered as a subset of the LQ one, (2) describes the general expression of the LQ model for a given observed non-negative reflectance spectrum x i of a pixel i. In the case when the second-order auto-terms are not considered (the last term in Equation (2)), this equation describes the expression of the bilinear model. For P pixels, with P ≥ 2 , the model Equation (2) can be written in matrix form as follows, by adapting [47]: with X = [x 1 . . . x P ] T (matrix of the observed pixel spectra, with dimension P × L), where [.] T corresponds to the matrix transpose, A is the linear and second-order (quadratic) abundance fraction matrix, and S is the matrix containing the endmember spectra, crosspseudo-endmember spectra, and auto-pseudo-endmember spectra, with: It is clear here that, in the proposed approaches, the location of the quadratic terms in the considered mixing models is imposed: the cross-pseudo-endmember spectra are contained in the matrix S b and the auto-pseudo-endmember spectra are contained in the matrix S c . These two types of quadratic spectra are calculated by using the M spectra from the linear part of S by means of an element-by-element multiplication operation. In the bilinear mixing model (i.e., when the second-order auto-terms are not taken into account), the matrices A c and S c do not appear in Equation (3).

Proposed Methods
In these investigations, the designed hyperspectral unmixing methods aim towards modeling the mixing function defined by Equation (3). The variables involved in the considered unmixing methods consist of two matrices A and S, which respectively aim at estimating A and S. The rows of the matrix S are used to decompose the row vectors of the matrix X, while the matrix A contains the linear and second-order abundance fraction coefficients of this decomposition. Moreover, it should be noted here that the matrix S is constrained as described in Equations (8)- (11), whereas only the top M rows of S, which contain the estimated endmember spectra, are free; all the rows that follow are elementwise products of the above master M rows [62], making them slave rows. The proposed approaches minimize the following cost function: where . F denotes the Frobenius norm. As explained above, since only the elements of the first M row vectors of the matrix S are considered as master variables, they are freely tuned, while all slave subsequent row vectors of this matrix are updated by using element-wise products together with the above top M row vectors. Furthermore, the matrix A is considered as a slave variable and it is defined by its optimum least square solution, which minimizes the cost function J 1 for a considered value of S (assumed to have full row rank). Thus, the matrix A is predetermined as follows: where (.) −1 denotes the matrix inverse. Replacing the matrix A by its optimal value A opt in Equation (12), the cost function to be optimized becomes: In the latter equation, A and the considered cost function J 2 are defined by a closedform expression, which permits the calculation of the gradient expression of J 2 with respect to the master part of matrix S. After deriving this expression for each mixing model, this gradient is used in the endmember spectra extraction step of the proposed methods. To simplify the calculation of the gradient expression of the considered cost function, this function is rewritten by using standard matrix and Frobenius norm properties as: where Tr(.) denotes the matrix trace. By considering the case when the matrix S has more columns than rows, S T S S T −1 is replaced by S + (the Moore-Penrose pseudo-inverse matrix of S) in (15). This yields the following new expression for J 2 that is used, hereafter, in the endmember spectra extraction step: As detailed in [60,61], using Equation (16), the gradient expression of J 2 with respect to an element s ml of row m among the M master rows of the matrix S can be expressed as: The above expression contains two derivative terms. The first one is: where I denotes the identity matrix with the appropriate dimension. The second derivative term is: Thus, the gradient expression of J 2 with respect to s ml is given as follows: Consequently, the final form of Equation (20) can be calculated by deriving the expression of ∂ S ∂ s ml , which depends on the considered mixing model, among the bilinear [60] and LQ [61] ones. For both models, the same M × M symmetric matrix B, whose main diagonal is unused and whose upper part is organized as follows, is introduced. Each element [B] rt , corresponding to position (r, t) with t > r in B, concerns the cross-pseudo-endmember s r s t and is equal to the index of the row, within S b , which contains that cross-pseudoendmember. Due to the structure Equation (9) of that matrix, these values [B] rt , stored from left to right and top to bottom in the upper part of B, are integers in increasing order, that is 1 to (M − 1) on the first row, M to (2M − 3) on the second row, and so on. These values can be expressed as follows (for t > r):  in that matrix is equal to: The first designed approach, in the endmember spectra extraction step, uses the projected-gradient descent algorithm, with a fixed positive scalar learning rate α. Thus, for this first approach, two algorithms are proposed for the considered mixing models. The first algorithm, called Grd-NS-LS-BMF for "gradient-based non-negative spectra least squares bilinear matrix factorization" [60], is designed for the bilinear mixing model. The second algorithm, which is designed for the LQ mixing model, is called Grd-NS-LS-LQMF for "gradient-based non-negative spectra least squares linear-quadratic matrix factorization" [61]. Therefore, for both gradient-based algorithms, the final form Equation (20) yields, for the master elements s ml of the top M rows of S, the following preliminary iterative update rule: This update rule does not ensure non-negativity and, therefore, it is not sufficient. To guarantee this constraint, an iterative projected-gradient update rule, derived from the one above, is considered. This rule consists of projecting the value obtained from Equation (24) onto the non-negative real number subspace R + . This projection, denoted [ξ] + , can be achieved by replacing ξ, if it is negative, by zero, or in practice, by a small positive number ε, in order to avoid numerical instabilities. Thus, the projection becomes [ξ] + = max[ε, ξ], and the final iterative projected-gradient update rule reads: It is important to mention here that, unlike with the update rule (24), there is no theoretical convergence guarantee with the above final iterative projected-gradient update rule Equation (25). However, this rule Equation (25), like those used in standard NMF methods, minimizes, practically and globally, the considered cost function J 2 throughout iterations. For the second designed approach, still in the endmember spectra extraction step, and unlike the work done in [60,61], an iterative, multiplicative and projective update rule derived from the gradient-based update rule Equation (24) is proposed in the present paper for the two considered mixing models. To this end, this approach first uses the procedure which has been proposed in the literature for developing "standard" (i.e., nonprojective) multiplicative versions of various algorithms. This procedure is, e.g., detailed in [3] for another type of algorithm, and may be transposed as follows to the present context. First, the above fixed scalar learning rate α is replaced by a matrix, whose terms α ml are used as learning rates, respectively, for each of the considered adaptive scalar variables s ml . Then, the gradient expression of J 2 with respect to s ml is rewritten as the difference of two functions such that: where the function ∂J 2 ∂ s ml + is obtained by keeping the terms of (20) preceded by a plus sign, is obtained by keeping the terms preceded by a minus sign. Each learning rate α ml is then set to: Thus, the update rule Equation (24) becomes: where ε is a very small and positive value that is added to the denominator of the above multiplicative update rule to prevent possible division by zero. For the methods reported in the literature, this procedure is relevant because the counterparts of ∂J 2 ∂ s ml + and ∂J 2 ∂ s ml − in those methods are non-negative (since they are elements of products and sums of non-negative matrices), so that the counterparts of the learning rates α ml and the new value assigned to the counterparts of s ml , as in the right-hand term of Equation (28), are also non-negative, provided all these counterparts of s ml are initialized to non-negative values. In contrast, when applying this general procedure here, the expressions of ∂J 2 ∂ s ml + and ∂J 2 ∂ s ml − contain one matrix which is not necessarily non-negative, namely the Moore-Penrose pseudo-inverse matrix S + , which appears in Equation (20). Therefore, to take advantage of the preliminary, purely multiplicative rule Equation (28) (25), and then using these completely projected quantities in Equation (28), instead of the original ones.
Other versions are derived by analyzing the structure of the expressions ∂J 2 ∂ s ml + and ∂J 2 ∂ s ml − so as to separately project some of their terms that may be negative, thus using "partly projected functions". The first approach based on such partial projections (denoted [.] pp1 hereafter) operates as follows. Equation (20) shows that ∂J 2 ∂ s ml + and ∂J 2 ∂ s ml − are matrix traces. One, therefore, modifies these derivatives by first projecting each of the considered matrix elements onto R + . This yields (taking into account that X S + S T = S + S X T ): The rule Equation (28) is then replaced by: The corresponding multiplicative projective adaptation rule reads: Here also, and as mentioned above, there is no theoretical convergence guarantee with the above-defined completely/partly projected multiplicative adaptation rules. Nevertheless, these rules, like those used in standard multiplicative NMF algorithms, also practically and globally optimize the used cost function J 2 throughout iterations.
In the conducted tests (described hereafter), the rule Equation (31) resulted in better performance than the rule Equation (34), as well as the first version, based on completely projected functions. Therefore, hereafter, only the rule Equation (31) is considered. Thus, for this multiplicative approach (which is also projective; this is implicit in the names of the methods below), and for the considered bilinear and LQ mixing models, two algorithms are also proposed. The first one, called Multi-NS-LS-BMF for "multiplicative non-negative spectra least squares bilinear matrix factorization", is designed for the bilinear mixing model. The second algorithm, proposed for the LQ mixing model, is called Multi-NS-LS-LQMF for "multiplicative non-negative spectra least squares linear-quadratic matrix factorization".
Moreover, in all of the above four proposed algorithms, the slave variables of S are updated together with the master ones. When considering the bilinear mixing model, cross-pseudo-endmember slave elements s kl are updated as follows: When considering the LQ mixing model, and in addition to updating cross-pseudoendmember slave elements s kl by using the update rule (35), auto-pseudo-endmember slave elements s nl are also updated as follows: In this endmember spectra extraction step, the master variables (i.e., the hyperspectral endmember spectra) may be initialized by one of the standard linear methods. Indeed, although these methods are designed for linear mixtures, their estimation results may be considered as first approximations to be injected as an initialization of nonlinear methods. After a number of tests (described below in the experimental results section) by using three techniques in this step, the linear VCA method [29] is chosen for initializing the master variables. Moreover, still in this endmember spectra extraction step, the slave variables (i.e., the hyperspectral cross/auto-pseudo-endmember spectra) are initialized from the initial master variables by using only Equation (35) when considering the bilinear mixing model, or Equations (35) and (36) when considering the linear-quadratic one.
The adaptation of all master and slave variables is stopped when the number of iterations reaches a predefined maximum value, or when the relative modification of the criterion J 2 takes a value below a predefined threshold as follows: where t corresponds to an iteration. Furthermore, and as mentioned in Section 2, in the present paper and contrary to the works reported in [60,61], the estimation of endmember abundance fractions is also considered. Indeed, in this work, three alternative approaches are proposed for estimating the endmember abundance fractions. In the first approach, the matrix A opt , defined by Equation (13) and containing linear and second-order abundance fractions, is constrained and used to obtain the considered coefficients (the corresponding four complete unmixing methods are, therefore, called "Grd-NS-LS-BMF + constrained A opt ", and so on). The constraints, defined hereafter, imposed on this matrix are those related to the non-negativity of variables obtained by using the projection approach, the sum-to-one property of linear coefficients, and the upper bounding of second-order coefficients, as defined in Equation (1). These constraints respectively read: The second approach for estimating endmember linear and second-order abundance fractions consists of running a modified version of the iterative multiplicative LQNMF (Multi-LQNMF) method of [47], restricted to the bilinear mixing model when this one is considered, or fully used when the LQ mixing model is considered. This technique, which jointly estimates spectra and abundance fractions, is initialized by (i) the endmember and pseudo-endmember spectra contained in the matrix S, obtained by using one of the above four proposed algorithms, and (ii) the constrained A opt matrix obtained by using Equations (13), (38) and (39). Unlike in [47], only linear and second-order abundance fractions are here updated with this Multi-LQNMF algorithm that also contains the constraints defined by (39) and (40), whereas endmember and pseudo-endmember spectra are not updated here. The corresponding four complete unmixing methods, hereafter called "Grd-NS-LS-BMF + post-Multi-LQNMF1" and so on, therefore yield exactly the same estimated spectra as the associated above-defined methods "Grd-NS-LS-BMF + constrained A opt " and so on, but they result in different estimated abundance fractions.
The third and last approach, which leads to the four methods called "Grd-NS-LS-BMF + post-Multi-LQNMF2" and so on, is similar to the second one, but it jointly updates spectra and abundance fractions using the iterative Multi-LQNMF algorithm, again restricted to the bilinear mixing model when this model is considered, or fully used when the LQ mixing model is considered.
In the above second and third approaches, the adaptation of the considered variables is stopped when the number of iterations reaches a predefined maximum value.
The complete pseudo-code of the proposed algorithms is provided below.
Pseudo-code: hyperspectral unmixing methods based on constrained bilinear or linear-quadratic matrix factorization.
Input: hyperspectral image X.

2.
Abundance fractions estimation step 2.1. Initialization stage: initialize linear and second-order abundance fractions by using Equation (13), and considering the above extracted endmember spectra.

2.2.
Optimization stage (until convergence): update only abundance fractions by using Equations (38)-(40) for the first approach, or update only abundance fractions by using the Multi-LQNMF algorithm for the second approach, or jointly update endmember spectra and abundance fractions by using the Multi-LQNMF algorithm.
Output: endmember and cross/auto-pseudo endmember spectra, and their associated linear and second-order abundance fractions.

Tested Data
In this section, used realistic synthetic and real hyperspectral data are described. These data are used, hereafter, to evaluate the performance of the proposed approaches, and obtained results are then compared to those obtained by other techniques from the literature.

Synthetic Data
For realistic synthetic hyperspectral data, two sets of eight hyperspectral endmember spectra are selected from spectral libraries, with 184 spectral bands in the 0.4 to 2.5 µm region. The first set contains eight randomly selected spectra (Figure 1) from the spectral library compiled by the United States Geological Survey (USGS) [66]. The second set contains eight spectra (Figure 2) of materials used in urban areas, selected from the spectral library compiled by the Johns Hopkins University (JHU) [67].  The above spectra are used to create two realistic synthetic hyperspectral images. These two 100 × 100 pixel hyperspectral images are generated according to the considered mixing models; the first one is created by considering the bilinear mixing model, whereas the second image is generated by using the LQ model. The considered linear abundance fractions are created from a real classification of land cover, with eight classes, by averaging pixel classification values on a nonoverlapping sliding 4 × 4 pixel window. The second-order abundance fractions are generated from the linear ones by using the Fan model [41]. Besides, it is useful to mention here that the maximum purity of the considered linear abundance fractions does not exceed 0.75 (i.e., without the presence of pure pixels for each endmember), which makes, from the outset, the used synthetic data realistic, with more complex configurations than those with the presence of pure pixels for each endmember.

Real Data
In these investigations, two real hyperspectral images are also considered. The first one consists of the "Samson" data of [68]. This 95 × 95 pixel hyperspectral image ( Figure  3a) contains 156 spectral bands ranging from 0.4 to 0.9 μm. Moreover, this first image is provided with the corresponding linear abundance fraction ground-truth, which is used to manually extract, from supposedly pure pixels, three reference endmember spectra (soil, tree, and water: Figure 3b). The second real hyperspectral image consists of the "urban" data of [68]. This image, which is one of the most widely used hyperspectral data sets, was acquired by the HYperspectral Digital Imagery Collection Experiment (HYDICE) sensor over the Copperas Cove, near Fort Hood, Texas, USA. This 307 × 307 pixel hyperspectral image, with 2 m spatial resolution, contains 162 spectral bands (after removing the water absorption bands from the 210 spectral bands of the original data) ranging from 0.4 to 2.5 μm. This image (Figure 4a), which is provided with the corresponding linear abundance fraction ground-truth, mainly contains four pure materials: asphalt, grass, tree, and roof. By considering the provided ground-truth, the reference endmember spectra (Figure 4b) are manually extracted.  The above spectra are used to create two realistic synthetic hyperspectral images. These two 100 × 100 pixel hyperspectral images are generated according to the considered mixing models; the first one is created by considering the bilinear mixing model, whereas the second image is generated by using the LQ model. The considered linear abundance fractions are created from a real classification of land cover, with eight classes, by averaging pixel classification values on a nonoverlapping sliding 4 × 4 pixel window. The second-order abundance fractions are generated from the linear ones by using the Fan model [41]. Besides, it is useful to mention here that the maximum purity of the considered linear abundance fractions does not exceed 0.75 (i.e., without the presence of pure pixels for each endmember), which makes, from the outset, the used synthetic data realistic, with more complex configurations than those with the presence of pure pixels for each endmember.

Real Data
In these investigations, two real hyperspectral images are also considered. The first one consists of the "Samson" data of [68]. This 95 × 95 pixel hyperspectral image ( Figure  3a) contains 156 spectral bands ranging from 0.4 to 0.9 μm. Moreover, this first image is provided with the corresponding linear abundance fraction ground-truth, which is used to manually extract, from supposedly pure pixels, three reference endmember spectra (soil, tree, and water: Figure 3b). The second real hyperspectral image consists of the "urban" data of [68]. This image, which is one of the most widely used hyperspectral data sets, was acquired by the HYperspectral Digital Imagery Collection Experiment (HYDICE) sensor over the Copperas Cove, near Fort Hood, Texas, USA. This 307 × 307 pixel hyperspectral image, with 2 m spatial resolution, contains 162 spectral bands (after removing the water absorption bands from the 210 spectral bands of the original data) ranging from 0.4 to 2.5 μm. This image (Figure 4a), which is provided with the corresponding linear abundance fraction ground-truth, mainly contains four pure materials: asphalt, grass, tree, and roof. By considering the provided ground-truth, the reference endmember spectra (Figure 4b) are manually extracted. The above spectra are used to create two realistic synthetic hyperspectral images. These two 100 × 100 pixel hyperspectral images are generated according to the considered mixing models; the first one is created by considering the bilinear mixing model, whereas the second image is generated by using the LQ model. The considered linear abundance fractions are created from a real classification of land cover, with eight classes, by averaging pixel classification values on a nonoverlapping sliding 4 × 4 pixel window. The secondorder abundance fractions are generated from the linear ones by using the Fan model [41]. Besides, it is useful to mention here that the maximum purity of the considered linear abundance fractions does not exceed 0.75 (i.e., without the presence of pure pixels for each endmember), which makes, from the outset, the used synthetic data realistic, with more complex configurations than those with the presence of pure pixels for each endmember.

Real Data
In these investigations, two real hyperspectral images are also considered. The first one consists of the "Samson" data of [68]. This 95 × 95 pixel hyperspectral image (Figure 3a) contains 156 spectral bands ranging from 0.4 to 0.9 µm. Moreover, this first image is provided with the corresponding linear abundance fraction ground-truth, which is used to manually extract, from supposedly pure pixels, three reference endmember spectra (soil, tree, and water: Figure 3b). The second real hyperspectral image consists of the "urban" data of [68]. This image, which is one of the most widely used hyperspectral data sets, was acquired by the HYperspectral Digital Imagery Collection Experiment (HYDICE) sensor over the Copperas Cove, near Fort Hood, Texas, USA. This 307 × 307 pixel hyperspectral image, with 2 m spatial resolution, contains 162 spectral bands (after removing the water absorption bands from the 210 spectral bands of the original data) ranging from 0.4 to 2.5 µm. This image (Figure 4a), which is provided with the corresponding linear abundance fraction ground-truth, mainly contains four pure materials: asphalt, grass, tree, and roof. By considering the provided ground-truth, the reference endmember spectra (Figure 4b  The four reference endmember spectra of the "urban" data.

Performance Evaluation Criteria
In order to evaluate the performance of the tested methods, the following criteria are considered. The spectral angle mapper (SAM), the spectral normalized mean square error (NMSE ), and the spectral information divergence (SID) [69] are used as spectral performance criteria. The spatial normalized mean square error (NMSE ) is used as a spatial performance criterion only by considering the linear abundance fractions. A smaller value of these criteria indicates a better endmember spectra or abundance fraction estimation. These criteria read:  The four reference endmember spectra of the "urban" data.

Performance Evaluation Criteria
In order to evaluate the performance of the tested methods, the following criteria are considered. The spectral angle mapper (SAM), the spectral normalized mean square error (NMSE ), and the spectral information divergence (SID) [69] are used as spectral performance criteria. The spatial normalized mean square error (NMSE ) is used as a spatial performance criterion only by considering the linear abundance fractions. A smaller value of these criteria indicates a better endmember spectra or abundance fraction estimation. These criteria read: The four reference endmember spectra of the "urban" data.

Performance Evaluation Criteria
In order to evaluate the performance of the tested methods, the following criteria are considered. The spectral angle mapper (SAM), the spectral normalized mean square error (NMSE λ ), and the spectral information divergence (SID) [69] are used as spectral performance criteria. The spatial normalized mean square error (NMSE s ) is used as a spatial performance criterion only by considering the linear abundance fractions. A smaller value of these criteria indicates a better endmember spectra or abundance fraction estimation. These criteria read: where s j is the original spectrum of the endmember j and s j is its estimate, . denotes the vector norm, log (.) corresponds to the natural logarithm, stands for element-wise division, and a j is the vector composed of the original linear abundance fractions of the endmember j in all image pixels and a j is its estimate.

Results
The proposed methods, and methods from the literature are applied to the considered realistic synthetic and real hyperspectral data. The considered literature methods belong to two groups. The first one contains linear spectral unmixing methods: the NMF and the Lin-Ext-NMF (linear-extended non-negative matrix factorization) [47] techniques. It should be noted here that, for the NMF method, only the endmember spectra and linear abundance fractions are considered, while, for the Lin-Ext-NMF technique, in addition to the endmember spectra and linear abundance fractions, the pseudo-endmember spectra (resp. second-order abundance fractions) are considered as additional endmember spectra (resp. additional linear abundance fractions). The VCA [29] and the SGA [4,[30][31][32] combined with the fully constrained least squares (FCLS) [64] methods, which belong to this first group, are also considered in the conducted tests. The second group contains spectral unmixing methods based on bilinear or linear-quadratic mixing models. These methods are: the BiPSO (bilinear spectral unmixing using particle swarm optimization) technique [65], the Grad-LQNMF (gradient-based linear-quadratic non-negative matrix factorization), and the Multi-LQNMF (multiplicative linear-quadratic non-negative matrix factorization) algorithms [47] (restricted to the bilinear case when the considered mixing model is bilinear).
In all tested iterative algorithms, a maximum of 1000 iterations are considered in the endmember spectra extraction step, as well as in the abundance fraction estimation step. The convergence tolerance threshold, defined in Equation (37), is set to 10 −6 . In addition, in order to determine the optimal value of the learning rate α for all tested gradient-based algorithms, many tests are conducted and α is empirically fixed to 10 −3 for all of these algorithms in the results reported below. It should be remembered here that this empirically optimal choice of the learning rate value constitutes, as mentioned above, the main limitation of these gradient-based methods.

Initialization and Partial Projection Choices
The proposed methods have limitations, as well as all tested NMF-based and the BiPSO literature methods; they are not guaranteed to provide a unique solution, and their convergence points depend on their initialization. Therefore, in order to avoid the random initialization of the considered variables, and as mentioned above, three standard linear techniques are tested, only on the considered realistic synthetic datasets, to select the most suitable one to derive the initial estimated hyperspectral endmember spectra. These standard linear techniques are: the VCA [29] and the SGA [4,[30][31][32] techniques, combined with the FCLS [64] algorithm, and the NICA [16] technique. Furthermore, in order to minimize the random effect of these initialization techniques, 10 runs are performed for each considered dataset and for each considered standard linear initialization technique, and the mean values of the used performance evaluation criteria are reported in the following results tables.
Based on the results of Tables 1 and 2, it clearly appears that VCA is the most suitable standard linear initialization technique that can be used in order to derive initial estimated hyperspectral endmember spectra. Thus, this technique is considered to initialize all proposed methods; all tested NMF-based and the BiPSO literature techniques in the tests reported below. Moreover, for the proposed multiplicative projective methods, as explained above, two partly projected functions ([.] pp1 and [.] pp2 ) are introduced to take advantage of the multiplicative rule Equation (28) and to ensure the non-negativity constraint. In order to choose the most adequate partial projected function to be considered in these proposed methods, experiments are conducted, again only on the described realistic synthetic datasets, by using these multiplicative projective methods, and by only considering the first approach for estimating abundance fractions (i.e., methods called "Multi-NS-LS-BMF/LQMF + constrained A opt "). From the obtained results (Tables 3 and 4), it clearly appears that the first partly projected function (noted [.] pp1 ) is the most adequate one that can be considered in the proposed multiplicative projective methods. Consequently, this function is the one considered hereafter in these proposed methods. Table 3. Mean values of the considered performance criteria for the synthetic data generated according to the bilinear mixing model depending on the partly projected function (bold values correspond to best performances).

SAM ( • ) NMSE λ (%) SID NMSE s (%)
With the first set of spectra  Table 4. Mean values of the considered performance criteria for the synthetic data generated according to the LQ mixing model depending on the partly projected function (bold values correspond to best performances).

Results on Synthetic Data
Tables 5-8 show the mean values of the considered performance criteria obtained for the realistic synthetic data created according to the bilinear and linear-quadratic mixing models (after performing 10 runs for each considered dataset). Table 5. Mean values of the considered performance criteria for the synthetic data generated according to the bilinear mixing model and with the first set of spectra (bold values correspond to best performances).  Table 6. Mean values of the considered performance criteria for the synthetic data generated according to the bilinear mixing model and with the second set of spectra (bold values correspond to best performances).  Table 7. Mean values of the considered performance criteria for the synthetic data generated according to the LQ mixing model and with the first set of spectra (bold values correspond to best performances).  Table 8. Mean values of the considered performance criteria for the synthetic data generated according to the LQ mixing model and with the second set of spectra (bold values correspond to best performances). Furthermore, in order to study the effect of noise on the performance of the proposed methods, in particular on their endmember spectra extraction step (by only considering the proposed methods that use "constrained A opt "), and as an illustration, white Gaussian noise with different signal-to-noise ratio (SNR) values, ranging from 15 to 45 decibels (dB) with a step of 5 dB, is added to the created synthetic data generated according to the LQ mixing model and with the urban material spectra. This synthetic dataset is chosen because it is the one which represents the most complete configuration, from the point of view of the mixing model, and it is the most realistic one for LQ mixtures since it considers urban material spectra. The next figure (Figure 5) shows the obtained mean values of the considered performance criteria on the tested synthetic data.

SAM ( • ) NMSE λ (%) SID NMSE s (%)
From these results, it clearly appears that the considered proposed methods remain, overall, robust to the considered noise (with different SNR values), and also offer, globally, better performances than those provided by the tested approaches from the literature.  Figure 5. Mean values of the considered performance criteria, for the proposed methods that use "constrained ", and for the tested synthetic data, including white Gaussian noise with different SNR values.

Results on Real Data
The test results obtained for the two considered real hyperspectral images are provided in Tables 9 and 10 (after performing 10 runs for each considered dataset). Table 9. Mean values of the considered performance criteria for the real Samson data (bold values correspond to best performances).   . Mean values of the considered performance criteria, for the proposed methods that use "constrained A opt ", and for the tested synthetic data, including white Gaussian noise with different SNR values.

Results on Real Data
The test results obtained for the two considered real hyperspectral images are provided in Tables 9 and 10 (after performing 10 runs for each considered dataset). Table 9. Mean values of the considered performance criteria for the real Samson data (bold values correspond to best performances).

SAM ( • ) NMSE λ (%) SID NMSE s (%)
Grd-NS-LS-BMF + constrained A opt 4.65 16 Table 10. Mean values of the considered performance criteria for the real urban data (bold values correspond to best performances). As an illustration, Figures 6 and 7 show, for real data, the reference endmember spectra (ground truth) and their estimates derived by the proposed four methods that use "constrained A opt " and by the considered methods from the literature. Figures 8 and 9 then show, also as an illustration, the ground-truth linear abundance fraction maps and their estimates provided by the proposed Grd-NS-LS-BMF + constrained A opt method, and also those provided by the tested Multi-LQNMF literature one.

Grd-NS-LS-BMF + constrained
As an illustration, Figures 6 and 7 show, for real data, the reference endmember spectra (ground truth) and their estimates derived by the proposed four methods that use "constrained " and by the considered methods from the literature. Figures 8 and 9 then show, also as an illustration, the ground-truth linear abundance fraction maps and their estimates provided by the proposed Grd-NS-LS-BMF + constrained method, and also those provided by the tested Multi-LQNMF literature one.

Discussion
For the synthetic data generated according to the bilinear mixing model, Tables 5 and  6 first yield the following conclusions with respect to the estimated endmember spectra. Among the proposed methods, the four methods that use "constrained " (i.e., "Grd-NS-LS-BMF + constrained " and so on), and hence the four methods that use "post-Multi-LQNMF1" (i.e., "Grd-NS-LS-BMF + post-Multi-LQNMF1" and so on), yield a relatively similar performance. Indeed, for the spectra used in Table 5, the SAM values of these methods are between 7.63 and 7.87°, their NMSE values are between 17.37 and 17.69%, and their SID values are between 3.36 and 3.44. For the spectra used in Table 6, their SAM values are between 3.55 and 3.68°, their NMSE values are between 14.95 and 15.06%, and their SID values are between 0.86 and 0.87. These methods are, therefore, more attractive than the four methods that use "post-Multi-LQNMF2" (i.e., "Grd-NS-LS-BMF + post-Multi-LQNMF2" and so on), since the latter methods yield significantly wider ranges for the considered performance criteria values and are thus "less predictable". Indeed, for the spectra used in Table 5, their SAM values are between 4.11 and 15.40°, their NMSE values are between 8.92 and 33.35%, and their SID values are between 1. 18 and 21.19. For the spectra used in Table 6, their SAM values are between 2.51 and 11.05°, their NMSE values are between 9.72 and 46.80%, and their SID values are between 0.31 and 9.78. Similarly, the eight proposed methods that use "constrained " or "post-Multi-

Discussion
For the synthetic data generated according to the bilinear mixing model, Tables 5 and 6 first yield the following conclusions with respect to the estimated endmember spectra. Among the proposed methods, the four methods that use "constrained A opt " (i.e., "Grd-NS-LS-BMF + constrained A opt " and so on), and hence the four methods that use "post-Multi-LQNMF1" (i.e., "Grd-NS-LS-BMF + post-Multi-LQNMF1" and so on), yield a relatively similar performance. Indeed, for the spectra used in Table 5, the SAM values of these methods are between 7.63 and 7.87 • , their NMSE λ values are between 17.37 and 17.69%, and their SID values are between 3.36 and 3.44. For the spectra used in Table 6, their SAM values are between 3.55 and 3.68 • , their NMSE λ values are between 14.95 and 15.06%, and their SID values are between 0.86 and 0.87. These methods are, therefore, more attractive than the four methods that use "post-Multi-LQNMF2" (i.e., "Grd-NS-LS-BMF + post-Multi-LQNMF2" and so on), since the latter methods yield significantly wider ranges for the considered performance criteria values and are thus "less predictable". Indeed, for the spectra used in Table 5 Table 6, their SAM values are between 2.51 and 11.05 • , their NMSE λ values are between 9.72 and 46.80%, and their SID values are between 0.31 and 9.78. Similarly, the eight proposed methods that use "constrained A opt " or "post-Multi-LQNMF1" are more attractive than the methods from the literature, since the considered performance criteria values achieved by the latter methods are higher and cover a wide range. Indeed, for these literature methods, in Table 5 Table 6, the SAM values are between 6.09 and 13.48 • , the NMSE λ values are between 17.14 and 48.41%, and the SID values are between 1.07 and 16.19. Considering, in addition, the accuracy of abundance estimation and focusing on the above-defined eight preferred proposed methods, the subset of four gradient-based methods yields much lower NMSE s values (from 30.89 to 35.98% in Table 5, and from 38.37 to 39.60% in Table 6) than the four multiplicative methods (NMSE s values from 75.75 to 90.43% in Table 5, and from 68.54 to 90.69% in Table 6). Although the errors of these gradient-based methods are non-negligible, they are much better (i.e., lower, and with lower spread) than those of the methods from the literature, which yield NMSES values ranging from 39.76 to 107.55% in Table 5, and from 42.99 to 108.10% in Table 6.
For the synthetic data generated according to the LQ mixing model, Tables 7 and 8 yield the same general conclusions as above, based on the following values. The eight proposed methods that use "constrained A opt " or "post-Multi-LQNMF1" result in SAM values ranging from 5.37 to 5.83 • in Table 7, and from 4.72 to 5.10 • in Table 8. Moreover, these methods result in NMSE λ values ranging from 33.70 to 34.44% in Table 7, and from 13.21 to 13.79% in Table 8. Furthermore, these methods result in SID values ranging from 9.93 to 10.15 in Table 7, and from 2.52 to 2.62 in Table 8. For the four methods that use "post-Multi-LQNMF2", the SAM ranges from 3.63 to 12.01 • in Table 7, and from 2.99 to 9.18 • in Table 8. The NMSE λ ranges from 17.74 to 30.94% in Table 7, and from 8.42 to 25.00% in Table 8. Moreover, the SID ranges from 4.48 to 20.66 in Table 7, and from 1.08 to 9.33 in Table 8. For the methods from the literature, the SAM ranges from 5.30 to 15.80 • in Table 7, and from 7.01 to 12.12 • in Table 8. Their NMSE λ ranges from 25.24 to 60.41% in Table 7, and from 18.69 to 59.72% in Table 8. Finally, their SID ranges from 9.03 to 65.01 in Table 7, and from 4.62 to 46.62 in Table 8. Similarly, the ranges of NMSE s values respectively for Tables 7 and 8 Tables 7 and 8), but it is eventually not of interest as compared with the best methods proposed in this paper, because it yields very poor performance for real data, as shown further in this paper.
For the first tested real hyperspectral image, Table 9 yields the following results. The eight proposed methods that use "constrained A opt " or "post-  23.74 and 28.41%), but they are globally not of interest as compared with the best methods proposed in this work, because they yield very poor performance for synthetic data (see Tables 5-8) and they are, therefore, not predictable enough. This first real image consequently yields the same global remarks as the considered synthetic data, even though some of these trends are less pronounced here. Especially, the considered performance criteria values obtained with the eight preferred methods, among those proposed in this work, have larger ranges here than for the considered synthetic data.
The second tested real hyperspectral image yields the following values (see Table 10). For the eight proposed methods that use "constrained A opt " or "post-Multi-LQNMF1", the SAM values are between 6.34 and 18.06 • , the NMSE λ values are between 24.43 and 67.00%, and the SID ranges from 2.00 and 14.08. For the four methods that use "post-Multi-LQNMF2", the SAM criterion ranges from 14.90 to 19.83 • , the NMSE λ ranges from 45.43 to 64.12%, and SID values are between 11.87 and 24.12. For the methods from the literature, the SAM ranges from 10.96 to 49.34 • , the NMSE λ ranges from 36.63 to 171.24%, and the SID ranges from 4.56 to 91.61, the worst performances being, globally, obtained with the BiPSO method. Similarly, the ranges of NMSE s values are (40.21%, 58.84%) for the four selected gradient-based methods, (31.42%, 76.05%) for the four selected multiplicative methods, and (43.03%, 121.97%) for the methods from the literature, except Multi-LQNMF. For this second tested real image, the Multi-LQNMF method yields a rather good NMSE s value (30.40%), but it is globally not of interest when compared with the best methods proposed in this paper, because it yields very poor performance for synthetic data, as shown above (Tables 5-8). This second tested real image, therefore, also tends to yield the same global conclusions as with synthetic data, although some of these trends here are not very significant. In particular, the used performance criteria values obtained with the eight preferred methods, among those proposed in this paper, exhibit larger spreads than for synthetic data. Figures 6 and 7 show that the mentioned proposed methods, unlike the tested literature approaches, correctly extract most of endmember spectra, with estimates fairly close to the reference ones. From Figures 8 and 9, it also clearly appears that the mentioned proposed method correctly estimates the considered abundance fraction maps.
All of the above results clearly show that the proposed gradient-based methods globally provide better performances than those resulting from the proposed multiplicative projective ones, and also from those obtained by the tested literature approaches. However, once again, it should be remembered that these gradient-based methods only provide such results after an optimal choice of their learning rates, which are tedious to obtain, since it is necessary to perform several tests to find these optimal parameter values. In contrast, the proposed multiplicative projective methods are free from this constraint, and they provide satisfactory results that are, moreover, globally better than those of the tested literature methods. They are, therefore, an attractive alternative to gradient-based methods when automated operation is an important feature.
The proposed methods are very easy to implement and their update rules contain only direct and simple scalar/matrix operations. These operations make the algorithmic complexity of the proposed methods very limited. Besides, the run time of the core of the proposed methods depends on the input image size, the number of endmembers in the observed scene, the maximum number of iterations, and the stopping criterion Equation (37); based on the above description of tests by considering, as an illustration, the used synthetic data, the run time of the proposed methods (by using an Intel(R) Core(TM) i7 processor running at 2.40 GHz and a memory capacity of 8 GB) is about, for each run, 10 (respectively 40) seconds for the gradient-based (respectively multiplicative) methods that use "constrained A opt ", and 20 (respectively 50) seconds for the gradient-based (respectively multiplicative) methods that use "post-Multi-LQNMF".

Conclusions
In this paper, various unsupervised hyperspectral unmixing approaches, based on extensions of matrix factorization with non-negativity constraints and targeted at bilinear or linear-quadratic mixing models, are introduced. For each of these models, firstly, two approaches respectively based on projected gradient descent and multiplicative algorithms are proposed for extracting hyperspectral endmember spectra. The reduction of the number of variables manipulated in these algorithms constitutes the main originality of the proposed approaches. Moreover, three methods are then proposed to extend the above algorithms so as to estimate the proportions, i.e., abundance fractions, of the endmem-bers and to possibly refine the endmember spectra estimates. This eventually results in 12 versions of the proposed overall unmixing methods.
Experiments, based on realistic synthetic hyperspectral data, generated according to the two considered bilinear and linear-quadratic mixing models, and also on two real hyperspectral images, were conducted with the above 12 proposed methods, and with seven methods from the literature. The efficiency of all these methods was evaluated with established performance criteria. The obtained results show that the preferred proposed methods yield very satisfactory results, especially for hyperspectral endmember spectra extraction. Moreover, the preferred proposed algorithms significantly outperform the tested methods from the literature.
An interesting extension of this work may consist of using the proposed approaches in the unmixing-based hypersharpening process. Indeed, the proposed approaches may be used to extract spectral information from a high spectral resolution observable remote sensing image and spatial information from a high spatial resolution one that covers the same considered area. Then, the extracted high-resolution information will be merged, according to the considered bilinear or linear-quadratic mixing model, to form an unobservable remote sensing image with high spectral and spatial resolutions.