A Graph Regularized Multilinear Mixing Model for Nonlinear Hyperspectral Unmixing

Abstract: Spectral unmixing of hyperspectral images is an important issue in the field of remote sensing. Jointly exploring the spectral and spatial information embedded in the data helps to enhance the consistency between mixing/unmixing models and real scenarios. This paper proposes a graph regularized nonlinear unmixing method based on the recent multilinear mixing model (MLM). The MLM takes account of all orders of interactions between endmembers, and indicates the pixel-wise nonlinearity with a single probability parameter. By incorporating Laplacian graph regularizers, the proposed method exploits the underlying manifold structure of the pixels' spectra, in order to improve the estimation of both the abundances and the nonlinear probability parameters. Besides the spectrum-based regularizations, the sparsity of abundances is also incorporated into the proposed model. The resulting optimization problem is addressed by the alternating direction method of multipliers (ADMM), yielding the so-called graph regularized MLM (G-MLM) algorithm. To implement the proposed method on large hyperspectral images in the real world, we propose to utilize a superpixel construction approach before unmixing, and then apply G-MLM on each superpixel. The proposed methods achieve unmixing performance superior to state-of-the-art strategies, in terms of both abundances and probability parameters, on both synthetic and real datasets.


Introduction
Spectral unmixing (SU) has become one of the essential topics in the context of hyperspectral imagery processing. Hyperspectral images are captured by remote sensors over hundreds of channels within a certain wavelength range. Each such image can be viewed as a data cube, with two of its dimensions being spatial and the third one recording the spectral information, i.e., each pixel corresponds to a spectrum vector [1]. Compared to the abundant spectral information, the spatial information contained in the image is quite limited due to the low spatial resolution of sensors. Therefore, it is usually assumed that each observed spectrum is a mixture of several pure material signatures, termed endmembers. The aim of SU is to extract the endmembers and to determine their respective proportions, referred to as abundances [2]. The linear mixing model (LMM) is the most widely used one, which assumes that the spectral mixture occurs on a macroscopic scale. Namely, each photon interacts with only one endmember before being received by the sensor, and the light reflected from different endmembers is then mixed [3]. Therefore, each observed pixel can be represented as a linear combination of the endmembers [4]. To be physically interpretable, the abundance nonnegativity constraint (ANC) and the abundance sum-to-one constraint (ASC) are usually imposed. Graph-based regularization, however, can be computationally prohibitive, because a large graph is built across the entire image, where each node stands for a pixel and each edge connects two similar pixels. To alleviate this issue, the authors of [24] first applied the spectral clustering algorithm to divide the nodes in the graph into several subgraphs. In [28], the original image is preprocessed using SLIC-based (simple linear iterative clustering) superpixel construction, by which adjacent pixels with similar spectral features are grouped into small blocks, called superpixels. After these procedures, linear unmixing with graph-based regularization is performed on each subgraph or superpixel of smaller size.
The high-dimensional observed spectra can be regarded as being sampled from some sub-manifold [23]. Therefore, it is important to take the manifold structure of the data into account in the mixing/unmixing models, in order to improve the parameter estimation. To the best of our knowledge, manifold regularization has not yet been explored for simultaneously regularizing the abundance matrix and the probability parameters in MLM-based methods. In this paper, we propose a novel graph regularized nonlinear SU method based on the recent MLM, and derive the associated algorithm using ADMM. The resulting unmixing algorithm is referred to as the graph regularized MLM (G-MLM). The main contributions are as follows.
• By taking advantage of the graph Laplacian and sparse unmixing, the estimation of both the abundance matrix and the nonlinearity parameters is improved. In particular, the internal manifold structure of the data is well exploited. The intuition is that, if two pixels are spectrally similar, with a connection established in the graph, their low-rank representations by abundance vectors, as well as their nonlinearity parameters, should preserve this consistency.
• We apply the alternating direction method of multipliers (ADMM) [29] to solve the resulting optimization problem. Moreover, to reduce the algorithm complexity and improve efficiency, efforts have been made to avoid the large graph computation associated with the whole image. To this end, we exploit the superpixel construction method proposed in [28]. By dividing the original image into smaller adjacent superpixels and then performing the unmixing step on each of them, the computational efficiency of the proposed G-MLM algorithm is greatly improved without deteriorating the unmixing performance too much. For the purpose of differentiation, we refer to the graph regularized MLM with superpixel construction as G-MLM super.
The rest of this paper is organized as follows. Section 2 briefly reviews the MLM model and its current variations. In Section 3, the proposed unmixing model is presented, and the optimization problem is formulated. Experimental results on a series of synthetic data and a real image are reported in Section 4. Section 5 concludes the paper with remarks on future works.

Related Work
This section succinctly reviews the existing works related to the MLM unmixing model. Before proceeding, the main symbols used in this work are first introduced. Given a hyperspectral image, let X = [x_1, x_2, ..., x_T] ∈ R^{L×T} represent the observed data matrix consisting of T pixels over L spectral bands, where x_j ∈ R^L records the spectrum vector of the jth pixel, for j = 1, 2, ..., T. Suppose that the hyperspectral data are composed of R endmembers. Let A = [a_1, a_2, ..., a_R] ∈ R^{L×R} be the endmember matrix, with a_i ∈ R^L being the spectrum vector of the ith endmember. Let S = [s_1, s_2, ..., s_T] ∈ R^{R×T} denote the abundance matrix, where s_j ∈ R^R is the abundance vector of the jth pixel. The entry s_ij of the abundance matrix denotes the abundance of the ith endmember in the jth pixel. In particular, we use y_j = ∑_{i=1}^{R} s_ij a_i = A s_j to denote the linear (LMM) part of the jth observed pixel x_j, and Y = [y_1, y_2, ..., y_T] ∈ R^{L×T}. Moreover, we use P = [P_1, P_2, ..., P_T] ∈ R^{1×T} to denote a row vector recording the probability parameters of every pixel in the image, where P_j is a scalar indicating the probability of further interactions for the jth pixel.

MLM
The MLM in [17], a recently proposed nonlinear mixing model, successfully generalizes the bilinear model to include all degrees of multiple scattering among endmembers. This model is built on meaningful physical reasoning, upon the following basic assumptions:
• The incoming light will interact with at least one material.
• For pixel x_j, after each interaction with a material, the probability of undergoing further interactions is P_j, and, accordingly, the probability of escaping the scene and arriving at the sensor is (1 − P_j).
• The probability of interacting with material i is proportional to its abundance s_ij.
• The intensity of the light scattered by material i depends on the corresponding material's albedo w_i. As only small differences exist between albedo and reflectance in most practical situations [17], the albedo is substituted by the reflectance, i.e., w_i ≈ a_i for all i.
Based on these assumptions, the MLM model is formulated as
    x_j = (1 − P_j) y_j ⊘ (1_L − P_j y_j) + ε_j,    (1)
where y_j = ∑_{i=1}^{R} s_ij a_i = A s_j denotes the linear part given by the LMM, ε_j ∈ R^L is the additive Gaussian noise, ⊘ represents element-wise division and the product P_j y_j is taken element-wise. The resulting optimization problem is given by
    arg min_{s_j, P_j} ‖x_j − (1 − P_j) y_j ⊘ (1_L − P_j y_j)‖²,  s.t. s_j ≥ 0, 1_R^⊤ s_j = 1, P_j ≤ 1,    (2)
where 1_R ∈ R^R represents the vector of ones. Sequential quadratic programming is applied to solve the above optimization problem. Note that the MLM is a supervised unmixing model in the sense that the endmember matrix A is extracted by VCA [19] in advance. The pixel-wise parameter P_j indicates the probability of further interactions, where the particular case P_j = 0 reduces the model to the LMM. Of particular note is that the model in Equation (1) is also well defined for P_j < 0.
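As a minimal numerical sketch (written in Python for illustration, although the experiments in this paper were run in MATLAB), the MLM forward model of Equation (1), x = (1 − p) y ⊘ (1 − p y) with y = As, can be coded as follows; the function name mlm_mix is ours:

```python
import numpy as np

def mlm_mix(A, s, p):
    """MLM forward model of [17]: the linear part y = A s is combined
    with the probability p of further photon interactions, element-wise:
    x = (1 - p) * y / (1 - p * y)."""
    y = A @ s                      # linear (LMM) part
    return (1.0 - p) * y / (1.0 - p * y)
```

Setting p = 0 recovers the LMM pixel As, consistent with the remark above that P_j = 0 reduces the model to the LMM.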

Unsupervised MLM (MLMp)
In [18], an unsupervised unmixing method is developed based on the MLM. The optimization problem of the so-called MLMp is formulated as
    arg min_{A,S,P} ∑_{j=1}^{T} ‖x_j − y_j − P_j (x_j ⊙ y_j − y_j)‖²,  s.t. s_j ≥ 0, 1_R^⊤ s_j = 1, P_j ≤ 1, 0 ≤ A ≤ 1,    (3)
where ⊙ represents the element-wise product. Note that the constraint 0 ≤ A ≤ 1 is element-wise. Compared to MLM, MLMp has a simplified objective function without the denominator, obtained by multiplying both sides of Equation (1) by (1_L − P_j y_j). This helps to alleviate the under-estimation issue of the probability parameters in MLM to some extent, and also decreases the difficulty of solving the problem in Equation (3) by using a block coordinate descent (BCD) strategy. Another merit is that MLMp is an unsupervised method, which estimates the endmember matrix jointly with the abundances and the probability parameters.

BNLSU
More recently, the authors of [20] extended the MLM to supervised band-wise nonlinear unmixing (BNLSU). Taking account of the wavelength-dependent multiple scatterings, every pixel is supposed to have a specific probability vector recording the band-wise nonlinearities. Considering all the pixels of an image, a probability matrix P = [p_1, p_2, ..., p_T] ∈ R^{L×T} is introduced, where each vector p_j ∈ R^L, for j = 1, 2, ..., T, describes the probability vector of the jth pixel. In addition, both a sparsity constraint on the abundances and a smoothness constraint on the probability parameters are incorporated into the BNLSU model, yielding
    arg min_{S,P} ∑_{j=1}^{T} ‖x_j − y_j − p_j ⊙ (x_j ⊙ y_j − y_j)‖² + λ_1 ‖S‖_1 + λ_2 Tr(P^⊤ L P),  s.t. s_j ≥ 0, 1_R^⊤ s_j = 1, p_j ≤ 1,    (4)
where λ_1 and λ_2 are regularization parameters balancing the influences of the constraints, Tr(·) is the trace of a matrix, the inequality p_j ≤ 1 is element-wise, L = D − W is the graph Laplacian matrix and D is a diagonal matrix with entries D_ii = ∑_{j=1}^{L} W_ij. Here, the weight matrix W ∈ R^{L×L} is constructed by measuring the similarity between different pairs of bands across the image. In this work, the Laplacian graph regularizer is adopted to promote smoothness of the estimated probability matrix P. The optimization problem in Equation (4) is addressed by ADMM. Owing to an increasing number of probability parameters, the BNLSU model shows great effectiveness in reducing the reconstruction error at each band. However, BNLSU is very sensitive to noise due to the calculation of a large number of probability parameters.

Proposed Graph Regularized Multilinear Mixing Model
In this section, we propose a novel graph regularized multilinear mixing model for nonlinear unmixing problem. We first present the constraints utilized in the proposed model, and formulate the optimization problem. Next, we derive the algorithm for solving the resulting optimization problem by applying the ADMM.

Constraints and Problem Formulation
The abundance sparsity is widely considered in the context of the hyperspectral unmixing problem [30,31]. In these cases, the number of endmembers participating in the mixing process is small, leading to a sparse abundance matrix S containing many zero entries. Traditionally, the l_0 norm can be used to represent the number of nonzero entries in a matrix. Since the resulting optimization problems are hard, the l_1 norm is usually considered instead of the l_0 norm for sparsity regularization, with
    ‖S‖_1 = ∑_{i=1}^{R} ∑_{j=1}^{T} |s_ij|.    (5)
Besides the commonly-used sparsity constraint on the abundances, the proposed model mainly exploits the underlying manifold structure of the observed spectra, by jointly including manifold-based regularizations not only on the abundance vectors but also on the nonlinear probability parameters. Inspired by the work in [23,24], we adopt the Laplacian graph regularizer to promote similarities between similar pixels. To this end, the input image X ∈ R^{L×T} is firstly mapped to a graph G, where each node of the graph represents an L-dimensional pixel spectrum x_j, for j = 1, 2, ..., T. The affinity matrix W corresponding to the graph should satisfy the following requirement: its element W_ij reflects the similarity level between pixels x_i and x_j. Specifically, the value of W_ij is positively correlated with the similarity level, and decreases to zero for dissimilar pixel pairs. There are different manners of defining W, e.g., by using the heat kernel as in [23,24]. In this paper, we define the affinity matrix W of the graph simply by
    W_ij = 1 if ‖x_i − x_j‖² ≤ d²_min, and W_ij = 0 otherwise,    (6)
where d²_min denotes a pre-defined threshold for the squared distance between pixels [24]. It is natural to suppose that, if two pixels x_i and x_j are spectrally similar, their low-rank representations with respect to the abundance vectors s_i and s_j, and with respect to the nonlinear probability parameters P_i and P_j, should both retain this consistency in the respective new spaces.
Similar to [23], by using the affinity matrix W, the above assumption of abundance consistency between similar pixels can be modeled by the following regularization
    (1/2) ∑_{i=1}^{T} ∑_{j=1}^{T} W_ij ‖s_i − s_j‖² = Tr(S L S^⊤),    (7)
where D is a diagonal matrix with entries D_ii = ∑_{j=1}^{T} W_ij, and L = D − W is the graph Laplacian matrix. Analogously, based on the assumption that spectrally similar pixels tend to have nonlinear probability parameters close to each other, the regularization for the nonlinear probability parameters is expressed by
    (1/2) ∑_{i=1}^{T} ∑_{j=1}^{T} W_ij (P_i − P_j)² = P L P^⊤.    (8)
We propose a supervised unmixing method based on the MLM, where the endmembers are supposed to be extracted in advance using some endmember extraction strategy, e.g., VCA [19] or N-Findr [32]. The objective function of Equation (3) in MLMp [18] is adopted in this paper. After incorporating the constraints in Equations (5)-(8) discussed previously, the proposed unmixing model is defined by the following optimization problem
    arg min_{S,P} ∑_{j=1}^{T} ‖x_j − y_j − P_j (x_j ⊙ y_j − y_j)‖² + λ_1 ‖S‖_1 + λ_2 Tr(S L S^⊤) + λ_3 P L P^⊤,  s.t. s_j ≥ 0, 1_R^⊤ s_j = 1, P_j ≤ 1,    (9)
where λ_1, λ_2 and λ_3 are hyper-parameters that balance the importance of the different regularization terms.
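To make the graph construction and the Laplacian regularizer concrete, the following Python sketch (function names ours) builds a thresholded binary affinity matrix of the kind described in the text and checks numerically the standard identity Tr(S L S^⊤) = (1/2) ∑_ij W_ij ‖s_i − s_j‖²:

```python
import numpy as np

def binary_affinity(X, d2_min):
    """W_ij = 1 if ||x_i - x_j||^2 <= d2_min, else 0.
    X is L-by-T (one pixel spectrum per column)."""
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # squared distances
    return (d2 <= d2_min).astype(float)

def graph_laplacian(W):
    """L = D - W with D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W
```

Minimizing Tr(S L S^⊤) therefore pulls the abundance vectors of connected (spectrally similar) pixels toward each other, which is exactly the consistency assumption stated above.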

Nonlinear Unmixing Using ADMM
We utilize the well-known ADMM method [29] to address the optimization problem in Equation (9). The ADMM decomposes a hard optimization problem into a sequence of simpler subproblems, and conquers them in an alternating manner. To this end, we introduce new variables G and H for S and P, respectively, and reformulate the optimization problem in Equation (9) as
    arg min_{S,P,G,H} ∑_{j=1}^{T} ‖x_j − y_j − P_j (x_j ⊙ y_j − y_j)‖² + λ_2 Tr(S L S^⊤) + λ_3 P L P^⊤ + λ_1 ‖G‖_1 + ι_{R_+^{R×T}}(G) + ι_{{H | H ≤ 1_T^⊤}}(H),  s.t. 1_R^⊤ s_j = 1, S = G, P = H,    (10)
where ι_C(·) denotes the indicator function of the constraint set C. The corresponding augmented Lagrangian, written with scaled dual variables, is
    L_ρ(S, P, G, H, M_1, M_2) = ∑_{j=1}^{T} ‖x_j − y_j − P_j (x_j ⊙ y_j − y_j)‖² + λ_2 Tr(S L S^⊤) + λ_3 P L P^⊤ + λ_1 ‖G‖_1 + ι_{R_+^{R×T}}(G) + ι_{{H | H ≤ 1_T^⊤}}(H) + (ρ/2) ‖S − G + M_1‖_F² + (ρ/2) ‖P − H + M_2‖²,    (11)
where ρ is the penalty parameter, which is usually set to a small positive value, and M_1 and M_2 are the scaled dual variables, of the same size as S and P, respectively. At each iteration, the ADMM algorithm minimizes the augmented Lagrangian in Equation (11), by alternating the minimization over each of the variables. Namely, every variable is alternately updated while keeping the other variables fixed at their latest values. For each iteration, we start with the minimization with respect to S and G, and then with respect to P and H. Finally, we update the scaled dual variables M_1 and M_2. During the optimization, the primal residual
    res_p = ‖S − G‖_F + ‖P − H‖    (12)
and the dual residual
    res_d = ρ (‖G − G^0‖_F + ‖H − H^0‖)    (13)
are examined over the iterations, in order to check whether the stopping condition is attained, where G^0 and H^0 represent the values from the previous iteration.

Update S and G
At the (t + 1)th iteration, we update S to S^{(t+1)}. By discarding the terms irrelevant to S in Equation (11), the reduced optimization subproblem becomes
    arg min_S ∑_{j=1}^{T} ‖x_j − y_j − P_j (x_j ⊙ y_j − y_j)‖² + λ_2 Tr(S L S^⊤) + (ρ/2) ‖S − G + M_1‖_F²,  s.t. 1_R^⊤ s_j = 1.    (14)
Observing that x_j − y_j − P_j (x_j ⊙ y_j − y_j) = x_j − B_j s_j, we define B_j = ((1 − P_j) I_{L×L} + P_j diag(x_j)) A, for j = 1, 2, ..., T. Then, the problem in Equation (14) becomes
    arg min_S ∑_{j=1}^{T} ‖x_j − B_j s_j‖² + λ_2 Tr(S L S^⊤) + (ρ/2) ‖S − G + M_1‖_F²,  s.t. 1_R^⊤ s_j = 1,    (15)
or equivalently, in the following vector-wise form,
    arg min_{s_j} ‖x_j − B_j s_j‖² + λ_2 L_jj s_j^⊤ s_j + 2 λ_2 s_j^⊤ ∑_{k≠j} L_jk s_k + (ρ/2) ‖s_j − g_j + m_{1,j}‖²,  s.t. 1_R^⊤ s_j = 1,    (16)
for j = 1, 2, ..., T. Similar to [6], the solution of Equation (16) is derived by solving a quadratic problem with equality constraints, given by the Karush-Kuhn-Tucker linear system
    [2 B_j^⊤ B_j + (2 λ_2 L_jj + ρ) I_{R×R}, 1_R; 1_R^⊤, 0] [s_j; μ] = [2 B_j^⊤ x_j − 2 λ_2 ∑_{k≠j} L_jk s_k + ρ (g_j − m_{1,j}); 1],    (17)
where g_j and m_{1,j} denote the columns of G and M_1, respectively, and I_{R×R} is the identity matrix. Regarding the optimization in terms of G, the corresponding reduced subproblem is
    arg min_G λ_1 ‖G‖_1 + ι_{R_+^{R×T}}(G) + (ρ/2) ‖S^{(t+1)} − G + M_1‖_F².    (21)
If we ignore the ANC constraint enforced by ι_{R_+^{R×T}}(G), the solution of Equation (21) would be
    G = S_{λ_1/ρ}(S^{(t+1)} + M_1^{(t)}),    (22)
where S_{λ_1/ρ} is the soft-thresholding operator [29] defined by
    S_κ(v) = sign(v) max(|v| − κ, 0).    (23)
To further impose the ANC constraint, it is straightforward to project the result of Equation (22) onto the first quadrant according to [6], yielding
    G^{(t+1)} = max(0, S_{λ_1/ρ}(S^{(t+1)} + M_1^{(t)})),    (24)
where the maximum function is applied in an element-wise manner.
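The G-update described above combines soft thresholding with a projection onto the nonnegative orthant; a short Python sketch (helper names ours) is:

```python
import numpy as np

def soft_threshold(V, kappa):
    """Element-wise soft-thresholding operator: sign(v) * max(|v| - kappa, 0)."""
    return np.sign(V) * np.maximum(np.abs(V) - kappa, 0.0)

def update_G(S, M1, lam1, rho):
    """G-update: soft-threshold the ADMM target S + M1 with threshold
    lam1/rho, then project onto the nonnegative orthant to enforce the ANC."""
    return np.maximum(soft_threshold(S + M1, lam1 / rho), 0.0)
```

The projection commutes with the thresholding for nonnegative targets, which is why applying the element-wise maximum after soft thresholding suffices.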

Update P and H
By discarding the terms independent of P, the reduced optimization problem for P is expressed as
    arg min_P ∑_{j=1}^{T} ‖x_j − y_j − P_j (x_j ⊙ y_j − y_j)‖² + λ_3 P L P^⊤ + (ρ/2) ‖P − H + M_2‖².    (25)
Setting the gradient to zero yields, for each entry, the update rule
    P_j^{(t+1)} = (2 d_j^⊤ (x_j − y_j) − 2 λ_3 ∑_{k≠j} L_jk P_k + ρ (H_j − M_{2,j})) / (2 d_j^⊤ d_j + 2 λ_3 L_jj + ρ),  with d_j = x_j ⊙ y_j − y_j,    (26)
where Y^{(t+1)} = A S^{(t+1)} is the matrix recording the LMM terms as previously defined; in the matrix form of this update, both ./ and (·)² indicate element-wise operations.
In a similar way, the subproblem with respect to H is given by
    arg min_H ι_{{H | H ≤ 1_T^⊤}}(H) + (ρ/2) ‖P^{(t+1)} − H + M_2‖².    (27)
Without the boundary constraint ι_{{H | H ≤ 1_T^⊤}}(H) on the nonlinearity parameters, the solution of Equation (27) would be
    H = P^{(t+1)} + M_2^{(t)}.    (28)
By adding the boundary constraint, the update rule for H^{(t+1)} becomes
    H^{(t+1)} = min(1, P^{(t+1)} + M_2^{(t)}),    (29)
where the minimum function is element-wise.
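The P_j subproblem is a convex quadratic in the scalar P_j when the other pixels' parameters are held fixed, so its minimizer has a closed form. The sketch below is a hedged reconstruction (our own helper name and an assumed form of the per-pixel objective), obtained by setting the gradient of that scalar objective to zero:

```python
import numpy as np

def update_Pj(x, y, Ljj, cross, h, m, lam3, rho):
    """Closed-form minimizer of the assumed scalar P_j subproblem, with
    d = x*y - y and cross = sum_{k != j} L_jk * P_k (other pixels fixed):
      f(p) = ||(x - y) - p*d||^2 + lam3*(Ljj*p**2 + 2*p*cross)
             + (rho/2)*(p - h + m)**2."""
    d = x * y - y
    num = 2.0 * d @ (x - y) - 2.0 * lam3 * cross + rho * (h - m)
    den = 2.0 * d @ d + 2.0 * lam3 * Ljj + rho
    return num / den
```

Because d^⊤d ≥ 0, L_jj ≥ 0 and ρ > 0, the denominator is strictly positive, so the update is always well defined.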

Update M 1 and M 2
The last step of the ADMM consists of updating the scaled dual variables M_1 and M_2 in the following manner
    M_1^{(t+1)} = M_1^{(t)} + S^{(t+1)} − G^{(t+1)},    (30)
    M_2^{(t+1)} = M_2^{(t)} + P^{(t+1)} − H^{(t+1)},    (31)
where M_1 and M_2 accumulate the primal residuals, reflecting the compliance levels of S and P with their constraints, respectively. According to Boyd et al. [29], a good initialization is often beneficial to the convergence of ADMM. In this paper, we use the fully constrained least squares (FCLS) [5] to initialize the abundance matrix S^{(0)}, such that the result satisfies both the ANC and the ASC. On this basis, the nonlinearity parameter vector P^{(0)} is initialized according to Equation (26), where the boundary constraint P^{(0)} ≤ 1 is satisfied. G^{(0)} and H^{(0)} are set equal to S^{(0)} and P^{(0)}, and M_1 and M_2 are initialized by the zero matrix and the zero vector, respectively. The stopping condition of Algorithm 1 for G-MLM is two-fold: (1) the primal residual in Equation (12) and the dual residual in Equation (13) are smaller than the pre-defined tolerance values, namely res_p ≤ tol1 and res_d ≤ tol2; in the experiments, we set tol1 = tol2 = √(TR) × 10^{−4}, following [6]; and (2) the number of iterations does not exceed the preset maximum Iter_max.
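The dual updates and the residual-based stopping test can be summarized by the following Python sketch; the exact residual definitions used in the paper may differ slightly, so treat these as assumed forms:

```python
import numpy as np

def admm_step_duals(S, G, P, H, M1, M2, G0, H0, rho):
    """Scaled dual updates M1 <- M1 + (S - G), M2 <- M2 + (P - H),
    together with assumed primal/dual residuals for the stopping test."""
    M1 = M1 + (S - G)
    M2 = M2 + (P - H)
    res_p = np.linalg.norm(S - G) + np.linalg.norm(P - H)          # primal
    res_d = rho * (np.linalg.norm(G - G0) + np.linalg.norm(H - H0))  # dual
    return M1, M2, res_p, res_d
```

When S = G and P = H hold exactly, both residuals vanish and the duals stop changing, which is the fixed point the stopping condition detects.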
We analyze the computational complexity of the proposed G-MLM in Algorithm 1, by specifying the updating complexity of each variable, as shown in Table 1. In each iteration, the S-update and P-update are the most costly ones, with complexities of O(T²LR) and O(T²), respectively, where T stands for the total number of pixels. To alleviate the computational burden when tackling large images, we exploit a preprocessing approach before applying G-MLM.
• The first scheme applies the spectral clustering method [24] to divide the graph into k subgraphs, and performs G-MLM on each subgraph. We refer to the G-MLM algorithm combined with the spectral clustering method [24] as G-MLM sub. To balance the computational complexity and the global knowledge represented by the graph, the number of subgraphs k should be set conservatively, as indicated in [24]. As shown in the Experimental Section 4, although G-MLM sub can reduce the computational cost to some extent, it still requires a large amount of memory to establish the connections between all pixel pairs across the image.
• The second scheme is to apply a SLIC-based (simple linear iterative clustering) superpixel construction method [28], and to perform G-MLM on each of the superpixels. The superpixel construction in [28] consists of dividing the original image into m small and adjacent superpixels of irregular shapes, according to the spectral similarity of pixels. After that, the G-MLM unmixing method is directly performed on each superpixel, which is of much smaller size compared to the original image. In this paper, the resulting method is termed G-MLM super. As demonstrated next, G-MLM super is effective in reducing the computational time when addressing large images, without deteriorating the unmixing performance of G-MLM too much.

Experimental Results
In this section, the proposed G-MLM method, as well as G-MLM sub and G-MLM super, is compared with six state-of-the-art unmixing methods, on both synthetic and real hyperspectral datasets. The compared unmixing methods include both linear and nonlinear ones, namely C-SUnSAL [6], dedicated to the linear mixing model, the GBM [14] and PPNMM [16], elaborated for bilinear mixing models, and the aforementioned MLM [17], MLMp [18] and BNLSU [20], based on the multilinear mixing model. All algorithms were executed using MATLAB R2016a on a computer with an Intel Core i5-8250 CPU at 1.60 GHz and 8 GB RAM.
We adopted the following metrics to evaluate the unmixing performance of all the compared methods. For the synthetic datasets, as the ground-truth abundance matrix is available, the unmixing result was evaluated by the root-mean-square error (RMSE) of the abundances, defined by
    RMSE = √( (1/(TR)) ∑_{j=1}^{T} ‖s_j − ŝ_j‖² ),    (32)
where s_j and ŝ_j stand for the actual and estimated abundance vectors of the jth pixel, respectively. Concerning the real hyperspectral data, as the actual abundance matrix is unknown, the reconstruction error (RE) of the pixels was considered for assessing the unmixing performance, which is given by
    RE = √( (1/(TL)) ∑_{j=1}^{T} ‖x_j − x̂_j‖² ),    (33)
where x_j and x̂_j represent the observed spectrum and the reconstructed one for the jth pixel, respectively.
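Both metrics translate directly into code; the normalization by the total number of matrix entries is our assumption about the exact definitions of Equations (32) and (33):

```python
import numpy as np

def rmse(S_true, S_est):
    """Abundance RMSE over T pixels and R endmembers."""
    R, T = S_true.shape
    return np.sqrt(np.sum((S_true - S_est)**2) / (R * T))

def reconstruction_error(X, X_hat):
    """Pixel reconstruction error over T pixels and L bands."""
    L, T = X.shape
    return np.sqrt(np.sum((X - X_hat)**2) / (L * T))
```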

Experiments on Synthetic Data Cubes
Experiments were firstly conducted on two sets of synthetic data cubes, namely DC1 and DC2 [21,22,34], generated by the MLM [17] and the PPNMM [16], respectively. Each set contains three images, each of size 75 × 75 pixels. Five endmembers, [a_1, a_2, a_3, a_4, a_5], were used for data generation; they were randomly selected from the USGS library [35], each measured over 224 bands, as shown in Figure 1. The generated abundance maps corresponding to the five endmembers are illustrated in the first row of Figure 5.
There are 25 square regions in the image, each of size 5 × 5 pixels. Each square is homogeneous in terms of fractional abundances, namely the pixels within the same square share the same abundance vector. The five squares of the first row are pure regions, each composed of only one endmember. For example, the abundance vector for the square located at the first row and first column is s = [1 0 0 0 0]^⊤. The five squares of the second row are mixed by two of the endmembers in equal proportions. Taking the square located at the second row and first column as an example, the corresponding abundance vector is s = [0.5 0 0 0 0.5]^⊤. Furthermore, the five squares in the last row of the image are mixtures of the five endmembers with equal fractions; therefore, they are identical. Apart from the 25 squares, the remaining pixels in the image are all considered as background, and were generated using the abundance vector [0.1149 0.0741 0.2003 0.2055 0.4051]^⊤. Generally, every synthetic image was generated using the aforementioned endmembers and abundance maps, according to the different mixture models. The mixing models, as well as the associated parameters for generating DC1 and DC2, are detailed as follows:
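Under the stated layout, the per-square abundance vectors can be generated programmatically. The sketch below matches the examples given in the text for rows one, two and five; the exact endmember combinations used in rows three and four are not specified, so a cyclic choice is assumed here:

```python
import numpy as np

def square_abundances():
    """Abundance vectors for the 5x5 grid of squares: row r (r = 1..5)
    mixes r endmembers in equal proportions. The endmember subsets for
    rows 3 and 4 are assumed (cyclic), as the text does not specify them."""
    rows = []
    for r in range(1, 6):
        row = []
        for c in range(5):
            s = np.zeros(5)
            idx = [(c - k) % 5 for k in range(r)]  # assumed cyclic selection
            s[idx] = 1.0 / r
            row.append(s)
        rows.append(row)
    return rows
```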

DC1:
The pixels of the DC1 images are generated by the MLM in Equation (1). Specifically, the nonlinear probability parameter P is distributed as follows: the p values of the background pixels are set to zero, by assuming that the background is linearly mixed. Concerning the remaining 25 square regions, each one is assigned an identical p value, randomly drawn from a half-normal distribution with σ = 0.3; values of p larger than one are set to zero, as in [17]. It is also noteworthy that, since the five squares in the last row are identical, the same p value is assigned to all of them.
DC2:
The synthetic pixels of the DC2 images are generated based on the PPNMM [16], which is given by
    x_j = y_j + b_j (y_j ⊙ y_j) + ε_j,    (34)
where the nonlinear parameter b is generated similarly as in DC1: the assemblage of background pixels, as well as each of the remaining 25 squares, is assigned the same b, randomly drawn from a uniform distribution within the range [−0.3, 0.3], according to Altmann et al. [16].
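The noise-free pixel generation for DC1 and DC2 can be sketched as follows (Python for illustration; function names ours). DC1 draws p from a half-normal distribution with σ = 0.3 and resets values above one to zero, while DC2 draws b uniformly in [−0.3, 0.3]:

```python
import numpy as np

def dc1_pixel(A, s, rng, sigma=0.3):
    """DC1: MLM pixel with p ~ half-normal(sigma); p > 1 is reset to 0,
    as described in the text (after [17])."""
    p = abs(rng.normal(0.0, sigma))
    if p > 1.0:
        p = 0.0
    y = A @ s
    return (1.0 - p) * y / (1.0 - p * y), p

def dc2_pixel(A, s, rng):
    """DC2: PPNMM pixel x = y + b*(y ⊙ y), with b ~ Uniform(-0.3, 0.3) [16]."""
    b = rng.uniform(-0.3, 0.3)
    y = A @ s
    return y + b * y * y, b
```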
Finally, to test the robustness of the proposed G-MLM against noise, Gaussian noise of varying signal-to-noise ratio (SNR) was added, with SNR = 25, 30 and 35 dB, thus yielding three images for each set.

Experimental Setting
For all the compared methods, the unmixing procedure was performed in a supervised manner, where the ground-truth endmembers were utilized. Concerning the initialization of the abundance matrix, the result of FCLS was used in all methods. The parameters of the state-of-the-art methods C-SUnSAL, GBM, PPNMM and MLM were set according to their original papers. Specifically, a supervised version of the MLMp method was considered by simply fixing the endmembers in the algorithm, and the tolerance value was changed from 10^−4, as used in [18], to 10^−5, for the sake of fair comparison. Regarding BNLSU, the maximum iteration number was set to 500, the tolerance value was set to 10^−5, and the parameters were set as follows: the penalty parameter in ADMM was set as ρ = 0.05, and the regularization parameters in Equation (4) were set as λ_1 = 0.001 for the sparsity constraint and λ_2 = 0.3 for the smoothness of the nonlinearity matrix, following the analysis in [20].
As for the proposed G-MLM algorithm, the maximum iteration number was set to 500, the tolerance value was set to 10^−5, and the penalty parameter in ADMM was set as ρ = 0.05. Four regularization parameters needed to be tuned: d²_min, λ_1, λ_2 and λ_3. To keep the comparison fair and also to simplify the parameter tuning process, we fixed the sparsity regularization parameter as λ_1 = 0.001, the same value as in BNLSU.
The threshold parameter d²_min in Equation (6) determines the affinity matrix W computed from the input image, and should be chosen case by case. In essence, this parameter determines the underlying graph structure of the Laplacian regularizers on both the abundance matrix and the nonlinearity parameter vector; that is, only the pixels connected by nonzero elements of W mutually influence the parameter estimation. Following the spirit of the work in [36,37], we propose to estimate the parameter d²_min as a function of the reconstruction error, by using an empirical formula (Equation (35)), in which the parameter θ was chosen as θ = 400 throughout the experiments, and S_FCLS is the abundance matrix estimated by FCLS [5]. In this way, the threshold parameter d²_min was chosen as 0.28, 0.17 and 0.13 for the DC1 data with SNR = 25, 30 and 35 dB, respectively, and as 0.19, 0.08 and 0.05 for the DC2 data with SNR = 25, 30 and 35 dB, respectively. Of particular note is that a dataset with a lower SNR value generally requires a larger threshold value. This is reasonable, since stronger noise perturbs the measured similarity between pairs of spectra more severely.
We verified the appropriateness of the parameter d²_min given by Equation (35), taking the DC1 data at SNR = 30 dB as an example. For these data, the affinity matrix W obtained with d²_min = 0.17 is shown in Figure 2, where a re-arranged matrix is shown for illustration purposes, following Ammanouil et al. [24]. That is, the pixels of the five squares in the first row are placed first, then the pixels of the five squares in the second row, and so on, with the background pixels placed last. We observe that there are 20 small, 1 medium and 1 big white blocks in W. Each white block is a sub-matrix of W whose elements are all close to 1, signifying that the associated pixels are mutually similar. This is consistent with the actual data generation. To be precise, the 20 small blocks correspond to the 20 different squares of pixels in the first four rows, each containing 5 × 5 = 25 identical pixels; the medium white block corresponds to an assemblage of 25 × 5 = 125 identical pixels from the five squares in the last row; and the big white block represents the sub-affinity-matrix of all the remaining identical background pixels. As a result, the graph structure of these data is well revealed by the W obtained with d²_min = 0.17. In Figure 3, we also compare the affinity matrices W produced by d²_min = 0.20 and by d²_min = 0.28 given by Equation (35), respectively. It is obvious that, with d²_min = 0.28, the graph structure is better represented, the resulting matrix W being more consistent with the actual one. Based on the above discussion, the effectiveness of Equation (35) in estimating d²_min is verified to some extent. The remaining two regularization parameters to be tuned are λ_2 and λ_3, which determine the importance of the graph-based abundance and nonlinearity parameter regularization terms, respectively.
To further simplify the parameter tuning process, we set λ_3 = (1/2) λ_2 in this experiment, considering that the abundance matrix S has more dimensions than the nonlinearity parameter vector P. The parameter λ_2 was then selected from the candidate set {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5}, on the image with SNR = 30 dB of each set. The influence of λ_2 on the unmixing performance in terms of RMSE is plotted in Figure 4a.
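The grid search over λ_2 with λ_3 tied to λ_2/2 can be sketched generically; unmix and evaluate below are caller-supplied stand-ins for running G-MLM and computing the RMSE, not functions from the paper:

```python
import numpy as np

def select_lambda2(candidates, unmix, evaluate):
    """Pick the lambda2 that minimizes a validation metric, with
    lambda3 fixed to lambda2 / 2 as described in the text."""
    best, best_score = None, np.inf
    for lam2 in candidates:
        score = evaluate(unmix(lam2, 0.5 * lam2))
        if score < best_score:
            best, best_score = lam2, score
    return best, best_score
```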

Results Analysis
On images in DC1 and DC2, the unmixing performance was evaluated by the RMSE of abundance, as defined in Equation (32). The results obtained by applying all the comparing algorithms on DC1 and DC2 images at different noise levels are compared in Tables 2 and 3, respectively. As observed in Table 2 for DC1, for all the three noise levels, the proposed G-MLM always outperforms its counterparts, by yielding the smallest RMSE value. This result is not surprising, as the proposed G-MLM takes advantage of the graph information hidden in the dataset, which is more consistent with the real situation. Moreover, as the value of SNR increases, the value of RMSE decreases prominently for every method. Namely, high noise level with low SNR deteriorates the performances of all the unmixing strategies. Generally, the MLM-based methods provide better unmixing results than the methods based on linear or bilinear models, on DC1.
In Table 3 for DC2, as the pixels were generated by the PPNMM, it is not surprising to observe that the PPNMM unmixing method outperforms all the compared methods. The proposed G-MLM is able to provide good unmixing results that are second only to those of PPNMM, on the DC2 images with SNR = 25 dB and SNR = 30 dB. For the image with SNR = 35 dB, it is the BNLSU method that yields the second best RMSE. The estimated abundance maps obtained by the different methods are shown in Figure 5 for the DC1 image with SNR = 30 dB, and in Figure 6 for the DC2 image with SNR = 30 dB. It is not difficult to find that the abundance maps obtained by the proposed G-MLM are the closest to the ground truth, compared to all the other methods. For the DC1 image with SNR = 30 dB, the corresponding maps of P obtained by the MLM-based unmixing methods are shown in Figure 7, where the proposed G-MLM leads to the P map most consistent with the ground truth.
To study the convergence of the proposed ADMM algorithm for G-MLM, we traced the value of the objective function in Equation (9), the reconstruction error, the primal residual and the dual residual over the iterations, as given in Figure 8. It is observed that all these values drop rapidly during the first 50 iterations, and then decline nearly to zero at a slower rate. This demonstrates the good convergence of the proposed ADMM algorithm, according to [6,29]. Lastly, the execution times of the different methods are compared in Table 4. The original G-MLM is shown to be time-consuming. To alleviate this issue, we examine the G-MLM sub and G-MLM super algorithms. The comparison of these three methods is reported in Table 5, where the number of clusters in G-MLM sub is set as k = 10 and the number of superpixels in G-MLM super is set as m = 120. As observed, the G-MLM sub algorithm slightly reduces the computational time of G-MLM, almost without deteriorating the unmixing performance. The G-MLM super algorithm greatly reduces the computational time, while providing an acceptable RMSE that is still smaller than those of the compared methods in Table 2. It is noteworthy that the overriding merit of G-MLM super is its practicality for addressing large-scale images, especially in cases where both G-MLM and G-MLM sub may not be applicable.

Experiments on Urban Dataset
The Urban dataset is widely investigated in hyperspectral unmixing research [9,38]. The original image has 307 × 307 pixels, each corresponding to a 2 × 2 m² area. It is measured over 210 wavelengths ranging from 400 nm to 2500 nm. After removing the noisy bands (1-4, 76, 87, 101-111, 136-153, and 198-210) that are contaminated by water vapor and the atmosphere, the remaining 162 bands of high SNR were considered. This scene is known to be mainly composed of four materials: roof, tree, asphalt and grass. In our experiment, the four endmembers were first extracted by the VCA algorithm, as shown in Figure 9. Because the Urban image is of large scale, neither G-MLM nor G-MLM-sub can be directly applied on our computer; therefore, the G-MLM-super scheme was adopted. We now specify the selection of parameters in G-MLM-super for the Urban image. The original image was first divided into m = 120 superpixels, as shown in Figure 10. The value of d²_min in Equation (6) was estimated to be 0.02, according to Equation (35). As done for the synthetic images, the sparsity regularization parameter was set to λ1 = 0.001, and the relationship λ3 = λ2/2 was maintained. For the Urban image, the value of λ2 was selected from the set {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 3, 4, 5}. For illustration purposes, we plot the change of RE with λ2 in Figure 11. The RE value is relatively small and stable for small values of λ2, and starts to increase when λ2 exceeds 0.001. Therefore, we chose (λ2, λ3) = (0.001, 0.0005) for the Urban image. The execution times of all the compared methods are given in Table 6. A comparison of RE is also reported in the same table, where BNLSU provides a value much smaller than the other unmixing approaches.
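The superpixel preprocessing step can be sketched as follows. This is a toy, NumPy-only stand-in that mimics SLIC with a k-means clustering over joint (spatial, spectral) features; in practice a proper SLIC implementation such as skimage.segmentation.slic would be used, and the function name and compactness parameter here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def slic_like_superpixels(cube, n_segments=120, compactness=0.1, n_iter=10):
    """Toy SLIC-style segmentation: k-means over joint (spatial, spectral)
    features of a hyperspectral cube of shape (H, W, B)."""
    H, W, B = cube.shape
    yy, xx = np.mgrid[0:H, 0:W]
    # compactness trades spatial regularity against spectral similarity
    coords = np.stack([yy.ravel(), xx.ravel()], axis=1) * compactness
    X = np.hstack([coords, cube.reshape(-1, B)])
    # seed the cluster centers on a regular spatial grid
    step = max(1, int(np.sqrt(H * W / n_segments)))
    seeds = [i * W + j for i in range(step // 2, H, step)
                       for j in range(step // 2, W, step)]
    centers = X[seeds].copy()
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels.reshape(H, W)

labels = slic_like_superpixels(np.random.rand(20, 20, 5), n_segments=16)
print(labels.shape)  # (20, 20)
```

After segmentation, G-MLM would be run independently on the pixels of each superpixel, so the graph Laplacian is built per block rather than over the whole image.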
This is reasonable, as BNLSU defines a nonlinearity parameter in each band for a given pixel, thus yielding a model with many more free parameters to fit the observed spectra than the other methods. However, it should be noted that RE is only adopted as a reference for unmixing performance. It measures the average fitting level of a mixture model, e.g., LMM, GBM or MLM, to the observed spectra, but cannot truly reflect the unmixing quality in terms of abundance, especially for real hyperspectral images where the mechanism of mixture is complicated and unclear [10]. The abundance maps obtained by all the compared algorithms are shown in Figure 12; visually, they show little difference. The estimated maps of P obtained with G-MLM-super, MLM and MLMp are shown in Figure 13. All three maps correspond well with the areas of the image where nonlinear effects are expected to be significant, e.g., the intersection areas between roof and tree, and between grass and asphalt. As the graph structure of the data is considered in the proposed G-MLM-super, the corresponding abundance maps and P map exhibit more spatial consistency and smoothness than those of the other algorithms.

Conclusions
This paper proposes a graph regularized hyperspectral unmixing method based on the multilinear mixing model. By taking advantage of the graph structure embedded in the data, Laplacian graph regularizers are introduced to regularize both the abundance matrix and the nonlinear probability parameters. A sparsity constraint is also enforced on the abundance matrix. The resulting optimization problem is solved using the ADMM algorithm, and a superpixel construction scheme is applied to reduce the complexity. Comparative studies on two series of synthetic data and a real hyperspectral image verified the advantage of the proposed algorithm in terms of unmixing results. Future work includes jointly exploiting the graph and spatial structures to further augment the G-MLM model, in order to characterize abundances and nonlinear probability parameters more accurately. Extending the nonlinear parameters of the G-MLM model to a band-wise form also merits investigation.
Author Contributions: F.Z. and M.L. conceived the idea and developed the methodology; M.L. prepared the first draft; and A.J.X.G. and J.C. contributed to the synthetic data preparation and experimental analysis.