Maximum Likelihood Estimation Based Nonnegative Matrix Factorization for Hyperspectral Unmixing

Hyperspectral unmixing (HU) is a research hotspot of hyperspectral remote sensing technology. As a classical HU method, the nonnegative matrix factorization (NMF) unmixing method can decompose an observed hyperspectral data matrix into the product of two nonnegative matrices, i.e., endmember and abundance matrices. Because the objective function of NMF is the traditional least-squares function, NMF is sensitive to noise. In order to improve the robustness of NMF, this paper proposes a maximum likelihood estimation (MLE) based NMF model (MLENMF) for unmixing of hyperspectral images (HSIs), which substitutes the least-squares objective function in traditional NMF by a robust MLE-based loss function. Experimental results on a simulated and two widely used real hyperspectral data sets demonstrate the superiority of our MLENMF over existing NMF methods.


Introduction
A hyperspectral image (HSI) can be represented as a three-dimensional data cube, containing both spectral and spatial information that characterizes the radiation properties, spatial distribution and geometric characteristics of ground objects [1,2]. Compared with panchromatic, RGB and multispectral images that have only several broad bands, an HSI usually has hundreds of spectral bands. The rich spectral information of HSI can be used to discriminate subtle differences between similar ground objects, which makes HSI suitable for different applications, such as target recognition, mineral detection, and precision agriculture [1-3]. Due to the scattering of the ground surface and the low spatial resolution of the hyperspectral sensor, an observed HSI pixel is often a mixture of multiple ground materials [4-6]. This is the so-called "mixed pixel". The presence of mixed pixels seriously affects the application of HSIs. To address the problem of mixed pixels, hyperspectral unmixing (HU) techniques have been developed [4-8]. HU aims to decompose a mixed spectrum into a collection of pure spectra (endmembers) while also providing the corresponding fractions (abundances). In terms of the spectral mixture mechanism, HU algorithms can be roughly categorized into linear and nonlinear ones [4,5]. Although, in general, the nonlinear mixing assumption represents most real cases better, the linear mixing assumption (although more simplified) has proved to work very satisfactorily in many practical cases. Taking into account its mathematical tractability, it has attracted significant attention from the scientific community. For these reasons, the linear mixture model is adopted in the present paper, in which a measured spectrum is represented as a linear combination of several endmembers.
Nonnegative matrix factorization (NMF) is a widely used linear HU method [9-20]. In this framework, HU is regarded as a blind source separation problem, which decomposes an observed HSI matrix into the product of a pure-pixel matrix (endmember matrix) and the corresponding proportion matrix (abundance matrix). Respecting the physical constraints, nonnegativity constraints on the endmembers and abundances, and an abundance sum-to-one constraint (ASC), are imposed. The NMF algorithm has the advantages of intuitiveness and interpretability. However, due to the existence of a large number of unknown dependent variables, the solution space of the NMF model is too large. To restrict its solution space, many NMF variants have been proposed by adding constraints on the abundance or endmember [10-16]. Miao et al. incorporated a volume constraint on the endmembers into the NMF formulation and proposed a minimum volume constrained NMF (MVC-NMF) model [10], which can perform unsupervised endmember extraction from highly mixed image data without the pure-pixel assumption. Jia et al. introduced two constraints into the NMF [11], i.e., piecewise smoothness of spectral data and sparseness of abundance fractions. Similarly, two constraints on abundance (i.e., an abundance separation constraint and an abundance smoothness constraint) were added into the NMF [12]. Qian et al. imposed an l_{1/2}-norm-based sparse constraint on the abundance and proposed an l_{1/2}-NMF unmixing model [13]. Lu et al. considered the manifold structure of HSI and incorporated manifold regularization into the l_{1/2}-NMF [14]. Wang et al. added an endmember dissimilarity constraint into the NMF [15].
Although the aforementioned NMF methods improved the classical NMF unmixing model to a certain extent, they ignored the effect of noise. As the objective function of NMF is the least-squares loss, NMF is sensitive to noise and the corresponding unmixing results are usually inaccurate and unstable. To suppress the effect of noise and improve the robustness of the model, many robust NMF methods have been proposed [17-20]. He et al. proposed a sparsity-regularized robust NMF by adding a sparse matrix into the linear mixture model to model the sparse noise [17]. Du et al. introduced the robust correntropy-induced metric (CIM) and proposed a CIM-based NMF (CIM-NMF) model, which can effectively deal with non-Gaussian noise [18]. Wang et al. proposed a robust correntropy-based NMF model (CENMF) [19], which contains a correntropy-based loss function and an l_1-norm sparse constraint on the abundance. Based on Huber's M-estimator, Huang et al. constructed l_{2,1}-norm and l_{1,2}-norm based loss functions to obtain a new robust NMF model [20,21]. Defining the l_{2,1}-norm (l_{1,2}-norm) based loss function actually assumes that the column-wise (row-wise) approximation residual follows a Laplacian (Gaussian) distribution from the viewpoint of maximum likelihood estimation (MLE). However, in practice this assumption may not hold well, especially when the HSI contains complex mixed noise, such as impulse noise, stripes, deadlines, and other types of noise [22,23].
Inspired by robust regression theory [23,24], we design the approximation residual as an MLE-like estimator and propose a robust MLE-based l_{1/2}-NMF model (MLENMF) for HU. It replaces the least-squares loss in the original NMF by a robust MLE-based loss, which is a function (associated with the distribution of the approximation residuals) of the approximation residuals [24]. The proposed MLENMF can be converted to a weighted l_{1/2}-NMF model and can be solved by a re-weighted multiplicative update iteration algorithm [9,13]. By choosing an appropriate weight function, MLENMF automatically assigns small weights to bands with large residuals, which effectively reduces the effect of noisy bands and improves the unmixing accuracy. Experimental results on simulated and real hyperspectral data sets show the superiority of MLENMF over existing NMF methods.
The rest of the paper is organized as follows. Section 2 introduces the NMF and l_{1/2}-NMF models. Section 3 describes the proposed MLENMF method. The experimental results and analysis are provided in Section 4. Section 5 discusses the effect of the parameters in the algorithm. Finally, Section 6 concludes the paper.

NMF Unmixing Model
Under the linear spectral mixing mechanism, an observed spectrum h ∈ R^{M×1} can be represented linearly by the endmembers z_1, …, z_P [4,10-13]:

h = Zs + ε, (1)

where Z = [z_1, …, z_P] ∈ R^{M×P} represents the endmember matrix, s ∈ R^{P×1} is the coefficient (abundance) vector, and ε is the residual. Applying the above linear mixing model (1) to all hyperspectral pixels h_1, …, h_N, the following matrix representation can be obtained:

H = ZS + E, (2)

where H = [h_1, …, h_N] ∈ R^{M×N} and S = [s_1, …, s_N] ∈ R^{P×N} are the nonnegative hyperspectral data matrix and abundance matrix, respectively, and E ∈ R^{M×N} is the residual matrix.
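As a quick, hypothetical illustration of the linear mixing model in Equations (1) and (2), the following NumPy sketch synthesizes a data matrix H = ZS + E with nonnegative abundances obeying the sum-to-one constraint; all dimensions and the noise level are illustrative choices, not values from the paper:

```python
import numpy as np

# Illustrative dimensions: M bands, P endmembers, N pixels (not the paper's data).
rng = np.random.default_rng(0)
M, P, N = 162, 4, 1000

Z = rng.uniform(0.0, 1.0, size=(M, P))   # endmember matrix, nonnegative

# Abundance matrix: nonnegative columns that satisfy the sum-to-one constraint (ASC).
S = rng.uniform(0.0, 1.0, size=(P, N))
S /= S.sum(axis=0, keepdims=True)

E = 0.01 * rng.standard_normal((M, N))   # residual matrix
H = Z @ S + E                            # observed hyperspectral data matrix
```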
In Equation (2), to make the decomposition as accurate as possible, the residual should be minimized. Then, an NMF unmixing model can be obtained by considering the nonnegative property of the endmember and abundance matrices:

min_{Z,S} ‖H − ZS‖_F^2, s.t. Z ≥ 0, S ≥ 0, (3)

where ‖·‖_F denotes the Frobenius norm, and Z ≥ 0 means that each element of Z is nonnegative. As each column of the abundance matrix S records the proportions of the endmembers in representing a pixel, the columns of S (each one corresponding to a pixel) should satisfy the sum-to-one constraint, i.e., ∑_{p=1}^P S_pn = 1, n = 1, …, N. The above NMF Model (3) can be easily solved by the multiplicative update algorithm [9,13]. However, its solution space is very large [13]. To restrict the solution space, an l_{1/2} constraint can be added to the abundance matrix S, and an l_{1/2}-NMF model can be obtained as [13]:

min_{Z,S} ‖H − ZS‖_F^2 + λ‖S‖_{1/2}, s.t. Z ≥ 0, S ≥ 0, (4)

where λ is a regularization parameter and ‖S‖_{1/2} = ∑_{p,n} S_pn^{1/2} is the l_{1/2} regularizer [13]. As proved in Refs. [13,25], the l_{1/2} regularizer is a good choice for enforcing sparsity in hyperspectral unmixing because the sparsity of the l_q (1/2 ≤ q < 1) solution increases as q decreases, whereas the sparsity of the solution for l_q (0 < q ≤ 1/2) shows little change with respect to q. Meanwhile, the sparsity represented by l_{1/2} also enforces the volume of the simplex to be minimized [13].
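A minimal sketch of the multiplicative updates for the l_{1/2}-NMF model (4), assuming the standard update rules of Qian et al. [13]; the regularization value, iteration count, and the per-iteration renormalization used here to approximate the ASC are illustrative simplifications, not the paper's exact implementation:

```python
import numpy as np

def l12_nmf(H, P, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Sketch of l1/2-NMF via multiplicative updates (illustrative settings)."""
    rng = np.random.default_rng(seed)
    M, N = H.shape
    Z = rng.uniform(size=(M, P))
    S = rng.uniform(size=(P, N))
    S /= S.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # Endmember update: Z <- Z .* (H S^T) ./ (Z S S^T)
        Z *= (H @ S.T) / (Z @ S @ S.T + eps)
        # Abundance update with l1/2 term: S <- S .* (Z^T H) ./ (Z^T Z S + (lam/2) S^{-1/2})
        S *= (Z.T @ H) / (Z.T @ Z @ S + 0.5 * lam / np.sqrt(np.maximum(S, eps)) + eps)
        # Crude re-projection onto the sum-to-one constraint (a simplification)
        S /= S.sum(axis=0, keepdims=True)
    return Z, S
```

In practice the ASC is often enforced by augmenting H and Z with a constant row rather than by renormalizing S, but the simple projection above suffices for a sketch.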

MLENMF Unmixing Model
In the NMF Model (3) or (4), the objective function ‖H − ZS‖_F^2 is the least-squares (LS) function, which is sensitive to noise. Here, we employ a new robust MLE-based loss to replace the LS objective function and propose an MLE-based NMF (MLENMF) model for HU.
Firstly, the matrix norm is written in vector norm form:

‖H − ZS‖_F^2 = ∑_{i=1}^M ‖H_i − (ZS)_i‖_2^2, (5)

where H_i is the i-th row of matrix H. We can regard the least-squares objective function as the sum of approximation residuals, and then construct an MLE-like robust estimator to approximate the minimum of the objective function. Denoting the approximation residual of the i-th band as e_i = ‖H_i − (ZS)_i‖_2 and defining the residual vector e = [e_1, …, e_M]^T, the above Formula (5) can be rewritten as:

min_{Z,S} ∑_{i=1}^M e_i^2. (6)

Assume that e_1, …, e_M are independent and identically distributed (i.i.d.) random variables, which follow the same probability distribution function g_θ(e_i), where θ is the distribution parameter. The likelihood function can be expressed as:

L_θ(e) = ∏_{i=1}^M g_θ(e_i). (7)

According to the principle of MLE, the following objective function should be minimized:

min D_θ(e) = ∑_{i=1}^M φ_θ(e_i), (8)

where φ_θ(e_i) = −ln g_θ(e_i). If we replace the objective function ‖H − ZS‖_F^2 in Equation (4) by the loss in Equation (8), we get the following optimization problem:

min_{Z,S} ∑_{i=1}^M φ_θ(e_i) + λ‖S‖_{1/2}, s.t. Z ≥ 0, S ≥ 0. (9)

In fact, the aim is to construct a loss function that replaces the least-squares function to reduce the impact of noise. To construct this loss function, we analyze its Taylor expansion. Assume that g_θ is symmetric, and g_θ(e_i) < g_θ(e_j) if |e_i| > |e_j|. We can infer that: (1) g_θ(0) is the global maximum of g_θ and φ_θ(0) is the global minimum of φ_θ; (2) φ_θ(e_i) = φ_θ(−e_i); (3) φ_θ(e_i) > φ_θ(e_j) if |e_i| > |e_j|. For simplicity, we assume φ_θ(0) = 0. According to the Taylor expansion around e_0 (up to second order), D_θ(e) can be approximated as [24]:

D̃_θ(e) = D_θ(e_0) + D′_θ(e_0)^T (e − e_0) + (1/2)(e − e_0)^T W (e − e_0), (10)

where D′_θ(e_0) is the first-order derivative of D_θ(e) at e_0, and W is the Hessian matrix. The mixed partial derivatives ∂²D_θ/∂e_i∂e_j = 0 (i ≠ j), since the error residuals e_i and e_j are assumed i.i.d., and hence W is a diagonal matrix. Taking the derivative of D̃_θ(e) with respect to e gives:

D̃′_θ(e) = D′_θ(e_0) + W(e − e_0). (11)
As φ_θ(0) = 0 is the global minimum of φ_θ, the minimum of D_θ(e) is D_θ(0) = 0. D̃_θ(e) should also reach its minimum at e = 0, since it is an approximation of D_θ(e); hence D̃′_θ(0) = 0, and from Equation (11) we can derive:

D′_θ(e_0) = W e_0, (12)

i.e., φ′_θ(e_{0,i}) = W_{i,i} e_{0,i}, so that

W_{i,i} = φ′_θ(e_{0,i}) / e_{0,i}, (13)

where W_{i,i} is the i-th diagonal element of W. Denoting w_i = W_{i,i}, Equation (13) can be written as:

w_i = φ′_θ(e_i) / e_i. (14)

As φ_θ(x) is a nonlinear and nonconvex function, it is difficult to solve Model (9) directly. Inspired by the above Formula (14), we get:

∑_{i=1}^M φ_θ(e_i) ≈ (1/2) ∑_{i=1}^M w_i e_i^2, (15)

and then (absorbing the constant factor into λ) Model (9) can be expressed as a weighted NMF model:

min_{Z,S} ∑_{i=1}^M w_i ‖H_i − (ZS)_i‖_2^2 + λ‖S‖_{1/2}, s.t. Z ≥ 0, S ≥ 0. (16)

The objective function of Model (16) can be rewritten as:

∑_{i=1}^M w_i ‖H_i − (ZS)_i‖_2^2 = ‖W^{1/2}(H − ZS)‖_F^2 = ‖H̃ − Z̃S‖_F^2, (17)

where H̃ = W^{1/2}H and Z̃ = W^{1/2}Z. Then, Model (16) can be expressed as:

min_{Z̃,S} ‖H̃ − Z̃S‖_F^2 + λ‖S‖_{1/2}, s.t. Z̃ ≥ 0, S ≥ 0. (18)

It is easy to see that Model (18) is also an l_{1/2}-NMF problem, and it can be solved by the multiplicative update iteration rules [9,13]:

Z̃ ← Z̃ ⊙ (H̃S^T) ⊘ (Z̃SS^T), (19)

S ← S ⊙ (Z̃^T H̃) ⊘ (Z̃^T Z̃S + (λ/2) S^{−1/2}), (20)

where ⊙ and ⊘ denote element-wise multiplication and division, and S^{−1/2} is taken element-wise. The final endmember matrix is Z = W^{−1/2} Z̃. In Model (18), a key factor is the weight. In this paper, the weight function is set as the logistic function [23,24,26]:

w_i = 1 / (1 + exp(γ(e_i^2 − τ))), (21)

where γ, τ are positive scalars. Parameter γ controls the decreasing rate from 1 to 0, and τ controls the location of the demarcation point [24]. It is clear that the value of the weight function decreases rapidly as the residual e_i increases. The MLE weight function in Equation (21) can approximate the weights of commonly used loss functions, such as the l_{2,1}, maximum correntropy, and Huber weights.
When γ = 2 and τ → 0, the MLE weight function is w_i = 1/(1 + exp(2e_i^2)), which is close to the l_{2,1} weight 1/(1 + e_i^2). The corresponding weights are shown as red and blue lines in Figure 1a.
When γ = 1/σ^2 and τ → 0, the MLE weight function is w_i = 1/(1 + exp(e_i^2/σ^2)), which is close to the weight of the maximum correntropy criterion, exp(−e_i^2/σ^2) (σ is a parameter). The corresponding weights are shown in Figure 1b. By choosing appropriate parameters, the MLE weight can also approximate the Huber weight, as shown in Figure 1c.
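For concreteness, the logistic weight and its limiting cases can be sketched as follows. The algebraic form w_i = 1/(1 + exp(γ(e_i² − τ))) is reconstructed from the textual description (weights decay from 1 toward 0, demarcation near e_i² = τ, and w_i = 1/2 when τ = 0 and γ → 0), so it should be read as an assumption:

```python
import numpy as np

def mle_weight(e, gamma, tau):
    """Assumed logistic weight of Eq. (21): w = 1 / (1 + exp(gamma * (e^2 - tau)))."""
    x = np.clip(gamma * (np.asarray(e, dtype=float) ** 2 - tau), -50.0, 50.0)
    return 1.0 / (1.0 + np.exp(x))

e = np.linspace(0.0, 3.0, 31)
w_l21_like = mle_weight(e, gamma=2.0, tau=1e-12)            # gamma = 2, tau -> 0: compare with 1/(1+e^2)
w_corr_like = mle_weight(e, gamma=1.0 / 0.5 ** 2, tau=1e-12)  # gamma = 1/sigma^2: compare with exp(-e^2/sigma^2)
```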
Based on Equations (14) and (21), the MLE objective function (25) can be obtained. From Equations (8) and (25), we can see that the probability distribution function g_θ(e_i) has the form g_θ(e_i) = exp(−φ_θ(e_i)). If τ = 0 and γ → 0, the probability distribution function g_θ(e_i) is actually a Gaussian distribution. In this case, the weight defined in Equation (21) is ω_i = 1/2, which is the LS case.
In Figure 2a, we compare the MLE objective function with the LS loss function. The MLE objective function is controlled by the parameters γ, τ, and is truncated to a constant for large residuals (e.g., |e_i| > 2). As the constant has no effect on the optimization model, the negative effect of noise (points with large residuals) can be automatically diminished. Compared with the MLE function, the LS loss function is global and increases quadratically as the residual increases. When heavy noise is present, the objective function of the LS model will be dominated by the points with heavy noise. Figure 2b shows the influence functions [22,27] of MLE and LS. The influence function of a loss φ(e) is defined as ψ(e) = ∂φ(e)/∂e, which measures the robustness of the loss function as the error residual increases. For residual e_i > 0, the influence function of MLE increases first, then decreases and finally reaches zero. It means that larger errors finally have no effect on the MLE-based model. However, the influence function of LS continues to grow linearly, so the LS loss function is seriously affected by noise. In the presence of noise, MLE is obviously more robust than LS.
The procedure of the proposed MLENMF is shown in Algorithm 1.
Algorithm 1. MLENMF.
1. Initialize Z^(0) = Z_0, S^(0) = S_0, v = 1, W = I.
2. Run the following steps until convergence:
(a) Compute the residuals e_i = ‖H_i − (Z^(v−1) S^(v−1))_i‖_2, i = 1, …, M;
(b) Calculate the weight w_i of each band by Equation (21);
(c) Compute the weighted matrices H̃ = W^{1/2}H and Z̃ = W^{1/2}Z^(v−1);
(d) Update the endmember matrix and weighted abundance matrix by the multiplicative rules (19) and (20); set v = v + 1.

Remark. The current method assumes that different bands are independent, so that an MLE solution can be deduced. The band-independence assumption is only used in the derivation of the MLE estimator. By means of this assumption, the method finally generates a weighted NMF model whose weight function can be used to reduce the effect of noisy bands. Although hyperspectral bands are not independent from each other in practice, the final weighted NMF model (i.e., MLENMF) can still alleviate the negative effects of noise.
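Putting the pieces together, Algorithm 1 can be sketched as a re-weighted l_{1/2}-NMF loop. Everything below is an illustrative reading of the text: the weight uses the assumed form of Equation (21), the inner multiplicative updates follow the standard l_{1/2}-NMF rules, and all parameter values (c, ξ, λ, iteration counts) are placeholders rather than the paper's tuned settings:

```python
import numpy as np

def mlenmf(H, P, c=1.0, xi=0.8, lam=0.1, n_outer=20, n_inner=20, eps=1e-9, seed=0):
    """Sketch of MLENMF (Algorithm 1) as a re-weighted l1/2-NMF; illustrative only."""
    rng = np.random.default_rng(seed)
    M, N = H.shape
    Z = rng.uniform(size=(M, P))
    S = rng.uniform(size=(P, N))
    S /= S.sum(axis=0, keepdims=True)
    for _ in range(n_outer):
        # (a) band-wise residuals e_i = ||H_i - (Z S)_i||_2
        e = np.linalg.norm(H - Z @ S, axis=1)
        # Data-dependent parameters: tau = (100*xi)-th percentile of e^2, gamma = c / tau
        tau = np.quantile(e ** 2, xi)
        gamma = c / max(tau, eps)
        # (b) logistic weights (assumed form of Equation (21))
        w = 1.0 / (1.0 + np.exp(np.clip(gamma * (e ** 2 - tau), -50.0, 50.0)))
        # (c) weighted matrices H~ = W^{1/2} H and Z~ = W^{1/2} Z
        sw = np.sqrt(w)[:, None]
        Ht, Zt = sw * H, sw * Z
        # (d) multiplicative updates on the weighted l1/2-NMF subproblem
        for _ in range(n_inner):
            Zt *= (Ht @ S.T) / (Zt @ S @ S.T + eps)
            S *= (Zt.T @ Ht) / (Zt.T @ Zt @ S + 0.5 * lam / np.sqrt(np.maximum(S, eps)) + eps)
            S /= S.sum(axis=0, keepdims=True)   # crude ASC re-projection
        Z = Zt / np.maximum(sw, eps)            # recover Z = W^{-1/2} Z~
    return Z, S
```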

Evaluation Metrics
Spectral angular distance (SAD) and root mean square error (RMSE) are used to quantitatively evaluate the accuracy of estimated endmembers and abundances.
The formula of SAD is:

SAD_k = arccos( (z_k^T ẑ_k) / (‖z_k‖ ‖ẑ_k‖) ),

where SAD_k represents the similarity between the k-th real endmember z_k and the estimated endmember ẑ_k. The RMSE is:

RMSE_k = ( (1/N) ‖s_k − ŝ_k‖_2^2 )^{1/2},

where s_k and ŝ_k are the k-th real and estimated abundance maps (i.e., the k-th row vectors in S and Ŝ), respectively, and N is the number of pixels in the HSI.
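The two metrics translate directly into code; the functions below follow the standard SAD/RMSE definitions stated in the text:

```python
import numpy as np

def sad(z, z_hat):
    """Spectral angular distance between a real and an estimated endmember."""
    cos = np.dot(z, z_hat) / (np.linalg.norm(z) * np.linalg.norm(z_hat))
    return np.arccos(np.clip(cos, -1.0, 1.0))   # clip guards against rounding outside [-1, 1]

def rmse(s, s_hat):
    """Root mean square error between a real and an estimated abundance map."""
    s, s_hat = np.asarray(s, dtype=float), np.asarray(s_hat, dtype=float)
    return np.sqrt(np.mean((s - s_hat) ** 2))
```

Note that SAD is scale-invariant (an endmember and a rescaled copy have zero angle), which is why amplitude deviations show up in RMSE rather than SAD.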

Implementation Details
The vertex component analysis (VCA) and fully constrained least squares (FCLS) methods are used to generate the initial endmembers Z_0 and abundances S_0 for the different unmixing methods [11-19]. The regularization parameter λ in l_{1/2}-NMF and CENMF depends on the sparsity of the material abundances and is estimated based on the sparseness criterion in Ref. [13]. The parameters of CIM-NMF and Huber-NMF are set to the recommended values in Ref. [18]. The proposed MLENMF contains two parameters, γ and τ, as shown in Equation (21). It is clear that τ is related to the amplitude of the residuals e_i^2. For different data sets, the amplitude of the residuals may be different, so it is difficult to determine a specific value of τ. Here, we set τ in a data-dependent way: τ is the (100ξ)-th percentile of the residual vector [e_1^2, …, e_M^2]^T, where ξ ∈ (0, 1] controls the ratio of inliers.
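The data-dependent choice of τ described above is just a percentile of the squared band residuals; a small sketch (the helper name is ours, not the paper's):

```python
import numpy as np

def select_tau(e, xi):
    """tau = (100*xi)-th percentile of the squared residuals; xi sets the inlier ratio."""
    return np.quantile(np.asarray(e, dtype=float) ** 2, xi)

e = np.array([0.1, 0.2, 0.3, 5.0])   # one band with a large residual
tau = select_tau(e, xi=0.75)         # treat roughly 75% of the bands as inliers
```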
Tables 1 and 2 show the average results of 20 random experiments under different degrees of noise. Each SAD (RMSE) value is the mean of the SAD (RMSE) over seven endmembers. It is clear that the performance of the different methods improves as SNR or µ increases, and MLENMF shows better results under different degrees of noise. To visualize the results of the different methods, the real and estimated spectra for endmember 1 (i.e., "Carnallite NMNH98011") at µ = 20 are shown in Figure 3. Here, we only show the results for endmember 1 due to space limitations. Similarly good results are obtained for the other endmembers. It can be seen that the spectral curve estimated by MLENMF approximates the reference one well, while the curves of the other methods exhibit deviations in amplitude from the reference spectrum. As the reference spectrum and the spectra estimated by the different methods have similar shapes, the SAD values of the different methods show small differences, as shown in Table 1. Notwithstanding, the estimated abundance maps of the different methods show large differences, as shown in Figure 4. Taking into account both the SAD and RMSE results, we can see that our MLENMF method is more robust than the other NMF methods when the data contains noise.


Experiments on Real Data
Two real hyperspectral unmixing data sets, i.e., Urban and Jasper, are used to evaluate the performance of the different NMF unmixing methods (available at https://rslab.ut.ac.ir/data and https://sites.google.com/site/feiyunzhuhomepage/datasets-ground-truths, accessed on 2 July 2019). The Urban data was obtained by the HYDICE sensor. The scene has a size of 307 × 307 pixels and each pixel corresponds to a 2 × 2 m area. The original data has 210 bands, among which bands 1-4, 76, 87, 101-111, 136-153 and 198-210 are severely affected by dense water vapor and the atmosphere. After removing these noisy bands, 162 bands are kept. This scene contains four reference materials: Asphalt Road, Grass, Tree and Roof, which are also available at https://rslab.ut.ac.ir/data (accessed on 2 July 2019).
We first perform experiments on the Urban data with 162 bands. The parameters of MLENMF are set as ξ = 0.8 and c = 1. The endmembers and abundances estimated by the different unmixing methods are compared with the ground-truth references, and the SAD and RMSE results are then computed, as shown in Tables 3 and 4, respectively. Compared with the other NMF methods, the proposed MLENMF shows better overall results. Figure 5 shows the endmembers estimated by the different methods. It can be seen that the other methods cannot estimate the endmember 'Roof' well, while our MLENMF generates a spectral curve that is similar to the reference signature. From the abundance maps in Figure 6, we can see that the maps of MLENMF are more consistent with the reference maps than those of the comparison methods. To test the unmixing performance of the different methods in the presence of noisy bands, we also calculate the SAD and RMSE for the Urban data with all 210 bands and show the results in Tables 5 and 6, respectively. The parameters of MLENMF in this case are set as ξ = 0.4 and c = 10. Even with some known bad bands, our MLENMF provides the best results. The Jasper data was collected by the AVIRIS sensor, covering a spectral range of 380 to 2500 nm with a total of 224 spectral bands, including 26 noisy bands. The spectral resolution is 9.46 nm, and the image size is 100 × 100 pixels. The image mainly contains four materials: Tree, Water, Soil, and Road. The parameters of MLENMF are set as ξ = 0.4 and c = 1. The SAD and RMSE results of the different unmixing methods on this data set are shown in Tables 7 and 8, respectively. Figures 7 and 8 show the endmembers and abundance maps estimated by the different methods. It can be seen from these results that the proposed MLENMF provides more accurate estimates of the endmembers and abundances.

Discussion
As described in Section 4.2, τ is the (100ξ)-th percentile of the residual vector [e_1^2, …, e_M^2]^T, and γ = c/τ, with c ∈ (0, 10] and ξ ∈ (0, 1]. By tuning these parameters, the MLE objective function in Equation (26) can be truncated, as shown in Figure 9. Parameters c and ξ control the decreasing rate and the location of the truncation point, respectively. The larger the value of c, the greater the degree of truncation; the smaller the value of ξ, the earlier the truncation point. As shown in Figure 9, when the noise or residual is large, it is better to choose a larger c and a smaller ξ, which truncates the weight of larger residuals to a constant (see the red dotted line). We take the Urban data set as an example to show the effect of the parameters c and ξ. Figure 10 shows the SAD results of MLENMF on the Urban data with 210 bands. The results in Figure 10a are obtained by fixing ξ = 0.4 and changing c in the set {0.1, 0.5, 1, 2, 5, 10}. When ξ is fixed, larger c values correspond to better unmixing results. As shown in Figure 9, c affects the degree of truncation. If a large c is chosen, the weight of large errors can be truncated to a constant (e.g., the objective function values are constant for errors larger than 1.5, shown as the red solid line in Figure 9). As their objective function values are constant, these errors have no influence on the model. For the Urban data with all 210 bands, MLENMF with a larger c can effectively alleviate the effect of noisy bands. By fixing c = 10 and changing ξ in the set {0.1, 0.2, 0.4, 0.6, 0.8, 1}, Figure 10b shows the SAD of MLENMF versus the parameter ξ. It is better to set the parameter ξ in the interval [0.4, 0.8] when c is fixed. Parameter ξ determines the ratio of inliers. As the data contains noisy bands, the value of ξ should be less than 1.
When the known noisy bands of the Urban data are removed, the experimental results on the Urban data with 162 bands are obtained by fixing ξ = 0.8 and c = 1, respectively. The results are shown in Figure 11. From Figure 11a, we can see that the proposed MLENMF is not sensitive to the parameter c, because different c values generate similar results for small errors in the case of low-noise or noise-free data, as shown in Figure 9. From Figure 11b, the best result is achieved at ξ = 1, which means that the data points are almost all inliers.

The above analysis recommends setting the parameter ξ in the interval [0.4, 0.8]. For data with heavy noise, ξ can be set to a small value, such as ξ = 0.4. Parameter c is chosen in the interval [1, 10]. For data with heavy noise, c = 10 can be used; otherwise, a moderate value c = 1 is recommended.

Conclusions
This paper proposes a maximum likelihood estimation-based nonnegative matrix factorization (MLENMF) model for hyperspectral unmixing. The proposed MLENMF employs an MLE-like loss function that replaces the least-squares loss function in the NMF model. The MLE-like loss is a robust loss, which can truncate the objective-function contribution of noise and thereby reduce its negative effect on the unmixing model. Experimental results on a simulated data set and two real data sets (Urban and Jasper) show that the proposed MLENMF model has an obvious noise-suppression effect and can obtain more accurate unmixing results. The current model assumes that different bands are independent so that an MLE solution can be deduced. Notwithstanding, in practice the assumption of band independence is not generally valid. Taking the dependence among bands into account is left for future work.

Figure 2. Comparison of objective function (a) and influence function (b) between MLE and LS.

Figure 3. The reference and estimated spectra for the endmember 1 of simulated data.


Figure 4. The reference and estimated abundance maps for the endmember 1 of simulated data.


Figure 6. Comparison of abundance maps estimated by different algorithms for Urban data with 162 bands for different materials: (a) Asphalt Road, (b) Grass, (c) Tree, (d) Roof.


Figure 8. Comparison of abundance maps estimated by different methods on Jasper data for different materials: (a) Tree, (b) Water, (c) Soil, (d) Road.

Figure 9. Comparison of MLE objective function and LS under different parameters.


Figure 10. The SAD results under different parameter settings on Urban data with 210 bands. (a) SAD versus c at ξ = 0.4. (b) SAD versus ξ at c = 10.


Figure 11. The SAD results under different parameter settings on Urban data with 162 bands. (a) SAD versus c at ξ = 0.8. (b) SAD versus ξ at c = 1.

Table 1. The SAD results of different unmixing methods for simulated data.

Table 2. The RMSE results of different unmixing methods for simulated data.


Table 3. The SAD results of different methods for Urban data with 162 bands.

Table 4. The RMSE results of different methods for Urban data with 162 bands.


Table 5. The SAD results of different methods for Urban data with 210 bands.

Table 6. The RMSE results of different methods for Urban data with 210 bands.

Table 7. The SAD results of different methods for Jasper data.

Table 8. The RMSE results of different methods for Jasper data.