Article

Maximum Likelihood Estimation Based Nonnegative Matrix Factorization for Hyperspectral Unmixing

Qin Jiang, Yifei Dong, Jiangtao Peng, Mei Yan and Yi Sun

1 Hubei Key Laboratory of Applied Mathematics, Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China
2 School of Finance, Anhui University of Finance & Economics, Bengbu 233030, China
* Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(13), 2637; https://doi.org/10.3390/rs13132637
Submission received: 21 May 2021 / Revised: 29 June 2021 / Accepted: 3 July 2021 / Published: 5 July 2021

Abstract

Hyperspectral unmixing (HU) is a research hotspot of hyperspectral remote sensing technology. As a classical HU method, the nonnegative matrix factorization (NMF) unmixing method decomposes an observed hyperspectral data matrix into the product of two nonnegative matrices, i.e., the endmember and abundance matrices. Because the objective function of NMF is the traditional least-squares function, NMF is sensitive to noise. To improve the robustness of NMF, this paper proposes a maximum likelihood estimation (MLE) based NMF model (MLENMF) for the unmixing of hyperspectral images (HSIs), which replaces the least-squares objective function in traditional NMF with a robust MLE-based loss function. Experimental results on a simulated data set and two widely used real hyperspectral data sets demonstrate the superiority of our MLENMF over existing NMF methods.


1. Introduction

A hyperspectral image (HSI) can be represented as a three-dimensional data cube containing both spectral and spatial information that characterizes the radiation properties, spatial distribution, and geometric characteristics of ground objects [1,2]. Compared with panchromatic, RGB, and multispectral images, which have only several broad bands, an HSI usually has hundreds of spectral bands. The rich spectral information of HSI can be used to discriminate subtle differences between similar ground objects, which makes HSI suitable for applications such as target recognition, mineral detection, and precision agriculture [1,2,3]. Due to scattering from the ground surface and the low spatial resolution of hyperspectral sensors, an observed HSI pixel is often a mixture of multiple ground materials [4,5,6]; this is the so-called "mixed pixel". The presence of mixed pixels seriously affects the application of HSIs. To address this problem, hyperspectral unmixing (HU) techniques have been developed [4,5,6,7,8]. HU aims to decompose a mixed spectrum into a collection of pure spectra (endmembers) together with the corresponding fractions (abundances). In terms of the spectral mixture mechanism, HU algorithms can be roughly categorized into linear and nonlinear ones [4,5]. Although the nonlinear mixing assumption generally represents real scenes better, the more simplified linear mixing assumption has proved to work very satisfactorily in many practical cases and, given its mathematical tractability, has attracted significant attention from the scientific community. For these reasons, the linear mixture model is adopted in this paper, in which a measured spectrum is represented as a linear combination of several endmembers.
Nonnegative matrix factorization (NMF) is a widely used linear HU method [9,10,11,12,13,14,15,16,17,18,19,20]. In this framework, HU is regarded as a blind source separation problem that decomposes an observed HSI matrix into the product of a pure-pixel matrix (endmember matrix) and the corresponding proportion matrix (abundance matrix). To respect the physical constraints, nonnegativity constraints on the endmembers and abundances and an abundance sum-to-one constraint (ASC) are imposed. The NMF algorithm is intuitive and interpretable. However, due to the large number of unknown variables, the solution space of the NMF model is too large. To restrict the solution space, many NMF variants add constraints on the abundances or endmembers [10,11,12,13,14,15,16]. Miao et al. incorporated a volume constraint on the endmembers into the NMF formulation and proposed a minimum volume constrained NMF (MVC-NMF) model [10], which can perform unsupervised endmember extraction from highly mixed image data without the pure-pixel assumption. Jia et al. introduced two constraints into the NMF [11]: piecewise smoothness of the spectral data and sparseness of the abundance fractions. Similarly, two abundance constraints (an abundance separation constraint and an abundance smoothness constraint) were added to the NMF in [12]. Qian et al. imposed an $\ell_{1/2}$-norm sparsity constraint on the abundances and proposed the $\ell_{1/2}$-NMF unmixing model [13]. Lu et al. considered the manifold structure of HSI and incorporated manifold regularization into the $\ell_{1/2}$-NMF [14]. Wang et al. added an endmember dissimilarity constraint to the NMF [15].
Although the aforementioned NMF methods improve the classical NMF unmixing model to a certain extent, they ignore the effect of noise. Because the objective function of NMF is the least-squares loss, NMF is sensitive to noise, and the corresponding unmixing results are often inaccurate and unstable. To suppress the effect of noise and improve the robustness of the model, several robust NMF methods have been proposed [17,18,19,20]. He et al. proposed a sparsity-regularized robust NMF that adds a sparse matrix to the linear mixture model to capture sparse noise [17]. Du et al. introduced the robust correntropy-induced metric (CIM) and proposed a CIM-based NMF (CIM-NMF) model, which can effectively deal with non-Gaussian noise [18]. Wang et al. proposed a robust correntropy-based NMF model (CENMF) [19], which combines a correntropy-based loss function with an $\ell_1$-norm sparsity constraint on the abundances. Based on Huber's M-estimator, Huang et al. constructed $\ell_{2,1}$-norm and $\ell_{1,2}$-norm based loss functions to obtain a new robust NMF model [20,21]. From the viewpoint of maximum likelihood estimation (MLE), adopting an $\ell_{2,1}$-norm ($\ell_{1,2}$-norm) loss function implicitly assumes that the column-wise (row-wise) approximation residuals follow a Laplacian (Gaussian) distribution. In practice, however, this assumption may not hold, especially when the HSI contains complex mixed noise such as impulse noise, stripes, deadlines, and other noise types [22,23].
Inspired by robust regression theory [23,24], we model the approximation residuals with an MLE-like estimator and propose a robust MLE-based $\ell_{1/2}$-NMF model (MLENMF) for HU. It replaces the least-squares loss in the original NMF with a robust MLE-based loss, a function of the approximation residuals associated with their distribution [24]. The proposed MLENMF can be converted into a weighted $\ell_{1/2}$-NMF model and solved by a re-weighted multiplicative update algorithm [9,13]. By choosing an appropriate weight function, MLENMF automatically assigns small weights to bands with large residuals, which effectively reduces the effect of noisy bands and improves the unmixing accuracy. Experimental results on simulated and real hyperspectral data sets show the superiority of MLENMF over existing NMF methods.
The rest of the paper is organized as follows. Section 2 introduces the NMF and $\ell_{1/2}$-NMF models. Section 3 describes the proposed MLENMF method. Experimental results and analysis are provided in Section 4. Section 5 discusses the effect of the parameters in the algorithm. Finally, Section 6 concludes the paper.

2. NMF Unmixing Model

Under the linear spectral mixing mechanism, an observed spectrum $h \in \mathbb{R}^{M \times 1}$ can be represented linearly by the endmembers $z_1, \dots, z_P$ [4,10,11,12,13]:

$h = Zs + \epsilon$,  (1)

where $Z = [z_1, \dots, z_P] \in \mathbb{R}^{M \times P}$ is the endmember matrix, $s \in \mathbb{R}^{P \times 1}$ is the coefficient (abundance) vector, and $\epsilon$ is the residual. Applying the linear mixing model (1) to all hyperspectral pixels $h_1, \dots, h_N$ yields the matrix representation:

$H = ZS + E$,  (2)

where $H = [h_1, \dots, h_N] \in \mathbb{R}^{M \times N}$ and $S = [s_1, \dots, s_N] \in \mathbb{R}^{P \times N}$ are the nonnegative hyperspectral data matrix and abundance matrix, respectively, and $E \in \mathbb{R}^{M \times N}$ is the residual matrix.
In Equation (2), to make the decomposition result as accurate as possible, the residual should be minimized. Then, an NMF unmixing model can be obtained by considering the nonnegative property of endmember and abundance matrices:
$\min_{Z,S} \|H - ZS\|_F^2, \quad \text{s.t. } Z \ge 0,\ S \ge 0$,  (3)

where $\|\cdot\|_F$ denotes the Frobenius norm, and $Z \ge 0$ means that each element of $Z$ is nonnegative. As each column of the abundance matrix $S$ records the proportions of the endmembers in representing a pixel, the columns of $S$ (each one corresponding to a pixel) should satisfy the sum-to-one constraint, i.e., $\sum_{p=1}^{P} S_{pn} = 1$, $n = 1, \dots, N$.
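In implementations, the ASC is commonly enforced by augmenting the data and endmember matrices with a scaled row of ones, so that the standard multiplicative updates softly push each abundance column toward summing to one. The sketch below illustrates this device; the function name and the value of the scaling constant delta are illustrative choices, not part of the original formulation.

```python
import numpy as np

def augment_for_asc(H, Z, delta=15.0):
    """Append a constant row (delta * ones) to H and Z so that NMF updates
    softly enforce the abundance sum-to-one constraint; a larger delta
    enforces the constraint more strictly."""
    H_bar = np.vstack([H, delta * np.ones((1, H.shape[1]))])  # (M+1, N)
    Z_bar = np.vstack([Z, delta * np.ones((1, Z.shape[1]))])  # (M+1, P)
    return H_bar, Z_bar
```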
The above NMF Model (3) can be easily solved by the multiplicative update algorithm [9,13]. However, its solution space is very large [13]. To restrict the solution space, an $\ell_{1/2}$ constraint can be added to the abundance matrix $S$, giving the $\ell_{1/2}$-NMF model [13]:

$\min_{Z,S} \|H - ZS\|_F^2 + \lambda \|S\|_{1/2}, \quad \text{s.t. } Z \ge 0,\ S \ge 0$,  (4)

where $\lambda$ is a regularization parameter and $\|S\|_{1/2}$ is the $\ell_{1/2}$ regularizer [13]. As proved in Refs. [13,25], the $\ell_{1/2}$ regularizer is a good choice for enforcing sparsity in hyperspectral unmixing: the sparsity of the $\ell_q$ ($1/2 \le q < 1$) solution increases as $q$ decreases, whereas the sparsity of the $\ell_q$ ($0 < q \le 1/2$) solution changes little with respect to $q$. Meanwhile, the sparsity represented by $\ell_{1/2}$ also enforces the volume of the simplex to be minimized [13].
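For concreteness, a minimal NumPy sketch of the multiplicative updates for Model (4) is given below (cf. [9,13]). The function name, iteration count, and the small eps guard against division by zero are our own choices.

```python
import numpy as np

def l12_nmf(H, Z, S, lam=0.1, n_iter=200, eps=1e-9):
    """Multiplicative updates for the l_{1/2}-NMF model (4).

    H: (M, N) nonnegative data; Z: (M, P) endmembers; S: (P, N) abundances.
    """
    Z, S = Z.astype(float).copy(), S.astype(float).copy()
    for _ in range(n_iter):
        # Z <- Z .* (H S^T) ./ (Z S S^T)
        Z *= (H @ S.T) / (Z @ S @ S.T + eps)
        # S <- S .* (Z^T H) ./ (Z^T Z S + (lam/2) S^(-1/2))
        S = np.maximum(S, eps)  # keep S > 0 so S**(-0.5) is well defined
        S *= (Z.T @ H) / (Z.T @ Z @ S + 0.5 * lam * S ** (-0.5) + eps)
    return Z, S
```

Because each update multiplies the current estimate by a nonnegative ratio, both factors remain nonnegative throughout the iterations.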

3. MLENMF Unmixing Model

In the NMF model (3) or (4), the objective function $\|H - ZS\|_F^2$ is the least-squares (LS) loss, which is sensitive to noise. Here, we employ a robust MLE-based loss to replace the LS objective function and propose an MLE-based NMF (MLENMF) model for HU.
Firstly, the matrix norm is written in terms of row vector norms:

$\|H - ZS\|_F^2 = \sum_{i=1}^{M} \|H_i - (ZS)_i\|_2^2$,  (5)

where $H_i$ is the $i$-th row of the matrix $H$.
We can regard the least-squares objective function as the sum of approximation residuals and then construct an MLE-like robust estimator to approximate the minimum of the objective function. Denoting the approximation residual of the $i$-th band as $e_i = \|H_i - (ZS)_i\|_2$ and defining the residual vector $e = [e_1, \dots, e_M]^T$, Formula (5) can be rewritten as:

$J(e) = \|e\|_2^2 = \sum_{i=1}^{M} e_i^2$.  (6)
Assume that $e_1, \dots, e_M$ are independent and identically distributed (i.i.d.) random variables that follow the same probability density function $g_\theta(e_i)$, where $\theta$ is the distribution parameter. The likelihood function can then be expressed as:

$J_\theta(e_1, \dots, e_M) = \prod_{i=1}^{M} g_\theta(e_i)$.  (7)
According to the principle of MLE, the following objective function should be minimized:

$-\ln J_\theta = \sum_{i=1}^{M} \varphi_\theta(e_i)$,  (8)

where $\varphi_\theta(e_i) = -\ln g_\theta(e_i)$. Replacing the objective function $\|H - ZS\|_F^2$ in Equation (4) with the loss in Equation (8) gives the following optimization problem:

$\min_{Z,S} \sum_{i=1}^{M} \varphi_\theta(e_i) + \lambda \|S\|_{1/2}, \quad \text{s.t. } Z \ge 0,\ S \ge 0$.  (9)
In fact, the aim is to construct a loss function that replaces the least-squares function and reduces the impact of noise. To construct this loss function, we analyze its Taylor expansion. Assume that $g_\theta$ is symmetric and $g_\theta(e_i) < g_\theta(e_j)$ if $|e_i| > |e_j|$. We can infer that: (1) $g_\theta(0)$ is the global maximum of $g_\theta$ and $\varphi_\theta(0)$ is the global minimum of $\varphi_\theta$; (2) $\varphi_\theta(e_i) = \varphi_\theta(-e_i)$; (3) $\varphi_\theta(e_i) > \varphi_\theta(e_j)$ if $|e_i| > |e_j|$. For simplicity, we assume $\varphi_\theta(0) = 0$. Define $D_\theta(e) = \sum_{i=1}^{M} \varphi_\theta(e_i)$. According to the second-order Taylor expansion around $e_0$, $D_\theta(e)$ can be approximated as [24]:

$\tilde{D}_\theta(e) = D_\theta(e_0) + (e - e_0)^T D_\theta'(e_0) + \frac{1}{2}(e - e_0)^T W (e - e_0)$,  (10)
where $D_\theta'(e_0)$ is the first-order derivative of $D_\theta(e)$ at $e_0$, and $W$ is the Hessian matrix. Because the error residuals $e_i$ and $e_j$ are assumed i.i.d., the mixed partial derivatives satisfy $\partial^2 D_\theta / \partial e_i \partial e_j = 0$ for $i \neq j$, and hence $W$ is a diagonal matrix. Taking the derivative of $\tilde{D}_\theta(e)$ with respect to $e$ gives:

$\tilde{D}_\theta'(e) = D_\theta'(e_0) + W(e - e_0)$.  (11)
As $\varphi_\theta(0) = 0$ is the global minimum of $\varphi_\theta$, the minimum of $D_\theta(e)$ is $D_\theta(0)$. Since $\tilde{D}_\theta(e)$ approximates $D_\theta(e)$, it should also attain its minimum at $e = 0$, so $\tilde{D}_\theta'(0) = 0$, and from Equation (11) we derive:

$D_\theta'(e_0) - W e_0 = 0$,  (12)

$W_{i,i} = \dfrac{\varphi_\theta'(e_{0,i})}{e_{0,i}}$,  (13)

where $W_{i,i}$ is the $i$-th diagonal element of $W$. Denoting $w_i = W_{i,i}$, Equation (13) can be written as:

$\varphi_\theta'(e_i) = w_i e_i$.  (14)
As $\varphi_\theta$ is a nonlinear and nonconvex function, Model (9) is difficult to solve directly. Inspired by Formula (14), and treating the weight $w_i$ as a constant within each iteration, we can set:

$\varphi_\theta(e_i) = w_i e_i^2$,  (15)

and then Model (9) can be expressed as a weighted NMF model:

$\min_{Z,S} \sum_{i=1}^{M} w_i e_i^2 + \lambda \|S\|_{1/2}, \quad \text{s.t. } Z \ge 0,\ S \ge 0$.  (16)
The objective function of Model (16) can be rewritten as:

$\sum_{i=1}^{M} w_i e_i^2 + \lambda \|S\|_{1/2} = \sum_{i=1}^{M} w_i \|H_i - (ZS)_i\|_2^2 + \lambda \|S\|_{1/2} = \sum_{i=1}^{M} \|\sqrt{w_i}\,H_i - (\sqrt{w_i}\,ZS)_i\|_2^2 + \lambda \|S\|_{1/2} = \|\tilde{H} - \tilde{Z}S\|_F^2 + \lambda \|S\|_{1/2}$,  (17)

where $\tilde{H} = W^{1/2} H$ and $\tilde{Z} = W^{1/2} Z$. Then, Model (16) can be expressed as:

$\min_{\tilde{Z},S} \|\tilde{H} - \tilde{Z}S\|_F^2 + \lambda \|S\|_{1/2}, \quad \text{s.t. } \tilde{Z} \ge 0,\ S \ge 0$.  (18)
It is easy to see that Model (18) is again an $\ell_{1/2}$-NMF problem and can be solved by the following multiplicative update rules [9,13]:

$\tilde{Z} \leftarrow \tilde{Z} \,.\!\ast\, (\tilde{H} S^T) \,./\, (\tilde{Z} S S^T)$,  (19)

$S \leftarrow S \,.\!\ast\, (\tilde{Z}^T \tilde{H}) \,./\, (\tilde{Z}^T \tilde{Z} S + \tfrac{\lambda}{2} S^{-\frac{1}{2}})$,  (20)

where $.\!\ast$ and $./$ denote element-wise multiplication and division. The final endmember matrix is $Z = W^{-\frac{1}{2}} \tilde{Z}$.
In Model (18), a key factor is the weight. In this paper, the weight function is set as the logistic function [23,24,26]:

$w_i \equiv w(e_i) = \dfrac{1}{1 + \exp(-\gamma(\tau - e_i^2))} = \dfrac{\exp(\gamma(\tau - e_i^2))}{1 + \exp(\gamma(\tau - e_i^2))}$,  (21)

where $\gamma$ and $\tau$ are positive scalars. The parameter $\gamma$ controls the rate at which the weight decreases from 1 to 0, and $\tau$ controls the location of the demarcation point [24]. Clearly, the value of the weight function decreases rapidly as the residual $e_i$ increases.
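As a quick numerical illustration of Equation (21), the sketch below evaluates the logistic weight on a few residuals (the values of $\gamma$ and $\tau$ are arbitrary example choices):

```python
import numpy as np

def mle_weight(e, gamma=2.0, tau=1.0):
    """Logistic MLE weight of Equation (21): w(e) = 1 / (1 + exp(-gamma*(tau - e^2)))."""
    return 1.0 / (1.0 + np.exp(-gamma * (tau - e ** 2)))

residuals = np.array([0.1, 0.5, 1.0, 2.0, 4.0])
print(mle_weight(residuals))  # the weight falls from ~0.9 toward 0 as the residual grows
```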
The MLE weight function in Equation (21) can approximate the weights of several commonly used robust loss functions, such as the $\ell_{2,1}$, maximum correntropy, and Huber weights.
When $\gamma = 2$ and $\tau \to 0$, the MLE weight function becomes

$w(e_i) = \dfrac{1}{1 + \exp(2 e_i^2)} \approx \dfrac{1}{2(1 + e_i^2)} \quad \text{for small } e_i$,  (22)

which is close, up to the constant factor $1/2$, to the $\ell_{2,1}$ weight $\frac{1}{1 + e_i^2}$. The corresponding weights are shown as the red and blue lines in Figure 1a.
When $\gamma = 1/\sigma^2$ and $\tau \to 0$, the MLE weight function becomes

$w(e_i) = \dfrac{1}{1 + \exp(e_i^2 / \sigma^2)}$,  (23)

which is close to the weight of the maximum correntropy criterion, $\exp(-e_i^2/\sigma^2)$ ($\sigma$ is a kernel parameter). The corresponding weights are shown in Figure 1b.
By choosing appropriate parameters, the MLE weight can also approximate the Huber weight:

$w_{\text{Huber}}(e_i) = \begin{cases} 1, & |e_i| \le c, \\ c/|e_i|, & |e_i| > c, \end{cases}$  (24)

as shown in Figure 1c.
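The reference weights used in this comparison can be written down directly; the snippet below (our sketch, with illustrative parameter values) evaluates all three next to the MLE weight to reproduce the qualitative comparison of Figure 1.

```python
import numpy as np

def l21_weight(e):
    """l_{2,1}-type weight approximated in Equation (22)."""
    return 1.0 / (1.0 + np.asarray(e, dtype=float) ** 2)

def correntropy_weight(e, sigma=1.0):
    """Weight of the maximum correntropy criterion."""
    return np.exp(-np.asarray(e, dtype=float) ** 2 / sigma ** 2)

def huber_weight(e, c=1.0):
    """Huber weight of Equation (24)."""
    e = np.asarray(e, dtype=float)
    return np.where(np.abs(e) <= c, 1.0, c / np.maximum(np.abs(e), 1e-12))
```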
Based on Equations (14) and (21), the MLE objective function can be obtained as:

$\varphi_\theta(e_i) = \displaystyle\int_0^{e_i} \varphi_\theta'(t)\,dt = \int_0^{e_i} t\,w(t)\,dt = \int_0^{e_i} \frac{t\exp(\gamma(\tau - t^2))}{1 + \exp(\gamma(\tau - t^2))}\,dt = -\frac{1}{2\gamma}\ln\bigl(1 + \exp(\gamma(\tau - t^2))\bigr)\Big|_0^{e_i} = \frac{1}{2\gamma}\ln\frac{1 + \exp(\gamma\tau)}{1 + \exp(\gamma(\tau - e_i^2))}$.  (25)

From Equations (8) and (25), we can see that the probability distribution function $g_\theta(e_i)$ has the form:

$g_\theta(e_i) = \left(\dfrac{1 + \exp(\gamma(\tau - e_i^2))}{1 + \exp(\gamma\tau)}\right)^{\frac{1}{2\gamma}}$.  (26)
If $\tau = 0$ and $\gamma \to 0$, the probability distribution function $g_\theta(e_i)$ is actually a Gaussian distribution. By L'Hopital's rule,

$-\ln g_\theta(e_i) = \dfrac{1}{2\gamma}\ln\dfrac{2}{1 + \exp(-\gamma e_i^2)} \;\xrightarrow{\gamma \to 0}\; \dfrac{e_i^2 \exp(-\gamma e_i^2)}{2\bigl(1 + \exp(-\gamma e_i^2)\bigr)}\bigg|_{\gamma \to 0} = \dfrac{e_i^2}{4}$,

so that

$g_\theta(e_i) \;\xrightarrow{\gamma \to 0}\; \exp\left(-\dfrac{e_i^2}{4}\right)$,

which is Gaussian. In this case, the weight defined in Equation (21) is $w_i = 1/2$ for every band, which recovers the (uniformly weighted) LS case.
In Figure 2a, we compare the MLE objective function with the LS loss function. The MLE objective function is controlled by the parameters $\gamma$ and $\tau$, and is truncated to a constant for large residuals (e.g., $|e_i| > 2$). As an additive constant has no effect on the optimization, the negative effect of noise (points with large residuals) is automatically diminished. In contrast, the LS loss function is global and grows quadratically with the residual, so under heavy noise the LS objective is dominated by the most corrupted points. Figure 2b shows the influence functions [22,27] of MLE and LS. The influence function of a loss $\varphi(e)$ is defined as $\psi(e) = \partial\varphi(e)/\partial e$, and it measures how strongly a given residual affects the loss. For residuals $e_i > 0$, the influence function of MLE first increases, then decreases, and finally approaches zero, meaning that very large errors eventually have no effect on the MLE-based model. The influence function of LS, however, grows linearly without bound, so the LS loss is seriously affected by noise. In the presence of noise, MLE is therefore clearly more robust than LS.
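To make the comparison concrete, the two influence functions plotted in Figure 2b can be evaluated numerically; in this sketch the parameter values are illustrative, and for the MLE loss we use $\psi(e) = \varphi_\theta'(e) = e\,w(e)$ from Equation (14).

```python
import numpy as np

gamma, tau = 2.0, 1.0
e = np.linspace(0.0, 5.0, 51)
w = 1.0 / (1.0 + np.exp(-gamma * (tau - e ** 2)))
psi_mle = e * w    # MLE influence: rises, peaks, then decays toward zero
psi_ls = 2.0 * e   # LS influence: grows linearly without bound
```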
The procedure of the proposed MLENMF is shown in Algorithm 1.
Algorithm 1 MLENMF.
Input: hyperspectral matrix $H$; parameters $\gamma$, $\tau$.
Initialization: endmember matrix $Z_0$ and abundance matrix $S_0$.
Output: estimated endmember and abundance matrices.
1. Initialize $Z^{(0)} = Z_0$, $S^{(0)} = S_0$, $v = 1$, $W = I$.
2. Repeat the following steps until convergence:
(a) Compute the band residuals: $(e_i^2)^{(v)} = \|H_i - (Z^{(v-1)} S^{(v-1)})_i\|_2^2$.
(b) Compute the weight of each band: $w^{(v)}(e_i) = \dfrac{\exp(\gamma\tau - \gamma (e_i^2)^{(v)})}{1 + \exp(\gamma\tau - \gamma (e_i^2)^{(v)})}$.
(c) Compute the weighted matrices: $\tilde{H} = (W^{(v)})^{\frac{1}{2}} H$, $\tilde{Z}^{(v-1)} = (W^{(v)})^{\frac{1}{2}} Z^{(v-1)}$.
(d) Update the endmember and abundance matrices: $(\tilde{Z}^{(v)}, S^{(v)}) = \mathrm{L}_{1/2}\mathrm{NMF}(\tilde{H}, \tilde{Z}^{(v-1)}, S^{(v-1)})$, $Z^{(v)} = (W^{(v)})^{-\frac{1}{2}} \tilde{Z}^{(v)}$.
(e) $v = v + 1$.
Remark. 
The current method assumes that the different bands are independent, so that an MLE solution can be deduced. This band-independence assumption is only used in the derivation of the MLE estimator; it ultimately yields a weighted NMF model in which the weight function reduces the effect of noisy bands. Although hyperspectral bands are not independent of each other in practice, the resulting weighted NMF model (i.e., MLENMF) can still alleviate the negative effects of noise.
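For reference, the following NumPy sketch mirrors Algorithm 1; the single inner $\ell_{1/2}$-NMF update per outer iteration, the parameter defaults, and the eps guards are illustrative choices rather than part of the published procedure.

```python
import numpy as np

def mlenmf(H, Z0, S0, gamma, tau, lam=0.1, n_outer=200, eps=1e-9):
    """Sketch of Algorithm 1 (MLENMF): reweighted l_{1/2}-NMF."""
    Z, S = Z0.astype(float).copy(), S0.astype(float).copy()
    for _ in range(n_outer):
        # (a) squared approximation residual of each band (row of H)
        e2 = np.sum((H - Z @ S) ** 2, axis=1)
        # (b) logistic MLE weight of each band, Equation (21)
        w = 1.0 / (1.0 + np.exp(-gamma * (tau - e2)))
        sw = np.sqrt(w)[:, None]
        # (c) weighted data and endmember matrices
        Ht, Zt = sw * H, sw * Z
        # (d) one weighted l_{1/2}-NMF multiplicative update, Equations (19)-(20)
        Zt *= (Ht @ S.T) / (Zt @ S @ S.T + eps)
        S = np.maximum(S, eps)
        S *= (Zt.T @ Ht) / (Zt.T @ Zt @ S + 0.5 * lam * S ** (-0.5) + eps)
        # recover the unweighted endmember matrix: Z = W^(-1/2) * Z~
        Z = Zt / np.maximum(sw, eps)
    return Z, S
```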

4. Results

In this section, we perform experiments on a simulated data set and two real hyperspectral data sets to test the performance of the MLENMF model, and compare the results with those of $\ell_{1/2}$-NMF [13], $\ell_{2,1}$-NMF [21], CENMF [19], CIMNMF [18], and HuberNMF (HubNMF for short) [18].

4.1. Evaluation Metrics

Spectral angular distance (SAD) and root mean square error (RMSE) are used to quantitatively evaluate the accuracy of estimated endmembers and abundances.
The formula of SAD is:

$\mathrm{SAD}_k = \arccos\left(\dfrac{z_k^T \hat{z}_k}{\|z_k\| \cdot \|\hat{z}_k\|}\right)$,

where $\mathrm{SAD}_k$ measures the similarity between the $k$-th real endmember $z_k$ and the estimated endmember $\hat{z}_k$.
The RMSE is:

$\mathrm{RMSE}_k = \left(\dfrac{1}{N}\|s_k - \hat{s}_k\|^2\right)^{\frac{1}{2}}$,

where $s_k$ and $\hat{s}_k$ are the $k$-th real and estimated abundance maps (i.e., the $k$-th row vectors of $S$ and $\hat{S}$), respectively, and $N$ is the number of pixels in the HSI.
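Both metrics are straightforward to compute; a minimal sketch (the function names are our own):

```python
import numpy as np

def sad(z_true, z_est):
    """Spectral angular distance between a reference and an estimated endmember."""
    cos = z_true @ z_est / (np.linalg.norm(z_true) * np.linalg.norm(z_est))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding

def rmse(s_true, s_est):
    """Root mean square error between a reference and an estimated abundance map."""
    return np.sqrt(np.mean((np.asarray(s_true) - np.asarray(s_est)) ** 2))
```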

4.2. Implementation Details

The vertex component analysis (VCA) and fully constrained least squares (FCLS) methods are used to generate the initial endmember matrix $Z_0$ and abundance matrix $S_0$ for the different unmixing methods [11,12,13,14,15,16,17,18,19]. The regularization parameter $\lambda$ in $\ell_{1/2}$-NMF and CENMF depends on the sparsity of the material abundances and is estimated with the sparseness criterion of Ref. [13]. The parameters of CIMNMF and HuberNMF are set to the values recommended in Ref. [18]. The proposed MLENMF contains two parameters, $\gamma$ and $\tau$, as shown in Equation (21). Clearly, $\tau$ is related to the amplitude of the residual $e_i^2$, which may differ between data sets, so it is difficult to fix a single value of $\tau$. Here, we set $\tau$ in a data-dependent way: $\tau$ is the $(100\xi)$-th percentile of the residual vector $\tilde{e} = [e_1^2, \dots, e_M^2]^T$, where $\xi \in (0, 1]$ controls the ratio of inliers. Following Ref. [24], the parameter $\gamma$ is set as $\gamma = c/\tau$ with $c \in (0, 10]$. Thus, in the experiments, we only need to tune the parameters $\xi$ and $c$.
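The data-dependent rule for $\tau$ and $\gamma$ described above amounts to one percentile computation; a minimal sketch:

```python
import numpy as np

def set_tau_gamma(e2, xi=0.4, c=1.0):
    """tau = (100*xi)-th percentile of the squared band residuals e2;
    gamma = c / tau, with xi in (0, 1] and c in (0, 10] (Section 4.2)."""
    tau = np.percentile(e2, 100.0 * xi)
    gamma = c / tau
    return tau, gamma
```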

4.3. Experiments on Simulated Data

Seven spectral signatures (i.e., "Carnallite NMNH98011", "Actinolite NMNHR16485", "Andradite WS487", "Diaspore HS416.3B", "Erionite+Merlinoit GDS144", "Halloysite NMNH106236", and "Hypersthene NMNHC2368") from the USGS spectral library (https://www.usgs.gov/labs/spec-lab, accessed on 2 July 2019) are selected to construct the endmember matrix $Z \in \mathbb{R}^{224 \times 7}$. These seven spectra are then mixed according to the method described in Ref. [28] to form the corresponding abundance matrix $S \in \mathbb{R}^{7 \times 4096}$. The hyperspectral data matrix is obtained as the product of the endmember and abundance matrices, i.e., $H = ZS$. To simulate real situations, Gaussian noise is added to the data matrix $H$ such that the signal-to-noise ratios (SNRs) of the different bands follow a Gaussian distribution, i.e., $\mathrm{SNR} \sim N(\mu, \delta^2)$ with $\delta = 5$. In the experiment, $\mu \in \{5, 10, 15, 20\}$ is considered; a large $\mu$ corresponds to weak noise. In MLENMF, the parameter $\xi$ is set to 0.4, and $c = 10$ is used for $\mu \le 10$ while $c = 1$ is used for $\mu > 10$.
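A sketch of the band-wise noise injection described above, under the assumption that the SNR values are specified in decibels (the variable names and the dB convention are our own reading of the protocol):

```python
import numpy as np

def add_bandwise_noise(H, mu=20.0, delta=5.0, seed=0):
    """Add Gaussian noise so that band i has SNR_i ~ N(mu, delta^2) in dB."""
    rng = np.random.default_rng(seed)
    H_noisy = H.astype(float).copy()
    for i in range(H.shape[0]):
        snr_db = rng.normal(mu, delta)
        p_signal = np.mean(H[i] ** 2)
        p_noise = p_signal / (10.0 ** (snr_db / 10.0))
        H_noisy[i] += rng.normal(0.0, np.sqrt(p_noise), size=H.shape[1])
    return H_noisy
```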
Table 1 and Table 2 show the average results of 20 random experiments under different degrees of noise. Each SAD (RMSE) value is the mean of SAD (RMSE) over the seven endmembers. Clearly, the performance of all methods improves as the SNR (i.e., $\mu$) increases, and MLENMF gives better results at every noise level.
To visualize the results of the different methods, the real and estimated spectra for endmember 1 (i.e., "Carnallite NMNH98011") at $\mu = 20$ are shown in Figure 3. We only show endmember 1 due to space limitations; similarly good results are obtained for the other endmembers. The spectral curve estimated by MLENMF closely approximates the reference one, while the curves of the other methods deviate from the reference spectrum in amplitude. Because the reference spectrum and the spectra estimated by the different methods have similar shapes, the SAD values of the different methods differ only slightly, as shown in Table 1. The estimated abundance maps, however, show large differences, as shown in Figure 4. Taking both the SAD and RMSE results into account, our MLENMF method is more robust than the other NMF methods when the data contain noise.

4.4. Experiments on Real Data

Two real hyperspectral unmixing data sets, Urban and Jasper, are used to evaluate the performance of the different NMF unmixing methods (available at https://rslab.ut.ac.ir/data and https://sites.google.com/site/feiyunzhuhomepage/datasets-ground-truths, accessed on 2 July 2019). The Urban data were acquired by the HYDICE sensor. The scene has 307 × 307 pixels, and each pixel corresponds to a 2 × 2 m² area. The original data have 210 bands, of which bands 1–4, 76, 87, 101–111, 136–153, and 198–210 are severely affected by dense water vapor and the atmosphere. After removing these noisy bands, 162 bands remain. The scene contains four reference materials: Asphalt Road, Grass, Tree, and Roof, which are also available at https://rslab.ut.ac.ir/data (accessed on 2 July 2019).
We first perform experiments on the Urban data with 162 bands. The parameters of MLENMF are set as $\xi = 0.8$ and $c = 1$. The endmembers and abundances estimated by the different unmixing methods are compared with the ground-truth references, and the resulting SAD and RMSE values are reported in Table 3 and Table 4, respectively. Compared with the other NMF methods, the proposed MLENMF shows better overall results. Figure 5 shows the estimated endmembers. The other methods fail to estimate the endmember 'Roof' well, while our MLENMF generates a spectral curve similar to the reference signature. From the abundance maps in Figure 6, the maps of MLENMF are more consistent with the reference maps than those of the comparison methods.
To test the unmixing performance of the different methods in the presence of noisy bands, we also calculate the SAD and RMSE for the Urban data with all 210 bands; the results are shown in Table 5 and Table 6, respectively. The parameters of MLENMF in this case are set as $\xi = 0.4$ and $c = 10$. Even with the known bad bands included, our MLENMF provides the best results.
The Jasper data were collected by the AVIRIS sensor, covering a spectral range of 380 to 2500 nm with a total of 224 spectral bands, 26 of which are noisy. The spectral resolution is 9.46 nm, and the image size is 100 × 100 pixels. The image mainly contains four materials: Tree, Water, Soil, and Road. The parameters of MLENMF are set as $\xi = 0.4$ and $c = 1$. The SAD and RMSE results of the different unmixing methods on this data set are shown in Table 7 and Table 8, respectively, and Figure 7 and Figure 8 show the estimated endmembers and abundance maps. These results show that the proposed MLENMF provides more accurate estimates of both the endmembers and the abundances.

5. Discussion

As described in Section 4.2, $\tau$ is the $(100\xi)$-th percentile of the residual vector $\tilde{e} = [e_1^2, \dots, e_M^2]^T$, and $\gamma = c/\tau$ with $c \in (0, 10]$ and $\xi \in (0, 1]$. By tuning the parameters $c$ and $\xi$, the MLE objective function in Equation (25) can be truncated, as shown in Figure 9. The parameters $c$ and $\xi$ control the decreasing rate and the location of the truncation point, respectively: the larger the value of $c$, the stronger the truncation; the smaller the value of $\xi$, the earlier the truncation point occurs. As shown in Figure 9, when the noise or residual is large, it is better to choose a larger $c$ and a smaller $\xi$, which truncate the weight of large residuals to a constant (see the red dotted line).
We take the Urban data set as an example to show the effect of the parameters $c$ and $\xi$. Figure 10 shows the SAD results of MLENMF on the Urban data with 210 bands. The results in Figure 10a are obtained by fixing $\xi = 0.4$ and varying $c$ over $\{0.1, 0.5, 1, 2, 5, 10\}$. When $\xi$ is fixed, larger $c$ values yield better unmixing results. As shown in Figure 9, $c$ affects the degree of truncation: with a large $c$, the loss for large errors is truncated to a constant (e.g., the objective function values are constant for errors larger than 1.5, shown as the red solid line in Figure 9), so such errors no longer influence the model. For the Urban data with all 210 bands, MLENMF with a larger $c$ can therefore effectively alleviate the effect of noisy bands. Fixing $c = 10$ and varying $\xi$ over $\{0.1, 0.2, 0.4, 0.6, 0.8, 1\}$, Figure 10b shows the SAD of MLENMF versus $\xi$. When $c$ is fixed, it is better to set $\xi$ in the interval [0.4, 0.8]. The parameter $\xi$ determines the ratio of inliers; because the data contain noisy bands, the value of $\xi$ should be less than 1.
When the known noisy bands of the Urban data are removed, the experimental results on the Urban data with 162 bands are obtained by fixing $\xi = 0.8$ (Figure 11a) and $c = 1$ (Figure 11b). The results are shown in Figure 11. From Figure 11a, the proposed MLENMF is not sensitive to the parameter $c$: for low-noise or noise-free data, different $c$ values produce similar losses for small errors, as shown in Figure 9. From Figure 11b, the best result is achieved at $\xi = 1$, which indicates that almost all bands are inliers.
Based on the above analysis, we recommend setting the parameter $\xi$ in the interval [0.4, 0.8]; for data with heavy noise, a small value such as $\xi = 0.4$ can be used. The parameter $c$ can be chosen in the interval [1, 10]; for data with heavy noise, one can set $c = 10$, and otherwise a moderate value $c = 1$ is recommended.

6. Conclusions

This paper proposes a maximum likelihood estimation-based nonnegative matrix factorization (MLENMF) model for hyperspectral unmixing. The proposed MLENMF employs an MLE-like loss function that replaces the least-squares loss in the NMF model. The MLE-like loss is robust: it truncates the objective function value of noisy points and thereby reduces their negative effect on the unmixing model. Experimental results on a simulated data set and two real data sets (Urban and Jasper) show that the proposed MLENMF model has an obvious noise-suppression effect and obtains more accurate unmixing results. The current model assumes that the different bands are independent so that an MLE solution can be deduced; in practice, this assumption is not generally valid, and taking the dependence between bands into account may further improve the unmixing performance. This issue deserves further research. In addition, parameter selection is a key problem for the unmixing model. A cross-validation strategy could be considered for parameter selection, for example by dividing the whole hyperspectral image into two disjoint subimages, one for training and one for testing, and then performing cross-validation to select the parameters automatically. This also deserves future research.

Author Contributions

Conceptualization, Q.J., J.P., Y.D., Y.S. and M.Y.; Methodology, Q.J., J.P., Y.D. and M.Y.; Software, Q.J., J.P. and Y.D.; Validation, Q.J., Y.D., Y.S. and M.Y.; Formal Analysis, Q.J., Y.D., Y.S. and M.Y.; Investigation, Q.J., Y.D. and M.Y.; Resources, J.P. and Y.S.; Data Curation, Q.J., J.P. and Y.D.; Writing—Original Draft Preparation, Q.J., Y.D. and M.Y.; Writing—Review and Editing, Q.J., J.P., Y.D. and Y.S.; Visualization, Q.J., Y.D. and Y.S.; Supervision, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grants 61871177 and 11771130, and by the National Key Research and Development Program of China (No. 2020YFA0714200).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.M.; Chanussot, J. Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36.
  2. Zhou, Y.; Peng, J.; Chen, C.L.P. Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1082–1095.
  3. Ghamisi, P.; Yokoya, N.; Li, J.; Liao, W.; Liu, S.; Plaza, J.; Rasti, B.; Plaza, A. Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art. IEEE Geosci. Remote Sens. Mag. 2017, 5, 37–78.
  4. Bioucas-Dias, J.M.; Plaza, A.; Dobigeon, N.; Parente, M.; Du, Q.; Gader, P.; Chanussot, J. Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2012, 5, 354–379.
  5. Heylen, R.; Parente, M.; Gader, P. A review of nonlinear hyperspectral unmixing methods. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2014, 7, 1844–1868.
  6. Peng, J.; Zhou, Y.; Sun, W.; Du, Q.; Xia, L. Self-paced nonnegative matrix factorization for hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2021, 59, 1501–1515.
  7. Iordache, M.D.; Bioucas-Dias, J.M.; Plaza, A. Total variation spatial regularization for sparse hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2012, 50, 4484–4502.
  8. Hong, D.; Yokoya, N.; Chanussot, J.; Zhu, X.X. An augmented linear mixing model to address spectral variability for hyperspectral unmixing. IEEE Trans. Image Process. 2018, 28, 1923–1938.
  9. Lee, D.D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Proceedings of the 13th International Conference on Neural Information Processing Systems, Denver, CO, USA, 27 November–2 December 2000; pp. 556–562.
  10. Miao, L.; Qi, H. Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization. IEEE Trans. Geosci. Remote Sens. 2007, 45, 765–777.
  11. Jia, S.; Qian, Y. Constrained nonnegative matrix factorization for hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2009, 47, 161–173.
  12. Liu, X.; Xia, W.; Wang, B.; Zhang, L. An approach based on constrained nonnegative matrix factorization to unmix hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2011, 49, 757–772.
  13. Qian, Y.; Jia, S.; Zhou, J.; Robles-Kelly, A. Hyperspectral unmixing via L1/2 sparsity-constrained nonnegative matrix factorization. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4282–4297.
  14. Lu, X.; Wu, H.; Yuan, Y.; Yan, P.; Li, X. Manifold regularized sparse NMF for hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2013, 51, 2815–2826.
  15. Wang, N.; Du, B.; Zhang, L. An endmember dissimilarity constrained non-negative matrix factorization method for hyperspectral unmixing. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2013, 6, 554–569.
  16. Peng, J.; Sun, W.; Li, W.; Li, H.; Meng, X.; Ge, C.; Du, Q. Low-rank and sparse representation for hyperspectral image processing: A review. IEEE Geosci. Remote Sens. Mag. 2021.
  17. He, W.; Zhang, H.; Zhang, L. Sparsity-regularized robust non-negative matrix factorization for hyperspectral unmixing. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2016, 9, 4267–4279.
  18. Du, L.; Li, X.; Shen, Y. Robust nonnegative matrix factorization via half-quadratic minimization. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012; pp. 201–210.
  19. Wang, Y.; Pan, C.; Xiang, S.; Zhu, F. Robust hyperspectral unmixing with correntropy-based metric. IEEE Trans. Image Process. 2015, 24, 4027–4039.
  20. Huang, R.; Li, X.; Zhao, L. Spectral-spatial robust nonnegative matrix factorization for hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8235–8254.
  21. Kong, D.; Ding, C.; Huang, H. Robust nonnegative matrix factorization using L21-norm. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, Glasgow, UK, 24–28 October 2011; pp. 673–682.
  22. Peng, J.; Sun, W.; Jiang, F.; Chen, H.; Zhou, Y.; Du, Q. A general loss based nonnegative matrix factorization for hyperspectral unmixing. IEEE Geosci. Remote Sens. Lett. 2020.
  23. Peng, J.; Li, L.; Tang, Y.Y. Maximum likelihood estimation based joint sparse representation for the classification of hyperspectral remote sensing images. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1790–1802.
  24. Yang, M.; Zhang, L.; Yang, J.; Zhang, D. Robust sparse coding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 625–632.
  25. Xu, Z.; Guo, H.; Wang, Y.; Zhang, H. Representative of L1/2 regularization among Lq (0 < q < 1) regularizations: An experimental study based on phase diagram. Acta Autom. Sin. 2012, 38, 1225–1228.
  26. Zhang, J.; Jin, R.; Yang, Y.M.; Hauptmann, A.G. Modified logistic regression: An approximation to SVM and its applications in large-scale text categorization. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003; pp. 888–895.
  27. Li, X.; Lu, Q.; Dong, Y.; Tao, D. Robust subspace clustering by Cauchy loss function. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2067–2078.
  28. Feng, X.; Li, H.; Li, J.; Du, Q.; Plaza, A.; Emery, W. Hyperspectral unmixing using sparsity-constrained deep nonnegative matrix factorization with total variation. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6245–6257.
Figure 1. MLE weights fit the $\ell_{2,1}$ weight (a), maximum correntropy weight (b), and Huber weight (c).
Figure 2. Comparison of the objective function (a) and influence function (b) between MLE and LS.
Figure 3. The reference and estimated spectra for endmember 1 of the simulated data.
Figure 4. The reference and estimated abundance maps for endmember 1 of the simulated data.
Figure 5. Comparison of endmembers estimated by different methods for the Urban data with 162 bands for different materials: (a) Asphalt Road, (b) Grass, (c) Tree, (d) Roof.
Figure 6. Comparison of abundance maps estimated by different algorithms for the Urban data with 162 bands for different materials: (a) Asphalt Road, (b) Grass, (c) Tree, (d) Roof.
Figure 7. Comparison of endmembers estimated by different methods on the Jasper data for different materials: (a) Tree, (b) Water, (c) Soil, (d) Road.
Figure 8. Comparison of abundances estimated by different methods on the Jasper data for different materials: (a) Tree, (b) Water, (c) Soil, (d) Road.
Figure 9. Comparison of the MLE objective function and LS under different parameters.
Figure 10. The SAD results under different parameter settings on the Urban data with 210 bands. (a) SAD versus $c$ at $\xi = 0.4$. (b) SAD versus $\xi$ at $c = 10$.
Figure 11. The SAD results under different parameter settings on the Urban data with 162 bands. (a) SAD versus $c$ at $\xi = 0.8$. (b) SAD versus $\xi$ at $c = 1$.
Table 1. The SAD results of different unmixing methods for simulated data.

| $\mu$ | NMF | $\ell_{1/2}$-NMF | $\ell_{2,1}$-NMF | CENMF | CIMNMF | HubNMF | MLENMF |
|---|---|---|---|---|---|---|---|
| 5 | 0.4972 | 0.4754 | 0.4324 | 0.4088 | 0.4732 | 0.4718 | 0.3895 |
| 10 | 0.3513 | 0.3146 | 0.2901 | 0.2813 | 0.3093 | 0.3086 | 0.2537 |
| 15 | 0.1997 | 0.1749 | 0.1764 | 0.1626 | 0.1781 | 0.1772 | 0.1134 |
| 20 | 0.0988 | 0.0950 | 0.0936 | 0.0865 | 0.0985 | 0.0946 | 0.0689 |
Table 2. The RMSE results of different unmixing methods for simulated data.

| $\mu$ | NMF | $\ell_{1/2}$-NMF | $\ell_{2,1}$-NMF | CENMF | CIMNMF | HubNMF | MLENMF |
|---|---|---|---|---|---|---|---|
| 5 | 0.2679 | 0.2763 | 0.2737 | 0.2586 | 0.2653 | 0.2650 | 0.2459 |
| 10 | 0.2491 | 0.2554 | 0.2497 | 0.2345 | 0.2435 | 0.2448 | 0.2114 |
| 15 | 0.1951 | 0.1956 | 0.1983 | 0.1811 | 0.1954 | 0.1920 | 0.1440 |
| 20 | 0.1137 | 0.1219 | 0.1233 | 0.1145 | 0.1208 | 0.1203 | 0.0599 |
Table 3. The SAD results of different methods for Urban data with 162 bands.

| Material | NMF | $\ell_{1/2}$-NMF | $\ell_{2,1}$-NMF | CENMF | CIMNMF | HubNMF | MLENMF |
|---|---|---|---|---|---|---|---|
| Asphalt | 0.1587 | 0.1079 | 0.1576 | 0.1575 | 0.1634 | 0.1601 | 0.1127 |
| Grass | 0.2954 | 0.2477 | 0.2983 | 0.2857 | 0.2922 | 0.2826 | 0.0543 |
| Tree | 0.1919 | 0.1058 | 0.1917 | 0.1943 | 0.1938 | 0.1923 | 0.1011 |
| Roof | 0.6709 | 0.5748 | 0.6841 | 0.6501 | 0.6727 | 0.6534 | 0.0914 |
| Mean | 0.3292 | 0.2591 | 0.3329 | 0.3219 | 0.3305 | 0.3221 | 0.0899 |
Table 4. The RMSE results of different methods for Urban data with 162 bands.

| Material | NMF | $\ell_{1/2}$-NMF | $\ell_{2,1}$-NMF | CENMF | CIMNMF | HubNMF | MLENMF |
|---|---|---|---|---|---|---|---|
| Asphalt | 0.2126 | 0.2843 | 0.2123 | 0.2114 | 0.2126 | 0.2127 | 0.1383 |
| Grass | 0.2386 | 0.2683 | 0.2395 | 0.2391 | 0.2394 | 0.2385 | 0.1318 |
| Tree | 0.1692 | 0.1387 | 0.1693 | 0.1699 | 0.1693 | 0.1691 | 0.0605 |
| Roof | 0.1574 | 0.1718 | 0.1568 | 0.1574 | 0.1569 | 0.1574 | 0.0538 |
| Mean | 0.1945 | 0.2158 | 0.1945 | 0.1945 | 0.1945 | 0.1944 | 0.0961 |
Table 5. The SAD results of different methods for Urban data with 210 bands.

| Material | NMF | $\ell_{1/2}$-NMF | $\ell_{2,1}$-NMF | CENMF | CIMNMF | HubNMF | MLENMF |
|---|---|---|---|---|---|---|---|
| Asphalt | 0.1229 | 0.1234 | 0.1295 | 0.1271 | 0.1345 | 0.1312 | 0.1481 |
| Grass | 0.4703 | 0.3544 | 0.4607 | 0.4854 | 0.4637 | 0.4618 | 0.1436 |
| Tree | 0.3197 | 0.2057 | 0.3139 | 0.3401 | 0.3035 | 0.3048 | 0.1979 |
| Roof | 0.4446 | 0.3048 | 0.4083 | 0.4705 | 0.4466 | 0.4348 | 0.3039 |
| Mean | 0.3394 | 0.2471 | 0.3281 | 0.3558 | 0.3371 | 0.3332 | 0.1984 |
Table 6. The RMSE results of different methods for Urban data with 210 bands.

| Material | NMF | $\ell_{1/2}$-NMF | $\ell_{2,1}$-NMF | CENMF | CIMNMF | HubNMF | MLENMF |
|---|---|---|---|---|---|---|---|
| Asphalt | 0.2443 | 0.3204 | 0.2453 | 0.2391 | 0.2428 | 0.2441 | 0.2659 |
| Grass | 0.3763 | 0.4655 | 0.3770 | 0.3761 | 0.3770 | 0.3769 | 0.2985 |
| Tree | 0.3374 | 0.3711 | 0.3381 | 0.3373 | 0.3380 | 0.3380 | 0.2679 |
| Roof | 0.2213 | 0.2377 | 0.2217 | 0.2155 | 0.2197 | 0.2208 | 0.2616 |
| Mean | 0.2948 | 0.3487 | 0.2955 | 0.2920 | 0.2944 | 0.2949 | 0.2735 |
Table 7. The SAD results of different methods for Jasper data.

| Material | NMF | $\ell_{1/2}$-NMF | $\ell_{2,1}$-NMF | CENMF | CIMNMF | HubNMF | MLENMF |
|---|---|---|---|---|---|---|---|
| Tree | 0.3040 | 0.2787 | 0.2981 | 0.2503 | 0.2189 | 0.3029 | 0.0851 |
| Water | 0.3156 | 0.1163 | 0.2799 | 0.2974 | 0.2968 | 0.2969 | 0.1841 |
| Soil | 0.2834 | 0.1660 | 0.3186 | 0.3871 | 0.2490 | 0.1872 | 0.0908 |
| Road | 0.6469 | 0.5341 | 0.6627 | 0.6953 | 0.7052 | 0.6961 | 0.2269 |
| Mean | 0.3875 | 0.2738 | 0.3898 | 0.4075 | 0.3675 | 0.3708 | 0.1468 |
Table 8. The RMSE results of different methods for Jasper data.

| Material | NMF | $\ell_{1/2}$-NMF | $\ell_{2,1}$-NMF | CENMF | CIMNMF | HubNMF | MLENMF |
|---|---|---|---|---|---|---|---|
| Tree | 0.2310 | 0.3032 | 0.2711 | 0.2251 | 0.2287 | 0.2350 | 0.1055 |
| Water | 0.1563 | 0.1726 | 0.1738 | 0.1623 | 0.1502 | 0.1508 | 0.1029 |
| Soil | 0.3382 | 0.3789 | 0.3342 | 0.3209 | 0.3045 | 0.3207 | 0.2510 |
| Road | 0.2385 | 0.2634 | 0.2534 | 0.2639 | 0.2351 | 0.2337 | 0.2350 |
| Mean | 0.2410 | 0.2796 | 0.2581 | 0.2430 | 0.2296 | 0.2351 | 0.1736 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
