Hyperspectral Nonlinear Unmixing by Using Plug-and-Play Prior for Abundance Maps

Spectral unmixing (SU) aims at decomposing the mixed pixel into basic components, called endmembers with corresponding abundance fractions. Linear mixing model (LMM) and nonlinear mixing models (NLMMs) are two main classes to solve the SU. This paper proposes a new nonlinear unmixing method base on general bilinear model, which is one of the NLMMs. Since retrieving the endmembers’ abundances represents an ill-posed inverse problem, prior knowledge of abundances has been investigated by conceiving regularizations techniques (e.g., sparsity, total variation, group sparsity, and low rankness), so to enhance the ability to restrict the solution space and thus to achieve reliable estimates. All the regularizations mentioned above can be interpreted as denoising of abundance maps. In this paper, instead of investing effort in designing more powerful regularizations of abundances, we use plug-and-play prior technique, that is to use directly a state-of-the-art denoiser, which is conceived to exploit the spatial correlation of abundance maps and nonlinear interaction maps. The numerical results in simulated data and real hyperspectral dataset show that the proposed method can improve the estimation of abundances dramatically compared with state-of-the-art nonlinear unmixing methods.


Introduction
Hyperspectral remote sensing imaging is a combination of imaging technology and spectral technology. It can obtain two-dimensional spatial information and spectral information of target objects simultaneously [1][2][3]. Benefiting from the rich spectral information, hyperspectral images (HSIs) can be used to identity materials precisely. Hence, HSIs have been playing a key role in earth observation and used in many fields, including mineral exploration, water pollution, and vegetation [3][4][5][6][7][8][9]. However, due to the low spatial resolution, mixed pixels always exist in HSIs, and it is one of the main reasons that preclude the widespread use of HSIs in precise target detection and classification applications. So it is necessary to develop the technique of unmixing [2,3,[10][11][12][13][14]. Thanks to the rich band information of hyperspectral images, which allows us to design an effective solution to the problem of mixed pixels. Hyperspectral unmixing (HU) is the process of obtaining the basic components (called endmembers)

•
We exploit spatial correlation of abundance maps through plug-and-play technique. The idea of the plug-and-play technique was firstly applied to the problem of hyperspectral nonlinear unmixing. We propose a general nonlinear unmxing framework that can be embedded with any state-of-the-art denoisers. • We tested two state-of-the-art denoisers, namely BM3D and DnCNN, and both of them yield more accurate estimates of abundances than other state-of-the-art GBM-based nonlinear unmixing algorithms.
The rest of the article is structured as follows. Section 2 introduces the related works and the proposed plug-and-play prior based hyperspectral nonlinear unmixing framework. Experimental results and analysis for the synthetic data are illustrated in Section 3. The real hyperspectral dataset experiments and analysis are described in Section 4. Section 5 concludes the paper.

Symbols and definitions
We first introduce the notation and definitions used in the paper. An nth-order tensor is identified using Euler-cript letters-for example, Q ∈ R k 1 ×k 2 ×...×k i ×...×k n , with the k i is the size of the corresponding dimension i. Hence, an HSI can be naturally represented as a third-order tensor, T ∈ R k 1 ×k 2 ×k 3 , which consists of k 1 × k 2 pixels and k 3 spectral bands. Three further definitions related to tensors are given as follows.
Definition 1. The dimension of a tensor is called the mode: Q ∈ R k 1 ×k 2 ×...×k i ×...×k n has n modes. For a third-order tensor T ∈ R k 1 ×k 2 ×k 3 , by fixing one mode, we can obtain the corresponding sub-arrays, called slices-for example, T :,:,i . Definition 2. The 3-mode product is denoted as G = Q × 3 X ∈ R k 1 ×k 2 ×j for a tensor Q ∈ R k 1 ×k 2 ×k 3 and a matrix X ∈ R j×k 3 . Definition 3. Given a matrix A ∈ R k 1 ×k 2 and vector c ∈ R l 1 , their outer product, denoted as A • c, is a tensor with dimensions (k 1 , k 2 , l 1 ) and entries (A • c) i1,i2,j1 = A i1,i2 c j1 .

Nonlinear Model: GBM
A general expression of nonlinear mixing models, considering the second-order photon interactions between different endmembers, is given as follows: where the y ∈ R L×1 is a pixel with L spectral bands. C = [c 1 , c 2 , ..., c R ] ∈ R L×R , a = [a 1 , a 2 , ..., a R ] T ∈ R R×1 , and n ∈ R L×1 represent the mixing matrix containing the spectral signatures of R endmembers, the fractional abundance vector, and the white Gaussian noise, respectively. The nonlinear coefficient b i,j controls the nonlinear interaction between the materials, and is a Hadamard product operation. With different specific definitions of b i,j , there are several well-known mixture models, such as GBM [15], FM [1], and PPNM [16].
To satisfy the physical assumptions and overcome the limitations of the FM [1], the GBM redefines the parameter b i,j as b i,j = γ i,j a i a j . Meanwhile, the abundance non-negativity constraint (ANC) and the abundance sum-to-one constraint (ASC) are satisfied as follows: The spectral mixing model for N pixels can be written in matrix form: where Y = [y 1 , y 2 , ..., , and N ∈ R L×N represent the observed hyperspectral image matrix, the fractional abundance matrix with N abundance vectors (the columns of A), the bilinear interaction endmember matrix, the nonlinear interaction abundance matrix, and the white Gaussian noise matrix, respectively. As for Equations (1) and (3), both of them model the the hyperspectral image with two-dimensional matrix for processing, thus destroying the internal spatial structure of the data and resulting in poor abundance inversion. However, given that the hyperspectral images can be naturally represented as a third-order tensor, we rewritten the GBM model based on tensor representation for the original hyperspectral image cube. The hyperspectral image cube Y ∈ R n row ×n col ×L can be expressed in the following format: where A ∈ R n row ×n col ×R , B ∈ R n row ×n col ×R(R−1)/2 , and N ∈ R n row ×n col ×L denote the abundance cube corresponding to R endmembers, the nonlinear interaction abundance cube, and the white Gaussian noise cube, respectively. This work aims to solve a supervised unmixing problem-that is to estimate the abundances, A, and nonlinear coefficients, B, given the spectral signatures of the endmembers, C, which are known beforehand.

Motivation
In this paper, we firstly apply the plug-and-play technique to the unmixing problem, especially to the abundance maps and interaction abundance maps for enhancing the accuracy of the estimated abundance results. The plug-and-play technique can be used as the prior information, instead of other convex regularizers [21,22].
The performance of this method is constrained by the denoiser. Two state-of-the-art denoisers, BM3D and DnCNN, are chosen for the prior information of the abundance maps [43,44]. BM3D is well-known nonlocal patch-based denoiser, which can remove noise in a natural image by taking advantage of high spatial correlation of similar patches in the image. As geographic hyperspectral data, the materials in HSIs tend to be spatially dependent, so it is very easy to find similar patches from the images. Meanwhile, the spatial distribution of a single material tends to be aggregated instead of being purely random. The texture structure of abundance maps can be illustrated with an example given in Figure 1. The unmixing of a San Diego Airport image of size 160 × 140 pixels was carried out. The first row in Figure 1 shows the abundance map of 'Ground & road' estimated by the FCLS [45] followed by an endmember estimation step (vertex component analysis (VCA) [46]). As shown in Figure 1, we can find many similar patches (marked with small yellow squares) from the abundance map. Hence, this nonlocal patch-based denoiser can be used on the abundance maps.  With the development of deep learning, convolutional neural network (CNN) based denoising methods have achieved good results. Specifically, deep network structure can effectively learn the features of images. Hence, in the paper, we also chose a well-known CNN-based denoiser as the prior of the abundance maps, named DnCNN (shown in in Figure 1). DnCNN can handle zero-mean Gaussian noise with unknown standard deviation, and residual learning is adopted to separating noise from noisy observations. Therefore, this method can effectively capture the texture structure of abundance maps.

Proposed Method: Unmixing with Nonnegative Tensor Factorization and Plug-and-Play Proir
To better represent the structure of abundance maps, mixing model (4) can be equivalently written as where A :,:,i ∈ R n row ×n col , c i ∈ R L×1 , B :,:,j ∈ R n row ×n col , and m j ∈ R L×1 denote the ith abundance slice, the ith endmember vector, the jth interaction abundance slice, and the jth interaction endmember vector, respectively. Model (5) is depicted in Figure 2.  To take full advantage of the abundance maps' prior, we propose a new unmixing method based on the Plug-and-Play (PnP) framework of abundance maps and Nonnegative Tensor Factorization, termed PnP-NTF, which aims to solve the following optimization problem: arg min where X 2 F denotes the Frobenius norm which returns the square root of the sum of the absolute squares of its elements. The symbol ψ(.) represents the plugged state-of-the-art denoiser, and 1 d represents a vector whose components are all one and whose dimension is given by its subscript.

Optimization Procedure
The optimization problem in (6) can be solved by optimization using the alternating direction method of multipliers (ADMM) [47]. To use the ADMM, first (6) is converted into an equivalent form by introducing multiple auxiliary variables V i , E j to replace A :,:,i (i = 1, ..., R), B :,:,j (j = 1, ..., R(R − 1)/2). The formulation is as follows: arg min By using the Lagrangian function, (7) can be reformulated as: where D i , H j and G are scaled dual variables [48], and µ is the penalty parameter. The variables A :,:,i , B :,:,j , V i , E j , D i , H j , G were updated sequentially: this step is shown in Algorithm 1. The solution of optimization is detailed below.
Update nonlinear map slice: B k+1 :,:,j = ( Update multiple auxiliary variable: Update multiple auxiliary variable: Updating of A The optimization problem for A :,:,i is B k :,:,j • m j ∈ R n row ×n col ×L , and O :,:,b is the bth slice.
Hence the solution for A :,:,i can be derived as follows: 2.
Updating of B The optimization problem for B :,:,j is B k+1 :,:,j = arg min B k :,:,j • m j ∈ R n row ×n col ×L , and K :,:,b is the bth slice.

Updating of V
The optimization problem for V i is 4.

Updating of E
The optimization problem for E j is where E j = B k+1 :,:,j − H k j ∈ R n row ×n col . Sub-problem (15) can be solved via PnP framework of E j , then E k+1 j can be expressed as 5. 6.
Updating of H H k+1

7.
Updating of G

Experiments and Analysis on Synthetic Data
In this section, we illustrate the performance of the proposed PnP-NTF framework on the two state-of-the-art denoising method, named BM3D and DnCNN, for the abundance estimation. We compare the proposed method with some advanced algorithms to address the GBM, which contains gradient descent algorithm (GDA) [49], the semi-nonnegative matrix factorization (semi-NMF) [50] algorithm and subspace unmixing with low-rank attribute embedding algorithm (SULoRA) [11]. For specifically, the GDA method is a benchmark to solve the GBM pixel by pixel, and semi-NMF can speed up and reduce the time loss. Meanwhile, the semi-NMF based method can consider the partial spatial information of the image. SULoRA is a general subspace unmixing method that jointly estimates subspace projections and abundance, and can model the raw subspace with low-rank attribute embedding. All of the experiments were conducted in MATLAB R2018b on a desktop of 16 GB RAM, Intel (R) Core (TM) i5-8400 CPU, @2.80 GHz.
In order to quantify the effect of the proposed method numerically, three widely metrics, including the root-mean-square error (RMSE) of abundances, the image reconstruction error (RE), and the average of spectral angle mapper (aSAM) are used. For specifically, the RMSE quantifies the difference between the estimated abundance A and the true abundances A as follows: The RE measures the difference between the observation Y and its reconstruction Y as follows: The aSAM qualifies the average spectral angle mapping of the estimated ith spectral vectorŷ l and observed ith spectral vector y. The aSAM is defined as follows:

Data Generation
In the simulated experiments, the synthetic data was generated similar to References [32,51], and the specific process is as follows: 1. Six spectral endmembers signals with 224 spectral bands were randomly chosen from the USGS digital spectral library (https://www.usgs.gov/labs/spec-lab), namely Carnallite, Ammonio-jarosite, Almandine, Brucite, Axinite, and Chlonte. 2. We generated a simulated image of size s 2 (rows) × s 2 (columns) × L(bands), which can be divided into small blocks of size s × s × L. 3. A randomly selected endmember was assigned to each block, and a k × k low-pass filter was used to generate abundance map cube of size s 2 × s 2 × R that contained mixed pixels, while satisfying the ANC and ASC constraints. 4. After obtaining the endmember information and the abundance information, then clean HSIs can be generated by the generalized bilinear model and the polynomial post nonlinear model. The interaction coefficient parameters in the GBM were set randomly, and the nonlinear coefficient parameters in the PPNM were set to 0.25. 5. To effectively evaluate the robustness performance of the proposed method on the different signal-to-noise ratio (SNR), the zero-mean Gaussian white noise was added to the clean data.

Evaluation of the Methods
The details of the simulated data can be obtained with the previous steps, then we generated a series of noisy images with SNRs = {15, 20, 30} dB to evaluate performance of the proposed method and compare with other methods.

Parameter Setting
To compare all the algorithms fairly, the parameters in the all compared methods were hand-tuned to the optimal values. Specifically, the FCLS was used to initialize the abundance information in the all methods (including the proposed method). Note that a direct compasion with FCLS unmixing results is unfair and FCLS is served as a benchmark, which shows the impact of using a linear unmixing method on nonlinear mixed images. The GDA is considered as the benchmark to solve the GBM.
The tolerances for stopping the iterations in GDA, Semi-NMF, and SULoRA were set to 1 × 10 −6 . For the proposed PnP-NTF framework based method, the parameters to be adjusted were divided into two parts, one of which is the parameter of the denoiser we chosen, and the other part is the penalty parameter µ. Firstly, the standard deviation of additive white Gaussian noise σ is searching from 0 to 255 with the step of 25, the the block size used for the hard-thresholding (HT) filtering is set as 8 in BM3D, respectively. The parameters of the DnCNN is the same as Reference [44]. Meanwhile, the penalty parameter µ was set to 8 × 10 −3 , and the tolerance for stopping the iterations was set to 1 × 10 −6 .

Comparison of Methods under Different Gaussian Noise Levels
In our experiments, we generate three images of size 64 × 64 × 224 with 4096 pixels and 224 bands. More specifically, the 'Scene1' is generated by the GBM model, and the 'Scene2' is generated by the PPNM model. The 'Scene3' is a mixture of the 'Scene1' and 'Scene2', as half pixels in 'Scene3' were generated by the GBM and the others were generated by PPNM [50]. The 'Scene1' is used to evaluate the efficiency of the proposed method to handle mixtures based on GBM, while the 'Scene2' and 'Scene3' were used to evaluate the robustness of the proposed method to mixtures based on different mixing models.
For the proposed method and the other methods, the abundances were initialized with the same method, that is FCLS. In a supervised nonlinear unmixing problem, the spectral vectors of endmember were known as a priori. Considering that the accuracy of abundance inversion depends on the quality of endmember signals, we used the true endmembers in the experiments for fair comparison. Table 1 quantifies the corresponding results of the three evaluation indicators (RMSE, RE, and aSAM) in detail on the 'Scene1'. As seen from the Table 1, the proposed PnP-NTF based framework with the advanced denoisers provide the superior unmixing results, compared with other methods. Specifically, we tested two state-of-the-art denoisers, namely BM3D and DnCNN, and both of them obtained the best performance. The RMSE, RE, and aSAM obtained minimum values from the proposed PnP-NTF based frameworks, which show the efficiency of the proposed methods is superior compared with other state-of-the-art methods (shown in bold). Figure 3 shows the results of the proposed algorithm and the others algorithms under three indexes (RMSE, RE, and aSAM). For the different levels of noise in 'Scene1', the proposed methods yield the superior performance in all indexes. Also we can see from the histogram of Figures 4-6 that the proposed methods obtain the minimum RMSEs in all scenes. To evaluate the robustness of the proposed methods against model error, we generated 'Scene2' and 'Scene3' of size 64 × 64 × 224. As shown in Tables 2 and 3, the proposed methods obtained the best estimate of abundances in terms of RMSE, RE, and aSAM (shown in bold). We cannot provide the proof of the convergence of the proposed algorithm, but the experimental results show that it is convergent when plugged by BM3D and DnCNN (shown in Figures 7 and 8).

Comparison of Methods under Denoised Abundance Maps
We make a series of experiments to show the difference between the proposed methods and the conventional unmixing methods (FCLS, GDA, and Semi-NMF) with afterwards denoising the calculated abundance maps by BM4D. The results in Tables 4-6 show that the denoised abundance maps provided by FCLS, GDA, and Semi-NMF can obtain a better results than corresponding original abundance maps. However, for the proposed methods, we use directly a state-of-the-art denoiser as the regularization, which is to exploit the spatial correlation of abundance maps. The results show that using plug-and-play prior for the abundance maps and interaction abundance maps can enhance the accuracy of the estimated abundance results efficiently.

Experiments and Analysis on Real Dataset
In this section, we use two real hyperspectral datasets to validate the performance of the proposed methods. Due to the lack of the groundtruth of abundances in the real scenes, the RE in (21) and the aSAM in (22) were used to test the unmixing performance of the all methods. The convergence of the proposed methods on the two real hyperspectral datasets are shown in Figure 9.

San Diego Airport
The first real dataset is called 'San Diego Airport' image, which was captured by the AVIRIS over San Diego. The original image of size 400 × 400 includes 224 spectral channels in the wavelength range of 370 nm to 2510 nm. After removing bands affected by water vapor absorption, 189 band are kept. For our experiments, a subimage of size 160 (rows)×140 (columns) (shown in Figure 10a) is chosen as the test image. The selected scene mainly contains five endmembers, namely 'Roof', 'Grass', 'Rround and Road', 'Tree', and 'Other' [52]. The subimage we chosen has been studied in Reference [52]. VCA [46] method is used to estiamte the endmebers. Meanwhile, the FCLS is used to initialize the abundances in all methods. The ASC constraint in the semi-NMF was set to 0.1. Two state-of-the-art denoisers, embedded in the proposed PnP-NTF-based framework were tested. For the BM3D denoiser, the standard deviation of noise was hand-tuned. For the DnCNN denoiser, its parameter was set in a same way as Reference [44]. The penalty parameter µ was set to 1 × 10 −4 . The tolerance for stopping the iterations was set to 1 × 10 −6 for all algorithms. Table 7 shows the performance of different unmixing methods in terms of RE and aSAM in the San Diego Airport image. Our proposed method obtains the best results. Figure 11 shows the estimated abundance maps obtained by all methods. Focusing on the abundance maps of 'Ground and Road', we can see that the roof area is regarded as a mixture of 'Roof' and 'Ground and Road' in the unmixing results of FCLS, GDA, Semi-NMF and SULoRA methods. In fact, the the roof area only contains endmember 'Roof'. Unmixing results of the proposed PnP-NTF-DnCNN/BM3D are more reasonable.
Furthermore, Figure 12 shows the distribution of the RE on the San Diego Airport. The bright areas in Figure 12 indicate large errors in the reconstructed images. The error map shows that the FCLS performed worst, because the FCLS only can handle the linear information but ignore the nonlinear information in the image. Meanwhile, the semi-NMF performed better than GDA because the GDA is a pixel-based algorithm that does not take any spatial information into consideration. Our method, exploiting self-similarity of abundance maps, can perform better than other methods.

Washington DC Mall
The second real dataset is called 'Washington DC Mall' image, which was acquired by HYDICE sensor over Washington DC, USA. The original image of size 1208 × 307 includes 210 spectral bands. Its spatial resolution is 3 m. After removing bands corrupted by water vapor absorption, 191 band are kept. There are seven endmembers in the image: 'Roof', 'Grass', 'Road', 'Trail', 'Water', 'Shadow', and 'Tree' [52]. We chose a subimage with 256 × 256 pixels for the experiments, called sub-DC (shown in Figure 10b). The Hysime [53] was firstly used to estimate the number of endmembers, then the VCA was used to extract the spectral information of endmembers. The extracted endmembers were named 'Roof1', 'Roof2', 'Grass', 'Road', 'Tree', and 'Trail'.
The parameters in the comparison methods were manually tuned to obtain optimal performance. The parameter setting of our methods was same as that in the 'San Diego Airport' image. Table 8 shows the results of the proposed method and the state-of-the-art methods in the 'Washington DC Mall' image. The proposed methods obtained the best results in terms of RE and aSAM. Figures 13 and 14 show the estimated abundance maps and the error maps, respectively. In Figure 14, the proposed methods show much smaller errors in the reconstructed images.

Conclusions
In this paper, we propose a new hyperspectral nonlinear unmixing framework, which takes advantage of spatial correlation (i.e., self-similarity) of abundance maps through a plug-and-play technique. The self-similarity of abundance maps is imposed on our objective function, which is solved by ADMM embedded with a denoising method based regularization. We tested two state-of-the-art denoising methods (BM3D and DnCNN). In the experiments with simulated data and real data, the proposed methods can obtain more accurate estimation of abundances than state-of-the-art methods. Furthermore, we tested the proposed method in case of the number of endmembers with 5, and obtained better results compared to other methods. However, with the growing of the number of endmembers, the difficulty of unmixing will also increase, which is our future research direction.