Kernel-Based Nonlinear Spectral Unmixing with Dictionary Pruning

Abstract: Spectral unmixing extracts subpixel information by decomposing observed pixel spectra into a collection of constituent spectra signatures and their associated fractions. Considering the restriction of linear unmixing model, nonlinear unmixing algorithms find their applications in complex scenes. Kernel-based algorithms serve as important candidates for nonlinear unmixing as they do not require specific model assumption and have moderate computational complexity. In this paper we focus on the linear mixture and nonlinear fluctuation model. We propose a two-step kernel-based unmixing algorithm to address the case where a large spectral library is used as the candidate endmembers or the sparse mixture case. The sparsity-inducing regularization is introduced to perform the endmember selection and the candidate library is then pruned to provide more accurate results. Experimental results with synthetic and real data, particularly those laboratory-created labeled, show the effectiveness of the proposed algorithm compared with state-of-art methods.


Introduction
Hyperspectral images integrate the spectrum representing radiant attributes of an observed scene with homological images standing for spatial and geometric relations.Thanks to the high spectrum resolution and rich spectral information, hyperspectral imaging has been widely used in applications ranging from forest mapping and monitoring, land cover change detection, mineral exploration, object identifying, etc.
Hyperspectral image analysis plays a key role in revealing information from high-dimensional data.In the last decade, spectral unmixing has received considerable attention.It aims to decompose the observed pixel spectra into a collection of constituent spectra signatures (called endmembers) and estimate fractions associated with each component (called abundances) [1].Spectral unmixing is particularly useful to analyze pixels with low spatial or with photon interactions among materials.Unmixing techniques have been extensively studied and most of them perform the endmember extraction and abundance estimation either in an independent or simultaneous manner.Provided that the pure pixels are present in the observed scene, methods such as pixel purity index algorithm (PPI) [2], vertex component analysis (VCA) [3] and the N-FINDR algorithm [4] are proposed for extracting endmembers.Meanwhile some algorithms consider to generate virtual endmembers and abundances due to the absence of pure pixels, for example, the minimum volume simplex analysis (MVSA) [5], the minimum volume constrained nonnegative matrix factorization [6].To pursue interpretable and tractable solutions, most algorithms are based on the linear mixture model (LMM).The LMM assumes that an observed pixel is a combination of signature spectra weighted by the abundances and to be physically meaningful, the problem is usually subject to two constraints: the abundance nonnegativity constraint (ANC) and the abundance sum-to-one constraint (ASC).The linear unmixing algorithms can be classified as the least squares principle [7], statistical framework [8,9], sparse regression [10,11], and independence component analysis (ICA) based algorithms [9,12,13].
Since the LMM may not be appropriate in some practical situations where the light is scattered by multiple reflective or interacted materials [14], the nonlinear mixture model (NLMM) then provides an alternative to overcome the limitation of the LMM.For instance, a specific class of nonlinear models referred to as bilinear models has been studied in [15][16][17][18][19] for modeling the second-order reflectance by introducing extra additional bilinear terms to LMM.The postnonlinear mixture model (PNMM) has also been introduced in [20,21] that considers an appropriate nonlinear function mapping from [0, 1] L into [0, 1] L [18].In [22], based on nonlinear models, the authors proposed a parameter-free unmixing algorithm where abundance fractions and a set of nonlinear parameters are under the specific constraints which can be satisfied by minimizing the penalty function.The above algorithms are presented based on certain models, and thus lack the flexibility.The neural networks based methods can be regarded as model-free methods.In [23], the authors designed a multi-layer perceptron neutral network combined with a Hopfield neural network to deal with nonlinear mixture.In [24] the auto-associative neural network for abundance estimation was introduced for nonlinear unmixing purposes consisting of dimension reduction stage and the mapping stage from feature to the abundance percentages.Furthermore, in [25] the authors proposed an end-to-end unmixing method based on the convolutional neural network to improve the accuracy where the spatial information is taken into account.The kernel-based methods can also be regarded as the model-free methods.
The kernel-based methods serve as one of the most popular tools in addressing nonlinear learning problems.These methods map the data to a feature space of higher dimension where the mapped data can be represented with a linear model in this space [26].It is important to note that finding the explicit mapping is however bypassed via the kernel trick [27][28][29][30][31].In [32,33], the authors proposed a model consisting of a linear mixture and a nonlinear fluctuation.The nonlinear fluctuation function characterizes the high-order interactions between endmembers and is restricted within a reproducing kernel Hilbert space (RKHS).The so called K-Hype and SK-Hype algorithms are proposed therein.In contrast to several other classes of algorithms, the kernel-based methods is independent of a specific observation model and has the generalization ability in multiple scenes.Thus, sebsequent models have been further studied in the literature.An 1 -spatial regularization term is added to the K-Hype problem in [34] to promote the piece-wise spatial continuity of the abundance maps.In [35,36], the nonlinear fluctuation term is also called the residual term and the associated algorithm is called the residual component analysis.Post-nonlinear models as well as some robust unmixing models that consider such a fluctuation are presented in [37].In [38], the authors extended this method by accounting for band-dependent and neighboring nonlinear contributions using separable kernels.Further, kernel-based nonnegative matrix factorization (NMF) techniques are studied to simultaneously capture nonlinear dependence features and estimate the abundance.
The above kernel-based algorithms has its inherent limitations.On one hand, these algorithms focus on the abundance estimation, and do not consider the extraction of the endmembers.On the other hand, the structures of the K-hype type algorithms and its variations impose a regularization with unclear physical interpretation on the abundance vector (as elaborated in Section 2) and the fluctuation is independent of the abundance fractions.In this work, we propose a kernel-based sparse nonlinear spectral unmixing.The nonlinear unmixing algorithm is designed to run with a large number of candidate spectral signatures.A sparse regularization step and a dictionary pruning step are conducted in a sequential manner: the former select endmembers with significant contributions; and the latter then performs the abundance estimation with a more proper optimization problem using the pruned endmember dictionary.The contributions of this work are summarized as follows:

1.
A kernel based sparse nonlinear unmixing problem is formulated and a two-step solving strategy is proposed.This strategy allows to use a spectral library for selecting the endmembers and bypass the problem of endmember extraction in the nonlinear unmixing.

2.
A more reasonable formulation of the optimization problem is proposed for solving the linear mixture/nonlinear fluctuation model.This formulation improves the K-Hype formulation in several aspects and serves as a key component in the proposed sparse unmixing scheme.

3.
The algorithm is tested using real data with ground-truth created in our laboratory.Lack of publicly available data sets with ground truth imposes difficulties to compare unmixing algorithms.Most of the existing works rely on the use of numerically produced synthetic data and real data without ground truth.Using a labeled real data provides a more meaningful comparison results.
The remainder of this paper is organized as follows.Section 2 introduces the related work based on kernel methods.The proposed basic kernel-based algorithm is described in Section 3. Section 4 describes experimental results with simulated hyperspectral data and real hyperspectral data sets.Finally the conclusion is given in Section 5.

Kernel-Based Nonlinear Abundance Estimation
Notation.Scalars are denoted by italic letters.Vectors and matrices are denoted by boldface small and capital letters respectively.Specifically, if each pixel of the hyperspectral image consists of a reflectance vector in L contiguous spectral bands, then r = [r 1 , r 2 , . . ., r L ] ∈ R L is an observed pixel, M = [m 1 , m 2 , . . ., m L ] ∈ R L×R is the endmember matrix which is a spectral library consisting of spectral signatures m i with R numbers of endmembers, and m λ is the vector of R endmember signatures at the -th wavelength bands of M, α = [α 1 , x 2 , . . ., α R ] ∈ R R is the abundance vector.
In this section, we first review the general unmixing model and the kernel-based linear mixture/nonlinear fluctuation model proposed in [33].A general mixing mechanism can be formulated as where ψ is an unknown function that defines the interactions between the endmembers in matrix M parameterized by their associated abundance fractions α, and n is the modeling noise.Though general enough, this strategy may fail if the function ψ cannot be adequately and finitely parameterized.
A semi-parametric model proposed in [33] is described by: This model is composed by a linear mixture term and a nonlinear fluctuation term defined by ψ nlin .Several useful nonlinear mixing model can be considered as a specific case of (2).For instance, (2) reduces to a bilinear model if the second-order polynomial is used for ψ nlin .Models and algorithms of the residual component analysis follows the same principle [35][36][37].Assume that the endmember matrix M is known, and ψ nlin (m λ ) is a real-valued function of a reproducing kernel Hilbert space H, endowed with the reproducing kernel κ, i.e., Selecting a proper kernel is essential for describing the mixture.For instance, the second-order homogeneous polynomial kernel is able to represent the second-order interactions between endmembers; and the Gaussian kernel involves an infinity order of interactions since it can be expanded as the sum of polynomials of all orders.Then [33] proposes to evaluate the abundances and the nonlinear function by solving the following optimization problem: where µ is a positive parameter.The above problem minimizes the reconstruction error, as well as the regularity of ψ nlin and α is characterized by the squared 2 -norm with the ANC and ASC.This convex problem can be solved via its duality theory and leads to the so-called K-Hype algorithm.The -th element of this pixel is then reconstructed by where * denotes the optimal estimates.K-Hype has been shown efficient in addressing several nonlinear mixture scenarios.However, it has the following restrictions: • R1: While the 2 -regularization on ψ nlin controls it regularity, the 2 -regularization on α does not possess a clear interpretation, as there is no reason to consider that an abundance vector is having a small 2 -norm.• R2: Problem (6) aims to estimate the abundances with the known endmember matrix.Considering to use a large spectral library as candidates, specific variants for extracting active endmembers are needed.
Facts R1 and R2 motivate us to derive an improved model and the associated unmixing algorithms by modfiying the 2 -regularization and considering an endmember selection strategy.

Kernel-Based Nonlinear Unmixing Problem
In this section, we derive a new sparse kernel-based nonlinear unmixing algorithm with the data model (2).We shall first consider R1 by removing the 2 -regularization of the abundances and devise the associated algorithm.Thereafter, we consider R2 by proposing a sparse unmixing technique consisting of an 1 -regularization step and a dictionary prunning step.

Nonlinear Unmixing with the Regularity Constraint on ψ nlin .
Under the assumption that the endmember matrix M is known, we estimate the abundance vector α and to infer the nonlinear function ψ nlin ∈ H by solving the following functional optimization problem, that is, In contrast to (6), the 2 -norm regularization of α is discarded, then the optimal α * can not be explicitly expressed with the dual variables as in ( [33] Equation ( 18)).In this case, resorting to the semi-parametric Representation Theorem [39] leads to the following expression for the nonlinear function ψ nlin : By substituting ( 9) to ( 8), we get the following problem in a quadratic form with respect to parameters α and β: where K is a Gram matrix with (k, )-th entry being κ(m λ k , m λ ).Then the abundance vector can be evaluated by solving the problem (10) under the ASC and ANC via a quadratic programming solver.

Kernel-Based Sparse Nonlinear Unmixing with Dictionary Pruning
In this subsection, we consider a kernel-based model with pruning dictionary via a two-step method.It is possible to use a large spectral library as a dictionary, and then select a few elements as the endmembers that contribute to the mixture.This way also provides a solution to the simultaneous endmember extraction and abundance estimation for the nonlinear unmixing.

Problem Formulation
We first consider to add 1 -norm regularization to (8) and to promote the sparsity of estimated abundances.The optimization problem is then formulated as Note that the ASC is ignored since it is contradictory to the 1 -norm regularization.In order to solve (11), we introduce an auxiliary variable ζ and reformulate the problem to an equivalent form: where the I R + (•) is the indicator function such that I R + (•) is 0 if its argument belongs to the nonnegative orthant and +∞ otherwise.This new variable ζ allows us to decouple the non-smooth 1 -norm regularization functional from the constrained problem.As studied in [40], the split-Bregman iteration algorithm is an efficient method to deal with a broad class of 1 -regularized problems.By applying this framework to (12), the following formulation is obtained: and Because of the way that we have split the components of the cost function, we can now perform the above minimization efficiently by iteratively minimizing with respect to α and ψ nlin , and ζ separately.The two steps that we have to perform are as follows: 1.
Optimization with respect to α and ψ nlin : Discarding irrelevant variables, the optimization problem (13) reduces to min By introducing the Lagrange multipliers {β} L =1 , the Lagrange function associated with the problem (15) can be written as The conditions for optimality of L 1 with respect to the primal variables are given by where . By substituting ( 17) into (16), this results in a quadratic form with respect to the Lagrange multiplier and we get the following dual problem: This is a quadratic program and the updated value β (k+1) is readily obtained by Clearly, matrix K + µI + 1 ρ M M is positive-definite and invertible.

2.
Optimization with respect to ζ: Discarding irrelevant variables, the problem reduced to Its solution can be expressed by the well-known soft threshold function where S(•, •) denotes the component-wise application of the soft threshold function, given by S(x, y) = sign(x) max(|x| − y, 0), (22) and (•) + projects the argument to the nonnegative orthant by setting a negative value to 0.
These iterations are repeated until convergence.The stopping criteria as in [41] can be used where the primal and dual residuals must be smaller than some tolerance thresholds.We denote the optimal solution obtained from this step by α † , as this quantity will be used in the later part.

Dictionary Pruning
By solving the above problem, the abundance vector is obtained from a relatively large original dictionary.We then propose to refine the estimation and prune the large spectral library to a smaller one by discarding the endmembers with zero fractions.Therefore, in this second step, we reformulate the optimization problem using this pruned dictionary as follows: where supp(α † ) ∈ {0, 1} is a vector that denotes the support of the abundance estimated by the first step and denotes the Hadamard product.The Hadamard product between the endmember information and the support set discards the inactive ones defined in α † .Note that a large number of endmembers interact in ψ nlin with the model in Section 3.3, and only a small active part is involved after pruning the dictionary, which is reasonable for providing a better model.Problem ( 23) can be approached in a similar way as in Section 3.3.

Experiments
In this section, experiments are conducted to validate the proposed unmixing algorithms and compare them with several state-of-the-art methods.The algorithms are tested on synthetic data, real data captured created in our laboratory with ground-truth, and the well-known Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Cuprite image over the Cuprite mining region in NV, USA.Note that testing unmixing algorithms using real data with ground-truth would produce more meaningful results.

Simulation Settings
Two synthetic datasets are generated for the experiments.In the first scene (denoted by S 1 ), a large spectral library is considered, since the sparse unmixing algorithm aims to select proper endmembmers from this library.A pruned-version endmember library [42] is used with 342 candidates whose reflectance values are measured for 224 spectral bands distributed uniformly in the interval 0.4−2.5 µm.A total of 1000 pixels are generated for each mixture setting.In each pixel, three endmembers in the library are randomly activated, with their associated abundance vectors being generated in the simplex defined by non-negative and sum-to-one constraints.
In the second scene (denoted by S 2 ), nine endmembers are selected from the above library to generate a 50-by-50 image with N = 2500 pixels, using the upper-left region of ( [11] Figure 2) for the associated abundances.These mixtures in this scene involved only nine endmembers, however, very few endmembers are activated in each pixel as shown in Figure 1.Both of these two scenes are generated with the general bilinear model (GBM) defined by: and the PNMM defined by r = (Mx) τ + n. (24) In these experiments, the parameter γ i,j is simply set to 1 and τ is set to 0.7, same as [33].Finally, all data are corrupted with an additive white Gaussian noise with two levels of SNR, namely, 30 dB and 15 dB respectively.

Comparative Simulations
Several algorithms are conducted in order to compare their unmixing performance.Their tuning parameters are set during preliminary experiments on independent data, via a simple search over the grids defined hereafter.

•
The FCLS algorithm [7]: This classical algorithm relies on the linear model and seeks the optimal solution of a least-square problem subject to the ASC and ANC.

•
The sparse unmixing by variable splitting and augmented Lagrangian (SUnSAL) [10]: This algorithms fits the observed (mixed) hyperspectral vectors with sparse linear mixtures of spectral signatures from a large dictionary available a priori.The parameter λ SUnSAL controls the trade-off between the fitting and sparsity.

•
The K-Hype algorithm [33]: This algorithm is based on the scalar RKHS model.The parameter µ K−Hype controls the trade-off between the fitting and the functional regularity.

•
The nonlinear neighbor and band dependent unmixing (NDU) [38]: This method extends the K-Hype by accounting for band-dependent and neighboring nonlinear contributions with vector-valued kernel function.
The separable (Sp.) structure is considered due to the high computational complexity.This algorithm is characterized by the trade-off parameter λ NDU , fitting parameter µ NDU and penalty parameter ρ NDU .

•
The improved incremental KNMF (IIKNMF) [31]: This method extends KNMF by introducing partition matrix theory and considering the relationships among dividing blocks.The incremental KNMF (IKNMF) is proposed to reduce the computing requirements for large-scale data and IIKNMF aims to further improve the abundance results.The parameter k controls the number of blocks.For each experiment of IIKNMF, 5 independent runs are carried out and the results are averaged.
The root mean square error between the true values α n and the estimates αn defined by is used to evaluate the performance of the abundance estimation.All the kernel-based methods are tested with a Gaussian (G) given by and a second order homogeneous polynomial (P) kernel given by respectively.The bandwidth σ of the Gaussian kernel in the algorithms K-Hype, NDU, IIKNMF and the proposed algorithm is varied within {1, 2, 3, ..., 10}.In these algorithms, the tuning parameters λ SunSAL , λ NDC and λ of the sparse regularization are tested in the range {1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005}, and parameters µ, µ K−Hype and µ NDU for the fitting term are tested in the range {1000, 500, 100, 20, 10, 5, 2, 1, 0.5, 0.2, 0.1, 0.05, 0.01, 0.005}.The penalty parameter ρ NDU in NDU and ρ in the proposed method are both set to 1.All the parameters used in the experiments of S 1 and S 2 are reported in Appendix Section (Tables A1 and A2 respectively).In the experiment with S 1 , the results of IIKNMF are not given, since its computational time is beyond acceptance with a very large endmember library.Besides selecting the Gaussian kernel parameter from the candidate values, we also use an automatic parameter setting strategy, where we set σ = max ,k d(m λ , m λ k ) , so that the kernel width covers the variability range of the input.Associated performance of this setting is denoted by "(auto)" in the results.

Sensitivity Analysis
We analyze the sensitivity of the proposed method with respect to the algorithm parameters λ and µ.This test is conducted with S 1 using both Gaussian and Polynomial kernels under SNR of 30 dB.The results are shown in Figures 2 and 3.As can be seen within a reasonable range around the optimal parameter values, the algorithm exhibits satisfactory RMSE.

Data Presentation
In order to perform quantitive evaluation of unmixing performance, we designed several experimental scenes in our laboratory with the GaiaField (Sichuan Dualix Spectral Image Technology CO. Ltd., GaiaField-V10) and GaiaSorter systems.Our GaiaField is a push-boom imaging spectrometer with a HSIA-OL50 lens, covering the visible and NIR wavelengths ranging from 400 nm to 1000 nm.GaiaSorter sets an environment that isolates external lights, and is endowed with a conveyer to move samples for the push-boom imaging.A dataset is created by imaging these scenes with the hyperspectral camera in our laboratory, providing mixtures with 256 wavelength bands.The experimental settings are strictly controlled so that pure material spectral signatures and material compositions are known.See [43,44] for experiment and data description in detail.
In this subsection, uniform mixture #1, mixture #7 and two non-uniform mixtures cases of Scene-II in [44] are used (In [44], to get the ground truth, the aligned high-resolution RGB images of S 5 and S 6 were captured, and then the percentage of each colored sand in a low-resolution hyperspectral pixel could be analyzed with the help of the associated RGB image).In this scene, quartz sand of different colors are mixed, and the diameter of the granules is controlled to be approximately 0.03 mm (see Figure 4).Seven endmembers (four different pure colored quartz sand and three other materials in the dataset) are included as candidate spectral signatures, shown in Figure 5.The true abundances are provided either by priorly known fractions (uniform mixtures) or analysis of aligned high-resolution images (mixtures with spatial patterns).One uniform mixture is formed by mixing 50% blue sand and 50% white sand so that the true abundance vector α = [0.5, 0.5, 0, 0, 0, 0, 0] (denoted by S 3 ).The other uniform mixture is formed by mixing the blue, red and green sand so that the true abundance vector α = [1/3, 0, 1/3, 1/3, 0, 0, 0] (denoted by S 4 ).S 5 and S 6 respectively denote the two mixtures with spatial patterns formed by mixing two and three types of sand.As this scene mimics the intimate mixture model, the Gaussian kernel is used for this high-order nonlinear case.

Results
The obtained RMSEs are reported in Table 3, with the algorithm parameters listed in Table A3.We observe that the proposed algorithm achieves the lowest RMSEs, and the proposed algorithm with automatically set bandwidth also leads to satisfactory performance.For data S 5 and S 6 , abundance maps estimated by the compared algorithms are shown in Figures 6 and 7. We see that most algorithms recovered the spatial pattern of the abundance maps.However, the K-Hype algorithm produces a low contrast map for the first component in S 5 .The proposed algorithm results in sharper abundance maps with lowest RMSEs.Experiments with these labeled real data show the benefit of the proposed algorithm.

Experiment with AVIRIS Cuprite Data
This section illustrates the performance of the proposed algorithms and other algorithms when applied to real remote sensing hyperspectral data.The scene that is used for our experiment is the well-known image captured on the Cuprite mining district (NV, USA) by AVIRIS.A sub-image of 250 × 191 pixels is chosen to evaluate the algorithms.This area of interest has 188 spectral bands.This scene is denoted by S 7 .The same twelve spectra signatures as in [33] are used for the endmember matrix, as shown in Figure 8. Results from most existing works illustrate that only few materials are active in each pixel, thus sparse unmixing is potentially beneficial for analyzing this scene.Note that this real data is extensively used in the literature, however no ground-truth information is available for an objective performance evaluation.The quality of these algorithms can be defined by the Reconstruction Error (RE) which is defined as In general, RE can be used to provide a quantitative information of the unmixing performance.However, as pointed in [33], without the ground-truth information on the abundances for the real data, the reconstruction quality measured by RE is not necessarily proportional to the quality of the abundance estimation, and therefore it can only be considered as complementary information.In this experiment, the selected parameters are given in Table A4 and the RE comparison by the linear and nonlinear models is reported in Table 4. Figure 9 illustrates the maps of the reconstruction error.These tables and figures clearly indicate that nonlinear unmixing algorithms lead to significantly lower REs compared to linear algorithms, especially at several particular spatial regions.The proposed algorithm exhibits a comparable RE compared to K-Hype with Gaussian kernel, recalling that it is common that a regularization term increases the data fitting cost.Figure 10 shows the abundance maps of selected materials estimated by these algorithms.The proposed algorithm provides a sharper map with several particular locations emphasized.

Conclusions
In this work a new nonlinear sparse unmixing method is proposed.Under the assumption that the model consists of a linear mixture and a nonlinear fluctuation defined in an RHKS, the unmixing is conducted by the kernel-based method with a sparsity promoting term and the dictionary pruning technique.Experiments with both synthetic and real data were conducted to illustrate the performance of the proposed algorithm and compared algorithms.The proposed method outperformed the competing ones in most cases.Particularly, quantitive abundance estimation errors were reported with laboratory created real data, and highlighted the advantage of the proposed algorithm.Although the NDU performed slightly better in one synthetic data setting, it had higher computation complexity.In future work the automatic kernel parameter learning can be considered to increase the flexibility of the algorithm.

Figure 1 .
Figure 1.Ground-truth abundance maps associated to nine endmembers used in S 2 .

Figure 2 .Figure 3 .
Figure 2. RMSE as a function of the regularization parameters for the proposed method with S 1 using Gaussian kernel.

Figure 4 .Figure 5 .
Figure 4. Laboratory created data for unmixing performance evaluation (RGB images).Subfigures (a-d): pure quartz sand of 0.03 mm of four colors, and they serve as pure materials for providing endmembers.Subfigures (e,f): uniform mixtures of sand of two (e) and three colors (f).Subfigures (g,h): mixtures with spatial patterns of sand of two (g) and three colors (h).Square regions of 60-by-60 pixels in the center of each subfigures are clipped out and used in experiments.

Figure 8 .
Figure 8. AVIRIS Cuprite data (denoted by S 7 ) used in the experiment.(a) The false color image illustrated with bands {30, 40, 80} of the sub-image of the AVIRIS Cuprite data set.(b) Twelve endmembers used for unmixing.

Table 4 .
RE comparison of S 7 .