A Multiscale Hierarchical Model for Sparse Hyperspectral Unmixing

Due to the complex background and low spatial resolution of the hyperspectral sensor, observed ground reflectance is often mixed at the pixel level. Hyperspectral unmixing (HU) is a hot-issue in the remote sensing area because it can decompose the observed mixed pixel reflectance. Traditional sparse hyperspectral unmixing often leads to an ill-posed inverse problem, which can be circumvented by spatial regularization approaches. However, their adoption has come at the expense of a massive increase in computational cost. In this paper, a novel multiscale hierarchical model for a method of sparse hyperspectral unmixing is proposed. The paper decomposes HU into two domain problems, one is in an approximation scale representation based on resampling the method’s domain, and the other is in the original domain. The use of multiscale spatial resampling methods for HU leads to an effective strategy that deals with spectral variability and computational cost. Furthermore, the hierarchical strategy with abundant sparsity representation in each layer aims to obtain the global optimal solution. Both simulations and real hyperspectral data experiments show that the proposed method outperforms previous methods in endmember extraction and abundance fraction estimation, and promotes piecewise homogeneity in the estimated abundance without compromising sharp discontinuities among neighboring pixels. Additionally, compared with total variation regularization, the proposed method reduces the computational time effectively.


Introduction
Hyperspectral images possess abundant spectral information, which makes target detection and classification become feasible [1,2].However, due to a low spatial resolution of hyperspectral sensor and the complex background, amounts of mixed pixels exist in the image and that makes it impossible to determine the material directly from the pixel level.In order to ensure the materials, called endmembers, in a scene, hyperspectral unmixing (HU) is being researched to solve the issue.Through the HU technique, we can identify the distinct endmember signatures that are present in a scene and its corresponding abundance fractions in each pixel.Thanks to the HU, this makes it possible for precision classification and target detection at the sub-pixel level for risk prevention and response [3][4][5].
Research work has been devoted to HU.Among it, non-negative matrix factorization (NMF) has been shown to be a useful unsupervised decomposition for hyperspectral unmixing [6].The learned non-negative basis vectors that are used are distributed, yet they are still sparse combinations that generate expressiveness in the signal reconstructions [7].Generally, NMF is attractive because it usually provides a part-based representation of the data, making the decomposition matrices more interpretable [8].There are many possibilities to define the cost function and procedures for performing its alternating minimization, which leads to several improved NMF algorithms.The most popular algorithms for NMF belong to the class of multiplicative Lee-Seung algorithms, which have relatively low complexity but are characterized by slow convergence and risk becoming stuck in local minima [9].To improve the performance of the NMF based hyperspectral unmixing method, further constraints were imposed on NMF [10][11][12][13][14]. Miao and Qi proposed a minimum volume constrained non-negative matrix factorization [15].Sparsity constraints have gained much attention since most of the pixels are mixtures of only a few endmembers in the scene, which implies that the abundance matrix is a large degree of sparsity.Furthermore, regularization methods are usually utilized to define the sparsity constraint on the abundance matrix.Along these lines, L 1/2 regularization is introduced into NMF so as to enforce the sparsity of the endmember abundance matrix [8].However, the performance of many existing NMF algorithms may be quite poor, especially under the condition where the unknown nonnegative components are badly scaled (ill-conditioned data), insufficiently sparse, and a number of observations are equal or only slightly greater than a number of latent (hidden) components [16].Therefore, a hierarchical strategy or multilayer NMF (MLNMF) is proposed to improve the performance of existing NMF, which is fully confirmed by extensive simulations with diverse types of data to blind source separation [16,17].
It should be noted that the phenomenon of spectral variability, which can be caused by variable illumination and environmental, atmospheric, material aging, object contamination, and other conditions, cannot be neglected [18][19][20][21][22].It is shown that the same material spectra may be varied in the different area or the different materials may have similar spectra.Likewise, as the spatial resolution of imagery increases [23], it reduces the mixed pixel number in the image while increasing the spectral variability.Therefore, if we only extract the spectra from one of the areas as the endmember for unmixing, then the abundance map may not be accurate.To circumvent these issues, spatial regularization approaches, such as Total Variation Regularization (SUnSAL-TV), deserve special attention since they promote solutions that are spatially piecewise and homogeneous without compromising sharp discontinuities between neighboring pixels [24,25].However, this adoption has come at the expense of a massive increase in computational cost and it needs the complete endmember matrix prior knowledge.On the other hand, it is worth considering that the hyperspectral image data acquired by different imagery sensors is various.Likewise, different optimal spatial resolutions exist for different object characteristics, suggesting for the need for a multiscale spatial approach for detection and analysis.In the literature [26], it has been proved that the spectral variability of land cover was reduced in coarser resolution images when compared to finer resolutions.Thus, it is essential to find a multiscale HU method to improve the unmixing accuracy regardless of the spatial resolution.
In this paper, a multiscale spatial regularization hierarchical model is considered to tackle the sparse unmixing problem.It breaks the unmixing problem into two domain problems.One is in an approximation domain that considers the spatial contextual information and the other is in the original domain that considers the detail information.Unlike the Total Variation Regularization that requires considering explicitly the dependences between pairs of pixels, the proposed multiscale regularization process compares the abundance matrix directly, which results in a computationally efficient procedure.
The research is presented in five sections.Section 2 is the theoretical background and the proposed method.Sections 3 and 4 report the experimental results and discussion and give suggestions for future research, respectively.Finally, Section 5 concludes the paper.

Linear Mixture Model
When using the linear mixture model, the following three assumptions apply: (1) the spectral signals are linearly contributed by a finite number of endmembers within each IFOV weighted by their covering percentage (abundance); (2) the land covers are homogeneous surfaces and spatially segregated without multiple scattering; (3) the electromagnetic energy of neighboring pixels do not affect each other [27,28].Under the linear mixture model assumption, we have: where M = [m 1 , . . ., m p ] ∈ R B×p is the matrix of endmembers (m i denotes the ith endmember signature and p is the number of endmember, B is the band number), and S ∈ R p×N is the abundance matrix ([S] i,j denotes the fraction of the ith endmember of the pixel j and N denotes the total number of pixels), and ε denotes a source of additive noise.The abundance matrix needs every element within it, should be non-negative, and sum to one in each column.

Multiscale Hierarchical Unmixing Methods
The proposed multiscale hierarchical model for sparse hyperspectral unmixing method (MHS-HU) consists of two steps.First, we transform the original image domain (D) to an approximation (coarse) domain (C) using the multiscale method.Then we conduct the hierarchical sparsity unmixing (HSU) [22] method to obtain the endmember and abundance matrix to regularize the original unmixing problem.Different from the former method, the weighting matrix in this paper is set to the unit matrix.Superior to the HSU method, this procedure avoids the man-made endmember spectral variability library selection and estimation.On the other hand, it should be noted that the literature [24] also made the HU procedure in the coarse domain.However, compared with the literature [24] method, this paper uses a L 1/2 norm to impose further sparsity, rather than using the L 1 norm.In addition, we use multilayer sparsity to achieve better estimates of endmembers and abundance matrix results.
HU in the coarse domain can reduce the spectral variability effect.Next, we apply a conjugate transformation to convert the abundance matrix obtained in the coarse domain before going back to the original image domain.The converted abundance matrix is then used to regularize the unmixing process to promote the spatial dependency between neighboring pixels.Then, with the abundance matrix constraints, we conduct the unmixing procedure again.Finally, we obtain the endmember and abundance matrix.
Consider a linear operator W ∈ R N×K , K < N that implements a spatial transformation of the original domain.Then, where Y C ∈ R B×K is the coarse domain of the original image Y. K denotes the total number of pixels in the coarse domain.The W matrix is upscaling matrix which reduces the image dimension and computational cost.Then the hierarchical sparsity unmixing procedure can be re-casted into the coarse domain.It can be computed as follows: The value of the parameter λ depends on the sparsity of the material abundance and it is computed based on the sparseness criteria, shown as follows where α and τ are some constants to regularize the sparsity constraints, and t denotes the iteration number in the process of optimization [16].Then we obtain the estimated abundance matrix ŜC ∈ R p×K in the coarse domain.We define a conjugate transform W * , which converts the images from the coarse domain back to the original domain: where ŜD ∈ R p×N is the smooth approximation of the abundance in the original domain, which can be used to regularize the original unmixing problem.In this way, it is possible to introduce a spatial correlation into the abundance map solutions by separately controlling the regularization strength in each of the coarse and original domain.Compared with the traditional total variation regularization [25] that requires to consider explicitly the dependences between all pairs of pixels, the proposed method considering the dependences between the two abundance matrices can reduce time consuming computational cost.Following this idea, we define the multiscale hierarchical model for sparsity unmixing method as follows: where β is a regularization parameter.Since the S D obtained in the coarse domain is invariant in this equation, the additional item is still convex.Note that compared with the hierarchical sparsity unmixing method in the coarse domain, the update rule for the endmember matrix stays invariant, while the update rule for the abundance matrix must be different.Now we address the update rule for the abundance matrix S. To make our elaboration clearer, the objective function is separable in the columns of S. Likewise, the corresponding row of Y is denoted y.The column-wise objective function becomes

Multiscale Methods
Large volumes of high-resolution hyperspectral image data offer challenges that are more time-consuming, complex, and computationally intensive than a single scene analysis to the user.In addition, in high resolution hyperspectral image data, the spectral variability phenomenon is also obvious, which restricts the unmixing accuracy.However, there are amounts of mixed pixels in the low spatial hyperspectral image.Balancing the effect of the mixed pixel and spectral variability, using the multiscale unmixing method to interpret each pixel can no doubt improve the classification accuracy [29].That is reasonable because the multiscale strategy considering the surrounding pixels reduces the spectral variability effect.Additionally, as a rule of thumb, each object has its suitable scale to be detected [26].Using high resolution hyperspectral image data makes it possible to provide multiscale analysis.
In multiscale methods, nearest-neighbor (NN), bilinear interpolation (BIL) and cubic convolution (CC) are commonly available in a commercial image processing software package [30].In NN, the DN of the pixel closest to the location of the original input pixel is assigned to the ones at the output pixel location.In BIL, the DN is assigned to the output pixel by interpolating DNs in two orthogonal directions within the input image.Essentially, it also can be computed based on the weighted distances to the points.In CC, the weighted DN values of 16 pixels surrounding the location of the pixel in the input image are used to determine the value of the output pixel.
However, as illustrated in the literature [30,31], the nearest neighbor, bilinear interpolation, and cubic convolution resampling methods are not suitable for resampling remotely sensed data.And the results showed that both Gaussian and aggregation resampling methods were found to produce similar radiance values [26].In the literature [32], the study showed that compared with other aggregation resampling methods, a point-centerd distance-weighted moving window (PDW) is the best option to be used in, for example, the studies on ecological resource management where consistency of the class proportions at coarser resolutions is required.Therefore, for convenience, no weighting PDW is used in this study.

The Optimization Problem
To guarantee the convergence of the update rule, we define an auxiliary function G(s, s t ) satisfying the condition G(s, s) = D(s) and G(s, s t ) ≥ D(s) such that D(s) is a non-increasing function when updated using the following equation This is guaranteed by The Taylor expansion of D(s) can be written as We define the function G as where the diagonal matrix K(s t ) is Then Set ∇ s G s, s t = 0, we get To improve performance in solving the Equations ( 3) and ( 6), especially for ill-conditioned or badly scaled data, and reducing the risk of getting stuck in the local minima of a cost function, we develop a hierarchical procedure in which we perform a sequential decomposition of non-negative matrices for hyperspectral unmixing.The mathematical representation of the hierarchical structure is formulated as where the l denotes the layer number.Thus, the endmember and abundance matrix in each layer, M l and S l , can be achieved by a modification of the multiplicative update rules as follows.
Furthermore, regarding the sum-to-one constraint on the abundance fraction in the model ( 1), the data matrix and the endmember matrix are augmented by a row of constants defined by Remote Sens. 2019, 11, 500 6 of 16 where δ controls the impact of additivity constraint over the endmember abundance fractions, and N and p denote the whole pixel number and endmember number, respectively.The larger the δ, the closer the sum over the columns of the abundance matrix are to unity.In each layer iteration, these two matrices are taken as the input of the update rule in Equations ( 16) and ( 17) as an alternative to Y and M.
Figure 1 shows the flowchart of the proposed method.The abundance and endmember matrix in blue dotted line are the outputs of the coarse domain.And the matrices in the red dotted line are the final outputs of the method.
Remote Sens. 2019, 11, x FOR PEER REVIEW 6 of 16 Figure 1 shows the flowchart of the proposed method.The abundance and endmember matrix in blue dotted line are the outputs of the coarse domain.And the matrices in the red dotted line are the final outputs of the method.

Experimental Data
In this section, we introduce two sets of experimental data, simulated data, and real image data.The simulated data makes it possible for presetting the endmember set and abundance fraction map as prior knowledge that can be used as a measurement for the approach performance.The procedure described in the literature [15] has been used to create the simulated data.In this paper, the simulated data endmember spectra are derived from the USGS digital library.The data set is a 224 spectral band image with n=2000 pixels consisting of five endmembers.The endmembers are extracted from the USGS library randomly.The abundance maps are piecewise smooth images.In addition, the variety of spectral features and different signal-to-noise ratios have been employed to generate the simulated images.To ensure no pure pixel exists, all the abundance fractions larger than 0.8 are discarded.To make the scenes more realistic and reasonable, white Gaussian noise is added and SNR is set to 30 dB [3].
Apart from the simulated data, we also conduct a real data experiment.There are two real data cubes.The first real hyperspectral image data cube, RC1, we choose the data obtained by groundbased hyperspectral imaging instrumentation.The experiment site is located in Wuyuan (29.35°N; 118.09°E),Jiangxi province, China.The images are captured by a Pika XC2 imaging of Resonon, which is a push broom imager designed for the acquisition of visible and near-infrared hyperspectral images ranging from 400 nm to 1000 nm with 1.3 nm spectral resolution in September 2017 [22].The image was calibrated using a white reference panel and converted from radiance to reflectance.The original image has 462 channels.Note that channels 1-13 and 385-462 were blurred due to the sensor noise and atmospheric water vapor absorption.As a result, we only used channels 13-384 with the wavelength ranging from 400 nm to 900 nm.Therefore, the size of the test image is 400 × 400 × 372.Rice, grass, and corn are in the scene, shown in Figure 2. The right curve plot displays an example of the spectral variability of the weed.Each spectrum is collected in a different area.

Experimental Data
In this section, we introduce two sets of experimental data, simulated data, and real image data.The simulated data makes it possible for presetting the endmember set and abundance fraction map as prior knowledge that can be used as a measurement for the approach performance.The procedure described in the literature [15] has been used to create the simulated data.In this paper, the simulated data endmember spectra are derived from the USGS digital library.The data set is a 224 spectral band image with n = 2000 pixels consisting of five endmembers.The endmembers are extracted from the USGS library randomly.The abundance maps are piecewise smooth images.In addition, the variety of spectral features and different signal-to-noise ratios have been employed to generate the simulated images.To ensure no pure pixel exists, all the abundance fractions larger than 0.8 are discarded.To make the scenes more realistic and reasonable, white Gaussian noise is added and SNR is set to 30 dB [3].
Apart from the simulated data, we also conduct a real data experiment.There are two real data cubes.The first real hyperspectral image data cube, RC1, we choose the data obtained by ground-based hyperspectral imaging instrumentation.The experiment site is located in Wuyuan (29.35 • N; 118.09 • E), Jiangxi province, China.The images are captured by a Pika XC2 imaging of Resonon, which is a push broom imager designed for the acquisition of visible and near-infrared hyperspectral images ranging from 400 nm to 1000 nm with 1.3 nm spectral resolution in September 2017 [22].The image was calibrated using a white reference panel and converted from radiance to reflectance.The original image has 462 channels.Note that channels 1-13 and 385-462 were blurred due to the sensor noise and atmospheric water vapor absorption.As a result, we only used channels 13-384 with the wavelength ranging from 400 nm to 900 nm.Therefore, the size of the test image is 400 × 400 × 372.Rice, grass, and corn are in the scene, shown in Figure 2. The right curve plot displays an example of the spectral variability of the weed.Each spectrum is collected in a different area.The second real data cubes, RC2, is the classic hyperspectral data, Urban, collected by HYDICE hyperspectral imagery.The size of the image is 307 × 307 with a spatial resolution of 2 m.It contains 210 spectral channels with the spectral resolution of 10 nm in the 400 nm and 2500 nm range.The imaging area is located at Copperas Cove near Fort Hood, TX, U.S..After removing low SNR and water-vapor absorption bands, a total number of 162 bands remained.The five materials prominently present in RC2 are asphalt, grass, roof, shadow and tree.The false color of the RC2 is displayed in Figure 3.For the evaluation of the proposed method, spectral angle distance (SAD), abundance angle distance (AAD), root mean square error (RMSE) and abundances mean squared error (MSEa) are used in this paper.It should be noted that the endmembers of the synthetic data are extracted from USGS serving as the reference endmembers.For the real image data, the reference endmembers are extracted from the image.Each endmember class is extracted from the pure pixel area.Then an average of the observations in each endmember class is computed, serving as reference endmember spectra.
SAD is defined as: where mi denotes the ith endmember spectra, < mi , mj> denotes the inner product of the two spectra, || || • denotes the vector magnitude.It is used to compare the similarity of the original pure endmember signatures and the estimated ones.The second real data cubes, RC2, is the classic hyperspectral data, Urban, collected by HYDICE hyperspectral imagery.The size of the image is 307 × 307 with a spatial resolution of 2 m.It contains 210 spectral channels with the spectral resolution of 10 nm in the 400 nm and 2500 nm range.The imaging area is located at Copperas Cove near Fort Hood, TX, U.S.After removing low SNR and water-vapor absorption bands, a total number of 162 bands remained.The five materials prominently present in RC2 are asphalt, grass, roof, shadow and tree.The false color of the RC2 is displayed in Figure 3.The second real data cubes, RC2, is the classic hyperspectral data, Urban, collected by HYDICE hyperspectral imagery.The size of the image is 307 × 307 with a spatial resolution of 2 m.It contains 210 spectral channels with the spectral resolution of 10 nm in the 400 nm and 2500 nm range.The imaging area is located at Copperas Cove near Fort Hood, TX, U.S..After removing low SNR and water-vapor absorption bands, a total number of 162 bands remained.The five materials prominently present in RC2 are asphalt, grass, roof, shadow and tree.The false color of the RC2 is displayed in Figure 3.For the evaluation of the proposed method, spectral angle distance (SAD), abundance angle distance (AAD), root mean square error (RMSE) and abundances mean squared error (MSEa) are used in this paper.It should be noted that the endmembers of the synthetic data are extracted from USGS serving as the reference endmembers.For the real image data, the reference endmembers are extracted from the image.Each endmember class is extracted from the pure pixel area.Then an average of the observations in each endmember class is computed, serving as reference endmember spectra.
SAD is defined as: where mi denotes the ith endmember spectra, < mi , mj> denotes the inner product of the two spectra, || || • denotes the vector magnitude.It is used to compare the similarity of the original pure For the evaluation of the proposed method, spectral angle distance (SAD), abundance angle distance (AAD), root mean square error (RMSE) and abundances mean squared error (MSE a ) are used in this paper.It should be noted that the endmembers of the synthetic data are extracted from USGS serving as the reference endmembers.For the real image data, the reference endmembers are extracted from the image.Each endmember class is extracted from the pure pixel area.Then an average of the observations in each endmember class is computed, serving as reference endmember spectra.
SAD is defined as: where m i denotes the ith endmember spectra, <m i , m j > denotes the inner product of the two spectra, ||•|| denotes the vector magnitude.It is used to compare the similarity of the original pure endmember signatures and the estimated ones.AAD is used on the condition that the abundance fractions are known as prior knowledge.It measures the similarity between original abundance fractions (s i ) and the estimated ones ( ŝi ), which is formulated in the equation below.
To obtain an overall measure accuracy, root mean square of these measures are defined as where p denotes the number of endmember and N denotes the whole number of pixels.RMSE is denoted by where N denotes the total number of pixels, B denotes the total number of spectral bands, and ε i,j = Y i,j − (MS) i.j denotes the error of the ith row and the jth column between the original and simulated image data.It is used to evaluate the reconstruction estimates' error.Since the model errors are likely to have a normal distribution rather than a uniform distribution, the RMSE is a good metric to present model performance [33].
MSE a is denoted by where N denotes the total number of pixels, p denotes the endmember number, and M denotes the original endmember matrix, M denotes the estimated endmember matrix.It is used to evaluate the abundance estimates' error.MSE a is used on the condition that the truth endmember is known as prior knowledge.

Simulated experiments
In the simulated experiment, the parameters of the algorithm are set as follows: α = 0.1, τ = 25, L = 4, β = 0.2, the window scale is five and the maximum iteration time is 1000.In this experiment, SNR is set to 25 dB. Figure 4 shows the proposed method, MLNMF, L 1/2 -NMF, VCA-FCLS and SUnSAL-TV [25,34,35] method unmixing results, respectively.The selection of these algorithms comes naturally since the proposed methods, L 1/2 -NMF and SUnSAL-TV, share similar sparse regression formulations [8,24], and MLNMF [17] also considers the multilayer strategy to improve the existing NMF performance.VCA-FCLS is the traditional method and is also used as the initial unmixing method, which results in the inputs of the proposed method.It should be noted that the SUnSAL-TV is only an abundance estimation method that needs the endmember matrix input as prior knowledge.Therefore, we input the endmember extracted by the VCA to the SUnSAL-TV.The input parameters used in SUnSAL-TV are λ = 0.01 and λ TV = 0.001.The first column demonstrates the original simulated abundance fraction, and the second column to the sixth ones are corresponding to the abundance fraction map estimated by different methods, respectively.The abundance fraction map is denoted using a color bar, which the blue color shows the least fraction near to zero and the red color shows Remote Sens. 2019, 11, 500 9 of 16 the highest fraction near to one.The variation of color bar is consistent with the natural spectral wavelength change.From the figure, it is obvious that the proposed method can obtain a higher precision in abundance estimation than the traditional single layer L 1/2 -NMF algorithm.Table 1 shows the true and reconstructed abundance maps for the different algorithms.The compared methods are L 1/2 -NMF, MLNMF, VCA-FCLS and SUnSAL-TV.As expected, models accounting for hierarchical or multilayer strategy tend to yield better reconstruction quality than L 1/2 -NMF.The proposed method yields the smallest abundance reconstruction error, followed by MLNMF.The discrepancies between the MSE a and RMSE results among the methods that address spectral variability indicate that there is no clear relationship between these two variables.This is due to the fact that the MSE A pays more attention to the abundance estimates, while RMSE focuses on the reconstruction error of the image data.Additionally, the proposed method considers the spatial regularization as the additional constraints, therefore it tends to yield more stable unmixing results compared with the MLNMF.In terms of computational cost, we compared the proposed method with the SUnSAL-TV.The computational complexity of the algorithms is evaluated through the average execution time of 20 runs, which is displayed in Table 2.The results show that the proposed method performed significantly better than the SUnSAL-TV, with execution times 7 to 12 times smaller than the SUnSAL-TV.That is due to the fact that the proposed method tends to compare the dependences between the two domain abundance matricies rather than all pairs of pixels that reduce the computational cost.
Remote Sens. 2019, 11, x FOR PEER REVIEW 9 of 16 algorithm.Table 1 shows the true and reconstructed abundance maps for the different algorithms.
The compared methods are L1/2-NMF, MLNMF, VCA-FCLS and SUnSAL-TV.As expected, models accounting for hierarchical or multilayer strategy tend to yield better reconstruction quality than L1/2-NMF.The proposed method yields the smallest abundance reconstruction error, followed by MLNMF.The discrepancies between the MSEa and RMSE results among the methods that address spectral variability indicate that there is no clear relationship between these two variables.This is due to the fact that the MSEA pays more attention to the abundance estimates, while RMSE focuses on the reconstruction error of the image data.Additionally, the proposed method considers the spatial regularization as the additional constraints, therefore it tends to yield more stable unmixing results compared with the MLNMF.In terms of computational cost, we compared the proposed method with the SUnSAL-TV.The computational complexity of the algorithms is evaluated through the average execution time of 20 runs, which is displayed in Table 2.The results show that the proposed method performed significantly better than the SUnSAL-TV, with execution times 7 to 12 times smaller than the SUnSAL-TV.That is due to the fact that the proposed method tends to compare the dependences between the two domain abundance matricies rather than all pairs of pixels that reduce the computational cost.

MHS-HU
SUnSAL-TV simulated data 10.62s 88.93sIn addition, to verify how the multiscale window affects the proposed method in hyperspectral unmixing accuracy, we experiment with various window scale and the coefficient beta under the  In addition, to verify how the multiscale window affects the proposed method in hyperspectral unmixing accuracy, we experiment with various window scale and the coefficient beta under the same fixed initial condition.Both the statistics are based on the average value of thirty times running results.Table 3 shows the various window scale comparison results.It is observed that the performance does not vary significantly from its optimal values unless the window becomes too large.This is reasonable that when the window becomes too large, it may produce a higher mixture pixel that increases the difficulty of the inverse solution.Furthermore, Table 4 illustrates the weighting coefficient of the multiscale effect on the final unmixing accuracy.As the results show, β = 0.2 tends to obtain better results.Therefore, we have illustrated the advantages of our proposed method on simulated data against L 1/2 -NMF, MLNMF, SUnSAL-TV and VCA-FCLS.The experimental results consistently show that MHS-HU exhibits better performance in hyperspectral unmixing.This is particularly true in the presence of ill-conditioned or projected gradient algorithms.

Real Data Experiments
In the first real data, RC1, experiment, the endmember and layer number are set as 3 and 4, respectively.The iteration in each layer is set to 1000.Furthermore, we set α = 0.1 , β = 0.2, τ = 10 and the multiscale as 5.
Figure 5 shows the unmixing results of the proposed method and the compared methods.The compared methods are MLNMF, VCA-FCLS, SUnSAL-TV and L 1/2 -NMF.It is worth mentioning that both the spatial and spectral resolution of the image are so tiny that they capture the shadow details among the rice and corn.Furthermore, except for the spectral variability of crops especially caused by the shadow effect, the high similarity of corn and weed also increases the difficulty in vegetable abundance estimation.That is the reason why traditional unmixing methods such as VCA-FCLS and L 1/2 -NMF cannot produce the expected results in the vegetable abundance estimation, especially in rice abundance estimation.However, in this case, the proposed method shows significantly better performance estimating the rice abundance map when compared with the other algorithms.This indicates that the proposed method effectively exploits the spatial properties of the abundance maps by using multiscale strategy, resulting in more spatially consistent estimates with fewer outliers resulting from the influence of measurement noise.In addition, a hierarchical sparsity strategy is also used to search the optimal global solution.Thus we can obtain a better result than the traditional one, especially in distinguishing corn and rice.In the corn abundance map, all the methods made the wrong estimation and some weeds showed a high fraction.That is because the corn and the weed spectra are too similar in the visible waveband to distinguish them.Nevertheless, the proposed method can still estimate the corn abundance completely compared with the others.Table 5 illustrates the unmixing results of different methods.As mentioned before, the proposed method also outperforms other methods.
Table 6 illustrates the multiscale effect on the unmixing results.As expected, as the window scale varies, the accuracy increases to some degree.But when the window scale is too large, the accuracy tends to decrease.In this experiment, it should be noted that when the scale window is 1, it cannot be regarded as the original domain, such as in MLNMF.On the contrary, it means we conduct two continuous hierarchical sparsity hyperspectral unmixing procedures on the original domain, but with the first abundance result as the spatial regulation constraint of the second one.This tends to yield more stable results.In the second real data experiment, the layer number is 4 and the maximum iteration time in each layer is 1000.Furthermore, we set α=0.1, β = 0.2 and τ=10.The endmember number is set as 5 and the multiscale is 5. Since the true abundance maps are unavailable for the images, we make a qualitative assessment of the recovers abundance maps based on knowledge of materials present in a prominent fashion in those scenes.The major endmember abundance maps for the Urban data set  Table 6 illustrates the multiscale effect on the unmixing results.As expected, as the window scale varies, the accuracy increases to some degree.But when the window scale is too large, the accuracy tends to decrease.In this experiment, it should be noted that when the scale window is 1, it cannot be regarded as the original domain, such as in MLNMF.On the contrary, it means we conduct two continuous hierarchical sparsity hyperspectral unmixing procedures on the original domain, but with the first abundance result as the spatial regulation constraint of the second one.This tends to yield more stable results.In the second real data experiment, the layer number is 4 and the maximum iteration time in each layer is 1000.Furthermore, we set α = 0.1, β = 0.2 and τ = 10.The endmember number is set as 5 and the multiscale is 5. Since the true abundance maps are unavailable for the images, we make a qualitative assessment of the recovers abundance maps based on knowledge of materials present in a prominent fashion in those scenes.The major endmember abundance maps for the Urban data set are depicted in Figure 6.The compared methods are geometric-based VCA with FCLS, MLNMF and L 1/2 -NMF.As can be seen, the proposed method yields the best results for the overall abundances of all materials.Especially the roof abundance map, it corresponds with ground truth well.Not only the fraction of the biggest roof is near to 1, shown in red color, but also the other small size roof can also be estimated accurately.Also, in the tree abundance map, the proportion of the proposed method outperforms the other unmixing methods.Regarding the abundance map, Tables 7 and 8 show the SAD and RMSE of different methods.The SAD results varied among the algorithms, with no method performing uniformly better than the others.Table 7 indicates that the proposed method and the MLNMF results are very close and significantly smaller than those of the other methods, which agrees with their better abundance estimation results.This falls in line with the fact that compared with the single layer L 1/2 -NMF, the hierarchical or multilayer NMF with abundance sparsity representation in each layer aims to obtain the global optimal solution [17].On the other hand, since the grass and the tree have highly similar diagnostic spectral features, the grass and the tree cannot be endmember sets simultaneously in the geometric simplex [36].This is why the traditional geometric based VCA-FCLS method cannot distinguish them effectively.Therefore, the proposed method and the MLNMF outperform the traditional methods.It should be noted that compared with MLNMF, the proposed method does not show the same advantage of using the multiscale strategy to improve the unmixing accuracy as the RC1 data experimental results.That is due to the fact that the Urban dataset is one of the prevalent authentication data for unmixing, which means every pixel in the dataset is well calibrated.The data is collected from the airborne imagery with nadir, observing that the shadow effect within the endmember class is much smaller than the RC1.In addition, the area is homogeneous.Thus, compared with the RC1 data cube, the spectral variability phenomenon is not obvious in this dataset.Table 9 shows the results versus various window scales.As mentioned before, the spatial resolution of RC2 data is 2 m and each endmember class is homogeneous with a large area and less spectral variability.The unmixing accuracy increases with the window scale to some degree.When the window scale becomes too large, the unmixing result deviates its optimal value.This is expected since the large multiscale size hinders the capability of transformation matrix W to capture coarse scale information [24].Therefore, the best results are obtained with a proper window scale.To summarize, the results mentioned above confirm the satisfactory performance of the proposed method in terms of unmixing quality. in the dataset is well calibrated.The data is collected from the airborne imagery with nadir, observing that the shadow effect within the endmember class is much smaller than the RC1.In addition, the area is homogeneous.Thus, compared with the RC1 data cube, the spectral variability phenomenon is not obvious in this dataset.Table 9 shows the results versus various window scales.As mentioned before, the spatial resolution of RC2 data is 2m and each endmember class is homogeneous with a large area and less spectral variability.The unmixing accuracy increases with the window scale to some degree.When the window scale becomes too large, the unmixing result deviates its optimal value.This is expected since the large multiscale size hinders the capability of transformation matrix W to capture coarse scale information [24].Therefore, the best results are obtained with a proper window scale.To summarize, the results mentioned above confirm the satisfactory performance of the proposed method in terms of unmixing quality.

Discussion
Previous work has documented the effectiveness of various unmixing algorithms, especially in synthetic data.It should be noted that synthetic data is generated in a fully controlled environment, and the accuracy of the unmixing results can be effectively validated.However, it is not easy to realistically simulate the data close to the real data, which consists of various spectral variability.That is the reason why the unmixing results based on simulated data are often inconsistent with the ones based on real data.Notably, when it comes to verifying the endmember extraction accuracy, we tend to extract the spectra directly from the image by averaging the selected sample.Therefore, regarding the idea, this study decomposes the HU into the coarse and original domain to reduce the spectral variability effect.
As a rule of thumb, coarser spatial resolutions can result in a loss of spatial and spectral information.Amounts of studies have examined the impact of spatial resolution on the mapping of vegetation [26].The results have indicated that a pixel size of 6 m or less would be optimal for studying the functional properties of southern California grassland [37].However, considering the current conditions for the satellite hyperspectral imagery, it is hard to obtain a finer spatial resolution than 6 m.In this case, to improve the mapping accuracy, the unmixing procedure is in need.Since the demonstrated viability of upscaling approaches suggests that current ground instrumentation is adequate for satellite mission validation needs, it is also possible that new ground measurement technologies could significantly expand the spatial support of observations derived from ground-based instrumentation.Therefore, by resampling the ground-based or airborne images to coarser spatial resolutions and applying unmixing modeling algorithms, the impacts of spatial resolution on fraction estimation can be simulated.In addition, high resolution reference images can also be used for the validation of global land cover datasets.The accuracy assessment of the land surface parameters estimation algorithm is difficult to achieve due to the difficulties in obtaining actual fraction vegetation cover from the field surveys [38].For this purpose, it is necessary to aggregate the high resolution reference map to the corresponding coarse resolution global land cover dataset [32].Additionally, it is more persuadable to verify the unmixing result accuracy and saves much time or cost for the situ survey.
However, it should be noted that each material has its own fit resolution to be detected.Therefore, when we use the multiscale unmixing method, the multiscale window cannot be too large.Analogously, the literature [24] used superpixel algorithms for multiscale transformation, and it also mentioned that the performance might deviate, which is optimal value if the superpixel size became too large.This is expected since very large multiscale sizes may contain semantically different pixels, hindering the capability of the transformation in matrix W to adequately capture coarse scale information.Furthermore, the unmixing results for super resolution mapping (SRM) can also be used in a wide range of applications.Providing a map at the sub-pixel level is more persuadable than the simply fractional images.As illustrated in the literature [39], in this study both the unmixing method and the SRM were focused on developing the boundary of the river at the sub-pixel.On the other hand, compared with the SUnSAL-TV method using the endmember matrix as input, the proposed method can obtain the endmember and abundance matrix simultaneously without prior knowledge.In addition, the proposed method tends to compare the two domain abundance matrices rather than all the pixels that reduce the computational time.
Rather, it is impossible for the HU method results to maintain consistency under the various spatial resolution conditions.In this case, the users only pay more attention to which kind of HU method outperforms the others regardless of the spatial resolution or which spatial resolution is fit for the specific application.Therefore, it is feasible that the HU method using a multiscale unmixing strategy can consider the different spatial resolutions and enhance the stability of the results with spatial regularization.

Conclusions
This paper proposes an unsupervised multiscale hierarchical sparsity unmixing method to improve the accuracy of hyperspectral unmixing.It decomposes large scale spatially regularized sparse unmixing into two domain problems in different image domains, one capturing coarse domains and another representing fine scale details.Considering the spatial contextual information with an abundance matrix regularization in a coarse domain, the proposed method leads to a simple and efficient strategy to deal with spectral variability, especially caused by shadow between neighboring pixels.The unmixing procedure can then be solved at a reasonable computational cost.Additionally, it uses a hierarchical strategy to decompose matrices into a multilayer.In each layer, we force an abundance sparsity representation to improve the performance in hyperspectral unmixing.The proposed method has been applied to simulated and real datasets.Compared with MLNMF, SUnSAL-TV, L 1/2 -NMF and other approaches, the proposed method produces promising results in endmember extraction and abundance fraction estimation.Furthermore, it reduces the computational time effectively compared with SUnSAL-TV.It also considerably improves the unmixing performance especially if a problem is ill-conditioned.In addition, the projected gradient algorithms are used to reduce the risk of getting stuck in local minima.Furthermore, the proposed method can be readily applied to other conditions in which nonnegative sparse matrix factorization is a valuable computational tool.Additionally, it is worth mentioning that it may be more appropriate to focus future sensor development towards collecting data at the finest spatial resolution, and develop algorithms focused on upscaling routines, rather than collecting a host of different spatial resolution data with high expense cost.Thus, it is essential to find an appropriate upscaling technique to represent the hyperspectral data at different scales and a stable multiscale HU method in further research.

Figure 1 .
Figure 1.Flowchart of the proposed method.

Figure 1 .
Figure 1.Flowchart of the proposed method.

Figure 2 .
Figure 2. Illustration of the study agricultural site near Wuyuan, Jiangxi Province, China.The site is dominated by the rice, corn, and weed.

Figure 3 .
Figure 3. False color of the Urban HYDICE data.

Figure 2 .
Figure 2. Illustration of the study agricultural site near Wuyuan, Jiangxi Province, China.The site is dominated by the rice, corn, and weed.

Figure 2 .
Figure 2. Illustration of the study agricultural site near Wuyuan, Jiangxi Province, China.The site is dominated by the rice, corn, and weed.

Figure 3 .
Figure 3. False color of the Urban HYDICE data.

Figure 3 .
Figure 3. False color of the Urban HYDICE data.

Figure 5 .
Figure 5. Abundance maps estimated by the different methods.

Figure 5 .
Figure 5. Abundance maps estimated by the different methods.

Figure 6 .
Figure 6.Fractional abundance maps estimated for the URBAN images.

Table 1 .
Comparison of MSEa and RMSE on simulated data.

Table 2 .
Execution time of the unmixing algorithms on experimental data.

Table 1 .
Comparison of MSE a and RMSE on simulated data.

Table 2 .
Execution time of the unmixing algorithms on experimental data.

Table 3 .
Comparison of MSE a , RMSE and rmsSAD accuracy with various scale.

Table 4 .
Comparison of MSE a , RMSE and rmsSAD accuracy with various β.

Table 5 .
Comparison of rsmSAD and RMSE on real data.

Table 6 .
Comparison of rmsSAD and RMSE on data with various scale.

Table 5 .
Comparison of rsmSAD and RMSE on real data.

Table 6 .
Comparison of rmsSAD and RMSE on data with various scale.

Table 9 .
Comparison of rmsSAD and RMSE on Urban data with various scales.