A Spectral Unmixing Method by Maximum Margin Criterion and Derivative Weights to Address Spectral Variability in Hyperspectral Imagery

Limited to the low spatial resolution of the hyperspectral imaging sensor, mixed pixels are inevitable in hyperspectral images. Therefore, to obtain the endmembers and corresponding fractions in mixed pixels, hyperspectral unmixing becomes a hot spot in the field of remote sensing. Endmember spectral variability (ESV), which is common in hyperspectral images, affects spectral unmixing accuracy. This paper proposes a spectral unmixing method based on maximum margin criterion and derivative weights (MDWSU) to reduce the effect of ESV on spectral unmixing. Firstly, in the MDWSU model, an effective and fast algorithm is employed for establishing the endmember spectral library. Then a spectral weighting matrix based on the maximum margin criterion is constructed based on the endmember spectral library. Besides, derivative analysis and local neighborhood weights are merged into local neighborhood derivative weights, which act as a regularization term to penalize different abundance vectors. Local neighborhood derivative weights and spectral weighting matrix are proved to reduce the effect of ESV. Real hyperspectral data experiments show that the MDWSU model can obtain more accurate endmembers and abundance estimation. In addition, the experimental results, including the spectral angle distance and the root mean square error, prove the superiority of the MDWSU model over the previous methods.


Introduction
Hyperspectral imaging is the multi-dimensional information acquisition technology combining imaging technology and spectroscopy technology.It simultaneously detects two-dimensional geometric spatial information and one-dimensional spectral information of the scene, and acquires continuous, narrow-band image data with high spectral resolution.Limited to the low spatial resolution of the hyperspectral imaging sensors, in spite of utilizing super-resolution techniques [1] the area corresponding to a single pixel in the hyperspectral image usually covers disparate substances.The spectral information of the pixel is actually a mixture of spectra of disparate substances, and such pixels are called mixed pixels.If one pixel contains only one substance, the pixel is a pure pixel.Mixed pixels arise for one of two reasons: first, some substance may become the major substance in the mixed pixels when the spatial resolution of the image is lower; second, mixed pixels appear when distinct materials are combined into a homogeneous mixture [2].Spectral unmixing, which decomposes a mixed pixel into a collection of constituent spectra, or endmembers, and their corresponding fractional abundances [2,3], can obtain sub-pixel level information and improve the accuracy of substance recognition.Therefore, spectral unmixing makes a lot of sense and has become a research hotspot in recent years.
Spectral unmixing involves two main aspects, namely endmember extraction and abundance estimation.In most traditional endmember extraction algorithms, it is assumed that the spectral characteristics of different endmembers have differences between categories and spatiotemporal stability, that is, different endmembers have absolutely different spectral characteristic curves and spectral curves of the same endmembers are not affected by external factors.However, affected by factors such as sensors, the atmosphere, and the surrounding environment, endmember spectral variability (ESV) is an inevitable phenomenon in hyperspectral images [4][5][6][7].ESV will greatly affect the accuracy of spectral unmixing [4,7].In this paper, we focus on reducing the effect of endmember spectral variability on spectral unmixing.Changes in the distribution of material properties, differences in illumination conditions, changes in atmospheric conditions, and changes in natural factors such as time and space can cause spectral variability of endmembers [5,8].The change in illumination radiation is caused by varying degrees of shadow and bright areas due to changes in the surface of the material.The strong absorption and scattering properties of the atmosphere affect the measurement of the spectrum.Due to changes in the intrinsic parameters of the material (e.g., the concentration of suspended solids in water), the inherent change in the material also changes the reflectivity of the material [6,9].Endmember spectral variability includes two categories, namely intra-class variability and inter-class variability [5,10].Spectral variation within one endmember class is called intra-class variability (e.g., the spectra of intra-tree can vary from tree to tree within a given tree species) and spectral variation between endmember classes (e.g., crops and weeds) is called inter-class variability.Especially, the accuracy of abundance estimation linearly decreases in the presence of intra-class variability.On one hand, the higher the intra-class variability, the higher the probability that the actual spectrum of the endmember in the single pixel deviates from the fixed endmember.On the other hand, the high similarity between the endmembers leads to a high correlation between the endmember spectra, which causes a decrease in the accuracy of the abundance estimation [5].Therefore, reducing the effect of ESV is an effective approach to improve the spectral unmixing accuracy.
To overcome the effect of endmember spectral variability, different algorithms have been proposed.Various algorithms focus on narrowing intra-class variability and expanding inter-class variability to a certain extent to improve the abundance estimation precision.Somers et al. classify existing algorithms for coping with ESV into five categories: iterative mixture analysis, spectral feature selection, spectral weighting bands, spectral transformation, and spectral models [5].(1) To accomplish the iterative mixture analysis algorithms, an endmember combination candidate library consisting of all possible endmember combinations is needed to define firstly.Then, for each mixed pixel, the optimal endmember combination suitable for the pixel is selected from the endmember combination candidate library by the loop iteration method.Multiple endmember spectral mixture analysis (MESMA) [11] is one of the most widely used algorithms [12,13].In the MEMSA algorithm, each endmember class is represented by a set of intra-class endmember spectra, all endmember combinations variable in category and quantity for each pixel are exhausted, and the optimal endmember combination with the minimum reconstruction error between the pixel reconstruction value and the real pixel value as the criterion is selected to unmix the corresponding pixel.Although such an algorithm can effectively cope with variability, it has the disadvantage of frequent preselection of endmember sets.In addition, the premise of the MESMA algorithm is the availability of a spectral library containing all representative instances of the whole endmembers in the hyperspectral scene.Such spectral libraries are commonly modeled by filed measurements.In an operational setting, these libraries of field measurements may not be available, thus limitations in the operational applicability of MESMA are introduced [14].Therefore, an automatic extraction of image-based endmember bundles and a new extended linear mixture model are proposed to address ESV [14,15].(2) Another existing algorithm includes spectral feature selection [16].A specific index is used as the evaluation index, the index values of each band are sorted, and the spectral feature combinations that minimize the intra-class variability and maximize inter-class variability are selected.Commonly used spectral feature selection algorithms include mutual information (MI) [17], instability index (ISI) [18], and so on.Although such an algorithm can effectively avoid the excessive calculation, some information will be lost during the spectral feature selection process.(3) Another existing algorithm includes spectral weighting bands.On the basis of the equal weight of each band participation in spectral unmixing, the algorithm gives greater weight to the bands that have less correlation with intra-class differences and greater correlation with inter-class differences.Such algorithms reduce the effect of endmember spectral variability and perform better than traditional unweighted algorithms [19].(4) Spectral transformation is used to convert the original hyperspectral data into new spectral features with smaller intra-class differences and larger inter-class differences for spectral unmixing.Common spectral transformation methods include principle component analysis (PCA) [20], wavelet transform [21], derivative spectral unmixing [22], and normalized spectral mixture analysis [23].The spectral transformation process reduces data redundancy.(5) Based on the theory of various radioactive transfer models, the spectral modeling algorithms construct the corresponding spectral library according to the specific environment and climate of the research area.But the accurate construction of the spectral model requires a certain prior knowledge.Analyzing and comparing all of the above methods, a method combining spectral transformation and spectral weighting bands is proposed to reduce the influence of ESV, thereby improving spectral unmixing precision.
A spectral unmixing method based on maximum margin criterion and derivative weights (MDWSU) is proposed in the paper to address ESV in hyperspectral imagery.To cope with endmember spectral variability, the endmember spectral library firstly needs to be acquired.Zou et al. select typical endmembers from the image by visual interpretation method or perform field acquisition experiments to obtain endmember spectra [24].However, the above-mentioned methods are labor intensive and the established endmember spectral library is not suitable for all scenarios.Jin et al. [25] select a certain number of pure pixels as the training samples of each endmember through the purity pixel index (PPI) [26] algorithm, but the PPI algorithm randomly generates the projection vector, which cannot guarantee the accuracy of obtaining the endmember.Inspired by the literature [25], we focus on building endmember spectral library from images automatically.This paper tends to explore an effective method to acquire endmember spectral libraries adapted for different scenarios.On the basis of the endmember spectral library, a weighting matrix assigned to different band weights is constructed.The maximum margin criterion (MMC) [27] will be introduced into the construction of the spectral weighting matrix.MMC is applied to find the linear transform that maximizes inter-class differences, while minimizing intra-class differences.Meanwhile, derivative weights in local neighborhood are designed to improve spectral unmixing accuracy.Different from the weights designed in weighted nonnegative matrix factorization [28], the paper uses the first derivative of the spectrum to construct weights.Utilization of the available spectral information is a challenge in hyperspectral remote sensing analysis.The available spectral information involves the spectral magnitude and spectral shape.Due to the changes in the atmosphere and topography, the spectral magnitude of the same endmember is more affected than the spectral shape of the same endmember.The computation of derivative spectra is sensitive to the spectral shape rather than the magnitude of spectra and can precisely model changes in the shape of original spectra [22].At last, the proposed method is experimented on synthetic and real hyperspectral data to verify its validity.
The rest of the paper is organized as follows.The second part presents the proposed theoretical methodology.The third part presents the experimental data acquisition process.The fourth part shows the experimental results and analysis.The fifth part provides discussion about the proposed method, experimental results, and suggestions for the future research.Finally, the sixth part summarizes the paper.

Theoretical Methodology
Linear mixture model or nonlinear mixture model are common models when modeling mixed pixels.Generally speaking, the linear mixture model is regarded as mixtures for which the pixel components are homogeneous surfaces in spatially segregated patterns.On the contrary, the nonlinear mixture model takes the intimate association or interaction with more than one component into account [24].Most algorithms prove that the linear mixture model is an effective model which can meet the accuracy requirements.The linear model conforms to the formation mechanism of hyperspectral image.In this paper, the proposed algorithm model will be introduced based on the linear mixture model.Figure 1 shows the schematic overview of the proposed method.components are homogeneous surfaces in spatially segregated patterns.On the contrary, the nonlinear mixture model takes the intimate association or interaction with more than one component into account [24].Most algorithms prove that the linear mixture model is an effective model which can meet the accuracy requirements.The linear model conforms to the formation mechanism of hyperspectral image.In this paper, the proposed algorithm model will be introduced based on the linear mixture model.Figure 1 shows the schematic overview of the proposed method.

Linear Mixture Model
The linear mixture model is established under the following three assumptions.Firstly, spectral signal of the pixel is a linear combination of spectra of a finite number of endmembers with their corresponding abundance fractions.Secondly, the land covers are regarded as homogeneous surfaces.Thirdly, the radiation reflection of neighboring pixels within each instantaneous field of view (IFOV) of the imaging sensor will not interact [29][30][31].
For the l-band hyperspectral image containing n pixels with m endmembers, the linear mixture model is expressed in Equation (1).
denotes the coverage percentage of the area occupied by the i th endmember in the j th pixel.
Under the guidance of the linear mixture model assumption,  is non-negative and the sum of each column of  is unity. can be expressed in Equation (2).

Modeling Endmember Spectral Library
In order to avoid the subjective misjudgment of the visual interpretation method and the human burden caused by the field experiment, the paper will extract the optimal endmember spectral library from the hyperspectral image.Based on the endmember spectral library, a spectral weighting matrix is constructed.
All of the pixels in the hyperspectral image can form a simplex with the endmembers as the vertices.Vertex component analysis (VCA) algorithm uses the orthogonal projection method to find the vertices of simplex as endmembers [32].Several vertices of the simplex can be expanded into the subspace, and the maximum point of projection length of the simplex in the orthogonal complement space of the subspace must be the vertices of the simplex.Therefore, at each cycle, the VCA algorithm finds a projection matrix that is orthogonal to the known endmember set, and projects all the pixels onto the matrix, and records the pixel with the largest projection length as the new endmember, then starts the next iteration until all the endmembers are found.Assuming that k endmembers marked

Linear Mixture Model
The linear mixture model is established under the following three assumptions.Firstly, spectral signal of the pixel is a linear combination of spectra of a finite number of endmembers with their corresponding abundance fractions.Secondly, the land covers are regarded as homogeneous surfaces.Thirdly, the radiation reflection of neighboring pixels within each instantaneous field of view (IFOV) of the imaging sensor will not interact [29][30][31].
For the l-band hyperspectral image containing n pixels with m endmembers, the linear mixture model is expressed in Equation (1).
where X = [x 1 , . . ., x n ] ∈ R l×n represents the hyperspectral image ( x i represents the i th pixel regarded as l-band spectral vector, l denotes the number of the total bands, n denotes the number of the total pixels); G = g 1 , . . ., g m ∈ R l×m represents the endmember matrix (g i represents the i th endmember signature); A = [a 1 , . . ., a n ] ∈ R m×n denotes abundance fractions; ε denotes model error.
A ij denotes the coverage percentage of the area occupied by the i th endmember in the j th pixel.Under the guidance of the linear mixture model assumption, A ij is non-negative and the sum of each column of A is unity.A ij can be expressed in Equation (2).

Modeling Endmember Spectral Library
In order to avoid the subjective misjudgment of the visual interpretation method and the human burden caused by the field experiment, the paper will extract the optimal endmember spectral library from the hyperspectral image.Based on the endmember spectral library, a spectral weighting matrix is constructed.
All of the pixels in the hyperspectral image can form a simplex with the endmembers as the vertices.Vertex component analysis (VCA) algorithm uses the orthogonal projection method to find the vertices of simplex as endmembers [32].Several vertices of the simplex can be expanded into the subspace, and the maximum point of projection length of the simplex in the orthogonal complement space of the subspace must be the vertices of the simplex.Therefore, at each cycle, the VCA algorithm finds a projection matrix that is orthogonal to the known endmember set, and projects all the pixels onto the matrix, and records the pixel with the largest projection length as the new endmember, then starts the next iteration until all the endmembers are found.Assuming that k endmembers marked as have been extracted, the k endmembers can be expanded into the k-dimensional subspace G k .
The base matrix of the orthogonal complement space G ⊥ k of G k is as follows: where I k represents the k-dimensional unit matrix, (.) T represents transpose of the matrix, G k represents g 1 , . . ., g k .The projection matrix ω ⊥ k of the VCA algorithm is shown in Equation (4).
The VCA algorithm projects the pixel x i according to the Equation ( 5).The projection results in Equation ( 5) form a new dataset {x i } n i=1 , in which the largest value can be regarded as a new endmember.
The above analysis is the basic principle of VCA extraction endmember algorithm.According to the above analysis, pure pixels located at the vertices of the simplex are treated as endmembers.Using a simplex to approximate the hyperspectral data, the pixels near the vertices of the simplex are of high purity.Pixels with high purity are viewed as endmember training samples.Inspired by the principle of VCA extraction endmember algorithm, the orthogonal projection method can be applied to find the pixels near or at the vertices of the simplex as the candidates of the endmember spectral library.In the IEA algorithm, Neville et al. select the average of several pure pixels as the optimal endmember spectrum, which can improve the spectral unmixing accuracy and reduce the noise and error [33].
Under the guidance of the above analysis, the algorithm of modeling endmember spectral library in the paper is described below.
Step (1) Determine that the class number of endmembers is m and the number of spectra contained in each endmember is t.
Step (3) For each endmember k = 1, . . ., m, the algorithm constructs the new endmember set that removes the k th endmember g K .The new endmember set is marked as G k = g 1 , . . ., g k−1 , g k+1 , . . ., g m .According to the Equations (3)-( 5), the algorithm generates the projection matrix ω ⊥ k orthogonal to the endmembers set G k , projects all the pixels of the hyperspectral image onto the matrix, and records the pixels of the first t maximum projection length as the spectrum library C k of the k th endmember.The average of the t pixels is regarded as the improved spectrum g k of the k th endmember.The k th endmember g k is replaced with g k in the initial endmember matrix G.
Step (4) Repeat step (3) until the spectral library and the improved spectra of all the endmembers are found.
Then the above-mentioned algorithm is applied to the Samson hyperspectral dataset in Figure 2 to construct endmember spectrum library.The hyperspectral dataset in Figure 2 consists of 95 × 95 pixels.The Samson hyperspectral dataset covering spectral wavelength between 400 nm and 900 nm with spectral resolution of 3.2 nm can be available online [34].WeoGeo provides this hyperspectral dataset owned by Oregon State University.The hyperspectral dataset was obtained by the SAMSON instrument which is a push-broom, visible to near IR, hyperspectral imaging sensor.Figure 2 shows the position of endmember candidates of endmember spectral library.In Figure 2, red circles represent tree, green diamonds represent water, and blue squares represent rock.

Spectral Weighting Matrix
The most traditional algorithms assign equal weight to different bands which are assumed to have uniform effects.However, it may not be necessarily true since every band is not necessarily equally significant.The experimental results have already shown the weighting spectral unmixing methods outperform the traditional unweighted unmixing methods [35].Therefore, spectral weighting matrix K defined in the Equation ( 6) is introduced into the proposed method.It is obvious that K is a positive-definite and symmetric matrix.When K is chosen to be the identity matrix, it means to assign the equal weight to different bands.
where K is the diagonal matrix of the intra-class scatter matrix,  ∈  are  ×  training sample spectral vectors given by the algorithm of modeling endmember spectral library,  denotes the training sample vectors of the endmember spectral library of the k th class, and  is the intraclass furthest neighbor of the sample  ∈  . is defined in the Equation (7).

Spectral Weighting Matrix
The most traditional algorithms assign equal weight to different bands which are assumed to have uniform effects.However, it may not be necessarily true since every band is not necessarily equally significant.The experimental results have already shown the weighting spectral unmixing methods outperform the traditional unweighted unmixing methods [35].Therefore, spectral weighting matrix K defined in the Equation ( 6) is introduced into the proposed method.It is obvious that K is a positive-definite and symmetric matrix.When K is chosen to be the identity matrix, it means to assign the equal weight to different bands.
where K is the diagonal matrix of the intra-class scatter matrix, x ∈ {C k } m K=1 are m × t training sample spectral vectors given by the algorithm of modeling endmember spectral library, C k denotes the training sample vectors of the endmember spectral library of the k th class, and x I k is the intra-class furthest neighbor of the sample x ∈ C k .x I k is defined in the Equation (7).
Inspired by maximum margin criterion (MMC), spectral weighting matrix K in the proposed method tries to make the samples in the same class as compact as possible, which is conductive to reduce the spectral variability of the same endmember.Figure 3 provides an explanation of MMC.In Figure 3, the red circle assigned to be center point has six neighbors.The red circle and the black circles belong to the same class.The red circle and the yellow rectangle and triangle do not belong to the same class.After MMC, the samples in the same class become more compact.Inspired by maximum margin criterion (MMC), spectral weighting matrix K in the proposed method tries to make the samples in the same class as compact as possible, which is conductive to reduce the spectral variability of the same endmember.Figure 3 provides an explanation of MMC.In Figure 3, the red circle assigned to be center point has six neighbors.The red circle and the black circles belong to the same class.The red circle and the yellow rectangle and triangle do not belong to the same class.After MMC, the samples in the same class become more compact.In general, the model in Equation ( 1) can be interpreted as least squares error (LSE) problem [35].
The model in ( 1) is assumed to different bands have uniform effects on LSE.Then we introduce the weighting matrix K into Equation ( 8) so that LSE can be turned into a weighted LSE approach, which can be described in the Equation (9).
A weighted LSE approach assign different weights to different bands, so that endmember matrix  can be transformed into  and hyperspectral image  can be transformed into  .In general, the model in Equation ( 1) can be interpreted as least squares error (LSE) problem [35].
The model in ( 1) is assumed to different bands have uniform effects on LSE.Then we introduce the weighting matrix K into Equation ( 8) so that LSE can be turned into a weighted LSE approach, which can be described in the Equation (9).
A weighted LSE approach assign different weights to different bands, so that endmember matrix G can be transformed into = G and hyperspectral image X can be transformed into When the spectrum of the mixed pixel is transformed by K −1/2 , the spectral variability of the endmember is reduced, thereby improving the spectral unmixing accuracy.

Local Neighborhood Derivative Weights
Derivative analysis of spectra has been used in hyperspectral remote sensing [22,36].Derivative analysis can help deal with the overlapping spectral features and enhance spectral contrast, thereby improving the target identification accuracy.Derivatives of spectra are relatively insensitive to variations in illumination intensity caused by changes in sun angle, cloud cover, or topography [37].Thus, first-order derivative spectra reduce the intra-class difference and increase the inter-class differences.First-and second-order derivative spectra have been applied to improve the estimation of sediment in water [38].
Spectra extracted from hyperspectral images consist of reflection values corresponding to different bands (λ), which can be expressed in discrete forms: where x is the spectrum and x(λ i ) is the reflectance value at wavelength λ i .
The first-order derivative spectrum can be calculated by the following Equation ( 12).
Since the enhancement of noise caused by differentiation is largely related to high-frequency noise [22], the Savitzky-Golay smoothing filter [39] is used to remove the high-frequency noise before calculating the first-order derivative spectrum.
Derivative analysis and local neighborhood weights are merged into local neighborhood derivative weights in the proposed method.First-order derivative spectrum is brought into modeling local neighborhood weights.The local neighborhood derivative weights act as regularization terms to penalize different abundance vectors.The first law of geography is expressed as "Everything is related to everything else, but near things are more related than distant things" [40].Inspired by the first law of geography, the hyperspectral image is divided into multiple local windows.Figure 4 displays the diagram of the local window.It is assumed that the hyperspectral image with r×s pixels (r represents the length of the image; s represents the width of the image) is arranged on r×s circles.The spatial coordinate of the top left corner circle is (0,0), and the spatial coordinate of the bottom right corner grid is (r,s).In Figure 4, a black circle and red circles constitute a local window where the black circle represents central pixel and red circles represent neighboring pixels.Suppose that pixel x i , represented by the black circle, is set as the central pixel of the local window in Figure 4.If the pixel x j , represented by the red circle, is the neighboring pixel of pixel x i , we define j ∈ N(i).If the pixel x j , represented by the colorless circle, is not neighboring pixel of pixel x i , we define j N(i).
The spatial coordinate of the top left corner circle is (0,0), and the spatial coordinate of the bottom right corner grid is (r,s).In Figure 4, a black circle and red circles constitute a local window where the black circle represents central pixel and red circles represent neighboring pixels.Suppose that pixel  , represented by the black circle, is set as the central pixel of the local window in Figure 4.If the pixel  , represented by the red circle, is the neighboring pixel of pixel  , we define  ∈ ().If the pixel  , represented by the colorless circle, is not neighboring pixel of pixel  , we define  ∉ ().Local neighborhood derivative weights are expressed in Equation ( 13).The local neighborhood derivative weights can measure the similarity of the neighboring pixels to the center pixel.The form of local neighborhood derivative weights approximates the heat kernel [41], but the difference is that local neighborhood derivative weights use the derivative of the original spectrum to calculate the weight.
where σ is a scaling parameter of the local neighborhood derivative weight.x i ,x j are the i th and j th original spectrum vectors of the hyperspectral image X. x i , x j are the first-order derivative spectrum of x i ,x j .When the pixel x j locates inside the local window, its degree of similarity to the center pixel x i can be calculated using local neighborhood derivative weights.When the pixel x j locates outside of the local window, its degree of similarity to the center pixel does not need to be calculated.This is in line with the expression of the first law of geography.The σ value is expressed as follows: where h denotes the number of neighboring pixels in the local window.
Noted that the spectra of rock extracted from hyperspectral image vary in Figure 5.One of the rock spectra is selected as reference spectrum.Reference original spectrum and other original spectra are used to compute local neighborhood weights according to Equation (13).The same arithmetic operation works on reference derivative spectrum and other derivative spectra.The results are shown in Figure 5c.The conclusion that local neighborhood derivative weights are insensitive to ESV can be obtained by comparing with results of original spectra and derivative spectra.
Noted that the spectra of rock extracted from hyperspectral image vary in Figure 5.One of the rock spectra is selected as reference spectrum.Reference original spectrum and other original spectra are used to compute local neighborhood weights according to Equation (13).The same arithmetic operation works on reference derivative spectrum and other derivative spectra.The results are shown in Figure 5 (c).The conclusion that local neighborhood derivative weights are insensitive to ESV can be obtained by comparing with results of original spectra and derivative spectra.

The Proposed MDWSU Algorithm Model
Spectral unmixing aims to obtain endmember spectra and corresponding abundance fraction from hyperspectral image.Non-negative matrix factorization (NMF) has been widely under research in the field of spectral unmixing.During the spectral unmixing process, NMF finds the appropriate endmember matrix  and abundance A to minimize the difference between  and   .The difference between  and   can be measured in terms of Euclidean distance.The loss function of NMF based on Euclidean distance is as follows: where the operator ‖. ‖ represents the Frobenius norm.Due to the non-convexity of NMF, it is difficult to obtain the global optimal solution [42].Since the solution obtained by NMF is not unique, other constraints are often introduced into NMF, such as non-negative matrix factorization with minimum volume constraints (MVC-NMF) [43].To improve the sparsity of abundance, the proposed method introduces abundance sparsity constraints into NMF.The L1/2 regularizer has been proven to provide the best sparse solution [42].Using the L1/2 regularizer as sparsity constraints, NMF with sparsity constraints is written as: where the operator .F represents the Frobenius norm.
Due to the non-convexity of NMF, it is difficult to obtain the global optimal solution [42].Since the solution obtained by NMF is not unique, other constraints are often introduced into NMF, such as non-negative matrix factorization with minimum volume constraints (MVC-NMF) [43].To improve the sparsity of abundance, the proposed method introduces abundance sparsity constraints into NMF.The L 1/2 regularizer has been proven to provide the best sparse solution [42].Using the L 1/2 regularizer as sparsity constraints, NMF with sparsity constraints is written as: min where λ is the regularization parameter.A 1/2 is defined in the Equation (17).
The parameter λ depends on the sparsity of abundances.Since these abundances cannot be obtained a priori, we use a rough estimator for λ defined in Equation ( 18) based on the sparseness criteria in [42].
where X i denotes the i th band data of hyperspectral image, the operator . 1 represents the matrix 1-norm and the operator . 2 represents the matrix 2-norm.
There exists a close link between hyperspectral image and decomposed abundance matrix [44].The more similar the neighboring pixel to the central pixel is, the more similar the abundance of the neighboring pixel is to the abundance of the central pixel.Once the derivative of neighboring pixel is similar to the derivative of central pixel, the abundance of neighboring pixel is close to the abundance of central pixel.Considering the similarity between the abundance of neighboring pixels and the central pixel, the local neighborhood derivative weight regularization for guaranteeing abundance smoothness is written as: where Tr(.) represents the trace of matrix, W is weight matrix in which W i,j is element, D is a diagonal matrix, and Then the local neighborhood derivative weight regularization is incorporated into the Equation ( 16).The loss function of MDWSU based on Euclidean distance is written as: min where µ ≥ 0 is the regularization parameter of the local neighborhood derivative weight regularization.Subsequently, the multiplicative update rules [45] is adopted to solve the MDWSU model.The multiplicative update rules are as follows: where .* and ./represent element-wise division operation and element-wise division operation, respectively.The details of the multiplicative update rules and the implementation issue of the MDWSU model are presented in the Appendix A.
The flowchart of the proposed method is shown in Figure 6.

Data Acquisition
We choose the right place as the study site and obtain the hyperspectral images preparing for verifying the effectiveness of the proposed method.
Hyperspectral data acquisition system and data processing flow is illustrated in Figure 7. Hyperspectral data acquisition system consists of Analytical Spectral Devices (ASD), white reference panel, hyperspectral imager and GPS.GPS is used to record the geographical location of study site.FieldSpec Handheld2 Spectroradiometer is selected as ASD.The ground object spectra are measured using ASD.The ASD can measure the reflected electromagnetic radiance of ground objects covering the wavelength ranging from 325 nm to 1075 nm with spectral resolution superior to 3 nm in the field.The white reference panel is used to calibrate the ground object spectrum measurement every ten minutes to guarantee measurement accuracy.In addition, the spectrum of each ground object was measured ten times.During the acquisition of the sample spectrum, the distance between the acquisition target area and the ASD is between 0.5 and 1.0 m, and the acquisition target area should be within the 25° field of view in the lowest point of the ASD.A Resonon Pika XC2 hyperspectral imager is chosen to acquire hyperspectral images.The hyperspectral imager is a push-broom imager and used to acquire visible and near-infrared hyperspectral imagery with the wavelength ranging from 400 nm to 1000 nm.The hyperspectral imager has 462 spectral channels with 1.3 nm spectral resolution and 1600 spatial channels.The hyperspectral imager weights 2.2 kg and its dimensions are 10.1 cm × 27.5 cm × 7.4 cm.

Data Acquisition
We choose the right place as the study site and obtain the hyperspectral images preparing for verifying the effectiveness of the proposed method.
Hyperspectral data acquisition system and data processing flow is illustrated in Figure 7. Hyperspectral data acquisition system consists of Analytical Spectral Devices (ASD), white reference panel, hyperspectral imager and GPS.GPS is used to record the geographical location of study site.FieldSpec Handheld2 Spectroradiometer is selected as ASD.The ground object spectra are measured using ASD.The ASD can measure the reflected electromagnetic radiance of ground objects covering the wavelength ranging from 325 nm to 1075 nm with spectral resolution superior to 3 nm in the field.The white reference panel is used to calibrate the ground object spectrum measurement every ten minutes to guarantee measurement accuracy.In addition, the spectrum of each ground object was measured ten times.During the acquisition of the sample spectrum, the distance between the acquisition target area and the ASD is between 0.5 and 1.0 m, and the acquisition target area should be within the 25 • field of view in the lowest point of the ASD.A Resonon Pika XC2 hyperspectral imager is chosen to acquire hyperspectral images.The hyperspectral imager is a push-broom imager and used to acquire visible and near-infrared hyperspectral imagery with the wavelength ranging from 400 nm to 1000 nm.The hyperspectral imager has 462 spectral channels with 1.3 nm spectral resolution and 1600 spatial channels.The hyperspectral imager weights 2.2 kg and its dimensions are 10.1 cm × 27.5 cm × 7.4 cm.We chose two places in Jiangxi Province as research sites, whose latitude and longitude are (29.35°N,118.09°E) and (29.04°N, 117.74°E), respectively.The first study site covered by corn, rice and weed is shown in Figure 8.The hyperspectral dataset in Figure 8 consists of 400 × 400 pixels.The second study site containing vegetation, tidal flats and water is shown in Figure 9.The hyperspectral dataset in Figure 9 consists of 158 × 178 pixels.We used hyperspectral imager to acquire visible and near-infrared hyperspectral imagery with the wavelength ranging from 400 nm to 1000 nm in the above-mentioned two places in September 2017.Then we used a white reference panel to calibrate original hyperspectral images and converted the hyperspectral images from radiance to reflectance.There are 462 channels in hyperspectral images.Channels 1-12 and 385-462 were abandoned due to the hyperspectral imaging sensor noise and water vapor absorption in the atmosphere.As a result, channels 13-384 of hyperspectral images ranging from 400 nm to 900 nm were used to verify the algorithm.We chose two places in Jiangxi Province as research sites, whose latitude and longitude are (29.35• N, 118.09 • E) and (29.04 • N, 117.74 • E), respectively.The first study site covered by corn, rice and weed is shown in Figure 8.The hyperspectral dataset in Figure 8 consists of 400 × 400 pixels.The second study site containing vegetation, tidal flats and water is shown in Figure 9.The hyperspectral dataset in Figure 9 consists of 158 × 178 pixels.We used hyperspectral imager to acquire visible and near-infrared hyperspectral imagery with the wavelength ranging from 400 nm to 1000 nm in the above-mentioned two places in September 2017.Then we used a white reference panel to calibrate original hyperspectral images and converted the hyperspectral images from radiance to reflectance.There are 462 channels in hyperspectral images.Channels 1-12 and 385-462 were abandoned due to the hyperspectral imaging sensor noise and water vapor absorption in the atmosphere.As a result, channels 13-384 of hyperspectral images ranging from 400 nm to 900 nm were used to verify the algorithm.We chose two places in Jiangxi Province as research sites, whose latitude and longitude are (29.35°N,118.09°E) and (29.04°N, 117.74°E), respectively.The first study site covered by corn, rice and weed is shown in Figure 8.The hyperspectral dataset in Figure 8 consists of 400 × 400 pixels.The second study site containing vegetation, tidal flats and water is shown in Figure 9.The hyperspectral dataset in Figure 9 consists of 158 × 178 pixels.We used hyperspectral imager to acquire visible and near-infrared hyperspectral imagery with the wavelength ranging from 400 nm to 1000 nm in the above-mentioned two places in September 2017.Then we used a white reference panel to calibrate original hyperspectral images and converted the hyperspectral images from radiance to reflectance.There are 462 channels in hyperspectral images.Channels 1-12 and 385-462 were abandoned due to the hyperspectral imaging sensor noise and water vapor absorption in the atmosphere.As a result, channels 13-384 of hyperspectral images ranging from 400 nm to 900 nm were used to verify the algorithm.

Experimental Results and Analysis
To verify the effectiveness of the MDWSU algorithm model, the accuracy of unmixing results is evaluated using the spectral angle distance (SAD), and the root mean square error (RMSE).We use spectral angle distance (SAD) to measure the extent to which the estimated endmember spectrum approximates to the true endmember spectrum.SAD is expressed in the following Equation (23).
where  is the i th estimated endmember signature and  is the i th true endmember signature.The smaller the SAD value is, the more the estimated endmember is similar to the true spectrum.The root mean square error (RMSE) of the entire hyperspectral image is written as: where n denotes the number of the total pixels of hyperspectral image and  denotes the number of the total bands of hyperspectral image.RMSE measures the error between the original hyperspectral image and reconstructed image.If the calculated RMSE is smaller, the accuracy of the decomposed abundance matrix and endmember matrix is higher.In general, since the model errors intend to obey normal distribution, RMSE is a good metric of model performance [46].

Real Hyperspectral Data Experiments
Two different real hyperspectral datasets acquired by Resonon Pika XC2 hyperspectral imager are used to verify the effectiveness of the MDWSU method.Sixty spectra for each endmember are obtained by ASD.Then average values of spectra obtained by ASD for each endmember are adopted as the reference endmember spectra.Figure 8 shows the real hyperspectral image of the first study

Experimental Results and Analysis
To verify the effectiveness of the MDWSU algorithm model, the accuracy of unmixing results is evaluated using the spectral angle distance (SAD), and the root mean square error (RMSE).We use spectral angle distance (SAD) to measure the extent to which the estimated endmember spectrum approximates to the true endmember spectrum.SAD is expressed in the following Equation (23).
where g i is the i th estimated endmember signature and ĝi is the i th true endmember signature.The smaller the SAD value is, the more the estimated endmember is similar to the true spectrum.The root mean square error (RMSE) of the entire hyperspectral image is written as: where n denotes the number of the total pixels of hyperspectral image and l denotes the number of the total bands of hyperspectral image.RMSE measures the error between the original hyperspectral image and reconstructed image.If the calculated RMSE is smaller, the accuracy of the decomposed abundance matrix and endmember matrix is higher.In general, since the model errors intend to obey normal distribution, RMSE is a good metric of model performance [46].

Real Hyperspectral Data Experiments
Two different real hyperspectral datasets acquired by Resonon Pika XC2 hyperspectral imager are used to verify the effectiveness of the MDWSU method.Sixty spectra for each endmember are obtained by ASD.Then average values of spectra obtained by ASD for each endmember are adopted as the reference endmember spectra.Figure 8 shows the real hyperspectral image of the first study site (29.35 • N, 118.09 • E) near Wuyuan district of Jiangxi Province in China, which is covered by the rice, corn, and weed.Figure 9 shows another real hyperspectral image of the second study site (29.04 • N, 117.74 • E) near Dexing district of Jiangxi Province in China, which is covered by tidal flats, vegetation, and water.In this section, these two hyperspectral images are experimented with different algorithms.
The MDWSU model method is compared with Fisher discriminant null space (FDNS) [25], hierarchical MEMSA [12], VCA with fully constrained least squares (FCLS) [47], non-negative matrix factorization with minimum volume constraints (MVC-NMF) [43], and non-negative matrix factorization with sparsity constraints (L 1/2 -NMF) [42].FDNS and hierarchical MEMSA have been developed to deal with endmember spectral variability, while the other three methods are traditional methods which assume that each endmember must have only one constant spectral signature.The FDNS searches a linear transformation of the spectra, from the original sample space to a lower dimensional space, which reduces the intra-class spectral differences in the low dimensional space [25].The hierarchical MEMSA here is adopted to compare the abundance fractions accuracy with the MDWSU model.A spectral library containing different endmembers adapted to different scenes was created from the image by human visualization.Three endmember selection procedures (Endmember Average Root Mean Square Error, Minimum Average Spectral Angle, and Count Based Endmember Selection) were used to identify the most representative endmembers from the spectral library for each class [12].The VCA algorithm is widely used to compare spectral unmixing performance due to its relatively high precision and stability.NMF has drawn much attention in the field of hyperspectral remote sensing [48].When it is applied to spectral unmixing, auxiliary constraints are introduced into NMF algorithm, making the model in line with real geographical significance.Taking the attributes of endmember signatures and abundance fractions into account, algorithms with different constraints have been developed for spectral unmixing.The constraints for NMF-based spectral unmixing algorithms are mainly classified into two categories: spectral-based constraints and abundance-based constraints.MVC-NMF is a classic algorithm of NMF with the spectral-based constraints while L 1/2 -NMF is a typical algorithm of NMF with the abundance-based constraints.
Figure 10 displays the abundance fractions of the MDWSU model and the other methods.Evaluating abundance maps by visual interpretation method, the best abundance estimation is achieved by the MDWSU model, yet there are some obvious errors in the abundance maps of other methods.As the spectra of the weeds and crops are highly similar, traditional unmixing methods (i.e., VCA-FCLS, MVC-NMF, and L 1/2 -NMF) cannot distinguish them effectively, so an incorrect abundance estimation of weed and corn corresponding to VCA-FCLS, MVC-NMF, and L 1/2 -NMF algorithms exists in Figure 10.Some weeds with high abundance exist in the corn abundance map obtained by the FDNS algorithm, which is obviously not in line with actual situation.As the spectral transformation of the FDNS algorithm is a dimensionality reduction algorithm, the linearly transformed low-dimensional spectral information loses the information of the original spectrum that can effectively distinguish between corn and weeds.There are many errors in the weed and corn abundance maps of the Hierarchical MEMSA algorithm.However, the MDWSU algorithm constructs spectral weighting matrix assigning different weights to different bands, which enlarges the inter-class differences and decreases intra-class differences.Furthermore, the local neighborhood derivative weights allow the abundance of similar ground objects close.The MDWSU algorithm model can obtain better abundance estimation results than other methods.Figure 11 displays the abundance maps of the proposed method and other algorithms.The MDWSU model estimates better than other algorithms referring to tree and water abundance fraction maps.In the water abundance fraction maps, the other algorithms misidentified the tree as water, yet the MDWSU model avoided this error.As the tree shadow spectrum is close to the water spectrum in spectral magnitude, so the error appears in the water abundance of other algorithms.
Although the spectral curve of water is close to the spectral curve of the tree shadow, the spectral curves of the tree shadow and the trees are substantially coincident in the bands between 400 nm and 700 nm.Other algorithms cannot effectively use the above-mentioned information to distinguish between trees, tree shadow, and water, so the tree shadows are classified as water.Tree shadows are classified as trees rather than water in the MDWSU model.
Tables 1-4 provide the SAD and RMSE values of different algorithms.The best results are denoted by bold font.The lower the SAD and RMSE values are, the more accurate the abundance estimation and endmember spectra estimation are.The average SAD and RMSE values of the MDWSU algorithm are the lowest compared with other algorithms, which proves the effectiveness of the proposed method.Endmember spectral variability widely exists in real hyperspectral images, especially in vegetation areas, which can be interpreted by illumination variation.Experimental results show that it is feasible to combine the spectral weighting matrix with local neighborhood derivative weights in reducing the effect of ESV on unmixing accuracy.Figure 11 displays the abundance maps of the proposed method and other algorithms.The MDWSU model estimates better than other algorithms referring to tree and water abundance fraction maps.In the water abundance fraction maps, the other algorithms misidentified the tree as water, yet the MDWSU model avoided this error.As the tree shadow spectrum is close to the water spectrum in spectral magnitude, so the error appears in the water abundance of other algorithms.Although the spectral curve of water is close to the spectral curve of the tree shadow, the spectral curves of the tree shadow and the trees are substantially coincident in the bands between 400 nm and 700 nm.Other algorithms cannot effectively use the above-mentioned information to distinguish between trees, tree shadow, and water, so the tree shadows are classified as water.Tree shadows are classified as trees rather than water in the MDWSU model.

Experimental Results of the AVIRIS Cuprite Dataset
Another real hyperspectral data about the mineral district will be used to verify the effectiveness of the MDWSU model.The Cuprite mineral district is located in the western part of Nevada, USA, which has been established as a mineralogical reference site for remote sensing instruments [49][50][51].The image was acquired by an Airborne Visible-Near/Infrared Imaging Spectrometer (AVIRIS) sensor on 19 June 1997.The reference spectral library of minerals in Cuprite mineral district is available on the internet [52].The MDWSU algorithm and other algorithms will extract endmembers from the Cuprite image.SAD is used to measure the accuracy of the estimated endmember spectra.The subimage with 250 × 190 pixels is intercepted from the raw hyperspectral data.Figure 12 displays false-color subimage of AVIRIS Cuprite data.In the AVIRIS Cuprite data, there are 224 spectral bands with the wavelength ranging from 400 nm to 2500 nm.In experiments, the low SNR bands and the water vapor absorption bands (1-2, 104-113, 148-167, and 221-224) have been removed [42].Estimating the number of endmembers in the selected region is a challenging issue due to endmember spectral variability of the same mineral with different chemical compositions.Regarding the number of endmembers, different algorithms provide different estimations in the literature.To address the endmember spectral variability, human interpretation is needed for the estimation of the number of endmembers [53].Referring to the existing analysis in [44], there are 11 different types of minerals in the scene.Furthermore, the estimated number of endmembers in the selected region is set to 11 via virtual dimensionality (VD) method [54] with false alarm probability P f = 10 −8 .
Figure 13 displays a comparison between the endmember spectra of the MDWSU algorithm and the USGS library.The endmember spectra of the MDWSU algorithm are displayed in orange dotted lines while the reference endmember spectra are shown in blue solid lines in Figure 13. Figure 14 shows the hard classification maps where the class label is assigned to a pixel in which the class having the highest abundance fraction.The hard classification maps in Figure 14 are obtained by MDWSU, FDNS, Hierarchical MEMSA, L 1/2 -NMF, MVC-NMF, and VCA-FCLS.Table 5 provides SAD values for each method (MDWSU, FDNS, L1/2-NMF, MVC-NMF, VCA-FCLS).Table 6 provides RMSE values for each method (MDWSU, FDNS, Hierarchical MEMSA, L1/2-NMF, MVC-NMF, VCA-FCLS).The best results are denoted by bold font.It can be judged from Table 5 that the MDWSU algorithm has the highest number of endmember minimum SAD values, and the MDWSU algorithm has the smallest endmember SAD average value.According to Table 6, the MDWSU algorithm has the smallest RMSE value.The performance of the MDWSU algorithm is beyond other algorithms (FDNS, Hierarchical MEMSA, L1/2-NMF, MVC-NMF, VCA-FCLS).Figure 13 displays a comparison between the endmember spectra of the MDWSU algorithm and the USGS library.The endmember spectra of the MDWSU algorithm are displayed in orange dotted lines while the reference endmember spectra are shown in blue solid lines in Figure 13. Figure 14 shows the hard classification maps where the class label is assigned to a pixel in which the class having the highest abundance fraction.The hard classification maps in Figure 14 are obtained by MDWSU, FDNS, Hierarchical MEMSA, L1/2-NMF, MVC-NMF, and VCA-FCLS.Table 5 provides SAD values for each method (MDWSU, FDNS, L1/2-NMF, MVC-NMF, VCA-FCLS).Table 6 provides RMSE values for each method (MDWSU, FDNS, Hierarchical MEMSA, L1/2-NMF, MVC-NMF, VCA-FCLS).The best results are denoted by bold font.It can be judged from Table 5 that the MDWSU algorithm has the highest number of endmember minimum SAD values, and the MDWSU algorithm has the smallest endmember SAD average value.According to Table 6, the MDWSU algorithm has the smallest RMSE value.The performance of the MDWSU algorithm is beyond other algorithms (FDNS, Hierarchical MEMSA, L1/2-NMF, MVC-NMF, VCA-FCLS).(i) (

Discussion
In this study, we investigate the validity of a spectral unmixing method based on a spectral weighting matrix and local neighborhood derivative weights to address spectral variability in hyperspectral images.The experimental results prove that the proposed algorithm is superior to the other typical algorithms.In this section some issues will be improved or discussed for further research.
First, establishing the endmember spectral library is the pre-processing step for coping with the effect of endmember spectral variability on spectral unmixing.The methods for establishing an endmember spectral library can be summarized into three categories: (1) collecting endmember spectra in the field experiments [24]; (2) manually determining the endmember spectra by visual interpretation [55]; and (3) automatically establishing an endmember spectral library by an algorithm [25].This paper intends to extract the endmember spectral library from the hyperspectral image, which can avoid the subjective misjudgment of the visual interpretation method and the human burden caused by the field experiment.Bateson et al. [56] constructed the endmember spectral library from the hyperspectral remote sensing image itself.The endmembers would be the vertices of the minimum volume convex simplex found with the simulated annealing algorithm [56].The above-mentioned strategy for constructing endmember spectral library has proven to be very effective.Yet, it has the disadvantage of long running time.Inspired by the above-mentioned strategy, we focus on constructing the endmember spectral library based on the convex simplex theory.The VCA algorithm is a well-known convex-geometry-based endmember extraction method.Compared to other endmember extraction methods (e.g., PPI, N-FINDER [57]) based on convex simplex theory, the VCA algorithm not only has low computational complexity, but also performs better than PPI and N-FINDER.The paper proposes the effective and fast algorithm for establishing the endmember spectral library.The algorithm uses a simplex to approximate the hyperspectral images, the pixels near or at the vertices of the simplex are viewed as endmember training samples.Motivated by the VCA extraction endmember algorithm, the orthogonal projection method is applied to find the pixels near or at the vertices of the simplex as the candidates of the endmember spectral library in the proposed method.Despite the encouraging results acquired, some issues need to be discussed for further research.The availability of the VCA algorithm is based on assumptions that some pure pixels of each endmember class are present in hyperspectral image.The validity of the hypothesis depends not only on the fact that a certain number of pure pixels should be available for each endmember class, but also on the size of hyperspectral images.If pure pixels are not available in hyperspectral images, the endmember spectral library will be affected by the VCA algorithm, which further affects the accuracy of the spectral weighting matrix.Therefore, if pure pixels do not present in images, some algorithms [58,59] which do not assume the existence of pure pixels can be considered instead of the VCA algorithm.
Second, the traditional linear unmixing method provides a relative inaccurate abundance estimate for all hyperspectral scenarios.As generally accepted, this is due to the high degree of similarity between endmember classes and intra-class endmember spectral variability.As shown in Figure 8, the corn and weed spectrum is highly similar while the spectra of corn and weeds vary, which leads to unstable model inversion.Using spectral weighting matrix makes it possible to expand the difference between endmember spectra and to reduce the differences of the same endmember spectra.The aim of maximum margin criterion (MMC) is to find the linear transform that maximizes inter-class differences and minimizes intra-class differences.MMC is introduced into modeling spectral weighting matrix.The real experimental results prove that spectral weighting matrix based on MMC estimates the abundance more accurately.
Third, in addition to spectral weighting matrix, local neighborhood derivative weight is introduced into the proposed method to reduce the effect of ESV on spectral unmixing.Derivative analysis is used to convert the original hyperspectral data into new spectral feature values with smaller intra-class differences and larger inter-class differences for spectral unmixing.The first-order derivative of spectrum is adopted to obtain the spectral feature values in the proposed method.Local neighborhood weights prove to enhance the performance of the spectral unmixing algorithms [28,60,61].Local neighborhood weights characterize the degrees of similarity of pixel pairs.The higher the similarity between pixels is, the more similar the corresponding abundances are.In fact, the local neighbor weights vary due to the influence of ESV.The change in local neighborhood weight between pixel pair affects the corresponding abundances.Hence, derivative analysis and local neighborhood weights are merged into local neighborhood derivative weights in the proposed method.The first-order derivative of spectra of pixel pair are adopted to model local neighborhood derivative weights.Figure 5 shows that local neighborhood derivative weights are insensitive to ESV.To prove the generalization of the fact, the spectra of corn and rice are intercepted from Figure 8.Note that the spectra of corn and the spectra of rice vary in Figures 15b and 16b, respectively.One of the spectra is selected as reference spectrum.The original spectra and derivative spectra are used to compute local neighborhood weights according to Equation ( 13), respectively.The results are shown in in Figures 15c and 16c, respectively.By comparing the local neighborhood weights between original spectra and derivative spectra, local neighborhood derivative weights are insensitive to ESV.
Remote Sens. 2019, 11, x FOR PEER REVIEW 23 of 29 [28,60,61].Local neighborhood weights characterize the degrees of similarity of pixel pairs.The higher the similarity between pixels is, the more similar the corresponding abundances are.In fact, the local neighbor weights vary due to the influence of ESV.The change in local neighborhood weight between pixel pair affects the corresponding abundances.Hence, derivative analysis and local neighborhood weights are merged into local neighborhood derivative weights in the proposed method.The first-order derivative of spectra of pixel pair are adopted to model local neighborhood derivative weights.Figure 5 shows that local neighborhood derivative weights are insensitive to ESV.
To prove the generalization of the fact, the spectra of corn and rice are intercepted from Figure 8.Note that the spectra of corn and the spectra of rice vary in Figure 15  Remote Sens. 2019, 11, x FOR PEER REVIEW 23 of 29 [28,60,61].Local neighborhood weights characterize the degrees of similarity of pixel pairs.The higher the similarity between pixels is, the more similar the corresponding abundances are.In fact, the local neighbor weights vary due to the influence of ESV.The change in local neighborhood weight between pixel pair affects the corresponding abundances.Hence, derivative analysis and local neighborhood weights are merged into local neighborhood derivative weights in the proposed method.The first-order derivative of spectra of pixel pair are adopted to model local neighborhood derivative weights.Figure 5 shows that local neighborhood derivative weights are insensitive to ESV.
To prove the generalization of the fact, the spectra of corn and rice are intercepted from Figure 8.Note that the spectra of corn and the spectra of rice vary in Figure 15  Fifth, although hierarchical MEMSA also achieves small RMSE value compared with other methods (VCA-FCLS, MVC-NMF, and L1/2-NMF), hierarchical MEMSA produces a more pronounced abundance estimation error compared to the MDWSU algorithm via visual interpretation method.Because the accuracy assessment of the land surface parameters estimation algorithm is difficult to achieve due to the difficulties in the field surveys [62], we have to evaluate abundance estimation of land cover by visual interpretation method.Regarding the abundance maps obtained by hierarchical MEMSA in Figure 10, more weeds are mistakenly identified as corn in the corn abundance map, and Fourth, µ is the regularization parameter of the local neighborhood derivative weight regularization.The µ value controls the impact of the local neighborhood derivative weight regularization on the endmember abundances.In this section, the selection of the regularization parameter µ is considered.Fifth, although hierarchical MEMSA also achieves small RMSE value compared with other methods (VCA-FCLS, MVC-NMF, and L1/2-NMF), hierarchical MEMSA produces a more pronounced abundance estimation error compared to the MDWSU algorithm via visual interpretation method.Because the accuracy assessment of the land surface parameters estimation algorithm is difficult to achieve due to the difficulties in the field surveys [62], we have to evaluate abundance estimation of Fifth, although hierarchical MEMSA also achieves small RMSE value compared with other methods (VCA-FCLS, MVC-NMF, and L 1/2 -NMF), hierarchical MEMSA produces a more pronounced abundance estimation error compared to the MDWSU algorithm via visual interpretation method.Because the accuracy assessment of the land surface parameters estimation algorithm is difficult to achieve due to the difficulties in the field surveys [62], we have to evaluate abundance estimation of land cover by visual interpretation method.Regarding the abundance maps obtained by hierarchical MEMSA in Figure 10, more weeds are mistakenly identified as corn in the corn abundance map, and more rice and more corn are mistakenly identified as weeds in the weed abundance map.In Figure 11, the hierarchical MEMSA model estimates worse than the MDWSU algorithm referring to tree and water abundance fraction maps.The hierarchical MEMSA exhausts all endmember combinations and selects the optimal endmember combination to unmix the corresponding pixel with the minimum reconstruction error between the pixel reconstruction value and the real pixel value as the criterion.Based on the above strategy, hierarchical MEMSA can achieve small RMSE value.As shown in Figures 8  and 9, the spectra of corn, weeds, rice, and trees vary greatly in spectral magnitude.Meanwhile, some spectra of corn, weeds, and rice are close to each other in spectral magnitude.Some spectra of tree shadow are close to the spectra of water in spectral magnitude.As some spectra of different endmembers are close to each other in spectral magnitude, to satisfy the minimum reconstruction error, hierarchical MEMSA may select the wrong endmember combination or calculate unreasonable abundance fractions.So, the MDWSU model produces more accurate abundance estimations than hierarchical MEMSA.
In addition, we further make efforts to merge the different approaches-iterative mixture analysis cycles, spectral feature selection, spectral transformation, and spectral model-to reduce ESV in the future.

Conclusions
The MDWSU model, which can address endmember spectral variability in hyperspectral unmixing, is proposed in the paper.In the MDWSU model, an effective and fast algorithm is proposed for establishing the endmember spectral library.The algorithm can extract the endmember spectral library from the hyperspectral image, which can avoid the subjective misjudgment of the visual interpretation method and the human burden caused by the field experiment.Then we adopt the endmember spectral library to construct spectral weighting matrix based on maximum margin criterion.A spectral weighting matrix assigns different weights to different bands, which can increase inter-class differences and decrease intra-class differences.Besides, derivative analysis and local neighborhood weights are merged into local neighborhood derivative weights in the MDWSU model.Local neighborhood derivative weights prove to be insensitive to ESV.The local neighborhood weights act as a regularization term to penalize different abundance vectors.The spectral weighting matrix, local neighborhood derivative weights, and sparsity constraints are merged into nonnegative matrix factorization model.
To verify the proposed algorithm, experiments with three real hyperspectral datasets were conducted.The MDWSU model is compared with several classical hyperspectral unmixing models, including FDNS, hierarchical MEMSA, VCA-FCLS, L 1/2 -NMF, and MVC-NMF.Experimental results, including RMSE and SAD, on three real hyperspectral datasets demonstrate that the MDWSU model outperforms other several classical algorithms.Experiments have also shown that endmember spectral variability reduction technique have positive effects on sub-pixel fraction estimation accuracy.However, in some environments, the similarity of different endmembers poses additional difficulty when obtaining accurate abundance estimates or classification results.Compared with hierarchical MEMSA, the MDWSU model produces more accurate abundance estimations.
The object function of MDWSU is presented in Equation (20).The iteration rules of MDWSU model can be obtained by the projection gradient learning method [63].Setting the results of k th iteration as G (k) and A (k) , the iteration rules are written as: , A (k) )} ( 25) , A (k) )} (26) where α (k) and β (k) mean the learning step of projection gradient learning method, ∇ G f (G, A) and ∇ A f (G, A) are expressed in Equations ( 27) and (28).GA), the iteration rules can be transformed into the multiplicative update rules.The multiplicative update rules are described in Equations ( 21) and (22).
Subsequently, the implementation issue of the MDWSU model is presented in the following.As mentioned earlier, sum of each column of abundance is unity (Sum-To-One).To satisfy Sum-To-One constraints of abundance, X and G are instead of = X and = G in Equation (29).
where 1 n (1 m ) is a n (m)-dimensional column vector of all 1s.δ can control the influence of Sum-To-One constraints on abundance.The δ value will affect the accuracy of abundance and implementation efficiency in MDWSU model.The larger the δ value is, the closer the sum of each column of abundance is to one.However, large δ value will reduce the convergence rate.δ is set as 200 in our experiments.
The initial endmember matrix = G can be obtained when modeling the endmember spectral library and spectral weighting matrix.The initial abundance A can be obtained via the least squares method defined in Equation (30).
where the function max{0, x} is used to set the negative components of abundance A to zero while keeping the non-negative components of abundance A unchanged.When iteration number reaches maximum number of iterations or threshold τ defined in Equation ( 31) is satisfied, the iterative operation of the MDWSU will stop.The maximum number of iterations is selected as 1500 for all hyperspectral datasets.
The endmember matrix = G is transformed by G using a weighted LSE approach.After the optimal solution of the endmember matrix = G is obtained by multiplicative update rules, the endmember matrix = G needs to be transformed into G according to Equation (32).

Figure 1 .
Figure 1.The schematic overview of the proposed method.

Figure 1 .
Figure 1.The schematic overview of the proposed method.

Figure 2 .
Figure 2. (a) Position of endmember candidates of the endmember spectral library located at the falsecolor image (color composite with red-green-blue (RGB) channels at bands 85-50-10, centered around 671 nm-558 nm-430 nm), the latitude and longitude of hyperspectral dataset are (36.15°N,121.07°W);(b) Red circles represent tree and spectra of tree; (c) Green diamonds represent water and spectra of water; (d) Blue squares represent rock and spectra of rock.

Figure 2 .
Figure 2. (a) Position of endmember candidates of the endmember spectral library located at the false-color image (color composite with red-green-blue (RGB) channels at bands 85-50-10, centered around 671 nm-558 nm-430 nm), the latitude and longitude of hyperspectral dataset are (36.15• N, 121.07 • W); (b) Red circles represent tree and spectra of tree; (c) Green diamonds represent water and spectra of water; (d) Blue squares represent rock and spectra of rock.

Figure 3 .
Figure 3. (a) The red circle assigned to be center point has six neighbors; (b) The red circle and the black circles belong to the same class; (c) The red circle and the yellow rectangle and triangle do not belong to the same class (d) After the maximum margin criterion (MMC), the samples in the same class become more compact.

Figure 3 .
Figure 3. (a) The red circle assigned to be center point has six neighbors; (b) The red circle and the black circles belong to the same class; (c) The red circle and the yellow rectangle and triangle do not belong to the same class (d) After the maximum margin criterion (MMC), the samples in the same class become more compact.

Figure 4 .
Figure 4. Diagram of the local window in the hyperspectral image.The black circle is chosen to be the central pixel and the red circles are the neighboring pixels of the central pixel.

Figure 5 .
Figure 5. (a) The spectral variety of rock extracted from hyperspectral image; (b) The spectra of rock and the reference spectrum of rock; (c) Comparison of local neighborhood weights between origin spectra and derivative spectra.

Figure 5 .
Figure 5. (a) The spectral variety of rock extracted from hyperspectral image; (b) The spectra of rock and the reference spectrum of rock; (c) Comparison of local neighborhood weights between origin spectra and derivative spectra.

2. 5 .
The Proposed MDWSU Algorithm Model Spectral unmixing aims to obtain endmember spectra and corresponding abundance fraction from hyperspectral image.Non-negative matrix factorization (NMF) has been widely under research in the field of spectral unmixing.During the spectral unmixing process, NMF finds the appropriate endmember matrix = G and abundance A to minimize the difference between = X and = GA.The difference between = X and = GA can be measured in terms of Euclidean distance.The loss function of NMF based on Euclidean distance is as follows: min

Figure 6 .
Figure 6.The flowchart of the proposed method.

Figure 6 .
Figure 6.The flowchart of the proposed method.

Figure 7 .
Figure 7. Hyperspectral data acquisition system and data processing flow.

Figure 7 .
Figure 7. Hyperspectral data acquisition system and data processing flow.

29 Figure 7 .
Figure 7. Hyperspectral data acquisition system and data processing flow.

Figure 8 .
Figure 8.The false-color image (color composite with red-green-blue (RGB) channels at bands 179-80-50, centered around 639 nm-506 nm-466 nm) of the first study site (29.35 • N, 118.09 • E) near Wuyuan district of Jiangxi Province in China is covered by the rice, corn, and weed.

Figure 8 .
Figure 8.The false-color image (color composite with red-green-blue (RGB) channels at bands 179-80-50, centered around 639 nm-506 nm-466 nm) of the first study site (29.35°N,118.09°E) near Wuyuan district of Jiangxi Province in China is covered by the rice, corn, and weed.

Figure 9 .
Figure 9.The false-color image (color composite with red-green-blue (RGB) channels at bands 179-80-50, centered around 639 nm-506 nm-466 nm) of the second study site (29.04°N,117.74°E) near Dexing district of Jiangxi Province in China is covered by tidal flats, vegetation, and water.

Figure 9 .
Figure 9.The false-color image (color composite with red-green-blue (RGB) channels at bands 179-80-50, centered around 639 nm-506 nm-466 nm) of the second study site (29.04 • N, 117.74 • E) near Dexing district of Jiangxi Province in China is covered by tidal flats, vegetation, and water.

, 3 ,
Figure11displays the abundance maps of the proposed method and other algorithms.The MDWSU model estimates better than other algorithms referring to tree and water abundance fraction maps.In the water abundance fraction maps, the other algorithms misidentified the tree as water, yet the MDWSU model avoided this error.As the tree shadow spectrum is close to the water spectrum in spectral magnitude, so the error appears in the water abundance of other algorithms.Although the spectral curve of water is close to the spectral curve of the tree shadow, the spectral curves of the tree shadow and the trees are substantially coincident in the bands between 400 nm and 700 nm.Other algorithms cannot effectively use the above-mentioned information to distinguish between trees, tree shadow, and water, so the tree shadows are classified as water.Tree shadows are classified as trees rather than water in the MDWSU model.Tables1, 2, 3, and 4 provide the SAD and RMSE values of different algorithms.The best results are denoted by bold font.The lower the SAD and RMSE values are, the more accurate the abundance estimation and endmember spectra estimation are.The average SAD and RMSE values of the MDWSU algorithm are the lowest compared with other algorithms, which proves the effectiveness of the proposed method.Endmember spectral variability widely exists in real hyperspectral images, especially in vegetation areas, which can be interpreted by illumination variation.Experimental results show that it is feasible to combine the spectral weighting matrix with local neighborhood derivative weights in reducing the effect of ESV on unmixing accuracy.

Figure 15 .Figure 15 .
Figure 15.(a) The reference spectrum of corn intercepted from Figure 8; (b) The spectra of corn intercepted from Figure 8; (c) Comparison of local neighborhood weights between original spectra and derivative spectra.

Figure 16 .
Figure 16.(a) The reference spectrum of rice intercepted from Figure 8; (b) The spectra of rice intercepted from Figure 8; (c) Comparison of local neighborhood weights between original spectra and derivative spectra.

Figure 16 .Figure 17 .
Figure 16.(a) The reference spectrum of rice intercepted from Figure 8; (b) The spectra of rice intercepted from Figure 8; (c) Comparison of local neighborhood weights between original spectra and derivative spectra.Fourth, μ is the regularization parameter of the local neighborhood derivative weight regularization.The μ value controls the impact of the local neighborhood derivative weight regularization on the endmember abundances.In this section, the selection of the regularization parameter μ is considered.Figure17shows the performance of the MDWSU model varies with μ in terms of SAD and RMSE when the MDWSU model is applied to the real hyperspectral image of the second study site (29.04°N,117.74°E) near Dexing district of Jiangxi Province.As shown, the MDWSU model reaches good performance in terms of SAD and RMSE when μ varies between 0.25 and 0.4.According to the parameter analysis, the μ parameter is selected between 0.25 and 0.4 for all the experiments.When μ parameter is selected as 0.01, the performance of the MDWSU model will become relatively poor.The RMSE value of the MDWSU model applied to Figure9is 0.0078.The RMSE value and SAD value of the MDWSU model applied to Figure8is 0.2135 and 0.0086, respectively.Even so, the performance of the MDWSU model is better than VCA-FCLS and L1/2-NMF algorithms.

Figure 17 .
Figure 17.Performance of the MDWSU with respect to parameter µ in terms of (a) spectral angle distance (SAD) and (b) root mean square error (RMSE).

Table 1 .
Spectral angle distance (SAD) results of first real hyperspectral data (the best results are denoted by bold font).

Table 2 .
Root mean square error (RMSE) results of first real hyperspectral data (the best results are denoted by bold font).

Table 2 .
Root mean square error (RMSE) results of first real hyperspectral data (the best results are denoted by bold font).

Table 3 .
SAD results of second real hyperspectral data (the best results are denoted by bold font).

Table 4 .
RMSE results of second real hyperspectral data (the best results are denoted by bold font).

Table 5 .
SAD results of the AVIRIS Cuprite data (the best results are denoted by bold font).

Table 6 .
RMSE results of the AVIRIS Cuprite data (the best results are denoted by bold font).

Table 5 .
SAD results of the AVIRIS Cuprite data (the best results are denoted by bold font).

Table 6 .
RMSE results of the AVIRIS Cuprite data (the best results are denoted by bold font).