Hyperspectral Image Resolution Enhancement Approach Based on Local Adaptive Sparse Unmixing and Subpixel Calibration

: Unmixing based fusion aims at generating a high spectral-spatial resolution image (HSS) with the same surface features of the high spatial resolution multispectral image (MS) and low spatial resolution hyperspectral image (HS). In this paper, a new fusion method is proposed to improve the fusion performance by taking further advantage of the distribution characteristics of ground objects. First, we put forward a local adaptive sparse unmixing based fusion (LASUF) algorithm, in which the sparsity of the abundance matrices is appended as the constraint to the optimization fusion, considering the limited categories of ground objects in a speciﬁc range and the local correlation of their distribution. Then, to correct the possible original subpixel misregistrations or those introduced by the fusion procedures, a subpixel calibration method based on optimal matching adaptive morphology ﬁltering (OM-AMF) is designed. Experiments on various datasets captured by different sensors demonstrate that the proposed fusion algorithm surpasses other typical fusion techniques in both spatial and spectral domains. The proposed method effectively preserves the spectral composition features of the isolated ground objects within a small area. In addition, the OM-AMF postprocessing is able to spatially correct the fusion results at a subpixel level and preserve the spectral features simultaneously.


Introduction
Along with the innovation of imaging spectroscopy, hyperspectral (HS) images can provide a spatial scene in hundreds of spectral channels. The high spectral resolution can reflect diagnostic spectrum characteristics of ground materials, and HS data have obvious superiority in aspects such as precision agriculture, anomaly detection, mineral exploration and terrain classification. Nevertheless, the spatial resolution of HS imagery is relatively low compared to panchromatic (PAN) imagery and multispectral (MS) imagery in order to maintain an acceptable signal-to-noise ratio (SNR), narrow the spectral bandwidth needed to increase instantaneous field of view (IFOV), while having high spatial resolution (small IFOV) system needed to broaden spectral channel. Due to the trade-off between spectral and spatial information quality, it is hard to directly get high spectral-spatial (HSS) resolution data, which are required in various applications. Performing algorithms from image processing perspective is a very feasible way to overcome this problem.
To obtain HSS data, plenty of resolution enhancement methods can be found in literature [1][2][3]. Under certain circumstances, auxiliary information can be hard to acquire, and single image super-resolution aims to reconstruct high spatial resolution data from low-resolution HS images.
The basic approach through traditional interpolators was simple and fast, but it did not enhance the details. Various regularization-based methods were proposed to solve such an ill-posed problem including total variation regularization [4], sparse representation [5], and self-similarity grouping [6]. The introduction of regularization terms can preserve the edges and textures, so better performances were realized during the super-resolution process. Gu et al. [7] mapped abundance after unmixing on a high-resolution grid using spatial correlation to get resolution enhanced data. However, single image super-resolution mainly depends on priori and models, which are not universally applicable. With the development of a remote sensing platform carrying different sensors, multi sources are more available. Component substitution (CS) [8][9][10] relied on replacing components of MS data by panchromatic images. A multiresolution analysis (MRA) based approach injected spatial details obtained from multiscale decomposition of panchromatic images into MS data [11,12]. A combination of CS and MRA, named Hybrid, and a series of derivative algorithms were proposed to provide an appropriate balance between spatial and spectral preservation [13,14]. Pansharpening techniques achieved remarkable results due to information injection and were extended to HS and MS imagery fusion in the last few decades. Since both spatial and spectral information is contained in MS images, and the corresponding bands are highly correlated to HS images, image fusion has received extensive attention [15]. In this paper, we mainly discuss the HSS image generation utilizing HS and MS image fusion techniques.
From the literature, the first algorithm of merging HS and MS images was a wavelet-based method [16]. Both the images were transformed in wavelet space and then combined according to a fusion rule. Then, the contourlet transform compromising multi-scale and directional filters in its structure was developed [17]. Some researchers established the observation model between HS and MS images to estimate HSS data. Eismann [18][19][20] developed a maximum a posteriori (MAP) estimator incorporating with a stochastic mixing model (SMM), and minimized the objective function considering the underlying spectral characteristics and spatial position to get the optimal estimation. The Bayesian estimator [21][22][23][24][25] was another popular approach suited for the observation model. The spectral response function needed to be explored, and then the distribution of HS and MS data was estimated by the assumed priori distribution. The fusion model was processed via the posteriori distribution. The key step was the definition of the penalization term, and various estimators can be implemented: native Gaussian prior, sparse representation promoted prior, and vector total variation prior. Unmixing based fusion methods were proposed recently, in which the coupled nonnegative matrix factorization (CNMF) was the most famous one [26]. The theoretical foundation of CNMF was a linear spectral mixing model, and both low spatial resolution HS and high spatial resolution MS images were iteratively unmixed to eventually extract a high quality of endmembers and abundance maps. HSS imagery was then produced by endmember spectral of HS image and abundance map of MS image. Yokoya applied preprocessing and onboard cross-calibration into CNMF and solved the spectral range mismatch between real data [27]. Bendoumi added sub-image division to the original CNMF framework and simplified the solving process to get good fusion results [28]. In [29], spectral adjustment was used after CNMF to generate a simulated MS image and reduce the spectral distortion during simulation. High frequency information was extracted by a block based fusion phase. Compared with other fusion procedures, unmixing based methods attempted to approach physical reality more than mathematic approximation. However, the physical constraints of priori knowledge were almost neglected in the iterative solution of previous unmixing based methods.
In recent years, the sparsity of the HS images has been gradually regarded and taken advantage of in the resolution enhancement methods. By implementing an alternating direction method of multipliers (ADMM), Wycoff [30] transformed the optimization problem of integrating hyperspectral and visual color images into two Frobenius norm minimization problems utilizing coefficients matrix sparsity to improve the efficiency. However, the sparsity property of coefficients matrix was not further considered during the ADMM iterations, and fusion accuracy was not promoted markedly. In [31], mutual correlations between different HS channels were used to jointly sparse coefficient vectors, and dictionaries were created for each MS channel. Qi Wei [32] formed regularization priors and posteriori optimization to the joint nonnegative matrix factorization (NMF) problem, and the ADMM algorithm was used to alternately estimate the final abundance maps and endmember signatures. The authors [33] proposed a Generalization of Simultaneous Orthogonal Matching Pursuit (G-SOMP+) algorithm to learn a sparse code, and then the HSS image was estimated along with the spectral reflectance. Huang [34] supposed that there were probably only a few land surface materials contributing to each pixel in the HS image. The optimal set of atoms was obtained from HS data through sparse matrix factorization, and the coefficients matrix was calculated from MS data. Since the two steps were detached, no spatial correlation between HS and MS images was needed, which made it flexible in practical application, but the HS dataset with a large range was essential to extract representative atoms as a spectral dictionary. In the non-negative structured sparse representation (NSSR) method [35], the estimation of the HSS image was formulated as a joint estimation of the hyperspectral dictionary and the sparse codes were based on the sparsity of the HSS data. However, the hyperspectral dictionary representing prototype reflectance spectra vectors were only learned from the input HS image and fixed, not having spatial information of the MS image as a reference nor updating in the subsequent iterations. Hence, the fusion performance depended a lot on the initial dictionary generation accuracy. Considering the low spatial resolution of hyperspectral image, obviously spectrum mixture existed, and a representative dictionary was hard to decompose directly. Incorporating sparse prior to the NMF model, Chen [36] proposed a sparse constraint nonnegative matrix factorization (SCNMF) and used the PAN image to sharpen the abundance matrix. By adding the sparse constraint in the fusion procedure, spectral information was better preserved along with the increase of computation complexity. The above sparsity promoted algorithms basically all utilized the global spatial-spectral correlation, and the local sparsity and correlation of objects distribution is not further mined. In addition, although HS and MS images are geometrically coregistered, subpixel misregistration still probably exists [37], and a random subpixel shift between the obtained HSS and MS datasets may also be introduced by the fusion procedures for the sake of the minimization of global residual. Ref. [38] declared that one of the main objectives of the fusion was that the spatial characteristic of the synthetic image should be as identical as possible to the image with the highest resolution. However, the synthetic image does not match the MS image exactly, and a local calibration is beneficial for dealing with the misregistration.
In this paper, a novel fusion approach based on local adaptive sparse unmixing and subpixel adaptive morphology calibration is proposed to obtain the HSS image. The main contributions of the paper are as follows: (1) introducing the local sparsity of abundance matrices as constraint conditions to the global optimal fusion algorithm of HS and MS images; (2) estimating the authentic endmembers for each pixel in the optimization procedure on the basis of the occurrence probabilities in its neighborhood to respect the local correlation of ground object distribution; and (3) proposing an adaptive morphology based subpixel calibration framework to enhance spatial matching and preserve the spectral composition simultaneously. The remainder of this paper is organized as follows: Section 2 elaborates the proposed approach for hyperspectral image resolution enhancement. We focus on the local adaptive sparse unmixing based fusion approach and the subpixel spatial calibration. Experiment results are reported in Section 3 using three hyperspectral datasets. Finally, discussions and conclusions are given in Sections 4 and 5.

Materials and Methods
In this section, we first introduce the experimental datasets, and then we elaborate on the proposed resolution enhancement algorithm.

Datasets
Three synthetic groups of hyperspectral and multispectral datasets generated from real airborne HS images have been used to test the proposed technique. To evaluate the effectiveness of the proposed method, various scenes taken from different types of sensors are selected. The first HS data are taken over Salinas Valley, California, USA by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS). The scene contains 512 × 217 pixels with 224 spectral bands in the wavelength range from 400 nm to 2500 nm. The second HS data are taken over Washington D.C., USA by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor. These data are characterized by 210 bands in the spectral range from 400 nm to 2400 nm and the scene contains 1208 × 307 pixels. The last scene is collected in the area around Montana, USA by a HyMap instrument provided by Rochester Institute of Technology (RIT), Henrietta, NY, USA [39]. The 126 spectral channels range from 453.8 nm to 2496.3 nm with a scene of 240 × 240 pixels selected. We select 120 × 120 pixels size of image of the first data with 204 spectral bands by removing water absorption region: 108-112, 154-167, 224. For the second dataset, a spatial size of 240 × 240 with 191 spectral channels by omitting atmosphere opaque region 103-106, 138-148, 207-210 remain. Figure 1 shows the experimental datasets. The MS data are generated by spectrally downsampling of the HSS data corresponding to Landsat TM data, and the six bands cover a spectral range of 450-520, 520-600, 630-690, 760-900, 1550-1750 and 2080-2350 nm, respectively. The HS image is a spatially Gaussian downsampling of the HSS data with full-width at half maximum to a lower scale by a factor of 6. The downsampling factor is set as 6, which results in a sixfold difference in spatial resolution between two sensors, adopted and discussed in [15]. Considering that the spectral characters of synthetic images should be as identical as possible to the sensor with the highest spectral resolution [38], MS images are radiometrically calibrated to HS data according to the spectral degradation model and the parameters of the sensors.

Local Adaptive Sparse Unmixing Based Fusion Approach
To obtain the HSS image The 3D data cube are rearranged to a 2D matrix with each row representing all the pixels from a specific band, and each column representing the spectral signature at each pixel. b h and b m denote the numbers of spectral bands of HS and MS data, respectively. c h and r h denote the numbers of column and row pixels in HS data, respectively. c m and r m denote the numbers of column and row pixels in MS data, respectively.
Since the HS and MS images can be respectively considered as spatially and spectrally degraded versions of the HSS image, the sensor observation models that have been widely cited in [40][41][42] are given as: where G ∈ R (r m ×c m )×(r h ×c h ) denotes the spatial spread transform matrix, contains elements representing point spread function from MS to HS images, L ∈ R b m ×b h is the spectral response transform matrix that represents the spectral response function from HS to MS images. G and L can be a priori known according to the sensors parameters, when applying to real data, G is estimated by Gaussian point spread function, L can be derived by cross calibration [27], and n s and n r are residuals. A spectral unmixing technique is the decomposition of an image into a collection of constituent spectra called endmembers E, and a set of corresponding fractions called abundances A, and abundance maps indicate the proportion of individual endmember present in one pixel. With the advantages in physical effectiveness and mathematical simplicity, the linear spectral mixture model (LMM) [43] is the most commonly used model in solving the unmixing problems. It assumes that the spectrum of each pixel is a linear combination of spectral of several endmembers. Therefore, X HSS can be defined as: where E HS = [e HS1 , e HS2 , . . . , e HSd ] ∈ R b h ×d is the endmember matrix of HS image, {e HSi } d i=1 is the endmember spectrum and d is the total number of endmembers. The multispectral image abundance is the endmember abundance fractions at each pixel and (r m × c m ) is the total number of pixels. n represents the noise. By substituting LMM of X HSS defined by Equation (3) into the sensor observation models defined by Equations (1) and (2), A HS is the spatially degraded abundance matrix and can be derived as A HS = A MS G, and E MS is the spectrally degraded endmember matrix and can be derived as E MS = LE HS .
Unmixing based fusion techniques extract the endmembers from both data pairs and optimally calculate the high spatial resolution abundances accordingly. In previous works, such as CNMF and its derived methods, each pixel was regarded as the combination of all the endmembers according to LMM. To minimize the squared Frobenius norm of estimation residual matrices: F , endmembers and abundances were alternately calculated by the multiplicative update rules. However, the endmember matrix is over completed for most local areas, so the update procedure may also force the noise n s and n r approximately represented by the linear mixture of the endmembers to reduce the residual and generate local full rank abundance matrices. Meanwhile, the endmembers of ground objects with small areas are gradually neglected for global optimization. The regions may also be fitted by the collinear endmembers as an ill-posed problem [44]. These will all impair the representation space of the endmembers, the spectral-spatial feature of the reconstructed HSS image and the convergence efficiency.
To deal with the problems above, a local adaptive sparse unmixing fusion (LASUF) method is proposed. The proposed algorithm attempts to further preserve the spectral composition characteristics of HS image and the spatial feature of MS image, by introducing priori geological constraints to the iterative solution of the coupled unmixing. At first, each pixel in the remote sensing images has finite cover scope, and thus contains a few categories of ground features. This means that the desired spectrum of each pixel is supposed to be mixed by a small number of endmembers. Consequently, the corresponding abundance matrix should be sparse. Secondly, the local correlation of the ground surface is respected in our procedure. That is, the distribution of ground objects has continuity to some extent and some category of object is more likely to appear around its congeneric areas. We suppose the abundance of one pixel indicates the existence probability of the endmembers, but not their exact composing proportion considering the noise. Thereupon, the abundance matrix can be regarded as a Markov Random Field, where the probability of the pixel x containing the n th endmember completely depends on abundance distribution of neighborhood N x centered on x: where P(E 1 |E 2 ) represents the conditional probability of one event E 1 occurring given that E 2 is already known, and A x (n) is the abundance of nth endmember at the position of x.
The distribution probability of the endmembers can be calculated as P = A ⊗ W, ⊗ represents convolution operation, and P ∈ R d×(r×c) and W ∈ R w×w are a Gaussian window with width of w. P x is a vector in dimension of d × 1 with P x (i) representing the existence probability of the ith endmember at position x, and its sum is normalized to 1. Then, we construct a binary sparsification matrix S ∈ R d×(r×c) utilizing P to make A sparse. Considering that limited kinds of ground objects exist in one pixel, we suppose x is actually mixed by q x d endmembers with greater existence probability, and other components are the approximate expression of noise n in Equations (1) and (2). The sparsification matrix S is initialized as zero matrix and calculated as: where ε is the permissible error of unmixing reconstruction. When the sum of largest q x probabilities is higher than 1 − ε, the corresponding elements in S x are set to "1" and others remain as "0". The objective function of the optimization problem can be formulated as minimization of the global estimation residual constrained by the priori sparsity: The proposed LASUF algorithm is summarized in Algorithm 1. The endmembers and abundance matrix are initialized using the input HS and MS data by endmember extraction and fully constrained least squares (FCLS) method [45]. Considering the good robustness of unmixing iteration procedures to the initial endmember extraction result, which has been discussed in the previous work [32], various kinds of endmember extraction methods can be used including vertex component analysis (VCA), N-finder algorithm (N-FINDR), pure pixel index (PPI), etc. In addition, VCA [46] is adopted for all of the fusion methods for a fair comparison in our experiments, since it is a state-of-the-art endmember extraction method that does not require the presence of pure pixels in the image. Then, through introducing the sparse and local correlation constraint to each inter-optimization iteration of the unmixing procedure, the sparsification matrix of the MS image and HS image are calculated as: and Then, the local adaptive sparse abundance matrix is calculated as: Unlike the sparse method in [33][34][35], endmembers are updated in the iterative optimization utilizing local sparsity of ground objects distribution along with the spatial correlation between MS and HS images via Lee and Seung's multiplicative update rules [47], which are given as: where (·) T denotes the transposition of the matrix and * and ./ denote elementwise multiplication and division, respectively. In LASUF, we can gradually acquire the local adaptive optimal solution instead of global optimization. Therefore, the endmembers of small ground objects are harder to ignore, for their abundances in other areas are set to zero and barely contribute to the overall estimation residual. The endmembers matrix E HS and E MS become more expressive than the previous unmixing procedure consequently.  (7) and (8)

Subpixel Spatial Calibration Phase
The existing fusion approaches, including unmixing based methods, generally expect that the observed high spatial resolution MS and low spatial resolution HS images are geometrically coregistered with radiometric correction. That is, each pixel in HS spatially corresponds to (r m /r h ) × (c m /c h ) entire pixels in MS exactly, as shown in Figure 2a. Unfortunately, in practical application, the integer-pixel matching is difficult to achieve due to different sensors, platform, and atmospheric refraction conditions [48]. On one side, with the development of an unmanaged autonomous vehicle (UAV) technique, airborne sensors provide a major source of high-resolution remote sensing images. However, due to airstream and flight control, the airborne platform route cannot maintain adequate precision like a satellite. Consequently, subpixel misalignment may appear in some local areas as shown in Figure 2b. On the other side, except for the same multi-sensor platform, HS and MS images are generally captured in different time phases. Ground objects and atmospheric condition probably change as time goes by, such as movements of man-made cars or boats, evolution of vegetation, waves or coastlines. This may lead to the accidental subpixel shift as shown in Figure 2c and the anamorphic misregistration in Figure 2d. Even HS and MS images can be absolutely precisely calibrated, the fusion processes are likely to introduce some random subpixel shift between HSS and MS images. The existing fusion procedure attempts to minimize the squared norm of estimation residual matrices and obtain the global optimization, but ignores the local accuracy at the pixel level. In addition, fusion methods usually assume that the underlying surface is a plane and a diffuse reflect incident electromagnetic wave; therefore, it conforms to the Gaussian mixture model. While the hypothesis may not be entirely accurate, for instance, when there exists slope, specular reflection or ground features that have obvious absorption of some certain bands, mixed pixels in an HS image are not simply a Gaussian mixture of pure pixels. The unmixed result also has spatial bias. If these factors are considered in the unmixing model, the fusion algorithm complexity will be greatly increased. Considering that the MS image has more accurate and reliable spatial information, we expect to correct the HSS image on the subpixel level to make it have spatial features in accordance with the MS image and preserve its spectrum composition characteristics simultaneously. Hence, we design a subpixel calibration postprocessing to solve the above problem and further improve the fusion accuracy. In this section, an optimal matching based adaptive morphology filtering (OM-AMF) using MS data as the reference is proposed for subpixel spatial calibration of the HSS image. Traditional mathematical morphology can explore and calibrate specific morphological structure in a gray scale image utilizing a structuring element (SE) and sorting operator [49]. The erosion result of an image f (x) with an SE is the minimum value of the pixels in the scope of SE when its origin is at x: where ε SE (·) denotes the erosion operator, and b indicates the vectors from x to the pixels in SE. Equation (17) can be rewritten as an optimization form: whereb represents the position vector from x that minimizes the object function f (·).
Then, we extend the optimization form of morphology to the condition with reference data to match. The filtering result of optimal matching (OM) morphology can be defined as the pixel in the SE with the minimum mean square error (MSE) of the corresponding pixel in the reference image: where ψ SE [C, R] denotes the OM morphology operator of the input data C and the reference data R, and x c and x r are the corresponding positions in C and R; when C and R are cube data, MSE(C(x c + b),R(x r )) represents the mean square error of the vectors at x c + b and x r .b represents the position vector from x c that minimizes the MSE of C(x c + b) and R(x r ) in the scope of SE. We take advantage of the OM morphology and adaptive structuring element (ASE) to adjust the spatial structure of HSS data in accordance with the MS image. The flowchart is shown in Figure 3. At first, the subpixels of the HSS image, with the identical spectrum components, are constructed. We divide each pixel into k × k subpixels in space to get a dense hyperspectral image (DHS, X DHS ∈ R k 2 b h ×(r m ×c m ) ) as shown in Figure 3b. Then, each pixel in DHS is replaced by the average linear mixture of its k × k neighborhood to obtain a spatial overcomplete hyperspectral image (SHS, X SHS ∈ R k 2 b h ×(r m ×c m ) ), as shown in Figure 3c. One pixel in the SHS map contains the same spatial scope as the pixel in the MS image and is centered at the corresponding position of the subpixel in the DHS image. Meanwhile, the SHS map can also be linear represented by E HS . To preserve the spatial character of MS image, the edge constrained 4-neighbour growth is used to generate the ASE for an SHS image constraint by the edge of the MS image [50,51]. PCA transformation is executed on the MS image, and the edge map is detected by implementing a Sobel operator on the k × k super-resolution of the largest PCA component. Then, ASE is constructed for each pixel of the SHS image to reveal its homogenous region. Finally, the OM-AMF is executed to update the HSS image with the pixel in the ASE scope of SHS data, whose spectrum degeneration has the minimum MSE with the corresponding pixel in MS image: , (20) where z h is the position of the pixel to be operated in the HSS image and z s is the corresponding position in the SHS image; ASE is the adaptive structuring element generated via the edge of the MS image. For the spectral response transform matrix, L is the same as the fusion procedure and has been cross calibrated, and the overall radiation consistency of LX SHS and Y MS is receivable. Denoising process should be executed on the MS data firstly to reduce the risk of radiation deviation for the proposed subpixel calibration. Most fusion methods, including the proposed LASUF, provide overall or local optimal estimation from MS and HS images. Due to the low spatial resolution of HS data, they can hardly achieve the best spatial resolution on the level of high spatial resolution pixels. The better matched results may be found in the neighbor subpixels. Consequently, the proposed OM-AMF can improve the results of most fusion procedures even though the MS and HS images are completely matched in spatial dimension.

Experimental Setup
The proposed LASUF-OM method is applied to the above datasets. To demonstrate the effectiveness of the proposed method, four representative methods are compared as competitors, including MAP [19], CNMF [26], Bayesian [22], and NSSR [35]. The parameters of these methods were manually adjusted to the optimal. In CNMF and LASUF-OM, the maximum number of iterations for inner and outer loops are set to 200 and 3. The thresholds for the convergence condition are set at 10 −6 and their endmembers are initialized via VCA with the same number at 30 empirically referring to [15]. This number and the final performance depend on the scene complexity of the images in practical application. Particularly, for the proposed method, the permissible error of the local adaptive sparse process is set as ε = 0.1; the maximum radius ASE is 5, while, in MAP, six PCs, four endmembers, and 20 mixture classes are adopted. The regularization parameter of Bayesian is fixed as 0.0001. The number of atoms in the dictionary in NSSR is set as 80, and iteration numbers and convergence thresholds are set as in the reference. The processor Intel (R) Core (TM) i7 CPU 2.40 GHz RAM 12 GB is selected for all experiments.
In addition, the ideal estimated HSS image should preserve the spectral information of the HS image and the spatial information of the MS image, and both qualitative and quantitative assessments are adopted to overall evaluate the performance from those two aspects. Some typical bands and spectral profiles are presented as visual indices, and the quantitative measurements are evaluated between the generated HSS data and the original reference image, including the spatial measure of the peak signal-to-noise ratio (PSNR) index and cross correlation (CC) index, a spectral measure of the spectral angle mapper (SAM) index and a global measure of the erreur relative globale adimensionnelle de synthése (ERGAS) index [52]. The definition of PSNR is as follows: whereX k HSS denotes a single band of the estimated HSS image, and X k HSS denotes a single band of the reference HSS image. Max k is the maximum pixel value in the kth band image. The CC index characterizes the geometric distortion between estimated and reference data, the optimal value of CC is 1, and it is defined as: where µ (·) denotes the sample mean. SAM measures the spectral shape preservation by calculating the distance between two spectral vectors of two pixels, and it is defined as: where x andx denote the spectral vector of reference and estimated HSS images, respectively. x,x = x Tx is the inner product. The lower SAM means the lower spectral distortion and the optimal value is 0. ERGAS evaluates the global quality of the estimated HSS image and is defined as: where d is the spatial resolution ratio from HSS to HS images, and RMSE is the well-known root mean square error that computes spatial quality. By calculating each band, the spectral information quality is considered, and the optimal value is 0.

Integer Pixel Matched Fusion Experiment
At first, we assume that the MS and HS images are completely matched with no subpixel shift. The space enhancement ratio in OM-AMF postprocessing is set as k = 3. In Tables 1-3, the average statistical indexes of the PSNR, SAM, CC, and ERGAS values and the running time of all the tested methods are given for the three pairs of datasets. Globally, these tables show that the proposed LASUF-OM algorithm has remarkable results in both spectral and spatial perspectives.

From the Spectral Viewpoint
As indicated in Tables 1-3, the proposed LASUF method presents lower overall SAM values. After the postprocessing of OM-AMF, the SAM values are further reduced. It means that the OM-AMF is valuable even though no subpixel shift exists, as the fusion methods provide overall or local optimum estimation and better results may be found in the neighbor subpixels. Figures 4-6 show the corresponding SAM heat maps, which reveal spatial distribution of the spectral errors for each pixel. It can be observed that there are many small spots with warm color in the SAM maps of MAP, which indicate that greater spectral distortion exists in these reconstruction areas. For CNMF, these warm pixels mean that the spectrum of these small separate areas can hardly be perfectly expressed by the final endmembers of the iterative unmixing procedures. The average SAM value of Bayesian is high since the spectral physical property is not considered, and the warm color and block effect of the SAM map also prove that. Although NSSR has a relatively small average SAM value, the warmer areas of spatial structure details in the SAM map illustrate that the NSSR method is not suitable for regions with distinct spectral mixing. Nevertheless, the SAM maps of the LASUF and LASUF-OM are almost all blue and have barely warm areas. It demonstrates that the local adaptive sparsification can preserve the spectral composition of isolated ground objects, making use of the local correlation and sparsity of the ground features in limited areas. Our unmixing update procedure achieves local optimization adaptively and the impact from nonlocal or non-homogeneous regions for the global optimum are avoided by the sparse abundance matrix. In addition, in the OM-AMF calibration, the subpixels for selection are generated strictly according to the rule of LMM; hence, spectral signatures stay unchanged after subpixel calibration.

From the Spatial Viewpoint
To further analyze the relationship between the spatial fusion quality and the wavelength, the comparisons of PSNR and CC curves for each band of the HSS data are shown in Figures 7-9, and the wavelength distributions of the MS data are also marked in the PSNR curves. It can be seen that fusion results of all the methods are more ideal in the spectrum wavelength range that have overlapping parts between MS and HS images. Results of the Bayesian estimator have relatively poor sensitivity compared to other methods, while NSSR has great fusion results. In bands with no corresponding MS wavelength, the fusion effect of the HSS image falls sharply, and NSSR results in 1100-1400 nm perform the worst. The proposed method shows better results in the different wavelength regions. It can also be observed that the CC of our method is closest to the ideal value 1, and the slightest geometric distortion is introduced by the proposed method. In addition, the superiorities in the non-repetitive bands are even more obvious, which demonstrates better spectral enhancement capability of the proposed method. For visual comparison, Figures 10-12 show the grayscale images and their local details of the reference, low resolution HS images and the resolution enhanced results for MAP, CNMF, Bayesian, NSSR and the proposed algorithm, with the wavelength of 1541 nm, 1204 nm and 1190 nm, respectively, for Salinas, Washington D.C. and Montana data. It can be visibly perceived that, for the non-repetitive wavelength, the fusion results of the proposed LASUF-OM method are the closest to the reference data by visual inspection. The local details include that the grayscale and shape features are maintained well in our results.      In general, high spectral spatial resolution hyperspectral data obtained by the proposed LASUF-OM strategy maintains the spectral signature of hyperspectral image and embodies the spatial features of multispectral images. The global measure ERGAS index also emphasizes this fact.

Subpixel Shift Experiment
A subpixel shift experiment is designed to further verify the performance of the proposed OM-AMF based subpixel calibration algorithm. In addition, 1/3 × 1/3 and 1/2 × 1/2 of the high spatial resolution pixel shift are manually added into HS Washington D.C. data along the low right side to discuss the influence of subpixel shift to various fusion methods.
Statistical indexes of the fusion results on the two shift datasets obtained by MAP, CNMF, Bayesian, NSSR, LASUF and LASUF-OM with k = 3, k = 6 are given respectively in Tables 4 and 5. The SAM maps are shown in Figures 13 and 14. Compared with integer pixel matched fusion results in Table 2 and Figure 5, we can observe that, when subpixel shift exists, the fusion performances of each method become worse. Meanwhile, along with the degree of shifts increasing, results become even worse. Among the contrasting algorithms, MAP is the most sensitive to spatial alignment, for its fusion quality decreases the most. It can be seen in SAM maps that the areas with over 0.25 spectral angle amplifies evidently compared to Figure 5a, and are larger than results of other methods in Figures 13 and 14. While the results of the proposed LASUF drop slightly and the SAM maps in Figures 13c and 14c are similar to Figure 5c, which mean local adaptive optimal solution embodied more robustness than global optimization. Through the subpixel calibration phase, the fusion performances improve obviously. Comparing Table 2 with Tables 4 and 5, when k = 3, the improvement degree of the shifted data is superior to that of the completely matched data. In addition, with the decreasing of the subpixel measure in the OM-AMF, the fusion results are better. The LASUF-OM results of the subpixel shift experiment are superior to the CNMF results of matched data. In addition, when k = 6, the SAM and PSNR of LASUF-OM overmatch that of LASUF in Table 2. These all demonstrate that the OM-AMF based postprocessing could calibrate the subpixel shift effectively.

Endmembers Collinearlity
To experimentally verify the improvement of the endmembers obtained by our method, we propose an evaluation index Er ∈ R d×1 to further quantitatively measure the endmember collinearity. First, arbitrary endmember e k in matrix E is approximately linearly expressed by other endmembers in least squares estimator as: where r k ∈ R 1×b h is the residual error using other endmembers to linear express e k . Then, vector Er representing the discrete degree of each endmember linearly expressed by other endmembers is calculated as: The mean value mean(Er) and minimum value min(Er) of Er are together used to describe the non-collinearity of the endmembers matrix, the larger those two values are, the more difficult they are to be linear expressed by other endmembers, and the lower collinearity degree of the endmember matrix.
Experiments are executed on the final endmember matrices of the CNMF method and the proposed LASUF method to quantitatively compare the endmembers' expressiveness. The evaluation indexes are calculated for the three datasets and shown in Table 6. It is evident that endmembers obtained by the proposed LASUF are much harder to be linearly expressed by the others than CNMF. The results further illustrate that a local adaptive sparse procedure can improve endmembers' expressiveness.

Computational Efficiency
LASUF can not only improve the fusion performance but also promote the computational efficiency. The computation burden is huge in CNMF. In per-iteration of the multiplicative update procedure, plenty of multiply-add operations are executed. Taking the E MS update procedure as an example, Equation (15) needs 2b m (r m c m ) 2 d + b m d 2 (r m c m ) + 3b m d multiply-add operations, and its computation complexity is O(b m (r m c m ) 2 d). Although in our unmixing algorithm additional computations, including Equations (9), (10) and (12) are executed to generate sparse abundance matrix A S MS , their computation complexities are all no more than O(d(r m c m )), which are negligible compared with Equations (15) and (16). With a sparse matrix A S HS or A S MS , the actual computation cost of each of the multiplicative update reduces significantly. Furthermore, the numbers of the inter iterations can also be reduced by the proposed method. Falling in the local minimum is another one of the crucial factors, which affects the convergence rate when solving the optimization problem. In our algorithm, the straightforward sparse operation for the abundance matrix provides an extra opportunity to escape the local minimum in each iteration. Furthermore, the local adaptive sparsification makes the abundance of the nonexistent ground objects converge to "0" more efficiently. Hence, the convergence rate of LASUF is improved and the holistic computing efficiency of the unmixing fusion is increased.
The running time in Tables 1-3 reveal that the computational efficiency of LASUF increases 3-4 times that of CNMF. This is because the computation cost of the multiplicative update reduces significantly due to the abundance matrix sparse operation in each iteration. In addition, the sparsification process has much lower computational cost. Although the computation complexity increases when OM-AMF is executed, the running time of LASUF-OM is still about half of CNMF.

Conclusions
In this paper, we propose a novel fusion method for high spatial resolution MS images and low spatial resolution HS images integration based on local adaptive sparse unmixing and subpixel calibration. Considering the spectrum composition characteristics and the spatial local correlation of ground objects distribution in spectral images, the proposed algorithm makes the abundance matrix sparse according to probability distribution of the endmembers in the neighborhood, thereby achieving the local optimal unmixing automatically. Then, to correct the subpixel shift between HS and MS images, an optimal matching adaptive morphology filtering based subpixel calibration algorithm is proposed. Experimental results show that the resolution enhanced hyperspectral image obtained by the proposed LASUF-OM method has high spectral and spatial fidelities. The proposed algorithm is easy for implementation and combinatorial application. The local adaptive sparsification can be utilized in various kinds of unmixing based fusion to improve their endmembers expressiveness and the convergence efficiency. The actual computational efficiency of the unmixing processes can also be distinctly enhanced by the sparse abundance matrices. The OM-AMF procedure can not only calibrate the original subpixel misregistration and the random shift introduced by fusion, but also preserve the spectral composition features of HSS images.