Fusing Multiple Multiband Images

We consider the problem of fusing an arbitrary number of multiband, i.e., panchromatic, multispectral, or hyperspectral, images belonging to the same scene. We use the well-known forward observation and linear mixture models with Gaussian perturbations to formulate the maximum-likelihood estimator of the endmember abundance matrix of the fused image. We calculate the Fisher information matrix for this estimator and examine the conditions for the uniqueness of the estimator. We use a vector total-variation penalty term together with nonnegativity and sum-to-one constraints on the endmember abundances to regularize the derived maximum-likelihood estimation problem. The regularization facilitates exploiting the prior knowledge that natural images are mostly composed of piecewise smooth regions with limited abrupt changes, i.e., edges, as well as coping with potential ill-posedness of the fusion problem. We solve the resultant convex optimization problem using the alternating direction method of multipliers. We utilize the circular convolution theorem in conjunction with the fast Fourier transform to alleviate the computational complexity of the proposed algorithm. Experiments with multiband images constructed from real hyperspectral datasets reveal the superior performance of the proposed algorithm in comparison with the state-of-the-art algorithms, which need to be used in tandem to fuse more than two multiband images.


I. INTRODUCTION
THE WEALTH of spectroscopic information provided by hyperspectral images containing hundreds or even thousands of contiguous bands can immensely benefit many remote sensing and computer vision applications, such as object recognition [1], change detection [2], material classification [3], and spectral unmixing [4], commonly encountered in environmental monitoring, resource location, weather or natural disaster forecasting, etc. Therefore, finely-resolved hyperspectral images are in great demand [5]–[8]. However, limitations in light intensity as well as efficiency of the current sensors impose an inexorable trade-off between the spatial resolution, spectral sensitivity, and signal-to-noise ratio (SNR) of existing spectral imagers [9]. As a result, typical spectral imaging systems can capture multiband images of high spatial resolution at a small number of spectral bands or multiband images of high spectral resolution with a reduced spatial resolution. For example, imaging devices onboard the Pleiades or IKONOS satellites (http://www.satimagingcorp.com/satellite-sensors/) provide single-band panchromatic images with spatial resolutions of less than a meter and multispectral images with a few bands and spatial resolutions of a few meters, while NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) (http://aviris.jpl.nasa.gov/data/free_data.html) provides hyperspectral images with more than two hundred bands but with a spatial resolution of several tens of meters.
One way to surmount the abovementioned technological limitation of acquiring high-resolution hyperspectral images is to capture multiple multiband images of the same scene at practical spatial and spectral resolutions and then fuse them in a synergistic manner. Fusing multiband images combines the complementary information obtained through multiple sensors that may have different spatial and spectral resolutions and cover different spectral ranges.
Initial multiband image fusion algorithms were developed to fuse a panchromatic image with a multispectral image, and the associated inverse problem was dubbed pansharpening [10]–[13]. Most pansharpening algorithms are based on one of two popular strategies: component substitution (CS) and multiresolution analysis (MRA). The CS-based algorithms substitute a component of the multispectral image, obtained through a suitable transformation, with the panchromatic image. The MRA-based algorithms inject the spatial details of the panchromatic image, obtained via a multiscale decomposition, into the multispectral image. There also exist hybrid methods that use both CS and MRA. Some of the algorithms originally proposed for pansharpening have been successfully extended to fusing a panchromatic image with a hyperspectral image, a problem that is called hyperspectral pansharpening [13].
Recently, significant research effort has been expended to solve the problem of fusing a multispectral image with a hyperspectral one. This inverse problem is essentially different from the pansharpening and hyperspectral pansharpening problems since a multispectral image has multiple bands that are intricately related to the bands of its corresponding hyperspectral image. Unlike a panchromatic image that contains only one band of reflectance data usually covering parts of the visible and near-infrared spectral ranges, a multispectral image contains multiple bands each covering a smaller spectral range, some being in the shortwave-infrared (SWIR) region. Therefore, extending the pansharpening techniques so that they can be used to inject the spatial details of a multispectral image into a hyperspectral image is not straightforward.
In some works on multispectral-hyperspectral image fusion, it is assumed that each pixel of the hyperspectral image, which has a lower spatial resolution than the target image, is the average of the pixels of the same area on the target image [14]–[18]. Clearly, the size of this area depends on the downsampling ratio. Based on this pixel-aggregation assumption, one can divide the problem of fusing two multiband images into subproblems dealing with smaller blocks and hence significantly decrease the complexity of the overall process. However, it is more realistic to allow the area on the target image corresponding to a pixel of the hyperspectral image to span as many pixels as determined by the point-spread function of the sensor, which induces spatial blurring. The downsampling ratio generally depends on the physical and optical characteristics of a sensor and is usually fixed. Therefore, spatial blurring and downsampling can be expressed as two separate linear operations. The spectral degradation of a panchromatic or multispectral image with respect to the target image can also be modeled as a linear transformation. Articulating the spatial and spectral degradations in terms of linear operations forms a realistic and convenient forward observation model that relates the observed multiband images to the target image.
Hyperspectral image data is generally known to have a low-rank structure and reside in a subspace whose dimension is usually much smaller than the number of spectral bands [4], [19]–[21]. This is mainly due to correlations among the spectral bands and the fact that the spectrum of each pixel can often be represented as a linear combination of relatively few spectral signatures. These signatures, called endmembers, may be the spectra of the materials present in the scene. Consequently, a hyperspectral image can be linearly decomposed into its constituent endmembers and the fractional abundances of the endmembers for each pixel. This linear decomposition is called spectral unmixing and the corresponding data model is called the linear mixture model. Other linear decompositions that can be used to reduce the dimensionality of a hyperspectral image in the spectral domain are dictionary-learning-based sparse representation and principal component analysis.
Many recent works on multiband image fusion, which mostly deal with fusing a multispectral image with a hyperspectral image of the same scene, employ the abovementioned forward observation model and a form of linear spectral decomposition. They mostly extract the endmembers or the spectral dictionary from the hyperspectral image. Some of the works use the extracted endmember or dictionary matrix to reconstruct the multispectral image via sparse regression and calculate the endmember abundances or the representation coefficients [22]. Others cast the multiband image fusion problem as reconstructing a high-spatial-resolution hyperspectral datacube from two datacubes degraded according to the mentioned forward observation model. When the number of spectral bands in the multispectral image is smaller than the number of endmembers or dictionary atoms, the linear inverse problem associated with the multispectral-hyperspectral fusion problem is ill-posed and needs to be regularized to have a meaningful solution. Any prior knowledge about the target image can be used for regularization. Natural images are known to mostly consist of smooth segments with few abrupt changes corresponding to edges and object boundaries [23]–[25]. Therefore, penalizing the total variation [26]–[28] and the sparse (low-rank) representation in the spatial domain [29]–[32] are two popular approaches to regularizing multiband image fusion problems. Some algorithms, developed within the framework of Bayesian estimation, incorporate prior knowledge or conjecture about the probability distribution of the target image into the fusion problem [33]–[35]. The work of [36] obviates the need for regularization by dividing the observed multiband images into small spatial patches for spectral unmixing and fusion under the assumption that the target image is locally low-rank.
When the endmembers or dictionary atoms are induced from an observed hyperspectral image, the problem of fusing the hyperspectral image with a multispectral image boils down to estimating the endmember abundances or representation coefficients of the target image, a problem that is often tractable (due to being a convex optimization problem) and has a manageable size and complexity. The estimate of the target image is then obtained by mixing the induced endmembers/dictionary and the estimated abundances/coefficients. It is also possible to jointly estimate the endmembers/dictionary and the abundances/coefficients from the available multiband data. This joint estimation problem is usually formulated as a non-convex optimization problem of nonnegative matrix factorization, which can be solved approximately using block coordinate-descent iterations [37]- [40].
To the best of our knowledge, all existing multiband image fusion algorithms are designed to fuse a pair of multiband images with complementary spatial and spectral resolutions. Therefore, fusing more than two multiband images using the existing algorithms can only be realized by performing a hierarchical procedure that combines multiple fusion processes possibly implemented via different algorithms as, for example, in [58] and [59]. In addition, there are potentially various ways to arrange the pairings and often it is not possible to know beforehand which way will provide the best overall fusion result. For instance, in order to fuse a panchromatic, a multispectral, and a hyperspectral image of a scene, one can first fuse the panchromatic and multispectral images, then fuse the resultant pansharpened multispectral image with the hyperspectral image. Another way would be to first fuse the multispectral and hyperspectral images, then pansharpen the resultant hyperspectral image with the panchromatic image. Apart from the said ambiguity of choice, such combined pairwise fusions can be slow and inaccurate since they may require several runs of different algorithms and may suffer from propagation and accumulation of errors. Therefore, the increasing availability of multiband images with complementary characteristics captured by modern spectral imaging devices has brought about the demand for efficient and accurate fusion techniques that can handle multiple multiband images simultaneously.
In this paper, we propose an algorithm that can simultaneously fuse an arbitrary number of multiband images. We utilize the forward observation and linear mixture models to effectively model the data and reduce the dimensionality of the problem. Assuming a matrix normal distribution for the observation noise, we derive the likelihood function as well as the Fisher information matrix (FIM) associated with the problem of recovering the endmember abundance matrix of the target image from the observations. We study the properties of the FIM and the conditions for the existence of a unique maximum-likelihood estimate and the associated Cramér-Rao lower bound. We regularize the problem of maximum-likelihood estimation of the endmember abundances by adding a vector total-variation penalty term to the cost function and constraining the abundances to be nonnegative and add up to one for each pixel. The total-variation penalty serves two major purposes. First, it helps us cope with the likely ill-posedness of the maximum-likelihood estimation problem. Second, it allows us to take into account the spatial characteristics of natural images, namely, that they mostly consist of piecewise-planar regions with few sharp variations. Regularization with a vector total-variation penalty can effectively advocate this desired feature by promoting sparsity in the image gradient, i.e., the local differences between adjacent pixels, while encouraging the local differences to be spatially aligned across different bands [25]. The nonnegativity and sum-to-one constraints on the endmember abundances ensure that the abundances have practical values. They also implicitly promote sparsity in the estimated endmember abundances.
We solve the resultant constrained optimization problem using the alternating direction method of multipliers (ADMM) [41]–[46]. Simulation results show that the proposed algorithm outperforms several combinations of the state-of-the-art algorithms, which need to be cascaded to carry out fusion of multiple (more than two) multiband images.

II. DATA MODEL

A. Forward observation model
Let us denote the target multiband image by $\mathbf{X} \in \mathbb{R}^{L \times N}$, where $L$ is the number of spectral bands and $N$ is the number of pixels in the image. We wish to recover $\mathbf{X}$ from $K$ observed multiband images $\mathbf{Y}_k \in \mathbb{R}^{L_k \times N_k}$, $k = 1, \dots, K$, that are spatially or spectrally downgraded/degraded versions of $\mathbf{X}$. We assume that these multiband images are geometrically co-registered and are related to $\mathbf{X}$ via the following forward observation model:

$$\mathbf{Y}_k = \mathbf{R}_k \mathbf{X} \mathbf{B}_k \mathbf{S}_k + \mathbf{N}_k, \quad k = 1, \dots, K, \tag{1}$$

where $L_k \le L$ and $N_k = N / d_k^2$ with $d_k$ being the spatial downsampling ratio of the $k$th image; $\mathbf{R}_k \in \mathbb{R}^{L_k \times L}$ is the spectral response of the sensor producing $\mathbf{Y}_k$; $\mathbf{B}_k \in \mathbb{R}^{N \times N}$ is a band-independent spatial blurring matrix that represents a two-dimensional convolution with a blur kernel corresponding to the point-spread function of the sensor producing $\mathbf{Y}_k$; $\mathbf{S}_k \in \mathbb{R}^{N \times N_k}$ is a sparse matrix with $N_k$ ones and zeros elsewhere that implements a two-dimensional uniform downsampling of ratio $d_k$ on both spatial dimensions and satisfies $\mathbf{S}_k^\top \mathbf{S}_k = \mathbf{I}_{N_k}$; and $\mathbf{N}_k \in \mathbb{R}^{L_k \times N_k}$ is an additive perturbation representing the noise or error associated with the observation of $\mathbf{Y}_k$.

We assume that the perturbations $\mathbf{N}_k$, $k = 1, \dots, K$, are independent of each other and have matrix normal distributions expressed by

$$\mathbf{N}_k \sim \mathcal{MN}_{L_k \times N_k}\left(\mathbf{0}_{L_k \times N_k}, \boldsymbol{\Lambda}_k, \mathbf{I}_{N_k}\right), \tag{2}$$

where $\mathbf{0}_{L_k \times N_k}$ is the $L_k \times N_k$ zero matrix, $\mathbf{I}_{N_k}$ is the $N_k \times N_k$ identity matrix, and $\boldsymbol{\Lambda}_k \in \mathbb{R}^{L_k \times L_k}$ is a diagonal matrix that represents the correlation among the rows of $\mathbf{N}_k$, which correspond to different spectral bands. Note that we consider the column-covariance matrices to be identity, assuming that the perturbations are independent and identically distributed in the spatial domain. However, by considering diagonal row-covariance matrices, we assume that the perturbations are independent in the spectral domain but may have non-identical variances at different bands.

Note that $\mathbf{Y}_k$, $k = 1, \dots, K$, in (1) contain the corrected (pre-processed) spectral values, not the raw measurements produced by the spectral imagers. The pre-processing usually involves several steps including radiometric calibration, geometric correction, and atmospheric compensation [47]. The radiometric calibration is generally performed to obtain radiance values at the sensor. The reflected sunlight passing through the atmosphere is partially absorbed and scattered through a complex interaction between the light and various parts of the atmosphere. The atmospheric compensation counters these effects and converts the radiance values into ground-leaving radiance or surface reflectance values. To obtain accurate reflectance values, one additionally has to account for the effects of the viewing geometry and the sun's position as well as the surface's structural and optical properties [5]. This pre-processing is particularly important when the multiband images to be fused are acquired via different instruments, from different viewpoints, or at different times. After the pre-processing, the images should also be co-registered.
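To make the roles of $\mathbf{R}_k$, $\mathbf{B}_k$, and $\mathbf{S}_k$ concrete, the following minimal NumPy/SciPy sketch simulates the forward observation model (1) on a synthetic cube. The sizes, the Gaussian blur, and the random spectral response are illustrative assumptions, not the settings of the experiments reported later.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

L, H, W = 50, 64, 64        # bands and spatial size of the target image X
X = rng.random((L, H, W))   # target image as an L x H x W cube

def observe(X, R, sigma_blur, d, noise_std):
    """Apply a spectral response R, a spatial blur, downsampling by d, and noise."""
    Y = np.tensordot(R, X, axes=1)                       # spectral degradation: R X
    Y = gaussian_filter(Y, sigma=(0, sigma_blur, sigma_blur), mode='wrap')  # blur: X B
    Y = Y[:, ::d, ::d]                                   # uniform downsampling: X B S
    return Y + noise_std * rng.standard_normal(Y.shape)  # additive perturbation N

R_ms = rng.random((8, L))   # hypothetical 8-band spectral response
Y_ms = observe(X, R_ms, sigma_blur=1.0, d=2, noise_std=0.01)
print(Y_ms.shape)           # (8, 32, 32)
```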

B. Linear mixture model
Under some mild assumptions, multiband images of natural scenes can be suitably described by a linear mixture model [4]. Specifically, the spectrum of each pixel can often be written as a linear mixture of a few archetypal spectral signatures known as endmembers. The number of endmembers, denoted by $q$, is usually much smaller than the spectral dimension of a hyperspectral image, i.e., $q \ll L$. Therefore, if we arrange the endmembers corresponding to $\mathbf{X}$ as the columns of the matrix $\mathbf{E} \in \mathbb{R}^{L \times q}$, we can factorize $\mathbf{X}$ as

$$\mathbf{X} = \mathbf{E} \mathbf{A} + \mathbf{W}, \tag{3}$$

where $\mathbf{A} \in \mathbb{R}^{q \times N}$ is the matrix of endmember abundances and $\mathbf{W} \in \mathbb{R}^{L \times N}$ is a perturbation matrix that accounts for any possible inaccuracy or mismatch in the linear mixture model. We assume that $\mathbf{W}$ is independent of $\mathbf{N}_k$, $k = 1, \dots, K$, and has a matrix normal distribution as

$$\mathbf{W} \sim \mathcal{MN}_{L \times N}\left(\mathbf{0}_{L \times N}, \boldsymbol{\Lambda}, \mathbf{I}_N\right), \tag{4}$$

where $\boldsymbol{\Lambda} \in \mathbb{R}^{L \times L}$ is its row-covariance matrix. Every column of $\mathbf{A}$ contains the fractional abundances of the endmembers at a pixel. The fractional abundances are nonnegative and often assumed to add up to one for each pixel.

The linear mixture model stated above has been widely used in various contexts and applications concerning multiband, particularly hyperspectral, images. Its popularity can mostly be attributed to its intuitiveness as well as its relative simplicity and ease of implementation. However, there are a few caveats regarding this model that should be kept in mind. First, $\mathbf{X}$ in (3) corresponds to a matrix of corrected (pre-processed) values, not the raw ones that would typically be captured by a spectral imager of the same spatial and spectral resolutions. However, whether these values are radiance or reflectance has no impact on the validity of the model, though it certainly matters for further processing of the data. Second, the model (3) does not necessarily require each endmember to be the spectral signature of only one (pure) material. An endmember may be composed of the spectral signatures of multiple materials or may be seen as the spectral signature of a composite material made of several constituent materials. Additionally, depending on the application, the endmembers may be purposely defined in particular subjective ways. Third, in practice, an endmember may have slightly different spectral manifestations at different parts of a scene due to variable illumination, environmental, atmospheric, or temporal conditions. This so-called endmember variability [48] along with possible nonlinearities in the actual underlying mixing process [49] may introduce inaccuracies or inconsistencies in the linear mixture model and consequently in the endmember extraction or spectral unmixing techniques that rely on this model. Moreover, the sum-to-one assumption on the abundances of each pixel may not always hold, especially when the linear mixture model is not able to account for every material in a pixel, possibly because of the effects of endmember variability or nonlinear mixing.
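A toy illustration of the mixing in (3), ignoring the perturbation term: each pixel's spectrum is a convex combination of $q$ endmember signatures. The shapes and the Dirichlet sampling below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
L, q, N = 50, 5, 1024               # bands, endmembers, pixels

E = rng.random((L, q))              # endmember matrix (columns are signatures)
A = rng.dirichlet(np.ones(q), N).T  # abundances: nonnegative, columns sum to one
X = E @ A                           # noise-free linearly mixed image (L x N)

assert np.all(A >= 0) and np.allclose(A.sum(axis=0), 1.0)
```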

C. Fusion model
Substituting (3) into (1) gives

$$\mathbf{Y}_k = \mathbf{R}_k \mathbf{E} \mathbf{A} \mathbf{B}_k \mathbf{S}_k + \check{\mathbf{N}}_k, \tag{5}$$

where the aggregate perturbation of the $k$th image is $\check{\mathbf{N}}_k = \mathbf{R}_k \mathbf{W} \mathbf{B}_k \mathbf{S}_k + \mathbf{N}_k$.
Instead of estimating the target multiband image $\mathbf{X}$ directly, we consider estimating its abundance matrix $\mathbf{A}$ from the observations $\mathbf{Y}_k$, $k = 1, \dots, K$, given the endmember matrix $\mathbf{E}$. We can then obtain an estimate of the target image by multiplying the estimated abundance matrix by the endmember matrix. This way, we reduce the dimensionality of the fusion problem and consequently the associated computational burden. In addition, by estimating $\mathbf{A}$ first, we attain an unmixed fused image, obviating the need to perform additional unmixing, if demanded by any application utilizing the fused image. However, this approach requires prior knowledge of the endmember matrix $\mathbf{E}$. The columns of this matrix can be selected from a library of known spectral signatures, such as the U.S. Geological Survey digital spectral library, or extracted from the observed multiband images that have the appropriate spectral dimension.
Since $\mathbf{N}_k$ and $\mathbf{W}$ have independent matrix normal distributions [see (2) and (4)], $\check{\mathbf{N}}_k$ has a multivariate normal distribution expressed as

$$\mathrm{vec}\{\check{\mathbf{N}}_k\} \sim \mathcal{N}\left(\mathbf{0}_{L_k N_k}, \left(\mathbf{S}_k^\top \mathbf{B}_k^\top \mathbf{B}_k \mathbf{S}_k\right) \otimes \left(\mathbf{R}_k \boldsymbol{\Lambda} \mathbf{R}_k^\top\right) + \mathbf{I}_{N_k} \otimes \boldsymbol{\Lambda}_k\right), \tag{6}$$

where $\mathbf{0}_{L_k N_k}$ stands for the $L_k N_k \times 1$ vector of zeros. Using the approximation $\mathbf{S}_k^\top \mathbf{B}_k^\top \mathbf{B}_k \mathbf{S}_k \approx \gamma_k \mathbf{I}_{N_k}$ with $\gamma_k > 0$, we get

$$\check{\mathbf{N}}_k \sim \mathcal{MN}_{L_k \times N_k}\left(\mathbf{0}_{L_k \times N_k}, \boldsymbol{\Omega}_k, \mathbf{I}_{N_k}\right), \tag{7}$$

where $\boldsymbol{\Omega}_k = \boldsymbol{\Lambda}_k + \gamma_k \mathbf{R}_k \boldsymbol{\Lambda} \mathbf{R}_k^\top$.
III. REGULARIZED MAXIMUM-LIKELIHOOD ESTIMATION

A. Maximum-likelihood estimation

In view of (6) and (7), we have

$$\mathrm{vec}\{\mathbf{Y}_k\} \sim \mathcal{N}\left(\mathrm{vec}\{\mathbf{R}_k \mathbf{E} \mathbf{A} \mathbf{B}_k \mathbf{S}_k\}, \mathbf{I}_{N_k} \otimes \boldsymbol{\Omega}_k\right).$$

Hence, the probability density function of $\mathbf{Y}_k$ parametrized over the unknown $\mathbf{A}$ can be written as

$$p\left(\mathbf{Y}_k \mid \mathbf{A}\right) \propto \exp\left(-\frac{1}{2}\left\|\boldsymbol{\Omega}_k^{-1/2}\left(\mathbf{Y}_k - \mathbf{R}_k \mathbf{E} \mathbf{A} \mathbf{B}_k \mathbf{S}_k\right)\right\|_F^2\right).$$

Since the perturbations $\check{\mathbf{N}}_k$, $k = 1, \dots, K$, are independent of each other, the joint probability density function of the observations is written as

$$p\left(\mathbf{Y}_1, \dots, \mathbf{Y}_K \mid \mathbf{A}\right) = \prod_{k=1}^{K} p\left(\mathbf{Y}_k \mid \mathbf{A}\right).$$

Accordingly, the maximum-likelihood estimate of $\mathbf{A}$ is found by solving the following optimization problem:

$$\hat{\mathbf{A}} = \arg\max_{\mathbf{A}} \ p\left(\mathbf{Y}_1, \dots, \mathbf{Y}_K \mid \mathbf{A}\right). \tag{8}$$

This problem can be stated in terms of $\mathbf{a} = \mathrm{vec}\{\mathbf{A}\}$ as

$$\hat{\mathbf{a}} = \arg\min_{\mathbf{a}} \ \frac{1}{2} \sum_{k=1}^{K}\left\|\left(\mathbf{I}_{N_k} \otimes \boldsymbol{\Omega}_k^{-1/2}\right)\left(\mathrm{vec}\{\mathbf{Y}_k\} - \left(\mathbf{S}_k^\top \mathbf{B}_k^\top \otimes \mathbf{R}_k \mathbf{E}\right) \mathbf{a}\right)\right\|_2^2. \tag{9}$$

The Fisher information matrix (FIM) of the maximum-likelihood estimator $\hat{\mathbf{a}}$ in (9) is calculated as

$$\boldsymbol{\Phi} = -\mathbb{E}\left[\mathbf{H}(\mathbf{a})\right],$$

where $\mathbf{H}(\mathbf{a})$ denotes the Hessian, i.e., the Jacobian of the gradient, of the log-likelihood function $\ln p(\mathbf{Y}_1, \dots, \mathbf{Y}_K \mid \mathbf{a})$. The entry on the $i$th row and the $j$th column of $\mathbf{H}(\mathbf{a})$ is computed as

$$\left[\mathbf{H}(\mathbf{a})\right]_{i,j} = \frac{\partial^2 \ln p\left(\mathbf{Y}_1, \dots, \mathbf{Y}_K \mid \mathbf{a}\right)}{\partial a_i \, \partial a_j},$$

where $a_i$ and $a_j$ denote the $i$th and $j$th entries of $\mathbf{a}$, respectively. Accordingly, we can show that

$$\boldsymbol{\Phi} = \sum_{k=1}^{K}\left(\mathbf{B}_k \mathbf{S}_k \mathbf{S}_k^\top \mathbf{B}_k^\top\right) \otimes \left(\mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1} \mathbf{R}_k \mathbf{E}\right).$$

If $\boldsymbol{\Phi}$ is invertible, the optimization problem (9) has a unique solution given by

$$\hat{\mathbf{a}} = \boldsymbol{\Phi}^{-1} \sum_{k=1}^{K}\left(\mathbf{B}_k \mathbf{S}_k \otimes \mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1}\right) \mathrm{vec}\{\mathbf{Y}_k\},$$

and the Cramér-Rao lower bound for the estimator $\hat{\mathbf{a}}$, which is a lower bound on the covariance of $\hat{\mathbf{a}}$, is the inverse of $\boldsymbol{\Phi}$. The FIM is guaranteed to be invertible when, for at least one image, the matrix $\left(\mathbf{B}_k \mathbf{S}_k \mathbf{S}_k^\top \mathbf{B}_k^\top\right) \otimes \left(\mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1} \mathbf{R}_k \mathbf{E}\right)$ is full-rank. The matrix $\mathbf{S}_k \mathbf{S}_k^\top$ has a rank of $N_k$; hence, for $d_k > 1$, it is rank-deficient. The blurring matrix $\mathbf{B}_k$ does not change the rank of the matrix that it multiplies from the right. In addition, as $\boldsymbol{\Omega}_k^{-1}$ is full-rank, $\mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1} \mathbf{R}_k \mathbf{E}$ has a full rank of $q$ when the rows of $\mathbf{R}_k \mathbf{E}$ are at least as many as its columns, i.e., $L_k \ge q$. Therefore, $\mathbf{A}$ is guaranteed to be uniquely identifiable given $\mathbf{Y}_k$, $k = 1, \dots, K$, only when at least one observed image, say the $\kappa$th image, has full spatial resolution, i.e., $N_\kappa = N$, with the number of its spectral bands being equal to or larger than the number of endmembers, i.e., $L_\kappa \ge q$, so that, at least for the $\kappa$th image, $\left(\mathbf{B}_\kappa \mathbf{S}_\kappa \mathbf{S}_\kappa^\top \mathbf{B}_\kappa^\top\right) \otimes \left(\mathbf{E}^\top \mathbf{R}_\kappa^\top \boldsymbol{\Omega}_\kappa^{-1} \mathbf{R}_\kappa \mathbf{E}\right)$ is full-rank.
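As a toy numerical illustration of this identifiability condition, the following sketch checks the rank of one Kronecker factor of the FIM, exploiting the fact that the rank of a Kronecker product is the product of the ranks of its factors. The small sizes, the one-dimensional pixel grid, the identity blur, and the random matrices are all assumptions for the sake of a quick check.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, L_k, q, d = 16, 12, 10, 4, 2             # pixels, bands, observed bands, endmembers, ratio

S = np.zeros((N, N // d))                      # 1-D downsampling matrix with S^T S = I
S[np.arange(0, N, d), np.arange(N // d)] = 1.0
B = np.eye(N)                                  # identity blur (keeps ranks unchanged)
R = rng.random((L_k, L))                       # spectral response
E = rng.random((L, q))                         # endmember matrix
Om_inv = np.eye(L_k)                           # inverse noise row-covariance

P = B @ S @ S.T @ B.T                          # rank N/d < N: rank-deficient when d > 1
Q = E.T @ R.T @ Om_inv @ R @ E                 # full rank q since L_k >= q here
Phi_k = np.kron(P, Q)
print(np.linalg.matrix_rank(P), np.linalg.matrix_rank(Q),
      np.linalg.matrix_rank(Phi_k) == np.linalg.matrix_rank(P) * np.linalg.matrix_rank(Q))
```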
In practice, it is rarely possible to satisfy the abovementioned requirements as multiband images with high spectral resolution are generally spatially downsampled and the number of bands of the ones with full spatial resolution, such as panchromatic or multispectral images, is often less than the number of endmembers. Hence, the inverse problem of recovering $\mathbf{A}$ from $\mathbf{Y}_k$, $k = 1, \dots, K$, is usually ill-posed or ill-conditioned. Thus, some prior knowledge needs to be injected into the estimation process to produce a unique and reliable estimate. The prior knowledge is intended to partially compensate for the information lost in spectral and spatial downsampling and usually stems from experimental evidence or common facts that may induce certain analytical properties or constraints. The prior information is commonly incorporated into the problem in the form of imposed constraints or additive regularization terms. Examples of prior knowledge about $\mathbf{A}$ that are regularly used in the literature are nonnegativity and sum-to-one constraints, a matrix normal distribution with known or estimated parameters [33], sparse representation with a learned or known dictionary or basis [29], and minimal total variation [26].

B. Regularization
To develop an algorithm for effective fusion of multiple multiband images with arbitrary spatial and spectral resolutions, we employ two mechanisms to regularize the maximum-likelihood cost function in (9).
As the first regularization mechanism, we impose a constraint on $\mathbf{A}$ such that its entries are nonnegative and sum to one in all columns. We express this constraint as $\mathbf{A} \ge 0$ and $\mathbf{1}_q^\top \mathbf{A} = \mathbf{1}_N^\top$, where $\mathbf{A} \ge 0$ means all the entries of $\mathbf{A}$ are greater than or equal to zero and $\mathbf{1}_q$ denotes the $q \times 1$ vector of ones. As the second regularization mechanism, we add an isotropic vector total-variation penalty term, denoted by $\|\nabla \mathbf{A}\|_{2,1}$, to the cost function. Here, $\|\cdot\|_{2,1}$ is the $\ell_{2,1}$-norm operator that returns the sum of the $\ell_2$-norms of all the columns of its matrix argument. In addition, we define

$$\nabla \mathbf{A} = \begin{bmatrix} \mathbf{A} \mathbf{D}_h \\ \mathbf{A} \mathbf{D}_v \end{bmatrix} \in \mathbb{R}^{2q \times N},$$

where $\mathbf{D}_h$ and $\mathbf{D}_v$ are discrete differential matrix operators that, respectively, yield the horizontal and vertical first-order backward differences (gradients) of the row-vectorized image that they multiply from the right. Consequently, we formulate our regularized optimization problem for estimating $\mathbf{A}$ as

$$\min_{\mathbf{A}} \ \frac{1}{2} \sum_{k=1}^{K}\left\|\boldsymbol{\Omega}_k^{-1/2}\left(\mathbf{Y}_k - \mathbf{R}_k \mathbf{E} \mathbf{A} \mathbf{B}_k \mathbf{S}_k\right)\right\|_F^2 + \alpha \left\|\nabla \mathbf{A}\right\|_{2,1} \quad \text{subject to:} \ \mathbf{A} \ge 0 \ \text{and} \ \mathbf{1}_q^\top \mathbf{A} = \mathbf{1}_N^\top, \tag{10}$$

where $\alpha \ge 0$ is the regularization parameter. The nonnegativity and sum-to-one constraints on $\mathbf{A}$, which force the columns of $\mathbf{A}$ to reside on the unit $(q-1)$-simplex, are naturally expected and help find a solution that is physically plausible. In addition, they implicitly induce sparseness in the solution. The total-variation penalty promotes solutions with a sparse gradient, a property that is known to be possessed by images of most natural scenes as they are usually made of piecewise homogeneous regions with few sudden changes at object boundaries or edges. Note that the subspace spanned by the endmembers is the one that the target image lives in. Therefore, through the total-variation regularization of the abundance matrix $\mathbf{A}$, we regularize the target image $\mathbf{X}$ indirectly.
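The following sketch evaluates the isotropic vector total-variation term $\|\nabla \mathbf{A}\|_{2,1}$ for an abundance matrix whose rows are row-vectorized $H \times W$ maps. The periodic boundary handling (matching the cyclic assumption made later for the FFT-based computations) and the toy sizes are assumptions for illustration.

```python
import numpy as np

def vector_tv(A, H, W):
    """Isotropic vector TV of A (q x H*W), rows are row-vectorized abundance maps."""
    maps = A.reshape(-1, H, W)               # q abundance maps
    dh = maps - np.roll(maps, 1, axis=2)     # horizontal backward differences
    dv = maps - np.roll(maps, 1, axis=1)     # vertical backward differences
    grad = np.concatenate([dh, dv], axis=0)  # stacked gradient, 2q maps
    # l2 norm over the 2q gradient entries at each pixel, then sum over pixels
    return np.sqrt((grad ** 2).sum(axis=0)).sum()

A = np.random.default_rng(2).random((5, 32 * 32))
print(vector_tv(A, 32, 32))
```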

IV. ALGORITHM
A. ADMM formulation

Defining the set of values of $\mathbf{A}$ that satisfy the nonnegativity and sum-to-one constraints as

$$\mathcal{A} = \left\{\mathbf{A} \mid \mathbf{A} \ge 0, \ \mathbf{1}_q^\top \mathbf{A} = \mathbf{1}_N^\top\right\} \tag{11}$$

and making use of the indicator function $\iota_{\mathcal{A}}(\mathbf{A})$ defined as

$$\iota_{\mathcal{A}}(\mathbf{A}) = \begin{cases} 0 & \mathbf{A} \in \mathcal{A} \\ +\infty & \text{otherwise,} \end{cases} \tag{12}$$

we rewrite (10) as

$$\begin{aligned} \min_{\mathbf{A}, \mathbf{V}_1, \dots, \mathbf{V}_{K+2}} \ & \frac{1}{2} \sum_{k=1}^{K}\left\|\boldsymbol{\Omega}_k^{-1/2}\left(\mathbf{Y}_k - \mathbf{R}_k \mathbf{E} \mathbf{V}_k \mathbf{S}_k\right)\right\|_F^2 + \alpha \left\|\mathbf{V}_{K+1}\right\|_{2,1} + \iota_{\mathcal{A}}\left(\mathbf{V}_{K+2}\right) \\ \text{subject to:} \ & \mathbf{V}_k = \mathbf{A} \mathbf{B}_k, \ k = 1, \dots, K, \quad \mathbf{V}_{K+1} = \nabla \mathbf{A}, \quad \mathbf{V}_{K+2} = \mathbf{A}. \end{aligned} \tag{13}$$
Using the ADMM, we minimize the augmented Lagrangian function

$$\begin{aligned} \mathcal{L}\left(\mathbf{A}, \{\mathbf{V}_k\}, \{\mathbf{U}_k\}\right) = \ & \frac{1}{2} \sum_{k=1}^{K}\left\|\boldsymbol{\Omega}_k^{-1/2}\left(\mathbf{Y}_k - \mathbf{R}_k \mathbf{E} \mathbf{V}_k \mathbf{S}_k\right)\right\|_F^2 + \alpha \left\|\mathbf{V}_{K+1}\right\|_{2,1} + \iota_{\mathcal{A}}\left(\mathbf{V}_{K+2}\right) \\ & + \frac{\mu}{2} \sum_{k=1}^{K}\left\|\mathbf{A} \mathbf{B}_k - \mathbf{V}_k - \mathbf{U}_k\right\|_F^2 + \frac{\mu}{2}\left\|\nabla \mathbf{A} - \mathbf{V}_{K+1} - \mathbf{U}_{K+1}\right\|_F^2 + \frac{\mu}{2}\left\|\mathbf{A} - \mathbf{V}_{K+2} - \mathbf{U}_{K+2}\right\|_F^2, \end{aligned} \tag{14}$$

where $\mathbf{U}_k$, $k = 1, \dots, K+2$, are the scaled Lagrange multipliers and $\mu > 0$ is the penalty parameter, in an iterative fashion. At each iteration, we alternate the minimization with respect to the main unknown variable $\mathbf{A}$ and the auxiliary variables $\mathbf{V}_k$; then, we update the scaled Lagrange multipliers. Hence, we compute the iterates as

$$\mathbf{A}^{(t)} = \arg\min_{\mathbf{A}} \ \mathcal{L}\left(\mathbf{A}, \{\mathbf{V}_k^{(t-1)}\}, \{\mathbf{U}_k^{(t-1)}\}\right), \tag{15}$$

$$\{\mathbf{V}_k^{(t)}\} = \arg\min_{\{\mathbf{V}_k\}} \ \mathcal{L}\left(\mathbf{A}^{(t)}, \{\mathbf{V}_k\}, \{\mathbf{U}_k^{(t-1)}\}\right), \tag{16}$$

$$\mathbf{U}_k^{(t)} = \mathbf{U}_k^{(t-1)} - \left(\mathbf{A}^{(t)} \mathbf{B}_k - \mathbf{V}_k^{(t)}\right), \quad k = 1, \dots, K,$$

$$\mathbf{U}_{K+1}^{(t)} = \mathbf{U}_{K+1}^{(t-1)} - \left(\nabla \mathbf{A}^{(t)} - \mathbf{V}_{K+1}^{(t)}\right),$$

$$\mathbf{U}_{K+2}^{(t)} = \mathbf{U}_{K+2}^{(t-1)} - \left(\mathbf{A}^{(t)} - \mathbf{V}_{K+2}^{(t)}\right),$$

where the superscript $(t)$ denotes the value of an iterate at iteration number $t \ge 0$. We repeat the iterations until convergence is reached, up to a maximum allowed number of iterations.
Since we define the auxiliary variables independent of each other, the minimization of the augmented Lagrangian function (14) with respect to the auxiliary variables can be realized separately. Thus, (16) is equivalent to

$$\mathbf{V}_k^{(t)} = \arg\min_{\mathbf{V}_k} \ \frac{1}{2}\left\|\boldsymbol{\Omega}_k^{-1/2}\left(\mathbf{Y}_k - \mathbf{R}_k \mathbf{E} \mathbf{V}_k \mathbf{S}_k\right)\right\|_F^2 + \frac{\mu}{2}\left\|\mathbf{A}^{(t)} \mathbf{B}_k - \mathbf{V}_k - \mathbf{U}_k^{(t-1)}\right\|_F^2, \quad k = 1, \dots, K, \tag{17}$$

$$\mathbf{V}_{K+1}^{(t)} = \arg\min_{\mathbf{V}_{K+1}} \ \alpha\left\|\mathbf{V}_{K+1}\right\|_{2,1} + \frac{\mu}{2}\left\|\nabla \mathbf{A}^{(t)} - \mathbf{V}_{K+1} - \mathbf{U}_{K+1}^{(t-1)}\right\|_F^2, \tag{18}$$

$$\mathbf{V}_{K+2}^{(t)} = \arg\min_{\mathbf{V}_{K+2}} \ \iota_{\mathcal{A}}\left(\mathbf{V}_{K+2}\right) + \frac{\mu}{2}\left\|\mathbf{A}^{(t)} - \mathbf{V}_{K+2} - \mathbf{U}_{K+2}^{(t-1)}\right\|_F^2. \tag{19}$$

B. Solutions of subproblems
Considering (14), (15) can be written as

$$\mathbf{A}^{(t)} = \arg\min_{\mathbf{A}} \ \sum_{k=1}^{K}\left\|\mathbf{A} \mathbf{B}_k - \mathbf{V}_k^{(t-1)} - \mathbf{U}_k^{(t-1)}\right\|_F^2 + \left\|\nabla \mathbf{A} - \mathbf{V}_{K+1}^{(t-1)} - \mathbf{U}_{K+1}^{(t-1)}\right\|_F^2 + \left\|\mathbf{A} - \mathbf{V}_{K+2}^{(t-1)} - \mathbf{U}_{K+2}^{(t-1)}\right\|_F^2. \tag{20}$$

Calculating the gradient of the cost function in (20) with respect to $\mathbf{A}$ and setting it to zero gives

$$\mathbf{A}^{(t)} = \left(\boldsymbol{\Xi}_1^{(t-1)} + \boldsymbol{\Xi}_2^{(t-1)}\right)\left(\sum_{k=1}^{K} \mathbf{B}_k \mathbf{B}_k^\top + \mathbf{D}_h \mathbf{D}_h^\top + \mathbf{D}_v \mathbf{D}_v^\top + \mathbf{I}_N\right)^{-1}, \tag{21}$$

where, for the convenience of presentation, we define $\boldsymbol{\Xi}_1^{(t-1)}$ and $\boldsymbol{\Xi}_2^{(t-1)}$ as

$$\boldsymbol{\Xi}_1^{(t-1)} = \sum_{k=1}^{K}\left(\mathbf{V}_k^{(t-1)} + \mathbf{U}_k^{(t-1)}\right) \mathbf{B}_k^\top + \mathbf{V}_{K+2}^{(t-1)} + \mathbf{U}_{K+2}^{(t-1)},$$

$$\boldsymbol{\Xi}_2^{(t-1)} = \mathbf{G}_h^{(t-1)} \mathbf{D}_h^\top + \mathbf{G}_v^{(t-1)} \mathbf{D}_v^\top \quad \text{with} \quad \begin{bmatrix} \mathbf{G}_h^{(t-1)} \\ \mathbf{G}_v^{(t-1)} \end{bmatrix} = \mathbf{V}_{K+1}^{(t-1)} + \mathbf{U}_{K+1}^{(t-1)}.$$

To make the computation of $\mathbf{A}^{(t)}$ in (21) more efficient, we assume that the two-dimensional convolutions represented by $\mathbf{B}_k$, $k = 1, \dots, K$, are cyclic. In addition, we assume that the differential matrix operators $\mathbf{D}_h$ and $\mathbf{D}_v$ apply with periodic boundaries. Consequently, multiplications by $\mathbf{B}_k^\top$, $\mathbf{D}_h^\top$, and $\mathbf{D}_v^\top$ as well as by $\left(\sum_{k=1}^{K} \mathbf{B}_k \mathbf{B}_k^\top + \mathbf{D}_h \mathbf{D}_h^\top + \mathbf{D}_v \mathbf{D}_v^\top + \mathbf{I}_N\right)^{-1}$ can be performed through the use of the fast Fourier transform (FFT) algorithm and the circular convolution theorem. This theorem states that the Fourier transform of a circular convolution is the pointwise product of the Fourier transforms, i.e., a circular convolution can be expressed as the inverse Fourier transform of the product of the individual spectra [50].
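As an illustration of this FFT-based shortcut, the sketch below solves one row of the system in (21) for a single observed image ($K = 1$): each circulant operator is represented by its two-dimensional transfer function, so applying the matrix inverse reduces to a pointwise division in the Fourier domain. The averaging blur kernel, the difference kernels, and the sizes are assumptions for illustration.

```python
import numpy as np

H, W = 64, 64
rng = np.random.default_rng(3)

def otf(kernel, shape):
    """Zero-pad a small kernel, center it at the origin, and take its 2-D FFT."""
    pad = np.zeros(shape)
    kh, kw = kernel.shape
    pad[:kh, :kw] = kernel
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

Fb = otf(np.ones((5, 5)) / 25.0, (H, W))      # transfer function of an averaging blur B
Fdh = otf(np.array([[1.0, -1.0]]), (H, W))    # horizontal first-order difference D_h
Fdv = otf(np.array([[1.0], [-1.0]]), (H, W))  # vertical first-order difference D_v

# B B^T, D_h D_h^T, D_v D_v^T, and I all diagonalize in the Fourier basis
denom = np.abs(Fb) ** 2 + np.abs(Fdh) ** 2 + np.abs(Fdv) ** 2 + 1.0

rhs = rng.random((H, W))                      # stands in for one row of Xi_1 + Xi_2
a_row = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
```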
Equating the gradient of the cost function in (17) with respect to $\mathbf{V}_k$ to zero results in

$$\mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1}\left(\mathbf{R}_k \mathbf{E} \mathbf{V}_k^{(t)} \mathbf{S}_k - \mathbf{Y}_k\right) \mathbf{S}_k^\top + \mu\left(\mathbf{V}_k^{(t)} - \mathbf{A}^{(t)} \mathbf{B}_k + \mathbf{U}_k^{(t-1)}\right) = \mathbf{0}. \tag{22}$$

Multiplying both sides of (22) from the right by the masking matrix $\mathbf{M}_k = \mathbf{S}_k \mathbf{S}_k^\top$ and its complement $\mathbf{I}_N - \mathbf{M}_k$ yields

$$\left(\mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1} \mathbf{R}_k \mathbf{E} + \mu \mathbf{I}_q\right) \mathbf{V}_k^{(t)} \mathbf{M}_k = \mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1} \mathbf{Y}_k \mathbf{S}_k^\top + \mu\left(\mathbf{A}^{(t)} \mathbf{B}_k - \mathbf{U}_k^{(t-1)}\right) \mathbf{M}_k \tag{23}$$

and

$$\mathbf{V}_k^{(t)}\left(\mathbf{I}_N - \mathbf{M}_k\right) = \left(\mathbf{A}^{(t)} \mathbf{B}_k - \mathbf{U}_k^{(t-1)}\right)\left(\mathbf{I}_N - \mathbf{M}_k\right), \tag{24}$$

respectively. Note that we have $\mathbf{S}_k^\top \mathbf{M}_k = \mathbf{S}_k^\top$ and $\mathbf{M}_k$ is idempotent, i.e., $\mathbf{M}_k \mathbf{M}_k = \mathbf{M}_k$. Summing both sides of (23) and (24) gives the solution of (17) for $k = 1, \dots, K$ as

$$\mathbf{V}_k^{(t)} = \left(\mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1} \mathbf{R}_k \mathbf{E} + \mu \mathbf{I}_q\right)^{-1}\left(\mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1} \mathbf{Y}_k \mathbf{S}_k^\top + \mu\left(\mathbf{A}^{(t)} \mathbf{B}_k - \mathbf{U}_k^{(t-1)}\right) \mathbf{M}_k\right) + \left(\mathbf{A}^{(t)} \mathbf{B}_k - \mathbf{U}_k^{(t-1)}\right)\left(\mathbf{I}_N - \mathbf{M}_k\right). \tag{25}$$

The terms $\left(\mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1} \mathbf{R}_k \mathbf{E} + \mu \mathbf{I}_q\right)^{-1}$ and $\mathbf{E}^\top \mathbf{R}_k^\top \boldsymbol{\Omega}_k^{-1} \mathbf{Y}_k \mathbf{S}_k^\top$ do not change during the iterations and can be precomputed.
The subproblem (18) can be decomposed pixelwise and its solution is linked to the so-called Moreau proximity operator of the $\ell_{2,1}$-norm, given by column-wise vector soft-thresholding [51], [52]. If we define $\mathbf{Z}^{(t)} = \nabla \mathbf{A}^{(t)} - \mathbf{U}_{K+1}^{(t-1)}$, the $n$th column of $\mathbf{V}_{K+1}^{(t)}$, denoted by $\mathbf{v}_n^{(t)}$, is given in terms of the $n$th column of $\mathbf{Z}^{(t)}$, denoted by $\mathbf{z}_n^{(t)}$, as

$$\mathbf{v}_n^{(t)} = \max\left\{0, \ 1 - \frac{\alpha}{\mu\left\|\mathbf{z}_n^{(t)}\right\|_2}\right\} \mathbf{z}_n^{(t)}.$$

The solution of (19) is the value of the proximity operator of the indicator function $\iota_{\mathcal{A}}(\cdot)$ at the point $\mathbf{A}^{(t)} - \mathbf{U}_{K+2}^{(t-1)}$, which is the projection of $\mathbf{A}^{(t)} - \mathbf{U}_{K+2}^{(t-1)}$ onto the set $\mathcal{A}$ defined by (11). Therefore, we have

$$\mathbf{V}_{K+2}^{(t)} = \Pi_{\mathcal{A}}\left\{\mathbf{A}^{(t)} - \mathbf{U}_{K+2}^{(t-1)}\right\},$$

where $\Pi_{\mathcal{A}}\{\cdot\}$ denotes the projection onto $\mathcal{A}$. We implement this projection onto the unit $(q-1)$-simplex employing the algorithm proposed in [53].
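Both proximity operators admit short closed-form implementations. The sketch below gives the column-wise vector soft-thresholding with threshold $\alpha/\mu$ and a standard sort-based Euclidean projection onto the unit simplex; the latter is an assumed stand-in for the algorithm of [53], equivalent in effect though not necessarily in implementation.

```python
import numpy as np

def vector_soft_threshold(Z, tau):
    """Shrink each column of Z toward zero by tau in l2 norm (prox of the l_{2,1}-norm)."""
    norms = np.linalg.norm(Z, axis=0, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return Z * scale

def project_simplex_columns(Z):
    """Project each column of Z onto {a : a >= 0, sum(a) = 1} (sort-based method)."""
    q, n = Z.shape
    U = -np.sort(-Z, axis=0)                     # columns sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, q + 1).reshape(-1, 1)
    cond = U - css / idx > 0
    rho = q - 1 - np.argmax(cond[::-1], axis=0)  # last index where the condition holds
    theta = css[rho, np.arange(n)] / (rho + 1)
    return np.maximum(Z - theta, 0.0)

Z = np.random.default_rng(4).standard_normal((5, 7))
P = project_simplex_columns(Z)
assert np.all(P >= 0) and np.allclose(P.sum(axis=0), 1.0)
```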
We present a summary of the proposed algorithm in Table I. The objective function in (13) is closed, proper, and convex as it is a sum of closed, proper, and convex functions, and the matrix stacking the linear operators that map $\mathbf{A}$ to the auxiliary variables has full column rank. Therefore, according to Theorem 8 of [42], if (13) has a solution, the proposed algorithm converges to this solution, regardless of the initial values, as long as the penalty parameter $\mu$ is positive. If no solution exists, at least one of the iterates $\mathbf{V}_k^{(t)}$ and $\mathbf{U}_k^{(t)}$ will diverge.

V. SIMULATIONS
To examine the performance of the proposed algorithm in comparison with the state-of-the-art, we consider the problem of fusing three multiband images, viz. a panchromatic image, a multispectral image, and a hyperspectral image. To create the multiband images used in our experiments, we use five publicly available hyperspectral datasets, namely, Botswana, Indian Pines [54], Washington DC Mall, Moffett Field, and Kennedy Space Center. The Botswana dataset has been collected by the Hyperion sensor aboard the Earth Observing-1 (EO-1) satellite, the Washington DC Mall dataset by the airborne Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor, and the Indian Pines, Moffett Field, and Kennedy Space Center datasets by the NASA Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) instrument. We crop the original datasets to the spatial sizes given in Table II. The datasets cover the visible near-infrared (VNIR) and shortwave-infrared (SWIR) ranges with uncalibrated, noisy, and water-absorption bands removed. The spectral resolution of each dataset is also given in Table II. The data as well as the MATLAB code used to produce the results of this paper can be found at [55].
We generate three multiband images (panchromatic, multispectral, and hyperspectral) using each dataset and use the original dataset as the ground truth (reference image) for evaluating the fusion performance. We obtain the hyperspectral image by applying a Gaussian blur filter with a kernel size of 5 × 5 and a variance of 1.28 to the reference image, followed by downsampling with a ratio of 4 in both horizontal and vertical directions for all bands. For the multispectral image, we use a Gaussian blur filter with a kernel size of 3 × 3 and a variance of 0.64 and downsampling with a ratio of 2 in both horizontal and vertical directions for all bands of the reference image. Afterwards, we downgrade the resultant image spectrally by applying the spectral responses of the Landsat 8 multispectral sensor. This sensor has eight multispectral bands and one panchromatic band. Fig. 1 depicts the spectral responses of all the bands of this sensor. We create the panchromatic image from the reference image using the panchromatic band of the Landsat 8 sensor without applying any spatial blurring or downsampling. We add zero-mean Gaussian white noise to each band of the produced multiband images such that the band-specific signal-to-noise ratio (SNR) is 30 dB for the multispectral and hyperspectral images and 40 dB for the panchromatic image.
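For concreteness, this is one way to add zero-mean Gaussian white noise to a band at a prescribed SNR, under the assumption that the band-wise SNR is measured as the ratio of the mean signal power to the noise variance:

```python
import numpy as np

def add_noise(band, snr_db, rng):
    """Add white Gaussian noise to one band at the target SNR in dB."""
    signal_power = np.mean(band ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return band + np.sqrt(noise_power) * rng.standard_normal(band.shape)

rng = np.random.default_rng(5)
band = rng.random((64, 64))
noisy = add_noise(band, 30.0, rng)   # 30 dB, as for the MS/HS images above
```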
The current multiband image fusion algorithms published in the literature are designed to fuse two images at a time. In order to compare the performance of the proposed algorithm with the state-of-the-art, we consider fusing the abovementioned three multiband images in three different ways, which we refer to as Pan + HS, Pan + (MS + HS), and (Pan + MS) + HS, using the existing algorithms for pansharpening, hyperspectral pansharpening, and hyperspectral-multispectral fusion. In Pan + HS, we only fuse the panchromatic and hyperspectral images. In Pan + (MS + HS), and (Pan + MS) + HS, we fuse the given images in two cascading stages. In Pan + (MS + HS), first, we fuse the multispectral and hyperspectral images. Then, we fuse the resultant hyperspectral image with the panchromatic image. We use the same algorithm at both stages, albeit with different parameter values. In (Pan + MS) + HS, we first fuse the panchromatic image with the multispectral one. Then, we fuse the pansharpened multispectral image with the hyperspectral image. We use two different algorithms at each of the two stages resulting in four combined solutions.
For pansharpening, which is the fusion of a panchromatic image with a multispectral one, we use two algorithms called the band-dependent spatial detail (BDSD) [56] and the modulation-transfer-function generalized Laplacian pyramid with high-pass modulation (MTF-GLP-HPM) [57]. The BDSD algorithm belongs to the class of component substitution methods and the MTF-GLP-HPM algorithm falls into the category of multiresolution analysis. In [10], where several pansharpening algorithms are studied, it is shown that the BDSD and MTF-GLP-HPM algorithms exhibit the best performance among all the considered ones.
For fusing a panchromatic or multispectral image with a hyperspectral image, we use two algorithms proposed in [26] and [60], [61], which are called HySure and R-FUSE-TV, respectively. These algorithms are based on total-variation regularization and are among the best-performing and most efficient hyperspectral pansharpening and multispectral-hyperspectral fusion algorithms currently available [13], [62].
We use three performance metrics for assessing the quality of a fused image with respect to its reference image. The metrics are the relative dimensionless global error in synthesis (ERGAS) [63], the spectral angle mapper (SAM) [64], and Q2ⁿ [65]. The metric Q2ⁿ is a generalization of the universal image quality index (UIQI) proposed in [66] that extends it to hyperspectral images by making use of the concept of hypercomplex numbers.
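As an example of one of these metrics, a common formulation of SAM averages, over all pixels, the angle between the reference and fused spectra. The sketch below uses this formulation (reported in degrees), which we assume matches [64] in spirit; conventions for averaging and units vary across implementations.

```python
import numpy as np

def sam_degrees(X_ref, X_hat, eps=1e-12):
    """X_ref, X_hat: L x N matrices with one spectrum per column."""
    num = np.sum(X_ref * X_hat, axis=0)
    den = np.linalg.norm(X_ref, axis=0) * np.linalg.norm(X_hat, axis=0)
    angles = np.arccos(np.clip(num / np.maximum(den, eps), -1.0, 1.0))
    return np.degrees(angles).mean()
```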
We extract the endmembers (columns of $\mathbf{E}$) from the hyperspectral image using the vertex component analysis (VCA) algorithm [67]. The VCA is a fast unsupervised unmixing algorithm that takes the endmembers to be the vertices of a simplex encompassing the hyperspectral data cloud. We utilize the SUnSAL algorithm [68] together with the extracted endmembers to unmix the hyperspectral image and obtain its abundance matrix. Then, we upscale the resulting matrix by a factor of four, applying two-dimensional spline interpolation to each of its rows (abundance bands), to generate the initial estimate of the abundance matrix, $\mathbf{A}^{(0)}$. We initialize the proposed algorithm as well as the HySure and R-FUSE-TV algorithms with this matrix.
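A sketch of this initialization step, using scipy.ndimage.zoom's cubic spline as an assumed stand-in for the two-dimensional spline interpolation described above:

```python
import numpy as np
from scipy.ndimage import zoom

rng = np.random.default_rng(6)
q, h, w = 5, 16, 16                                  # endmembers, low-res map size
A_hs = rng.random((q, h, w))                         # abundance maps unmixed from the HS image
A0 = np.stack([zoom(a, 4, order=3) for a in A_hs])   # cubic-spline upscaling by a factor of 4
print(A0.shape)                                      # (5, 64, 64), row-vectorized per band afterwards
```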
To make our comparisons fair, we tune the values of the parameters in the HySure and R-FUSE-TV algorithms to yield the best possible performance for each dataset considered. In addition, in order to use the BDSD and MTF-GLP-HPM algorithms to their best potential, we provide them with the true point-spread functions, i.e., the blurring kernels used to generate the multiband images.
Apart from the number of endmembers, which can be estimated using, for example, the HySime algorithm [20], the proposed algorithm has two tunable parameters, the total-variation regularization parameter α and the ADMM penalty parameter μ. The automatic tuning of the values of these parameters is an interesting and challenging subject. There are a number of strategies that can be employed, such as those proposed in [69] and [70]. We found through experimentation that although the value of μ impacts the convergence speed of the proposed algorithm, as long as it is within an appropriate range, it has little influence on the accuracy of the proposed algorithm. Therefore, we set it to μ = 1.5 × 10³ in our experiments with all considered datasets. The value of α affects the performance of the proposed algorithm in subtle ways as shown in Fig. 2, where we plot the performance metrics, ERGAS, SAM, and Q2ⁿ, against α for the Botswana and Washington DC Mall datasets. The results in Fig. 2 suggest that, for different values of α, there is a trade-off between the performance metrics, specifically, ERGAS and Q2ⁿ on one side and SAM on the other. Therefore, we tune the value of α for each dataset only roughly to obtain a reasonable set of values for all three performance metrics. We give the values of α used in the proposed algorithm with different datasets in Table II.
In Table III, we give the values of the performance metrics assessing the quality of the images fused using the proposed algorithm and the considered benchmarks for all considered datasets. We provide the performance metrics for the case of considering only the bands within the spectrum of the panchromatic image as well as the case of considering all bands, i.e., the entire spectrum of the reference image. We also give the time taken by each algorithm to produce the fused images. According to the results in Table III, the proposed algorithm significantly outperforms the considered benchmarks. It is also evident from the required processing times that the computational (time) complexity of the proposed algorithm is lower than those of its contenders.
In Figs. 3 and 4, we plot the sorted per-pixel normalized root-mean-square error (NRMSE) values of the proposed algorithm and the best-performing algorithm(s) from each of the Pan + HS, Pan + (MS + HS), and (Pan + MS) + HS categories for all considered datasets. Fig. 3 corresponds to the case of considering only the spectrum of the panchromatic image and Fig. 4 to the case of considering the entire spectrum. We define the per-pixel NRMSE as

$$\mathrm{NRMSE}_n = \frac{\left\|\mathbf{x}_n - \hat{\mathbf{x}}_n\right\|_2}{\left\|\mathbf{x}_n\right\|_2},$$

where $\mathbf{x}_n$ and $\hat{\mathbf{x}}_n$ are the $n$th columns of the reference image $\mathbf{X}$ and the fused image $\hat{\mathbf{X}}$, respectively. We sort the NRMSE values in ascending order.
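A sketch of computing the sorted per-pixel NRMSE curve used in Figs. 3 and 4, under the assumption that each pixel's error norm is normalized by the norm of its reference spectrum, consistent with the definition above:

```python
import numpy as np

def sorted_nrmse(X_ref, X_hat, eps=1e-12):
    """X_ref, X_hat: L x N matrices; returns the N NRMSE values in ascending order."""
    err = np.linalg.norm(X_ref - X_hat, axis=0)           # per-pixel error norm
    ref = np.maximum(np.linalg.norm(X_ref, axis=0), eps)  # per-pixel reference norm
    return np.sort(err / ref)
```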
In Fig. 5, we show the panchromatic, multispectral, and hyperspectral images generated from the considered datasets and used for the fusion as well as the reference images. We also show the fused images yielded by the proposed algorithm and Pan + (MS + HS) fusion using the HySure algorithm, which generally performs better than the other considered benchmarks. The multispectral images are depicted using their red, green, and blue bands. The RGB representations of the hyperspectral images are rendered through transforming the spectral data to the CIE XYZ color space and then transforming the XYZ values to the sRGB color space. From visual inspection of the reference and fused images shown in Fig. 5, it is observed that the images fused by the proposed algorithm match their corresponding reference images better than the ones produced by the Pan + (MS + HS) fusion using the HySure algorithm do.

VI. CONCLUSION
We proposed a new image fusion algorithm that can simultaneously fuse multiple multiband images. We utilized the well-known forward observation model together with the linear mixture model to cast the fusion problem as a reduced-dimension linear inverse problem. We used a vector total-variation penalty as well as nonnegativity and sum-to-one constraints on the endmember abundances to regularize the associated maximum-likelihood estimation problem. The regularization encourages the estimated fused image to have low rank with a sparse representation in the spectral domain while preserving the edges and discontinuities in the spatial domain. We solved the regularized problem using the alternating direction method of multipliers. We demonstrated the advantages of the proposed algorithm in comparison with the state-of-the-art via experiments with five real hyperspectral image datasets.

Fig. 5. The panchromatic, multispectral, and hyperspectral images that are fused together, the reference hyperspectral image, and the fused images produced by the proposed algorithm and the Pan + (MS + HS) method using the HySure algorithm.