Revealing Implicit Assumptions of the Component Substitution Pansharpening Methods

Abstract: The component substitution (CS) pansharpening methods have been developed for almost three decades and have recently become better understood through their generalization into a single framework. However, few studies focus on the statistical assumptions implicit in the CS methods. This paper reveals their implicit statistical assumptions from a Bayesian data fusion framework and suggests best practices for histogram matching of the panchromatic image to the intensity image (a weighted summation of the multispectral bands) to better satisfy these assumptions. The purpose of histogram matching was found to be making the difference between the high-resolution panchromatic and intensity images as small as possible, as one implicit assumption claims that their difference is negligible. The statistical relationship between the high-resolution panchromatic and intensity images and the relationship between their corresponding low-resolution images are the same, as long as the low-resolution panchromatic image is derived by considering the modulation transfer functions of the multispectral sensors. Hence, the histogram-matching equation should be derived from the low-resolution panchromatic and intensity images, not from the high-resolution panchromatic and expanded low-resolution intensity images. Experiments using three example CS methods, each with the two different histogram-matching equations, were conducted on four-band QuickBird and eight-band WorldView-2 top-of-atmosphere reflectance data. The results verified the best practices and showed that the histogram-matching equation derived from the high-resolution panchromatic and expanded low-resolution intensity images produces a more blurred histogram-matched panchromatic image and, hence, less sharpened pansharpened images than the equation derived from the low-resolution image pair. The usefulness of the assumptions revealed in this study for method developers is discussed.
For example, the CS methods can be improved by better satisfying the assumptions, e.g., by classifying the images into homogeneous areas before pansharpening, and by generalizing the assumptions to address their deficiencies.


Introduction
Remotely-sensed images have exhibited explosive growth in multi-sensor, multi-temporal, and multi-resolution characteristics. However, there are contradictions between the resolution limitations of current remote sensing systems and the increasing need for satellite images with high spatial, temporal, and spectral resolution [1][2][3]. One limitation is the tradeoff between spectral and spatial resolution; e.g., more than 70% of current optical earth observation satellites simultaneously collect low spatial resolution (LR) multispectral and high spatial resolution (HR) panchromatic images.

Concepts and Methodology
From a Bayesian perspective, images are represented in vector form (symbolized by a letter in bold italics) and operations on images in matrix form (symbolized by a letter in bold). Assume the LR and HR images have n and N pixels, respectively, where N = r²n and r is the spatial resolution ratio. The desired HR multispectral image with k spectral bands in band-interleaved-by-pixel lexicographical notation is denoted as a vector, M = [M_1, M_2, . . . , M_N]^T, where M_j = [M_j^1, M_j^2, . . . , M_j^k] is the spectrum at spatial location j (j = 1, 2, . . . , N) and M_j^k is the kth multispectral band pixel value at spatial location j. The HR panchromatic image is denoted as a vector with N elements, P = [P_1, P_2, . . . , P_N]^T, where P_j is the panchromatic pixel value at spatial location j. In the CS methods:

I = BM + α  (1)

where I is the HR intensity image vector with N elements, B is a weight coefficient matrix with N × Nk elements, M is the HR multispectral image vector with Nk elements, and α is a bias coefficient vector with N elements. The corresponding LR versions of M, P, and I are denoted as m, p, and i, respectively. The images m, p, and i expanded to the spatial scale of P are denoted as m̃, p̃, and ĩ, respectively. Similar to M_j, m̃_j = [m̃_j^1, m̃_j^2, . . . , m̃_j^k] is the spectrum at location j. M̂ and M̂_j denote the pansharpened image and the pansharpened spectrum at the jth pixel, respectively.
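Under Equation (1), with one weight per band shared by all pixels (as in Assumption 4 below), the intensity image is simply a per-pixel weighted band sum plus a bias. A minimal illustrative Python sketch (the function name and toy data are ours, not from the paper):

```python
import numpy as np

def intensity_image(ms, beta, alpha=0.0):
    """Equation (1) with shared coefficients: I_j = sum_k beta_k * M_j^k + alpha.
    ms: (rows, cols, k) multispectral cube; beta: (k,) band weights."""
    return np.asarray(ms, dtype=float) @ np.asarray(beta, dtype=float) + alpha

# Toy example: GIHS-style equal weights on a random 4-band image
rng = np.random.default_rng(0)
M = rng.random((8, 8, 4))
beta = np.full(4, 1 / 4)          # one common weight per band
I = intensity_image(M, beta)      # (8, 8) intensity image
```

With equal weights 1/k this reproduces the GIHS-style intensity, i.e., the per-pixel mean over bands.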

Bayesian Fusion Framework
Bayesian data fusion treats the LR multispectral image m, the HR panchromatic image P, and the HR multispectral image M as random vectors, and the solution can be derived by maximizing the conditional probability density function prob(M|P, m) [26][27][28][29][30]. Applying Bayes' rule:

prob(M|P, m) = prob(M) prob(P|M) prob(m|M) / prob(P, m)  (2)

where prob(x|y) is the conditional probability density function of x given y, prob(M) is the prior probability density function of the vector M, and prob(P, m) is the probability density function of the co-occurrence of the vectors P and m; here P and m are taken to be conditionally independent given M. Since prob(P, m) is not a function of M, we have:

prob(M|P, m) ∝ prob(M) prob(P|M) prob(m|M)  (3)
Thus, the key to solving Equation (3) is to find explicit expressions for the three probability density functions, prob(M), prob(P|M), and prob(m|M).

Assumption 1.
To derive prob(M), the difference vector between the expanded LR and HR multispectral image vectors, m̃ − M, is assumed to be a Gaussian vector with zero mean, then:

prob(M) = (2π)^(−Nk/2) |C_M|^(−1/2) exp(−(1/2)(m̃ − M)^T C_M^(−1) (m̃ − M))  (4)

where N and k are the numbers of HR multispectral pixels and bands, respectively, C_M is the covariance matrix of the vector (m̃ − M) with Nk × Nk elements, and m̃ is the multispectral image expanded from the LR multispectral image m.

Assumption 2.
To derive prob(P|M), the spectral mismatch image P − I is assumed to be a Gaussian vector with zero mean, then:

prob(P|M) = (2π)^(−N/2) |C_e|^(−1/2) exp(−(1/2)(P − I)^T C_e^(−1) (P − I))  (5)

where N is the number of HR multispectral pixels, C_e is the covariance matrix of the vector P − I with N × N elements, and P and I are the HR panchromatic and intensity image vectors.

Assumption 3.
The contribution of the term prob(m|M), which can be used to guarantee that the pansharpened images M̂ are spectrally consistent with the observed LR multispectral images m, is negligible. It should be noted that this assumption is necessary to derive the CS method formulation. It implies that the CS methods are not strictly spectrally consistent, which is verified in Section 3.3 and discussed in Section 4.1.

Combining Equations (3)-(5) and neglecting the term prob(m|M), the closed-form solution M̂ is [25,28]:

M̂ = m̃ + C_M B^T (B C_M B^T + C_e)^(−1) (P − ĩ)  (6)

where m̃ is the expanded LR multispectral image, C_M is the covariance matrix of the vector (m̃ − M) first introduced in Equation (4), B is the weight coefficient matrix first introduced in Equation (1), C_e is the covariance matrix of the vector P − I first introduced in Equation (5), and ĩ is the expanded intensity image:

ĩ = B m̃ + α  (7)

where α is the bias coefficient vector first introduced in Equation (1).

Assumption 4.
To make this solution tractable, all pixels are assumed to share common weight and bias coefficients. Hence:

α = [α, α, . . . , α]^T  (8)

where α is the bias coefficient vector with N elements first introduced in Equation (1) and α is the common bias coefficient value, and:

B = diag(β, β, . . . , β)  (9)

where B is the weight coefficient matrix first introduced in Equation (1), a special type of block matrix derived as the direct sum (diag operator) of N duplicates of the vector β = [β_1, β_2, . . . , β_k] with k-element zero vectors 0 elsewhere, β_1, β_2, . . . , β_k are the common spectral band weight coefficients, and N is the number of HR pixels.

Assumption 5.
A further assumption is that all the pixel values in the two difference vectors (m̃ − M) and (P − I) are independent and identically distributed (i.i.d.), so that:

C_M = diag(C_s, C_s, . . . , C_s)  (10)

C_e = σ_e² I_N  (11)

where diag is the matrix direct sum operation as defined in Equation (9), I_N is the N × N identity matrix, σ_e² is the variance of all the pixels in the image (P − I), and C_s is a spectral band covariance matrix with k × k elements:

[C_s]_{x,y} = cov(M^x, M^y), x, y = 1, 2, . . . , k  (12)

where cov(x, y) is the covariance of two vectors x and y, and M^k = [M^k_1, M^k_2, . . . , M^k_N] represents the kth HR multispectral band image vector with N elements, M^k_j being the kth multispectral band pixel value at the HR spatial location j.

Assumption 6.
The LR panchromatic image p is assumed to be derived from the HR panchromatic image P in the same way that m is derived from M. This means that the MTFs of the multispectral sensors, which relate the LR and HR multispectral images (m and M), must be considered when deriving p from P. Consequently, due to the assumption of independently and identically distributed HR pixels (Assumption 5) in P and M, the LR pixels in p and m are also independent and identically distributed and have the same distributions as their corresponding HR pixels. Hence, σ_e² can be estimated from (p − i) and cov(M^x, M^y) from m, i.e.:

σ_e² = (1/n) Σ_{j=1}^{n} (p_j − i_j)²  (13)

where n is the number of LR multispectral pixels, and p_j and i_j are the jth LR panchromatic and intensity image pixel values, respectively, and:

cov(M^x, M^y) = cov(m^x, m^y), x = 1, 2, . . . , k; y = 1, 2, . . . , k  (14)

where m^k = [m^k_1, m^k_2, . . . , m^k_n] represents the kth LR multispectral band image vector with n elements, m^k_j being the kth multispectral band pixel value at the LR spatial location j, and k is the number of multispectral bands.

The large equation system in Equation (6) can then be spatially decomposed into pixel-level equations:

M̂_j = m̃_j + g^T (P_j − ĩ_j)  (15)

where M̂_j and m̃_j are the pansharpened and expanded multispectral spectra at location j, respectively, g is a gain coefficient vector with k elements given by:

g = C_s β^T / (β C_s β^T + σ_e²)  (16)

the superscript T denotes the transpose of a vector, P_j and ĩ_j are the HR panchromatic and expanded intensity values at location j, respectively, and the definitions of C_s, β, and σ_e² are given in Equations (12), (9), and (13), respectively.
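The chain from the LR statistics (Equations (13) and (14)) to the pixel-level injection of Equation (15) can be sketched in Python, using the gain vector g = C_s β^T/(β C_s β^T + σ_e²) (cf. Equation (17) below, which is its σ_e² = 0 special case). The function name, array shapes, and toy data are ours, not from the paper:

```python
import numpy as np

def cs_fuse(ms_exp, pan, pan_lr, ms_lr, beta, alpha=0.0):
    """Pixel-level CS fusion, Equation (15): M_hat_j = m_tilde_j + g*(P_j - i_tilde_j).
    All statistics are estimated from the LR images (Assumption 6).
    ms_exp: (R, C, k) expanded LR multispectral image (m-tilde)
    pan:    (R, C)    HR panchromatic image P
    pan_lr: (r, c)    LR panchromatic image p (MTF-degraded P)
    ms_lr:  (r, c, k) LR multispectral image m
    """
    beta = np.asarray(beta, dtype=float)
    k = ms_lr.shape[-1]
    bands = ms_lr.reshape(-1, k).T                    # (k, n) LR band vectors
    C_s = np.cov(bands)                               # spectral covariance, Equation (14)
    i_lr = bands.T @ beta + alpha                     # LR intensity image i
    sigma_e2 = np.mean((pan_lr.ravel() - i_lr) ** 2)  # mismatch variance, Equation (13)
    g = C_s @ beta / (beta @ C_s @ beta + sigma_e2)   # gain vector, shape (k,)
    i_exp = ms_exp @ beta + alpha                     # expanded intensity i-tilde
    return ms_exp + (pan - i_exp)[..., None] * g      # Equation (15)

# Toy check: if P coincides with i-tilde, nothing is injected
rng = np.random.default_rng(1)
ms_lr = rng.random((4, 4, 3))
ms_exp = np.repeat(np.repeat(ms_lr, 2, axis=0), 2, axis=1)
beta = np.full(3, 1 / 3)
fused = cs_fuse(ms_exp, ms_exp @ beta, ms_lr @ beta, ms_lr, beta)
```

When the panchromatic image equals the expanded intensity image, the detail term vanishes and the fused result equals the expanded multispectral image, as expected from Equation (15).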

CS Methods from a Bayesian Perspective
Alternative Assumption 2. Equation (15) reduces to the GS-based methods by assuming there is no spectral mismatch, i.e., σ_e² = 0 in Assumption 2, so that the HR panchromatic image P is perfectly matched with the intensity image I. Based on the linear-transformation and sum properties of the covariance:

g_GS = [cov(m^1, i)/σ_i², cov(m^2, i)/σ_i², . . . , cov(m^k, i)/σ_i²]^T = C_s β^T / (β C_s β^T)  (17)

where g_GS is the GS gain coefficient vector defined in Table 1 of [19], σ_i² is the variance of the intensity image i, and cov(m^x, i) is the covariance between the xth band LR multispectral image m^x and the LR intensity image i.
For the GS method, the weight and gain coefficients satisfy:

β g_GS = Σ_{x=1}^{k} β_x g_x = 1  (18)

This property has been mentioned in [33]. From Table 1 of [19], it is easy to verify that this equation is also satisfied by the IHS, generalized IHS (GIHS), and Brovey methods. The PCA method satisfies it as well, as the PCA gain and weight coefficients are identical and taken from a column of the unitary matrix created by PCA.
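This property is easy to check numerically for the GS gain of Equation (17); the following Python sketch (random toy data of our own construction) verifies that the weighted sum of the gain coefficients equals one:

```python
import numpy as np

# Numerical check that beta . g_GS = 1 for the GS gain vector of Equation (17)
rng = np.random.default_rng(42)
k, n = 4, 500
m = rng.random((k, n))                   # k LR multispectral band vectors
beta = np.full(k, 1 / k)                 # equal weights (GIHS/GS intensity)
C_s = np.cov(m)                          # spectral band covariance matrix
g_gs = C_s @ beta / (beta @ C_s @ beta)  # GS gain vector, Equation (17)
print(float(beta @ g_gs))                # 1.0 up to floating-point error
```

The identity holds exactly by construction: β (C_s β^T)/(β C_s β^T) = 1 regardless of the data.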

Best Practices in Histogram Matching
The spectral difference between the HR intensity image I and panchromatic image P, i.e., the σ_e² value, should be minimized to satisfy the no-spectral-mismatch assumption (Alternative Assumption 2) as far as possible. Several methods have been introduced to minimize σ_e², including histogram matching of the panchromatic image P to the intensity image I and deriving the best intensity image with weight coefficients obtained by multivariate regression between the multispectral and panchromatic images [19]. The idea of histogram matching is to create a virtual observed panchromatic image, P_hist, that is statistically (in mean and standard deviation) similar to the intensity image I. However, the target image to which the observed panchromatic image P should be histogram matched, i.e., the HR intensity image I, is unavailable. Recalling the independent and identical distribution assumption (Assumption 5), the histogram-matching equation for the HR image pair can be derived from the LR image pair (Equations (13) and (14)). Consequently, the pixel values in P are histogram matched using the equation derived from the LR panchromatic image p and intensity image i:

P_hist(p→i) = (P − mean(p)) · sqrt(cov(i, i)/cov(p, p)) + mean(i) = P_hist(P→I)  (19)

where P_hist(p→i) and P_hist(P→I) are the histogram-matched HR panchromatic images using the equations from the LR image pair and the HR image pair, respectively, and mean and cov represent the mean and covariance operations, respectively. Histogram matching directly from P to ĩ, P_hist(P→ĩ), is not proper, as the statistical relationship between the HR panchromatic image P and the intensity image I is not the same as that between P and the expanded intensity image ĩ:

P_hist(P→ĩ) = (P − mean(P)) · sqrt(cov(ĩ, ĩ)/cov(P, P)) + mean(ĩ) ≠ P_hist(P→I)  (20)

where P_hist(P→I) has been defined in Equation (19) and P_hist(P→ĩ) is the histogram-matched HR panchromatic image using the equation derived from the HR panchromatic image P and the expanded intensity image ĩ.
Although the mean values of p and P are the same, their variance values can differ greatly due to the scale difference [34,35]. The difference between P and ĩ includes not only the residual spectral mismatch but also the spatial details to be injected: P − ĩ = (P − p̃) + (p̃ − ĩ), where P − p̃ is the spatial detail to be injected and (p̃ − ĩ) can be interpreted as the spectral mismatch (residual σ_e²) between the panchromatic and intensity images.
Due to Assumption 6, the LR panchromatic image p should be derived from the HR panchromatic image P in the same way that the LR multispectral image m is derived from the HR multispectral image M (i.e., taking the multispectral sensor modulation transfer functions into account).
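The two histogram-matching strategies can be sketched as a simple mean/variance matching in Python; function names and toy statistics are ours, with Equation (19) using the LR pair statistics and Equation (20) the HR/expanded pair:

```python
import numpy as np

def hist_match(src, mean_s, var_s, mean_t, var_t):
    """Linear histogram matching: map src from reference statistics
    (mean_s, var_s) to target statistics (mean_t, var_t)."""
    return (src - mean_s) * np.sqrt(var_t / var_s) + mean_t

def p_hist_lr(P, p_lr, i_lr):
    """Equation (19): match P using the statistics of the LR pair (p, i)."""
    return hist_match(P, p_lr.mean(), p_lr.var(), i_lr.mean(), i_lr.var())

def p_hist_expanded(P, i_exp):
    """Equation (20): match P directly to the expanded intensity i-tilde
    (the practice this paper argues against)."""
    return hist_match(P, P.mean(), P.var(), i_exp.mean(), i_exp.var())

# Toy data: an HR pan image with extra high-frequency detail
rng = np.random.default_rng(7)
p_lr = rng.random((16, 16))
P = np.repeat(np.repeat(p_lr, 4, 0), 4, 1) + 0.05 * rng.standard_normal((64, 64))
i_lr = 0.9 * p_lr + 0.01
i_exp = np.repeat(np.repeat(i_lr, 4, 0), 4, 1)
P19, P20 = p_hist_lr(P, p_lr, i_lr), p_hist_expanded(P, i_exp)
```

Note that P20 inherits the low variance of ĩ, while P19 keeps the extra HR variance of P; this is the blurring effect discussed above.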
The two histogram-matching equations (Equations (19) and (20)) were compared for the generalized IHS (GIHS), GS, and adaptive GS (GSA) [19] methods, pansharpening all four multispectral bands of the QuickBird rural images and all eight multispectral bands of the two WorldView-2 urban images. These three CS methods were selected because they are commonly used in the literature [19,20] and have been shown to perform better than the other CS methods [19,37]. The GSA method is the same as the GS method except that the intensity image is synthesized using weight coefficients derived by regression between the multispectral and the degraded LR panchromatic images, rather than using an equal weight coefficient for each multispectral band image [19]. During the histogram matching and the GSA regression coefficient derivation, the LR panchromatic image is derived by degrading the panchromatic image with a filter mimicking the multispectral sensor MTF, so that the LR panchromatic and multispectral images are comparable (Assumption 6). The QuickBird and WorldView-2 panchromatic and multispectral bands' top-of-atmosphere (TOA) reflectance, derived from the DN values using the coefficients provided in the metadata and the method provided by [36], was used in the pansharpening experiments to reduce variations in incoming solar irradiance and calibration gain among the different bands of the DN values.
We conducted pansharpening at both reduced and full scales. At the reduced scale, the panchromatic and multispectral images were first degraded by a factor of four before pansharpening so that the original multispectral data could be used as the reference for evaluating the pansharpening results. The sensor modulation transfer functions (MTFs) were taken into account during the degradation. The sensor MTF can be matched using Gaussian filters whose parameters are tuned using the amplitude response at the Nyquist frequency, commonly provided by the manufacturer as a sensor specification [8,17]. These values are 0.35 for the first seven multispectral bands and 0.27 for the eighth band of the WorldView-2 sensor, and 0.34, 0.32, 0.30, and 0.22 for the four QuickBird multispectral bands. The corresponding Gaussian filters, implemented in Matlab, are provided by Vivone [8] and available at http://openremotesensing.net/index.php/codes/11-pansharpening/2-pansharpening.
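A common way to build such an MTF-matched Gaussian filter is to choose the standard deviation so that the Gaussian's frequency response at the Nyquist frequency of the LR grid equals the published amplitude value. The following Python sketch shows this derivation under that assumption (a simplification of the cited Matlab implementation; function names are ours):

```python
import numpy as np

def mtf_gaussian_sigma(gain_nyquist, ratio):
    """Std (in HR pixels) of a Gaussian whose continuous frequency response
    exp(-2*pi^2*sigma^2*f^2) equals gain_nyquist at the LR Nyquist frequency
    f = 1/(2*ratio) cycles per HR sample."""
    return (ratio / np.pi) * np.sqrt(-2.0 * np.log(gain_nyquist))

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel sampled at integer pixel offsets."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

# QuickBird blue band: Nyquist gain 0.34, pan/MS resolution ratio 4
sigma = mtf_gaussian_sigma(0.34, 4)
kernel = gaussian_kernel(sigma)
```

Separable filtering of the HR panchromatic image with this kernel (once per axis), followed by decimation by the resolution ratio, would give the MTF-degraded LR panchromatic image p used in Assumption 6.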

Quantitative Evaluation of the Experimental Results
The consistency and synthesis properties of Wald's protocol [38] were used as validation strategies. The consistency property has been proved effective in [38] and was used to validate the full-scale experiments. To check the consistency property, the pansharpened images were first degraded using the sensor-specific MTF as described in Section 3.1. The synthesis property can only be applied to the reduced-scale experiments. Only the consistency property was evaluated in the full-scale experiments; other full-scale evaluation methods [39], such as quality with no reference (QNR) [35], were not used, as the main purpose of this study is to check the spectral consistency of the pansharpened images induced by the different histogram-matching strategies. Three quantitative scores were used: (i) the ERGAS index, from its French name "relative dimensionless global error in synthesis", a global radiometric error index [10] whose best value is zero; (ii) the spectral angle mapper (SAM), which calculates the angle between the two spectral vectors at each pixel, averaged over the test image, and whose best value is zero; and (iii) scores related to the universal image quality index (Q). For the four-band QuickBird image, the Q4 index proposed by [40] was used, an extension of the universal image quality index suitable for four-band multispectral images. For the eight-band WorldView-2 images, Q2^n [41] was used, an extension of the Q4 index to any number of spectral bands. These two indices are sensitive to both correlation loss and spectral distortion between two multispectral images, and combine spatial and spectral distortions into a single parameter. The best value is one, with a range from zero to one. In this study, Q4 and Q2^n were both calculated on 32 × 32 pixel blocks, as suggested by the authors who proposed them.
In order to give a clear picture of the difference between the panchromatic and intensity images (i.e., the spectral mismatch), the σ_e² values from Equation (13) were also calculated.
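For reference, the ERGAS and SAM scores can be sketched compactly in Python (our own minimal implementations of the standard definitions, not the exact code used in the experiments; the Q4/Q2^n indices are omitted for brevity):

```python
import numpy as np

def ergas(ref, fused, ratio=1 / 4):
    """ERGAS = 100*(h/l)*sqrt(mean_k(RMSE_k^2 / mean_k^2)); ratio is the
    HR-to-LR pixel size ratio (1/4 for QuickBird and WorldView-2)."""
    r = ref.reshape(-1, ref.shape[-1]).astype(float)
    f = fused.reshape(-1, fused.shape[-1]).astype(float)
    rmse2 = np.mean((r - f) ** 2, axis=0)            # per-band squared RMSE
    return 100.0 * ratio * np.sqrt(np.mean(rmse2 / np.mean(r, axis=0) ** 2))

def sam_degrees(ref, fused, eps=1e-12):
    """Spectral angle mapper, averaged over all pixels, in degrees."""
    r = ref.reshape(-1, ref.shape[-1]).astype(float)
    f = fused.reshape(-1, fused.shape[-1]).astype(float)
    cos = np.sum(r * f, axis=1) / (
        np.linalg.norm(r, axis=1) * np.linalg.norm(f, axis=1) + eps)
    return float(np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))))

# Toy sanity check on a random 4-band "reference"
rng = np.random.default_rng(3)
ref = rng.random((32, 32, 4))
```

Both scores are zero for a perfect result, matching the ideal values quoted above.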

Results
Tables 1-3 show the quantitative scores for the synthesis property at the reduced scale (left columns) and for the consistency property at the full scale (right columns) for the QuickBird rural (Table 1), WorldView-2 urban 1 (Table 2), and WorldView-2 urban 2 (Table 3) datasets.
σ_e² is also tabulated in all of the tables. 'EXP' in these tables represents the plain expanded (resampled) multispectral dataset at the scale of the panchromatic image, i.e., m̃. All of the CS pansharpening methods have relatively low σ_e² values (less than 0.02 in top-of-atmosphere reflectance units), indicating that the histogram matching effectively reduces the radiometric difference between the intensity and panchromatic images. This value is smaller than the top-of-atmosphere reflectance noise induced by atmospheric scattering and by the sun-sensor-target geometry [42]. The GIHS and GS methods have the same σ_e² values, as they share the same intensity image derived using an equal weight coefficient for each multispectral band. The GSA method produced smaller σ_e² values than the GS method, as the design of the GSA minimizes the spectral mismatch between the LR intensity image i and the LR panchromatic image p [19]. The σ_e² values are greater for the full-scale pansharpening than for the reduced-scale pansharpening for the same methods and datasets. This could be because, in the full-scale experiments, there is still some mismatch between the real multispectral sensor MTF and the MTF used for degrading the panchromatic image (i.e., the MTF provided by the manufacturer) due to on-orbit sensor degradation. To obtain a more reliable sensor MTF, on-orbit estimation of the sensor point spread function is needed [43].

Table 1. Average quality scores of the pansharpened four-band QuickBird rural dataset (Figure 1). The scores shown in bold indicate the better metric between the two results produced by the same pansharpening method (GIHS, GS, or GSA) using the two different histogram-matching equations (P_hist(p→i) and P_hist(P→ĩ)).

Table 2. Average quality scores of the pansharpened eight-band WorldView-2 urban 1 dataset (Figure 2). The scores shown in bold indicate the better metric between the two results produced by the same pansharpening method (GIHS, GS, or GSA) using the two different histogram-matching equations (P_hist(p→i) and P_hist(P→ĩ)).

Table 3. Average quality scores of the pansharpened eight-band WorldView-2 urban 2 dataset (Figure 3). The scores shown in bold indicate the better metric between the two results produced by the same pansharpening method (GIHS, GS, or GSA) using the two different histogram-matching equations (P_hist(p→i) and P_hist(P→ĩ)).

Comparing the pansharpened results using the histogram-matched panchromatic image from Equation (19) (P_hist(p→i)) and from Equation (20) (P_hist(P→ĩ)), P_hist(p→i) always performs better on all of the quantitative scores for the WorldView-2 urban datasets and has smaller σ_e² values. There are a few exceptions where P_hist(P→ĩ) performs slightly better (e.g., the SAM metric of the synthesis property for the GSA method) in the QuickBird rural dataset. This experimentally confirms our analysis that the histogram-matching equation should be derived from the LR intensity and panchromatic image pair. The SAM exceptions in the QuickBird dataset could arise because the SAM metric is more robust to noise when the vector dimension it measures is larger. The QuickBird images have only four multispectral bands, which makes the SAM metric sensitive to radiometric distortion in the multispectral bands. For example, a small improper injection into the blue band of the QuickBird image, due to the smaller overlap between the blue and panchromatic bands, could cause large errors in the SAM values.

The histogram-matched panchromatic image P_hist(P→ĩ) is visibly more blurred than P_hist(p→i) (e.g., Figure 3b). This is reasonable, as the purpose of histogram matching is to adjust the mean and variance of the original image (the LR or HR panchromatic image, p or P, in this case) to those of the target image (the LR intensity image i or its expanded version ĩ, in this case).
P_hist(P→ĩ) has less spatial detail because it has a lower variance, since the LR panchromatic image p has a lower variance than the HR panchromatic image P [34,35]. Comparing the two differently pansharpened images with the original multispectral images clearly shows less sharpening detail in the QuickBird rural dataset and slight spectral distortion in the two WorldView-2 urban datasets for the pansharpened images using the directly histogram-matched panchromatic band P_hist(P→ĩ) of Equation (20).

The Usefulness of the Revealed Statistical Assumptions
It should be noted that the statistical assumptions are not newly proposed by the authors; rather, they are revealed here to better understand the assumptions implicitly made by the CS methods. This is useful for CS method development, as illustrated in this study by the suggested best practice for histogram matching. Some other possible uses for method development are discussed below; these require further research.
One assumption is that the spectral consistency term prob(m|M) in Equation (2) is neglected, which results in the pansharpened images not being strictly spectrally consistent. This is shown in Tables 1-3, where no pansharpened images are perfectly spectrally consistent with the original multispectral images. Facilitated by the flexibility of the Bayesian data fusion framework, studies have tried to retain this term at the price of a more complicated solution [25]. This is because the point spread function (i.e., MTF) in the spectral consistency term makes it impossible to spatially decompose the large equation system of Equation (6) into pixel-level equations such as Equation (15) [25]. A detailed analysis of the contribution of this term is given in [25].
Another assumption is that the difference between the expanded LR and HR multispectral image vectors (m̃ − M) is a Gaussian vector with zero mean. Perhaps other distribution models, such as multimodal distributions, are more suitable for representing this difference. Within the Bayesian framework and the derivation in this paper, one can simply replace the Gaussian distribution with a suitable multimodal distribution and derive its solution to design new CS methods.
Another assumption is that all the pixels in the difference vector between the expanded LR and HR multispectral images (m̃ − M) and in the difference vector between the HR panchromatic and intensity images (P − I) are independent and identically distributed. Due to the spatially heterogeneous nature of the earth's surface, the validity of this assumption depends on the extent and location of the study area. Restricting this assumption to a spatially homogeneous area is therefore a better option. To this end, some authors have improved CS methods by restricting this assumption to a local sliding window [44][45][46] or to a group of homogeneous pixels after image classification [47,48], or by paying attention to mixed pixels [49].
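As an entirely hypothetical sketch of the classification-based idea (not the implementation of the cited works), one could estimate a separate gain vector g = C_s β^T/(β C_s β^T + σ_e²) per class, so that the i.i.d. assumption only has to hold within each homogeneous class:

```python
import numpy as np

def classwise_gains(ms_lr, pan_lr, labels, beta, alpha=0.0):
    """Estimate one gain vector per class from the LR images.
    ms_lr: (n, k) LR spectra; pan_lr: (n,) LR pan pixels;
    labels: (n,) integer class map; beta: (k,) intensity weights."""
    beta = np.asarray(beta, dtype=float)
    gains = {}
    for c in np.unique(labels):
        sel = labels == c
        C_s = np.cov(ms_lr[sel].T)                     # class spectral covariance
        i_lr = ms_lr[sel] @ beta + alpha               # class LR intensity pixels
        sigma_e2 = np.mean((pan_lr[sel] - i_lr) ** 2)  # class mismatch variance
        gains[c] = C_s @ beta / (beta @ C_s @ beta + sigma_e2)
    return gains

# Toy data: two random classes
rng = np.random.default_rng(5)
n, k = 200, 3
ms_lr = rng.random((n, k))
pan_lr = ms_lr.mean(axis=1) + 0.01 * rng.standard_normal(n)
gains = classwise_gains(ms_lr, pan_lr, rng.integers(0, 2, n), np.full(k, 1 / k))
```

Each class then receives its own detail-injection gains at pansharpening time, at the cost of requiring a reliable classification map.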

Time Complexity of the Suggested Histogram Matching Method
The computational efficiency of the different pansharpening methods using the two different histogram-matching equations was measured by recording the run time of each method for each study image in the reduced-scale experiments. The run time measurements excluded the image reading and writing operations. All of the code was written in the Matlab language and run on an Intel® Xeon® Processor X3450 (4 cores, 8 M cache, 2.66 GHz) CPU with 8.00 GB of memory under the 64-bit Windows 7 operating system. Table 4 shows the run time of each method for all three datasets. The benchmark plain expansion/resampling algorithm is computationally the most efficient, with run times of less than 0.6 s for all three datasets. The additional step of deriving the LR panchromatic image for the suggested histogram matching is not very time consuming for the GIHS and GS methods: their run times using the suggested histogram matching are only 17-27% longer than those using the histogram-matching equation derived from the HR panchromatic and resampled LR intensity images. This can be mitigated by the development of hardware computation capabilities or by using advanced parallel algorithms [50]. Furthermore, the run time of the GSA method using the suggested histogram matching is slightly less than that using the histogram-matching equation derived from the HR panchromatic and resampled LR intensity images. This is because (i) the GSA method needs the LR panchromatic image for the optimal weight determination in the intensity image derivation no matter which histogram matching is used; and (ii) using the LR images in Equation (19) with n pixels saves some time, as opposed to using the HR images in Equation (20) with N pixels (16 times greater than n in our study) for histogram matching.

Conclusions
In this paper, the implicit statistical assumptions of the CS pansharpening methods are revealed by interpreting them within the Bayesian data fusion framework. Best practices for histogram matching of the HR panchromatic image to the intensity image are suggested to better satisfy the implicit assumptions. The HR panchromatic image should be histogram matched using the equation derived from the LR panchromatic and intensity images instead of the equation derived from the HR panchromatic and expanded LR intensity images. This is because (i) one assumption implies a negligible difference between the HR intensity and panchromatic images; and (ii) the relationship between the HR intensity and panchromatic images is comparable to the relationship between the LR intensity and panchromatic images, provided the LR panchromatic image is derived from the HR panchromatic image by considering the multispectral sensor modulation transfer functions. We tested the two different histogram-matching equations using the GIHS, GS, and adaptive GS (GSA) methods on both QuickBird and WorldView-2 top-of-atmosphere reflectance images and showed that the suggested histogram-matching equation is effective. The usefulness of the statistical assumptions revealed in this study for method developers is also discussed. For example, none of the CS methods can produce pansharpened images spectrally consistent with the LR multispectral images, since the implicit Assumption 3 omits the spectral consistency term. Classifying the images into homogeneous areas before pansharpening can help better satisfy Assumption 5, i.e., that all of the pixels in the difference vector between the expanded LR and HR multispectral images (m̃ − M) and in the difference vector between the HR panchromatic and intensity images (P − I) are independent and identically distributed.