Fusing China GF-5 Hyperspectral Data with GF-1, GF-2 and Sentinel-2A Multispectral Data: Which Methods Should Be Used?

: The China GaoFen-5 (GF-5) satellite sensor, which was launched in 2018, collects hyperspectral data with 330 spectral bands, a 30 m spatial resolution, and 60 km swath width. Its competitive advantages compared to other on-orbit or planned sensors are its number of bands, spectral resolution, and swath width. Unfortunately, its applications may be undermined by its relatively low spatial resolution. Therefore, the data fusion of GF-5 with high spatial resolution multispectral data is required to further enhance its spatial resolution while preserving its spectral ﬁdelity. This paper conducted a comprehensive evaluation study of fusing GF-5 hyperspectral data with three typical multispectral data sources (i.e., GF-1, GF-2 and Sentinel-2A (S2A)), based on quantitative metrics, classiﬁcation accuracy, and computational e ﬃ ciency. Datasets on three study areas of China were utilized to design numerous experiments, and the performances of nine state-of-the-art fusion methods were compared. Experimental results show that LANARAS (this method was proposed by lanaras et al.), Adaptive Gram–Schmidt (GSA), and modulation transfer function (MTF)-generalized Laplacian pyramid (GLP) methods are more suitable for fusing GF-5 with GF-1 data, MTF-GLP and GSA methods are recommended for fusing GF-5 with GF-2 data, and GSA and smoothing ﬁltered-based intensity modulation (SFIM) can be used to fuse GF-5 with S2A data.


Introduction
Hyperspectral imaging sensors generally collect more than 100 spectral bands with a wavelength range within 400-2500 nm. Because of their high spectral resolution, hyperspectral data have achieved widespread applications in numerous research fields, such as in the fine classification of ground objects. In recent years, hyperspectral remote sensing has developed rapidly. For example, Italy launched the PRecursore IperSpettrale della Missione Applicativa (PRISMA) earth observation satellite in March 2019 [1], Japan launched the Hyperspectral Imager Suite (HISUI) hyperspectral satellite sensor in 2019 [2], India launched the ISRO's Hyperspectral Imaging Satellite (HYSIS) hyperspectral satellite in 2018 [3], and Germany launched the DLR Earth Sensing Imaging Spectrometer (DESIS) hyperspectral satellite in 2018 and are planning to launch the Environmental Mapping and Analysis Program (EnMAP) hyperspectral satellite in 2020 [4,5]. The development of the hyperspectral satellite field will bring about new requirements for image processing.
The China GaoFen-5 (GF-5) satellite was launched on 9 May 2018, and one of its six main payloads is an advanced hyperspectral (HS) imager developed by the Shanghai Institute of Technical Physics Table 1. Main parameters of on-orbit and recently launched spaceborne hyperspectral sensors. GF-5: GaoFen-5; VNIR: visible/near-infrared; SWIR: short-wave infrared 1 30 30 Similar to other spaceborne HS sensors, the GF-5 has a relatively low spatial resolution compared to its high spectral resolution. The reason for that is the inevitable tradeoff between spatial resolution, spectral resolution and signal-to-noise ratio in the design of optical remote sensing systems [10][11][12][13]. This limits some specific applications of GF-5 HS data, since geographical elements or ground objects have a spatial width of less than 30 m; e.g., inland rivers and urban roads. In this case, seeking help from image fusion methods becomes an essential solution [14][15][16]. The image fusion methods can fuse the GF-5 HS data with either panchromatic (PAN) or multispectral (MS) remote sensing images to improve its spatial resolution while preserving the fidelity of its spectrum [17]. In recent years, a large number of state-of-the-art HS and MS fusion methods have emerged and achieved good results. Therefore, we focus here on fusing GF-5 with MS data.
Current HS and MS fusion methods can be roughly divided into four categories [17][18][19][20]: component substitution (CS), multiresolution analysis (MRA), subspace-based methods, and color mapping-based methods. CS-based methods are classical image fusion approaches based on the projection transformation, with typical examples of intensity-hue-saturation (IHS) [21], principal component analysis (PCA) [22], and Gram-Schmidt (GS) [23]. MRA-based methods originate from multi-resolution analysis, and they enhance the spatial resolution of HS data by injecting detailed information of MS data into the resampled HS data; e.g., the "à trous" wavelet transform (ATWT) [24] and decimated wavelet transform (DWT) [25]. Subspace-based methods find a common subspace of both input images, and they generally enhance the spatial resolution of HS data using machine learning. They mainly include unmixing-based algorithms such as sparse spatial-spectral representation [26], Bayesian-based algorithms such as fast fusion based on Sylvester equation (FUSE) [27], and deep learning methods such as the two-branch convolutional neural network (Two-CNN-Fu) [19]. Color mapping-based methods promote the spatial resolution of HS data by color mapping; e.g., hybrid color mapping (HCM). Using the above popular methods, Loncan et al. [20] made a performance comparison of different pansharpening methods. Yokoya et al. [18] investigated the behaviors of several fusion methods on HS and MS data. The above study provides us with a good direction for fusing HS data with MS data. However, they only implemented simulated HS data, and therefore the conclusion is limited in realistic applications. In detail, different hardware designs between GF-5 HS imagers, different ratios of spatial resolution, effects of real acquisition geometry and even slight cloud contaminations will definitely complicate the image fusion procedure.
In this paper, we will investigate the issue of image fusion on GF-5 HS data and typical MS data; i.e., from GF-1, GF-2 and Sentinel-2A (S2A). We chose these MS images because of their popularity and ease of access. Other MS data from industrial satellites (e.g., worldview-3 or Spot) can also be considered in combination with our work by interested readers. We test nine state-of-the-art image fusion methods and present a comprehensive evaluation framework to evaluate their fusion performance, including aspects of their quantitative measures, classification behaviors and computation efficiency measures. Moreover, we make comparison experiments with the GF-5 HS data on three study areas in China. The objective of our paper is two-fold: 1) to propose a comprehensive framework to evaluate the fusion performance of GF-5 HS data and MS data; and 2) to find appropriate fusion methods for fusing GF-5 HS data with GF-1, GF-2, and S2A images, respectively.
Compared against other works, the proposed evaluation framework for fusing hyperspectral images is more comprehensive and thoughtful. In addition to the usual assessment of spectral distortion, the adopted evaluation metrics quantify the spatial distortions of fused images with a high frequency correlation coefficient and also include an evaluation of the application performance (e.g., classification) and computational time. Moreover, our paper investigates the fusion performance of China GF-5 satellite hyperspectral data on real images rather than simulated data.
The paper is arranged as follows. Section 2 describes the sensors and data. Section 3 details the tested fusion methods and evaluation metrics. Section 4 presents the experimental results of different fusion methods for different datasets and MS sensors. Section 5 presents the discussion on the performance of fusion methods. Section 6 draws conclusions on the implemented data and suggests the appropriate methods for each MS sensor.

Data
In this section, we briefly describe the characteristics of the four used sensors (i.e., GF-5 HS sensor, GF-1, GF-2 and sentinel-2A MS sensors; Table 2 lists the main parameters), the principal required preprocessing steps, and the datasets organized for the implemented experiments.

GF-5 Spaceborne Hyperspectral Sensor
The GF-5 HS imager adopts convex grating spectrophotometry and has an elevated performance with a three-concentric-mirror configuration. It includes two spectrometers and provides 330 spectral bands (from 400 nm to 2500 nm) with 30 m spatial resolution (http://data.cresda.com). The VNIR spectrometer has a spectral resolution of 5 nm, and the SWIR spectrometer has a spectral resolution of 10 nm. Operating in a pushbroom fashion, the GF-5 sensor has a swath width of 60 km. Moreover, it optimizes the acquisition and processing techniques of weak signals, and the signal-to-noise ratio (SNR) reaches 700 and 500 in VNIR and SWIR bands, respectively. By removing 50 severe water absorption bands within 1350-1440 nm, 1800-1965 nm, and 2400-2500 nm, we utilize 280 bands from the initial 330 bands to carry out our experiments, and the level of the products is L0.
2.2. GF-1, GF-2, and S2A Spaceborne Multispectral Sensors GF-1 satellite carries MS and PAN spectrometers, and it was launched on 26 April 2013 [28]. The instruments adopt the time delay and integration (TDI) charge coupled device (CCD) imaging technology to unify the five spectra and for the structural design of combining dual cameras. The GF-1 satellite carries four MS cameras and two PAN/MS cameras. These four MS sensors simultaneously capture images with four bands and a 16 m spatial resolution, and the swath width reaches 800 km through image mosaic technology. The PAN/MS cameras acquire PAN images with a 2 m spatial resolution and MS images with 8 m spatial resolution and with a swath width of 60 km through image mosaic technology. The GF-1 instruments, which operate on the pushbroom principle, provide VNIR bands but no SWIR bands. These acquired images of MS sensors cover the spectral ranges of 450-520 nm, 520-590 nm, 630-690 nm and 770-890 nm. In the experiment, we implement MS data with a 8 m spatial resolution (http://data.cresda.com), and the level of the products is L1.
The GF-2 satellite was launched on 19 August 2014, and it was equipped with two PAN/MS cameras [29]. It adopts a long focal length, an advanced design of its optical system, and the TDI CCD imaging technology to unify the five spectra. Operating in a pushbroom fashion, the instrument collects one PAN band of 1 m spatial resolution and four MS bands (450-520 nm, 520-590 nm, 630-690 nm, and 770-890 nm) of 4 m spatial resolution. The GF-2 satellite optimizes the parameters of orbit operation and skew maneuvering, and the field of view angle of 2.1 • for a single camera is realized to obtain a swath width of 45 km. We use the GF-2 MS data with 4 m spatial resolution in the experiment (http://data.cresda.com), and the level of the products is L1.
Sentinel-2A is the second satellite of the Global Monitoring of Environment and Safety (GMES) project in Europe, which was launched on 23 June 2015 [30]. It has a swath width of 290 km, and a revisited period of 10 days. The instrument carries a MS imager covering 13 spectral bands with different spatial resolutions (consisting of four bands of 10 m spatial resolution, six bands of 20 m spatial resolution and three bands of 60 m spatial resolution), ranging from visible to short-wave infrared. In the optical field, the S2A data is the only source with three bands in the red-edge range, which is effective in monitoring the vegetation health status. The images with four bands and a 10 m spatial resolution are used in our experiment, and the spectral ranges of these bands are 430-450 nm, 450-510 nm, 530-590 nm and 640-670 nm, respectively (https://scihub.copernicus.eu/dhus/#/home), and the level of the products is L0.

Data Preprocessing
Before multi-source data fusion, some data preprocessing work is essential, including ortho correction, atmospheric correction, spatial registration, and image clipping. All data preprocessing steps were carried out in ENVI software. These data sets were first converted into the world geodetic system 1984 (WGS 1984) coordinate system. The global digital elevation model (DEM) data at 30 m (https://asterweb.jpl.nasa.gov/gdem.asp) was used to correct the GF-5 hyperspectral images and MS images, and then the data were resampled using the rational polynomial coefficient files and the Remote Sens. 2020, 12, 882 5 of 26 bilinear interpolation method. All original images with digital numbers were calibrated to the surface reflectance data. The Fast Line-of-Sight Atmosphere (FLAASH) module was performed for all data, and the radiometric calibration was performed by gain and offset coefficients [31]. In addition, we removed the bad bands and noisy bands (1342-1460 nm, 1797-1973 nm, 1999-2024 nm, 2353-2500 nm) from GF-5 data, and the left 280 bands were used in our experiments. Using MS data as reference images, GF-5 images were registered. Control points were collected evenly for the whole image to ensure that the spatial error was less than one pixel. Finally, we clipped the same region of each dataset to obtain the data for fusion.

Study Area and Fusion Datasets
The GF-5, GF-1, GF-2 and S2A images shown in Figure 1 covering three study areas-i.e., the Yellow River Estuary area, Taihu Lake area and Poyang Lake area-were utilized. The Yellow River Estuary area is located in the northeast of Shandong Province, China, and is an important ecological protection area which mainly includes wetland vegetation. The GF-5 data were captured on 1 November 2018, and the S2A data were collected on 24 October 2018 in the Yellow River estuary area. We chose 850 × 670 pixels of GF-5 data as the experimental data in order to fuse them with the S2A data of 2550 × 2010 pixels.
Remote Sens. 2020, 01, x FOR PEER REVIEW 5 of 25 ensure that the spatial error was less than one pixel. Finally, we clipped the same region of each dataset to obtain the data for fusion.

Study Area and Fusion Datasets
The GF-5, GF-1, GF-2 and S2A images shown in Figure 1 covering three study areas-i.e., the Yellow River Estuary area, Taihu Lake area and Poyang Lake area-were utilized. The Yellow River Estuary area is located in the northeast of Shandong Province, China, and is an important ecological protection area which mainly includes wetland vegetation. The GF-5 data were captured on 1 November 2018, and the S2A data were collected on 24 October 2018 in the Yellow River estuary area. We chose 850 × 670 pixels of GF-5 data as the experimental data in order to fuse them with the S2A data of 2550 × 2010 pixels. The Taihu Lake area is one of five freshwater lakes in China and is located in the south of Jiangsu Province, China. The GF-5 data of Taihu Lake area were captured on 1 June 2018, and the main ground objects are water bodies, architecture, vegetation, cultivated land, and urban green space. We clipped six small datasets from a large GF-5 image, which was consistent with the three small datasets of GF-2 data, one small dataset of GF-1 data, and one small dataset of S2A.
The Poyang Lake area, as the largest freshwater lake in China, is located in Jiangxi Province, China. The main ground objects are cultivated land, vegetation and water bodies. The GF-5 data were collected on 7 October 2018, with an image size of 2420 × 2463 pixels. We clipped two small datasets from the large images of GF-5 data: one was the Poyang Lake-1 area with 600 × 500 pixel sizes, which The Taihu Lake area is one of five freshwater lakes in China and is located in the south of Jiangsu Province, China. The GF-5 data of Taihu Lake area were captured on 1 June 2018, and the main ground objects are water bodies, architecture, vegetation, cultivated land, and urban green space. We clipped six small datasets from a large GF-5 image, which was consistent with the three small datasets of GF-2 data, one small dataset of GF-1 data, and one small dataset of S2A. The Poyang Lake area, as the largest freshwater lake in China, is located in Jiangxi Province, China. The main ground objects are cultivated land, vegetation and water bodies. The GF-5 data were collected on 7 October 2018, with an image size of 2420 × 2463 pixels. We clipped two small datasets from the large images of GF-5 data: one was the Poyang Lake-1 area with 600 × 500 pixel sizes, which corresponds to GF-1 data of a 2550 × 1875-pixel size, and the other was the Poyang Lake-2 area with 700 × 700 pixels that corresponds to S2A data of a 2100 × 2100-pixel size. Table 3 lists the specific information of these datasets fpr the three study areas. In order to explore the specific application performance of different fusion methods, the fused images were classified and analyzed. By using Google Earth and field sampling, we obtained the region of interest (ROI) for all land cover types. Tables 4-6 show the number of labeled samples in each dataset. Training samples and testing samples were randomly selected from the selected ROI. In order to ensure the classification accuracy, the number of testing samples was almost three times that of training samples. All images were classified by using the support vector machine (SVM) classifier in ENVI 5.3 software. Table 4. Training and testing samples of each land cover type in all datasets (Taihu Lake-1, Taihu Lake-2, Poyang Lake-1). ROI: region of interest.

The Study Framework
Remote Sens. 2020, 01, x FOR PEER REVIEW 8 of 25 atmospheric correction, image registration, and image clipping. The parameters of data fusion methods were manually determined for different methods and different datasets. After that, the performances of different fusion methods were evaluated by using comprehensive evaluation measures; i.e., quantitative measures, application evaluation measures, and computational efficiency measures. Finally, the behaviors of different methods for different datasets were analyzed (e.g., fusing GF-5 and GF-1, GF-5 and GF-2, GF-5 and S2A) to summarize the appropriate fusion methods.

Fusion Methods
The current HS and MS fusion methods can be roughly divided into four categories: component substitution (CS), multiresolution analysis (MRA), subspace-based methods, and color mappingbased methods. Among them, the nine typical methods mentioned below were tested in this experiment.

CS-Based Methods: GSA
The CS-based methods transform HS images into other feature spaces by matrix transformation. After histogram matching, MS images are used to replace the intensity components of HS images, and the spatial resolution of HS images is then enhanced by inverse transformation. The typical method used in this paper is Adaptive GS (GSA), which is an improved method based on GS by Aiazzi et al [32]. It calculates the correlation between hyperspectral and multispectral bands and fuses the bands in groups. GSA calculates the linear relationship between HS and MS images, and uses regression coefficients to perform a forward transformation on HS data to extract the intensity component. The intensity component of HS images is replaced by MS images for inverse transformation. The mathematical models of GSA can be written as follows:

Fusion Methods
The current HS and MS fusion methods can be roughly divided into four categories: component substitution (CS), multiresolution analysis (MRA), subspace-based methods, and color mapping-based methods. Among them, the nine typical methods mentioned below were tested in this experiment.

CS-Based Methods: GSA
The CS-based methods transform HS images into other feature spaces by matrix transformation. After histogram matching, MS images are used to replace the intensity components of HS images, and the spatial resolution of HS images is then enhanced by inverse transformation. The typical method used in this paper is Adaptive GS (GSA), which is an improved method based on GS by Aiazzi et al. [23]. It calculates the correlation between hyperspectral and multispectral bands and fuses the bands in groups. GSA calculates the linear relationship between HS and MS images, and uses regression coefficients to perform a forward transformation on HS data to extract the intensity component. The intensity component of HS images is replaced by MS images for inverse transformation. The mathematical models of GSA can be written as follows: where the subscript i k indicates the k-th spectral band of the ith group data. HS and HS represent the fused image and the resampled hyperspectral image, respectively. MS i is the i-th multispectral band. g i k and w i k are the forward transform coefficient and transform coefficient, respectively. I represents the intensity component.

MRA-Based Methods: MTF-GLP and SFIM
The MRA-based methods inject detail spatial features of MS images into resampled HS images to enhance spatial resolution of HS data. A general formulation of MRA is given by where MS iL represents the i-th filtered multispectral band. a i k is the gain coefficient of k-th hyperspectral band in the i-th group data. Two typical methods are Modulation transfer function(MTF)-generalized Laplacian pyramid (GLP) [32] and smoothing filtered-based intensity modulation (SFIM) [33]. The MTF-GLP categorizes the bands of HS images into different groups according to the correlation coefficient between HS bands and MS bands and enhances each group of HS images. It uses a Gaussian MTF filter to perform low-pass filtering for MS images. The high spatial detail image is obtained by subtracting filtered images from the original MS data, and then the extracted detail image is injected into HS images by using the global gain coefficient. SFIM is a fusion algorithm that formulates the relationship between solar radiation and land surface reflection. It uses the same fusion steps to obtain detailed images as MTF-GLP; however, SFIM does not use the global gain, but uses the ratio between HS images and low-pass filter images of MS data as the gain coefficient. In SFIM and MTF-GLP, a i k is Subspace-based methods mainly consist of Bayesian-based approaches, unmixing-based approaches, and deep learning approaches. Bayesian-based approaches enhance spatial resolution by maximizing the posterior probability density of the full-resolution images. We implemented two typical Bayesian-based approaches: i.e., fast fusion based on Sylvester equation (FUSE) and the maximum a posteriori-stochastic mixing model (MAP-SMM) [34]. FUSE takes the original image as a prior probability density, and it achieves image fusion by calculating the maximum posterior probability density of the target image. It implements the alternating direction method of multipliers (ADMM) [35] and block coordinate descent method [36] to merge prior information into the fusion program. Moreover, it adopts the Sylvester equation [37] to give a close solution to the optimization problem, which also greatly improves the computational efficiency. MAP-SMM uses the stochastic mixing model (SMM) [38] to evaluate the conditional mean vector and covariance matrix of HS images relative to MS images, and obtains the mean spectrum, abundance map, and covariance matrix of each endmember. After that, it establishes a maximum a posteriori (MAP) [39] function to optimize the fused images and obtains the fused images by operating in the principal component subspace of HS images. A general formulation of Bayesian based approaches is given by where n is the number of data bands and i is the i-th band of data. The maximum posterior probability can be obtained by solving the following formula: where δ represents the standard deviation of the error, Ω stands for an open interval, and S is a constant term.
Unmixing-based methods decompose the HS and MS data into the HS basis and low-resolution coefficient matrix, MS basis and high-resolution coefficients matrix, respectively. The resolution-enhanced HS images can be obtained by reconstructing the HS basis and high-resolution coefficients. The key formula is described as follows: HS ≈ HSS = EAS = EÃ (10) where E and A are the hyperspectral basis and high-resolution coefficient matrix, respectively. S represents the spatial response function, and R represents the spectral response function. Then, E and A can be obtained by the minimum loss function: arg min where ||·|| F is the Frobenius norm. We implement two typical methods: i.e., the coupled nonnegative matrix factorization (CNMF) [40] and LANARAS [41]. The CNMF, proposed by Naoto Yokoya et al., uses a nonnegative matrix factorization (NMF) [42] to obtain the endmembers and abundances of HS and MS images. The fused images are obtained by recombining the endmembers of HS images and the abundances of MS images. CNMF uses vertex component analysis (VCA) [43] to calculate the initialized endmembers, and the endmembers and abundances are iteratively updated through the minimum loss function. LANARAS has similar fusion steps to CNMF, but it uses simplex identification via the split augmented Lagrangian (SISAL) [44] to initialize the endmembers. It implements sparse unmixing by variable splitting and the augmented Lagrangian [45] to initialize the abundance matrix and adopts the projection gradient algorithm to update the endmember and abundance matrices of HS and MS images.
Deep learning methods obtain the fused images through a neural network framework. The trained framework includes high-resolution HS (HRHS), high-resolution MS (HRMS), and low spatial resolution HS (LRHS). We implement one typical deep two-branch convolutional neural network (two-CNN-Fu). The formula is as follows: where C l+1 V lr n , P hr n is the output of the (l + 1)-th layer, C hsi l V lr n and C msi l P hr n are the extracted features of HS and MS data, respectively, G l+1 is the weight matrix, and q l+1 is the bias of the fully connected (FC) layers. ⊕ represents the operation of concatenating HS features and the MS features. The last FC layer is the spectrum of the fused image.

Color Mapping-Based Methods: HCM
Color mapping-based methods enhance the spatial resolution of HS images by color mapping. The conversion coefficient between data is calculated and the MS data is used for inverse transformation to realize image fusion; the typical method we used is HCM [46]. The HCM obtains the transformation matrix by using the linear relationship between downsampled MS images and HS images, and utilizes the MS images and the transformation matrix to obtain the fused images. HCM adds some bands of HS images into MS images and enhances the correlations between the resolution-reduced MS images and HS images. In addition, it adds a white band to compensate for the atmospheric effect and other bias effects. The HCM is defined as where HS 1,···θ represents the added bands of hyperspectral data, Band white is the white band, T is the transform coefficient, and the operation * represents the transposition operation of the matrix. T * is defined as follows: where ||·|| F is the Frobenius norm, and λ is a regularization parameter. The optimal T * is where I is an identity matrix with the same dimension as MS × MS T .

Parameter Settings
In the Appendix A, Table A1 lists the main parameters of specific fusion methods. For SFIM and MTF-GLP, a synthetic image from MS images by regression coefficients is required to calculate the detailed images. In these algorithms, the bands of HS data are the independent variables and the bands of MS data are the dependent variables. Then, the least square method calculates the regression coefficient matrix between the independent variables and the dependent variables by tge loss function. The synthetic image is obtained by HS data multiplied by the coefficient matrix. Figure 3 shows the specific use of the least square method. In CNMF, the virtual dimensionality (VD) algorithm [47] analyzes the typical eigenvalue of HS data by PCA and estimates the number of spectrally distinct signal sources in the data. The number of spectrally distinct signal sources is the initial number of endmembers (K). Based on the Neyman-Pearson hypothesis test, the VD algorithm is as follows: where k is the number of endmembers, θ represents the output of the identification algorithm, and ϕ 1 and ϕ 0 are the alternate hypothesis and the null hypothesis, respectively. µ is a threshold that separates the two hypotheses, ϑ 0 represents the desired false alarm density, and P k (x) represents the probability density function for k endmembers. The Hysime algorithm [48] evaluates the correlation matrix between signal and noise in HS data. The eigenvector quantum set of the correlation matrix is calculated, which stands for the signal subspace. The subspace of HS data is obtained by minimizing the sum of the noise power and projection error power, which are the decreasing and increasing functions of the subspace dimension, respectively. When the dimension of the subspace of HS data is overestimated, the noise power term is dominant; otherwise, the projection error power term is dominant. The Hysime algorithm is used to obtain the subspace consisting of two sequential stages: the noise estimation and the signal subspace Remote Sens. 2020, 12, 882 12 of 26 estimation. For the LANARAS, parameter settings in the original research [18] were firstly verified, and the fine-tuning of the endmembers (k) was then manually performed. Using the same method as LANARAS, we manually set the numbers of subspaces (S), the endmembers (K) and the number of mixture classes (m) for MAP-SMM. There are three important parameters in HCM, including the number of extracted bands from HS image (B), the patch size (T), and the regularization parameter (Z). Using cross-validation, we manually set the B as 280 and Z as 0.01 to offer the best fusion effect. A small pitch size T causes a poor visualization effect of the fused images, and we accordingly chose the minimum number of columns and row sizes as the proper value. coefficient matrix between the independent variables and the dependent variables by tge loss function. The synthetic image is obtained by HS data multiplied by the coefficient matrix. Figure 3 shows the specific use of the least square method. In CNMF, the virtual dimensionality (VD) algorithm [48] analyzes the typical eigenvalue of HS data by PCA and estimates the number of spectrally distinct signal sources in the data. The number of spectrally distinct signal sources is the initial number of endmembers (K). Based on the Neyman-Pearson hypothesis test, the VD algorithm is as follows: where k is the number of endmembers, represents the output of the identification algorithm, and The Hysime algorithm [49] evaluates the correlation matrix between signal and noise in HS data. The eigenvector quantum set of the correlation matrix is calculated, which stands for the signal Figure 3. The use of the least square method to calculate the regression coefficient matrix between the independent variables and the dependent variables.

Comprehensive Evaluation Measures
We implement evaluation measures considering three different aspects: quantitative evaluation, application evaluation, and computation efficiency measures.

Spectral Evaluation Measures
Quantitative measures assess both the spectral and spatial distortions of fused HS images. The spectral angle measure (SAM) [49], erreur relative globale adimensionnelle de synthèse (ERGAS) [50], and peak signal-to-noise ratio (PSNR) [51] were used to quantify the spectral distortions between fused HS images and reference images. The reference image does not exist in the real experiments, and the resampled GF-5 images are used as reference images to evaluate the spectral distortions. To resample the GF-5 images, a cubic convolution interpolation algorithm was implemented (see Section 5 for details on the advantages and disadvantages of resampling algorithms).
SAM calculates the spectral angle between corresponding pixels of reference and fused images, which is defined as where z is the reference image,ẑ is the fused image, and z j ∈ R m×1 andẑ j ∈ R m×1 represent the spectral signatures of the jth pixel in the reference image and the fused images, respectively. A larger SAM means a more severe spectral distortion of the fused images. When the SAM equals 0, the fused images have the smallest spectral distortion.
Remote Sens. 2020, 12, 882 13 of 26 ERGAS is a global index for measuring spectral distortion, which is defined as where z i ∈ R n×1 andẑ i ∈ R n×1 represent the i-th band of the reference image and fused images, respectively. p is the ratio of the spatial resolution between MS and HS images, and n is the number of pixels in the images. A larger ERGAS brings about more spectral distortion. When ERGAS is equal to 0, the spectral distortion is the smallest. PSNR is the ratio of the maximum power of a signal to the noise power that affects its representational accuracy. It evaluates the reconstruction error of fused images, which is defined as where max(z i ) is the maximum value in the ith band of the reference image, in which a higher PSNR means a better result.

Spatial Evaluation Measures
Meanwhile, the high-frequency correlation coefficient (HCC) was utilized to measure spatial distortion. We adopt edge detection technology to extract edge information from MS images and fused images, respectively. Taking the edge detection results of MS images as the reference images, the correlation coefficients between the detection maps of fused images and MS images are calculated. We use the Sobel operator to extract the edge information. The HCC is defined as where A and B are the reference edge images and the evaluated edge images, respectively; A i and B i are the samples of A and B; e is the total number of the samples; andĀ andB are the means of A and B.
The ideal value of HCC is 1.

Classification Evaluation Measures
The overall accuracy (OA) [52] and Kappa coefficient (KC) [53] are used to quantify the classification accuracy of fused images and evaluate the application performance. OA is a commonly used indicator for evaluating the behaviors of image classification, which is defined as where w is the number of classes, u represents the total number of samples, and X ii represents the observation in row i and column i. A higher OA means a better classification result. KC is another reliable indicator for the accuracy evaluation of image classification. The formula is defined as follows: where X i+ and X +i represent the marginal total in row i and column i, respectively. The range of KC is between 0 and 1, and a larger KC means higher classification accuracy.

Computational Efficiency Measures
The running time is recorded to evaluate the computational efficiency of different fusion methods. All the fusion methods are implemented in MATLAB 2016a, and their codes are run on a WIN10 computer with Intel Core i7 processor and 64 GB RAM. Table 7 describes the quantitative evaluation results of fused images from GF-5 and GF-1 data. The bold type represents the best result, and the second-best result is underlined. Figure 4 shows the fused image of nine methods for the Taihu lake-1 dataset. The observations in Table 7 and Figure 4 show the robustness of GSA, CNMF, LANARAS, and FUSE in terms of spatial fidelity. The reason for this is that GSA injects spatial details of MS data into HS data via component substitution and has an ideal transformation coefficient to maintain the spectral separation of HS data. The two unmixing-based methods and FUSE formulate the relationships between HS data, MS data, and fused images, and implement the minimum loss function to reduce the approximation error while eliminating the effects of noise. It is interesting that FUSE has good spectral fidelity in the Poyang Lake-1 dataset, whereas it suffers serious spectral distortion in the Taihu Lake-1 area and Taihu Lake-2 area. Our assumed reason for this is that the Hysime is unsuitable for identifying the subspaces of data in complicated building areas. MTF-GLP, SFIM and MAP-SMM have good spectral fidelity, but SFIM and MAP-SMM have severe spatial distortion. MRA-based methods obtain detailed information by filtering MS data in the fusion process. They inject detailed information into HS data with the gain coefficient, and the spectral information of HS data is not modified in the fused images. In contrast, the gain coefficient of SFIM is not as ideal as that of MTF-GLP, resulting in an insufficient injection of spatial details in SFIM. The number of endmembers in MAP-SMM affects the behaviors of SMM and accordingly limits the enhancement of spatial resolution of HS data [18]. Table 7. Evaluation results of different fused methods on GF-5 and GF-1 data in terms of spectral (spectral angle measure (SAM), erreur relative globale adimensionnelle de synthèse (ERGAS), peak signal to noise ratio (PSNR)), spatial (high-frequency correlation coefficient (HCC)), and computational efficiency (Time) measures. (The double underline value is the best accuracy in each case, followed by single underline).   HCM has good visualization behaviors but has severe spatial distortion and regular spectral fidelity. It adds some bands of HS data into MS data, and the spatial resolutions of these added bands are not well enhanced by color mapping. The fused images accordingly preserve all spectral information of MS data but only part of the spectral information of HS data. In two-CNN-Fu, due to the lack of HRHS, the spectral response function was used to downsample the HS as HRMS, the Gauss fuzzy kernel and point distribution function were used to calculate HS to get LRHS, and the original HS is used as HRHS. In the experiment, the neural network framework was trained by using the simulation data set, which made the trained neural network framework have great defects, and the image with the worst fusion effect was obtained. Figure 5 shows an example of the classification results, and Figure 6 plots the classification accuracy of all fused images on the three datasets (see Table 4 for the categories of objects). The KC and OA of fused images are higher than those of HS and MS data. All fused images obtain similar and good classification behaviors in Poyang Lake-1, and their classification performances diverge greatly for the Taihu Lake-1 and Lake-2 datasets. We guessed that their divergent performances result from different land covers in the three datasets. The building areas in Taihu Lake-1 and Lake-2 have complicated spectral information and rich spatial detail, which demonstrates the behaviors of different fusion methods. GSA, FUSE and LANARAS have higher classification accuracies, SFIM, MAP-SMM, HCM and two-CNN-Fu have the worst KC and OA, and MTF-GLP and CNMF are unstable.

Datasets
greatly for the Taihu Lake-1 and Lake-2 datasets. We guessed that their divergent performances result from different land covers in the three datasets. The building areas in Taihu Lake-1 and Lake-2 have complicated spectral information and rich spatial detail, which demonstrates the behaviors of different fusion methods. GSA, FUSE and LANARAS have higher classification accuracies, SFIM, MAP-SMM, HCM and two-CNN-Fu have the worst KC and OA, and MTF-GLP and CNMF are unstable.

Fusion Results of GF-5 and GF-2
As shown in Table 8 and Figure 7, GSA, the unmixing-based methods-FUSE and MTF-GLPshow robustness in terms of spatial fidelity when fusing GF-5 data with GF-2 data (see the detail on the Taihu Lake-3 dataset in Figures 7 and 8). MAP-SMM, MTF-GLP, and SFIM exhibit less spectral distortion, while HCM and two-CNN-Fu has severe spatial distortions and regular spectral fidelity. However, compared with those of GF-5 data and GF-1 data, the SAMs of all fused images increase. A larger ratio of spatial resolution between GF-5 and GF-2 data requires more pixels of GF-2 data to be obtained to perform the image interpolation in the fusion process. That would involve more heterogenous pixels and then cause the spectral distortion of HS data. Except for two-CNN-Fu, all fusion methods have good spectral fidelity in the Taihu Lake-4 and Lake-5 areas. Except for CNMF, MAP-SMM, MTF-GLP and SFIM, the other five methods have serious spectral distortions in the Taihu Lake-3 area. The spectral fidelity of CNMF, MAP-SMM, MTF-GLP, and SFIM is the most robust. In terms of the spatial fidelity of GSA, the unmixing-based methods FUSE and MTF-GLP are more stable than others.

Fusion Results of GF-5 and GF-2
As shown in Table 8 and Figure 7, GSA, the unmixing-based methods-FUSE and MTF-GLP-show robustness in terms of spatial fidelity when fusing GF-5 data with GF-2 data (see the detail on the Taihu Lake-3 dataset in Figures 7 and 8). MAP-SMM, MTF-GLP, and SFIM exhibit less spectral distortion, while HCM and two-CNN-Fu has severe spatial distortions and regular spectral fidelity. However, compared with those of GF-5 data and GF-1 data, the SAMs of all fused images increase. A larger ratio of spatial resolution between GF-5 and GF-2 data requires more pixels of GF-2 data to be obtained to perform the image interpolation in the fusion process. That would involve more heterogenous pixels and then cause the spectral distortion of HS data. Except for two-CNN-Fu, all fusion methods have good spectral fidelity in the Taihu Lake-4 and Lake-5 areas. Except for CNMF, MAP-SMM, MTF-GLP and SFIM, the other five methods have serious spectral distortions in the Taihu Lake-3 area. The spectral fidelity of CNMF, MAP-SMM, MTF-GLP, and SFIM is the most robust. In terms of the spatial fidelity of GSA, the unmixing-based methods FUSE and MTF-GLP are more stable than others.
heterogenous pixels and then cause the spectral distortion of HS data. Except for two-CNN-Fu, all fusion methods have good spectral fidelity in the Taihu Lake-4 and Lake-5 areas. Except for CNMF, MAP-SMM, MTF-GLP and SFIM, the other five methods have serious spectral distortions in the Taihu Lake-3 area. The spectral fidelity of CNMF, MAP-SMM, MTF-GLP, and SFIM is the most robust. In terms of the spatial fidelity of GSA, the unmixing-based methods FUSE and MTF-GLP are more stable than others.   An example of classification results is shown in Figure 8, and Figure 9 plots the classification accuracy of GF-5 data, GF-2 data and all fused images (see Table 5 for the categories of objects). Similar to Figure 4, the fused images show better behaviors than the original HS data and MS data.  Table 9 shows the evaluation results and fused images of all nine methods for GF-5 and S2A data. Compared with GF-2 and GF-1 data, the divergences of SAM from the nine fused images are smaller. The explanation for that is that GF-5 and S2A data have a smaller ratio of spatial resolutions. GSA and unmixing-based methods show more robustness in the spatial fidelity. Table 9. Evaluation results of different fused methods on GF-5 and S2A data in terms of spectral (SAM, ERGAS, PSNR), spatial (HCC), and computational efficiency (Time) measures. (The single underline value is the best accuracy in each case, followed by double underline.)   Table 9 shows the evaluation results and fused images of all nine methods for GF-5 and S2A data. Compared with GF-2 and GF-1 data, the divergences of SAM from the nine fused images are smaller. The explanation for that is that GF-5 and S2A data have a smaller ratio of spatial resolutions. GSA and unmixing-based methods show more robustness in the spatial fidelity. Table 9. Evaluation results of different fused methods on GF-5 and S2A data in terms of spectral (SAM, ERGAS, PSNR), spatial (HCC), and computational efficiency (Time) measures. (The double underline value is the best accuracy in each case, followed by single underline). In the Yellow River Estuary area and Poyang Lake-2 area, FUSE has severe spatial distortion, but the enhancement of spatial resolution is obvious in the Taihu Lake-6 area (for detail, see Figure 10). Two-CNN-Fu has the worst fidelity in both spatial and spectral aspects. HCM could preserve the spectral information of MS data and part of the spectral information of HS data during the fusion process. It achieves the best spectral fidelity in Yellow River Estuary, but has relatively regular performance in terms of spectral fidelity on the Poyang Lake-2 and Taihu Lake-6 datasets. The reason for this is that different time lags in collection data make the spectral divergence of GF-5 and S2A in the Yellow River Estuary smaller than those in Poyang Lake-2 and Taihu Lake-6. An example of classification results is shown in Figure 11, and Figure 12 plots the OA and KC of GF-5, S2A data and all fused images (see Table 6 for the categories of objects). The classification results of the fused images are superior to the original HS data and MS data. GSA, SFIM, MAP-SMM and two-CNN-Fu have the best classification performance, with the highest OA and KC. The classification accuracies of MTF-GLP, FUSE, CNMF, LANARAS and HCM are unstable, and these methods perform better for Taihu Lake-6 than in the Yellow River Estuary and Poyang Lake-2 areas.   An example of classification results is shown in Figure 11, and Figure 12 plots the OA and KC of GF-5, S2A data and all fused images (see Table 6 for the categories of objects). The classification results of the fused images are superior to the original HS data and MS data. GSA, SFIM, MAP-SMM and two-CNN-Fu have the best classification performance, with the highest OA and KC. The classification accuracies of MTF-GLP, FUSE, CNMF, LANARAS and HCM are unstable, and these methods perform better for Taihu Lake-6 than in the Yellow River Estuary and Poyang Lake-2 areas.

Discussion
In this paper, we propose a comprehensive evaluation framework, which is very effective in comparing the current fusion methods. Although our predecessors have undertaken several corresponding works-for example, in 2014, Vivone et al. conducted a comparative analysis on the methods of MS pansharpening-they only used quantitative indicators to evaluate various methods [17]. Loncan et al. conducted a comparative analysis of the HS pansharpening methods in 2015 and conducted a quantitative evaluation of several methods [20], and the computational efficiency of various methods was also evaluated. In 2017, Yokoya et al. evaluated various HS and MS fusion approaches from a quantitative and applied perspective [18]. However, our evaluation framework adds HCC evaluation indicators, which can effectively evaluate the spatial distortion of various methods, which was not the case in previous literature. Our framework simultaneously evaluates spectral distortion, spatial distortion, application performance, and computational efficiency. In addition, previous papers used simulation datasets to evaluate various methods, but we used real datasets to verify the performance of different methods, which led to a more objective evaluation result.
In this experiment, real datasets were used for fusion, and resampled HS data were used as reference data to evaluate the fused data. Therefore, we summarized and analyzed the existing image interpolation methods. The current image interpolation methods mainly include nearest neighbor interpolation [54], bilinear interpolation [55], and cubic convolution interpolation [56]. The advantage of the nearest neighbor interpolation method is that the calculation is very small and the operation speed is fast. However, it only uses the gray value of the pixel closest to the sampling point to be tested as the gray value of the sampling point, without considering the influence of other neighboring pixel points. Therefore, after resampling, the gray value has obvious discontinuity, the image quality loss is large, and there will be obvious mosaic and saw tooth phenomena [57]. Bilinear interpolation is better than nearest neighbor interpolation, but the calculation is a little large, the algorithm is more complex, and the computational time of the program is a little long. The image quality after zooming is high, which basically overcomes the feature of the discontinuous gray value of nearest neighbor interpolation. The reason for this is that it considers the influence of four direct neighbors around the sampling point to be tested on the correlation of the sampling point. However, this method only considers the influence of the gray values of the four direct neighboring points around the testing sample, and it does not include the influence of the change rate of the gray value between the neighboring points. Accordingly, it has the property of a low-pass filter, which leads to the loss of the high-frequency components of the image after scaling, and the image edge becomes fuzzier to a certain extent. Compared with the input image, the output image after zooming by this method still has the problems of image quality damage and low calculation accuracy due to the poor consideration of interpolation function design [58]. Cubic convolution interpolation is the most widely used algorithm. It not only considers the influence of the gray values of the four neighboring pixels but also considers the influence of the change rate of their gray values. Therefore, it overcomes the shortcomings of the former two methods, with generally smoother edges than bilinear interpolation and high calculation accuracy [59].
Besides this, the performance of each fusion method was discussed comprehensively when fusing GF-5 data with GF-1, GF-2 and S2A data, respectively. In Table 10, more points indicate better fusion performance. By adding the scores of each method in SAM, HCC and KC, this comprehensive score result in overall terms is obtained.
When fusing GF-5 with three typical MS data sources, the spatial resolution enhancement of GSA is obvious when replacing intensity components of HS data, and spectral distortion is also caused during the image fusion process. MAP-SMM, MTF-GLP, and SFIM have better performance in terms of spectral fidelity, and MTF-GLP behaves well when enhancing the spatial resolution. However, MAP-SMM is sensitive to the number of endmembers, and SFIM is affected by the gain coefficient, which limits their performance in image fusion. HCM simultaneously preserves the spectral information of MS data and HS data by color mapping. However, the added bands have poor behaviors in spatial resolution enhancement, and that causes HCM to have clear defects. Two-CNN-Fu has the worst spectral and spatial fidelity for all datasets. Table 10. Comprehensive evaluation of all methods for fusing GF-5 with GF-1, GF-2 and S2A data on the basis of spectral (SAM), spatial (HCC), classification (Kappa coefficient (KC)), and all-inclusive (overall) features.

Fusion
The classification performance of all fused images exceeds those of HS and MS data. GSA, MTF-GLP, SFIM, and two-CNN-Fu achieve competitive classification results for some fused data. CS-based methods have serious spectral distortion, but the replacement of components enlarges the spectral divergences of different ground objects and then improves the classification accuracy. MTF-GLP and SFIM could enhance the spatial resolutions of GF-5 data, and their excellent spectral fidelity benefits the classification accuracies of fused images. Subspace-based methods have different fusion behaviors for various datasets, mainly because different ratios of spatial resolution affect the fusion results and classification accuracies. LANARAS shows a competitive classification result when GF-5 is fused with GF-1 data. FUSE performs the best when GF-5 is fused with GF-2 data. MAP-SMM has a higher classification accuracy when GF-5 is fused with S2A data. Two-CNN-Fu exhibits unstable performance of classification for different datasets. HCM exhibits poor classification results for all the datasets. The reason for this is that it has limitations in the enhancement of the spatial resolution and regular spectral fidelity of GF-5 data. The experimental results also show that the classification behaviors do not fully depend on the spectral and spatial fidelity of fused images, and GSA is a typical example. Tables 7-9 illustrate that two-CNN-Fu, MAP-SMM and LANARAS have the lowest computational speeds, while SFIM and HCM have the shortest computational time.
From the above comprehensive evaluation, when GF-5 is fused with GF-1 data, GSA and LANARAS have the best spatial fidelity, and LANARAS, GSA, FUSE show the best classification behaviors. The overall performances of GSA and MTF-GLP are the best. For the fusion of GF-5 and GF-2 data, GSA, MTF-GLP, and FUSE have the best enhancement performance in terms of spatial resolution, and the best classification results are obtained by MTF-GLP, GSA, and FUSE. MTF-GLP and GSA show the best overall behaviors. When GF-5 is fused with S2A data, GSA, LANARAS, and CNMF perform best, and HCC, GSA, SFIM, and MAP-SMM obtain good classification results. The overall performances of GSA and SFIM are the best of all.

Conclusions
GF-5 data provides 330 spectral bands with a spatial resolution of 30 m, which presents great advantages over on-orbit or planned spaceborne HS sensors. By means of fusion, the spatial resolution of GF-5 data can be improved and further applied to the high-resolution mapping of urban surface materials, minerals, plant species and so on. This paper investigates the performance of nine fusion methods in fusing GF-5 with GF-1, GF-2, and S2A data, respectively. A set of comprehensive measures including quantitative spectral and spatial evaluation (SAM, ERGAS, PSNR, and HCC), classification accuracy (OA and KC), and computation time (time) are employed to evaluate the behaviors of each method for different datasets. The experimental results show that the fused images are more advantageous than the original data, and the various methods behave divergently for different datasets. GSA, MTF-GLP, and SFIM are more competitive than others when GF-5 data are fused with GF-1, GF-2 and S2A data. Subspace-based methods have lower robustness in terms of spectral fidelity and slower computational speeds than GSA MTF-GLP and SFIM, but they have good spatial fidelity. HCM and two-CNN-Fu have drawbacks and relatively poor fusion results. Therefore, LANARAS, GSA and MTF-GLP are recommended when fusing GF-5 with GF-1 data, MTF-GLP and GSA are recommended when fusing GF-5 with GF-2 data, and GSA and SFIM are recommended when fusing GF-5 with S2A data.
Author Contributions: K.R. and W.S. analyzed the data, performed the experiments, and wrote the draft of the manuscript; X.M. designed the framework of this study, gave comments, and significantly revised the manuscript; G.Y. and Q.D. gave comments. All authors have read and agreed to the published version of the manuscript.