A Novel Image Fusion Method of Multi-Spectral and SAR Images for Land Cover Classiﬁcation

Abstract: The fusion of multi-spectral and synthetic aperture radar (SAR) images can retain the advantages of each data source, thereby benefiting accurate land cover classification. However, some current image fusion methods face the challenge of producing unexpected noise. To overcome this problem, this paper proposes a novel fusion method based on the weighted median filter and the Gram–Schmidt transform. In the proposed method, the Sentinel-2A images and GF-3 images are first subjected to different preprocessing steps. Since the weighted median filter does not strongly blur edges while filtering, it is applied to the Sentinel-2A images to reduce noise. The processed Sentinel images are then fused with the GF-3 images by the Gram–Schmidt transform. Two popular methods, principal component analysis and the traditional Gram–Schmidt transform, are used as comparison methods in the experiment. In addition, random forest, a powerful ensemble model, is adopted as the land cover classifier owing to its fast training speed and excellent classification performance. The overall accuracy, Kappa coefficient and classification map of the random forest are used as the evaluation criteria of the fusion method. Experiments conducted on five datasets demonstrate the superiority of the proposed method in both objective metrics and visual impression. The experimental results indicate that the proposed method can improve the overall accuracy by up to 5% compared with using the original Sentinel-2A images, and has the potential to improve satellite-based land cover classification accuracy.


Introduction
The application of remote sensing satellite images to land cover classification has received widespread attention [1]. Satellite images, with high spatial resolution and rich spectral information, can play a major role in improving classification accuracy [2]. However, due to limitations of sensor storage space and communication bandwidth, it is difficult for satellite data to offer both high spatial resolution and high spectral resolution [3]. Sentinel-2 [4], the multi-spectral (MS) imaging mission of the European Space Agency (ESA), provides relatively rich spectral information [5]. However, the maximum resolution of Sentinel-2 is 10 m, which makes it difficult to meet requirements for ultra-high resolution. Synthetic aperture radar (SAR) sensors can image the Earth day and night and in all weather conditions, and therefore play an increasingly important role in land cover classification [6]. SAR images provide richer surface information due to their imaging characteristics, which can compensate for the inability of other sensors to obtain sufficient information. For example, GF-3, a new high-resolution C-band fully polarimetric SAR satellite, has attracted increasing attention [7]. Images acquired by GF-3 have been widely used and show excellent performance in many fields, such as target recognition and land cover classification [8][9][10]. However, this kind of data is easily affected by multiplicative noise. The noise makes SAR images appear grainy and makes SAR interpretation a challenging task [11]. GF-3 actively emits electromagnetic radiation, while Sentinel-2 obtains information by passively measuring electromagnetic radiation. Hence, the fusion of these two types of data may provide complementary information about the imaged area [12].
Data fusion is an effective way of synergistically combining information from various sources to better understand a given scene [13]. The fusion of MS and SAR data can sharpen the details of low-resolution data while providing complementary data observed from the same site [14]. In other words, the fusion of GF-3 and Sentinel-2 can not only help to generate images with rich spatial and spectral information but also contribute to better understand and explain the image area. In addition, a good fusion algorithm needs to ensure that the color content of the original MS image is not distorted while improving the spatial details [15]. With respect to the ordinary pan-sharpening, the fusion of the optical and SAR image is more difficult because the gray values of SAR image are not relevant to those of multi-spectral images [16].
Some researchers have proposed effective algorithms to address the fusion of SAR and optical images. For example, Montgomery et al. use multi-temporal and multi-modal data to develop a classification method based on decision-making and data fusion and apply it to wetland classification [17]. Iervolino et al. propose a multi-sensor data fusion method based on the generalized intensity-hue-saturation transform and the wavelet transform [18]. The above methods can remove some redundant information and achieve certain fusion results; however, noise components remain on the image surface after fusion [19]. In recent years, the fusion of GF-3 and Sentinel-2 has attracted the attention of some scholars. For example, Wang et al. realize polarization mode information fusion based on the curvelet transform [9]. In [20], a full feature fusion method is proposed to classify crops. However, these studies only use part of the Sentinel bands, not all of them. Consequently, a fusion algorithm for these two kinds of data is urgently needed.
Land cover classification is a vital assessment method to evaluate the performance of image fusion algorithms. For example, the maximum likelihood classifier is adopted to classify images before and after fusion in [18]. Experimental results show that the rich details of SAR texture can improve classification performance compared with MS images. In [21], authors integrate the multi-scale texture features of the image into the random forest (RF) classifier. Zhang et al. apply object-oriented multi-scale segmentation to the image after feature fusion and implement an improved support vector machine (SVM) model for classification [22]. According to algorithm comparison, RF is a powerful and highly recognized ensemble classification method [23,24]. In this paper, RF is selected to evaluate the fusion effect of the image.
Moreover, land cover is an important area of interest in Earth observation (EO) [25]. In recent years, there have been many works on EO data, and these works have proposed fusion algorithms based on multi-source images to obtain robust performance. For example, Rasaei et al. use the Bayesian method to fuse soil data from traditional soil surveys with direct soil information from remote sensing images [26]. In [27], authors merge information of different spectral indices with feature-level fusion technology to extract buildings from high-resolution satellites. Nowadays, powerful tools are still needed to process EO data and to generate EO products [25].
The objective of this paper is to propose a new algorithm to fuse GF-3 and Sentinel-2 images and apply the fusion result to land cover classification. The advantage of the algorithm lies in reducing the noise of GF-3 and keeping the spectral characteristics of Sentinel-2 during the fusion process.
The rest of this paper is organized as follows. At first, the image fusion methods and the random forest classification method are described in Section 2. Section 3 describes the studied area and the imagery being used for this study. In Section 4, the proposed method is presented. The fusion performance comparison and analysis of the experimental results obtained by the different transform fusion methods are made in Section 5. The discussion is presented in Section 6. Finally, the concluding remarks are given in Section 7.

Image Fusion Methods
The purpose of image fusion is to integrate information of different spatial and spectral resolutions, so as to generate images with more detail [28]. Image fusion methods can be roughly divided into the pixel, feature and decision levels [29]. There is information loss in the process of feature- and decision-level fusion [19]. Pixel-level fusion has higher accuracy and introduces fewer data changes; hence, these methods are more suitable for remote sensing applications [30,31].
Pixel-level fusion technology acts directly on the pixel data of multi-source images. Commonly used pixel-level methods include the intensity-hue-saturation (IHS) transform [32], the Brovey transform [33], principal component analysis (PCA) [34,35] and the Gram-Schmidt (GS) transform [36,37]. The IHS transform separates the spatial and spectral information of the original image; however, this method may suffer from serious spectral distortion [38,39]. The Brovey transform normalizes each band of the MS image and multiplies it with the high-resolution image, so as to add the required spatial information to the MS image [40]. The merged image generated by this method cannot retain the radiance of the input MS image [41]. Moreover, like the IHS transform, the Brovey transform can only convert three bands at a time. The PCA fusion method extracts the principal components of the MS image, where the first principal component contains most of the spatial information and the remaining components reflect the specific spectral details of the other bands. PCA can reduce data redundancy in the fusion process; however, it may cause the loss of important information in the original image, especially spectral information [42]. The Gram-Schmidt transform is an improved version of the PCA fusion method. Different from PCA, the components of the image generated by GS are orthogonal. Since GS applies the spectral response function to estimate the SAR data, the fusion result may be more accurate.
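As a concrete illustration of the component-substitution idea behind PCA fusion, the following NumPy sketch computes the principal components of a multi-spectral stack, replaces the first component with a statistically matched high-resolution image, and inverts the transform. The function name and the simple eigen-decomposition approach are illustrative assumptions, not the exact implementation used in this paper.

```python
import numpy as np

def pca_fuse(ms, pan):
    """Illustrative PCA pan-sharpening: substitute the first principal
    component of the multi-spectral stack with the high-resolution image.
    ms: (H, W, B) multi-spectral array (already resampled to pan's grid);
    pan: (H, W) high-resolution image."""
    h, w, b = ms.shape
    X = ms.reshape(-1, b).astype(float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Principal components via eigen-decomposition of the covariance matrix
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    vecs = vecs[:, np.argsort(vals)[::-1]]   # order by descending variance
    pcs = Xc @ vecs                          # forward transform
    # Match pan's mean/std to the first PC before substitution
    p = pan.reshape(-1).astype(float)
    p = (p - p.mean()) / (p.std() + 1e-12) * pcs[:, 0].std() + pcs[:, 0].mean()
    pcs[:, 0] = p
    fused = pcs @ vecs.T + mean              # inverse transform
    return fused.reshape(h, w, b)
```

The loss of spectral information mentioned above arises exactly at the substitution step: whatever spectral variance was carried by the first component is overwritten by the panchromatic band.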
Moreover, MS and SAR fusion images have a wide range of applications in earth sciences. Ma et al. propose a SAR image change detection method based on improved wavelet fusion [43]. The experimental results show that, compared with the original SAR image, their change detection method can better reflect the real change trend and reduce the influence of speckle noise. In [44], Salentinig and Gamba jointly apply multiple pixel fusion methods to multi-scale SAR data sets.

Classification Methods
Different types of classification methods have been applied to map land cover [45]. Classification methods can be divided into unsupervised algorithms and supervised algorithms. Unsupervised algorithms mainly include typical methods such as K-means and the iterative self-organizing data analysis technique. Supervised algorithms mainly include the maximum likelihood method, the support vector machine [46], some ensemble learning algorithms, etc. However, the structure of some supervised methods (such as neural networks and support vector machines) is very complex and requires many parameters to be tuned, making them difficult to automate [47].
Ensemble models have been successfully used in remote sensing classification [48][49][50][51][52]. Random forest (RF), a typical ensemble learning method, has received widespread attention due to its excellent classification performance, resistance to overfitting and fast processing speed. Each tree in an RF is constructed using a random feature subset and bootstrap sampling with replacement. At each node, the training samples are divided into a left sub-tree and a right sub-tree using the best splitting attribute, selected from the candidate attributes according to a splitting criterion. The training dataset is split recursively until the stopping condition is reached. When the trees are balanced, RFs are very efficient, as the time complexity of classification is logarithmic in the number of nodes.
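The two randomization mechanisms described above can be sketched in a few lines of NumPy; the helper names below are hypothetical and only illustrate the bootstrap-plus-feature-subset view each tree receives, and the majority vote that combines the trees.

```python
import numpy as np

def bootstrap_and_features(X, y, n_feat_sub, rng):
    """One RF tree's training view: a bootstrap sample drawn with
    replacement plus a random feature subset (illustrative sketch)."""
    n, d = X.shape
    rows = rng.integers(0, n, size=n)          # sampling with replacement
    cols = rng.choice(d, size=n_feat_sub, replace=False)
    return X[np.ix_(rows, cols)], y[rows], cols

def majority_vote(predictions):
    """Combine per-tree class predictions (n_trees, n_samples) by vote."""
    preds = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in preds.T])
```

A full RF would grow one decision tree per bootstrap view and apply `majority_vote` to the per-tree predictions at classification time.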
RF has been widely used in land cover classification. Tian et al. apply RF to classify wetland land cover in arid areas and find that the RF classification method fused with multi-sensor data can retrieve better land cover information than other classifiers [53]. In [54], RF is used in multi-source EO datasets. The experimental result shows that RF is more stable on the multi-source EO data than the support vector machine. Wu et al. use RF in the fusion data of the GF-2 images, the normalized digital surface model derived from Lidar data and their texture features [55].

Datasets
The experimental data contain GF-3 radar images and Sentinel-2A multi-spectral images of five areas. The GF-3 images were acquired on 15 May 2018. The data (level 1A) are Horizontal transmitting and Horizontal receiving (HH) polarization images in the sliding spotlight (SL) mode with a high spatial resolution (0.99 m in azimuth and 0.56 m in range). The incidence angle is 39.27°. In order to avoid any land cover inconsistencies between the acquisition dates of the images of the same areas, Sentinel-2A data imaged on 14 May 2018 were selected. The Sentinel-2A images were acquired from the official ESA website (https://scihub.copernicus.eu/dhus/). These images (level 1C) have 13 spectral bands, including four bands at 10 m, six bands at 20 m and three bands at 60 m spatial resolution. The test site is a bounded part of Jiangsu province in China.

Data Preprocessing
The preprocessing of both the GF-3 and Sentinel-2A images is implemented first. Speckle noise is randomly distributed in the SAR image and mixed with the ground objects, reducing the effective spatial resolution of the SAR image. To reduce the noise of the GF-3 images, 1 × 1 (range × azimuth) multi-look processing is applied to all the GF-3 images. The processed data have a range resolution of 0.89 m and an azimuth resolution of 0.99 m. Then, Frost filtering (window size of 3) is adopted to further reduce the negative effect of the noise. For a better registration between the GF-3 and Sentinel-2A images, terrain radiation correction based on a digital elevation model is used to weaken the geometric deformation of GF-3 caused by the terrain. After preprocessing, the geometric resolution of GF-3 is 1.4 m. For the Sentinel-2A images, geometric correction and atmospheric correction are implemented in the preprocessing step. Since the Sentinel bands come in three different resolutions, the Sentinel Application Platform (SNAP) software is used to resample all the 20 m and 60 m bands to 10 m resolution by the nearest neighbor method. After data preprocessing, all GF-3 images and the three-band color composites of the Sentinel-2A images (bands 4, 3, 2) are shown in Figure 1.
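The nearest-neighbor resampling step can be illustrated with a one-line NumPy sketch: for an integer scale factor, nearest-neighbor upsampling on a regular grid amounts to repeating each pixel. This is only a conceptual stand-in for what SNAP's resampler does, not its implementation.

```python
import numpy as np

def nearest_resample(band, factor):
    """Upsample a coarse band to a finer grid by nearest neighbor;
    e.g. factor=2 takes a 20 m Sentinel-2 band to the 10 m grid
    (conceptual sketch of SNAP's nearest-neighbor resampling)."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)
```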

Evaluation Methods
Five statistical parameters are used to evaluate the quality of different aspects of the fused image, namely the standard deviation (SD), average gradient (AG), spatial frequency (SF), peak signal-to-noise ratio (PSNR) and correlation coefficient (CC). Moreover, three classic evaluation indicators are used to test the effect of the classifier, namely the overall accuracy (OA), Kappa statistics and McNemar's test [56]. In the following equations, F(i, j) and R(i, j) respectively represent the fused image and the reference image, M and N are the numbers of rows and columns of the whole image and (i, j) is the pixel index. $P = (p_{xy})_{c\times c}$ is the confusion matrix, where c is the number of classes and $p_{xy}$ represents the total number of pixels belonging to category x but classified into category y. The Sentinel-2A image is used as the reference image.
• Standard deviation (SD): It measures the contrast in the image. The larger the standard deviation, the more scattered the gray level distribution, which means that more information is available.

$SD = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(F(i,j)-\bar{F}\right)^{2}}$

where $\bar{F}$ is the average gray value of the fused image.
• Average gradient (AG): It reflects the image's ability to express small detail contrast and texture changes, as well as the sharpness of the image. It is desirable to have this value as high as possible.

$AG = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\sqrt{\frac{1}{2}\left[\left(\frac{\partial F(i,j)}{\partial i}\right)^{2}+\left(\frac{\partial F(i,j)}{\partial j}\right)^{2}\right]}$

• Spatial frequency (SF): It measures the overall activity level of the image. Fusion images with high spatial frequency values are preferred. SF is expressed in terms of the row frequency (RF) and column frequency (CF).

$SF = \sqrt{RF^{2}+CF^{2}}$

where RF and CF are given by

$RF = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=2}^{N}\left(F(i,j)-F(i,j-1)\right)^{2}}, \qquad CF = \sqrt{\frac{1}{MN}\sum_{i=2}^{M}\sum_{j=1}^{N}\left(F(i,j)-F(i-1,j)\right)^{2}}$

• Peak signal-to-noise ratio (PSNR): It calculates the visual error between the fused image and the reference image. A large value of PSNR indicates that the distortion of the fused image is small.

$PSNR = 10\log_{10}\frac{MAX_{F}^{2}}{MSE}$

where $MAX_{F}$ is the maximum pixel value of the image and MSE represents the mean square error between the fused image and the reference image.
• Correlation coefficient (CC): It measures the correlation between the reference image and the fused image.

$CC = \frac{\sum_{i,j}\left(F(i,j)-\bar{F}\right)\left(R(i,j)-\bar{R}\right)}{\sqrt{\sum_{i,j}\left(F(i,j)-\bar{F}\right)^{2}\,\sum_{i,j}\left(R(i,j)-\bar{R}\right)^{2}}}$

where $\bar{R}$ is the mean value of the reference image.
• Overall accuracy (OA): It represents the proportion of samples that are correctly classified among all samples.

$OA = \frac{1}{\tau}\sum_{x=1}^{c}p_{xx}$

• Kappa statistics: It is a multivariate statistical method for classification accuracy.

$Kappa = \frac{\tau\sum_{x=1}^{c}p_{xx}-\sum_{x=1}^{c}p_{x\cdot}\,p_{\cdot x}}{\tau^{2}-\sum_{x=1}^{c}p_{x\cdot}\,p_{\cdot x}}$

where τ is the total number of test samples, and $p_{x\cdot}$ and $p_{\cdot x}$ respectively represent the sum of the x-th row and the x-th column of the confusion matrix.
• McNemar's test: It is used to determine whether one learning algorithm is superior to another on a particular task, with an acceptable probability of falsely detecting a difference when none exists. Suppose there are two algorithms A and B.

$Z = \frac{e_{01}-e_{10}}{\sqrt{e_{01}+e_{10}}}$

where $e_{01}$ is the number of samples misclassified by algorithm A but not by B, and $e_{10}$ represents the number of samples misclassified by algorithm B but not by A. The p-value of the Chi-squared statistic is calculated by McNemar's test. Since a small p-value indicates that the null hypothesis is unlikely to hold, we reject the null hypothesis at the 0.05 significance level when |Z| ≥ 1.96.
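Several of the metrics above are compact enough to sketch directly in NumPy. The function names below are hypothetical helpers, and the implementations follow the standard definitions of these metrics as described in this section.

```python
import numpy as np

def psnr(fused, ref, max_val=255.0):
    """Peak signal-to-noise ratio between the fused and reference images."""
    mse = np.mean((fused.astype(float) - ref.astype(float)) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

def correlation_coefficient(fused, ref):
    """Correlation coefficient (CC) between two images."""
    f = fused.astype(float).ravel() - fused.mean()
    r = ref.astype(float).ravel() - ref.mean()
    return (f @ r) / np.sqrt((f @ f) * (r @ r))

def overall_accuracy(cm):
    """cm: c x c confusion matrix, cm[x, y] = pixels of class x labelled y."""
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Kappa coefficient from the confusion matrix."""
    tau = cm.sum()
    po = np.trace(cm) / tau                                # observed agreement
    pe = (cm.sum(axis=1) @ cm.sum(axis=0)) / tau ** 2      # chance agreement
    return (po - pe) / (1 - pe)

def mcnemar_z(e01, e10):
    """z statistic from the discordant counts; |z| >= 1.96 rejects at 0.05."""
    return (e01 - e10) / np.sqrt(e01 + e10)
```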

Parameter Setting
In order to ensure that each pixel in the high-resolution GF-3 image has a value based on the Sentinel-2 image pixels, each Sentinel-2 image is adjusted to the same size as the corresponding GF-3 image through nearest neighbor resampling. Two parameters need to be set in the RF classifier: the number of decision trees T and the number of features K considered at each split. In this study, T is set to 100 and K is the square root of the number of input features. Through visual interpretation, it is found that the classes of woodland, residential land, crops, roads and water are prominent in the study areas. Therefore, all areas are divided into these five categories through random forest classification. A total of 1000 samples are extracted from each study area through visual interpretation, with the same number of samples in each class. The training samples and test samples are randomly selected from all samples.
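The RF configuration above maps directly onto a standard library call; the snippet below uses scikit-learn purely for illustration (the paper does not state its software), with synthetic stand-in data for the per-pixel band features and the five classes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# RF configured as in this study: T = 100 trees and K = sqrt(#features)
# candidate features at each split. Data below are synthetic stand-ins:
# 1000 sample pixels, each described by 13 fused band values.
rng = np.random.default_rng(0)
X = rng.random((1000, 13))
y = rng.integers(0, 5, size=1000)   # five land cover classes

clf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                             random_state=0)
clf.fit(X, y)
pred = clf.predict(X)
```

In practice `X` would hold the fused band values of the labelled sample pixels and `y` their visually interpreted classes.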

The Proposed Method
The experiment in this paper consists of three steps, namely data preprocessing, image fusion and result evaluation. Firstly, the Sentinel-2A image and GF-3 image are respectively subjected to different preprocessing processes. Secondly, the proposed fusion algorithm is used in the preprocessed image. Meanwhile, PCA and GS are used in the comparison. Finally, the quality assessment function and random forest classification are used to evaluate the fusion image. The flowchart of the experiment is shown in Figure 2.


Weighted Median Filter
Weighted median filter (WMF) mathematically corresponds to global optimization. It can effectively filter images while not strongly blurring edges [57]. It is an operator that replaces the current pixel with the weighted median of the neighboring pixels within a local window. When processing pixel p in image I, only pixels within the local window R(p) of radius r centered at p are considered. For each pixel q ∈ R(p), WMF associates it with a weight $\omega_{pq}$ in the corresponding feature map f, i.e.,

$\omega_{pq} = g(f(p), f(q))$

where f(p) and f(q) are the features at pixels p and q, and g is a typical influence function between neighboring pixels.
In [57], a joint-histogram, a two-dimensional histogram structure for storing pixel counts, is used to regenerate the weights every time the window shifts. Each pixel q in the filter is associated with its value I(q) and feature f(q). Suppose there are $N_I$ different pixel values in total and denote the pixel value index of I(q) as i, after listing all possible pixel values in ascending order as $L = \{I_0, I_1, \cdots, I_{N_I-1}\}$. Similarly, suppose there are $N_f$ different features and denote the feature index of f(q) as f, after ordering all features as $F = \{f_0, f_1, \cdots, f_{N_f-1}\}$.
In the joint-histogram H, pixel q is put into the histogram bin H(i, f) in the f-th row and the i-th column:

$H(i, f) = \#\{\, q \in R(p) : I(q) = I_i,\; f(q) = f_f \,\}$

where # counts the number of elements. For any pixel belonging to bin H(i, f), given the filter center p with feature index f*, its weight is $g(f_f, f_{f^*})$. All weights can therefore be obtained by traversing the joint-histogram. The total weight of all pixels with value index i is

$\omega_i = \sum_{f=0}^{N_f-1} H(i, f)\, g(f_f, f_{f^*})$

Once every i has its weight $\omega_i$, the weighted median can be found. By combining the joint-histogram with a sliding-window strategy, the input images are processed in scan-line order. In a first pass, the joint-histogram is traversed to compute the weights $\{\omega_k\}_{k=0}^{N_I-1}$, which are summed to obtain the total weight $\omega_t$. In a second pass, the weights are accumulated until the partial sum first reaches half of $\omega_t$, and the corresponding pixel value is output. Initially, at the first position of the image, the joint-histogram counts the pixels whose index pair (i, f) falls within the filter window, and each count is placed in the corresponding cell.
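The following brute-force sketch shows what a weighted median filter computes, omitting the joint-histogram acceleration of [57]. The Gaussian affinity used for g and the function signature are illustrative assumptions.

```python
import numpy as np

def weighted_median_filter(img, feat, r, sigma=0.1):
    """Brute-force WMF sketch: each output pixel is the weighted median of
    its (2r+1)^2 neighbourhood, with weights w_pq = g(f(p), f(q)) taken
    here as a Gaussian affinity on the feature map (assumed form).
    The joint-histogram acceleration of [57] is deliberately omitted."""
    h, w = img.shape
    out = np.empty_like(img, dtype=float)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            vals = img[y0:y1, x0:x1].ravel().astype(float)
            f = feat[y0:y1, x0:x1].ravel().astype(float)
            wts = np.exp(-(f - feat[y, x]) ** 2 / (2 * sigma ** 2))
            order = np.argsort(vals)
            csum = np.cumsum(wts[order])
            k = np.searchsorted(csum, csum[-1] / 2.0)  # weighted median index
            out[y, x] = vals[order][k]
    return out
```

With a constant feature map all weights are equal and the filter reduces to the plain median, which removes isolated spikes while leaving step edges in place.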

Gram-Schmidt Transform (GS)
Gram-Schmidt (GS) transform is one of the pan-sharpening methods for multi-spectral images. Firstly, to simplify the explanation, two assumptions are made. The first concerns the mean of the intensity at each wavelength: it is subtracted from both the GF-3 and the Sentinel-2A image before any processing,

$\mu_k = \frac{1}{N_I N_J}\sum_{i,j} I_{i,j,k}$

where k indicates the wavelength of which $I_{i,j,k}$ is the intensity; i, j are the pixel coordinates, and the summation over i, j covers the whole image; $N_I \times N_J$ is the size of image I. After all processing, these means are added back.
The second assumption states that describing pixel locations using lines and columns is not an issue here. A raster scanning order is chosen, and each image is considered as a column vector whose nth component is the nth image intensity read when traversing the image along the chosen raster scanning order. Hence, images are represented as a matrix, where each column holds the intensities of one wavelength and each row holds the intensities of one pixel; here $N_J$ is the number of columns of the image I, and I denotes the matrix associated with I.
The following notations will be used: B denotes the vectorized GF-3 image, F the matrix whose columns $F_k$ are the Sentinel-2A wavelength images, and F the number of wavelengths.
Using these notations, a weighted combination of the wavelength images can be written as $FW^T$, where W is a row vector of size F whose components are the weights $w_k$, and T stands for the action of transposing a matrix. A vector v is projected onto the line containing a vector u as

$P_u(v) = \frac{v \cdot u}{u \cdot u}\, u$

where · is the dot product, defined as the sum of the products of corresponding components. As $F_k$ is a column of F, it is a vector, and it can be used to define a projection operator that is applied to B.
The GF-3 image is a SAR image that does not use any wavelength within or near the visible band. From a statistical viewpoint, it makes sense to compute a panchromatic image whose pixel values are linear combinations of GF-3 pixel values, where the weights take into account how each wavelength intensity is statistically correlated with a Sentinel-2A panchromatic intensity. This computation using one weight is equivalent to a projection onto a wavelength-specific image, and the whole computation is equivalent to adding the projections onto all F wavelengths.
$\hat{B} = \sum_{k=1}^{F} P_{F_k}(B) = F\,\Sigma_D^{-1}F^{T}B$

where $\Sigma_D$ is a diagonal F × F matrix whose diagonal components are $F_k^T F_k$. Let $\hat{W}$ denote the row vector of size F equal to $(\Sigma_D^{-1} F^T B)^T$. The panchromatic GF-3 image is then defined as $\hat{B} = F\hat{W}^T$. By replacing the values of the first wavelength with those of $\hat{B}$, a modified GF-3 image is obtained.
In fact, the corresponding substitution matrix is an identity matrix except for its first row, which equals $\hat{W}$. Gram-Schmidt is an algorithm performing a QR decomposition; it is applied to the variance-covariance matrix, $\Sigma = QR$, with the following properties: Q is an orthogonal matrix ($Q^T Q = 1_F$), and R is an upper triangular matrix whose inverse is also upper triangular.
The R matrix yielded by the QR decomposition of the variance-covariance matrix, when applied to F, yields a new image $S = FR^{-1}$.
An important property of the QR decomposition here is that the first column of S is proportional to that of F and hence can still be interpreted as a panchromatic wavelength. This stems from the fact that $R^{-1}$ is an upper triangular matrix having only one non-zero component in its first column.
The first column of S having this panchromatic interpretation is now replaced with the true panchromatic image B. All other columns remain unchanged.
Using the R matrix, the modified GF-3 image is obtained. The first column of the image has this SAR interpretation, while the other columns have the classical multi-spectral interpretation.
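A compact NumPy sketch of GS-style component substitution follows the derivation above: orthogonalize a stack led by a simulated panchromatic band via QR, swap the first (panchromatic) component for the statistically matched true high-resolution band, and invert. The band-mean simulated pan, the function name and the matching step are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def gs_fuse(ms, pan):
    """GS-substitution sharpening sketch: QR-orthogonalise the stack
    [simulated pan | MS bands], replace the first component with the
    matched true pan band, and invert. Simplified relative to the
    derivation above (the simulated pan here is just the band mean)."""
    h, w, b = ms.shape
    X = ms.reshape(-1, b).astype(float)
    means = X.mean(axis=0)
    Xc = X - means
    sim_pan = Xc.mean(axis=1, keepdims=True)     # simulated pan band
    F = np.hstack([sim_pan, Xc])                 # pan first, then MS bands
    Q, R = np.linalg.qr(F)                       # Gram-Schmidt orthogonalisation
    # Match the true pan band's statistics to the simulated pan's
    p = pan.reshape(-1, 1).astype(float)
    p = (p - p.mean()) / (p.std() + 1e-12) * sim_pan.std() + sim_pan.mean()
    Q2 = Q.copy()
    Q2[:, 0] = (p / (R[0, 0] + 1e-12)).ravel()   # substitute first component
    fused = (Q2 @ R)[:, 1:] + means              # invert, drop the pan column
    return fused.reshape(h, w, b)
```

When the supplied pan band equals the simulated one, the substitution is a no-op and the original bands are recovered, which is a useful sanity check on the inverse transform.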

Experimental Results
This section shows the results obtained by different image fusion methods and evaluates the results. Figure 3 shows the fusion results of GF-3 images and Sentinel-2A images obtained by PCA method, GS transform method and WMF-based Gram-Schmidt transform (WMFGS). All bands in the Sentinel image are used for fusion, and each fused image contains 13 bands. Figure 3 is displayed in the band combination red, blue and green. The geometric resolution of the fused images is 1.4 m, which is the same as the resolution of the preprocessed GF-3 image.

Visual Comparison
In all fused images, it is obvious that some characteristic features such as roads, field boundaries (i.e., linear features) and residential features are inherited from the SAR images. For the PCA fused images, although most of the multi-spectral image information is retained, the SAR image information is not well utilized, especially the edge information of water bodies and buildings. Compared with the PCA method, the GS fusion method makes better use of the advantages of the SAR images. The characteristics of the water bodies in the GS fused images can be seen in Figure 3h,k. The images fused by WMFGS retain the advantages of GS fusion. As a result of the use of WMF, these images not only effectively filter out the noise in the Sentinel images, but also preserve the details of the images.
In all results, the WMFGS fused images have slightly higher quality in terms of image brightness and sharpness. Visually, the WMFGS fusion algorithm (Algorithm 1) has the best fusion ability compared with the PCA fusion algorithm and the traditional GS fusion algorithm.

Algorithm 1: WMFGS fusion
1: for … do
2:    Compute ω_i with (13) and (14)
3:    Compute P_{F_k}(B) using P_u(v) with (19)-(21)
4: end for
5: for j = 1 : N_f do
6:    Compute B̂ with (23)
7:    Compute F using Ŵ with (24)
8:    Compute S using Σ and F with (25)
9: end for
10: for i = 1 : N_e do
11:    Compute S using S and W with (26)
12:    Compute F with (27)
13: end for

Image Histogram
The image histogram describes the frequency of image intensities through a two-dimensional distribution curve [58]. Generally speaking, the image intensity histogram can reflect the image's lighting, contrast and saturation. Histograms are used here to evaluate the different image fusion methods, which can overcome the shortcomings of conventional evaluation indicators. The histograms of the fused images and the Sentinel image at the gray level are shown in Figure 4.
Generally, the abscissa of the histogram ranges from 0 to 255. Since the grayscale range of the images is concentrated between 0 and 12, the zero-valued grayscale bin is cropped in Figure 4 to make the curves easier to observe. The curve of the Sentinel grayscale image has a single peak. Only in Area 1 can two distinct peaks be seen in the curves of the PCA and GS fused images. The WMFGS-based fusion method provides a wider range of values around the peak. In all study areas, the WMFGS fused images have the most abundant spectral information.
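The gray-level frequency curves used in this comparison can be sketched with NumPy's histogram routine; the helper name and the zero-bin cropping option mirror what is described above and are illustrative only.

```python
import numpy as np

def gray_histogram(img, bins=256, crop_zero=True):
    """Gray-level frequency curve for an 8-bit image: counts per unit
    intensity bin over 0..255, optionally dropping the zero-valued bin
    (as done for Figure 4) so the remaining curve is easier to inspect."""
    counts, edges = np.histogram(img, bins=bins, range=(0, 256))
    if crop_zero:
        counts = counts[1:]
        edges = edges[1:]
    return counts, edges
```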

Quantitative Analysis
Visual evaluation of fusion products is relatively subjective; therefore, the quality of image fusion also needs to be evaluated objectively. SAR and multi-spectral image fusion is a special category of remote sensing image fusion, and the same quality indicators are used for evaluation [19].
To evaluate the performance of the fusion methods, the fused images are compared with the original Sentinel-2A image. The performance parameters are also evaluated and tabulated in Table 1.
In all research areas, the WMFGS fused images have the highest SD value. Since SD reflects the dispersion of the image gray levels relative to the average value, the larger the SD, the more information is available in the image and the better the fusion quality. AG and SF are indicators of image spatial information. These two values of the WMFGS fused images are the largest in all regions except Area 3, where the GS fused image has a higher AG and SF. Since PSNR measures the visual error between images, the results show that the visual error between the WMFGS fused image and the Sentinel image is the smallest in Areas 1, 2 and 5. The value of the CC ranges from 0 to 1. The correlation coefficients of all images differ little.

Random Forest Classification Results
The purpose of remote sensing image fusion is to better support land cover applications. To further compare the fused images on land cover, RF is used to classify the Sentinel image and all fused images, including those obtained from the new method. The same training samples are used for the random forest classifications of both the Sentinel-2A and the fused images. Figure 5 shows the random forest classification maps for each dataset. Samples classified as residential land, forest, road, water and crop are respectively represented by red, dark green, brown, blue and light green.
The Sentinel classification shows considerable speckling throughout the study area, whereas the WMFGS classification map is much more homogeneous. The maps obtained by the WMFGS fusion method are clearer and smoother; in particular, the details of roads and water bodies are accurately extracted. The classification map of the Sentinel image appears coarse because the original image has a resolution of only 10 m. Although the PCA fusion method improves the spatial resolution of the image, there is a certain loss of spectral information; therefore, the classification map based on this algorithm has relatively many misclassified pixels.
The overall accuracy (OA) and Kappa coefficient of the classification maps are calculated to further compare the classification results. Tables 2 and 3 respectively show the OA and Kappa coefficient of the classification maps of the Sentinel images and the fused images. Due to the loss of spectral information caused by the PCA fusion algorithm, the OA and Kappa values of the PCA fused images are the lowest in all study areas. The classification of the GS fused images is generally better than that of the original Sentinel images; in particular, in Area 5 the OA is increased by 3.58% and the Kappa value by 0.0457. The classification accuracy of the Sentinel image is higher than that of the GS fused image in Areas 2 and 4. Among all images, the WMFGS fused images have the highest classification accuracy. In particular, in Area 1 the OA of the WMFGS fused image is improved by 5.36% compared with the Sentinel image, and the Kappa value is improved by 0.07. Table 4

Discussion
Due to the limitations of imaging equipment, multi-spectral data have relatively rich spectral information, but their spatial resolution is relatively low. To meet social and economic applications, more flexible global land cover data products are needed. However, land cover classification maps obtained from multi-spectral images are not yet ready to meet this demand because of their low resolution. This paper aims to fuse multiple sensors to generate synthetic data with fine resolution. The proposed pixel-level fusion method and the classification method used provide a cost-effective way to generate data with high spectral and spatial resolution for improving land cover classification accuracy.
In this study, a novel image fusion method of Sentinel image and SAR image based on weighted median filter and Gram-Schmidt transform is proposed. Qualitative and quantitative indicators are used to evaluate the quality of the fused image. As a supplementary benefit, the training of random forest verifies the role of fusion images in land classification. The results of the five study areas demonstrate the effectiveness of the proposed method. (1) In this study, a weighted median filter is used to combine with the Gram-Schmidt transform method. The result of Table 3 shows that compared with the traditional GS fusion method, the weighted median filter can improve the image quality. This is because WMF removes the noise in the Sentinel image while maintaining the details of the image. As described in reference [59], WMF could extract sparse salient structures, like perceptually important edges and contours, while minimizing the color differences in images regions. Since WMF is capable of identifying and filtering the noise while still preserving important structures, the classification map of WMFGS has higher OA and Kappa values. (2) In Figures 1 and 3, comparing the fused image with the Sentinel image, the contrast between the colors of different classes in the fused image is obvious. In Areas 3 and 4, GS fused images and WMFGS fused images are relatively good at dividing urban areas and roads, while there is no obvious difference in the PCA fused images. In addition, the RF classification maps also show that the PCA fused images do not have better performance than the Sentinel images on the land cover. This is because the spectral information of the original image has a certain percentage of loss in the PCA algorithm fusion process. Comparing the details in Figure 5, the sentinel image has a rougher classification map due to its lower resolution. Although the resolution of PCA fusion images has been significantly improved, many tiny places are misclassified. 
Water bodies have special spectral characteristics and are easier to distinguish from other classes in land cover classification. However, rivers within cities are still difficult to distinguish. In the classification map of the WMFGS fused image, even small rivers are classified with relatively high accuracy.

(3) In this study, five indicators, namely standard deviation, average gradient, spatial frequency, peak signal-to-noise ratio and correlation coefficient, are used to evaluate the quality of the fused images. As shown in Table 1, the WMFGS fused images have the best quality in most areas, both in terms of their own image quality and their correlation with the source images. In future work, more indicators will be used to evaluate the fused images.

(4) In this paper, RF is used for land cover classification. The OA and Kappa coefficients of the five study areas are displayed in Tables 2 and 3, respectively. Compared with the Sentinel images, the WMFGS fusion method yields superior classification performance for all study areas; the OA and Kappa coefficients are significantly improved. The classification performance of the GS fused images is also improved in Areas 1, 2 and 5, mainly owing to the addition of GF-3 image information. The classification results prove that the fused images are very helpful for land cover classification.

(5) Many researchers have also used GF-3 or Sentinel-2 data for land cover classification. For example, Fang et al. propose a fast super-pixel segmentation method to classify GF-3 polarimetric data, which performs excellently in land cover classification [60]. However, their approach also requires a pre-classification based on optical data. In [61], Sentinel-2 images are used in a land cover study of the Red River Delta in Vietnam, and balanced and imbalanced data sets are analyzed, but only the 10 bands with 20 m resolution are used for classification. Different from the pixel-level algorithm proposed in this paper, Gao et al. use a feature-level fusion algorithm to fuse the covariance matrix of GF-3 with the spectral bands of Sentinel-2 [20].

(6) The proposed method provides a feasible option for generating images with high spatial resolution and rich spectral information, thereby extending the application of EO data. The method is suitable for the efficient fusion of registered SAR and MS images. It can make full use of existing SAR and MS satellite data to generate products with different spatial and spectral resolutions, thereby optimizing land cover classification results.
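For concreteness, the five fusion-quality indicators can be computed as in the sketch below. The exact normalizations used in the paper (e.g., the peak value assumed for PSNR, or the 1/2 factor inside the average gradient) vary between publications, so these definitions should be read as one common convention rather than the authors' exact formulas.

```python
import numpy as np

def std_dev(img):
    """Standard deviation: overall contrast / information content."""
    return img.astype(float).std()

def avg_gradient(img):
    """Average gradient: mean magnitude of local gradients (sharpness)."""
    gy, gx = np.gradient(img.astype(float))
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def spatial_frequency(img):
    """Spatial frequency: RMS of row-wise and column-wise differences."""
    img = img.astype(float)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio against a reference image, in dB."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def corr_coef(ref, img):
    """Pearson correlation coefficient with the source image."""
    return np.corrcoef(ref.ravel(), img.ravel())[0, 1]
```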

Conclusions
In this paper, a novel image fusion method for Sentinel-2A and GF-3 images is proposed, which can simultaneously retain the spatial information of SAR images and the rich spectral details of multi-spectral images. The proposed method is based on the Gram-Schmidt transform and can effectively avoid the loss of spectral information during fusion. In particular, a weighted median filter is used to reduce noise. Five datasets are used for image fusion and evaluation. To examine performance, the proposed method is evaluated both visually and quantitatively and compared with other fusion algorithms. By inputting the fused images into the random forest ensemble classifier, the image quality is further validated. The classification results show that the WMFGS fused images can effectively improve the classification accuracy and produce accurate land cover maps. Recently, some approaches have shown great improvements in remote sensing image fusion by using cloud computing [62,63], which will be explored in our future work. In addition, we expect to extend the proposed method and test it on other applications such as land cover change monitoring.
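The two classification criteria used throughout, overall accuracy and the Kappa coefficient, follow directly from the confusion matrix of predicted versus reference labels; a minimal sketch:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: reference class; columns: predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def overall_accuracy(cm):
    """OA: fraction of correctly classified pixels (trace / total)."""
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Kappa: agreement corrected for chance agreement p_e."""
    n = cm.sum()
    po = np.trace(cm) / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    return (po - pe) / (1 - pe)
```

In practice these values would be computed on held-out reference pixels for each study area; library routines such as scikit-learn's `cohen_kappa_score` give the same result.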