Radiometric Normalization for Cross-Sensor Optical Gaofen Images with Change Detection and Chi-Square Test

Abstract: As the number of cross-sensor images continuously increases, the surface reflectance of these images is inconsistent for the same ground objects due to different revisit periods and swaths. The surface reflectance consistency between cross-sensor images determines the accuracy of change detection, classification, and land surface parameter inversion, which are among the most widespread applications. We propose a relative radiometric normalization (RRN) method based on change detection and the chi-square test to improve surface reflectance consistency. The main contribution is that a novel chi-square test automatically extracts stably unchanged samples between the reference and subject images from the unchanged regions detected by the change-detection method. We used cross-sensor optical images from Gaofen-1 and Gaofen-2 to test this method and four metrics to quantitatively evaluate the RRN performance: the Root Mean Square Error (RMSE), spectral angle cosine, structural similarity, and CIEDE2000 color difference. All four metrics demonstrate the effectiveness of the proposed RRN method; in particular, the RMSE was reduced by more than 80% after normalization. Comparing the radiometric differences of five ground features, the surface reflectance curves of the two Gaofen images showed smaller differences after normalization, and the RMSE was below 50 with reductions of about 50-80%. Moreover, the unchanged feature regions are detected by the change-detection method from the bitemporal Sentinel-2 images, so they can be used for RRN without detecting changes in the subject images. In addition, extracting samples with the chi-square test effectively improves surface reflectance consistency. Our proposed RRN method can effectively reduce the surface reflectance differences between cross-sensor optical Gaofen images.
In future work, we will further study improving radiometric consistency in heterogeneous and multi-type land-use regions and apply our RRN method to change detection using multi-source images.


Introduction
Presently, a huge volume of cross-sensor images is constantly acquired, which provides more valuable information for scientific research and practical applications [1]. However, different satellites have different revisit periods and swaths, so it is difficult to obtain cross-sensor images on the same date and at the same transit time [2][3][4]. The response to the ground reflection spectrum of cross-sensor images differs for the same ground objects, which results in differences in the surface reflectance obtained by cross-sensor images [5,6]. In addition, the surface reflectance consistency between cross-sensor images determines the accuracy of change detection [7], classification [8], and land surface parameter inversion [9], which are among the most widespread applications. Therefore, radiometric correction is an essential preprocessing step to reduce the radiometric differences before the joint application of multi-source satellite images, such as multi-source image analysis and Earth monitoring [10,11].
Radiometric correction is divided into two types: absolute correction and relative normalization [12]. Absolute correction includes two steps: absolute radiometric calibration (ARC) and atmospheric correction. ARC aims at converting digital number (DN) values into top-of-atmosphere (TOA) radiance. In contrast to previous studies, our method extracts stably unchanged pixels from change-detection results instead of iteratively reweighting (IW) every sample. In addition, we have demonstrated that extracting samples with the extraction (ET) method is more effective than the IW method used in the previous study. Therefore, we use the autoencoder (AE) and ET methods to implement RRN and improve the surface reflectance consistency between cross-sensor optical Gaofen images. The main contribution is that a novel chi-square test further extracts stably unchanged samples from the unchanged regions detected by the AE method. Our method can effectively reduce the surface reflectance differences between cross-sensor optical Gaofen images.
This paper is organized as follows: Section 2 introduces the satellite image information and preprocessing steps. Section 3 explains the RRN and quantitative assessment methods. Section 4 mainly proves the effectiveness of our proposed method and analyzes the RRN performance. Section 5 discusses the RRN performance for different ground features. Finally, Section 6 concludes this work.

Sentinel-2 and Gaofen Satellites
In selecting a reference image, since the ground reflectance spectrum response of different sensors differs for the same ground objects, it is crucial to choose high-precision and stable data as reference images. Compared with other open-access data such as Landsat and MODIS, the advantage of Sentinel-2 is its higher spatial resolution, with 10 m resolution in the blue, green, red, and NIR bands. In addition, the surface reflectance images of Sentinel-2 can be freely downloaded or obtained by using the Sen2cor toolbox to process the Level-1C data, which are TOA radiance products with sub-pixel accuracy [41]. The European Space Agency provides the Sen2cor toolbox, which includes standard algorithms to correct atmospheric effects in Sentinel-2 images with high accuracy and reliability [42]. Therefore, we detect unchanged regions from the bitemporal Sentinel-2 images and use one of the two images as a reference image to implement RRN on cross-sensor optical Gaofen images.
The Sentinel-2 constellation consists of two satellites launched in 2015 and 2017 with a combined revisit time of 5 days, and each satellite carries a multispectral instrument (MSI) [43]. Table 1 shows that the MSI captures 13 spectral bands from 443 nm to 2190 nm at different spatial resolutions, providing 10 m resolution in four bands: B2 (blue), B3 (green), B4 (red), and B8 (near-infrared) [44]. These four bands are commonly used for change detection, classification, and ground object recognition in optical remote sensing images. The freely available Sentinel-2 products are divided into Level-1C and Level-2A: Level-1C products are TOA radiance with sub-pixel accuracy, and Level-2A products are surface reflectance data [45]. Please note that the surface reflectance values are magnified 10,000 times for numerical storage. In this paper, we obtained only Gaofen-1 (GF-1) and Gaofen-2 (GF-2) satellite images due to the confidentiality of national geographic information; other optical Gaofen images, such as Gaofen-6 and Gaofen-7, are not available for download or purchase due to copyright restrictions. However, the GF-1 and GF-2 satellites differ in imaging sensors, spatial resolution, revisit time, and orbital parameters, so cross-sensor optical images from the two satellites can test the effectiveness of our proposed RRN method. Moreover, because the optical Gaofen images are not freely available and are subject to purchase restrictions, we obtained optical Gaofen images for only one region. The experiment region is located around the Wuhan city center and covers various land uses (e.g., deep water, shallow water, residential, road, vegetation, school, and industrial regions). Intensive and concentrated human activities result in highly heterogeneous surface reflectance there. Therefore, the experiment region allows us to assess the spectral characteristics of different ground features and the effectiveness of our proposed RRN method in a complex region.
GF-1 is the first high-resolution Earth-observation satellite launched by China in 2013, and GF-2 is a civilian optical remote sensing satellite launched in 2014. Table 2 shows the wavelength range of every band, the spatial resolution, and the revisit time. Although the wavelength range of the two sensors is the same in every band, the surface reflectance values may differ for the same ground object due to different atmospheric conditions, sensors, and satellite constellations [17,46]. Therefore, it is essential to normalize the surface reflectance images, which can improve the application of multi-source remote sensing images.

Image Preprocessing
The preprocessing step is necessary and crucial for converting non-physical DN values into surface reflectance values and obtaining the same spatial resolution and geographic information between cross-sensor images. The preprocessing is divided into three parts: image value conversion, spatial information consistency, and clipping (Figure 1). For consistency of the meaningful physical quantity, the DN images should be converted into surface reflectance images. For the reference images, we download Sentinel-2 Level-1C products [41] and use Sen2cor to process them into surface reflectance images. Since the surface reflectance products of GF-1 and GF-2 are not released, we downloaded the DN images of GF-1 and GF-2 from the China Center for Resources Satellite Data and Application [47]. First, the DN images are converted into TOA radiance using Equation (1).
L_j = gain_j × DN_j + bias_j, (1)

where L_j is the TOA radiance received by the sensor in band j, and gain_j and bias_j are the ARC coefficients of band j (Table 3); their units are W/(m²·sr·µm). Then, the TOA radiance is converted into surface reflectance using the 6S radiative transfer model, an open-source atmospheric correction system [49][50][51]. Considering the geomorphic characteristics of the experiment area and the imaging time, we chose the mid-latitude summer (MLS) atmospheric model and the urban aerosol type. Moreover, we set the surface reflectance model to the homogeneous Lambertian model and obtained the aerosol optical thickness (AOT) value from the AOT data in the Sentinel-2 L2A products. The spectral response matrices of GF-1 and GF-2 were resampled to 0.25 nm before being imported into the 6S model [52]. After atmospheric correction, the surface reflectance values of all experimental images are magnified by 10,000 times.
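The DN-to-radiance conversion of Equation (1) can be sketched as follows. The gain and bias values below are hypothetical placeholders for illustration, not the actual coefficients of Table 3.

```python
import numpy as np

def dn_to_radiance(dn, gain, bias):
    """Convert a digital-number (DN) band to TOA radiance, W/(m^2.sr.um),
    using L_j = gain_j * DN_j + bias_j (Equation (1))."""
    return gain * dn.astype(np.float64) + bias

# Hypothetical calibration coefficients for one band (see Table 3 for real values).
dn = np.array([[120, 340], [560, 780]])
radiance = dn_to_radiance(dn, gain=0.2, bias=0.0)  # radiance[0, 0] -> 24.0
```

The conversion is applied band by band, since every band has its own gain and bias.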
For consistency of geographic information and spatial resolution, relative geometric calibration and resampling are essential. If the spatial information errors are larger than one pixel, unchanged pixels are mismatched between the subject and reference images, resulting in an incorrect RRN linear regression model. We manually selected 35 feature pixels in the study regions to achieve sub-pixel accuracy of relative geometric calibration, based on the quadratic polynomial model and the nearest neighbor resampling method. These feature pixels include prominent corners of ground objects such as houses, roads, and playgrounds. These ground features can effectively improve the accuracy of relative geometric calibration, and the calibration error is controlled to less than 0.1 m in the north (x) and east (y) directions. Table 4 shows that the geographic location offset between reference and subject images meets sub-pixel accuracy (less than 10 m) after preprocessing. The offsets of the GF-1 and GF-2 images relative to the S2B image are 3.8 m and 3.3 m, respectively, and the geographic error between the S2B and S2A images is 2.6 m. Finally, clipping makes the study regions of the cross-sensor optical Gaofen images consistent with the Sentinel-2 images. After preprocessing, the size and spatial resolution of the cross-sensor optical Gaofen images are unified to 450 × 450 pixels and 10 m, respectively, consistent with the Sentinel-2 image.

Methodology
In this paper, we propose an RRN method based on a change-detection method and the chi-square test to automatically extract samples, which can effectively normalize surface reflectance between cross-sensor optical Gaofen images. First, the unchanged regions are detected by the change-detection method from the bitemporal Sentinel-2 images, so we do not need to detect changes in the subject images. Then, the chi-square test extracts stable samples from the unchanged regions to improve the surface reflectance consistency between cross-sensor optical Gaofen images.

Proposed Workflow
The methodology is divided into two parts: bitemporal change detection and extracting samples with the chi-square test (Figure 2). First, we used an unsupervised autoencoder (AE) method to detect unchanged regions and generate binary change maps [39,40]. Then, we used the chi-square test to construct a novel extraction (ET) method that obtains stably unchanged pixels instead of iteratively reweighting (IW) every sample. The IW method assigns a weight to every sample over several iterations using the chi-square test; when the difference (λ_j) between the Iterative Weighted Root Mean Square Error (IWRMSE_j) and the previous IWRMSE_j is less than the threshold τ (τ = 0.001 in this paper), the iteration stops. Although the IW method does not discard any samples, the surface reflectance values are altered by the iterative reweighting. The ET method extracts stable pixels from the unchanged feature regions detected by the AE method, using the chi-square distance as the threshold, and does not change the surface reflectance values.
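For concreteness, the IW baseline described above can be sketched as follows. This is a simplified illustration under our own assumptions (synthetic samples, a plain weighted least-squares fit), not the authors' exact implementation; the chi-square survival function for one degree of freedom is computed with the complementary error function so the sketch needs only the standard library and NumPy.

```python
import math
import numpy as np

def chi2_sf_1dof(t):
    """P(chi-square with 1 d.o.f. > t), via the complementary error function."""
    return np.array([math.erfc(math.sqrt(v / 2.0)) for v in np.atleast_1d(t)])

def iw_regression(x, y, tau=1e-3, max_iter=50):
    """Iteratively reweighted (IW) regression sketch: weight every sample by
    its no-change probability, refit, and stop once the weighted RMSE changes
    by less than tau (the lambda_j < tau criterion)."""
    w = np.ones_like(x, dtype=float)
    prev_rmse = float("inf")
    a = b = 0.0
    for _ in range(max_iter):
        a, b = np.polyfit(x, y, 1, w=np.sqrt(w))  # weighted least squares
        resid = y - (a * x + b)
        rmse = math.sqrt(np.average(resid**2, weights=w))  # IWRMSE_j
        w = chi2_sf_1dof(resid**2 / rmse**2)  # chi-square no-change weight
        if abs(prev_rmse - rmse) < tau:  # stopping criterion
            break
        prev_rmse = rmse
    return a, b, w

# Synthetic unchanged samples along y = 2x + 3 with small perturbations.
x = np.arange(10, dtype=float)
noise = np.array([0.01, -0.02, 0.03, -0.01, 0.02, -0.03, 0.01, 0.0, -0.02, 0.02])
a, b, w = iw_regression(x, 2 * x + 3 + noise)
```

Note that every sample keeps a (possibly tiny) weight rather than being discarded, which is exactly the behavior the ET method replaces with a hard extraction.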
The two parts of our proposed RRN method are sketched in Figure 3. Figure 3a shows that the sky-blue squares and red dots are the unchanged samples detected by the AE method, and the blue line (LR-1) denotes the linear regression over these samples. Then, we used the ET method to obtain dense and stably unchanged samples, shown as red dots; the sky-blue squares are discarded. Finally, these red dot samples are used for RRN, and the black line (LR-2) denotes the final linear regression result.

Bitemporal Change Detection
The AE method, based on the reconstruction loss of a joint autoencoder model, is the foundation of our proposed RRN method. It is an unsupervised method that can detect non-trivial changes. It is divided into pre-training, fine-tuning, reconstruction loss, and Otsu's thresholding steps (Figure 2). The first step is pre-training the autoencoder model by encoding and decoding every pixel pair of the bitemporal images; the aim is to minimize the difference between the decoded output and the initial input. The fine-tuning step optimizes every pixel pair using a joint model composed of two autoencoders. The third step calculates the reconstruction error of every pixel pair. Finally, Otsu's thresholding method separates high reconstruction errors to create a binary change map [53]. To verify whether the AE method is better than other change-detection methods, we compare it with CVA [26], MAD [29], IRMAD [30], and ISFA [2] in Section 4.
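As an illustration of the final thresholding step, the sketch below applies Otsu's method to a synthetic reconstruction-error map; the error values are made up for the example, and the autoencoder that produces real reconstruction errors is omitted.

```python
import numpy as np

def otsu_threshold(errors, bins=256):
    """Otsu's method: choose the threshold maximizing the between-class
    variance of the reconstruction-error histogram (final step of the
    AE change-detection pipeline)."""
    hist, edges = np.histogram(errors, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    total = hist.sum()
    sum_all = (hist * centers).sum()
    best_t, best_var = centers[0], -1.0
    w0 = 0.0
    sum0 = 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += hist[i] * centers[i]
        mu0 = sum0 / w0                  # mean of the low-error class
        mu1 = (sum_all - sum0) / w1      # mean of the high-error class
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, centers[i]
    return best_t

# Bimodal synthetic errors: low reconstruction error (unchanged) vs high (changed).
errors = np.concatenate([np.full(900, 0.1), np.full(100, 0.9)])
t = otsu_threshold(errors)
change_map = errors > t  # pixels with high reconstruction error -> changed
```

On real imagery the errors form a less clean bimodal distribution, but the thresholding logic is the same.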

Extracting Samples
Caselles et al. [54] verified the assumption that radiometric differences can be approximated by a linear function. However, the difficulty of RRN is how to select unchanged pixels that satisfy this assumption [18]. In the unchanged regions detected by the change-detection method, some pixels are sparse and far from the linear function, so they cannot be used for linear regression in RRN. We should therefore further extract samples from the unchanged regions to improve the RRN performance.
First, using the least squares method, all pixels in the unchanged regions are used to construct a linear function and calculate the Root Mean Square Error (RMSE), which is expressed as [55]:

ŷ_j = a_j · x_j + b_j, (2)

RMSE_j = sqrt( (1/n) · Σ_{i=1}^{n} (ŷ_j^i − y_j^i)² ), (3)

where x_j is the surface reflectance value of band j of the subject image, ŷ_j is the surface reflectance value of the normalized image, and a_j and b_j are the linear regression coefficients; ŷ_j^i and y_j^i are the surface reflectance values of sample i of band j of the normalized and reference images, respectively, and n is the number of samples.
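The least-squares fit and RMSE of Equations (2) and (3) can be sketched as follows; the sample reflectance values are hypothetical.

```python
import numpy as np

def fit_rrn(x, y):
    """Fit the band-wise RRN linear model y_hat = a*x + b by least squares
    (Equation (2)) and report the RMSE of Equation (3)."""
    a, b = np.polyfit(x, y, 1)
    y_hat = a * x + b
    rmse = float(np.sqrt(np.mean((y_hat - y) ** 2)))
    return a, b, rmse

# Toy subject/reference surface-reflectance samples (values x 10,000).
x = np.array([1200.0, 1800.0, 2500.0, 3100.0])
y = np.array([1300.0, 1900.0, 2600.0, 3200.0])
a, b, rmse = fit_rrn(x, y)  # an exact affine relation: a ~ 1, b ~ 100, rmse ~ 0
```

The fit is computed per band, since each band has its own regression coefficients.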
According to the random characteristics of the radiometric differences, we expect every pixel pair to follow a chi-square distribution with one degree of freedom [2,6]. The RMSE of the linear regression is used to construct a chi-square distance as the threshold to extract samples. When the weight of a sample (w_j^i) is larger than the weighted threshold (WT, set to 0.5 in this paper), the sample is preserved; otherwise, it is discarded. The formulas are expressed as:

T_j^i = (ŷ_j^i − y_j^i)² / RMSE_j²,

w_j^i = P = P( χ²(1) > T_j^i ),

where T_j^i is the chi-square distance of sample i of band j, and ŷ_j^i and y_j^i are the surface reflectance values of sample i of band j of the normalized and reference images, respectively. P is the probability of being unchanged, measured as the probability that a chi-square variable with one degree of freedom exceeds the chi-square distance, and the weight w_j^i equals P. Finally, the extracted samples are used for linear regression with Equation (2) to normalize the surface reflectance between cross-sensor optical Gaofen images.
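A minimal sketch of the extraction rule above, assuming hypothetical sample values and RMSE; the survival function of the one-degree-of-freedom chi-square distribution is computed via the complementary error function, so no SciPy dependency is needed.

```python
import math
import numpy as np

def extract_stable_samples(y_hat, y, rmse, wt=0.5):
    """ET step: keep sample i when its no-change probability
    w_i = P(chi2(1) > T_i), with T_i = (y_hat_i - y_i)^2 / RMSE^2,
    exceeds the weighted threshold WT (0.5 in the paper)."""
    t = (y_hat - y) ** 2 / rmse ** 2
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(t/2)).
    w = np.array([math.erfc(math.sqrt(v / 2.0)) for v in t])
    return w > wt

# Hypothetical reference and normalized samples; the last one is an outlier.
y = np.array([100.0, 210.0, 305.0, 800.0])
y_hat = np.array([102.0, 208.0, 302.0, 500.0])
mask = extract_stable_samples(y_hat, y, rmse=10.0)  # keeps the first three
```

Unlike the IW variant, the discarded samples simply drop out of the final regression; the reflectance values of the kept samples are never altered.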

Quantitative Assessment
We calculate the RMSE (Equation (3)), spectral angle cosine (SAC), structural similarity (SSIM), and CIEDE2000 color-difference metrics to evaluate the performance of RRN. The RMSE and SAC quantify the differences in surface reflectance values and spectral characteristics between two images, respectively. A smaller RMSE denotes smaller surface reflectance differences and better RRN performance, and an SAC closer to 1 denotes smaller differences in spectral characteristics. The SAC is defined as:

cos α = Σ_i (y_j^i · ŷ_j^i) / ( sqrt(Σ_i (y_j^i)²) · sqrt(Σ_i (ŷ_j^i)²) ),

where y_j^i and ŷ_j^i are the surface reflectance values of sample i of band j of the reference and normalized images, respectively, and α is the spectral angle.
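The SAC above reduces to a normalized inner product; a sketch with hypothetical spectra:

```python
import numpy as np

def spectral_angle_cosine(y, y_hat):
    """Spectral angle cosine between a reference and a normalized spectrum:
    cos(alpha) = <y, y_hat> / (||y|| * ||y_hat||)."""
    return float(np.dot(y, y_hat) / (np.linalg.norm(y) * np.linalg.norm(y_hat)))

y = np.array([1200.0, 1500.0, 1800.0, 3000.0])
sac_same = spectral_angle_cosine(y, 2.0 * y)   # same spectral shape -> 1.0
sac_diff = spectral_angle_cosine(y, y[::-1])   # different shape -> below 1.0
```

Because the cosine is scale-invariant, a uniform gain between two sensors does not lower the SAC; only changes in spectral shape do.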
The SSIM and CIEDE2000 are perceptual quality assessment methods that take advantage of the known characteristics of the human visual system. The SSIM combines luminance, contrast, and structural information between two images to evaluate their similarity [56], and it matches perceived visual quality much better than the peak signal-to-noise ratio (PSNR) [56,57]. The CIE published CIEDE2000 in 2001 to quantify the color difference between two pixels [58,59], and Dionisio et al. [60] used the CIEDE2000 color difference to evaluate the spectral quality of fused remote sensing images. Therefore, we use the SSIM and CIEDE2000 color-difference (DE) metrics to evaluate the perceptual quality before and after RRN. The DE metric is based on a transformation that accounts for the non-uniformities of the human visual system and works in a visible color space. First, we convert the Red, Green, and Blue bands into the CIE Lab color space, then convert to the CIE LCH color space and calculate the DE for every pixel pair. Finally, we calculate the average DE over all pixel pairs as the color-difference metric. An SSIM closer to 1 indicates higher structural similarity between the two images, and a smaller DE denotes smaller color differences. The SSIM and DE metrics are defined as Equation (9) [56] and Equation (10) [58,59], respectively.
SSIM(Ŷ_j, Y_j) = f( l(Ŷ_j, Y_j), c(Ŷ_j, Y_j), s(Ŷ_j, Y_j) ), (9)

where Ŷ_j and Y_j are the normalized and reference images of band j, respectively, and l(Ŷ_j, Y_j), c(Ŷ_j, Y_j), and s(Ŷ_j, Y_j) are the luminance, contrast, and structural comparison functions, respectively, combined by the function f(·).
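A global (single-window) simplification of Equation (9) can be sketched as follows; the paper may use the standard windowed form, so this is only an illustration, with a data range matching the ×10,000 reflectance scaling and the usual stabilizers C1 = (0.01·range)² and C2 = (0.03·range)².

```python
import numpy as np

def global_ssim(x, y, data_range=10000.0):
    """Global SSIM over one band: combines luminance, contrast, and
    structure in the standard stabilized form [56]."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

band = np.linspace(0.0, 5000.0, 100)
ssim_same = global_ssim(band, band.copy())       # identical band -> 1.0
ssim_shift = global_ssim(band, band + 2000.0)    # brightness offset -> below 1.0
```

A constant radiometric offset lowers only the luminance term, which is exactly why SSIM is sensitive to the residual brightness differences RRN aims to remove.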
ΔE_00 = sqrt( (ΔL/(K_L·S_L))² + (ΔC/(K_C·S_C))² + (ΔH/(K_H·S_H))² + R_T·(ΔC/(K_C·S_C))·(ΔH/(K_H·S_H)) ), (10)

where ΔL, ΔC, and ΔH are the lightness, chroma, and hue differences, respectively, calculated between the reference and normalized images; S_L, S_C, and S_H are the weighting functions and K_L, K_C, and K_H the parametric factors for the lightness, chroma, and hue components, respectively; and R_T is an interaction term between the chroma and hue differences.

Effectiveness of AE and ET Method
To evaluate the effectiveness of the AE and ET methods for surface reflectance normalization, we compared five change-detection methods and the RRN performance with the chi-square test. The five binary no-change maps were calculated by the five change-detection methods, and the change-detection results were assessed qualitatively since there is no pixel-to-pixel ground-truth change map (Figure 4). The experiment demonstrates that the chi-square test can improve the radiometric consistency (Figure 5) and that the ET method is better than the IW method (Figure 6).

First, the binary no-change maps were calculated by the five change-detection methods with the S2B and S2A images (Figure 4). In the pre-training and fine-tuning of the AE method, the epochs were set to 1000 to determine the best training strategy; we chose ten epochs for the optimal training time and loss, and the losses of pre-training and fine-tuning were 1.817 × 10⁻⁴ and 1.650 × 10⁻⁵, respectively. Figure 4c shows that the no-change map of the CVA method is too blurred to identify its ground features. Similarly, almost all unchanged samples derived by MAD concentrate in the east of Figure 4d without clear ground features. Generally, in good change-detection results, the unchanged samples show the distribution of ground features such as lakes, lush forests, roads, and residential regions. The results of CVA and MAD, especially CVA, show too much salt-and-pepper noise to identify the ground features. Compared with the IRMAD and ISFA methods, the CVA and MAD methods lack the iterative reweighting determined by a chi-square distribution, which can further screen false detections [2,30]. Therefore, IRMAD and ISFA are better than the CVA and MAD methods, and the lake and vegetation regions clearly appear as unchanged ground features (Figure 4e,f).
In the red box region of Figure 4f, the ISFA result shows that some samples are detected as changed, which may be false detections in deep-water regions. Slow feature analysis transforms the data into a new feature space to suppress the unchanged components and highlight the quickly changing components [61,62]. In deep-water regions, we speculate that some abnormal pixels behave as quickly changing components in the feature space, so their differences cannot be suppressed, resulting in false detections. Figure 4g shows that the change samples detected by the AE method are more concentrated, rather than resembling salt-and-pepper noise; the change-detection result is more consistent with the actual changes and identifies the characteristic distributions of ground features. Some regions are clearly unchanged ground features, such as the lake, vegetation, and some impervious areas in urban regions. CVA, MAD, IRMAD, and ISFA are subtraction-based methods that calculate the difference of every pixel pair independently; they cannot keep the complete shape of changed ground features and are susceptible to interference from abnormal pixels, so their change samples resemble salt-and-pepper noise. The AE method is a neural network autoencoder algorithm that uses patch reconstruction error instead of image differencing to extract features. Qualitatively, the AE result shows the characteristic distributions of ground features in unchanged regions, and its change samples do not show a salt-and-pepper-like distribution. Therefore, we used the AE method to detect unchanged ground features for RRN.
In Figure 5, the RMSE, SAC, and SSIM curves were calculated from the normalized image and the reference image (S2B) on four bands to assess the effectiveness of the chi-square test. Since the DE works in a visible color space, we used only the Red, Green, and Blue bands to calculate the DE curves for the different change-detection methods. The different line types denote different methods: ET and IW denote the RRN results in which the chi-square test further processes the unchanged samples from the change-detection results, corresponding to our proposed ET method and the IW method in part (II) of Figure 2, respectively. INC denotes the RRN result using the unchanged samples detected by the change-detection method without the chi-square test. Figure 5 shows that the RMSE values of the ET and IW methods are smaller than those of the INC method, and the SAC and SSIM values of the ET and IW methods are larger than those of the INC method. Therefore, the chi-square test can effectively improve RRN, and the ET and IW methods are better overall than the INC method when comparing the RMSE, SAC, and SSIM of the five change-detection methods. In the NIR band, we calculated the average RMSE, SAC, and SSIM over the five change-detection methods: compared with the INC method, the RMSE of the GF-1 and GF-2 images was reduced by 576.538 and 605.982 (76.14% and 65.79%) using the ET method, respectively; the SAC increased by 0.018 and 0.011; and the SSIM increased by 0.093 and 0.136. The chi-square test can effectively reduce radiometric differences, with the best performance in the NIR band, followed by the red, blue, and green bands. In the GF-2 image, the RMSE of the blue band was reduced by 323.554 (73.47%), which is better than the red band (Figure 5e); similarly, the SSIM curve shows that the performance of the blue band is better than that of the red band.
Moreover, the DE curves clearly show that the DE values of the ET and IW methods are smaller than those of the INC method, which indicates the effectiveness of the chi-square test in RRN (Figure 5d,h).
When comparing the normalized images with the reference image, there is almost no difference between the effectiveness of the ET and IW methods. Compared with the IW method, the RMSE of the blue and red bands was reduced by 1.266 and 1.771 using the ET method, respectively; the SAC increased by 3.772 × 10⁻⁵ and 5.477 × 10⁻⁴; and the SSIM curves of the ET and IW methods almost overlap. Compared with the IW method, the average DE of the five change-detection methods for the GF-1 and GF-2 images was reduced by 0.245 and 0.282 using the ET method, respectively. However, when comparing the surface reflectance differences between the normalized cross-sensor optical Gaofen images, the ET method is significantly better than the IW method (Figure 6). Figure 5 demonstrated that the chi-square test helps to improve RRN performance, while the performance differences between ET and IW are very small when comparing the reference with the normalized images. The RMSE, SAC, and SSIM curves in Figure 6 were calculated from the two normalized images of GF-1 and GF-2. The ET method is significantly better than IW when comparing the radiometric differences between the normalized cross-sensor optical Gaofen images, especially in the green and red bands. The other bands also improved: the RMSE of the blue and NIR bands was reduced by 10.198 and 5.041, respectively, and the SAC of the blue and NIR bands increased by 5.248 × 10⁻⁴ and 6.621 × 10⁻⁴, respectively. The SSIM curve shows that ET performs better than IW in the blue and green bands, while the two methods perform equivalently in the red and NIR bands (Figure 6c). Among the five change-detection methods, the overlap of the five dashed lines shows that the RRN performances are almost the same when using the IW method, whereas the five solid lines of the ET method differ significantly and outperform the IW method (Figure 6a-c).
Moreover, the DE values of the five change-detection methods with ET are significantly smaller than those with IW, indicating the superiority of the ET method (Figure 6d). Comparing the different change-detection methods within the ET method, the AE method is slightly better than the others. This is also the most important reason for using the AE method to detect unchanged ground features in our proposed RRN method.
As mentioned above, the main reason the change-detection performance of the IRMAD and ISFA methods is better than that of the CVA and MAD methods is that the latter lack the iterative reweighting determined by a chi-square distribution. Therefore, we assumed that a chi-square test combined with the radiometric difference function would also help screen unchanged samples and improve RRN performance, and the above experiments demonstrate that this assumption is feasible and effective. Moreover, the ET method is better than the IW method because the ET method further extracts stably unchanged pixels from the change-detection results instead of iteratively reweighting every unchanged pixel. The IW method changes the surface reflectance values through iterative reweighting, which is not conducive to excluding unstable samples from the change-detection results.

The Performance of Different Reference Images
In this section, we compared Sentinel-2 images with the two subject images for detecting unchanged regions and performing RRN, to assess the RRN performance of different reference images. First, the binary no-change maps were calculated by the five change-detection methods with the two subject images of GF-1 and GF-2 (Figure 7). Then, we used the ET method to extract samples from each binary no-change map and used one of the two subject images as a reference image to normalize the other. The experiment proves that Sentinel-2 images are better references than either of the two subject images (Figure 8); therefore, Sentinel-2 reference images can improve the surface reflectance consistency. The unchanged feature regions detected by the AE method from the bitemporal Sentinel-2 images can be used for RRN without detecting changes in the subject images. In Figure 7, the jade-green pixels denote changed pixels, and the sky-blue pixels denote unchanged pixels. Similarly, we found that the AE result shows the characteristic distributions of ground features in unchanged regions, and its change samples do not show a salt-and-pepper-like distribution (Figure 7g). Moreover, Figure 7a,b show lush forests in the yellow box region, which belong to unchanged ground features; yet in the red boxes of Figure 7c-g, many changed pixels were detected by the CVA, IRMAD, and ISFA methods. Figure 7d shows dense changed pixels in the red ellipse region because there is a thin cloud in the GF-2 image. Some pixels in the thin cloud region still carry radiometric information that can show the unchanged ground feature [63], but the MAD method detects some false change samples there, resulting in the loss of useful radiometric information. In summary, compared with the five change-detection methods, the AE method is the most effective at detecting unchanged ground features.
In Figure 8, the RMSE, SAC, SSIM, and DE curves were calculated from the two normalized images of GF-1 and GF-2 to evaluate the effectiveness of using Sentinel-2 as the reference image. The different line types denote different reference images used to normalize the subject images. The solid lines denote the quantitative assessment in which the S2B and S2A images are used to detect the unchanged ground features and the S2B image is used as the reference to normalize the GF-1 and GF-2 images. The dotted and dashed lines denote GF-1 and GF-2 as reference images normalizing the other image, respectively, with the binary no-change map calculated from the GF-1 and GF-2 images. In Figure 8a-c, the different line colors and markers denote different change-detection methods. Figure 8 shows the RMSE, SAC, SSIM, and DE curves of the cross-sensor optical Gaofen images for quantitatively evaluating the RRN performance with different reference images. The RMSE of both the dashed and dotted lines is larger than that of the solid lines, which implies that the S2B reference image is more effective than the GF-1 and GF-2 images (Figure 8a). The SAC of the dotted lines is smaller than that of the dashed and solid lines, and the solid red line is the largest in the green and red bands, at about 0.9971 and 0.9918 (Figure 8b). The SSIM of the solid lines is significantly larger than that of the dashed and dotted lines, indicating the superiority of the S2B reference image (Figure 8c). Figure 8d shows that the DE values of the S2B reference image are smaller than those of the GF-1 and GF-2 reference images, which indicates that Sentinel-2 images can effectively reduce the color differences between cross-sensor optical Gaofen images.
Comparing the different change-detection methods, the red dashed and dotted lines are close to the solid lines in the blue and green bands, and the RMSE of the AE method is significantly smaller than that of the other four change-detection methods. The solid lines show that the RMSE of the five change-detection methods is almost the same; only the RMSE of the MAD method is slightly larger than the others. In the blue and NIR bands, the SAC differences between the five solid lines are very small, and the difference between the maximum and minimum is about 0.0018. Figure 8c shows that the five solid lines almost overlap, implying the same performance for the five change-detection methods. However, the DE values of the five change-detection methods are different, and those of the IRMAD, ISFA, and AE methods are all smaller than those of CVA and MAD (Figure 8d). The reason may be the difference between the change-detection results, as the change-detection performance of CVA and MAD is worse than that of the other three methods. Therefore, it is crucial to obtain good change-detection results. In summary, Sentinel-2 images and the AE method are very effective for normalizing cross-sensor optical Gaofen images.
For a numerical comparison of Figure 8, Table 5 shows the RMSE, SAC, and SSIM of cross-sensor optical Gaofen images normalized by different reference images. Using S2B, GF-1, and GF-2 as reference images, the RMSE in the NIR band is 353.739, 1255.810, and 1335.977, respectively. Compared with the GF-1 and GF-2 reference images, the S2B reference image decreased the RMSE by 902.071 (71.83%) and 982.238 (73.52%), respectively. Compared with the GF-2 reference image, the S2B reference image increased the SAC by 0.0009; compared with the GF-1 reference image, the SAC was reduced by 0.0003. Compared with the GF-1 and GF-2 reference images, the S2B reference image increased the SSIM by 0.3313 (92.70%) and 0.3215 (95.74%), respectively. In the other bands, the performance of S2B as the reference image is also the best. Moreover, we calculated the average DE of the five change-detection methods: the S2B reference image is the smallest at 42.0451, followed by GF-2 at 48.5945 and GF-1 at 49.9487. To summarize the above experiments, the AE change-detection method and the ET method are effective for RRN, and the Sentinel-2 images can effectively reduce the surface reflectance differences between cross-sensor optical Gaofen images.
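Two of the band-wise metrics above can be reproduced in outline with plain numpy: RMSE per band and the spectral angle cosine (SAC) averaged over pixels. This is a minimal sketch with illustrative array shapes and toy data; SSIM and CIEDE2000 would come from dedicated implementations and are omitted here.

```python
import numpy as np

def band_rmse(ref, sub):
    """Per-band RMSE between reference and subject images of shape (bands, rows, cols)."""
    diff = ref.astype(np.float64) - sub.astype(np.float64)
    return np.sqrt((diff ** 2).mean(axis=(1, 2)))

def spectral_angle_cosine(ref, sub):
    """Mean cosine of the spectral angle between corresponding per-pixel spectra."""
    r = ref.reshape(ref.shape[0], -1).T   # (pixels, bands)
    s = sub.reshape(sub.shape[0], -1).T
    num = (r * s).sum(axis=1)
    den = np.linalg.norm(r, axis=1) * np.linalg.norm(s, axis=1)
    return (num / den).mean()

# Toy example: a nearly radiometrically consistent image pair.
rng = np.random.default_rng(1)
ref = rng.uniform(100, 3000, size=(4, 16, 16))
sub = ref + rng.normal(0, 20, size=(4, 16, 16))
print(band_rmse(ref, sub))            # small per-band RMSE
print(spectral_angle_cosine(ref, sub))  # close to 1.0 for consistent pairs
```

A SAC near 1.0 and a low per-band RMSE correspond to the "good" ends of the curves in Figure 8a,b.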

Radiometric Normalization Result
In this section, we used the AE and ET methods for RRN. The above experiments have demonstrated the effectiveness of our proposed RRN method, which can improve the radiometric consistency between cross-sensor optical Gaofen images. The method is divided into two parts. First, the binary no-change map is calculated by the AE method using the bitemporal Sentinel-2 images. Then the ET method extracts the dense and stable samples to refine the RRN linear model. Figure 9 shows the linear regression result of the GF-1 image in every band, where Figure 9a-d denote the blue, green, red, and NIR bands, respectively. The scatter points of all colors denote the unchanged samples detected by the AE method, and the black line is the linear regression result using these samples. The blue, yellow, and red scatter points denote the samples selected by the ET method, and the red line is the linear regression result using these samples. The regions with the densest samples are displayed in red, flanked by zones of yellow and blue corresponding to declining numbers of samples. The vertical and horizontal axes denote the surface reflectance of the S2B reference image and the GF-1 subject image, respectively; note that SR denotes surface reflectance values magnified 10,000 times for numerical storage. The RMSE of the red line is significantly smaller than that of the black line. In addition, the red line is closest to the red region, which indicates that the ET method can effectively discard abnormal and sparse samples to improve the RRN performance. Figure 10 shows the linear regression result of the GF-2 image in every band. The RMSE of both linear regressions is larger than that of GF-1 in every band, which may be caused by the thin clouds over the southwest of the GF-2 image [64].
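The two-stage fit behind Figures 9 and 10 can be sketched as: an ordinary least squares fit over all unchanged samples (the black line), a chi-square screening of residuals to discard unstable samples, and a refit on the retained samples (the red line). The screening rule below, rejecting samples whose squared standardized residual exceeds the 95% chi-square critical value with 1 degree of freedom, is one plausible reading of the ET chi-square test, not necessarily the paper's exact formulation; the toy data are illustrative.

```python
import numpy as np

CHI2_95_1DOF = 3.841  # chi-square critical value, 1 dof, alpha = 0.05

def fit_line(x, y):
    """Ordinary least squares fit y = a*x + b; returns (a, b)."""
    a, b = np.polyfit(x, y, 1)
    return a, b

def et_refined_fit(x, y):
    """Two-stage RRN fit: OLS, chi-square screening of residuals, refit.

    Samples whose squared standardized residual exceeds the 95%
    chi-square critical value (1 dof) are discarded as unstable.
    This is a hedged sketch, not the paper's exact ET formulation.
    """
    a, b = fit_line(x, y)
    resid = y - (a * x + b)
    z2 = (resid / resid.std()) ** 2  # squared standardized residual
    keep = z2 <= CHI2_95_1DOF
    return fit_line(x[keep], y[keep]), keep

# Toy example: a linear sensor relation plus a few gross outliers.
rng = np.random.default_rng(2)
x = rng.uniform(0, 3000, 500)                 # subject reflectance (SR units)
y = 0.9 * x + 50 + rng.normal(0, 10, 500)     # reference reflectance
y[:10] += 1500                                # simulate unstable samples
(a, b), keep = et_refined_fit(x, y)
print(round(a, 3), round(b, 1), int(keep.sum()))
```

The refitted slope and intercept recover the underlying linear relation once the gross outliers are screened out, mirroring how the red line in Figure 9 hugs the dense red sample region.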
The result of RRN is qualitatively analyzed by visually comparing the Gaofen images before and after normalization with the reference image. Five images are shown with the same linear stretch, brightness, contrast, and sharpening (Figure 11). Comparing the normalized GF-1 and GF-2 images with the S2B reference image, the color and brightness of the three images are very similar; the two normalized images are visually far more consistent with the reference image than the original subject images are. The GF-2 image still has a thin cloud in the yellow ellipse region, and this influence needs to be weakened by other cloud-processing methods [65]. In general, the radiometric consistency between the subject and reference images is greatly improved. To numerically compare the surface reflectance consistency of the normalized cross-sensor optical Gaofen images, Table 6 shows the RMSE, SAC, and SSIM of the subject images before and after normalization. The three metrics were calculated over all image pixels between the subject and the S2B reference image, and the rates are the reduced percentage of RMSE after normalization. The maximum percentage of the GF-1 and GF-2 images is 90.58% and 86.84%, respectively. In the green band, the percentage of GF-1 and GF-2 is 90.15% and 86.22%, but the SAC of the GF-2 image was reduced by 0.0003. The SSIM indicates that the blue band of the two Gaofen images is the best, at 0.9977 and 0.9966, respectively, and the SSIM of every band is larger than 0.9 after normalization. According to the three metrics, the NIR band of the two Gaofen images is the worst. Moreover, we calculated the DE values of the two Gaofen images before and after normalization: the GF-1 image is 47.6119 and 41.5654, respectively, and the GF-2 image is 46.8586 and 42.5957.
The DE of the GF-1 and GF-2 images was thus reduced by 6.0465 and 4.2629, respectively. According to the atmospheric transmission characteristics of electromagnetic waves, the atmospheric transmittance of the visible band is greater than that of the NIR band: the transmittance of visible wavelengths of 400-700 nm is about 95%, while that of the NIR range of 700-1100 nm is about 80% [66,67]. We speculate that this difference in atmospheric transmittance accounts for the differing surface reflectance differences between the bands. A greater transmittance means that the surface reflectance is less susceptible to atmospheric influences during satellite sensor imaging, so the surface reflectance is closer to the true ground value. Therefore, the surface reflectance differences of the NIR band are the largest both before and after normalization.

Discussion
Section 4 demonstrated the effectiveness of our proposed RRN method; this section discusses the RRN performance on different ground features. After normalization, the radiometric differences vary across ground features, so the per-feature normalization accuracy is most important for practical applications [18,68]. To evaluate the RRN effectiveness on different ground features, we selected five regions, namely deep water, shallow water, vegetation, building-A, and building-B (Figure 12). The deep-water region is in the Donghu Lake of Wuhan, the largest lake in the city, with a depth of about 20 m; it is a scenic tourist region where the water source has always been protected. Figure 12b shows that the three shallow-water regions are colored sky-blue, and the two vegetation regions, mainly evergreen broadleaf forest, are colored light green. In addition, the two building regions are colored red and pink, respectively. The characteristic difference between the building-A and building-B regions is that horizontal and vertical roads intersect in building-A. Both building regions are very complex and heterogeneous, covering a variety of ground features.
The surface reflectance curves were calculated as the average of all surface reflectance values within the same ground feature region (Figure 13). The solid black line denotes the S2B reference image, and the dashed red and green lines denote the GF-1 and GF-2 images, respectively. In the legend, the RMSE was calculated from the surface reflectance curves of two images and denotes the difference between the two curves. Figure 13a,b show that the curve trend of deep water is similar to that of shallow water, but the RMSE of shallow water is smaller than that of deep water. In the reference image, the surface reflectance values of the NIR band are larger than those of the other bands (Figure 13c); this NIR characteristic contributes to extracting vegetation features using NDVI [69]. Similarly, the building-A and building-B regions have this characteristic [70], while the subject images do not (Figure 13d,e). However, after normalization, the surface reflectance curves have a high correlation with the reference image, and the GF-1 and GF-2 images also show this characteristic (Figure 13h-j). In Figure 13f-j, RP is the reduced percentage of RMSE after normalization, and the RMSE was calculated from the two surface reflectance curves of the GF-1 and GF-2 images. The closer the RP is to 100%, the more significant the reduction in radiometric differences. After normalization, the surface reflectance curves of GF-1 and GF-2 are similar to that of the reference image, and the RMSE is smaller than 90. The surface reflectance curve differences between GF-1 and GF-2 are minor, and the RMSE is smaller than 50 with reduced percentages of about 50-80%. Therefore, our proposed RRN method can effectively reduce the surface reflectance differences between cross-sensor optical Gaofen images across different ground features.
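The curve-level statistics above reduce to two small formulas: the RMSE between two mean-reflectance curves (one value per band) and the reduced percentage (RP) of RMSE after normalization. A minimal sketch follows; the four-band curve values are invented for illustration and are not the paper's measured data.

```python
import numpy as np

def curve_rmse(curve_a, curve_b):
    """RMSE between two mean surface reflectance curves (one value per band)."""
    a, b = np.asarray(curve_a, float), np.asarray(curve_b, float)
    return np.sqrt(((a - b) ** 2).mean())

def reduced_percentage(rmse_before, rmse_after):
    """RP: percentage reduction of RMSE achieved by normalization."""
    return 100.0 * (rmse_before - rmse_after) / rmse_before

# Illustrative mean reflectance curves (blue, green, red, NIR) in SR units;
# these numbers are assumptions, not values from the paper.
gf1_before = [420, 610, 540, 2450]
gf2_before = [380, 560, 500, 2300]
gf1_after  = [410, 600, 535, 2410]
gf2_after  = [405, 595, 528, 2402]

before = curve_rmse(gf1_before, gf2_before)
after = curve_rmse(gf1_after, gf2_after)
print(round(before, 1), round(after, 1), round(reduced_percentage(before, after), 1))
```

An RP near 100% means the two Gaofen curves collapse onto each other after normalization, as seen for the deep-water and vegetation regions.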
Especially in the deep water and vegetation regions, the normalized surface reflectance curve of the GF-1 image overlaps with that of the GF-2 image, and the RMSE is 11.740 and 14.896, respectively. Among the five ground features, the maximum RP is about 82.5% in the deep-water region, and the minimum RP is about 52.9% in the building-B region. In the building-B region, the RMSE between GF-1 and GF-2 is the largest of all regions, which indicates that the RRN performance is less effective in complex ground feature regions. For a numerical comparison of the RRN performance on different ground features, Figure 14 shows the quantitative assessment of RMSE, SAC, and SSIM calculated between the cross-sensor optical Gaofen images and the reference images. In the deep-water and shallow-water regions, the RMSE of every band is smaller than 100 after normalization (Figure 14a,b). In the vegetation regions, the RMSE of GF-1 and GF-2 in the NIR band is 169.2 and 324.3 after normalization, corresponding to reduced percentages of RMSE of about 91.9% and 85.1% (Figure 14c). In the two building regions, the RMSE of all bands is larger than 100 after normalization; in particular, for the NIR band of the GF-2 image, the RMSE is larger than 300 (Figure 14d,e). Figure 14f-j shows that, after normalization, the SAC of every band of the two images is close to 0.998 in the deep-water, shallow-water, and vegetation regions, except for the NIR band of the GF-2 image. Although the building-A region has been greatly improved, its SAC is smaller than 0.998. The performance of the building-B region is better than that of building-A, but it needs further improvement in the NIR band. Figure 14k-o shows that the SSIM increases significantly after normalization, and GF-1 is better than GF-2. Table 7 shows that the DE of the two Gaofen images is reduced after normalization in all five ground features; the maximum DE differences of GF-1 and GF-2 before and after normalization are about 25.9488 and 10.1171, respectively.
Similarly, the RRN performance in the two building regions is worse than that in the other, more homogeneous ground features. In the two building regions, the complex, multi-type land-use characteristics, covering many heterogeneous ground features (e.g., residential, road, sparse vegetation, school, and industrial areas), cause poor RRN performance. In addition, the surface reflectance in these regions is greatly influenced by human activities. Since there are many mixed pixels, the radiometric difference may be nonlinear [71,72].

Conclusions
In this paper, we have proposed an RRN method for cross-sensor optical Gaofen images, which is divided into two parts: bitemporal change detection and sample extraction. To validate the effectiveness of the proposed method, we used four metrics to quantitatively compare and analyze five change-detection methods, different reference images, and five regions with different ground features. First, the experiments demonstrated the effectiveness of the chi-square test for further extracting samples for RRN, and our proposed ET method is better than the IW method. Then, a further experiment demonstrated that using Sentinel-2 as the reference image can improve the surface reflectance consistency. In addition, the unchanged feature regions are detected by the AE method from the bitemporal Sentinel-2 images, which can be used for RRN without detecting changes in the subject images. The four metrics demonstrated the effectiveness of our proposed RRN method; in particular, the reduced percentage of RMSE after normalization was more than 80%. Moreover, comparing the radiometric differences of five ground features, the surface reflectance curves of the cross-sensor optical Gaofen images showed smaller differences after normalization, and the RMSE was smaller than 50 with reduced percentages of about 50-80%.
Our proposed RRN method can effectively reduce the surface reflectance differences between cross-sensor optical Gaofen images. Improving the radiometric consistency in heterogeneous, multi-type land-use regions and applying our RRN method to change detection with multi-source images will be the subjects of our future work.