remote sensing Evaluation of Surface Reﬂectance Products Based on Optimized 6S Model Using Synchronous In Situ Measurements

: Surface reﬂectance (SR) estimation is the most essential preprocessing step for multi-sensor remote sensing inversion of geophysical parameters. Therefore, accurate and stable atmospheric correction is particularly important, which is the premise and basis of the quantitative application of remote sensing. It can also be used to directly compare different images and sensors. The Landsat-8 Operational Land Imager (OLI) and Sentinel-2 Multi-Spectral Instrument (MSI) surface reﬂectance products are publicly available and demonstrate high accuracy. However, there is not enough validation using synchronous spectral measurements over China’s land surface. In this study, we utilized Moderate Resolution Imaging Spectroradiometer (MODIS) atmospheric products reconstructed by Categorical Boosting (CatBoost) and 30 m ASTER Global Digital Elevation Model (ASTER GDEM) data to adjust the relevant parameters to optimize the Second Simulation of Satellite Signal in the Solar Spectrum (6S) model. The accuracy of surface reﬂectance products obtained from the optimized 6S model was compared with that of the original 6S model and the most commonly used Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) model. Surface reﬂectance products were validated and evaluated with synchronous in situ measurements from 16 sites located in ﬁve provinces of China: Fujian, Gansu, Jiangxi, Hunan, and Guangdong. Through the indirect and direct validation across two sensors and three methods, it provides evidence that the synchronous measurements have the higher and more reliable validation accuracy. The results of the validation indicated that, for Landsat-8 OLI and Sentinel-2 MSI SR products, the overall root mean square error (RMSE) calculated results of optimized 6S, original 6S and FLAASH across all spectral bands were 0.0295, 0.0378, 0.0345, and 0.0313, 0.0450, 0.0380, respectively. R 2 values reached 0.9513, 0.9254, 0.9316 and 0.9377, 0.8822, 0.9122 respectively. Compared with the original 6S model and FLAASH model, the mean percent absolute error (MPAE) of the optimized 6S model was reduced by 32.20% and 15.86% for Landsat-8 OLI, respectively. On the other, for the Sentinel-2 MSI SR product, the MPAE value was reduced by 33.56% and 33.32%. For the two kinds of data, the accuracy of each band was improved to varying extents by the optimized 6S model with the auxiliary data. These ﬁndings support the hypothesis that reliable auxiliary data are helpful in reducing the inﬂuence of the atmosphere on images and restoring reality as much as is feasible.


Introduction
Surface reflectance is the most basic remote sensing parameter in the solar reflectance spectral bands (visible spectrum and infrared spectrum), and is an important input parameter to obtain leaf area index, vegetation index, burn area identification, water quality validation efforts relied on an indirect comparison with corresponding data, such as the MODIS SR products that were simultaneously acquired by a well-calibrated sensor [9,21]; however, comparisons with these data are limited due to its much lower spatial resolution and AERONET does not have estimated values of surface reflectance. In addition, there are relatively few relevant studies on validating remote sensing surface reflectance by using a large number of synchronous measurement spectral data. In terms of data sources, the above studies mainly use a single data source, in particular, the differences between sensors are rarely involved in using measurements. In terms of the study area, although there are surface synchronous survey verification methods in the literature, reflectance retrieval and validation of synchronous in situ measurements in large areas of China are not involved at present. The studies are less concerned about whether the reflectance of the same ground object is different due to different geographical locations or surrounding environments.
In this study, we utilized MODIS atmospheric products reconstructed by CatBoost and 30 m ASTER GDEM data to adjust the relevant parameters to optimize the 6S model. Higher precision parameters were used for the 6S model. We applied the optimized 6S model to Landsat-8 OLI and Sentinel-2 MSI data sources and compared the results with the original 6S and FLAASH model results using a large number of synchronous measurements. It is important to emphasize that the traditional synchronous ground measurement method is the most direct and best validation method to obtain the surface reflectance of the objects. Multi-type synchronous surface spectral measurement data in the large-scale area were applied for validation of the surface reflectance of satellite images, a wide geographic distribution mainly takes into account different cover types and atmospheric conditions. Most importantly, the deficiency of indirect verification between sensors can be avoided.
The structure of the paper is as follows. The first part provides an introduction to the study; the second part details the data used in the study; the third part outlines the methods; the fourth section demonstrates how the measured data used in validating and evaluating the accuracy of surface reflectance products of different sensors; and the fifth and sixth sections present the discussion and conclusion.

In Situ Measurements
In this study, we validated and evaluated the SR products of different sensors using in situ measurements. This part describes the distributions and related information of the in situ measurements synchronized with Landsat-8 overpass time. We measured spectral data from more than 20 sites, but because of the weather and other reasons, we ruled out some sites with questionable data. Through our efforts, we obtained optimal spectral data from 16 sites of five provinces in 2019: Fujian, Jiangxi, Guangdong, Hunan, and Gansu.
Synchronous measurements were collected by on-site personnel (4-10) using the Analytical Spectral Devices (ASD) FieldSpec4 Hi-Res Surface High Spectroradiometer (350-2500 nm). The Spectroradiometer is equipped with a fiber-optic probe to retrieve spectral characteristics of ground sites in this study. The specific procedures for collecting spectral data are as follows: (1) Place the white board around the sample and make sure that the white board is in the same level position as the sample surface. The operator faces the sun and makes the optical fiber is aligned vertically with the white board to determine the optical fiber sampling height. (2) Optimize the Spectroradiometer. Re-optimize every 15-20 min or when lighting conditions and environmental conditions change significantly. Each time the target is changed, the spectroradiometer is re-optimized. (3) After the optimization, the optical fiber is still vertically aligned with the whiteboard, and the DN value of reflected light is collected. Then the DN value of the sample will be collected and compared with the average DN value of the whiteboard, and the reflectance of the sample is calculated.  (4) The optical fiber aims at the sample and makes the relative reflectance curve of the sample display stably on the interface. (5) The current spectral curve is stored. Five radiance spectra curves as a group are collected for each point, and dozens or even hundreds of groups of data are collected for each site. These results are then averaged (after manual removal of spurious spectra) by ViewSpecPro software to retain the optimal spectral curve and achieve a representative radiance spectrum. (6) The native spectral resolution across the full range (i.e., 3 nm in the VNIR and 8 nm in the SWIR) is internally resampled to 1 nm for the output spectra.
The overpass time and location of the next Landsat-8 OLI were calculated according to that of the previous large number of Landsat-8 OLI. Survey personnel arrived at the crossing site one hour in advance to perform a series of preparatory work, such as ASD warm-up of instruments and calibration of the white board. In general, the measurement started half an hour before the satellite crossed the area. Sample points were evenly selected to collect data and mark coordinates in the measurement area, the geolocation of the sample points was established using a GPS kit with a positional accuracy of approximately 1 m. The marked sample points were about 30 m apart. Synchronous measurements were obtained with vegetation, water, and bare land sites, in which their surface area was large and homogeneous. Figure 1 shows the sample points of route maps for 3 representative ground features.
(2) Optimize the Spectroradiometer. Re-optimize every 15-20 min or when lighting conditions and environmental conditions change significantly. Each time the target is changed, the spectroradiometer is re-optimized. (3) After the optimization, the optical fiber is still vertically aligned with the whiteboard, and the DN value of reflected light is collected. Then the DN value of the sample will be collected and compared with the average DN value of the whiteboard, and the reflectance of the sample is calculated. (4) The optical fiber aims at the sample and makes the relative reflectance curve of the sample display stably on the interface. (5) The current spectral curve is stored. Five radiance spectra curves as a group are collected for each point, and dozens or even hundreds of groups of data are collected for each site. These results are then averaged (after manual removal of spurious spectra) by ViewSpecPro software to retain the optimal spectral curve and achieve a representative radiance spectrum. (6) The native spectral resolution across the full range (i.e., 3 nm in the VNIR and 8 nm in the SWIR) is internally resampled to 1 nm for the output spectra.
The overpass time and location of the next Landsat-8 OLI were calculated according to that of the previous large number of Landsat-8 OLI. Survey personnel arrived at the crossing site one hour in advance to perform a series of preparatory work, such as ASD warm-up of instruments and calibration of the white board. In general, the measurement started half an hour before the satellite crossed the area. Sample points were evenly selected to collect data and mark coordinates in the measurement area, the geolocation of the sample points was established using a GPS kit with a positional accuracy of approximately 1 m. The marked sample points were about 30 m apart. Synchronous measurements were obtained with vegetation, water, and bare land sites, in which their surface area was large and homogeneous. Figure 1 shows the sample points of route maps for 3 representative ground features. According to the above protocols, we completed these massive synchronous in situ measurements. The distribution of all measured sites ( Figure 2) and field measurement scenario of each site ( Figure 3) are as follows: (1) Fujian: Rice at heading stage and water in Nanping; sand beach and water in Pingtan.
Sites 1-4 were located in Fujian Province. Site 1 includes a sample of vegetation in Nanping; the growth of rice was relatively homogeneous when collecting data. Sites 2 and According to the above protocols, we completed these massive synchronous in situ measurements. The distribution of all measured sites ( Figure 2) and field measurement scenario of each site ( Figure 3) are as follows: (1) Fujian: Rice at heading stage and water in Nanping; sand beach and water in Pingtan.
Sites 1-4 were located in Fujian Province. Site 1 includes a sample of vegetation in Nanping; the growth of rice was relatively homogeneous when collecting data. Sites 2 and 4 were water located in Nanping and Pingtan, respectively. The land cover of Site 3, in Pingtan, was sand beach sand. At this site, the measured sand was moist because of its being constantly buffeted by the sea currents.
Sites 5-7 were located in Gansu Province. Both sites 5 and 6 were the Gobi. The texture at site 5, radiation correction field in Dunhuang, was relatively uniform. Site 6 was located at Gobi in Jiayuguan, surrounded by the endless Gobi with flat terrain, uniform texture, and no other disturbance land types. Site 7 was wheat in Zhangye, the representative crop of northern China. The wheat was ripe at the time of measurement. Sites 8-11 were located in Jiangxi Province. Site 8 was the water in Shangrao, which was generally deep and of good quality. We specially chartered a boat to measure the reflectance of the water in the deep area. Site 9 was the rice whose leaves were already yellowish and about to mature in Shangrao. Site 10 and Site 11 were rice with a different growth period in Fuzhou, in which the spectral responses of different maturity vegetation were compared.
Sites 12-14 were located in Zhuzhou, Hunan Province. Site 12 was at rice stubble cover in which the rice was harvested. Site 13 was grass, which was green, evenly distributed. Sites 14 was at the Xiangjiang river, which covered the shallow and deep water zone.
Sites 15-16 were located in Meizhou, Guangdong province. Site 15 was the location of rice; the land surface type of the sample plot was a large area of mature rice about a few acres in size. Sites 16 was the lake water with different depths.
The satellite overpass time, measurement time, and weather conditions of each site were shown in Table 1.
Pingtan, was sand beach sand. At this site, the measured sand was moist because of its being constantly buffeted by the sea currents.
Sites 5-7 were located in Gansu Province. Both sites 5 and 6 were the Gobi. The texture at site 5, radiation correction field in Dunhuang, was relatively uniform. Site 6 was located at Gobi in Jiayuguan, surrounded by the endless Gobi with flat terrain, uniform texture, and no other disturbance land types. Site 7 was wheat in Zhangye, the representative crop of northern China. The wheat was ripe at the time of measurement.
Sites 8-11 were located in Jiangxi Province. Site 8 was the water in Shangrao, which was generally deep and of good quality. We specially chartered a boat to measure the reflectance of the water in the deep area. Site 9 was the rice whose leaves were already yellowish and about to mature in Shangrao. Site 10 and Site 11 were rice with a different growth period in Fuzhou, in which the spectral responses of different maturity vegetation were compared.
Sites 12-14 were located in Zhuzhou, Hunan Province. Site 12 was at rice stubble cover in which the rice was harvested. Site 13 was grass, which was green, evenly distributed. Sites 14 was at the Xiangjiang river, which covered the shallow and deep water zone.
Sites 15-16 were located in Meizhou, Guangdong province. Site 15 was the location of rice; the land surface type of the sample plot was a large area of mature rice about a few acres in size. Sites 16 was the lake water with different depths.
The satellite overpass time, measurement time, and weather conditions of each site were shown in Table 1.

Landsat-8 OLI and Sentinel-2 MSI
Landsat-8 was launched in February 2013 and its orbit was 705 km [22]. The imaging swath was 185 km and the revisit period was 16 days. The Landsat-8 OLI had seven multi-spectral reflectance bands. Collection 2 was publicly released in mid-2020. Its biggest features included sensor radiometric calibration and updated geometric accuracy. Landsat-8 Collection 2 Level-1 data was tested in this study. Sentinel-2 carries MSI [23]. Sentinel-2A and Sentinel-2B were launched in June 2015 and March 2017, respectively, into circular sun-synchronous 786 km orbits [24]. The data provide a 290 km imaging swath, and the revisit period was 10 days of each Sentinel-2 sensor and 5 days when combined.
Due to these different orbits and swath widths, OLI and MSI may detect certain targets on the same day. The accuracy of the SR products was validated using synchronous measurements during the Landsat-8 overpass. The overpass time of Sentinel-2 was close to that of Landsat-8, but not be fully synchronous with Landsat-8, and most of the objects observed were land objects with no mobility. We can assume that the surface does not change as the two satellites overpass.
Selecting the common and similar bands for experiments can avoid or weaken the differences between the data of different sensors to a certain extent ( Table 2). For Sentinel-2 MSI, band 8, with a resolution of 10 m, was taken as the NIR band for comparison.

Auxiliary Data
The influence of the atmosphere on the reflectance band of Landsat mainly includes aerosol, water vapor, and ozone [25]. We used three MODIS atmospheric products to optimize atmospheric parameters-MOD04 AOD values at 550 nm [26], MOD05 water vapor [27], and MOD07 total ozone [28].
ASTER GDEM is a Global Digital Elevation data product. The DEM data are completed based on the observations of NASA's new generation Earth observation satellite Terra. The data have a horizontal accuracy of 1 (about 30 m, 95% confidence) and an elevation accuracy of 20 m (95% confidence). The Shuttle Radar Topography Mission (SRTM) and meteorological data were used to reconstruct AOD.

Categorical Boosting(CatBoost)
The contribution of the atmosphere to remote sensor signal can exceed 80% [29], and the atmospheric component that interferes most is aerosol [30]. AOD is an important parameter to describe aerosol quantitatively [31]. However, there are several missing phenomena in MOD04 products. The mechanism for reconstructing the missing MOD04 products to obtain a more accurate AOD parameter is crucial for atmospheric correction results.
CatBoost is a gradient boosting decision tree framework [32] based on tree boosting. CatBoost's combination of iterations based on weak regressors can produce the theoretical expectation of a strong regressor, the same as the eXtreme Gradient Boosting (XGBoost) [33] and Light Gradient Boosting Machine (LightGBM). In this method, the traditional gradient estimation method is replaced by order boosting, which improves the generalization ability of the model and accuracy, and reduces the probability of overfitting [34]. For a variety of tasks, CatBoost is superior to other decision tree methods [35]. It is especially fit for the regression problem with noisy data and a number of input features [36], which is our AOD reconstruction. In this study, the hyperparameters are fine-tuned by the Bayesian Optimization algorithm [37]. After training with AOD data observed by remote sensing, the missing AOD was reconstructed with meteorological variables, day of the year, and elevation as independent variables. We formularized this regression problem as: where DOY is the day of the year (1-365/366), and Elev is the elevation, Mete t is the 17variable meteorological matrix at current day t. These data were provided by Yu Ding [38]. We used the reconstructed MOD04 (AOD) as the AOD parameter to input into the 6S model in the next section (Section 3.2).

Optimized Second Simulation of the Satellite Signal in the Solar Spectrum (O6S)
The aim of atmospheric correction is to remove contributions to the atmosphere from the total signal measured by satellite sensors [39]. It plays a crucial role in converting apparent radiation measured by satellite sensors into surface reflectance [40]. The same method was used to correct the atmospheric top reflectance of OLI and MSI sensors to surface reflectance to reduce the bias that might occur if different methods were used. In this study, three methods were used to compare the accuracy: optimized 6S model with auxiliary data, original 6S model, and FLAASH model.
The 6S is an advanced radiative transfer code, which is not affected by the characteristics of the study area and target type [41]. According to the input parameters, the atmospheric correction coefficients X a , X b and X c are calculated, using the 6S model, through these atmospheric correction coefficients and the TOA radiance (L). The SR (ρ) without atmospheric effects is obtained for each band. The calculation equation is as follows: The input parameters of the 6S model include geometric parameters, climate type, aerosol type, aerosol optical depth, target height, sensor height, spectral response function, and atmospheric correction mode. MODIS atmospheric products were used to optimize the climate type parameter (MOD05 water vapor products and MOD07 total ozone products) and AOD parameter (MOD04 AOD values at 550 nm reconstructed by CatBoost). DEM data were used to optimize the target elevation parameter. The procedures are as follows: (1) Convert the projection of MODIS atmospheric products (MDO04, MOD05, and MOD07) by HEG software. (2) The invalid values of the MODIS atmospheric products obtained in step (1) are reconstructed by the CatBoost (MOD04 AOD values at 550 nm) and Kriging (MOD05 water vapor products and MOD07 total ozone products) method. (3) Use the reconstructed MODIS atmospheric products, DEM, and other input parameters, the 6S model calculates atmospheric correction parameters. (4) Convert projection and resampling to make its projection and resolution consistent with Landsat-8 OLI and Sentinel-2 MSI data. (5) Finally, the surface reflectance image is calculated according to Formula (2).
The flow chart of the optimized 6S model is adapted according to Wang [9] as The flow chart of the optimized 6S model is adapted according to Wang [9] as Figure  4.

Method of Least Squares (MLS)
The degree of correspondence between the satellite surface reflectance products and the in situ measurements was examined by the method of least squares (MLS). MLS regression was performed on the scatter plot data, and the fitting results were quantified as (i) slope and intercept and (ii) the estimated R 2 of the fit.

Root Mean Square Error (RMSE)
RMSE between the ground measurements and the corresponding satellite data products were also estimated using Equation (3).
The RMSE value quantifies the accuracy of the SR product. Similar methods have been used in the past to describe the uncertainty of Landsat products [8], VIIRS [42], and MODIS SR products [43].

Mean Error (ME), Mean Percent Error (MPE), and Mean Percent Absolute Error (MPAE)
The mean surface reflectance difference between the satellite surface reflectance products and in situ measurements was calculated for the sites, which represents the directionality of the error. The MPE and the MPAE between satellite SR and ground truth measurements were calculated. Each is calculated as follows:

Method of Least Squares (MLS)
The degree of correspondence between the satellite surface reflectance products and the in situ measurements was examined by the method of least squares (MLS). MLS regression was performed on the scatter plot data, and the fitting results were quantified as (i) slope and intercept and (ii) the estimated R 2 of the fit.

Root Mean Square Error (RMSE)
RMSE between the ground measurements and the corresponding satellite data products were also estimated using Equation (3).
The RMSE value quantifies the accuracy of the SR product. Similar methods have been used in the past to describe the uncertainty of Landsat products [8], VIIRS [42], and MODIS SR products [43].

Mean Error (ME), Mean Percent Error (MPE), and Mean Percent Absolute Error (MPAE)
The mean surface reflectance difference between the satellite surface reflectance products and in situ measurements was calculated for the sites, which represents the directionality of the error. The MPE and the MPAE between satellite SR and ground truth measurements were calculated. Each is calculated as follows: where y i is the satellite surface reflectance value of each sample point; y i is the surface reflectance derived from in situ measurements of each sample point; and n is the number of sample points to be compared.

Validation and Evaluation
Results of Two Sensors 4.1.1. Landsat-8 OLI SR Validation and Evaluation Results Figure 5 shows scatter plots of the OLI SR product versus the measured SR values at each site. The MLS clearly shows the linear relationship between the OLI SR product and measured values. The difference between the Landsat-8 OLI SR product and the measured values was calculated by linear regression. The values of R 2 , RMSE, MPE, and MPAE of each band were also computed. According to the spectral response function of each sensor, the hyperspectral in situ measurements were weighted before comparing the accuracy. each site. The MLS clearly shows the linear relationship between the OLI SR product and measured values. The difference between the Landsat-8 OLI SR product and the measured values was calculated by linear regression. The values of R 2 , RMSE, MPE, and MPAE of each band were also computed. According to the spectral response function of each sensor, the hyperspectral in situ measurements were weighted before comparing the accuracy.
When considering the synchronous measured data, the MLS slopes ranged from approximately 0.6581 to 0.9286, across all sites and all methods. The measurements showed relatively high agreement with OLI surface reflectance product across all bands (except Coastal Aerosol band), as indicated by R 2 values were higher than 0.85 (the estimated R 2 value of Red band, SWIR-1 band, and SWIR-2 band reached about 0.95). Retrieval at the Coastal Aerosol band was the most difficult as indicated by the slopes of 0.8171, 0.6581and 0.7460 (according to the order of methods). However, the RMSE value of the NIR band was obviously higher than that of other bands by three methods. The minimum RMSE value was 0.0463 even after optimization. In terms of specific methods, compared with the original 6S model and FLAASH model, the optimized 6S with auxiliary data had a higher accuracy according to various indicators, especially in the Coastal Aerosol and Blue bands. For the Coastal Aerosol band, the MLS slope was raised to 0.8171, R 2 value was raised to 0.9237 and RMSE value was reduced to 0.0224, the MPAE value was reduced by 49.21% and 55.43%, respectively. For the Blue band, the MLS slope was raised to 0.8731, the R 2 value was raised to 0.9502, and the RMSE value was reduced to 0.0162, the MPAE value was reduced by 55.43% and 40.77%. The magnitude agreement between surface reflectance and in situ measurements is better than 40 Figure 6 shows the residual plots of the OLI SR product and the corresponding measured surface reflectance values. From the residual plot of each band, the ME values of the Blue, Green, and Red bands were the smallest, the larger ME value was observed in the NIR band. From the residual plot of each target, the ME values of Gobi in Dunhuang and Jiayuguan were relatively small. The ME values of the mature rice in Fuzhou, rice stubble, and grass in Zhuzhou were relatively large, and the reflectance values are obviously larger than the measurements. The OLI surface reflectance of rice and grass in the NIR band was significantly lower than in situ surface reflectance, and that of water was significantly higher than the in situ surface reflectance. According to the errors of different methods, the optimized 6S model had the minimum mean error as a whole. When considering the synchronous measured data, the MLS slopes ranged from approximately 0.6581 to 0.9286, across all sites and all methods. The measurements showed relatively high agreement with OLI surface reflectance product across all bands (except Coastal Aerosol band), as indicated by R 2 values were higher than 0.85 (the estimated R 2 value of Red band, SWIR-1 band, and SWIR-2 band reached about 0.95). Retrieval at the Coastal Aerosol band was the most difficult as indicated by the slopes of 0.8171, 0.6581and 0.7460 (according to the order of methods). However, the RMSE value of the NIR band was obviously higher than that of other bands by three methods. The minimum RMSE value was 0.0463 even after optimization. In terms of specific methods, compared with the original 6S model and FLAASH model, the optimized 6S with auxiliary data had a higher accuracy according to various indicators, especially in the Coastal Aerosol and Blue bands. For the Coastal Aerosol band, the MLS slope was raised to 0.8171, R 2 value was raised to 0.9237 and RMSE value was reduced to 0.0224, the MPAE value was reduced by 49.21% and 55.43%, respectively. For the Blue band, the MLS slope was raised to 0.8731, the R 2 value was raised to 0.9502, and the RMSE value was reduced to 0.0162, the MPAE value was reduced by 55.43% and 40.77%. The magnitude agreement between surface reflectance and in situ measurements is better than 40.76%, 38.63%, 20.85%, 26.53%, and 22.20% of the original 6S model, for Green, Red, NIR, SWIR-1, and SWIR-2 bands, respectively. Figure 6 shows the residual plots of the OLI SR product and the corresponding measured surface reflectance values. From the residual plot of each band, the ME values of the Blue, Green, and Red bands were the smallest, the larger ME value was observed in the NIR band. From the residual plot of each target, the ME values of Gobi in Dunhuang and Jiayuguan were relatively small. The ME values of the mature rice in Fuzhou, rice stubble, and grass in Zhuzhou were relatively large, and the reflectance values are obviously larger than the measurements. The OLI surface reflectance of rice and grass in the NIR band was significantly lower than in situ surface reflectance, and that of water was significantly higher than the in situ surface reflectance. According to the errors of different methods, the optimized 6S model had the minimum mean error as a whole. The validation results of the OLI SR product were different across all sites. Therefore, we conducted further analysis to validate the accuracy of the product. Here we only analyzed optimized 6S results that had been verified with higher accuracy. Table 3 shows the RMSE and MPAE values between obtained OLI SR product and in situ measurements of each site. The validation results of the OLI SR product were different across all sites. Therefore, we conducted further analysis to validate the accuracy of the product. Here we only analyzed optimized 6S results that had been verified with higher accuracy. Table 3 shows the RMSE and MPAE values between obtained OLI SR product and in situ measurements of each site. Combined with Figure 6, analysis was carried out from each band. In the Coastal and Blue bands, the SR product showed a certain difference when compared to in situ measurements. Reflectance at this level is related to rice at the heading stage, mature rice in Fuzhou, and water in Shangrao with RMSE values around 0.04. Sand beach in Pingtan, Gobi in Dunhuang, and Jiayuguan showed good agreement with measurements: the RMSE values were lower than 0.01 and the MPAE values did not exceed 10%. In the Green and Red bands, the accuracy of vegetation has been improved especially for rice at the heading stage, mature rice in Fuzhou. RMSE values reduced by 50% or more and the MPAE values dropped to about 20%. Other sites also had high accuracy in the two bands. In the NIR band, RMSE values were larger than those of the first four bands, reflectance at that level is associated with the vegetation dominant sites. Vegetation in the NIR band has a high reflectance platform, but is usually lower than the measurements, the RMSE values exceeded 0.05. Although vegetation had a large RMSE value in the NIR band, the MPAE value was relatively low, the MPAE value was basically below 20%. On the contrary, the RMSE value of water was relatively low, but the MPAE value was high. OLI SR product was more accurate in bare land, with a maximum MPAE value of 13.31%. In the SWIR-1 and SWIR-2 bands, the reflectance of vegetation decreased, which was consistent with the measurements. The reflectance of rice stubble in Zhuzhou was different from other vegetation in the two bands, and the rising reflectance value was close to that of the bare land. The RMSE values of bare land in these two bands increased with the increase of reflectance values, but the MPAE remained around 10%.
Generally, across all bands, the rice stubble in Zhuzhou and water in Shangrao sites showed the most spread and variation between OLI and in situ measurements. Both errors were concentrated in the long-wavelength bands. RMSE values of rice stubble in Zhuzhou ranged from 0.05 to 0.08. The MPAE of water in Shangrao increased sharply after the NIR band. For bare land, retrievals at the sand beach site in Pingtan and the Gobi site in Dunhuang and Jiayuguan provided a much closer match to the in situ measurements across all bands. Figure 7 shows scatter plots of the Sentinel-2 MSI surface reflectance product and measured surface reflectance values. The slopes of MLS ranged from 0.6393 to 1.0586, across all sites and all methods. Since there was no Sentinel-2 MSI overpass image at Pingtan test site in Fujian, the evaluation data did not contain the Pingtan test site. For all sites, the estimated RMSE results were very similar to the corresponding OLI results. The accuracy of the NIR band reflectance was relatively low, while the RMSE values of the other bands were relatively small. Similarly, the optimized 6S significantly improved the accuracy of each band. The optimization effect of the Coastal Aerosol band was the most obvious. In the Coastal Aerosol band, the RMSE values of the 6S and FLAASH models were 0.0370 and 0.0543, and the estimated R 2 values were 0.8493 and 0.8591, respectively. After optimization, the RMSE value was reduced to 0.0252 and the R 2 value was raised to 0.9501. The RMSE value decreased by 31.89% and 53.59%, and R 2 increased by 11.87% and 10.59%. For MSI sensors, optimized 6S still had the highest accuracy compared to the other two methods. Figure 8 shows the residual plots of the Sentinel-2 MSI surface reflectance data and the corresponding measured surface reflectance values. It can be seen that the mean errors of all spectral bands are positive (except for the SWIR-2 band), especially Coastal Aerosol and Blue bands, that is, the MSI sensor tends to overestimate the true reflectance of the ground. Similar to the OLI sensor results, the mean errors of the Green and Red bands were the smallest; the mean errors of the Coastal and NIR bands were relatively larger. From the residual plot of each site, the accuracy of Gobi in Dunhuang was the highest and the mean error was the minimum. The errors of rice in Fuzhou, rice stubble, and grass in Zhuzhou were relatively large, and most of them were lower than the measurements. From the different methods of each band, the results of the optimized 6S model also had the highest accuracy as a whole. Table 4 shows the RMSE and MPAE values between obtained MSI SR product and in situ measurements of each site. 0.0370 and 0.0543, and the estimated R 2 values were 0.8493 and 0.8591, respectively. After optimization, the RMSE value was reduced to 0.0252 and the R 2 value was raised to 0.9501. The RMSE value decreased by 31.89% and 53.59%, and R 2 increased by 11.87% and 10.59%. For MSI sensors, optimized 6S still had the highest accuracy compared to the other two methods.  Figure 8 shows the residual plots of the Sentinel-2 MSI surface reflectance data and the corresponding measured surface reflectance values. It can be seen that the mean errors of all spectral bands are positive (except for the SWIR-2 band), especially Coastal Aerosol and Blue bands, that is, the MSI sensor tends to overestimate the true reflectance of the ground. Similar to the OLI sensor results, the mean errors of the Green and Red bands were the smallest; the mean errors of the Coastal and NIR bands were relatively larger. From the residual plot of each site, the accuracy of Gobi in Dunhuang was the highest and the mean error was the minimum. The errors of rice in Fuzhou, rice stubble, and grass in Zhuzhou were relatively large, and most of them were lower than the measurements. From the different methods of each band, the results of the optimized 6S model also had the highest accuracy as a whole. Table 4 shows the RMSE and MPAE values between obtained MSI SR product and in situ measurements of each site.

Accuracy Comparison of Two Kinds of Reflectance Products
In order to avoid the possible deviation caused by different methods, the inversion accuracy of Landsat-8 OLI and Sentinel-2 MSI under the same method was compared. The band accuracy of each sensor has been analyzed above ( Figures 5-8), and the comprehensive accuracy of each site was compared here. There was no corresponding result for sites 3-4 and 7 because there was no corresponding MSI surface product for these sites.   The R 2 values showed that the R 2 values of vegetation were higher than of bare land, but the water was the lowest. The results with two data sources showed that the R 2 of water in Nanping was very low. R 2 values of Zhuzhou Xiangjiang river and Meizhou lake water were slightly different. For Zhuzhou Xiangjiang river, the R 2 value of OLI was 0.7665, and MSI was 0.8107. The R 2 value for MSI for this site increased by 5.77% compared to OLI. For Meizhou lake water, the R 2 values of OLI and MSI were 0.7068 and 0.6667, respectively. The R 2 of MSI decreased by 5.67%. The other sites had high consistency, differing by less than 1.55%.
The RMSE values showed that the larger errors were observed from mature rice in Shangrao, and rice at the heading stage in Fuzhou. The overall RMSE values of the water were low, depending on its low surface reflectance value. Zhuzhou grass showed a larger difference between the two data sources. The RMSE values of OLI and MSI were 0.0543 and 0.0473, with the RMSE value decreasing by 14.62%. The largest difference of other sites between them was only 4.76%.
The ME values of water in Nanping, water in Shangrao, and rice stubble in Zhuzhou, were relatively higher, the errors of rice at heading stage in Nanping, sand beach in Pingtan, and Gobi in Gansu were smaller. Meizhou lake water showed the largest difference between the two data sources, the ME value of MSI was twice that of OLI, even though their ME values were all tiny. The rest of the sites had roughly equal ME values.
The MPAE values were large in all sites of water, because the overall reflectance of water was very low, especially after the NIR band. The MPAE values of bare land were the lowest, the accuracy of the two kinds of reflectance products in bare land was relatively consistent and high. The site with the largest difference in MPAE values between the two data sources was Meizhou lake water. Figures 10 and 11 show the accuracy comparison of two kinds of data by the 6S model and FLAASH model, respectively. The overall results of the two methods were the same as those of the optimized 6S model; therefore, only optimized 6S comparison results were analyzed. The comparative information of each method is shown in Table 5.

Accuracy Comparison between Direct and Indirect Validation
We used MOD09GA surface reflectance products to indirectly validate reflectance products obtained from different data sources. The spatial resolution of MOD09GA is 500 m. Since MOD09 has a much lower resolution than OLI, a 3 × 3 MODIS pixel window was used to select pixels from uniform regions. If the difference between the maximum and minimum values of nine pixels is less than the predefined threshold, the central MODIS pixel is considered to be a uniform region [9,21]. Finally, we selected five sites for comparative validation: Nanping rice at heading stage, Dunhuang Gobi, Jiayuguan Gobi, Zhangye wheat, and Shangrao mature rice, in which their area exceeded the MODIS pixel size and was sufficiently homogeneous. Figure 12 shows the difference between direct and indirect comparisons (taking an optimized 6S model based on OLI sensor as an example). Since MOD09GA does not have a corresponding Coastal Aerosol band, we validated the six OLI bands. In general, the accuracy of MOD09GA validation was much lower than the measured data. At the shorter wavelengths, the reflectance values of MOG09GA were much lower than the inversion values, and the slopes of MLS were greater than 1. At the longer wavelengths, the accuracy of MOD09GA validation of Gobi features was high, and vegetation features were not stable.    The MSI sensor did not include the Zhangye wheat site. The validation results were similar to OLI. The complete comparison results are shown in Table 6.

Discussion
The best method for validating and evaluating the accuracy of surface reflectance products is using synchronous measurements [44]. This work mainly used three methods to obtain OLI and MSI surface reflectance products: (i) optimized 6S model with auxiliary data, (ii) original 6S model, and (iii) FLAASH model. Discuss this study from the following four aspects.

Accuracy Evaluation of Each Band
The accuracies of OLI and MSI SR products obtained by three methods were compared with the synchronous in situ measurements ( Figures 5-8). The analysis results show that the optimized 6S model with auxiliary data has the highest accuracy.
For the two kinds of sensors, the accuracy of Coastal Aerosol and Blue bands was the lowest, while the Green and Red bands showed the smallest validation deviations. Aerosols, water vapor, carbon dioxide, and ozone are the main components responsible for most atmospheric perturbations. The degree of atmosphere influence on each band was different: aerosols mainly affect the shorter wavelength bands (Coastal and Blue bands) and water vapor absorption mainly affects the longer wavelength bands (SWIR-1 and SWIR-2 bands) [19].
In the Coastal Aerosol and Blue band, the difference between SR product and measured data was the greatest [18], due to the more difficult inversion of surface reflectance for these bands-the presence of aerosols causes the Rayleigh scattering effect, especially for vegetation targets. This reflects the difficulty of performing atmospheric corrections reliably at shorter wavelengths. The input aerosol parameters in the 6S original model were insufficient (the default AOD is 0.14497), thus the retrieved SR was overestimated, which is also the reason for the large difference at shorter wavelengths. Therefore, the optimized 6S model adjusted the AOD and improved significantly at shorter wavelengths. It should be pointed out that atmospheric correction can never reach the optimal state and can only restore the true surface reflectance value as far as possible, especially at shorter wavelengths where atmospheric effects are usually greatest [13,17,45]. The effects of aerosols in the visible spectrum are difficult to correct for two main reasons. Firstly, due to the properties of aerosols [18,46], aerosol concentrations can change rapidly and significantly over time and space. Secondly, aerosols have complex scattering and absorption characteristics; these properties vary with the size, shape, density, and spectrum of aerosols [8]. Considering the accuracy of the derived surface reflectance, it is particularly important to parameterize the characteristics of aerosol models over space and time domains [47].
In the Green and Red bands, all methods had high accuracy, and the accuracy of the optimized 6S model was still the highest. The maximum value of RMSE was 0.0276. The MPE values of all methods were within 40%. The results showed that the influence of the atmosphere on these two bands was obviously lower than other bands [19].
In the NIR band, the RMSE values increased sharply, and the minimum was 0.0463 even after optimization. This is because the NIR band is a special band for both vegetation and water. There are more vegetation and water sites in this study, thus the error is obvious. Vegetation reaches a high reflectance platform in the NIR band, but the surface reflectance of water is close to 0 in this band. It is difficult for achieving such an ideal result.
In the SWIR-1 and SWIR-2 bands, the R 2 of all methods was almost higher than other bands. However, RMSE and MPAE values were relatively larger, which may be mainly due to the higher reflectance values of bare land and lower reflectance values of water in these two bands. The improvement of accuracy for the two bands was not obvious by the optimized 6S model compared with other bands. At longer wavelengths, however, water vapor is harder to quantify mainly because of its temporal and spatial variability.
As discussed above, the optimized 6S model provided the best performance surpassing all the other models. Thus, the 6S model needs to be corrected with necessary auxiliary data [48][49][50].

Accuracy Evaluation of Different Cover Types
For different land cover types, the accuracy of bare land is the highest, vegetation is the second and water is the lowest. Basically the same results are obtained by the three methods. Here we only discuss the results of the optimized 6S model. However, even the same type can vary slightly from place to place. Combined with Tables 3 and 4 and Figure 9, we had the following discussion in more detail. Figure 13 shows the mean reflectance difference between OLI/MSI SR products and in situ measurements for all vegetation sites. It indicates that the Zhuzhou rice stubble suffers from the largest errors in estimated surface reflectance (ME of up to −0.0377 and −0.0372, respectively) while the Meizhou mature rice site feature the smallest errors with MPAE of 16.54% (OLI) and 14.92% (MSI), respectively. This may be because the rice stubble is less homogeneous than other mature rice sites. Rice stubble also has no typical spectral characteristics of vegetation in these bands. It is obvious that Zhuzhou grass had small errors in all bands except the NIR band. This is because the measured reflectance of grass in this band was very high, exceeding 0.50, while the inversion reflectance did not achieve such an ideal effect, and the reflectance values were basically about 0.43. As a matter of fact, the results of almost all vegetation sites in the NIR band were lower than the measured data. This may be due to the strong reflection and transmission characteristics of vegetation in this band. At the same time, the reflectance of the near-ground observation was easily affected by the repeated reflection of the transmitted light through the lower blade, so the results of the near-ground observation were higher than those of the satellite sensor. The mean errors of vegetation sites in the last three bands were significantly higher than those in the first four bands, which depends on the reflectance values of each band, the MPAE values were just the opposite. tics of vegetation in this band. At the same time, the reflectance of the near-ground observation was easily affected by the repeated reflection of the transmitted light through the lower blade, so the results of the near-ground observation were higher than those of the satellite sensor. The mean errors of vegetation sites in the last three bands were significantly higher than those in the first four bands, which depends on the reflectance values of each band, the MPAE values were just the opposite.  Figure 14 shows the mean reflectance difference between OLI/MSI SR products and in situ measurements for all water sites. In the shorter wavelength bands, the reflectance of deep water is higher than that of shallow water; in the longer wavelength bands, there is little difference between them. SR products consistently overestimate the measured reflectance, as evidenced by the mean errors of most sites above 0 for all bands. There are probably two main reasons. On the one hand, the reflectance of water is very low, especially after the NIR band; on the other hand, water is fluid and not as stable as other land objects. The ME and RMSE value of all water sites were relatively low, but the MPAE values were high. The reason is the water reflectance in the NIR and SWIR bands were essentially 0 [19], even for extremely turbid waters [51], a small absolute change in the number can result in a large percentage change. This makes both measurement and inversion difficult.  Figure 14 shows the mean reflectance difference between OLI/MSI SR products and in situ measurements for all water sites. In the shorter wavelength bands, the reflectance of deep water is higher than that of shallow water; in the longer wavelength bands, there is little difference between them. SR products consistently overestimate the measured reflectance, as evidenced by the mean errors of most sites above 0 for all bands. There are probably two main reasons. On the one hand, the reflectance of water is very low, especially after the NIR band; on the other hand, water is fluid and not as stable as other land objects. The ME and RMSE value of all water sites were relatively low, but the MPAE values were high. The reason is the water reflectance in the NIR and SWIR bands were essentially 0 [19], even for extremely turbid waters [51], a small absolute change in the number can result in a large percentage change. This makes both measurement and inversion difficult.  Figure 15 shows the mean reflectance difference between OLI/MSI SR products and in situ measurements for all bare land sites. It shows that bare land sites provided the best validation accuracy with ME values below 0.08 [52]. This could be explained by the fact that bare land sites mostly experience clear skies. Another rationale could be the spectral reflectance of barren ground, which tends to be flatter than that of vegetation [52]. Beach and desert reflectance curves are different. This may be due to the fact that the concrete composition of different bare-ground materials is not quite the same. At the Pingtan sand beach site, the measured sand was moist because it was constantly buffeted by the sea currents. The Gobi is composed of gravel, fine sand, and a small amount of clay. The MPAE values of the two data were 6.46% and 6.48% at the Dunhuang Gobi site and 8.69% and 8.71% at the Jiayuguan Gobi site, respectively. This also proves that, even the same Gobi, the Dunhuang Correction Field has higher accuracy, which depends on the advantages of its unique geographical location and atmospheric conditions.  Figure 15 shows the mean reflectance difference between OLI/MSI SR products and in situ measurements for all bare land sites. It shows that bare land sites provided the best validation accuracy with ME values below 0.08 [52]. This could be explained by the fact that bare land sites mostly experience clear skies. Another rationale could be the spectral reflectance of barren ground, which tends to be flatter than that of vegetation [52]. Beach and desert reflectance curves are different. This may be due to the fact that the concrete composition of different bare-ground materials is not quite the same. At the Pingtan sand beach site, the measured sand was moist because it was constantly buffeted by the sea currents. The Gobi is composed of gravel, fine sand, and a small amount of clay. The MPAE values of the two data were 6.46% and 6.48% at the Dunhuang Gobi site and 8.69% and 8.71% at the Jiayuguan Gobi site, respectively. This also proves that, even the same Gobi, the Dunhuang Correction Field has higher accuracy, which depends on the advantages of its unique geographical location and atmospheric conditions. composition of different bare-ground materials is not quite the same. At the Pingtan sand beach site, the measured sand was moist because it was constantly buffeted by the sea currents. The Gobi is composed of gravel, fine sand, and a small amount of clay. The MPAE values of the two data were 6.46% and 6.48% at the Dunhuang Gobi site and 8.69% and 8.71% at the Jiayuguan Gobi site, respectively. This also proves that, even the same Gobi, the Dunhuang Correction Field has higher accuracy, which depends on the advantages of its unique geographical location and atmospheric conditions.

Accuracy Comparison of Different Data
The accuracy of the two kinds of reflectance products was compared using the same method (Figures 9-11 and Tables 3 and 4). The accuracy of the OLI SR product and MSI SR product was highly consistent at each site even though they did not cross on the same day. This confirmed the previous hypothesis that the surface did not change much between the two observations. The differences in spatial resolution between the OLI and MSI have a negligible effect on the results showing that the almost sites variability is stable at various scales below the full measurement area [53]. The best results here are found over the Dunhuang Gobi region, despite the mismatch in measurement and MSI overpass time. It should be pointed out that the accuracy comparison results of the OLI image on 22 October and MSI image on 19 October are different for different ground objects. The difference between the two kinds of the data source of Meizhou mature rice was much smaller than that of Meizhou lake water. Although the overpass time of Landsat-8 and Sentinel-2 were three days apart, the bigger reason should be that the water has the characteristics of fluidity and instability.
Comparing different validation methods ( Figure 12 and Table 6) indicated that due to the difference of sensors and the limitation of resolution between MOD09GA and OLI/MSI, the validation accuracy of MOD09GA in shorter wavelengths is lower than the measured data. MOD09GA data were unstable for vegetation. In the SWIR-1 and SWIR-2 bands, the reflectance of MOD09GA of Nanping rice was significantly higher than the inversion values, while the reflectance of MOD09GA of Shangrao rice was significantly lower than the measured values. The inversion values were normally distributed around the measured data. MOD09GA is suitable for validating large and uniform features such as Dunhuang Gobi, but it may be challenging if we want to validate specific features due to its much lower spatial resolution. Again, it's more important to emphasize that MOD09GA data has one of its biggest drawbacks, which is that it cannot be used to validate water. For MODGA09 data, the reflectance of the water does not conform to the normal reflectance characteristics of the water, and its reflectance value is obviously high. This is also the reason why we did not choose water sites in the comparison work. Synchronous measurements ensure the traceability of the data and eliminate these challenges.

Limitation and Future Perspectives
The optimized 6S model had improved accuracy in all bands for OLI and MSI data, the results show that the improvement effect is significant in shorter wavelength bands, while the improvement effect is not obvious in longer bands, especially for water features with their reflectance close to 0. We need to continuously improve the accuracy of the calculation of the surface reflectance in future studies, then apply the method to the different data and objects as much as possible. In the actual measurement, due to the short overpass time of the satellite, we cannot measure many kinds of ground objects at the same time, but can only measure representative cover types. For vegetation, the spectra of rice and wheat were mainly collected in this study, while other crops were not involved. Although we measured the spectral data of Pingtan sweet potato leaves and Zhangye oats, the measured data showed great fluctuation, because of sparse leaves, the influence of soil, and a large area of lodging oats. We had to eliminate these data so that we didn't compare the reflectance of many crops. The ground objects in the measurement field should be single and stable, with flat terrain, uniform surface, and good directional characteristics. Despite adopting the optimal measured data after screening, the accuracy of the Dunhuang correction field was still the highest, vegetation and water did not achieve such high precision.
The comparisons are shown in Figures 9-11 and Table 5 demonstrate that the agreement between the corresponding sites of Landsat-8 OLI and Sentinel-2 MSI sensors is very good, confirming the potential of combined use of Landsat-8 OLI and Sentinel-2 MSI products. However, we did not consider the agreement between the corresponding bands of sensors. We intend to discuss this issue in-depth in our upcoming work. What are the mechanisms for establishing the most direct and accurate relationship? If we want to carry out linear fitting between the measured data (the hyperspectral in situ measured values are weighted according to the spectral response function of each sensor) and the reflectance products of each band, we need to discuss whether the fitting function is related to specific methods. In order to test this hypothesis, it seems likely that a number of careful experimentation would be required. Of course, we also need to consider that the fitting function may have substantial regional differences, and we also can determine whether the discrepancies in reflectance values are relevant or not for each specific object.

Conclusions
In this study, the in situ measurements of surface reflectance synchronized with satellite data were obtained in large areas of China. Compared with indirect verification, it provided effective support for the accurate validation of satellite remote sensing surface reflectance products.
The optimized 6S, original 6S model, and FLAASH model were used to retrieve the surface reflectance of Landsat-8 OLI and Sentinel-2 MSI satellite data, and the accuracy analysis showed that the optimized 6S model had the highest accuracy among the three models. It can improve the accuracy of two kinds of satellite surface reflectance products.
Although the overpass times of the two sensors, OLI and MSI, are not completely synchronized, the accuracy of each site is highly consistent. In other words, the difference between the two sensors is minimal, and the combination of OLI and MSI sensors offers unprecedented possibilities for high-frequency time series analysis, greatly alleviating the problem of data not being available due to cloud coverage.
Author Contributions: Conceptualization X.Z.; methodology and software, X.L. and X.W.; data curation, writing-original draft preparation, and visualization, X.Z., X.L. and G.W.; writing-review and editing, Y.Z. and Z.Z.; supervision, and project administration, G.H. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.