Determination of the Key Comparison Reference Value from Multiple Field Calibration of Sentinel-2B / MSI over the Baotou Site

: Field calibration is a feasible way to evaluate space-borne optical sensor observations via natural or artiﬁcial sites on Earth’s surface with the aid of synchronous surface and atmospheric characteristic data. Since ﬁeld calibration is a ﬀ ected by the coupled e ﬀ ects of surface and atmospheric characteristics, the single calibration results acquired under di ﬀ erent surface and atmospheric conditions have di ﬀ erent biases and di ﬀ erent uncertainties, making it di ﬃ cult to determine the consistency of these multiple calibration results. In view of this, by assuming that the radiometric performance is invariant during ﬁeld calibration and the calibration samples are independent of each other, the surface–atmosphere invariant Key Comparison Reference Value (KCRV) is essentially derived from various calibration results. As the number of calibration samples increases, the uncertainty in the KCRV should decrease, and the KCRV should approach the “true” value. This paper addresses a novel method for estimating a weighted average value from multiple calibration results that can be used to compare each calibration result, and this value is accepted as the KCRV. Furthermore, this method is preliminarily applied to the ﬁeld calibration of the Multispectral Instrument (MSI) onboard the Sentinel-2B satellite via the desert target at the Baotou site, China. After employing a chi-squared test to verify that 12 calibration samples are independent from each other, the KCRV of the 12 calibration samples at the Baotou site is derived, which exhibits much lower uncertainty than a single sample. The results show that the KCRVs of the relative di ﬀ erences between the simulated and observed at-sensor reﬂectance are 3.75%, 5.11%, 6.09%, and 5.03% for the four bands of Sentinel-2B / MSI, respectively, and the corresponding uncertainties are 1.84%, 1.87%, 1.90%, and 1.93%. It is noted that the KCRV uncertainty obtained with only 12 calibration samples is reduced signiﬁcantly, and in the future, more samples in other instrumented sites will be used to validate this method thoroughly. This paper proposes a novel weighted average method for evaluating the sensors’ radiometric performances using multiple samples under di ﬀ erent atmospheric and surface conditions, and a reference value close to the corresponding “true” value, namely, the KCRV, is calculated for comparing each calibration result. In this method, the weight coe ﬃ cient assigned to each sample is calculated according to the calibration uncertainty of each sample, and a single KCRV is calculated as a weighted mean value when the calibration samples are independent from each other. The KCRV would be a state-of-art reference value for performing the consistency comparison among various calibration samples, and the degrees of equivalence for each sample are calculated. The lower degrees of equivalence are, the more accurate the sample is. reﬂectances 6.09%, bands, the uncertainties 1.84%, 1.87%, 1.90%, and 1.93%, respectively. calibration results KCRV, 8, 11, 12 closer the KCRV value samples.


Introduction
The ability to detect and quantify changes in Earth's environment and its global energy balance depends on satellite sensors that can provide calibrated, consistent measurements of Earth's surface the observations fall in and it is more confident than a single-sample result in the metrology field. The derived KCRV is much closer to the "true" value than a single-sample result and, besides, it could be used as a bridge for comparing the consistency of these multiple calibration results. Moreover, this method is tested by applying it to the field calibration of the Multispectral Instrument (MSI) onboard the Sentinel-2B satellite using the desert target at the Baotou site, and a surfaceatmosphere invariant KCRV of the 12 calibration samples is derived as the weighted average value of these samples. This paper is organized as follows. The test site and datasets are described in Section 2. Section 3 provides the method for determining the KCRV, and the results are presented in Section 4. The discussion and the conclusions are drawn in Sections 5 and 6, respectively.

Site Description and Dataset
In this study, the desert target at the Baotou site is used to perform the field calibration of Sentinel-2B/MSI. The characteristics of the Baotou site and the corresponding datasets used are described as follows.

Baotou Site Description
The Baotou site was developed under the support of the Ministry of Science and Technology of China in response to the need to collect data for the ever-increasing number of high-resolution onorbit sensors without the restrictive requirement of ground-based personnel. This site is located in Inner Mongolia, China, with geographical coordinates of 109.6°N, 40.85°E, 50 km away from Baotou city, covering a flat area of approximately 300 km 2 with an average altitude of 1270 m. The Baotou site features a semidry continental climate in the southern region of the Mongolian Plateau. The area is dominated by various land surfaces, including lakes, sand, bare soil, maize, and grass, and is equipped with various artificial targets, including portable and permanent targets. Among them, in consideration of the spatial resolution of Sentinel-2B/MSI, the uniform desert target is used to carry out the field calibration in this study. The desert target has a size of 300 m × 300 m to meet the needs of the radiometric calibration of moderate spatial resolution optical sensors (see Figure 1).
The Baotou site is one of four demonstrated sites that took part in the prototyping phase of RadCalNet along with sites at La Crau (France), Railroad Valley (USA), and Gobabeb (Namibia) [10]. This site currently provides a demonstrated global standard automated radiometric calibration service.

Dataset
We choose 2018 as the study year for all the datasets in this research. To avoid the impact of clouds on satellite observations, a total of 12 clear-sky calibration samples are used in the analysis. For clarity, the satellite image, surface reflectance, bidirectional reflectance distribution function (BRDF) data and the corresponding atmospheric parameters are described as follows.

Dataset
We choose 2018 as the study year for all the datasets in this research. To avoid the impact of clouds on satellite observations, a total of 12 clear-sky calibration samples are used in the analysis. For clarity, the satellite image, surface reflectance, bidirectional reflectance distribution function (BRDF) data and the corresponding atmospheric parameters are described as follows.

Satellite Image Datasets
As part of the Copernicus Program of the European Commission (EC), the European Space Agency (ESA) is currently operating the Sentinel-2 constellation of two polar orbiting satellites Sentinel-2A and Sentinel-2B, and each one is equipped with an optical imaging sensor MSI. Sentinel-2A was launched on 23 June 2015 and Sentinel-2B followed on 7 March 2017. The MSI acquires high spatial resolution (band 2-4, band 8, at 10 m; band 5-7, band 8A, band 11, band 12, at 20 m; band 1, band 9, and band 10, at 60 m) optical imagery from 13 bands in the visible, near infrared, and short wave infrared range of the electromagnetic spectrum [11]. In consideration of the target size, bands 2, 3, 4, and 8 with a resolution of 10m are used in the field calibration. Figure 2 shows the spectral response function of the four bands of Sentinel-2B/MSI. Remote Sens. 2020, 12, x FOR PEER REVIEW 4 of 26 As part of the Copernicus Program of the European Commission (EC), the European Space Agency (ESA) is currently operating the Sentinel-2 constellation of two polar orbiting satellites Sentinel-2A and Sentinel-2B, and each one is equipped with an optical imaging sensor MSI. Sentinel-2A was launched on 23 June 2015 and Sentinel-2B followed on 7 March 2017. The MSI acquires high spatial resolution (band 2-4, band 8, at 10 m; band 5-7, band 8A, band 11, band 12, at 20 m; band 1, band 9, and band 10, at 60 m) optical imagery from 13 bands in the visible, near infrared, and short wave infrared range of the electromagnetic spectrum [11]. In consideration of the target size, bands 2, 3, 4, and 8 with a resolution of 10m are used in the field calibration. Figure 2 shows the spectral response function of the four bands of Sentinel-2B/MSI.  Table 1 (columns 3-8). Under the framework of RadCalNet, two sets of automatic measurement systems are developed and installed in the desert target at the Baotou site (see Figure 3), which is well calibrated and SI Twelve scenes of Sentinel-2B/MSI images under clear-sky conditions over the Baotou site are used in this study for performing field calibration. The image acquisition time (IAT) in Coordinated Universal Time (UTC) and local time and the corresponding sun-observation geometry information (viewing zenith angle (VZA), viewing azimuth angle (VAA), sun zenith angle (SZA), and solar azimuth angle (SAA)) are listed in Table 1 (columns 3-8). Under the framework of RadCalNet, two sets of automatic measurement systems are developed and installed in the desert target at the Baotou site (see Figure 3), which is well calibrated and SI traceable for assuring the accuracy of its measurements. The system could continuously observe the Remote Sens. 2020, 12, 2404 5 of 25 surface-reflected spectral radiance at nadir (L g ), covering a spectral range of VNIR (400-1000 nm) every 2 min [12]. Meanwhile, a Cimel CE318 sun photometer was also installed at the Baotou site to acquire the aerosol optical thickness (AOT) and water vapor content (WVC), which can be used to simulate solar direct irradiance (E direct solar ) and diffuse sky irradiance (E di f f use sky ) for retrieving surface reflectance and then to generate the TOA spectral radiance. Since the Baotou site has joined the Aerosol Robotic Network (AERONET), the AOT values in the 440, 670, 870, and 1020 nm channels are retrieved by the AERONET portal from the measured direct solar irradiance every 15 min. Then, the AOT at 550 nm used in this study is calculated via logarithmic interpolation from the AOT in the 440, 670, 870, and 1020 nm channels. In addition, the measured direct solar irradiance in the 936 nm channel of the solar radiometer is used to retrieve the WVC using a modified Langley algorithm via the AERONET portal [12,13]. The AOT at 550 nm and WVC at the time of Sentinel-2B/MSI image acquisition over the Baotou site are linearly interpolated and shown in columns 9 and 10 of  With the radiance of the desert target collected by the automatic measurement system, the surface reflectance ( ρ ) can then be retrieved using Equation (1) [12].
where direct solar E and diffuse sky E are the solar direct irradiance and sky scattering irradiance reaching the ground, respectively, which can be simulated by the MODTRAN 5 atmospheric radiative transfer model with the aid of the synchronous AOT at 550 nm and WVC. Figure 4 shows the retrieved surface reflectances of the desert target at the acquisition time of the Sentinel-2B/MSI satellite images and the standard deviation of surface reflectance during the study dates. The standard deviation is generally less than 0.02. With the radiance of the desert target collected by the automatic measurement system, the surface reflectance (ρ) can then be retrieved using Equation (1) [12].
where E direct solar and E di f f use sky are the solar direct irradiance and sky scattering irradiance reaching the ground, respectively, which can be simulated by the MODTRAN 5 atmospheric radiative transfer model with the aid of the synchronous AOT at 550 nm and WVC. Figure 4 shows the retrieved surface reflectances of the desert target at the acquisition time of the Sentinel-2B/MSI satellite images and the standard deviation of surface reflectance during the study dates. The standard deviation is generally less than 0.02. In addition, the surface bidirectional reflectance factor (BRF) of the desert target at different viewing angles (VZA: from 0 degrees to 40 degrees with 10-degree intervals; VAA: from 0 degrees to 360 degrees with a 30-degree interval) from 10 a.m. to 3 p.m. are collected using an SVC HR-2014 spectrometer, which is controlled by a multi-angle automatic observation system. Then, the Roujean BRDF model [14], as expressed by Equation (2), is used to model the BRDF characteristics of the desert target.
where i θ is the SZA, v θ is the VZA of the spectrometer, ϕ is the relative azimuth angle, λ is the wavelength, and vol K and geo K are the volumetric and geometric scattering kernels, respectively.
Both the volumetric and geometric scattering kernels can be calculated from the sun and spectrometer viewing angles as reported by Roujean in reference [14]. iso f , vol f , and g eo f are the isotropic, volumetric, and geometric kernel coefficients, respectively, which can be determined with the aid of in situ BRF measurements using a linear least square fit method. Figure 5 shows the simulated BRF in bands 2, 3, 4, and 8 of Sentinel-2B/MSI using our constructed BRDF model when the solar zenith angle is set to 20 degrees.  In addition, the surface bidirectional reflectance factor (BRF) of the desert target at different viewing angles (VZA: from 0 degrees to 40 degrees with 10-degree intervals; VAA: from 0 degrees to 360 degrees with a 30-degree interval) from 10 a.m. to 3 p.m. are collected using an SVC HR-2014 spectrometer, which is controlled by a multi-angle automatic observation system. Then, the Roujean BRDF model [14], as expressed by Equation (2), is used to model the BRDF characteristics of the desert target.
where θ i is the SZA, θ v is the VZA of the spectrometer, ϕ is the relative azimuth angle, λ is the wavelength, and K vol and K geo are the volumetric and geometric scattering kernels, respectively. Both the volumetric and geometric scattering kernels can be calculated from the sun and spectrometer viewing angles as reported by Roujean in reference [14]. f iso , f vol , and f geo are the isotropic, volumetric, and geometric kernel coefficients, respectively, which can be determined with the aid of in situ BRF measurements using a linear least square fit method. Figure 5 shows the simulated BRF in bands 2, 3, 4, and 8 of Sentinel-2B/MSI using our constructed BRDF model when the solar zenith angle is set to 20 degrees.  In addition, the surface bidirectional reflectance factor (BRF) of the desert target at different viewing angles (VZA: from 0 degrees to 40 degrees with 10-degree intervals; VAA: from 0 degrees to 360 degrees with a 30-degree interval) from 10 a.m. to 3 p.m. are collected using an SVC HR-2014 spectrometer, which is controlled by a multi-angle automatic observation system. Then, the Roujean BRDF model [14], as expressed by Equation (2), is used to model the BRDF characteristics of the desert target.
where i θ is the SZA, v θ is the VZA of the spectrometer, ϕ is the relative azimuth angle, λ is the wavelength, and vol K and geo K are the volumetric and geometric scattering kernels, respectively.
Both the volumetric and geometric scattering kernels can be calculated from the sun and spectrometer viewing angles as reported by Roujean in reference [14]. iso f , vol f , and g eo f are the isotropic, volumetric, and geometric kernel coefficients, respectively, which can be determined with the aid of in situ BRF measurements using a linear least square fit method. Figure 5 shows the simulated BRF in bands 2, 3, 4, and 8 of Sentinel-2B/MSI using our constructed BRDF model when the solar zenith angle is set to 20 degrees.

Method for KCRV Determination
The field calibration method makes use of natural or artificial targets on the Earth's surface to evaluate satellite sensor observations. In the spectral range of 400-2500 nm, the surface reflectance is measured by a calibrated spectrophotometer; subsequently, the corresponding simulated TOA radiance or reflectance in band i for sample j (L TOA,sim i,j or ρ TOA,sim i,j ) is generated using MODTRAN 5 model driven by synchronous surface reflectance and atmosphere parameters. It is compared with the radiance or reflectance observed by the sensor (L TOA,obs i,j or ρ TOA,obs i,j ), and the relative difference is calculated. In this process, each simulated TOA radiance (or reflectance) is related to input uncertainties from instrumentation used to make surface reflectance and atmospheric measurements, the exoatmospheric solar irradiance model, spatial sampling of the site, and the MODTRAN 5 radiative transfer code. Therefore, an end-to-end uncertainty budget and the overall uncertainty of the simulated TOA radiance (or reflectance) is determined using a combination of analytical and Monte Carlo approaches. Then, the total uncertainty for each calibration result can be determined. On the basis of this, a KCRV is determined and, meanwhile, a chi-squared test is applied to verify the assumption that the calibration samples are independent from each other, and then the degree of equivalence is calculated to evaluate the consistency of the calibration results in multiple calibrations (see Figure 6).
Remote Sens. 2020, 12, x FOR PEER REVIEW 8 of 26 The field calibration method makes use of natural or artificial targets on the Earth's surface to evaluate satellite sensor observations. In the spectral range of 400-2500 nm, the surface reflectance is measured by a calibrated spectrophotometer; subsequently, the corresponding simulated TOA radiance or reflectance in band i for sample j ( TOA obs i j ρ ), and the relative difference is calculated. In this process, each simulated TOA radiance (or reflectance) is related to input uncertainties from instrumentation used to make surface reflectance and atmospheric measurements, the exoatmospheric solar irradiance model, spatial sampling of the site, and the MODTRAN 5 radiative transfer code. Therefore, an end-to-end uncertainty budget and the overall uncertainty of the simulated TOA radiance (or reflectance) is determined using a combination of analytical and Monte Carlo approaches. Then, the total uncertainty for each calibration result can be determined. On the basis of this, a KCRV is determined and, meanwhile, a chi-squared test is applied to verify the assumption that the calibration samples are independent from each other, and then the degree of equivalence is calculated to evaluate the consistency of the calibration results in multiple calibrations (see Figure 6).

Uncertainty Analysis Method Using the Monte Carlo Approach
The Monte Carlo approach, as presented by the Guide to Uncertainty Measurement (GUM) Supplement 1, involves the propagation of the distributions of the input sources of uncertainty by using the model to provide the distribution of the output [15][16][17]. This process is illustrated in Figure  7. In this process, the most appropriate probability density functions (PDFs) for each of the input quantities should be defined, and then a number of Monte Carlo trials should be selected. Generally, the greater the number of simulation trials, the greater the convergence of the results. In this study, this number is chosen by using an adaptive methodology, and by checking the stabilization of the

Uncertainty Analysis Method Using the Monte Carlo Approach
The Monte Carlo approach, as presented by the Guide to Uncertainty Measurement (GUM) Supplement 1, involves the propagation of the distributions of the input sources of uncertainty by using the model to provide the distribution of the output [15][16][17]. This process is illustrated in Figure 7. In this process, the most appropriate probability density functions (PDFs) for each of the input quantities should be defined, and then a number of Monte Carlo trials should be selected. Generally, the greater the number of simulation trials, the greater the convergence of the results. In this study, this number is chosen by using an adaptive methodology, and by checking the stabilization of the standard deviation of the output quantity, this number is set to 1000. The last stage is to summarize and express the results. According to GUM Supplement 1, the standard uncertainty, taken as the standard deviation of these generated values, is reported as the results [15].
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 26 standard deviation of the output quantity, this number is set to 1000. The last stage is to summarize and express the results. According to GUM Supplement 1, the standard uncertainty, taken as the standard deviation of these generated values, is reported as the results [15]. To better understand the application of Monte Carlo simulations to the estimation of measurement uncertainty, a case study of estimating the simulated TOA radiance uncertainty caused by AOT uncertainty is presented. For each calibration sample, first, the simulated TOA radiance (or reflectance) of Sentinel-2/MSI is generated using the MODTRAN 5 model driven by measured surface reflectance, AOT, WVC, and sun-viewing geometries. Then, by assuming the AOT uncertainty is approximately 0.01 and its PDF is assumed to be a Gaussian distribution, a Monte Carlo simulation is set to run M = 1000 trials of the MODTRAN 5 model using the error-added AOT and other input sources, and a new group of TOA radiances (or reflectances) is generated. Furthermore, the relative differences between the simulated TOA radiances (or reflectances) with AOT error and those without AOT error could be calculated, and its standard deviation is taken as the TOA radiance uncertainty caused by AOT uncertainty [18].

Determining the KCRV Using the Weighted Average Approach
First, the differences in band i Its corresponding uncertainty ( ) where ( ) Then, to avoid a higher contribution of samples with relatively low uncertainties, the KCRV is calculated as a weighted mean with a cutoff agreed upon by CCPR. The cutoff values for band i, , i cut off u − are calculated by [19,20]: Subsequently, the uncertainty ( )  To better understand the application of Monte Carlo simulations to the estimation of measurement uncertainty, a case study of estimating the simulated TOA radiance uncertainty caused by AOT uncertainty is presented. For each calibration sample, first, the simulated TOA radiance (or reflectance) of Sentinel-2/MSI is generated using the MODTRAN 5 model driven by measured surface reflectance, AOT, WVC, and sun-viewing geometries. Then, by assuming the AOT uncertainty is approximately 0.01 and its PDF is assumed to be a Gaussian distribution, a Monte Carlo simulation is set to run M = 1000 trials of the MODTRAN 5 model using the error-added AOT and other input sources, and a new group of TOA radiances (or reflectances) is generated. Furthermore, the relative differences between the simulated TOA radiances (or reflectances) with AOT error and those without AOT error could be calculated, and its standard deviation is taken as the TOA radiance uncertainty caused by AOT uncertainty [18].

Determining the KCRV Using the Weighted Average Approach
First, the differences in band i ∆ i,j between the simulated TOA reflectance ρ TOA,sim i,j and the observed reflectance ρ TOA,obs i,j for calibration sample j are determined, and it is represented as Equation (3): Its corresponding uncertainty u ∆ i,j is calculated as: where u ρ TOA,sim Then, to avoid a higher contribution of samples with relatively low uncertainties, the KCRV is calculated as a weighted mean with a cutoff agreed upon by CCPR. The cutoff values for band i, u i,cut−o f f are calculated by [19,20]: Subsequently, the uncertainty u ∆ i,j is adjusted by the cutoff, and the adjusted uncertainty for sample j in band i u adj ∆ i,j can be calculated as follows: Then, by assuming that the calibration samples are independent from each other, the weight coefficient in band i for each sample w i,j is determined by [19,20]: The weighted average value of the relative difference y is determined by [18,19]: The uncertainty associated with y is calculated as: Then, a chi-squared test is applied to check whether the assumption of independence of these calibration samples is justified. If the test does not fail, y is accepted as the KCRV; otherwise, the KCRV is null. Thus, the KCRV values represent the average scale of the comparison. The observed chi-squared value, χ 2 obs , is expressed as Equation (9), and the degrees of freedom are assigned as N − 1. The greater the observed chi-squared value, the stronger the calibration sample correlation.
The chi-squared test is regarded as failing if where χ 2 (γ) is the critical value of the chi-squared distribution, which can be acquired in reference [21], when the degrees of freedom are equal to N − 1. Finally, in order to evaluate the calibration result, the difference between the calibration results of each sample ∆ i,j and the KCRV d i is denoted as degrees of equivalence, and is calculated by Equation (12). The lower degrees of equivalence are, the more accurate the sample is.
The uncertainty for d i , u(d i ) is calculated as:

Results
In this study, the 12 samples in 2018 are used to calibrate Sentinel-2B/MSI, and a KCRV weighted by uncertainty is derived to evaluate the consistency of these samples.

Field Calibration Results for Each Sample
With the aid of the retrieved surface reflectance and synchronous atmospheric parameters (WVC, AOT), the simulated TOA radiances in the Sentinel-2B/MSI bands over the desert target at the Baotou site on the study dates are generated using the MODTRAN 5 model and then converted to TOA reflectances. Subsequently, these TOA reflectances for the 12 samples are compared with the observed values, which are calculated according to the digital number (DN) values extracted from the Sentinel-2B/MSI images over the ground target and the corresponding calibration coefficients published by the ESA. Then, the relative differences between the simulated TOA reflectances and the observed reflectances are also acquired using Equation (3). The compared results for the 12 calibration samples are shown in Figure 8. Each sample has a different relative difference due to the different surface and atmosphere characteristics. In general, the simulated TOA reflectances for samples 1-5, 7, 8, 11, and 12 are much closer to the observed reflectances than those for the other samples, and the corresponding relative differences are within 10% for bands 2, 3, 4, and 8. In addition, large relative differences of approximately 16% are also presented.
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 26 In this study, the 12 samples in 2018 are used to calibrate Sentinel-2B/MSI, and a KCRV weighted by uncertainty is derived to evaluate the consistency of these samples.

Field Calibration Results for Each Sample
With the aid of the retrieved surface reflectance and synchronous atmospheric parameters (WVC, AOT), the simulated TOA radiances in the Sentinel-2B/MSI bands over the desert target at the Baotou site on the study dates are generated using the MODTRAN 5 model and then converted to TOA reflectances. Subsequently, these TOA reflectances for the 12 samples are compared with the observed values, which are calculated according to the digital number (DN) values extracted from the Sentinel-2B/MSI images over the ground target and the corresponding calibration coefficients published by the ESA. Then, the relative differences between the simulated TOA reflectances and the observed reflectances are also acquired using Equation (3). The compared results for the 12 calibration samples are shown in Figure 8. Each sample has a different relative difference due to the different surface and atmosphere characteristics. In general, the simulated TOA reflectances for samples 1-5, 7, 8, 11, and 12 are much closer to the observed reflectances than those for the other samples, and the corresponding relative differences are within 10% for bands 2, 3, 4, and 8. In addition, large relative differences of approximately 16% are also presented.

Uncertainty Analysis Results for the Simulated TOA Reflectance
Uncertainty is an estimation of the magnitude of error, typically expressed in terms of a confidence interval within which the error lies. In this field calibration process, the simulated TOA reflectance relies on a set of representative input parameters, namely, the surface reflectance of the calibration target, atmospheric parameters, sun-sensor geometry, etc. The error budget process requires understanding the uncertainty of the TOA reflectance used for the field calibration associated with input parameter uncertainty. In this study, the uncertainty of the simulated TOA reflectance associated with uncertainty sources, including the surface reflectance, the spatial homogeneity, and the BRDF characteristics of the target and atmosphere parameters (WVC, AOT, and aerosol type), are analyzed based on the Monte Carlo simulation method.

Uncertainty Analysis Results for the Simulated TOA Reflectance
Uncertainty is an estimation of the magnitude of error, typically expressed in terms of a confidence interval within which the error lies. In this field calibration process, the simulated TOA reflectance relies on a set of representative input parameters, namely, the surface reflectance of the calibration target, atmospheric parameters, sun-sensor geometry, etc. The error budget process requires understanding the uncertainty of the TOA reflectance used for the field calibration associated with input parameter uncertainty. In this study, the uncertainty of the simulated TOA reflectance associated with uncertainty sources, including the surface reflectance, the spatial homogeneity, and the BRDF characteristics of the target and atmosphere parameters (WVC, AOT, and aerosol type), are analyzed based on the Monte Carlo simulation method. The uncertainty of the retrieved surface reflectance, the spatial homogeneity, and the BRDF characteristics significantly contribute to the uncertainty of the TOA reflectance. The three terms of uncertainties are quantified and clarified as follows.
1. Uncertainty analysis of the retrieved surface reflectance at the bottom of the atmosphere (BOA) (denoted as δ BOA,re f ).
As described in Equation (1), the uncertainty of the retrieved surface reflectance is determined by the uncertainties of the measured surface-reflected radiance and the estimated total irradiance arriving at the surface. The measured surface-reflected radiance is affected by the laboratory calibration accuracy (denoted as δ 1,1 ), the stability (denoted as δ 1,2 ), and the temperature sensitivity (denoted as δ 1,3 ) of the spectrometer during in situ measurement. Both the calibration accuracy and temperature sensitivity of the spectrometer are calibrated by the NMI [22]. The δ 1,1 is calculated by accounting for the uncertainty factors including the lamp-diffuser panel radiance source, source nonuniformity, spectrometer nonlinearity, dark signal, external and internal stray light, and spectrometer noise during calibration when determining the calibration coefficients of the spectrometer in the NMI laboratory. All the uncertainty factors are measured in the NMI laboratory under the ambient temperature of 25 • C and are given by the NMI. Moreover, the calibration coefficients of the spectrometer could change due to temperature sensitivity and the different environmental conditions at the test site compared to those in the laboratory. Therefore, the spectrometer is calibrated with a stable source when the ambient temperature T varies from 10 • C to 40 • C, and derived calibration coefficients at an ambient temperature of 0 • C-40 • C are divided by those at an ambient temperature of 25 • C to calculate a series of temperature correction coefficients f (λ, T). Then, an empirical temperature correction model is fitted with these experimental data, as shown in Equation (14). It is noted that δ 1,3 includes the uncertainty associated with the temperature correction model itself δ m 1,3 and the uncertainty associated with the gain correction due to the uncertainty in instrument temperature δ temp 1,3 . The δ m 1,3 could be calculated from the residual to the model. The δ temp 1,3 , associated with the gain correction due to an estimated 2 • C uncertainty, could be calculated by applying the law of the propagation of uncertainty. In addition, δ 1,2 is calculated from the standard deviation of 15 successive field measurements at intervals of 2 min over 15 min before and after the image acquisition time of Sentinel-2/MSI.
where a i and b i (i = 1, 2, 3) are the coefficients, which could be determined with those experimental data using a linear least square fit method. a 1 , a 2 , and a 3 are equal to 1.3579 × 10 −2 • C −1 , −4.6143 × 10 −5 nm −1 • C −1 , and 4.0289 × 10 −8 nm −2 • C −1 , respectively. b 1 , b 2 , and b 3 are equal to 6.4813 × 10 −1 , 1.1923 × 10 −3 nm −1 , and −1.0507 × 10 −6 nm −2 , respectively. The combined uncertainty result of the measured surface-reflected radiance covering 400 nm to 1000 nm is calculated based on error transfer theory (see Figure 9). It is noted that the values of δ 1,1 and δ 1,3 decrease rapidly with wavelength in the spectral range of 450-500 nm, while δ 1,3 increases smoothly with wavelength in the spectral range of 500-1000 nm. The combined uncertainty results of the measured surface-reflected radiance in bands 2, 3, 4, and 8 of Sentinel-2B/MSI are 2.46%, 1.23%, 1.22%, and 1.45%, respectively. δ (see Figure 10). In addition, 1,7 δ is estimated to be approximately 2% [24]; the Thuillier solar irradiance model used has a standard uncertainty of 1.5% at 450 nm, 0.9% at 650 nm, 1.1% at 850 nm, and 0.8% at 1550 nm [25,26], and the value of 1,8 δ at each wavelength is calculated using an interpolation method. The uncertainty of the MODTRAN 5 model is the dominating factor, approximately 2%, while the uncertainty of the WVC contributes less. Figure 10a shows an example of the uncertainties of total irradiance (400-1000 nm) arriving at the ground caused by WVC, AOT, aerosol types, solar irradiance, and MODTRAN 5 model on 7 April 2018. Figure 10b shows the combined uncertainty results of the estimated total irradiance on the study dates for reading clarity. Table 2 shows the combined uncertainty results of the estimated total irradiance arriving at the ground, corresponding to bands 2, 3, 4, and 8 of Sentinel-2B/MSI, and the corresponding uncertainties are within 3.1%. Figure 9. The combined uncertainty of the measured radiance covering 400 nm to 1000 nm due to the laboratory calibration accuracy, the stability, and temperature sensitivity of the spectrometer during in situ measurement.
The uncertainty of the total irradiance arriving at the ground is mainly related to the uncertainty of the WVC (δ 1,4 ), AOT (δ 1,5 ), aerosol types (δ 1,6 ), the MODTRAN 5 model (δ 1,7 ), and solar irradiance (δ 1,8 ). In this study, the uncertainties caused by the WVC and AOT are analyzed using Monte Carlo approaches by assuming an uncertainty of 12% in WVC and 0.01 in AOT [23]. To analyze the uncertainty caused by the mixed aerosol type of the rural and desert aerosols at the Baotou site, two new groups of simulated total irradiance are generated when the desert aerosol model and rural model are used, and their relative difference is calculated as δ 1,6 (see Figure 10). In addition, δ 1,7 is estimated to be approximately 2% [24]; the Thuillier solar irradiance model used has a standard uncertainty of 1.5% at 450 nm, 0.9% at 650 nm, 1.1% at 850 nm, and 0.8% at 1550 nm [25,26], and the value of δ 1,8 at each wavelength is calculated using an interpolation method. The uncertainty of the MODTRAN 5 model is the dominating factor, approximately 2%, while the uncertainty of the WVC contributes less. Figure 10a shows an example of the uncertainties of total irradiance (400-1000 nm) arriving at the ground caused by WVC, AOT, aerosol types, solar irradiance, and MODTRAN 5 model on 7 April 2018. Figure 10b shows the combined uncertainty results of the estimated total irradiance on the study dates for reading clarity. Table 2 shows the combined uncertainty results of the estimated total irradiance arriving at the ground, corresponding to bands 2, 3, 4, and 8 of Sentinel-2B/MSI, and the corresponding uncertainties are within 3.1%.   Based on the uncertainties of the measured reflected radiance and the total irradiance arriving at the ground, the combined uncertainties of the retrieved surface reflectance, corresponding to bands 2, 3, 4, and 8 of Sentinel-2B/MSI, are calculated and listed in Table 3. The uncertainties of the retrieved surface reflectance tend to decrease in all bands except for band 8, which would be caused by vapor absorption at approximately 820 nm and 940 nm.
The surface spatial heterogeneity of the desert target is estimated from the atmosphere-corrected airborne hyperspectral images over the Baotou site. This value is determined by the ratio of the standard deviation and the mean value of the target region in the image at the spatial resolution of the automated observation system (0.13 m) (see Figure 11). The uncertainty results of the surface spatial heterogeneity in bands 2, 3, 4, and 8 of Sentinel-2B/MSI are 1.32%, 1.22%, 1.12%, and 1.03%, respectively. Uncertainty (%) Wavelength (nm) Figure 11. The uncertainty of the surface spatial heterogeneity covering 400 nm to 1000 nm for the desert target.
3. Uncertainty results of surface BRDF characteristics (denoted as brdf δ ) Because the satellite sensor observes the target at an off-nadir angle and the target is a non-Lambert body, uncertainty would result from the VZA differences between the sensor and automated observation system. In this study, to analyze this term of uncertainty, the Roujean model, as expressed by Equation (2), is constructed with the aid of BRF in situ measurements of the target. On the basis of this Roujean model [14], the desert target BRF in the viewing geometries of the Sentinel-2B/MSI sensor and the corresponding BRF at nadir could be simulated, and their relative difference covering 400 -1000 nm is calculated as the uncertainty of the surface BRDF characteristics (see Figure  12). Consequently, the uncertainty of the surface BRDF characteristics for desert target in bands 2, 3, 4, and 8 of Sentinel-2B/MSI are 2.10%, 1.95%, 1.89%, and 1.92%, respectively. 3. Uncertainty results of surface BRDF characteristics (denoted as δ brd f ) Because the satellite sensor observes the target at an off-nadir angle and the target is a non-Lambert body, uncertainty would result from the VZA differences between the sensor and automated observation system. In this study, to analyze this term of uncertainty, the Roujean model, as expressed by Equation (2), is constructed with the aid of BRF in situ measurements of the target. On the basis of this Roujean model [14], the desert target BRF in the viewing geometries of the Sentinel-2B/MSI sensor and the corresponding BRF at nadir could be simulated, and their relative difference covering 400-1000 nm is calculated as the uncertainty of the surface BRDF characteristics (see Figure 12).
Lambert body, uncertainty would result from the VZA differences between the sensor and automated observation system. In this study, to analyze this term of uncertainty, the Roujean model, as expressed by Equation (2), is constructed with the aid of BRF in situ measurements of the target. On the basis of this Roujean model [14], the desert target BRF in the viewing geometries of the Sentinel-2B/MSI sensor and the corresponding BRF at nadir could be simulated, and their relative difference covering 400 -1000 nm is calculated as the uncertainty of the surface BRDF characteristics (see Figure  12). Consequently, the uncertainty of the surface BRDF characteristics for desert target in bands 2, 3, 4, and 8 of Sentinel-2B/MSI are 2.10%, 1.95%, 1.89%, and 1.92%, respectively.

4.
Uncertainty results of δ TOA,re f According to the error transfer theory, the combined uncertainty of the target surface reflectance (δ combin,re f ) is calculated from the above estimated uncertainties of the retrieved surface reflectance (δ BOA,re f ), heterogeneity (δ heter ), and BRDF (δ brd f ). Then, based on the Monte Carlo simulation approach, the TOA reflectance uncertainty (δ TOA,re f ) caused by δ combin,re f is derived. Figure 13 shows the values of δ TOA,re f for the desert target in 400-1000 nm on the study dates, and the corresponding values in bands 2, 3, 4, and 8 of Sentinel-2B/MSI are also given in Table 4. Note that δ TOA,re f tends to increase with the increase in wavelength and is within 4% for bands 2, 3, 4, and 8 of Sentinel-2B/MSI.  Figure 13 shows the values of   Figure 13. The uncertainty results of the predicted top-of-atmosphere (TOA) reflectance associated with the combined uncertainty of the desert target surface reflectance covering 400 nm to 1000 nm on the study dates. Similar to the uncertainty analysis of the aforementioned simulated total irradiance, the uncertainty of the predicted TOA reflectance caused by atmospheric parameters is mainly related to three factors, namely, WVC, AOT, and aerosol type. By assuming that the uncertainties of the measured WVC and AOT are 12% and 0.01, the Monte Carlo approach is used to analyze their resulting uncertainties in TOA reflectance. In addition, the TOA reflectance uncertainty caused by aerosol type is also calculated when the desert aerosol model and rural model are used. The combined uncertainty of the simulated TOA reflectance caused by the uncertainties of WVC, AOT, and aerosol type (δ atm ) is shown in Figure 14, and the corresponding uncertainty in bands 2, 3, 4, and 8 of Sentinel-2B/MSI are given in Table 5. It is worth noting that δ atm for each sample is within 0.71%. Similar to the uncertainty analysis of the aforementioned simulated total irradiance, the uncertainty of the predicted TOA reflectance caused by atmospheric parameters is mainly related to three factors, namely, WVC, AOT, and aerosol type. By assuming that the uncertainties of the measured WVC and AOT are 12% and 0.01, the Monte Carlo approach is used to analyze their resulting uncertainties in TOA reflectance. In addition, the TOA reflectance uncertainty caused by aerosol type is also calculated when the desert aerosol model and rural model are used. The combined uncertainty of the simulated TOA reflectance caused by the uncertainties of WVC, AOT, and aerosol type ( atm δ ) is shown in Figure 14, and the corresponding uncertainty in bands 2, 3, 4, and 8 of Sentinel-2B/MSI are given in Table 5. It is worth noting that atm δ for each sample is within 0.71%. . Figure 14. Combined uncertainty of the predicted TOA reflectance caused by atmospheric parameters on the study dates.   In addition to δ TOA,re f and δ atm , the uncertainties from the MODTRAN 5 model and solar irradiance should also be considered when calculating the combined overall uncertainty of the predicted TOA reflectance according to the error transfer theory (δ overall ). The uncertainties caused by the MODTRAN 5 model and solar irradiance are the same as δ 1,7 and δ 1,8 , respectively, because of their independent transfer characteristics. Figure 15 shows the δ overall in the spectral range of 400-1000 nm, and the corresponding values in bands 2, 3, 4, and 8 of Sentinel-2B/MSI are shown in Table 6. The δ overall in band 8 is larger than that in bands 2, 3, and 4, and the δ overall for all bands is within 4.6%.  Table   6. The overall δ in band 8 is larger than that in bands 2, 3, and 4, and the overall δ for all bands is within 4.6%.

KCRV and Degrees of Equivalence Based on Multiple Calibration Samples
With the aid of the overall uncertainty results of the TOA reflectance and the calibration accuracy of Sentinel-2B/MSI (approximately 5%), the uncertainty of the relative differences between the simulation and observation for each sample in bands 2, 3, 4, and 8 of Sentinel-2B/MSI can be calculated (see Table 7). Subsequently, according to the calibration results for 12 samples and their corresponding uncertainties, the weight coefficient for each sample, KCRV, and degrees of equivalence are determined using the aforementioned method. In this process, the cutoff uncertainty agreed upon by CCPR is given as 6.10%, 6.32%, 6.52%, and 6.64% for bands 2, 3, 4, and 8 of Sentinel-2B/MSI, respectively. After adjusting by the cutoff uncertainty, the calibration uncertainty for each sample (see Table 8) is used to calculate the weight coefficient according to Equation (7). The weight coefficients for 12 calibration samples in the four bands of Sentinel-2B/MSI are given in Table 9. The comparison of Table 6 with Table 9 reveals that the weight coefficient has a negative correlation with the adjusted validation uncertainties; the greater the adjusted uncertainties are, the smaller the weight coefficients are; the correlation coefficients are approximately -1 in the four bands of Sentinel-2B/MSI.
Subsequently, according to Equations (8) and (9), the weighted average value of the relative differences (y), namely, the KCRV and the corresponding uncertainties (u(y)) in each band, are determined and shown in Table 10. The weighted average values of the relative differences are 3.75%, 5.11%, 6.09%, and 5.03% for the four bands, and the uncertainties are 1.84%, 1.87%, 1.90%, and 1.93%, respectively.  Finally, the degrees of equivalence for the 12 calibration samples and the corresponding uncertainties are given in Tables 11 and 12, respectively. Note that samples 2-4, 7, 8, 11, and 12 are much closer to the KCRV value than the other samples, and the degrees of equivalence of these eight samples are lower than 0.05, especially for sample 3.

Discussion
In this study, the field calibration is used to evaluate the radiometric performance of Sentinel-2B/MSI, and first the difference between the simulated radiance and the observed value for each sample is calculated. However, since the field calibration is affected by the coupled effects of surface and atmospheric characteristics, the relative differences acquired under different surface and atmospheric conditions differ from each other and have different uncertainties, making it difficult to compare the consistency among multiple samples. In view of this, a new method for estimating a weighted average value from multiple calibration results is proposed according to the CCPR Key Comparison K1-a, and this value is accepted as the KCRV since the weighted average depicts where most of the observations fall in and it is more confident than a single-sample result in the metrology field. In this method, the weight coefficient for each sample is determined according to its corresponding uncertainty; the derived KCRV has very low uncertainty and becomes close to the "true" value when the number of calibration samples increases. This value could also be used as a reference value for conducting a consistency comparison of the calibration results from multiple samples. The relative difference for each sample is compared with the KCRV, and the degrees of equivalence are calculated. The lower degrees of equivalence are, the more accurate the sample is. This method is preliminarily applied to Sentinel-2B/MSI using the desert target at the Baotou site, as their calibration results are different and independent from each other since they have obviously different atmosphere characteristics of the study date.
Due to the limitation on the matchup of satellite observations and in situ measures, only 12 calibration samples are used in this study. These samples are concentrated in the warm months of March to October since the Baotou site has very cold and snowy winters, and the automatic measurement system does not work very well at low air temperatures. In the future, many more calibration samples over different surface types and synchronized ground measurements will be required, since the KCRV sensitivity depends on the number of calibration samples, and the uncertainty of the KCRV will decrease. This result indicates that the KCRV will be closer to the "true" value when more samples participate in the field calibration. To analyze the influence of the number of calibration samples on the uncertainty of the KCRV, we randomly select different numbers of calibration samples to determine the uncertainty of the KCRV, and the results are shown in Figure 16. The uncertainty of the derived KCRV decreases dramatically with the increase in sample number, which means that the derived KCRV becomes much closer to the "true" value as the sample number increases.
Remote Sens. 2020, 12, x FOR PEER REVIEW 24 of 26 Due to the limitation on the matchup of satellite observations and in situ measures, only 12 calibration samples are used in this study. These samples are concentrated in the warm months of March to October since the Baotou site has very cold and snowy winters, and the automatic measurement system does not work very well at low air temperatures. In the future, many more calibration samples over different surface types and synchronized ground measurements will be required, since the KCRV sensitivity depends on the number of calibration samples, and the uncertainty of the KCRV will decrease. This result indicates that the KCRV will be closer to the "true" value when more samples participate in the field calibration. To analyze the influence of the number of calibration samples on the uncertainty of the KCRV, we randomly select different numbers of calibration samples to determine the uncertainty of the KCRV, and the results are shown in Figure  16. The uncertainty of the derived KCRV decreases dramatically with the increase in sample number, which means that the derived KCRV becomes much closer to the "true" value as the sample number increases. Moreover, a case study is performed using these datasets at the Baotou site to test the proposed method for determining the KCRV from multiple calibration results, and this method could also be applied to other instrumented calibration sites in the framework of CEOS, especially RadCalNet, since the surface and atmospheric characteristics around the automatic measurement systems are included in RadCalNet sites, such as the La Crau site, the Railroad Valley site, and the Gobabeb site, and the uncertainty associated with the BOA and TOA reflectance is provided on the RadCalNet portal for each site. This situation makes it very feasible and effective to provide the input parameters for determining the KCRV by combining all the calibration results from RadCalNet sites. In the future, we will intend to apply this method to Sentinel-2B/MSI using the four demonstrated sites of RadCalNet so that the proposed method will be tested and can be improved. This test is already in our work schedule and will be carried out very soon.

Conclusions
Evaluating the radiometric performance of in-orbit sensors using field calibration during their long-term operational phase via ground targets is an essential quality assurance step that needs to be performed regularly. Field calibration using multiple samples under different surface and atmospheric conditions can be an effective way to reduce the uncertainty of extrapolating parameter estimates from a single-sample calibration. Due to the different surface and atmospheric characteristics, significant differences in the calibration results and the associated uncertainties for Moreover, a case study is performed using these datasets at the Baotou site to test the proposed method for determining the KCRV from multiple calibration results, and this method could also be applied to other instrumented calibration sites in the framework of CEOS, especially RadCalNet, since the surface and atmospheric characteristics around the automatic measurement systems are included in RadCalNet sites, such as the La Crau site, the Railroad Valley site, and the Gobabeb site, and the uncertainty associated with the BOA and TOA reflectance is provided on the RadCalNet portal for each site. This situation makes it very feasible and effective to provide the input parameters for determining the KCRV by combining all the calibration results from RadCalNet sites. In the future, we will intend to apply this method to Sentinel-2B/MSI using the four demonstrated sites of RadCalNet so that the proposed method will be tested and can be improved. This test is already in our work schedule and will be carried out very soon.

Conclusions
Evaluating the radiometric performance of in-orbit sensors using field calibration during their long-term operational phase via ground targets is an essential quality assurance step that needs to be performed regularly. Field calibration using multiple samples under different surface and atmospheric conditions can be an effective way to reduce the uncertainty of extrapolating parameter estimates from a single-sample calibration. Due to the different surface and atmospheric characteristics, significant differences in the calibration results and the associated uncertainties for those multiple samples are generated. In view of this, a reference value, which approaches the "true" value as much as possible, should be derived and used as a bridge for comparing the consistency of those calibration results. This paper proposes a novel weighted average method for evaluating the sensors' radiometric performances using multiple samples under different atmospheric and surface conditions, and a reference value close to the corresponding "true" value, namely, the KCRV, is calculated for comparing each calibration result. In this method, the weight coefficient assigned to each sample is calculated according to the calibration uncertainty of each sample, and a single KCRV is calculated as a weighted mean value when the calibration samples are independent from each other. The KCRV would be a state-of-art reference value for performing the consistency comparison among various calibration samples, and the degrees of equivalence for each sample are calculated. The lower degrees of equivalence are, the more accurate the sample is.
To test the method, it is also applied to the field calibration of Sentinel-2B/MSI using the desert target at the Baotou site, China. A single KCRV with low uncertainty for the 12 calibration samples is derived for bands 2, 3, 4, and 8 of Sentinel-2B/MSI. The KCRVs of the relative differences between the simulated and observed TOA reflectances are 3.75%, 5.11%, 6.09%, and 5.03% for the four bands, and the uncertainties are 1.84%, 1.87%, 1.90%, and 1.93%, respectively. Subsequently, the calibration results of the 12 samples are compared with derived KCRV, and samples 2-4, 7, 8, 11, and 12 are closer to the KCRV value than are the other samples.
Moreover, to thoroughly validate this method, RadCalNet sites will also be used to further increase samples so that the KCRV that is derived by combining all the calibration results from RadCalNet sites can approach the "true" value as much as possible.