Bias Correction of the Ratio of Total Column CH 4 to CO 2 Retrieved from GOSAT Spectra

: The proxy method, using the ratio of total column CH 4 to CO 2 to reduce the effects of common biases, has been used to retrieve column-averaged dry-air mole fraction of CH 4 from satellite data. The present study characterizes the remaining scattering effects in the CH 4 /CO 2 ratio component of the Greenhouse gases Observing SATellite (GOSAT) retrieval and uses them for bias correction. The variation of bias between the GOSAT and Total Carbon Column Observing Network (TCCON) ratio component with GOSAT data-derived variables was investigated. Then, it was revealed that the variability of the bias could be reduced by using four variables for the bias correction—namely, airmass, 2 μm band radiance normalized with its noise level, the ratio between the partial column-averaged dry-air mole fraction of CH 4 for the lower atmosphere and that for the upper atmosphere, and the difference in surface albedo between the CH 4 and CO 2 bands. The ratio of partial column CH 4 reduced the dependence of bias on the cloud fraction and the difference between hemispheres. In addition to the reduction of bias (from 0.43% to 0%), the precision (standard deviation of the difference between GOSAT and TCCON) was reduced from 0.61% to 0.55% by the correction. The bias and its temporal variation were reduced for each site: the mean and standard deviation of the mean bias for individual seasons were within 0.2% for most of the sites.


Introduction
Atmospheric methane (CH4) is the most significant anthropogenic greenhouse gas after carbon dioxide (CO2) and is emitted from both anthropogenic and natural sources. Satellite observation, which can obtain data over wide areas, is effective to elucidate the CH4 budget over the globe. In the last 15 years, the column-averaged dry-air mole fraction of methane (XCH4) has been retrieved from the spectra of the backscattered Short-Wavelength InfraRed (SWIR) sunlight measured by sensors onboard satellites [1][2][3][4][5][6]. Satellite-derived XCH4 data have been applied to the inverse modeling of CH4 sources and sinks [7][8][9][10][11]. High precision and small bias in spatiotemporal variation are required for the XCH4 data to be used in inverse modeling [12,13]. It is possible that even a small regional bias (0.5%) in the XCH4 data can lead to significant errors in regional source and sink estimation [12]. Optical path length modification due to the light scatterings by aerosols and clouds is a large source of error for the satellite-based SWIR retrievals [14][15][16]. The degree of optical path length modification depends on the abundance, optical properties, and vertical distributions of aerosols and clouds and the reflectance of ground surfaces.
Two retrieval methods have been used to reduce systematic biases due to atmospheric scatterings: the full-physics method and the proxy method. In the full-physics method, the existence of aerosols is described in the forward model, and the aerosol-related parameters are simultaneously retrieved with the gas abundance [17,18]. In the proxy method, information on the optical path length modification for the CH4 absorption band is obtained from that for the adjacent CO2 absorption band [2] (Equation (1) where XCH4,clr and XCO2,clr are XCH4 and XCO2 retrieved under a clear-sky assumption (no aerosols and clouds are assumed) and XCO2,mdl is XCO2 from the numerical model. It is assumed that, in the ratio component (XCH4,clr/XCO2,clr), the impacts of aerosol and clouds cancel each other out between XCH4,clr and XCO2,clr. It is also assumed that the relative variation of XCO2 is much smaller than that of XCH4, and that XCO2 is well represented by the numerical model. The proxy method is expected to offer a larger amount of useful retrieved data than the full-physics method, since highly cloudand aerosol-loaded scenes are difficult to handle in the current full-physics algorithms [19][20][21][22]. Both errors in the ratio component and the model XCO2 lead to errors in the resulting XCH4. Butz et al. [23] showed that the scattering-related errors are not perfectly canceled out in the ratio component depending on atmospheric and ground surface conditions (i.e., cirrus and aerosol load and surface albedo). Schepers et al. [19] discussed the possibility that the temporal variation of bias in proxy XCH4 corresponds to that of bias in the ratio component. In the case of inverse modeling, Parker et al. [24] suggested that it is beneficial to use the ratio component with each own XCO2 model that is consistent with the XCH4 model used in the inversion. The ratio component can also be directly inverted [25][26][27]. The ratio component derived from the Greenhouse gases Observing SATellite (GOSAT) data has been validated by comparing it with that derived from the Total Carbon Column Observing Network (TCCON) data [19,24,26,28,29]. However, it has not been fully investigated what range of cloud and aerosol load permits the canceling out of scattering-related errors in the ratio component. In general, the criteria for the cloud and aerosol screening have been chosen by the algorithm developer. Scattering-related errors are expected to remain and to cause bias in the resulting XCH4, especially when relaxing the data-screening criteria, although this increases the data throughput. Bias corrections of the proxy-based XCH4 have been conducted; however, a simple linear relationship between the bias and the surface albedo [30] and a simple global bias correction [29,31] have been used.
The present study sought to characterize the bias in the GOSAT ratio component and develop a method for correcting the bias while considering the atmospheric scattering effects. GOSAT has been operating for more than 10 years, allowing us to investigate the variation of the bias with time and space and its factors. The ratio component derived from TCCON data was used as the ground truth. In Section 2, the data used and its processing are described. In Section 3, the relationship between the bias and the related variables derived from GOSAT data is investigated. In Section 4, the bias correction is conducted based on the results of Section 3, and the corrected results are evaluated.

GOSAT Data
GOSAT was launched on 23 January 2009 and is on a sun-synchronous orbit at 666 km altitude with 3-day recurrence and a descending node around 13:00 local time. It is equipped with two instruments: the Thermal And Near-infrared Sensor for carbon Observation-Fourier Transform Spectrometer (TANSO-FTS) and the Cloud and Aerosol Imager (TANSO-CAI). The TANSO-FTS has three bands in the Short-Wavelength InfraRed (SWIR) region (an O2 A band, a weak CO2 absorption band, and a strong CO2 absorption band (Bands 1, 2, and 3) centered at 0.76, 1.6, and 2.0 μm, respectively) and records two orthogonal polarization components (hereafter called P/S components). For the signal processing of the TANSO-FTS, the amplifier gain level can be controlled at different levels, high (H) and medium (M), according to the brightness of the target. Gain M is used over bright surfaces such as the Sahara and central Australia. The instantaneous field of view (IFOV) of the TANSO-FTS is 15.8 mrad, which corresponds to a circular surface footprint of about 10.5 km in diameter at nadir. The TANSO-FTS L1B product (radiance spectral data) version V210.210 was used in the present study. We used spectra acquired from April 2009 to December 2018. The sensitivity degradation of the TANSO-FTS was corrected using a radiometric degradation model that is based on the on-orbit solar calibration data [32]. The P and S polarization components of the observed spectra were synthesized to produce a total intensity spectrum [5]. The TANSO-CAI is a push-broom imager and has four narrow bands in the near-ultraviolet to near-infrared regions centered at 0.38, 0.674, 0.87, and 1.6 μm with spatial resolutions of 0.5, 0.5, 0.5, and 1.5 km, respectively, for nadir pixels. The TANSO-CAI L2 cloud flag product (integrated clear confidence level for each TANSO-CAI pixel) version V02.00 was used in the present study. The integrated clear confidence level expresses the cloudy area with 0, the clear area with 1, and the ambiguous area with a numerical value between 0 and 1 [33].

Retrieval
The spectral windows of 1.626-1.695 μm and 1.567-1.618 μm within the TANSO-FTS Band 2 were used to retrieve XCH4,clr and XCO2,clr under the clear-sky assumption, respectively, using the same retrieval scheme as in the NIES TANSO-FTS L2 SWIR full-physics retrieval [5,21,34]. The atmospheric column was divided into 15 layers from the surface to 0.1 hPa with a constant pressure difference, and the average gas concentration for each layer was retrieved. As an indicator of optical path length modification, the surface pressure (Psrf) under the clear-sky assumption was also retrieved from the TANSO-FTS Band 1 spectra. The state vector for the retrieval of each band also included the surface albedo and the wavenumber dispersion. The surface albedo was retrieved at several wavenumber grid points within each band (CH4, CO2, and O2 A) [5]. The mean value was calculated for each band and was used in the following analysis. Data satisfying all the following criteria were used for the subsequent analysis: (1) several data-quality flags stored in the TANSO-FTS L1B product and spectrum quality check utilizing the out-of-band spectra [34] are set as OK; (2) the solar zenith angle is < 70°; (3) the land fraction of the TANSO-FTS footprint is ≥ 60%; (4) the mean squared value of the residual spectrum of the CO2 band is ≤ 4; and (5) the degree of freedom for signals (DFS) is ≥ 1 for both XCH4,clr and XCO2,clr.

Variables for Explaining the Bias in the Ratio Component
Butz et al. [23] evaluated the accuracy of the ratio component and analyzed the error sources (the reasons why the scattering-related errors were not canceled in the ratio component) using simulated satellite measurements. They used cloud-free aerosol-loaded and cirrus-loaded scenes that were assumed to be the targets of the full-physics method. They showed that the primary sources of error were the difference in surface albedo between the CH4 band and the CO2 band and the difference in the retrieval sensitivity to scattering effects at each height level between these bands. The impact of these sources is expected to vary according to the amount and vertical distribution of the scattering materials.
In the studies validating the ratio component derived from the actual GOSAT data [19,24,26,28], the cloud screening was conducted using the cloud fraction within the IFOV of the TANSO-FTS provided by the TANSO-CAI onboard GOSAT. The TANSO-CAI is prone to fail to detect optically thin cirrus clouds [35]. In several studies (e.g., [19]), cirrus-loaded scenes were screened out using the information from the TANSO-FTS 2 μm band. More specifically, if the TANSO-FTS signal level at the strong water vapor absorption channels exceeds the noise level, elevated scattering materials (mainly cirrus cloud) are expected [5,36]. However, it has not been fully addressed how well the cloud fraction and the 2 μm band signal are related to the bias in the ratio component (i.e., the systematic difference between GOSAT and TCCON). Therefore, we investigated the variation of the bias with the cloud fraction and the 2 μm band signal. The cloud fraction (fc) was defined as the ratio of TANSO-CAI pixels with an integrated clear confidence level lower than 0.33 and all TANSO-CAI pixels within the TANSO-FTS IFOV in this study. The TANSO-CAI tends to identify the pixels over snow and ice surfaces as cloudy pixels. Thus, we calculated the Normalized Difference Snow Index (NDSI = (ρO2 − ρCH4)/(ρO2 + ρCH4), where ρO2 and ρCH4 are the retrieved surface albedo for the O2 A band and the CH4 band, respectively), and used data having NDSI ≤ 0.4 for investigating the relationship between the bias and fc. The threshold value of 0.4 was empirically determined ( Figure S1). The radiance at the strong water vapor absorptive channels normalized with its noise level in the TANSO-FTS Band 3 (I2μm) was calculated in a manner similar to [5,34].
In the bias correction of the proxy method, possible error sources (e.g., those indicated by Butz et al. [23]) have hardly been considered. The correction has been conducted based on the relationship between the bias and the retrieved surface albedo [30] and by a simple global bias correction [29,31]. The bias in the ratio component has also been corrected using the surface albedo [26]. Then, we investigated the relationship between the bias in the ratio component and the related variables while considering fc and I2μm. For the related variables, the difference in the surface albedo between the CH4 and CO2 bands, the surface albedo, the vertical profile of CH4 and CO2, the airmass, and the deviation of the clear-sky surface pressure from its prior value were considered. The details of the variables are described below.
The differences in surface albedo and the vertical profile have been revealed as important error sources [23]. For the difference in surface albedo, the retrieved albedo values (ρCH4 and ρCO2) were used (Δalb = ρCH4 − ρCO2). Albedo itself (ρCH4) was also used, since it has been used to explain the bias of the proxy method [26,30] and the full-physics method [37]. For the vertical profile, the partial column-averaged dry-air mole fractions were calculated for Layers 1-7 (upper atmosphere, 0-0.47Psrf hPa (XCH4,upper and XCO2,upper)) and for Layers 12-15 (lower atmosphere, 0.73Psrf-Psrf hPa (XCH4,lower and XCO2,lower)). See Section 2.2 for the definition of layers. The range of the calculation (1-7 layers and 12-15 layers) was decided by considering the averaging kernel ( Figure S2). It is well-known that the SWIR retrieval has a slight sensitivity to the detailed profile but has a known sensitivity to the lower atmosphere (e.g., [38]); therefore, the ratios (RCH4 = XCH4,lower/XCH4,upper and RCO2 = XCO2,lower/XCO2,upper) were used to represent the characteristics of the vertical profile. Airmass is expected to be related to the impact of optical path length modification on the retrieval of XCH4,clr and XCO2,clr. The approximate airmass was calculated as (1/cos(solar zenith angle) + 1/cos(observing zenith angle)), which was similar to the previous studies that conducted bias correction for the fullphysics method by considering airmass [37,39]. For the full-physics method, the deviation of the retrieved surface pressure from its prior value was also used for the bias correction [37,39]. In the case of the clear-sky retrieval, the deviation simply indicates the degree of optical path length modification for the O2 A band and has been used for the cloud screening [40]. The deviation is expected to contain information about the scattering effect by aerosols that can be hardly accounted for by fc and I2μm. Then, the deviation of the clear-sky surface pressure from its prior value was calculated as (ΔPsrf = Psrf,retrieve − Psrf,prior).

TCCON Data and Matching with GOSAT Data
TCCON XCH4 and XCO2 data (GGG2014) from 26 sites  were used as the ground truth to validate the GOSAT ratio component. The map of the sites is shown in Figure A1, and the overview of the sites is shown in Table A1. The TCCON XCH4 and XCO2 data used in the present study (XCH4,TCCON and XCO2,TCCON) represent the mean values measured at each TCCON site within ±30 min of the GOSAT observation time. GOSAT data were selected within a ±2° latitude/longitude box centered at each TCCON site and within the difference in altitude between GOSAT (average within footprint) and TCCON site of 400 m. The ratio component of TCCON was then calculated (XCH4,TCCON/XCO2,TCCON). The relative difference (Δratio) was calculated to evaluate the GOSAT ratio component as where Xratio,G and Xratio,T are the ratio components of GOSAT and TCCON, respectively. Most of the matched GOSAT data were acquired with gain H. Therefore, the data acquired with gain H were used in the following analysis. The data acquired with gain M are briefly addressed in the latter part of the analysis.

Comparison Between GOSAT and TCCON Under Cloud-Free Conditions
First, in order to confirm the baseline of the bias, Xratio,G was compared with Xratio,T under the conditions in which cloud-free scenes were expected (fc = 0 and I2μm ≤ 1). Figure 1 shows the scatterplot of the ratio component and the latitudinal variation of the bias, precision of single scan, and interseasonal bias for each TCCON site. The bias and precision are defined as the mean and standard deviation of Δratio, respectively. This precision value was used with the number of data to calculate the standard error of the mean value. The interseasonal bias is the standard deviation of bias values of the four seasons (DJF, MAM, JJA, SON) regardless of year, which has been used to represent the seasonal variability of bias [68]. The intersite bias is the standard deviation of bias values for individual TCCON sites, which is related to the spatial variability of bias. Only sites having more than nine data points were included in the calculation of intersite bias. The intersite bias of 0.14% might indicate that the spatial variation of bias is sufficiently small; however, the influence of loosening the cloud screening criteria and the temporal variation of bias should be investigated. Previous studies validating the GOSAT ratio component using TCCON data reported that the bias, precision, and intersite bias were about 0.2-0.6%, 0.5-0.7%, and 0.15-0.2%, respectively [19,24,26,28]. Although the retrieval scheme, the version of the TANSO-FTS L1B product, the number of TCCON sites used, the data matching criteria, the cloud screening method, and other details of the data screening differ between the present study and the previous studies, the overall results were comparable. Table A2 shows that there was no clear variation of bias with tightening of the matching criteria of GOSAT and TCCON, but the precision was improved (there was a decrease in the standard deviation). Then, we assessed the influence of the matching criteria on our analysis and confirmed that the results shown below were hardly affected by the matching criteria (Appendix A). We also assessed the relationship between Δratio and Fractional Variation in Solar Intensity (FVSI). FVSI is stored in TCCON data, and low FVSI values (≤ 1%) indicate a reasonably clear sky, where larger FVSI values could indicate some cirrus cloud presence. Only TCCON data having small FVSI values (≤ 5%) were provided to ensure the quality of XCH4 and XCO2 data. Figures S3 and S4 show that similar results were obtained between data with FVSI ≤ 1% and that with FVSI > 1%. This suggests that TCCON data can be used to validate the GOSAT ratio component regardless of FVSI values (0-5%). (b) result for each site. GOSAT data obtained under conditions where cloud-free scenes were expected (cloud fraction = 0 and normalized 2 μm band radiance ≤ 1) were used. In (a), the correlation coefficient (R), the slope and intercept of the linear regression, and the number of data points (N) are also shown. In (b), the latitudinal variations of the bias, precision, and interseasonal bias for TCCON sites having more than nine data points are presented. Figure 2 shows the variation of Δratio with fc and I2μm. The data with NDSI ≤ 0.4 were used. Δratio increases with the increase in I2μm. Δratio decreases with the increase in fc for the data with small I2μm. The large Δratio is observed for data with both fc and I2μm exceeding certain levels, although the amount of data is small for such cases. To interpret the information from the TANSO-CAI and the TANSO-FTS 2 μm band clearly, we mainly used the data with fc = 0 and that with I2μm ≤ 1 in the following analysis. Most of the data fell within these cases (leftmost column and bottom row of Figure 2d). In the case of fc = 0, I2μm is related to the optically thin elevated scattering materials (mainly cirrus cloud) that were not identified by the TANSO-CAI. In the case of I2μm ≤ 1, fc is related to the middle-or lowaltitude clouds with a certain level of optical thickness.  Δratio increases with I2μm. This can be attributed to the light path enhancement, which is greater for the CH4 band than for the CO2 band because the surface albedo of the CH4 band is generally higher than that of the CO2 band. The difference in vertical profile between CH4 and CO2 also seems to contribute to the results. More specifically, the light path enhancement means that the light repeatedly passes the area where the CH4 concentration is higher than the column average, since the CH4 concentration significantly decreases in the upper atmosphere. It is considered that the influence of the difference in the albedo and vertical profile becomes large with the increase in I2μm. Figure 3 (b) shows the variation of Δratio with fc. The data with NDSI ≤ 0.4 were used. Δratio is 0.4% to 0.5% for the data with fc ≤ 0.2 and decreases with increasing fc. This is because the effect of the difference in the retrieved surface albedo between the CH4 and CO2 bands becomes small with the increase in fc, and the influence of the upper atmosphere where the CH4 concentration is low becomes large because of the increase in the amount of light passing through a short path (scattered at the upper part of the cloud and reaching the sensor). Although the influence of clouds depends on their height and optical thickness, it is considered that the influence of the ground surface becomes small with the increase in cloud cover. Δratio is almost stable for the data with fc ≥ 0.4. It seems that the abovementioned effects of the decreasing Δratio and light path enhancement by clouds (multiple scattering within clouds) are balanced. Note that we obtained GOSAT data having large fc value, which fell within the matching criteria of GOSAT and TCCON. This means that GOSAT data was obtained under cloudy conditions even when TCCON data was obtained under clear-sky conditions since cloud conditions varied within a ±2° latitude/longitude box.    Figure 4 shows the relationship between Δratio and the related variables (ρCH4, Δalb, airmass, and ΔPsrf). Section 3.2.1 showed that Δratio was stable for data with fc ≤ 0.2 and increased with I2μm continuously; therefore, the results are separately plotted according to the fc and I2μm values. Case 1: fc ≤ 0.2 and I2μm ≤ 1; Case 2: fc ≤ 0.2 and I2μm > 1; Case 3: fc > 0.2 and I2μm ≤ 1. The results for Case 1 are discussed in this paragraph. Δratio increases with ρCH4, although the variation is gentler than that for the other variables (Figure 4b-d). One possible reason is that the high surface albedo is prone to bring light path enhancement, by which the influence of the difference in the vertical profile between CH4 and CO2 becomes large (even if the surface albedo is similar between the CH4 and CO2 bands). For the difference in albedo, Δratio clearly increases with the increase in Δalb. As Butz et al. [23] indicated in their theoretical study and as discussed in the former section, the difference in albedo causes a difference in optical path length modification between the CH4 and CO2 bands, significantly affecting the ratio component. Δratio decreases with an increase in airmass. The influence of optical path length modification seems to be relatively small for the cases with large airmass (the modified light path is relatively short when the geometric path is long). The characteristics of the retrieval (e.g., errors in spectroscopy) might also affect the airmass dependence. The airmass dependence of retrieved XCH4 and XCO2 has been corrected empirically for satellite data [37,39] and TCCON data [69,70]. Recently, Mendonca et al. [71] found that using speed-dependent Voigt line shapes for retrieval of the O2 total column reduces the airmass dependence of TCCON XCO2. Δratio significantly increases with ΔPsrf, although the number of data with large ΔPsrf is small. Although ΔPsrf is related to the optical path length modification for the O2 A band (TANSO-FTS Band 1), the large ΔPsrf indicates the possibility that the light path enhancement effect, rather than light path shortening, is dominant for Band 2. More specifically, multiple scattering might occur within the area where the concentration of CH4 is relatively high compared to the column average. In contrast to the cases with high fc, the fraction of light passing a short path is expected to be low, and the influence of the ground surface is expected to be large, yielding a positive bias in the ratio component.

Surface Albedo, Difference in Surface Albedo, Airmass, and Deviation of Surface Pressure
For Case 2, Δratio is larger than that for Case 1. Although the dependence of Δratio on Δalb is in the same direction between Cases 1 and 2, the increase in Δratio with Δalb becomes steep, meaning that the influence of the difference in surface albedo becomes large when elevated scattering materials exist. In contrast, for Case 3, the variation trend of Δratio with ρCH4 and Δalb differs significantly from that of Cases 1 and 2. This is because clouds affected the retrieved albedo (ρCH4 and Δalb increased with the increase in fc).

Vertical
Profile of CH4 and CO2 Figure 5 shows the variation of Δratio with RCH4 and RCO2 (see Section 2.3 for the definition). The variation of Δratio with RCH4 is small if the possibility of the existence of elevated scattering materials is low (Case 1). In contrast, for Case 2, Δratio increases with the increase in RCH4. This corresponds to the qualitative discussion of the influence of the CH4 profile on Δratio in Section 3.2.1. Although the retrieval cannot reproduce the real-world profile in detail, it is considered that the retrieved RCH4 represents the real-world RCH4 (R CH4 act ) well and can be used to explain Δratio. In contrast to CH4, Δratio was expected to decrease with an increase in RCO2 because CO2 is the denominator of the ratio component. However, no clear relationship between Δratio and RCO2 is seen in Figure 5. Two possible reasons are considered: (1) the influence of the CO2 profile on the ratio component is small, since R CO2 act is smaller than R CH4 act in general; (2) the sensitivity of the retrieval to the vertical profile for CO2 is lower than that for CH4 (DFS for XCO2,clr was 1.0-1.5 and that for XCH4,clr was 1.7-2.3 in the present study). When only the retrieved data having DFS for XCO2,clr ≥ 1.3 (almost half of all data) were used, the results were not noticeably changed ( Figure S5). For Case 3, the range of RCH4 and RCO2 differs from that for Cases 1 and 2, since RCH4 and RCO2 were affected by clouds. RCH4 was negatively correlated with fc, which is considered to contribute to the increase in Δratio with RCH4. Although a variation of Δratio with RCO2 was expected for Case 3, since RCO2 was also negatively correlated with fc, the variation was small. The correlation between the variables (as mentioned in Sections 3.2.2 and 3.2.3) was accounted for in the variable selection for the bias correction (Section 4.2).

Method
Linear regression was used as in many previous studies [20,37,39,72] as where Δ ratio pred is the predicted Δratio, n is the number of variables, C1-Cn+1 are the regression coefficients, and xi represents the explanatory variable. In the calculation of coefficients (least squares method), each data point (matched GOSAT and TCCON data) was weighted according to the amount of total matched GOSAT and TCCON data of the site to which the data point belonged. More specifically, the weight was given as Rsite/Nj and 1/Nj for the data from sites in the northern hemisphere and that from sites in the southern hemisphere, respectively, where Nj is the total number of matched GOSAT data for each site, and Rsite is the number of sites in the southern hemisphere divided by that in the northern hemisphere. For each GOSAT data point, the correction was calculated as X ratio,G cor = Xratio,G/(1 + Δ ratio pred /100), where X ratio,G cor is the corrected ratio component.

Selecting Explanatory Variables
The correlation between the variables and the correlation between Δratio and the variables were evaluated to select the variables for the bias correction ( Figure 6). In Figure 6, the data with NDSI ≤ 0.4 were used to calculate the correlation coefficient with respect to fc. Figure 6a shows that the variables were correlated with each other. In particular, fc was highly correlated with the other variables, as mentioned in Sections 3.2.2 and 3.2.3. Therefore, the variation of Δratio was expected to be explained by the use of fewer than the total number of variables. We confirmed that the correlation between Δratio and the variables did not become small by the correction using one variable. Therefore, corrections by multiple linear regression were tested. Figure 6b shows the correlation coefficient between the corrected Δratio and the variables for the corrections using different numbers of variables. First, I2μm and airmass were used, since Δratio varied significantly with these variables (Figures 3 and  4), and the correlation between them was low (Figure 6a). When three variables including RCH4 were used, the correlation coefficient became close to zero, except in the case of Δalb. This can be attributed to the fact that RCH4 included information on optical path length modification by scattering by clouds and aerosols. RCH4 is considered to be a useful variable for correcting Δratio, although the relationship between Δratio and RCH4 is somewhat empirical, especially when fc is high.
In addition to the correlation coefficient, the relationship between Δratio and the variables was further evaluated. Figure 7 shows the variation of Δratio with the variables. The results were separately plotted for the different hemispheres and seasons in order to confirm that the spatiotemporal variation of bias was reduced. Data with NDSI ≤ 0.4 were used to obtain the variation of Δratio with fc. Even when the correction was conducted using I2μm and one other variable, the difference in variation of the Δratio with I2μm between the northern and southern hemispheres remained. Although the number of TCCON sites and the number of data points for the southern hemisphere were smaller than those for the northern hemisphere (larger standard error for the southern hemisphere), the difference in variation of the Δratio with I2μm between hemispheres was larger than the standard error. The cause of this difference is that the degree of influence of elevated scattering materials on Δratio varies according to the vertical profile of CH4. When RCH4 was used in the correction, the difference in variation of the Δratio with I2μm between the hemispheres was significantly reduced. For fc, the corrected Δratio varies around zero for both hemispheres and both seasons. For other variables, the difference between the hemispheres becomes small. Then, we decided to use the four variables airmass, I2μm, RCH4, and Δalb for the bias correction. . The meaningless pixels are filled by gray. Eight variables were used: the normalized 2 μm band radiance (I2μm), cloud fraction (fc), retrieved surface albedo of CH4 band (ρCH4), difference in the retrieved surface albedo (CH4 band minus CO2 band, Δalb), airmass (Am), deviation of the retrieved clear-sky surface pressure from its prior (ΔPsrf), and ratio between the partial column-averaged dryair mole fractions for the lower atmosphere and that for the upper atmosphere (RCH4 and RCO2). In (b), results for bias-uncorrected Δratio and bias-corrected Δratio (corrections using two, three, and four variables) are shown. Data with fc = 0 and that with I2μm ≤ 1 were used. Only data with NDSI ≤ 0.4 were used to calculate the correlation coefficient for fc.   Figure 7. Variation of the relative difference in the ratio component between GOSAT and TCCON (Δratio) according to the explanatory variables: normalized 2 μm band radiance (I2μm); cloud fraction (fc); retrieved surface albedo of CH4 band (ρCH4); difference in the retrieved surface albedo (CH4 band minus CO2 band, Δalb); airmass (Am); deviation of the retrieved clear-sky surface pressure from its prior (ΔPsrf); and ratio between the partial column-averaged dry-air mole fractions for the lower atmosphere and that for the upper atmosphere (RCH4 and RCO2). Results for (a) bias-uncorrected Δratio and bias-corrected Δratio (corrections using (b) two, (c) three, and (d) four variables) are presented. Results for the winter (January to April) northern hemisphere, the summer (July to October) northern hemisphere, and the southern hemisphere are plotted with different colors. Data with fc = 0 and that with I2μm ≤ 1 were used. Only data with NDSI ≤ 0.4 were used to obtain the variation of Δratio with fc.

Quality Control
As quality control before evaluating the bias-corrected ratio component, cloud screening was investigated. Figure 8 shows the variation of the corrected Δratio by the four variables according to fc and I2μm. The data with NDSI ≤ 0.4 were used. Although the data with fc = 0 and those with I2μm ≤ 1 were used for the results shown in the previous sections (Figures 3-7), all data were used to generate Figure 8 (similar to Figure 2). Although Δratio is close to zero for many cases where fc is > 0 and I2μm is > 1 (i.e., data not used for the regression), the large mean and standard deviation of Δratio remains. Therefore, the cloud screening was conducted as follows. We considered a function of fc as g(fc) = (afc + b)/(cfc + 1), where a, b, and c are coefficients, so that the tolerance range of I2μm decreases with the increase in fc. The coefficients were determined to make the criteria be I2μm ≤ 15 when fc = 0 and I2μm ≤ 1 when fc = 1, and to screen out the data in areas where the mean and/or the standard deviation of Δratio were large in Figure 8. The criteria of I2μm ≤ 15 when fc = 0 is based on the result that the difference in Δratio between the hemispheres became large with the increase in I2μm (Figure 7). Then, we decided to reject the data with I2μm > g(fc), where coefficients a, b, and c were 46, 15, and 60, respectively. The criteria are depicted by the white line in Figure 8.

Evaluating the Corrected Results
To examine the usefulness and applicability of the bias correction, the matched GOSAT and TCCON data acquired in the even-numbered years were used to obtain the regression coefficients, and then the correction was applied to the GOSAT data acquired in the odd-numbered years. The obtained coefficients for airmass, I2μm, RCH4, and Δalb and the intercept were −0.28, 0.019, 1.06, 17.69, and −0.31, respectively (note that the four variables are dimensionless; see Section 2.3 for the definition of the variables). The cloud screening described in Section 4.3 was conducted. Figure 9 shows the comparison of the bias, precision, and interseasonal bias for each TCCON site between the bias-uncorrected and the bias-corrected ratio component. After the bias correction, the interseasonal bias was reduced, and the variation between sites in the northern hemisphere and the difference between hemispheres became small. The precision and intersite bias were also improved. The reduction of precision value was statistically significant (p < 0.001 from F test). The reduction of intersite bias value was less significant, but the 75% confidence intervals (CI) showed almost no overlap (0.177-0.234% and 0.143-0.179% for before and after the bias correction, respectively). The CI of intersite bias was estimated by the nonparametric bootstrap method. For each site, a bootstrap sample was taken, and a bootstrap estimate of bias (mean Δratio of the sample) was obtained. Then, the intersite bias (standard deviation of the estimated bias values for individual sites) was calculated. This calculation was repeated 2000 times, and the CI was obtained from the 2000 intersite bias values.  Figure 10 shows the temporal variation of the uncorrected and corrected Δratio for each TCCON site. The results for TCCON sites with long-term observation and a large number of data points are shown. For other sites, it was difficult to discuss the temporal variation of Δratio, but no results contradicting the following discussion were obtained. On the whole, seasonality was seen for the uncorrected Δratio, but it was significantly reduced by the correction. Δratio was small even before the correction for the high-latitude site (Sodankylä [73]). Such a general seasonal and latitudinal pattern of Δratio seems to be caused by the airmass dependence of Δratio (Figure 4). In addition to this general pattern, the characteristics of each site were observed as described below. For each site, the temporal variation of Δratio was determined as follows: the mean Δratio was calculated for each season (DJF, MAM, JJA, SON) in each year (four seasons × five years), and then the mean and the standard deviation of the mean Δratio values were calculated ( Figure 11). For comparison, results for the corrected Δratio with two and three variables are also shown.  Results for the data acquired in the odd-numbered years are shown to assess the applicability of the bias correction using the independent data (coefficients for the correction were obtained using the data acquired in the even-numbered years).

(a) Mean Δ ratio [%] (b) SD of Δ ratio [%]
For Sodankylä, the uncorrected Δratio was smaller than that for the other sites owing to the large airmass. The low possibility of elevated scattering materials (I2μm was small for most of the data) also seems to have contributed to the small Δratio. Δratio was overcorrected by the correction using two variables. fc tended to be high for the data over this site, and R CH4 act seemed to be large according to the model (prior value). Then, the overcorrection was reduced by using four variables. For Orléans, occasional large I2μm, in addition to the airmass effect, seemed to cause a large standard deviation; however, the deviation decreased after the correction. For Park Falls, the uncorrected Δratio was considered to be affected by many factors. The temporal variation of Δalb was large, since this site was covered by snow during winter and by vegetation during summer. Large I2μm was observed during spring. R CO2 act is expected to be small during summer due to photosynthesis by vegetation. R CH4 act seemed to be large during summer according to the model (prior value). Correction using RCH4 reduced Δratio, although a relatively large Δratio remained, since the complicated conditions were not perfectly accounted for by the correction. For Lamont, large Δalb generally brought large Δratio. The correction using Δalb reduced Δratio. A large uncorrected Δratio was occasionally seen in the early months of the year (Figure 10). We found that the large Δratio corresponded to the large I2μm in February 2013 and 2017. According to Figure 10, these large errors were properly corrected. The use of I2μm in the correction had only a small effect on the averaged results ( Figure 11), since I2μm was not always large; however, the effect could be confirmed for individual cases. For Tsukuba, a large I2μm was observed during spring to summer, which enhanced the seasonality of the uncorrected Δratio. Δalb of this site was small, and thus the overcorrection was reduced by using Δalb in the correction. For Caltech, Δalb was small, and the temporal variations of R CH4 act and R CO2 act were small according to the model. The difference between R CH4 act and R CO2 act was also small. Then, the uncorrected Δratio showed small temporal variation, and the corrected Δratio varied around zero with the small temporal variation retained. For Saga, the airmass effect and the large I2μm during spring to summer enhanced the seasonality, and Δratio was significantly reduced by the correction using two variables. The temporal variation of R CH4 act was expected to be large according to the model. Thus, using RCH4 in the correction improved both the mean and standard deviation ( Figure 11). For the sites in the southern hemisphere (Darwin and Wollongong), although R CH4 act and R CO2 act were expected to be small, the amplitude of temporal variation of uncorrected Δratio was comparable to that for the sites in the northern hemisphere ( Figure 10). For Darwin, the uncorrected Δratio showed large temporal variation, although the variation of airmass was small. For Wollongong, occasional large uncorrected Δratio did not correspond to the large I2μm. The standard deviation was reduced only slightly by the correction for these sites (Figure 11). Although using RCH4 in the correction reduced the difference between the northern and southern hemispheres (Figure 7), unaccounted factors might remain. On the whole, the optimal variables differ between sites; but the correction using four variables brought the best result in terms of the balance between sites (the mean of absolute Δratio values for individual sites was close to zero and the variation between sites was small).
We also conducted a bias correction for the data acquired with gain M using the four variables and the above-mentioned coefficients (i.e., the coefficients were obtained using the gain H data). The bias correction functioned properly for the gain M data, although the bias was small even for the uncorrected data (Appendix B). This result supports the applicability of the bias correction.

Conclusions
The relationship between the bias in the GOSAT ratio component and the variables derived from GOSAT data was investigated, and the bias correction and its evaluation were performed. The bias between the uncorrected GOSAT ratio component and the TCCON ratio component was 0.43% when the cloud-free condition was expected (normalized 2 μm band radiance (I2μm) ≤ 1 and cloud fraction (fc) = 0). The bias increased with the increase in I2μm and reached 1.5% when I2μm was 30. The variation of bias with fc was small when fc was small or large (the bias was 0.4% to 0.5% for the data with fc ≤ 0.2 and −0.1% to 0% for the data with fc ≥ 0.4). The variation of bias according to I2μm and fc could be interpreted based on the difference in the detection target between I2μm and fc, the difference in surface albedo between the CH4 and CO2 bands (Δalb), and the vertical profile of CH4 and CO2. The relationship between the bias and other related variables was also investigated. We used the retrieved profile and the ratios between the upper and lower atmosphere CH4 and CO2 (RCH4 and RCO2), in addition to the variables that have been used in the bias correction for the full-physics method (airmass) and in the cloud screening (deviation of the retrieved clear-sky surface pressure from its prior (ΔPsrf)) and that have been revealed to be a large error source for the proxy method (Δalb). The bias showed clear variations with the variables except for RCO2.
Then, airmass, I2μm, RCH4, and Δalb were selected as explanatory variables for a linear regression of the bias by considering the correlation between the variables and the correlation between the variables and the bias. Using RCH4 in the correction reduced the dependence of the bias on fc and ΔPsrf. The difference in bias between the northern and the southern hemispheres was also reduced. These results are attributed to the information on the CH4 vertical profile and the effect of atmospheric scattering included in RCH4. Although the relationship between bias and RCH4 is somewhat empirical, RCH4 is an important variable for correcting the bias in the ratio component. Before evaluating the corrected results, cloud screening was applied. The criteria were determined as the threshold value of I2μm decreases with the increase in fc (I2μm ≤ 15 when fc = 0 and I2μm ≤ 1 when fc = 1), by investigating the variation of bias in the corrected ratio component with fc and I2μm. Although fc and I2μm have been used for the cloud screening of the GOSAT proxy retrievals, our results give a quantitative basis for the screening.
Comparison between the corrected ratio component and TCCON showed that the precision (standard deviation of the difference between GOSAT and TCCON) was reduced from 0.61% to 0.55%, and the intersite bias was reduced from 0.20% to 0.15%. The temporal variation of bias was further investigated for the sites having a long-term record and a large amount of data. The uncorrected bias showed a seasonality with a large bias in summer. The difference in monthly mean bias between summer and winter exceeded 1% for several sites. In addition to such seasonality, the months with a large mean bias corresponded to the months with a large mean I2μm. The temporal variation of bias was significantly reduced by the correction. Although the optimal variables differed between sites, the mean and standard deviation of the mean bias values for individual seasons (each season in each year) were within 0.2% for most of the sites, when the four variables were used for the correction.
The bias-corrected and cloud-screened ratio component data in the present study reduce the concern that the residual errors related to the atmospheric scattering and the property of ground surfaces affect the inverse modeling of CH4 sources and sinks. In future work, the impact of utilizing the bias-corrected data in the inverse modeling will be investigated. GOSAT has been operating for more than 10 years. GOSAT-2, a successor mission to the GOSAT, was launched on 29 October 2018. Providing long-term, consistent, and high-quality XCH4 data set by the GOSAT series is expected to contribute to the studies on CH4 budgets over the globe. To construct such a data set, GOSAT and GOSAT-2 data will be continuously compared with each other and with the data from other satellites and TCCON.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Figure S1: Relationship among the Normalized Difference Snow Index (NDSI), relative difference in the ratio component between GOSAT and TCCON (Δratio), and month of observation. Figure S2: Averaging kernel of the GOSAT clear-sky retrieval for (a) XCH4 and (b) XCO2 (monthly mean values of GOSAT data matched the TCCON data). Figure  S3: Similar to Figure 1, but for the data with TCCON Fractional Variation in Solar Intensity (FVSI) ≤ 1% and the data with FVSI > 1%. Figure S4: Relationship between the relative difference in the ratio component between GOSAT and TCCON (Δratio) and the TCCON Fractional Variation in Solar Intensity (FVSI). Figure S5: Similar to Figure 5, but using GOSAT data with DFS for XCO2 ≥ 1.3. Acknowledgments: GOSAT clear-sky retrieval in the present study was conducted at the GOSAT-2 Research Computation Facility.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A. TCCON sites and the influence of the matching criteria of GOSAT and TCCON
We used TCCON data from 26 sites . Figure A1 shows the map of TCCON sites, and Table A1 shows an overview of the sites. Basically, the sites are located in areas with relatively uniform surface properties and are reasonably far from anthropogenic sources. However, several sites are in areas with nonflat topography or are located near or in urban areas. Such characteristics are summarized in the rightmost column of Table A1.
In the main text, GOSAT data within a ±2° latitude/longitude box centered at each TCCON site were used for the comparison. The influence of the matching criteria on the analysis was assessed. Table A2 shows the mean and standard deviation of Δratio for each TCCON site. Results for the different matching criteria are tabulated. The mean value showed almost no clear trend, but the deviation value decreased as the matching criteria were tightened. Figure A2 shows the relationship between Δratio and the related variables for the matching criteria of ±2° and that of ±0.5°. Although more deviated values are seen for the matching criteria of ±2°, the regression lines were similar between the criteria. The mean Δratio values for the bins also show similar trends between the criteria ( Figure A3). The skewness around 0 indicates the small asymmetry for the distribution of Δratio values for each bin. Large negative kurtosis values were hardly seen, meaning there was no high frequency for either side (large and small Δratio).  The site is far from the city area but observes plant plumes and methane from mine shafts and fugitive leaks [74,75] The site is between the ocean and a sharp escarpment. Table A2. Mean (μ) and standard deviation (SD) of the relative difference in the ratio component between GOSAT and TCCON and the number of matched GOSAT and TCCON data points (N) for each TCCON site. Results for four different matching criteria are shown: GOSAT data within ±2°, ±1°, ±0.5°, and ±0.1° latitude/longitude boxes centered at each TCCON site. Only sites having more than 29 data points are shown. The GOSAT data obtained under conditions where cloud-free scenes were expected (cloud fraction = 0 and normalized 2 μm band radiance ≤ 1) were used. Intersite bias is calculated using the same sites as in the case of matching criteria of ±0.1°.

±2°
±1° ±0.5° ±0.1°  Figure A3. Relationship between the relative difference in the ratio component between GOSAT and TCCON (Δratio) and the related variables. The variables ((a)-(d)) are similar to those in Figure A2, but in each panel, the mean and standard deviation (SD) of Δratio, number of data points (N), skewness, and kurtosis for each bin are shown. Results for the different matching criteria are plotted (GOSAT data within ±2° and ±0.5° latitude/longitude boxes centered at each TCCON site). For I2μm and RCH4, the data with cloud fraction = 0 were used. For Δalb and airmass, the data with cloud fraction = 0 and I2μm ≤ 1 were used.

Appendix B. Bias Correction for the Data Acquired by Gain M
The different gain settings of the TANSO-FTS (H and M) only affect the spectral radiance of the Band 1 [76]. Therefore, the ratio component should show similar characteristics between the data acquired with gain H and that acquired with gain M. Then, the bias correction equation established using the gain H data (four variables with the coefficients shown in Section 4.4) was applied to the gain M data. Figure A4 shows the scatterplot of the ratio component derived from GOSAT gain M data and that derived from TCCON. The results for each site are shown in Table A3. The biasuncorrected and bias-corrected results are presented in the left and right sides in Figure A4 and Table  A3, respectively. Figure A5 shows the dependence of Δratio on the related variables for the uncorrected and the corrected data. The matching criteria were similar to those in the main text. Only GOSAT data over the areas east of 118°W were used to reject the data over Bakersfield where there are oil fields. According to Figure A4, the bias was smaller than that for the gain H data for the uncorrected data, and the bias was slightly overcorrected. Table A3 shows that the overcorrection was seen for the data of JPL and Caltech. This is reasonable for the following reasons. Almost all the GOSAT data acquired with gain M in this region were located around the Edwards site. For the time period during which TCCON observation was not being performed at the Edwards site, the GOSAT data were matched with the TCCON data at JPL or Caltech. The Edwards site is located in the north of San Gabriel Mountains, while the other two sites are located in the south of the mountains. Therefore, it was possible that the air at the other two sites was occasionally different from the air at the Edwards site. Then, the results for the Edwards site (bias of -0.07% after the correction) indicate that the GOSAT data can be used for the proxy retrieval without exercising caution with respect to the gain settings and support the applicability of the bias correction. ; (e) airmass (Am); (f) deviation of the retrieved clear-sky surface pressure from its prior (ΔPsrf); (g) ratio between the partial column-averaged dry-air mole fractions for the lower atmosphere and that for the upper atmosphere (RCH4 and RCO2). Mean Δratio for each bin is plotted for the bias-uncorrected and bias-corrected data.