A Fast Atmospheric Trace Gas Retrieval for Hyperspectral Instruments Approximating Multiple Scattering, Part 2: Application to XCO2 Retrievals from OCO-2

Satellite retrievals of the atmospheric dry-air column-average mole fraction of CO2 (XCO2) based on hyperspectral measurements in appropriate near (NIR) and short wave infrared (SWIR) O2 and CO2 absorption bands can help to answer important questions about the carbon cycle, but the precision and accuracy requirements for XCO2 data products are demanding. Multiple scattering of light at aerosols and clouds can be a significant error source for XCO2 retrievals. Therefore, so-called full physics retrieval algorithms were developed, aiming to minimize scattering-related errors by explicitly fitting scattering-related properties such as cloud water/ice content, aerosol optical thickness, cloud height, etc. However, the computational costs for multiple scattering radiative transfer (RT) calculations can be immense. Processing all data of the Orbiting Carbon Observatory-2 (OCO-2) can require up to thousands of CPU cores, and the next generation of CO2 monitoring satellites will produce at least an order of magnitude more data. For this reason, the Fast atmOspheric traCe gAs retrievaL (FOCAL) has been developed, reducing the computational costs by orders of magnitude by approximating multiple scattering effects with an analytic solution of the RT problem of an isotropic scattering layer. Here we confront FOCAL for the first time with measured OCO-2 data and document the steps undertaken to transform the input data (most importantly, the OCO-2 radiances) into a validated XCO2 data product. This includes preprocessing, adaptation of the noise model, zero level offset correction, post-filtering, bias correction, comparison with the CAMS (Copernicus Atmosphere Monitoring Service) greenhouse gas flux inversion model, comparison with NASA's operational OCO-2 XCO2 product, and validation with ground based Total Carbon Column Observing Network (TCCON) data. The systematic temporal and regional differences between FOCAL and the CAMS model have a standard deviation of 1.0 ppm. The standard deviation of the single sounding mismatches amounts to 1.1 ppm, which agrees reasonably well with FOCAL's average reported uncertainty of 1.2 ppm. The large scale XCO2 patterns of FOCAL and NASA's operational OCO-2 product are similar, and the most prominent difference is that FOCAL has about three times fewer soundings due to the inherently poor throughput (11%) of the MODIS (moderate-resolution imaging spectroradiometer) based cloud screening used by FOCAL's preprocessor. The standard deviation of the difference between both products is 1.1 ppm. The validation of one year (2015) of FOCAL XCO2 data with co-located ground based TCCON observations results in a standard deviation of the site biases of 0.67 ppm (0.78 ppm without bias correction) and an average scatter relative to TCCON of 1.34 ppm (1.60 ppm without bias correction).

Remote Sens. 2017, 9, 1102; doi:10.3390/rs9111102 www.mdpi.com/journal/remotesensing


Introduction
Satellite retrievals of the atmospheric dry-air column-average mole fraction of CO2 (XCO2) based on hyperspectral measurements in appropriate near (NIR) and short wave infrared (SWIR) O2 and CO2 absorption bands can help to answer pressing questions about the carbon cycle (e.g., [1]). However, the precision and, even more so, the accuracy requirements for applications like surface flux inversion or emission monitoring are demanding (e.g., [2][3][4]). As an example, large scale biases of a few tenths of a ppm can already hamper an inversion with mass-conserving global inversion models [2,3].
The Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY, [5,6]) became operational in 2002, and its radiance measurements made it possible to start the time series of NIR/SWIR XCO2 retrievals. With an overlap of about three years, the Greenhouse Gases Observing Satellite (GOSAT, [7]) allowed complementation and continuation of this time series in 2009.
The Orbiting Carbon Observatory-2 (OCO-2) was launched in 2014, also aiming at continuing and improving XCO2 observations from space. As part of the A-train satellite constellation, OCO-2 flies in a sun-synchronous orbit crossing the equator at 13:36 local time. It measures one polarization direction of the solar backscattered radiance in three independent wavelength bands: the O2-A band at around 760 nm (band1) with a spectral resolution of about 0.042 nm and a spectral sampling of about 0.015 nm, the weak CO2 band at around 1610 nm (band2) with a spectral resolution of about 0.080 nm and a spectral sampling of about 0.031 nm, and the strong CO2 band at around 2060 nm (band3) with a spectral resolution of about 0.103 nm and a spectral sampling of about 0.040 nm. OCO-2 is operated in a near-push-broom fashion and has eight footprints across track and an integration time of 0.333 s. The instrument's spatial resolution at ground is 1.29 km across track and 2.25 km along track. More information on the OCO-2 instrument can be obtained from the publications of Crisp et al. [8,9].
Several XCO2 retrieval algorithms exist for the SCIAMACHY and GOSAT observations covering the whole mission periods (e.g., [10][11][12][13][14][15][16][17][18]). Because ground based validation sites are sparse, the analysis of an ensemble of independently developed algorithms can give important insights into the quality of the retrievals, especially remote from the validation sites [19], and strengthen the geophysical interpretation of the data [20]. Nevertheless, so far only the operational NASA algorithm [21,22] exists for OCO-2 covering the whole mission period.
One reason for this is that many of the existing algorithms require computationally expensive multiple scattering radiative transfer (RT) calculations and the data rate of OCO-2 is about two orders of magnitude larger than for GOSAT. In part 1 of this publication [23], the Fast atmOspheric traCe gAs retrievaL (FOCAL) has been introduced. It approximates multiple scattering effects with an analytic solution of the scalar RT problem of an isotropic scattering layer and a Lambertian surface. This enhances the computational efficiency by orders of magnitude, making it possible to process the whole OCO-2 data stream on a few CPU cores in real time, and will also allow the analysis of future satellite missions which will provide at least an order of magnitude more data.
In part 1, Reuter et al. [23] tested various retrieval setups with simulated OCO-2 measurements and concluded that their 3-Scat setup is a promising candidate for further studies with measured OCO-2 data. The 3-Scat retrieval setup fits the OCO-2 measured radiance simultaneously in four fit windows: SIF (∼758.26-759.24 nm), O2 (∼757.65-772.56 nm), wCO2 (∼1595.0-1620.6 nm), and sCO2 (∼2047.3-2080.9 nm). This is achieved by iteratively optimizing the state vector including the following geophysical parameters: five-layered CO2 and H2O concentration profiles, the pressure (i.e., height), scattering optical thickness at 760 nm, and Ångström exponent of a scattering layer, solar induced chlorophyll fluorescence (SIF), and polynomial coefficients describing the spectral albedo in each fit window.
In this publication, we confront the 3-Scat retrieval setup with actually measured OCO-2 data and document the steps undertaken to transform the input data (most importantly, the OCO-2 radiances) into a validated XCO2 data product. This includes preprocessing (Section 2), adaptation of the noise model (Section 3.1), zero level offset correction (Section 3.2), post-filtering (Section 4.1), bias correction (Section 4.2), comparison with the CAMS (Copernicus Atmosphere Monitoring Service) greenhouse gas flux inversion model (Section 5.1), comparison with NASA's operational OCO-2 XCO2 product (Section 5.2), and validation with ground based Total Carbon Column Observing Network (TCCON) data (Section 5.3). The complete retrieval chain including adaptations, pre-, and postprocessing will be referred to as FOCAL v06 in the following.

Preprocessing
During preprocessing, we collect all datasets that are needed to run the retrievals and pre-filter soundings with potentially degraded quality or potential cloud or aerosol contamination. Due to the demanding precision and accuracy requirements for XCO2 retrievals (e.g., [2][3][4]) and the large amount of OCO-2 data, we prioritize quality over quantity in the course of pre-filtering.
The primary input data used for this publication are global OCO-2 L1b calibrated radiances (i.e., a single linear component of the polarization of the incoming light) version 7r [9,24] of the year 2015 in glint (GL), nadir (ND), target (TG), and transition (XS) mode, which have been obtained from https://daac.gsfc.nasa.gov. Each of OCO-2's three bands consists of 1016 spectral pixels, which we group into the four fit windows illustrated in Figure 1 showing a typical OCO-2 measurement fitted with FOCAL.
Each L1b orbit file includes information on spectral pixels with potentially reduced quality, e.g., due to radiometric problems. Based on the last nadir orbit in 2015 (oco2_L1bScND_07974a_151231_B7200r_160121043229.h5), we generated a dead or bad pixel mask which we use for the O2 and both CO2 fit windows. For the SIF fit window, we ignore the dead or bad pixel mask because it is located in a spectral region generally flagged as potentially bad.

Figure 1. [...] and fit residual (fit minus measurement) in gray and red, respectively; χ_j is an estimate of the goodness of fit (relative to the noise) in fit window j and is computed as defined in part 1 [23].
We reject all soundings flagged to have potentially reduced quality (quality flag = 0) or failing a data integrity test (e.g., unreasonable sounding ID or time). We filter out potentially "tricky" scenes with solar or satellite zenith angles greater than 70°, latitudes beyond ±70°, or extreme surface roughness (standard deviation of the surface elevation) greater than 1000 m. In Figure 2, this filter is referred to as LAT/SUZ/SAZ/σALT.
We use the spike EOF analysis provided with the OCO-2 L1b data [24] and accept only soundings with fewer than 60 spectral pixels with potentially poor quality (referred to as bad colors) in the O2 band and no bad colors in the CO2 bands. This primarily filters out soundings above South America and the South Atlantic because of contamination by cosmic rays within the South Atlantic Anomaly (SAA) caused by the shape of the inner Van Allen radiation belt (Figure 2).
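The geometry and quality pre-filter described above can be sketched as a simple predicate. This is only an illustrative sketch: the thresholds are taken from the text, while the `Sounding` class and its field names are hypothetical and do not correspond to variable names in the OCO-2 L1b files. The quality flag is represented by a boolean so that no assumption about its numerical encoding is needed.

```python
from dataclasses import dataclass

# Thresholds quoted in the text
MAX_ZENITH_DEG = 70.0      # solar and satellite zenith angle
MAX_ABS_LAT_DEG = 70.0     # absolute latitude
MAX_SIGMA_ALT_M = 1000.0   # standard deviation of the surface elevation


@dataclass
class Sounding:
    """Hypothetical container; field names are illustrative only."""
    quality_ok: bool          # True if neither quality nor integrity checks failed
    lat_deg: float
    solar_zenith_deg: float
    sat_zenith_deg: float
    sigma_alt_m: float


def passes_prefilter(s: Sounding) -> bool:
    """Return True if the sounding survives the LAT/SUZ/SAZ/sigma_ALT filter."""
    return (
        s.quality_ok
        and s.solar_zenith_deg <= MAX_ZENITH_DEG
        and s.sat_zenith_deg <= MAX_ZENITH_DEG
        and abs(s.lat_deg) <= MAX_ABS_LAT_DEG
        and s.sigma_alt_m <= MAX_SIGMA_ALT_M
    )
```

A sounding at 45° latitude with moderate zenith angles passes, whereas the same sounding at 75° latitude is rejected.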
Potentially aerosol contaminated scenes are filtered using OMI (Ozone Monitoring Instrument aboard Aura, lagging OCO-2 by 30 min) L3 global daily gridded 1° × 1° UV aerosol index data (OMAERUVd v003 obtained from https://daac.gsfc.nasa.gov) with a filtering threshold of one. As described by Stammes [25], the UV aerosol index is derived by comparing the measured reflectance ratio at two wavelengths (342.5 nm and 388.0 nm) to the calculated reflectance ratio using a Rayleigh atmosphere with an assumed surface albedo. The UV index is relatively insensitive to scattering aerosol layers or clouds, because it is mainly determined by the reduction of Rayleigh multiple scattering due to aerosol absorption. As illustrated in Figure 2, this filter most prominently impacts regions contaminated with desert dust aerosols.
Potentially cloud contaminated scenes are filtered using MODIS Aqua (moderate-resolution imaging spectroradiometer aboard Aqua) L2 cloud mask data [26] with about 1 km × 1 km resolution (collection 6, MYD35, obtained from https://ladsweb.modaps.eosdis.nasa.gov). All MODIS ground pixels which are not flagged as clear or probably clear are considered as potentially cloudy. Aqua is lagging OCO-2 by 15 min, and in order to account for the parallax effect and potential cloud movements, we use only OCO-2 data with at least 10 km distance to the nearest MODIS cloud. Even though 10 km is not overly conservative, this filter has a throughput of only about 11% and dominates the total pre-filtering throughput of about 4% (Figure 2).
Additionally, we filter out very dark or bright scenes, i.e., extreme detector fillings. Specifically, we ensure that the continuum radiance in each band is between 5% and 95% of the maximum band radiance as specified in the OCO-2 L1b ATBD (Algorithm Theoretical Basis Document, [24]).
Meteorological profiles come from ECMWF operational analysis data (http://apps.ecmwf.int) and have a resolution of six hours, 0.75° × 0.75°, and 137 height layers. As part of the preprocessor, these profiles are corrected for the actual surface height of the OCO-2 soundings and split into 20 layers containing the same number of dry-air particles.
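To illustrate what layers "containing the same number of dry-air particles" mean in practice, the following sketch computes layer boundaries under a simplifying assumption that is not taken from the text: in hydrostatic equilibrium, and neglecting water vapor and the height dependence of gravity, the number of air particles in a layer is proportional to its pressure thickness, so equal-particle layers become layers of equal pressure difference. The function name is hypothetical, and the actual preprocessor may well account for humidity.

```python
def equal_particle_layer_edges(p_surface_hpa, n_layers=20):
    """Pressure at the layer boundaries, from the surface up to the top of atmosphere.

    Assumption: particle number per layer is proportional to the pressure
    difference across the layer (hydrostatic, dry air, constant gravity).
    """
    return [p_surface_hpa * (1.0 - i / n_layers) for i in range(n_layers + 1)]


# Example: surface pressure of 1000 hPa gives 20 layers of 50 hPa each
edges = equal_particle_layer_edges(1000.0)
thickness = [lower - upper for lower, upper in zip(edges[:-1], edges[1:])]
```

Because the boundaries scale with the surface pressure, correcting the profile for the actual surface height of a sounding automatically rescales all layers.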

Retrieval Adaptations
The theoretical basis of FOCAL's retrieval method is described in detail in part 1 of this publication [23]. However, in order to analyze actually measured data instead of simulations, we made the following adaptations.

Noise Model
The measurement error covariance matrix [23] has to account not only for the measurement noise but for the total error, including also the forward model error. The measurement noise of the instrument is well known from laboratory measurements and in-flight estimates. In theoretical studies (as in part 1, [23]), it is often assumed for convenience that the measurement noise dominates and that other error components can be neglected, i.e., the noise model is approximated by the measurement noise.
Especially when analyzing measured data, unknown inaccuracies of the forward model can violate this assumption and lead to larger fit residuals and unrealistic results (and error estimates) because the optimal estimation retrieval puts too much trust in the measurement. This may happen, e.g., due to imperfect knowledge of the instrumental line shape function (ILS), unconsidered spectroscopic effects such as Raman scattering, inaccuracies of the spectroscopic data bases, approximations of the radiative transfer model, or imperfect meteorology.
Ideally, one would reduce the fit residuals to the instrument's noise level by improving the forward model, but this is often not possible. A potential solution is to fit parts of the residuum by empirical orthogonal functions (EOF) computed from a representative set of measurements [21]. Another approach is to adjust the noise model so that it accounts for measurement noise plus forward model error (e.g., [16,17,27]); we use a variant of this approach.
Most forward model errors can be interpreted as resulting from inaccuracies of the computed (effective) atmospheric transmittance. However, the largest scene-to-scene variability of the simulated radiance is due to changes of, e.g., albedo and solar zenith angle. Therefore, it is reasonable to assume forward model errors to be approximately proportional to the continuum signal I_cont, which we obtain from up to nine spectral pixels at the lower wavelength ends of the fit windows.
We model the root mean square residual to continuum signal ratio RSR by

$$\mathrm{RSR} = \sqrt{\mathrm{NSR}^2 + \delta F^2} \qquad (1)$$

where NSR represents the root mean square of the spectral 1σ radiance noise (as reported in the OCO-2 L1b data) to continuum signal ratio and δF the relative forward model error.
In order to estimate the free parameter δF, we analyzed a representative set of pre-filtered soundings (Figure 3) with a modified FOCAL setup for which we (quadratically) added 2% of the continuum radiance to the measurement noise. This overestimation of the expected total error has the effect that the retrieval usually converges towards values not very far away from the a priori, i.e., values that are more or less realistic. Additionally, we switched off the SIF retrieval (which is basically identical to a zero level offset retrieval in the SIF fit window) and switched on the retrieval of zero level offsets in all four fit windows.
If the instrument noise dominated the total error, RSR and NSR would (statistically) lie on a 1:1 line. After the removal of outliers (Figure 4, gray dots), this is basically the case for the SIF fit window with forward model errors estimated to be about 0.5‰ of the (continuum) signal (Figure 4). The forward model error within the other fit windows is estimated to be between 2.5‰ and 3.2‰ (Figure 4). This means that the total error in dark scenes (large NSR) is still dominated by the instrumental noise, but in bright scenes (small NSR) the forward model error dominates.

Figure 3. [...] 15.10., 16.10., 15.11., 17.11., 12.12., 18.12.). This results in a manageable but still representative data set with respect to nadir/glint observation geometry, season, and spatial distribution.
Outliers are removed as follows: The data set is grouped in 35 NSR bins. Only bins with more than 500 samples are further considered. Within each bin, RSR should follow a χ²-distribution with as many degrees of freedom as spectral pixels of the fit window. The number of spectral pixels is always large enough to approximate the χ²-distribution with a Gaussian distribution. Outliers represent poor fits, e.g., due to complicated atmospheric conditions which cannot be well described by the forward model. As they usually enhance the RSR, we have to approach the expectation value of RSR from the lowermost values. The 2.28th and 15.9th percentiles (Figure 4, red and orange points) of the Gaussian distribution are two and one standard deviations smaller than the expectation value. We used this to estimate the expectation value (Figure 4, green points), from which we determined the free fit parameter δF of Equation (1) (numerical values are shown in Figure 4). Note that adding 4% instead of 2% of the continuum radiance to the measurement noise gave similar results (not shown here).
Soundings with an RSR more than two standard deviations larger than expected from Equation (1) are considered outliers. For this purpose, we fitted a second-order polynomial and use it as threshold for the maximum allowed deviation from the RSR model (Figure 4, gray lines).
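The percentile argument can be made explicit: for a Gaussian, the 15.9th percentile equals μ − σ and the 2.28th percentile equals μ − 2σ, so σ is their difference and μ follows by adding σ to the 15.9th percentile. The sketch below applies this to synthetic data for a single NSR bin. The function names, the synthetic numbers, and the simple interpolating percentile are illustrative assumptions, not the authors' implementation.

```python
import random


def percentile(sorted_vals, q):
    """Percentile q (in percent) of pre-sorted data, using linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def expectation_from_lower_tail(rsr_values):
    """Estimate mean and standard deviation from the 2.28th and 15.9th percentiles."""
    s = sorted(rsr_values)
    p_minus_2sigma = percentile(s, 2.28)   # approximately mu - 2 sigma
    p_minus_1sigma = percentile(s, 15.9)   # approximately mu - 1 sigma
    sigma = p_minus_1sigma - p_minus_2sigma
    return p_minus_1sigma + sigma, sigma


# Synthetic bin: Gaussian core (mean 1.0) plus 10% poor fits with enhanced RSR
random.seed(1)
clean = [random.gauss(1.0, 0.05) for _ in range(20000)]
poor_fits = [1.0 + random.uniform(0.2, 1.0) for _ in range(2000)]
sample = clean + poor_fits

robust_mean, robust_sigma = expectation_from_lower_tail(sample)
naive_mean = sum(sample) / len(sample)
```

In this synthetic case the lower-tail estimate stays close to the mean of the Gaussian core, while the plain average is pulled upwards by the poor fits, which is the reason for approaching the expectation value from the lowermost values.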
We define the noise model, which modifies the reported OCO-2 L1b radiance noise N, in analogy to Equation (1):

$$N_{\mathrm{total}} = \sqrt{N^2 + \left(\delta F \, I_{\mathrm{cont}}\right)^2} \qquad (3)$$
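As a minimal sketch of such a noise model, the following code assumes that the reported radiance noise and a forward model error proportional to the continuum radiance are added quadratically, as suggested by the text. The function names and the numerical values (arbitrary radiance units, δF = 3‰) are illustrative assumptions only.

```python
import math


def total_noise(noise_l1b, i_cont, delta_f):
    """Quadratic sum of reported radiance noise and forward model error."""
    return math.hypot(noise_l1b, delta_f * i_cont)


def forward_model_share(noise_l1b, i_cont, delta_f):
    """Fraction of the total error variance caused by the forward model term."""
    fm = delta_f * i_cont
    return fm ** 2 / (noise_l1b ** 2 + fm ** 2)


DELTA_F = 0.003  # illustrative value within the 2.5-3.2 permil range quoted above

# Bright scene: small noise-to-signal ratio (NSR = 0.001)
bright = forward_model_share(noise_l1b=0.001, i_cont=1.0, delta_f=DELTA_F)
# Dark scene: large noise-to-signal ratio (NSR = 0.006)
dark = forward_model_share(noise_l1b=0.0006, i_cont=0.1, delta_f=DELTA_F)
```

With these numbers the forward model term dominates the error budget of the bright scene, whereas the instrument noise dominates in the dark scene, in line with the interpretation of Figure 4.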

Zero Level Offset Correction
We define as zero level offset (ZLO) an additive fit window-wide radiance offset. An apparent or effective ZLO can have various reasons such as residual calibration errors or unconsidered spectroscopic effects. Many of these effects can be expected to result in ZLOs being approximately proportional to the fit window's continuum radiance. In order to study potential ZLOs, we used the same modified FOCAL setup as in the last section but with the noise model just defined. The simultaneous retrieval of ZLOs reduces the uncertainty reduction for XCO2 and renders the SIF retrieval impossible. Therefore, we aimed at a ZLO correction rather than a ZLO retrieval per sounding.
We analyzed the same 24 days of OCO-2 data as in the last section but filtered for potential contamination with chlorophyll fluorescence because in the SIF fit window it is not possible to disentangle ZLO and SIF (Figure 5). For this purpose, we used monthly L3 MODIS Aqua chlorophyll-a data (obtained from https://modis.gsfc.nasa.gov/data/dataprod/chlor_a.php, [28]) over ocean and normalized difference vegetation index (NDVI) data over land (obtained from https://modis.gsfc.nasa.gov/data/dataprod/mod13.php). Figure 6 shows that we find a reasonably linear relationship (with correlations around 0.9) between the retrieved ZLO and the continuum radiance within the SIF and both CO2 fit windows, hinting at ZLOs in the range of 0.8-1.8% of the continuum radiance. In the following, we use the fitted linear relationship as ZLO correction for these three fit windows. In the O2 fit window, the correlation between ZLO and continuum radiance is poor and the linear fit suggests a small negative slope. Therefore, we decided not to apply a ZLO correction for this fit window.

Figure 6. [...] Gray dots: potential outliers (i.e., no convergence, χ² > 2, or RSR exceeding threshold (see Figure 4)). Green line: linear fit.
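The ZLO correction can be illustrated with an ordinary least-squares line through (continuum radiance, retrieved ZLO) pairs, which is then evaluated for each sounding and subtracted from the measured radiance. Whether the fit includes an intercept, and whether the correction is applied to the measurement or to the simulated radiance, is not stated in the text; both choices below, as well as all names and numbers, are assumptions for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correct_radiance(radiance, i_cont, slope, intercept):
    """Subtract the ZLO predicted from the continuum radiance of the fit window."""
    zlo = intercept + slope * i_cont
    return [r - zlo for r in radiance]


# Synthetic example: ZLO of 1.2% of the continuum radiance (arbitrary units)
i_cont_samples = [1.0, 2.0, 3.0, 4.0]
zlo_samples = [0.012 * i for i in i_cont_samples]
slope, intercept = fit_line(i_cont_samples, zlo_samples)

corrected = correct_radiance([1.0, 0.5], i_cont=1.0, slope=slope, intercept=intercept)
```

Applying a fixed correction of this kind, instead of fitting a ZLO per sounding, leaves the information content of the spectra available for XCO2 and SIF.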

Filtering
First of all, we check for convergence, i.e., the state vector increment has to be small compared to the a posteriori uncertainty, the number of iterations must not exceed 15, and χ² must not exceed 2 (for more details, see part 1 [23]). Convergence is achieved for about 74% of all pre-filtered OCO-2 soundings. Many non-converging soundings can be found near the SAA, the Saharan desert, and the Arabian peninsula (Figure 7).
In the next step, we check for each fit window if the RSR is smaller than the threshold for potential outliers defined in Section 3.1. The throughput of this filter, which is most active above the tropical oceans (Figure 7), is about 68%.
Additionally, we filter for potential outliers by parameters that have an unexpectedly large influence on the retrieved local XCO2 variability. For the example data shown in Figure 7, this filter is most active in high latitudes and has a throughput of about 84%.
This filter is based on the idea that XCO2 outliers increase the local retrieved XCO2 variability and are likely correlated with extreme values of some of the candidate parameters: XCO2 uncertainty σXCO2, lowermost layer of the CO2 column averaging kernel, XH2O, XH2O uncertainty σXH2O, XH2O difference to the a priori, continuum radiance in the O2 (I_cont^O2), wCO2 (I_cont^wCO2), and sCO2 (I_cont^sCO2) fit window, gradient between first and second CO2 layer ∇CO2, albedo difference between the O2 and sCO2 fit window, and all non-CO2 and non-H2O state vector elements [23].
For a representative two-month data set (April and August 2015), we estimated the local retrieved XCO2 variability VAR(∆XCO2) as follows: For each sounding, we computed the difference ∆XCO2 between XCO2 and its 5° × 5° daily median, and subsequently we computed the variance of all ∆XCO2 values falling in grid boxes with more than 100 samples. We then searched for the upper or lower threshold of that candidate parameter which reduces VAR(∆XCO2) most when removing 1‰ of all data points. We repeated this until 15% of all data points were removed. In order to reduce the complexity of the postprocessing filter procedure, we then identified the 10 most promising parameters separately for land and ocean and repeated the whole exercise to find filter thresholds for these 10 parameters. Figure 8 shows (especially for land) that the decrease in variability somewhat reduces after the removal of the first 5-10%. A potential interpretation is that in this range indeed primarily outliers are removed. After the removal of approximately 15%, the decrease in variability is relatively constant over a larger range before it drops to zero when the last data points are removed. As the curves do not show a distinct kink, the choice to remove 15% of all data points is a bit arbitrary but seems to be a good compromise. Above land (Figure 8, left), the potential outliers filter reduces the variance of ∆XCO2 from 2.05 ppm² to 1.68 ppm². The Ångström exponent Å is the dominant parameter, contributing 38% to the variance reduction. All parameter thresholds found for the potential outliers filter above land are listed in Table 1 (left).
Above sea (Figure 8, right), this filter reduces the variance of ∆XCO2 from 1.22 ppm² to 1.09 ppm². In glint geometry, scattering is less important and the dominant parameter is the wavelength squeeze in the wCO2 fit window λ_sq^wCO2, contributing 32% to the variance reduction. All parameter thresholds found for the potential outliers filter above sea are listed in Table 1 (right).
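The threshold search of the potential outliers filter can be sketched as a greedy loop. The version below is a simplified illustration under several assumptions that go beyond the text: ∆XCO2 is precomputed and kept fixed, the restriction to grid boxes with more than 100 samples is omitted, and the function, parameter names, and synthetic data are hypothetical.

```python
import random


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def greedy_outlier_thresholds(delta, params, step_frac=0.001, total_frac=0.15):
    """Greedily tighten upper/lower parameter thresholds to reduce VAR(delta).

    delta:  list of XCO2 anomalies, one per sounding
    params: dict mapping a parameter name to a list of values (same length)
    """
    n = len(delta)
    k = max(1, round(step_frac * n))      # soundings removed per step
    max_removed = round(total_frac * n)
    keep = set(range(n))
    thresholds = {}                       # (name, side) -> most recent threshold
    while n - len(keep) + k <= max_removed:
        best = None
        for name, vals in params.items():
            order = sorted(keep, key=vals.__getitem__)
            for side, drop in (("lower", order[:k]), ("upper", order[-k:])):
                rest = keep.difference(drop)
                var = variance([delta[i] for i in rest])
                if best is None or var < best[0]:
                    # threshold = least extreme value among the removed soundings
                    edge = drop[-1] if side == "lower" else drop[0]
                    best = (var, name, side, vals[edge], drop)
        _, name, side, thr, drop = best
        keep.difference_update(drop)
        thresholds[(name, side)] = thr
    return thresholds, keep


# Synthetic data: large values of the hypothetical parameter "a" cause outliers
random.seed(0)
n = 1000
a = [random.uniform(0.0, 1.0) for _ in range(n)]
b = [random.uniform(0.0, 1.0) for _ in range(n)]   # unrelated parameter
delta = [random.gauss(0.0, 1.0) + (5.0 if ai > 0.9 else 0.0) for ai in a]

thresholds, keep = greedy_outlier_thresholds(delta, {"a": a, "b": b})
var_before = variance(delta)
var_after = variance([delta[i] for i in keep])
```

In this synthetic case the search finds an upper threshold on the parameter that drives the outliers, and the variance of the remaining anomalies drops accordingly.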
The combined throughput of all three post-filters (convergence, residual, and potential outliers) is about 42%.
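The combined throughput follows from multiplying the individual throughputs, assuming that each filter is applied to the soundings that passed the previous one. A short check with the rounded values quoted above:

```python
from math import prod

# Individual post-filter throughputs quoted in the text (rounded values)
post_filter_throughput = {
    "convergence": 0.74,
    "residual": 0.68,
    "potential outliers": 0.84,
}

combined = prod(post_filter_throughput.values())
combined_percent = round(100 * combined)
```

The product of the three rounded values is consistent with the stated combined throughput of about 42%.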
Table 1. Thresholds and prorated variance reduction of the 10 parameters of the potential outliers filter for soundings above land (top) and sea (bottom). In total, the variance of ∆XCO2 is reduced from 2.05 ppm² to 1.68 ppm² above land and from 1.22 ppm² to 1.09 ppm² above sea. See part 1 [23] and the main text for a description of the individual parameters.

Bias Correction
The basic assumption of the bias correction scheme is that, on average, XCO2 varies little on small scales so that correlations with more variable parameters can be used to quantify biases. As a consequence, the bias correction does not require any ground truth data except for the quantification of a globally constant offset.
The final bias model B consists of four components, a footprint bias B_f, a land/sea bias B_ls, a linear bias model B_lin, and a globally constant bias B_g:

$$B = B_f + B_{ls} + B_{lin} + B_g \qquad (4)$$

These four components are successively derived from analyses of the same two-month data set (April and August 2015) used to determine the thresholds of the potential outliers filter (Section 4.1).
The swath of OCO-2 consists of eight neighboring footprints across track. In order to determine the mean footprint anomaly, we used only soundings belonging to complete sets of eight neighboring soundings which all passed the post-filtering and which were entirely over land or sea. Figure 9 (right) shows the sampling of the roughly 180,000 soundings where this is the case. For each of these sets of eight soundings, we compute the footprint anomaly and subsequently the average footprint anomaly of all sets, which we then use as bias function. Figure 9 (left) shows the footprint bias pattern using the example of all soundings passing the post-filtering in August 2015; the corresponding bias function B_f depends on the footprint f (Equation (5)).
In order to determine the land/sea bias, we corrected all post-filtered results for the footprint bias and analyzed all coastline overpasses with a maximum duration of 120 s (≈800 km along track), at least 100 soundings, and a land fraction between 30% and 70%. For each of these coastline overpasses (162 with about 102,000 soundings, Figure 10, right), we computed the land/sea anomaly and hence the average land/sea anomaly (±0.8986 ppm). Figure 10 (left) shows the land/sea bias pattern corresponding to the bias function B_ls, with l being the land/sea fraction.
In addition to the footprint and land/sea biases, we found a small correlation (ρ = 0.16) between small scale anomalies of XCO2 and the retrieved ILS squeeze in the wCO2 fit window ILS_sq^wCO2. The small scale anomalies have been computed from 60 s chunks (≈400 km) of post-filtered and footprint and land/sea bias corrected OCO-2 orbit data with more than 10 soundings. In total, about 5000 chunks with one million soundings were analyzed (Figure 11).
Finally, we correct for a global offset with respect to the optimized CO2 concentration fields of the CAMS (Copernicus Atmosphere Monitoring Service) greenhouse gas flux inversion model [29] obtained from http://apps.ecmwf.int. CAMS is the CO2 atmospheric inversion product of the European Union programme Copernicus that develops information services based on satellite Earth observation and other data (http://www.copernicus.eu/). The product is released twice per year and each time covers the full period from 1979 until the year before the release. It results from an analysis of CO2 surface air sample measurements over the corresponding period and consists of optimized CO2 surface fluxes over the globe and of associated 3D CO2 concentrations. Version 15r4 used here analyzed 37 years of surface measurements. Its spatial resolution is 3.75° in longitude and 1.875° in latitude, with 39 hybrid layers in the vertical. A full description of v15r4 is given by Chevallier [30], showing also, among other validation results, that its root-mean-square fit to TCCON measurements is usually close to 1 ppm. The global offset B_g of FOCAL v06 is determined relative to this CAMS model.
Figure 12 (left) shows an example of the total bias pattern consisting of all four components (footprint bias, land/sea bias, linear bias model, global bias). As can be seen, the large scale pattern of the total bias is dominated by the land/sea bias followed by the footprint bias, and the linear bias model plays only a minor role. For comparison, Figure 12 (right) shows the total bias pattern of NASA's operational OCO-2 L2 product v7.3.05b [21,22] obtained from https://daac.gsfc.nasa.gov and in the following referred to as NASA v7.3.05b. The overall variability is similar (0.82 ppm and 0.71 ppm for FOCAL v06 and NASA v7.3.05b, respectively), and the NASA product also has a distinct land/sea bias but with opposite sign, i.e., with largest values over sea (note the reversed color bar in Figure 12, right).
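The derivation of the footprint bias from complete sets of eight neighboring soundings can be sketched as follows. The text does not specify how the footprint anomaly is defined; this sketch assumes the deviation of each footprint from the mean of its set, and the function name and the synthetic bias pattern are purely illustrative.

```python
N_FOOTPRINTS = 8


def footprint_bias(sets):
    """Average footprint anomaly from complete sets of eight neighboring soundings.

    sets: list of 8-element lists of XCO2 (ppm), one value per footprint.
    Assumption: the anomaly is the deviation from the mean of the set.
    """
    sums = [0.0] * N_FOOTPRINTS
    for xco2 in sets:
        set_mean = sum(xco2) / N_FOOTPRINTS
        for f in range(N_FOOTPRINTS):
            sums[f] += xco2[f] - set_mean
    return [s / len(sets) for s in sums]


# Synthetic example: a fixed footprint pattern on top of varying background XCO2
pattern = [0.3, -0.1, 0.0, 0.2, -0.2, 0.1, -0.4, 0.1]   # ppm, sums to zero
backgrounds = [398.0, 400.5, 402.0]
sets = [[bg + p for p in pattern] for bg in backgrounds]

bias = footprint_bias(sets)
```

Because each anomaly is taken relative to the mean of its own set, real XCO2 gradients between sets cancel and the resulting bias function sums to zero across the eight footprints.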

Model Comparison
In this section, we compare two months (April and August 2015) of post-filtered and bias corrected FOCAL v06 XCO2 results with corresponding values of the CAMS v15r4 model accounting for FOCAL's column averaging kernels (e.g., [31]). Figure 13 shows 5° × 5° monthly gridded values for both months, FOCAL, and CAMS. The main spatial and temporal patterns are similar for FOCAL and CAMS with largest and smallest values in the northern hemisphere in April and August, respectively. Differences become larger at smaller scales, e.g., FOCAL sees larger values in natural and anthropogenic source regions of Sub-Saharan Africa and East Asia in April but also above the Sahara in August. However, it should be noted that often only few data points are in the corresponding grid boxes. In grid boxes with more than 100 soundings, the standard error of the mean becomes negligible (≈0.1 ppm). Therefore, the difference between FOCAL and CAMS in such grid boxes can be interpreted as systematic temporal and regional mismatch or bias. The standard deviation of this systematic mismatch (including also representation errors) amounts to 1.0 ppm. The standard deviation of the single sounding mismatch after subtracting the systematic mismatch amounts to 1.1 ppm, which agrees reasonably well with the average reported uncertainty of 1.2 ppm.
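The separation into systematic and single sounding mismatch can be sketched as follows: differences are grouped by grid box, the standard deviation of the box means (for boxes with more than 100 soundings) represents the systematic part, and the standard deviation of the residuals after subtracting the box means represents the stochastic part. The unweighted statistics, the function name, and the synthetic data are assumptions for illustration.

```python
import statistics
from collections import defaultdict


def split_mismatch(records, min_count=100):
    """Split differences into systematic (box mean) and stochastic (residual) parts.

    records: iterable of (box_id, difference_ppm); a box_id identifies one
             grid box and month.
    Returns (std of box means, std of residuals) for boxes with > min_count samples.
    """
    boxes = defaultdict(list)
    for box_id, diff in records:
        boxes[box_id].append(diff)
    box_means, residuals = [], []
    for diffs in boxes.values():
        if len(diffs) > min_count:
            mean = statistics.fmean(diffs)
            box_means.append(mean)
            residuals.extend(d - mean for d in diffs)
    return statistics.pstdev(box_means), statistics.pstdev(residuals)


# Synthetic example: two well-sampled boxes and one sparsely sampled box
records = (
    [("box_a", v) for v in [1.5, 0.5] * 50 + [1.0]]
    + [("box_b", v) for v in [-0.5, -1.5] * 50 + [-1.0]]
    + [("box_c", 9.0)] * 50   # too few samples, ignored
)
systematic_std, stochastic_std = split_mismatch(records)
```

With more than 100 soundings per box, the standard error of the box mean is small compared to the single sounding scatter, which is why the box means can be read as a systematic mismatch.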

Comparison with NASA's Operational OCO-2 L2 Product
In this section, we compare the same two months (April and August 2015) of post-filtered and bias corrected FOCAL v06 XCO2 results with NASA's operational OCO-2 L2 product. Comparing Figure 14 with Figure 13 (top) shows similar large scale temporal and spatial patterns, and the relative enhancements in the anthropogenic source regions of East Asia in April are also similar. The most obvious difference is that the NASA product has about three times more soundings. The primary reason for this is the inherently poor throughput (11%) of the MODIS based cloud screening of the preprocessor (see discussion in Section 2). Analyzing only the same soundings in both data sets and considering the column averaging kernels, the NASA product has on average 0.7 ppm larger values than FOCAL, which is (due to the color table used) most noticeable in the northern hemisphere of Figure 14 (right). The standard deviation of the difference is 1.1 ppm. As done in the last section, we separate the systematic mismatch from the stochastic mismatch by analyzing grid boxes with more than 100 co-locations. The standard deviations of the stochastic and the systematic mismatch amount to 0.91 ppm and 0.83 ppm, respectively. It is no surprise that the stochastic mismatch is smaller than expected from the combined reported uncertainties because both data products are based on the same L1b input data including the same noise spectra.

Validation with TCCON
In this section, we show validation results for one year (2015) of FOCAL v06 and NASA's operational OCO-2 L2 data and analyze the influence of the bias corrections.We used ground based TCCON [32] GGG2014 data obtained from http://tccon.ornl.govas reference data set and a similar validation protocol as Reuter et al. [19,33].We considered the column averaging kernels of all data products and co-located OCO-2 and TCCON measurements with a maximum time difference of 2 h, a maximum distance of 500 km, and a maximum surface elevation difference of 250 m.In cases with multiple TCCON measurements of the same site co-locating with an OCO-2 sounding, we averaged the TCCON measurements.In total we found about 179,000 and 378,000 co-locations for FOCAL and the NASA product, respectively.
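The co-location criteria can be expressed as a small predicate. The sketch below uses the haversine formula on a spherical Earth with a mean radius of 6371 km to compute the distance; this distance measure, the function names, and the argument layout are assumptions for illustration, as the text does not specify how distances were computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (degrees) via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_colocated(dt_hours, sat_lat, sat_lon, sat_alt_m, site_lat, site_lon, site_alt_m,
                 max_dt_hours=2.0, max_dist_km=500.0, max_dalt_m=250.0):
    """Apply the time, distance, and surface elevation criteria quoted in the text."""
    return (
        abs(dt_hours) <= max_dt_hours
        and abs(sat_alt_m - site_alt_m) <= max_dalt_m
        and great_circle_km(sat_lat, sat_lon, site_lat, site_lon) <= max_dist_km
    )


one_degree_at_equator_km = great_circle_km(0.0, 0.0, 0.0, 1.0)
```

The default thresholds correspond to the XCO2 validation; stricter values can be passed for more variable quantities.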
Figure 15 shows the co-locations of all 19 sites with more than 250 co-locations. Per site statistics (bias and scatter, i.e., single sounding precision measured by the standard deviation of the difference to TCCON) are shown from north to south in Figure 16. Note that global offsets have been removed for both figures (−0.29 ppm, 0.35 ppm, −0.62 ppm, and −0.94 ppm for NASA, NASA not bias corrected, FOCAL, and FOCAL not bias corrected).
Both algorithms show a somewhat similar site-to-site bias pattern regardless of whether the bias correction is applied or not. The largest differences of the bias corrected satellite products can be found in Sodankylä and Tsukuba with biases larger than 1 ppm for the NASA product and FOCAL. The standard deviations of the site biases are 0.82 ppm and 0.67 ppm for the NASA product and FOCAL (0.69 ppm and 0.78 ppm if no bias correction is applied). These algorithm-to-algorithm differences are barely significant because TCCON's per site accuracy is about 0.4 ppm (1σ) [32].
The analyzed algorithms also show a similar site-to-site pattern for the scatter, with the lowest values for the southern hemispheric sites, probably due to smaller natural variability and, consequently, smaller representation errors. Both algorithms have a similar average scatter relative to TCCON before bias correction (1.62 ppm and 1.60 ppm for NASA and FOCAL) and after bias correction (1.31 ppm and 1.34 ppm for NASA and FOCAL). This means that both bias corrections primarily reduce the scatter rather than the site biases. However, according to Figure 12, the influence of the bias correction on the spatial bias pattern may be larger elsewhere.
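The summarizing statistics discussed above (standard deviation of the site biases and average scatter) can be computed along the following lines. This is a minimal sketch with hypothetical names; the use of the sample standard deviation (`ddof=1`) is an assumption, and only the threshold of more than 250 co-locations per site is taken from the text.

```python
import numpy as np

def site_statistics(site_ids, sat_xco2, tccon_xco2, min_count=250):
    """Per-site bias and scatter of satellite minus TCCON XCO2 (ppm)."""
    site_ids = np.asarray(site_ids)
    diff = np.asarray(sat_xco2, dtype=float) - np.asarray(tccon_xco2, dtype=float)

    per_site = {}
    for site in np.unique(site_ids):
        d = diff[site_ids == site]
        if d.size > min_count:
            # bias: mean difference; scatter: standard deviation of the difference
            per_site[str(site)] = (float(d.mean()), float(d.std(ddof=1)))
    if len(per_site) < 2:
        raise ValueError("need at least two sites with enough co-locations")

    biases = np.array([bias for bias, _ in per_site.values()])
    scatters = np.array([scatter for _, scatter in per_site.values()])
    return {
        "per_site": per_site,
        "site_bias_std": float(biases.std(ddof=1)),  # "overall" bias measure
        "mean_scatter": float(scatters.mean()),      # "overall" scatter measure
    }
```

The standard deviation of the site biases does not change when a global offset is removed, so the offset does not need to be subtracted for this measure.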
FOCAL's retrieved XH2O has also been initially compared with TCCON in the same manner. However, due to the much larger natural variability of water vapor (typically spanning a range from 500 ppm to 7000 ppm), we used stricter co-location criteria (1 h maximum time difference and 150 km maximum distance), reducing the number of co-locations roughly by a factor of five. The global offset amounts to −150 ppm, the standard deviation of the site biases is 206 ppm, and the average single sounding precision is 293 ppm. It should be mentioned that, in contrast to XCO2, the agreement degrades significantly when the co-location criteria are relaxed. Conversely, a significant part of the observed deviations could still be due to representation errors, which are expected to decrease for even stricter co-location criteria. This, however, would also further reduce the number of co-locations.

… [34], Białystok [35], Bremen [36], Karlsruhe [37], Paris [38], Orleans [39], Garmisch-Partenkirchen [40], Park Falls [41], Lamont [42], Anmeyondo [43], Tsukuba [44], Dryden [45], Pasadena [46], Saga [47], Ascension Island [48], Darwin [49], Reunion Island [50], Wollongong [51], and Lauder [52].

Conclusions
We used the fast atmospheric trace gas retrieval FOCAL v06 to retrieve XCO2 from one year (2015) of OCO-2 measurements and presented the applied pre- and post-processing methods including pre-filtering, noise model, zero level offset correction, post-filtering, and bias correction.
The strict pre-filtering is based on sounding quality, NASA's spike EOF analyses, OMI UV aerosol index, and MODIS Aqua cloud coverage. Due to the wider swath, MODIS cloud masking has the potential advantage of better accounting for 3D effects caused by neighboring cloud contamination. However, as Aqua lags OCO-2 by 15 min, we chose a cloud filtering radius of 10 km to prevent potential cloud movement from introducing cloud contamination. As a result, this filter has a throughput of only about 11% and dominates the total pre-filtering throughput of about 4%. This filter is therefore the main reason why FOCAL v06 has only about one third as many data points as NASA's operational product. An OCO-2 based cloud filtering, as also done by NASA [53], is a potential solution for future FOCAL versions.
In order to consider not only instrumental noise but also (pseudo) noise of the forward model, we set up a noise model that depends on the instrument noise and one free fit parameter which we determined from the residuals of a set of relatively unconstrained retrievals. The noise model suggests that forward model errors (plus potential pseudo noise of the instrument) have a magnitude of 0.5–3.2‰ of the continuum radiance. This means that in dark scenes the mismatch of simulated and measured radiance is still dominated by the noise of the instrument, but in bright scenes (e.g., above deserts) the forward model error dominates.
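The dark versus bright scene behavior can be illustrated with a minimal sketch. It assumes that the relative instrument noise (NSR) and a relative forward model error add in quadrature to the residual-to-signal ratio (RSR); this is one plausible form with a single free parameter and is not confirmed to be identical to Equation (1). The function names and the example values in the usage note are hypothetical.

```python
import math

def rsr_model(nsr, delta_f):
    """Expected residual-to-signal ratio under the quadrature assumption.

    nsr:     relative instrument noise (noise-to-signal ratio)
    delta_f: relative forward model error (the free parameter, assumption)
    """
    return math.hypot(nsr, delta_f)

def dominant_error(nsr, delta_f):
    """Name the term that contributes most to the expected residual."""
    return "instrument noise" if nsr > delta_f else "forward model"
```

For example, with a forward model error of 2‰, a dark scene with an NSR of 5‰ is dominated by instrument noise, whereas a bright scene with an NSR of 1‰ is dominated by the forward model error.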
Apparent or effective zero level offsets (ZLO) can have various causes such as residual calibration errors or unconsidered spectroscopic effects. For the SIF and both CO2 fit windows, we found linear relationships between the retrieved zero level offsets and the continuum radiances with slopes between 0.8% and 1.8%. As FOCAL v06 usually does not retrieve the ZLO per sounding, we correct the measured radiance with the derived linear relationships before the retrieval.
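A minimal sketch of such a correction is given below. It assumes that the ZLO acts as an additive offset on the measured radiance and is therefore subtracted; the sign convention, the function names, and the optional intercept are assumptions.

```python
import numpy as np

def fit_zlo_relation(continuum, zlo):
    """Least-squares line through retrieved ZLO versus continuum radiance."""
    slope, intercept = np.polyfit(continuum, zlo, 1)
    return float(slope), float(intercept)

def correct_radiance(radiance, continuum, slope, intercept=0.0):
    """Subtract the ZLO predicted from the continuum radiance of the sounding.

    radiance:  measured spectrum of one fit window
    continuum: continuum radiance of that fit window (scalar)
    """
    predicted_zlo = slope * continuum + intercept
    return np.asarray(radiance, dtype=float) - predicted_zlo
```

The slope would be derived once per fit window from a set of retrievals with a free ZLO and then applied to all soundings before the actual retrieval.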
Post-filtering checks for convergence, for fit window residuals being smaller than the thresholds derived from the noise model analyses, and for potential outliers. Non-converging soundings are often found near the SAA, the Saharan desert, and the Arabian peninsula. Soundings with too large residuals are often found above the tropical oceans. The filter for potential outliers is most active in high latitudes and its dominant input parameter (above land) is the retrieved Ångström exponent. The total post-filtering throughput is about 42%. The average RSR is 2.2‰, 3.0‰, 2.8‰, and 3.4‰ for the SIF, the O2, the wCO2, and the sCO2 fit window, respectively.
A bias correction has been applied to the post-filtered results. It is primarily based on the assumption that XCO2 has (on average) little variation on small scales so that correlations with more variable parameters can be used to quantify biases. As a consequence, the bias correction does not require any ground truth data except for a globally constant offset of −1.67 ppm. We found a distinct OCO-2 footprint dependent bias in the range between −0.97 ppm and 1.22 ppm, but the most prominent global bias pattern results from the land/sea bias of 1.80 ppm. One could speculate that the land/sea bias has its origin in FOCAL's assumption of Lambertian surfaces. However, retrievals of simulated ocean observations done in part 1 [23] do not support this hypothesis.
We compared FOCAL v06 XCO2 results with co-located values of the CAMS v15r4 model. Both data sets show similar large scale spatial patterns and the systematic temporal and regional differences have a standard deviation of 1.0 ppm. The standard deviation of the single sounding mismatches amounts to 1.1 ppm, which agrees reasonably well with the average reported uncertainty of 1.2 ppm.
We also compared FOCAL v06 XCO2 with the operational NASA OCO-2 product. Large scale patterns of both data sets are similar and the most prominent difference is that the NASA product has about three times more soundings. The primary reason for this is the inherently poor throughput (11%) of the MODIS based cloud screening of FOCAL's preprocessor. The NASA product has on average 0.7 ppm larger values than FOCAL v06. The standard deviation of the difference between both products is 1.1 ppm.
Finally, we validated one year (2015) of FOCAL v06 XCO2 data with and without bias correction as well as NASA's operational OCO-2 XCO2 product with and without bias correction with co-located ground based TCCON observations. The algorithms show similarities in the site-to-site patterns of bias and scatter. The standard deviations of the site biases are 0.82 ppm and 0.67 ppm for the NASA product and FOCAL, respectively (0.69 ppm and 0.78 ppm without bias correction). These algorithm-to-algorithm differences are barely significant because TCCON's per site accuracy is about 0.4 ppm (1σ) [32]. The average scatter relative to TCCON is 1.31 ppm and 1.34 ppm for NASA and FOCAL, respectively (1.62 ppm and 1.60 ppm without bias correction).
Additionally, we performed an initial validation of one year (2015) of FOCAL v06 XH2O data with co-located ground based TCCON observations and found site-to-site biases with a standard deviation of 206 ppm and an average single sounding precision of 293 ppm. However, due to the much larger natural variability of XH2O compared to XCO2, future studies are needed to quantify or minimize the influence of representation errors.
Processing an entire year of OCO-2 data with FOCAL v06 took about two weeks on a small cluster with eight Intel Xeon E5-2687W CPUs (eight cores each, running at 3.1 GHz, released in 2012). This means that FOCAL is fast enough to process data from current and future satellites similar to CarbonSat [4,54], which would provide at least an order of magnitude more data, with a reasonable number of CPU cores, especially when taking into account the CPU developments to be expected until launch date. Additionally, FOCAL's computations are simple enough for an adaptation to GPU architecture with reasonable effort, which has the potential for a further substantial acceleration.
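The processing budget can be made explicit with a short back-of-the-envelope calculation. It is a rough sketch that takes "about two weeks" as 14 days and assumes that the cost scales linearly with the data volume on the same 2012 hardware; the function names are hypothetical.

```python
def core_hours(n_cpus, cores_per_cpu, wall_days):
    """Total core-hours spent on a processing run."""
    return n_cpus * cores_per_cpu * wall_days * 24

def cores_to_keep_pace(core_hours_per_data_year, data_factor=1.0,
                       hours_per_year=365 * 24):
    """Cores needed to process data as fast as it is acquired.

    data_factor scales the data volume relative to OCO-2 (linear cost assumed).
    """
    return core_hours_per_data_year * data_factor / hours_per_year

# 8 CPUs x 8 cores for 14 days
baseline = core_hours(8, 8, 14)
# Ten times the OCO-2 data volume
future = cores_to_keep_pace(baseline, data_factor=10.0)
```

Under these assumptions, one year of OCO-2 data costs 21,504 core-hours, and keeping pace with a mission delivering ten times more data would need roughly 25 cores of the same type.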

Figure 1. OCO-2 measurement of June 6, 2015, 12:01 UTC near Hamburg, Germany (sounding ID: 2015060512011938) fitted with FOCAL. (Top) Simulated and fitted radiance measurement in gray and red, respectively. (Bottom) Adapted measurement noise (see Section 3.1) and fit residual (fit minus measurement) in gray and red, respectively; χ j is an estimate of the goodness of fit (relative to the noise) in fit window j and is computed as defined in part 1 [23].

Figure 2. Pre-filtering statistics of the 24 days data subset used for the noise model analysis (Section 3.1). The filters are applied in the order: Sounding quality, LAT/SUZ/SAZ/σALT, Spike EOF, OMI UV aerosol idx, MODIS clouds, and Radiance level (see main text for a description). The colors represent filter activity and soundings passing all filters are shown in white. Numbers in brackets represent filter throughputs.

Figure 4. Root mean square noise to signal ratio NSR versus root mean square residual to signal ratio RSR for all four fit windows. Red points: 2.28th percentile within bins with more than 500 samples (35 bins in total). Orange points: 15.9th percentile. Green points: expectation value estimated from the 2.28th and 15.9th percentile. Solid green line: RSR as computed from the RSR model (Equation (1)). Gray points: RSR model plus 2σ estimated from the 2.28th and 15.9th percentile. Gray line: outlier threshold. Gray dots: potential outliers. Dashed green line: one-to-one line.

Figure 7. Post-filtering statistics for April and August 2015. The filters are applied in the order: convergence, residual, and potential outliers (see main text for a description). The colors represent filter activity and soundings passing all filters are shown in white. Numbers in brackets represent filter throughputs.

Figure 8. Variance versus filter throughput for the 10 most promising parameters identified for the potential outliers filter. The colors represent the prorated variance reduction of the individual parameters. See part 1 [23] and the main text for a description of the individual parameters. (Left) Land; (Right) Sea.

Figure 9. FOCAL v06 OCO-2 footprint bias pattern (Equation (5)) for the example of August 2015 (left) and sampling of soundings used to determine the footprint bias (right).

Figure 10. FOCAL v06 land/sea bias pattern (Equation (6)) for the example of August 2015 (left) and sampling of soundings used to determine the land/sea bias (right).

Figure 11. FOCAL v06 bias pattern of the linear bias model (Equation (7)) for the example of August 2015 (left) and sampling of soundings used to determine the linear bias model (right).

Figure 16. Validation statistics bias and scatter per TCCON site with more than 250 co-locations for FOCAL v06 and NASA's operational OCO-2 L2 product (both with and without bias correction). The summarizing values ("overall") represent the standard deviation of the site biases and the average scatter relative to TCCON, respectively. The sites are ordered from north (top) to south (bottom).