Validation of Carbon Monoxide Total Column Retrievals from SCIAMACHY Observations with NDACC / TCCON Ground-Based Measurements

Philipp Hochstaffl 1,∗, Franz Schreier 1, Günter Lichtenberg 1 and Sebastian Gimeno García 1,2
1 DLR, German Aerospace Center, Remote Sensing Technology Institute, 82234 Oberpfaffenhofen, Germany; franz.schreier@dlr.de (F.S.); guenter.lichtenberg@dlr.de (G.L.); sebastian.gimenogarcia@eumetsat.int (S.G.G.)
2 EUMETSAT, European Organisation for the Exploitation of Meteorological Satellites, 64283 Darmstadt, Germany
* Correspondence: philipp.hochstaffl@dlr.de; Tel.: +49-8153-28-3056


Introduction
Obtaining space-based (s-b) measurements of the state of Earth's atmosphere is costly compared to many other atmospheric research activities. Once the instruments needed to take these measurements are in orbit, thorough verification and validation of the delivered data are required.
Therefore, considerable efforts have been invested in establishing a ground-based (g-b) validation infrastructure for atmospheric composition, temperature, cloud, and aerosol data acquired by satellite-based remote sensing instruments. Additionally, the stringent requirements of upcoming missions such as Sentinel-5P/4/5 call for comprehensive validation campaigns and solid strategies.

The Environmental Satellite (ENVISAT) and Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY) Instrument
In March 2002, ENVISAT was launched into a low-Earth orbit. Routine operation comprising the nominal measurement program began in August 2002 and lasted until April 2012, when contact was lost [1]. The satellite carried 10 instruments, of which SCIAMACHY, the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), and Global Ozone Monitoring by Occultation of Stars (GOMOS) were dedicated to studying the Earth's atmosphere. SCIAMACHY was an ultraviolet (UV) to short-wave infrared (SWIR, 1.4-3 µm) absorption spectrometer that observed the scattered and reflected solar radiance transmitted through the atmosphere [2].
SCIAMACHY's major research objective was to obtain information on various trace gases in the troposphere and the stratosphere from Earth radiance spectra. Among these gases were O3, CO, CH4, H2O, SO2, CO2, and NO2. To this end, SCIAMACHY comprised eight detector channels covering the wavelength range between 214 and 2386 nm. Channel 8 observed radiance in the SWIR spectral range from 2259.38 nm to 2386.07 nm (4426-4191 cm⁻¹) at a resolution of 0.26 nm, which is equivalent to a resolving power of 8689 to 9177.
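As a quick consistency check of the Channel 8 numbers, the short Python sketch below converts the band edges to wavenumbers with the simple relation ν = 10⁷/λ (λ in nm, ν in cm⁻¹) and computes the resolving power R = λ/Δλ from the quoted 0.26 nm resolution. The constant and function names are illustrative only.

```python
# Channel 8 band edges and spectral resolution as quoted in the text
LAMBDA_MIN_NM = 2259.38
LAMBDA_MAX_NM = 2386.07
FWHM_NM = 0.26


def nm_to_wavenumber(lambda_nm: float) -> float:
    """Wavenumber in cm^-1 for a wavelength in nm (simple 1e7 / lambda conversion)."""
    return 1.0e7 / lambda_nm


def resolving_power(lambda_nm: float, fwhm_nm: float) -> float:
    """Resolving power R = lambda / delta_lambda."""
    return lambda_nm / fwhm_nm


nu_max = nm_to_wavenumber(LAMBDA_MIN_NM)  # short-wavelength edge -> largest wavenumber
nu_min = nm_to_wavenumber(LAMBDA_MAX_NM)
r_lo = resolving_power(LAMBDA_MIN_NM, FWHM_NM)
r_hi = resolving_power(LAMBDA_MAX_NM, FWHM_NM)
print(f"{nu_max:.0f}-{nu_min:.0f} cm^-1, R = {r_lo:.1f}-{r_hi:.1f}")
```

The lower resolving power evaluates to about 8689.9, so the quoted value of 8689 corresponds to truncating rather than rounding.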

Carbon Monoxide (CO) from Channel 8 of SCIAMACHY
In atmospheric remote sensing, trace gas concentrations are retrieved from measured radiance or transmission spectra. The accurate retrieval of CO from Channel 8 observations of SCIAMACHY is demanding due to the low optical depth of CO compared to the total optical depth in this spectral region. To be more specific, for a vertical path through the atmosphere, only about one percent of the absorption is due to CO molecules. In addition, the precision of cloud-free SCIAMACHY measurements is strongly influenced by the reflectance (albedo) of the observed ground pixel, because it largely determines the signal-to-noise ratio (SNR) of the corresponding spectra. Given the weak CO absorption and the limited SNR, the retrieval of CO can only deliver total column amounts without any information on the vertical distribution. Furthermore, the Channel 8 detector shows temporal degradation due to an accumulation of ice, which significantly affects the throughput and makes gas retrieval an even more challenging task [3]. Pixel degradation due to, e.g., solar radiation, is also an ongoing process. According to Lichtenberg et al. [4], Channel 8 contains a substantial number of dead and bad pixels, with 40% deemed unusable by June 2009. For the CO fitting window from 4280 to 4305 cm⁻¹, the number of usable pixels was reduced to only about 50 in the period between 2003 and 2005, and to even fewer afterwards.

Retrieval Codes
The retrieval of trace gas concentrations from radiance or transmission spectra poses an inverse problem that is typically solved by least-squares algorithms. Several codes have been developed for SCIAMACHY nadir SWIR spectra at different European institutes, e.g., the Weighting Function Modified Differential Optical Absorption Spectroscopy (WFM-DOAS) algorithm [5,6], the Iterative Maximum A Posteriori (IMAP)-DOAS method [7], the Iterative Maximum Likelihood Method (IMLM) [3], and the Beer InfraRed Retrieval Algorithm (BIRRA) [8]. Recently, SCIAMACHY spectra have also been processed with the Shortwave Infrared CO Retrieval (SICOR) algorithm developed for the operational data processing of the Tropospheric Monitoring Instrument (TROPOMI) launched on the European Space Agency's (ESA) Sentinel-5 Precursor (S5P) mission [9,10]. Concurrent validation is a crucial task in processor/algorithm development in order to confirm applicability for scientific tasks and, ultimately, to qualify operational algorithms or scientific tools. Based on such studies, processors can be applied to new data sets from other platforms with reasonable confidence. However, new fields of application require further validation or verification efforts.

The Network for the Detection of Atmospheric Composition Change (NDACC) and Total Carbon Column Observing Network (TCCON) Ground Truthing Networks
Fourier transform spectroscopy (FTS) is a well-established technique in passive remote sensing and is used to observe thermal emission (e.g., with the Infrared Atmospheric Sounding Interferometer (IASI) [11] or MIPAS [12]) or absorption (e.g., with the Atmospheric Chemistry Experiment-Fourier Transform Spectrometer (ACE-FTS) [13] or the Greenhouse Gases Observing Satellite (GOSAT) [14], which also observes in the thermal infrared range (TIR range, 8-15 µm)). High-resolution FTS is also used by numerous g-b observatories of the NDACC and TCCON [15]. The instruments associated with these networks routinely record solar absorption spectra in the mid-infrared range (MIR range, 3-8 µm) (NDACC) and in the SWIR range (TCCON) at a number of stations worldwide (see Figure 1, Table 1) and use them to infer information about atmospheric constituent columns and concentration profiles (NDACC only), including CO.
Figure 1. Map [16] showing stations affiliated to ground-based (g-b) observing networks routinely measuring trace gases such as CO in the mid-infrared (MIR) and short-wave infrared (SWIR) ranges. The background color scheme also provides some information on the ground reflectance in the SWIR range (e.g., the variations of the reflectance on the continents and the difference between land and oceans).

Table 1. NDACC and TCCON stations (columns include station name and latitude N [°]).

The TCCON is the reference network for the validation of greenhouse gas satellite retrievals and enables satellite retrievals to be linked to the World Meteorological Organization (WMO) reference scale [17]. In this study, the GGG2014 release of the TCCON data is used, where GGG denotes the entire retrieval software package. According to the TCCON website, the total error budget for CO is around 8.7% (see TCCON [18]). NDACC Fourier transform infrared (FTIR) data were obtained from the publicly available website (see http://www.ndacc.org).

Validation
In general terms, validation is defined as the process of evaluating the performance of a system against some equivalent information that is regarded as a 'true' reference [19].In the field of remote sensing from space, compliance of observations from satellite platforms with the actual state of the atmosphere is usually assessed through validation studies.In that sense, validation means comparing measurements acquired by s-b instruments to other measurements utilizing different measuring methods, e.g., in-situ measurements or g-b soundings.

General Aspects
With respect to a direct comparison it stands to reason that the closer in time and space the reference measurements are acquired, the better they quantify differences due to errors in the acquisition or retrieval process, i.e., the instrument or algorithm performance.However, space-borne and reference measurements do not exactly match in time and space, nor do they address the same volumes of air.Hence, direct comparison of observations from different observing systems is affected by representation errors.
In fact, the column measured by SCIAMACHY is an average column over the area covered by a SCIAMACHY pixel (32 × 120 km² for observations in Channel 8), which extends far beyond the location of the point-like g-b station. According to Verhoelst et al. [20], non-perfect co-location in space is therefore a consequence of both a difference in sampling (i.e., a satellite pixel center generally does not coincide exactly with a ground station) and a difference in the way each instrument smooths its perception of the real, non-homogeneous atmosphere. The air mass to which the s-b nadir measurements are sensitive comprises the ground pixel footprint, an extension towards the satellite, and, especially in the SWIR range, an extension in the direction of the Sun. In the SWIR range, these extensions correspond, to a good approximation, to the optical light path between the Sun, the surface reflection point, and the sensor.
Moreover, for SCIAMACHY, Borsdorff et al. [9] point out that for low-radiance scenes, the retrieval noise error can exceed 100% of the retrieved column. Hence, comparing individual SCIAMACHY CO columns to a reference is not meaningful. According to Gimeno García et al. [8], De Laat et al. [21], and de Laat et al. [22], averaging over an ensemble of pixels is necessary to reduce the instrument-noise error.
Finally, all of these issues need to be considered in an appropriate validation strategy for the CO product inferred from a retrieval algorithm. Since these aspects conflict with each other, a compromise must be made between the number of comparison pairs on the one hand and the non-instrumental comparison errors due to imperfect co-location in space and time of satellite and ground-based measurements on the other.

The SCIAMACHY CO Product
Early validation studies of the CO product retrieved from SCIAMACHY SWIR measurements against g-b data were conducted by Sussmann and Buchwitz [23], Sussmann et al. [24], and Dils et al. [25]. In recent years, additional validation efforts have been carried out by Borsdorff et al. [9,10], De Laat et al. [21], and Schneising et al. [26]. Borsdorff et al. [9] additionally included Tracer Model version 5 (TM5) data for a large-scale intercomparison (northern and southern hemispheric Africa) as well as Measurement of Ozone on Airbus In-service Aircraft (MOZAIC)/In-service Aircraft for a Global Observing System (IAGOS) measurements.
In all of these studies, important aspects of the comparison of g-b and s-b data were presented. With respect to methodology, many considerations described in those articles are not only true for SCIAMACHY but are valid for all s-b to g-b comparisons. These studies in particular include validation of CO data against measurements from g-b FTIR spectrometers of the NDACC, the TCCON, or both. As SCIAMACHY measurements do not provide any information about the vertical distribution of CO, it is reasonable to use TCCON columns for SCIAMACHY validation. In addition, integrals of vertical CO profiles provided by the NDACC are utilized for validation of the s-b total column retrievals. Usually, if total columns calculated from vertical profiles of different remote sounding instruments are to be compared, averaging kernels have to be taken into account. Nonetheless, due to the small differences in vertical sensitivity between g-b FTIR and SCIAMACHY, and because the representation error dominates, a direct comparison of the columns (i.e., a comparison without applying total column averaging kernels) was considered possible without introducing significant intercomparison errors [23].
So far, only Borsdorff et al. [9,10] have presented a validation of SCIAMACHY CO vertical column densities for the full-mission data set, employing the SICOR algorithm. Dils et al. [25] considered SCIAMACHY data covering January to December 2003, produced with three different retrieval algorithms, namely the WFM-DOAS algorithm (version 0.5 for CO), the IMAP-DOAS method (version 0.9 for CO), and the IMLM (version 6.3). Sussmann and Buchwitz [23] used CO vertical profile retrievals from g-b solar FTIR measurements at Zugspitze, Germany (January-October 2003) to validate columnar CO measurements retrieved from SCIAMACHY spectra with the WFM-DOAS algorithm version 0.4.

Goals and Structure of this Study
The objective of this study is to validate the BIRRA level 2 prototype processor using the full-mission level 1b SCIAMACHY data set. Both land and ocean pixels are used in this intercomparison. In addition, the study places emphasis on mitigating errors induced by representation deficiencies by means of averaging and analyzes the effect of different comparison methodologies, which is a crucial step in thoroughly quantifying the performance (accuracy and precision) of a retrieval algorithm or instrument.
The paper is organized in five sections.Our methodology is described in Section 2 and the results in Section 3. Finally, the discussion and conclusion are provided in Sections 4 and 5.

Methodology
The analysis of a solar absorption spectrum essentially yields the slant columns of the various absorbing gases. However, dry air mole fractions are better suited for comparison: these quantities are independent of surface pressure and humidity and are therefore much more useful for satellite validation. Consequently, before the observations are compared for validation, the CO quantities need to be harmonized across the observing systems.

Ground-Based Product Definition
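The vertical column density (VCD) in [mol cm⁻²] is the number of molecules above the surface, defined as

$$\mathrm{VCD}_\mu = \int_{z_s}^{\infty} n_\mu(z)\,\mathrm{d}z \tag{1}$$

with $n_\mu$ the number density [mol cm⁻³] of molecule µ and $z_s$ the surface altitude. The surface pressure p is given by

$$p = \int_{z_s}^{\infty} m(z)\,g(z)\,n_{\mathrm{air}}(z)\,\mathrm{d}z \approx \overline{m g}\;\mathrm{VCD}_{\mathrm{air}} \tag{2}$$

where m is the molar mass and g the gravitational acceleration of the Earth. In Equation (2), $\overline{mg}$ is the column-averaged value. Within the homosphere, roughly comprising the troposphere, stratosphere, and mesosphere, the molar mass of air and the gravitational acceleration are, to a good approximation, constant.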
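The column mixing ratio (CMR) $f_\mu$ of molecule µ is related to the total column according to

$$f_\mu = \frac{\mathrm{VCD}_\mu}{\mathrm{VCD}_{\mathrm{air}}} \tag{3}$$

with

$$\mathrm{VCD}_{\mathrm{air}} = \frac{p}{\overline{m g}} . \tag{4}$$

As indicated above, it is useful to discuss the dry CMR $d_\mu$ (for µ ≠ H2O) rather than the true (wet) mixing ratio, because the dry CMR is a better tracer that is not subject to the strong variations of H2O. The mole fraction and the dry air mole fraction are connected according to

$$d_\mu = \frac{f_\mu}{1 - f_{\mathrm{H_2O}}} . \tag{5}$$

Note that the molar masses $m_{\mathrm{H_2O}}$ = 18.02 g mol⁻¹ and $m_{\mathrm{air}}$ = 28.96 g mol⁻¹ differ. The mean molar mass of moist air

$$\overline{m} = m_{\mathrm{air}}\,(1 - f_{\mathrm{H_2O}}) + m_{\mathrm{H_2O}}\,f_{\mathrm{H_2O}} \tag{6}$$

accounts for this by weighting the dry air molar mass with the dry air fraction and the H2O molar mass with the H2O mixing ratio, since H2O contributes less to the surface pressure p on a per-molecule basis than dry air. With $\overline{mg} \approx \overline{m}\,\bar{g}$, all this together turns Equation (3) into

$$f_\mu = \frac{\mathrm{VCD}_\mu\,\bar{g}}{p}\left[m_{\mathrm{air}}\,(1 - f_{\mathrm{H_2O}}) + m_{\mathrm{H_2O}}\,f_{\mathrm{H_2O}}\right] \tag{7}$$

and therefore, using Equation (5),

$$d_\mu = \frac{\mathrm{VCD}_\mu\,\bar{g}}{p}\left[m_{\mathrm{air}} + m_{\mathrm{H_2O}}\,\frac{f_{\mathrm{H_2O}}}{1 - f_{\mathrm{H_2O}}}\right] . \tag{8}$$

Given that $f_{\mathrm{H_2O}} \ll 1$, Equation (8) can be approximated according to

$$d_\mu \approx \frac{\mathrm{VCD}_\mu\,\bar{g}}{p}\left(m_{\mathrm{air}} + m_{\mathrm{H_2O}}\,f_{\mathrm{H_2O}}\right) , \tag{9}$$

which constitutes the dry air volume mixing ratio for gas µ. For CO, the dry air CMR $d_{\mathrm{CO}}$ is frequently designated as xCO and given in parts per billion by volume (ppbv).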
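The chain from a total column and surface pressure to a dry-air CMR can be checked numerically. The Python sketch below is a minimal illustration, assuming SI units (columns in mol m⁻², pressure in Pa, molar masses in kg mol⁻¹) and a single column-averaged gravity; the function name `dry_air_cmr` and the synthetic numbers are hypothetical. It evaluates the exact expression and, optionally, the first-order form valid for a small water vapour fraction.

```python
M_AIR = 28.96e-3  # molar mass of dry air [kg mol^-1]
M_H2O = 18.02e-3  # molar mass of water vapour [kg mol^-1]


def dry_air_cmr(vcd_gas, p_surf, f_h2o, g_mean=9.81, first_order=False):
    """Dry-air column mixing ratio d = VCD_gas / VCD_dry (hypothetical helper).

    vcd_gas : column of the target gas [mol m^-2]
    p_surf  : surface pressure [Pa]
    f_h2o   : wet column mixing ratio of water vapour [-]
    g_mean  : column-averaged gravitational acceleration [m s^-2]
    """
    if first_order:
        # small-f_H2O approximation: f / (1 - f) is replaced by f
        mass = M_AIR + M_H2O * f_h2o
    else:
        # exact within the column-averaged-gravity approximation
        mass = M_AIR + M_H2O * f_h2o / (1.0 - f_h2o)
    return vcd_gas * g_mean * mass / p_surf


# Synthetic column holding exactly 100 ppbv CO in dry air
p, f_w, g = 1.0e5, 0.005, 9.81
m_mean = M_AIR * (1 - f_w) + M_H2O * f_w  # mean molar mass of moist air
vcd_air = p / (g * m_mean)                # total (wet) air column
vcd_co = 100e-9 * vcd_air * (1 - f_w)     # 100 ppbv of the dry-air column

xco_exact = dry_air_cmr(vcd_co, p, f_w, g) * 1e9
xco_approx = dry_air_cmr(vcd_co, p, f_w, g, first_order=True) * 1e9
```

For f_H2O = 0.005 the first-order form deviates from the exact value by only about 0.002 ppbv, which illustrates why the approximation is acceptable.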

NDACC
NDACC sites retrieve vertical profiles for a variety of molecules (see Table 2). Since the actual measurement yields the slant column (SCD) rather than the vertical column, the factor ζ = cos(θ) accounts for the local solar zenith angle 0 ≤ θ < π/2; 1/ζ is commonly known as the geometrical airmass factor. Therefore, when calculating the CO dry air column mixing ratio (xCO) for the respective site, Equation (9) has to be applied with $\mathrm{VCD}_{\mathrm{CO}} = \zeta\,\mathrm{SCD}_{\mathrm{CO}}$, i.e.,

$$d_{\mathrm{CO}} \approx \frac{\zeta\,\mathrm{SCD}_{\mathrm{CO}}\,\bar{g}}{p}\left(m_{\mathrm{air}} + m_{\mathrm{H_2O}}\,f_{\mathrm{H_2O}}\right) . \tag{10}$$

Table 2. Center wavelengths (for the NDACC) and spectral windows (for the TCCON and the Beer InfraRed Retrieval Algorithm, BIRRA) used for the retrieval of CO. The NDACC relies on narrow spectral fitting regions [27], while the retrieval strategy of the TCCON prefers much wider spectral regions [28].

TCCON
According to [15], the TCCON scales a priori profiles of retrieved molecules, similarly to the BIRRA. Hence, the shape of the vertical profile is invariant with respect to variations in molecular atmospheric densities. For TCCON sites, Equation (9) is applied not only to CO but also to the co-retrieved oxygen O2. Assuming a dry air mole fraction of $d_{\mathrm{O_2}}$ = 0.2095, the O2 column is equivalent to a measure of the dry air column (for details see Wunch et al. [29]). Ratioing the CO and O2 versions of Equation (9) cancels the factors involving surface pressure and gravity, which makes $d_{\mathrm{CO}}$ independent of both and leads to

$$\mathrm{xCO} = 0.2095\,\frac{\mathrm{VCD}_{\mathrm{CO}}}{\mathrm{VCD}_{\mathrm{O_2}}} . \tag{11}$$
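The two ground-based conversions described above can be sketched in a few lines of Python: the geometric slant-to-vertical conversion with ζ = cos θ, and the O2-ratio approach with an assumed O2 dry-air mole fraction of 0.2095. The function names are hypothetical, and the sketch ignores refraction and Earth curvature in the airmass factor.

```python
import math

D_O2 = 0.2095  # assumed dry-air mole fraction of O2


def vertical_from_slant(scd, zenith_deg):
    """Geometric conversion VCD = zeta * SCD with zeta = cos(theta), 0 <= theta < 90 deg."""
    if not 0.0 <= zenith_deg < 90.0:
        raise ValueError("zenith angle must lie in [0, 90) degrees")
    return math.cos(math.radians(zenith_deg)) * scd


def xco_from_o2(vcd_co, vcd_o2):
    """O2-ratio dry-air CMR: surface pressure and gravity cancel in the ratio."""
    return D_O2 * vcd_co / vcd_o2


# Example: a slant column observed at 60 deg solar zenith angle is halved
vcd = vertical_from_slant(2.0, 60.0)
```

The ratio form is attractive because any error common to both gases (e.g., in pressure or in the light path) cancels, which is the motivation given in the text for using co-retrieved O2.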

The BIRRA
The BIRRA was developed at the Deutsches Zentrum für Luft- und Raumfahrt (DLR) and is used in the operational SCIAMACHY processor for CO and CH4 retrievals from Channel 8 and Channel 6 spectra, respectively. The CO total column amounts are inferred simultaneously with methane (CH4) and water vapour columns and a Lambertian surface albedo from individual SCIAMACHY measurements, assuming a non-scattering atmosphere. The validation study reported here is based on the BIRRA prototype version.

Algorithm
The BIRRA comprises the line-by-line forward model Generic Atmospheric Radiation Line-by-line InfraRed Code (GARLIC) [30,31] coupled to a least-squares [32] inversion algorithm for trace gas retrieval in the SWIR spectral region (see Gimeno García et al. [8]). In the case of SWIR nadir observations, the radiative transfer equation through Earth's atmosphere with molecules µ, including H2O, reduces to Beer's law, describing the radiance

$$I(\nu) = \frac{r(\nu)\cos\theta}{\pi}\,I_{\mathrm{sun}}(\nu)\,\exp\Big(-\sum_\mu \tau_\mu(\nu)\Big) , \tag{12}$$

where θ and $I_{\mathrm{sun}}$ denote the solar zenith angle and the incoming radiation from the Sun, respectively, r is the surface albedo, and $\tau_\mu$ is the optical depth of molecule µ along the Sun-surface-sensor path. The 'true' (i.e., to be estimated by the fit) optical depth is hence given by a scaled reference optical depth

$$\tau_\mu(\nu) = \alpha_\mu\,\tau_\mu^{\mathrm{ref}}(\nu) = \alpha_\mu \int_{\mathrm{path}} k_\mu\big(\nu, p(s), T(s)\big)\,n_\mu^{\mathrm{ref}}(s)\,\mathrm{d}s , \tag{13}$$

with $k_\mu$ the absorption cross section and $n_\mu^{\mathrm{ref}}$ the a priori number density profile. For the BIRRA, the state vector x = (η, β) comprises nonlinear (η) and linear (β) parameters. This separation enables the algorithm to use a separable least-squares fit, also known as variable projection [33], to estimate the unknown quantities. The molecular scaling factor(s) $\alpha_\mu$, the half width γ, and the wavelength shift δ of the instrumental slit function S are estimated using a nonlinear least-squares fit; the coefficients of the surface albedo r, modeled by a second-order polynomial in wavenumber, and the optional baseline correction b (again a polynomial, but not used in this study) enter the model linearly and are estimated using a linear least-squares fit. The model for the least-squares problem is therefore

$$y_i \approx \sum_{j=1}^{n} \beta_j\,\varphi_{i,j}(\eta) , \qquad i = 1, \ldots, m , \tag{14}$$

with n the number of linear parameters and m the number of observations (spectral pixels) in the microwindow chosen for the retrieval. Note that the separable least-squares approach assumes that the model functions $\varphi_{i,j}(\eta)$ depend on the nonlinear parameters η but not on the linear parameters β. Thus, for any given η, a matrix Φ(η) comprising the model functions $\varphi_{i,j}(\eta)$ is defined according to

$$\Phi(\eta) = \big[\varphi_{i,j}(\eta)\big] \in \mathbb{R}^{m \times n} . \tag{15}$$

The objective function to be minimized with respect to β and η is then given by

$$\min_{\eta,\beta} \lVert y - \Phi(\eta)\,\beta \rVert^2 = \min_{\eta} \Big( \min_{\beta} \lVert y - \Phi(\eta)\,\beta \rVert^2 \Big) , \tag{16}$$

where the inner minimization problem forms a linear least-squares problem. The overall minimization problem can therefore be expressed solely in terms of the nonlinear parameters η according to

$$\min_{\eta} \big\lVert y - \Phi(\eta)\,\Phi^{+}(\eta)\,y \big\rVert^2 \tag{17}$$

with the generalized inverse

$$\Phi^{+}(\eta) = \big(\Phi^{\mathsf{T}}(\eta)\,\Phi(\eta)\big)^{-1}\,\Phi^{\mathsf{T}}(\eta) . \tag{18}$$

The advantages of the separable least-squares approach are that no initial estimate is needed for the linear parameters β and that the size of the Jacobian matrix is reduced, which improves the conditioning and reduces computing time.
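To make the separable least-squares idea concrete, the Python sketch below fits a synthetic spectrum with one nonlinear parameter (a single optical-depth scaling factor α) and three linear parameters (second-order albedo polynomial coefficients). It is a simplified illustration, not the BIRRA implementation: the absorption lines are hypothetical Gaussians, the slit function convolution, wavelength shift, and baseline are omitted, and the outer nonlinear fit uses SciPy's `least_squares` with a finite-difference Jacobian of the projected residual rather than an analytic variable-projection Jacobian.

```python
import numpy as np
from scipy.optimize import least_squares


def basis(eta, nu, tau_ref, nu0):
    """Model functions phi_j(eta): transmission times albedo polynomial terms.

    eta[0] is the optical-depth scaling factor alpha; the three columns
    are multiplied by the linear albedo coefficients beta_0..beta_2.
    """
    trans = np.exp(-eta[0] * tau_ref)
    x = nu - nu0
    return np.column_stack([trans, trans * x, trans * x**2])


def projected_residual(eta, y, nu, tau_ref, nu0):
    """Residual y - Phi(eta) Phi^+(eta) y of the reduced (nonlinear-only) problem."""
    phi = basis(eta, nu, tau_ref, nu0)
    beta, *_ = np.linalg.lstsq(phi, y, rcond=None)  # inner linear least squares
    return y - phi @ beta


def varpro_fit(y, nu, tau_ref, eta0):
    nu0 = nu.mean()
    sol = least_squares(projected_residual, eta0, args=(y, nu, tau_ref, nu0))
    # recover the linear parameters once the nonlinear ones are known
    beta, *_ = np.linalg.lstsq(basis(sol.x, nu, tau_ref, nu0), y, rcond=None)
    return sol.x, beta


# Noise-free synthetic spectrum in the 4280-4305 cm^-1 window (hypothetical lines)
nu = np.linspace(4280.0, 4305.0, 200)
centres = [4283.5, 4288.0, 4292.2, 4297.7, 4301.9]
tau_ref = sum(0.3 * np.exp(-((nu - c) / 0.15) ** 2) for c in centres)
alpha_true = 1.2
beta_true = np.array([0.3, 2e-3, -1e-5])
y = basis([alpha_true], nu, tau_ref, nu.mean()) @ beta_true

eta_hat, beta_hat = varpro_fit(y, nu, tau_ref, eta0=np.array([1.0]))
```

Note that only α needs a starting value; the albedo coefficients are obtained by the inner linear solve, which mirrors the advantage stated above.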
A priori information on the number density (see Equation (1)) is required only for nonlinear parameters such as the scale factors of the optical depth of molecule µ. For CO and CH4, this information is taken from the Air Force Geophysics Laboratory (AFGL) atmospheric constituent profiles [34], while H2O and the auxiliary parameters temperature and pressure are taken from reanalysis data provided by the National Centers for Environmental Prediction (NCEP).

Product Definition
Recall that in order to obtain atmospheric CO abundances, the BIRRA scales a representative CO reference profile. In general, the actual wet CMR of CO would be calculated from the inferred scaling factors according to

$$f_{\mathrm{CO}} = \frac{\alpha_{\mathrm{CO}}\,\mathrm{VCD}_{\mathrm{CO}}^{\mathrm{ref}}}{\alpha_{\mathrm{air}}\,\mathrm{VCD}_{\mathrm{air}}^{\mathrm{ref}}} . \tag{19}$$

However, there is no parameter $\alpha_{\mathrm{air}}$ in the retrieval that accounts for the dry air column. Instead, differences between the retrieved and a priori CH4 columns are used as a proxy for variations in the dry air column density according to

$$\alpha_{\mathrm{air}} \approx \alpha_{\mathrm{CH_4}} = \frac{\mathrm{VCD}_{\mathrm{CH_4}}}{\mathrm{VCD}_{\mathrm{CH_4}}^{\mathrm{ref}}} . \tag{20}$$

This approach assumes that CH4 is a well-mixed gas whose natural atmospheric variations are small compared to those of CO, so that changes in the retrieved CH4 column are produced solely by light path modifications (mainly cloud shielding; for details see Gloudemans et al. [35]). Since CH4 has strong absorption lines across the CO spectral fitting window in Channel 8, which allow the CH4 amount to be determined with good accuracy, this method is also suitable for detecting optically thick clouds in the SCIAMACHY observations. In general, the error of the CH4 scaling factors is 1-2 orders of magnitude smaller than that of the CO scaling factors, depending on the scene.
To eliminate most of the uncertainties arising in the level-2 processing, the postprocessing applied quality criteria based on multiple parameters from the BIRRA output. For example, non-converging fits were filtered out, and the errors of CO, H2O, and CH4 were used to eliminate data with extremely low signal-to-noise ratios. This was particularly relevant for measurements over the ocean, which benefit from the presence of clouds because of the low reflectance of water in the SWIR range (see Figure 1 and Gloudemans et al. [35]). This combination of filter criteria allows the selection of scenes with optically thick clouds over the oceans as well as of both cloud-free scenes with acceptably small errors and cloudy scenes over land. If the light path was enhanced, for example by aerosols, beyond a certain threshold (a 10% enhancement of the CH4 scaling factor), the observations were rejected. Further details on the quality criteria chosen for the CO retrievals in this study are described in Gimeno García et al. [8], Section 2.3. The CO dry air CMR is therefore actually given by

$$\mathrm{xCO} = \frac{\alpha_{\mathrm{CO}}}{\alpha_{\mathrm{CH_4}}}\,\mathrm{xCO}^{\mathrm{ref}} , \tag{21}$$

with $\mathrm{xCO}^{\mathrm{ref}}$ the a priori CO dry air CMR.
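The CH4-proxy normalization and the light-path filter can be sketched as follows. This Python example is illustrative only: the `Retrieval` container and its fields are hypothetical stand-ins for BIRRA output, a single relative CO error threshold (`max_rel_err_co`, hypothetical value) stands in for the multi-parameter criteria of [8], and only the 10% CH4 enhancement threshold is taken from the text. Values of the CH4 scaling factor below 1 (cloud shielding) are deliberately kept.

```python
from dataclasses import dataclass


@dataclass
class Retrieval:
    """Hypothetical container for the BIRRA output fields used here."""
    alpha_co: float     # CO scaling factor
    alpha_ch4: float    # CH4 scaling factor (light-path proxy)
    converged: bool     # did the nonlinear fit converge?
    rel_err_co: float   # relative 1-sigma error of alpha_co


def xco_ppbv(r: Retrieval, xco_ref_ppbv: float) -> float:
    """CH4-proxy dry-air CMR: a priori xCO scaled by alpha_CO / alpha_CH4."""
    return xco_ref_ppbv * r.alpha_co / r.alpha_ch4


def passes_filter(r: Retrieval, max_rel_err_co: float = 1.0,
                  max_ch4_enh: float = 0.10) -> bool:
    if not r.converged:
        return False
    if r.rel_err_co > max_rel_err_co:  # hypothetical low-SNR threshold
        return False
    # reject light-path enhancement (e.g., aerosols) above 10 % in alpha_CH4;
    # alpha_CH4 < 1 (cloud shielding) is kept
    return r.alpha_ch4 <= 1.0 + max_ch4_enh


cloudy = Retrieval(alpha_co=0.45, alpha_ch4=0.5, converged=True, rel_err_co=0.3)
```

For a cloudy scene in which both scaling factors are reduced by the shortened light path, the ratio restores the full-column mixing ratio (here 90 ppbv for an a priori value of 100 ppbv).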

Weighted Averages
Averaging over many measurements, however, requires accounting for representation deficiencies. If, for example, an average of 10 measurements were already trustworthy, the averaging area could be kept small and representation errors might not be an issue. In the case of the SCIAMACHY CO product, however, the number of measurements that must be averaged to obtain a representative mixing ratio (with respect to the error of the mean) is at least on the order of 10².
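The order of magnitude of 10² follows from the 1/√M behaviour of the error of the mean. The Python sketch below assumes independent, unbiased random errors (systematic and representation errors are not reduced by averaging) and computes how many measurements M are needed so that a single-measurement relative error is reduced to a target value; the function name is hypothetical.

```python
import math


def min_sample_size(single_rel_err: float, target_rel_err: float) -> int:
    """Smallest M with single_rel_err / sqrt(M) <= target_rel_err.

    Assumes independent random errors; a small tolerance guards against
    floating-point round-off pushing the result up by one.
    """
    return math.ceil((single_rel_err / target_rel_err) ** 2 - 1e-9)


# A noise error of 100 % per pixel (low-radiance scenes) reduced to 10 %
m_needed = min_sample_size(1.0, 0.1)
```

With a 100% single-pixel noise error, as reported for low-radiance scenes, about 100 measurements are needed to reach a 10% error of the mean.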
A representative average $\overline{\mathrm{xCO}}$ for both s-b and g-b data with respect to space ρ and time τ is the weighted mean

$$\overline{\mathrm{xCO}} = \frac{\sum_{i=1}^{M} \omega_i\,\mathrm{xCO}_i}{\sum_{i=1}^{M} \omega_i} \tag{22}$$

with M the number of observations and $\omega = \rho^{\kappa}\,\tau$ the respective weight. The weights include a spatial component $\rho = \lvert r_{gb} - r_{sb}\rvert$ and a temporal component $\tau = \lvert t_{gb} - t_{sb}\rvert$, with r and t designating the location and time of the observation, respectively. The spatial variation of the CO column density is much larger than that of CO2 and CH4 [36,37]. Many NDACC and TCCON sites are located in remote regions, while some are located in polluted areas (Table 1). Since the validation aims to compare s-b observations for both background and enhanced CO levels, the exponent κ is introduced to account for the representation deficiencies caused by the large spatial variability of CO. Typically, $\overline{\mathrm{xCO}}$ is the monthly, seasonal, or annual mean. The corresponding standard deviation $\mathrm{xCO}_e$ is defined according to

$$\mathrm{xCO}_e = \sqrt{\frac{\sum_{i=1}^{M} \omega_i\,\big(\mathrm{xCO}_i - \overline{\mathrm{xCO}}\big)^2}{\sum_{i=1}^{M} \omega_i}} . \tag{23}$$

In general, if weighting is applied, it needs to be applied to both data sets. Note that Equation (22) is valid with respect to both the spatial and the temporal domain.
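The weighted mean and the corresponding weighted standard deviation described above can be sketched in Python with NumPy. The sketch assumes a population-style weighted standard deviation and introduces a hypothetical minimum-distance floor (`min_dist_km`) so that negative exponents κ do not produce infinite weights for pixels located directly at the site; κ = 0 reproduces the unweighted case, κ = −1 and κ = −2 the linear and quadratic inverse-distance weighting.

```python
import numpy as np


def weighted_stats(xco, dist_km=None, kappa=0.0, tau_w=None, min_dist_km=1.0):
    """Weighted mean and weighted (population) standard deviation of xCO.

    Weights omega_i = rho_i**kappa * tau_i, where rho_i is the distance to
    the reference site; min_dist_km is a hypothetical floor for rho_i.
    """
    xco = np.asarray(xco, dtype=float)
    w = np.ones_like(xco)
    if dist_km is not None:
        rho = np.maximum(np.asarray(dist_km, dtype=float), min_dist_km)
        w = w * rho**kappa
    if tau_w is not None:
        w = w * np.asarray(tau_w, dtype=float)
    mean = np.sum(w * xco) / np.sum(w)
    std = np.sqrt(np.sum(w * (xco - mean) ** 2) / np.sum(w))
    return mean, std


# Two pixels: 100 ppbv at 100 km and 120 ppbv at 300 km from the site
mean_unw, std_unw = weighted_stats([100.0, 120.0])
mean_lin, std_lin = weighted_stats([100.0, 120.0], [100.0, 300.0], kappa=-1.0)
```

In this toy case linear inverse-distance weighting pulls the mean from 110 ppbv towards the closer pixel (105 ppbv).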

Space
In the spatial domain, the calculation of the averages for the SCIAMACHY data accounts for the position of acquisition relative to the location of the g-b data set. Therefore, τ = 1 and ω = ρ^κ. Note that since the location of the g-b reference site $r_{gb}$ is fixed in time, $r_{gb}$ = const.

Time
To calculate a representative average in the temporal domain, the weight is instead set to ω = τ. In addition, a running average of the g-b observations needs to be introduced, since $t_{gb}$ is not constant. The interval of the running average is centered at the respective $t_{sb}$ and chosen to span the same time interval as the corresponding weighted average. For example, a 30-day weighted average of the s-b CMR corresponds to a running average that incorporates the g-b observations within $t_{sb}$ ± 15 days.
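The centered running average of the g-b data can be sketched as below. This Python example assumes times given in days (e.g., day numbers) and uses the ±15-day half window from the 30-day example in the text; windows without any g-b observation are returned as NaN. The function name is hypothetical.

```python
import numpy as np


def gb_running_mean(t_sb, t_gb, xco_gb, half_window_days=15.0):
    """Average the g-b observations within +/- half_window_days of each s-b time.

    Returns NaN where no g-b observation falls inside the window.
    """
    t_sb = np.asarray(t_sb, dtype=float)
    t_gb = np.asarray(t_gb, dtype=float)
    xco_gb = np.asarray(xco_gb, dtype=float)
    out = np.full(t_sb.shape, np.nan)
    for k, t in enumerate(t_sb):
        mask = np.abs(t_gb - t) <= half_window_days  # window centered at t_sb
        if mask.any():
            out[k] = xco_gb[mask].mean()
    return out


running = gb_running_mean([10.0, 100.0], [0.0, 10.0, 20.0, 40.0],
                          [100.0, 110.0, 120.0, 130.0])
```

A simple loop is used for clarity; for long time series a sorted search (e.g., `np.searchsorted`) would avoid the quadratic cost.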

Bias
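To quantify the retrieval accuracy per reference site, the bias was calculated as an error-weighted offset according to

$$b = \frac{\sum_{i=1}^{M} \big(\overline{\mathrm{xCO}}^{sb}_i - \overline{\mathrm{xCO}}^{gb}_i\big)\,/\,\big(\mathrm{xCO}^{sb}_{e,i}\big)^2}{\sum_{i=1}^{M} 1\,/\,\big(\mathrm{xCO}^{sb}_{e,i}\big)^2} \tag{24}$$

with $\mathrm{xCO}^{sb}_e$ representing the standard deviation of the SCIAMACHY xCO in a certain time interval and M the number of s-b and g-b averages (see Section 2.3), respectively.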
In addition, the average of the s-b standard deviations at a specific site was used to characterize the accuracy of the bias b via the standard error of the mean $s_e = \mathrm{xCO}^{sb}_e/\sqrt{M}$, with M the number of measurements incorporated in $\mathrm{xCO}^{sb}_e$. To estimate the overall performance, the global mean bias $\bar{b}$ was determined as the average of all station biases weighted by the standard deviation $\sigma_b$ of the respective biases, and $\bar{s}_e$ is the global (all-site) standard error of the means. A similar approach was applied by Borsdorff et al. [9].
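The per-site bias and its standard error can be sketched in Python as follows. The sketch assumes that "error-weighted" means inverse-variance weights 1/σ² based on the s-b standard deviation of each average, and that the site standard error is the mean of σ/√N over all averages, where N is the number of s-b measurements in each average; both are interpretations, and the function name is hypothetical.

```python
import numpy as np


def site_bias(xco_sb, xco_gb, sigma_sb, n_sb):
    """Error-weighted bias (assumed inverse-variance weights) and mean standard error.

    xco_sb, xco_gb : paired (e.g., monthly) averages [ppbv]
    sigma_sb       : s-b standard deviation of each average [ppbv]
    n_sb           : number of s-b measurements in each average
    """
    xco_sb, xco_gb, sigma_sb, n_sb = (
        np.asarray(a, dtype=float) for a in (xco_sb, xco_gb, sigma_sb, n_sb)
    )
    w = 1.0 / sigma_sb**2
    bias = np.sum(w * (xco_sb - xco_gb)) / np.sum(w)
    std_err = np.mean(sigma_sb / np.sqrt(n_sb))
    return bias, std_err


b, s_e = site_bias([90.0, 100.0], [100.0, 100.0], [10.0, 20.0], [100, 400])
```

The month with the smaller s-b scatter dominates the bias (−8 ppbv here instead of the unweighted −5 ppbv), which is the intended effect of error weighting.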

Averaging Multiple Years of CO
To exploit more observations within a given sampling area, multiannual averages of CO mixing ratios were calculated after referring all years to a common reference. Within the time interval 2003-2011, an intercept $a_n$ and a slope $b_n$ of xCO were estimated by linear least-squares for each year n and for both measurement systems.
Subsequently, the annual estimates of intercept and slope were averaged, yielding the parameters a and b. The detrended data points $\widetilde{\mathrm{xCO}}$ were then calculated according to

$$\widetilde{\mathrm{xCO}}_i = \mathrm{xCO}_i - \big(a_n + b_n\,t_i\big) + \big(a + b\,t_i\big) , \tag{25}$$

where $t_i$ is the time of observation i within year n. The detrended observations are largely independent of the year of acquisition. Therefore, more measurements fall into the common interval (e.g., a month or a year) than in the single-year analysis, and stricter thresholds can thus be imposed on the filter criteria during postprocessing.
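The per-year linear fit and the referral to a common reference line can be sketched with NumPy. The sketch assumes that time is expressed as a fraction of the year and that each observation is shifted from its own year's fit onto the mean line a + b·t; the function name and synthetic data are hypothetical. `np.polyfit(x, y, 1)` returns the slope first and the intercept second.

```python
import numpy as np


def detrend_years(t_frac, year, xco):
    """Refer each year's xCO to a common linear reference a + b * t.

    t_frac : time within the year as a fraction [0, 1)
    year   : integer year of each observation
    xco    : xCO values [ppbv]
    """
    t_frac = np.asarray(t_frac, dtype=float)
    year = np.asarray(year)
    xco = np.asarray(xco, dtype=float)
    coeffs = {}
    for n in np.unique(year):
        m = year == n
        b_n, a_n = np.polyfit(t_frac[m], xco[m], 1)  # slope, intercept
        coeffs[n] = (a_n, b_n)
    a = np.mean([c[0] for c in coeffs.values()])
    b = np.mean([c[1] for c in coeffs.values()])
    out = np.empty_like(xco)
    for n, (a_n, b_n) in coeffs.items():
        m = year == n
        out[m] = xco[m] - (a_n + b_n * t_frac[m]) + (a + b * t_frac[m])
    return out, a, b


# Two years with the same seasonal cycle but different offsets and trends
t = np.linspace(0.0, 1.0, 24, endpoint=False)
s = 10 * np.sin(2 * np.pi * t)
x = np.concatenate([100 + 5 * t + s, 90 + 8 * t + s])
d, a, b = detrend_years(np.tile(t, 2), np.repeat([2003, 2004], 24), x)
```

After detrending, the two synthetic years coincide, so their observations can be pooled into a common monthly or annual interval.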

Results
The reference sites were selected with respect to the temporal coverage and continuity of the measurements as well as the reliability and completeness of auxiliary parameters such as surface pressure. The first two criteria were met if there were at least three years of observations within the period 2003-2011 with at least 10 months of observations. For the NDACC, all sites fulfilled these criteria; however, some TCCON sites such as Jet Propulsion Laboratory (JPL) and Reunion Island had to be omitted. The criterion regarding auxiliary parameters posed an issue for some NDACC sites such as Arrival Heights, Lauder, and Wollongong, whereas no TCCON sites had to be skipped because of it.

Averaging of Measurements
The single most limiting factor in the global SCIAMACHY CO column product is its large variability due to noise in the recorded spectra. This clearly shows up in the CO values of individual observations in Figure 2. Furthermore, because a Channel 8 pixel has a footprint on the Earth's surface of around 32 × 120 km² for the 180° nadir, a direct comparison of individual BIRRA-retrieved CO columns was not reasonable. Even comprehensive filtering of the CO product based on the norm of the residual ([8], Section 4) did not deliver the quality required for comparing single or few (i.e., tens of) CO observations. Figure 3 reveals that the errors of individual SCIAMACHY retrievals have increased substantially since 2006. This appears to be primarily caused by the omission, from 2006 onward, of the regular decontamination procedures that dispose of the ice accumulated on the Channel 8 detector (see [9], Table 1, and [2]). Furthermore, measurements conducted during the decontamination phases were generally not suited for an adequate retrieval of parameters, and most of them were filtered out during postprocessing. This effect is clearly visible in Figure 2, which displays the lack of proper CO retrievals at the turn of the years 2003 through 2005. The decreasing number of good spectral points (pixels) on the detector also plays a role (see [8], Figure 3).
Monthly mean CO mixing ratios constitute an adequate trade-off, since the basic features of the CO annual cycle are preserved while the averages include enough measurements to obtain reasonable statistics for both SCIAMACHY and NDACC/TCCON observations (see Figure 4). Note that the validation studies cited in Section 1.5.2 also use a temporal interval of one month. Figure 4 also demonstrates that the adequate size of the temporal interval varies with the size of the sampling area. Moreover, the results indicate that the adequate choice depends on the quality of the s-b data and on the problem investigated; hence, seasonal and annual averages were examined as well. Since local enhancements should be preserved, a related issue that also demands an adequate trade-off is the spatial averaging of the s-b observations within a given radius around the g-b reference site. To select s-b measurements within a given distance of a reference site, the latitude and longitude coordinates of the measurements were converted to the great-circle distance on the Earth's surface. In a first step, the chosen distances for spatial averaging were 500 km, 1000 km, and 2000 km, since previous validation studies of SCIAMACHY CO used collocation criteria within this range (e.g., Borsdorff et al. [9], De Laat et al. [21], and Dils et al. [25]). The results in Figure 4 suggest that the smaller the sampling area, the longer the interval for temporal averaging should be. So far, only Sussmann et al. [24] have sampled SCIAMACHY observations within a radius (2000 km) of Zugspitze; other preceding validation studies used s-b data within a square- or rectangle-shaped area around the g-b site. The effect of the different methodologies is displayed in Figure 5.
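The difference between a circular (great-circle distance) and a box-shaped sampling area can be illustrated with the haversine formula on a spherical Earth. In this Python sketch the mean Earth radius of 6371 km is a spherical approximation, the 5° half-widths of the box are hypothetical, and longitude wrap-around is ignored for brevity.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(np.asarray(lon2, dtype=float) - np.asarray(lon1, dtype=float))
    h = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))


def select_circle(lat, lon, site_lat, site_lon, radius_km=500.0):
    """Pixels within radius_km of the site, plus their distances."""
    d = great_circle_km(site_lat, site_lon, lat, lon)
    return d <= radius_km, d


def select_box(lat, lon, site_lat, site_lon, half_lat=5.0, half_lon=5.0):
    """Pixels inside a hypothetical lat/lon box (no longitude wrap-around)."""
    lat, lon = np.asarray(lat, dtype=float), np.asarray(lon, dtype=float)
    return (np.abs(lat - site_lat) <= half_lat) & (np.abs(lon - site_lon) <= half_lon)


lat, lon = [0.0, 0.0, 4.0], [4.0, 5.0, 4.0]
inside_circle, dist = select_circle(lat, lon, 0.0, 0.0)
inside_box = select_box(lat, lon, 0.0, 0.0)
```

A pixel at the box corner (about 630 km away) is accepted by the box but rejected by the 500 km circle, which shows why box-shaped areas mix in observations at larger and unequal distances.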
Spatial filtering with respect to distance delivers a smoother course of the s-b CO averages throughout the year. The SCIAMACHY observations within differently sized areas around two NDACC and TCCON g-b sites, namely Bremen and Izana (see Figure 1), are compared in Figure 6, and the results demonstrate that the adequate spatial and temporal range of averaging depends on the precision of the s-b observations at a specific site. The former station is located in the plains of northern Germany, while the latter is situated on an island in the subtropical Atlantic Ocean at about 2370 m above mean sea level. For Izana, a station largely surrounded by water, observations of optically thick clouds were preferred in order to obtain an acceptable SNR for the retrieval. Note that the bias b given in the legend is the deviation of the s-b from the g-b CO averages weighted by the respective s-b standard deviation (details of the bias analysis are given in the subsequent section). Another important aspect is that for spatially weighted averages (used throughout the subsequent sections), the effective radius becomes smaller, since nearby measurements contribute most while observations far from the g-b site have a much weaker impact on the representative average. In other words, for weighted averages, the area contributing significantly to the average of the s-b measurements is reduced. Therefore, we consider CO values averaged over 1 month and within 500 km of the g-b site an appropriate trade-off for SCIAMACHY CO validation.
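The shrinking effective radius can be quantified under a simple assumption: if valid pixels are spread uniformly over a disc of radius r_max, the number of pixels in a thin annulus grows like r dr, so with weights r^κ the weight contributed by the annulus scales as r^(κ+1) dr. The Python sketch below returns the radius that encloses half of the total weight. The lower cut-off `r_min` is a hypothetical floor needed for κ ≤ −2, where the result depends strongly on it.

```python
import math


def half_weight_radius(r_max: float, kappa: float, r_min: float = 10.0) -> float:
    """Radius enclosing half the total weight for omega = r**kappa.

    Assumes pixels uniformly distributed over the annulus r_min <= r <= r_max,
    so the weight per annulus scales as r**(kappa + 1) dr.
    """
    e = kappa + 2.0
    if abs(e) < 1e-12:  # kappa = -2: the weight integral is logarithmic
        return math.sqrt(r_min * r_max)
    return (0.5 * (r_max**e + r_min**e)) ** (1.0 / e)


radii = {k: half_weight_radius(500.0, k) for k in (0.0, -1.0, -2.0)}
```

For a 500 km sampling radius, half of the weight lies within about 354 km without weighting, within about 255 km for linear inverse-distance weighting, and within about 71 km for quadratic weighting (with a 10 km floor), consistent with the qualitative statement above.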

Bias and Weighting
The bias of SCIAMACHY observations relative to the g-b reference was analyzed using two different approaches. Section 3.2.1 uses the classical unweighted averages of SCIAMACHY and g-b observations, while distance-weighted averaging is treated in Section 3.2.2.

Unweighted Bias
Overall, Figure 7 shows that the bias of the CO mixing ratios ranges from almost 0 ppbv to −27 ppbv. In over 80% of the cases, the biases range from around −5% to −15%. The global bias b = −12.1 ppbv represents the mean bias of all stations weighted by the respective standard deviation. The global standard error of the mean, $s_e$ = 6.77 ppbv, is the average of the standard errors of the mean of all stations.
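The combination of per-station results into the global numbers quoted above can be sketched as follows. The text states only that the station biases are weighted by the respective standard deviation; the sketch assumes inverse-variance weights ($1/\sigma^2$) and per-station standard errors $\sigma/\sqrt{n}$, where n is the number of monthly comparisons. The input values in the check are illustrative, not data from this study.

```python
import numpy as np


def global_bias(biases, stds, n_months):
    """Combine per-station biases into a global bias b and a mean standard error s_e.

    Assumption: 'weighted by the respective standard deviation' is read as
    inverse-variance weighting (w = 1 / sigma**2).
    """
    b = np.asarray(biases, dtype=float)
    s = np.asarray(stds, dtype=float)
    n = np.asarray(n_months, dtype=float)
    w = 1.0 / s**2
    b_global = np.sum(w * b) / np.sum(w)
    se_station = s / np.sqrt(n)          # standard error of each station's mean bias
    return b_global, se_station.mean()   # s_e: plain average over all stations
```

With inverse-variance weights, stations with noisy comparisons pull the global bias less than stations with tight monthly statistics, while $s_e$ is a simple average, as described in the text.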
In general, the results in Figure 7 agree with the findings by Borsdorff et al. [9,10] within around −5% to −10% for most sites. However, the CO mixing ratios calculated from the BIRRA retrieval are consistently biased low with respect to all sites in both networks, while Borsdorff et al. [9,10] found slightly positive biases at some sites. The magnitude of the bias differs between sites. Notice that some negative biases in the northern mid-latitudes in Europe and North America are significant with respect to the standard deviation. These findings suggest that the comparison crucially depends on the CO a priori information, which is not well characterized in polluted areas. Consequently, sites located in remote areas such as Ny Alesund, Izana, Wollongong, and Lauder show much better agreement (Figure 7).

Distance Weighted Bias
The impact on the xCO averages was studied using linear ($r^{-1}$) and quadratic ($r^{-2}$) inverse distance weighting of spatially close measurements (not to be confused with the error weighting of the bias also used in Section 3.2.1). In general, the biases decrease at most sites when inverse distance weighting is applied. This effect can be observed for both linear and quadratic inverse distance weighting (Figures 8 and 9). In the case of linear weighting, only a few sites exhibit slightly larger biases, such as Kiruna (NDACC), Bialystok, Wollongong, and Lauder (TCCON). Overall, however, the bias was reduced to −11.2 ppbv, while the average standard deviation increased to 16.6 ppbv. Furthermore, at most sites quadratic weighting (with respect to the inverse distance of the SCIAMACHY measurements) improves the agreement of the compared averages. The majority of the biases in Figure 9 are reduced even further compared to the linear approach, and their distribution across the sites from north to south is significantly smoothed. This could indicate that the linearly and quadratically weighted results are better suited for estimating instrument and retrieval deficiencies than the unweighted results, which might include larger fractions of representation errors. In other words, it is likely that a fraction of the revealed offsets is attributable to mismatch artifacts rather than solely to instrument or retrieval issues.
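Linear and quadratic inverse distance weighting of the collocated s-b pixels amounts to a weighted mean with weights $r^{-p}$. The sketch below is a minimal illustration under stated assumptions, not the exact postprocessing used in this study: distances are in km (for example from a great-circle computation), a hypothetical 1 km floor prevents infinite weights, and the weighted standard deviation uses the population form.

```python
import numpy as np


def idw_mean(values, distances_km, power=2, floor_km=1.0):
    """Inverse-distance-weighted mean and standard deviation with weights r**(-power).

    power=0: unweighted mean; power=1: linear; power=2: quadratic weighting.
    Distances are clipped at floor_km (an assumed floor) so that a pixel on
    top of the site does not receive an infinite weight.
    """
    x = np.asarray(values, dtype=float)
    r = np.maximum(np.asarray(distances_km, dtype=float), floor_km)
    w = r ** (-float(power))
    mean = np.sum(w * x) / np.sum(w)
    # weighted standard deviation of the sample (population form)
    std = np.sqrt(np.sum(w * (x - mean) ** 2) / np.sum(w))
    return mean, std
```

With two pixels of 100 ppbv at 100 km and 130 ppbv at 200 km, the unweighted, linear, and quadratic means are 115, 110, and 106 ppbv: the nearer pixel increasingly dominates, which is the mechanism behind the reduced effective radius of weighted averages.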
Favorable examples for studying this effect more closely are g-b sites located in polluted areas, especially large cities embedded in rural areas, such as Toronto. While SCIAMACHY seems to significantly underestimate CO at Toronto in the unweighted case (Figure 7), quadratic weighting reveals that the offset is not significant and is more likely introduced by less representative observations. However, the discrepancies at Kiruna, Bialystok, and Bremen in the northern mid-latitudes still seem to be significant. In the case of Bremen, the negative offset is significant with respect to both NDACC and TCCON observations, which underlines the finding of a negative bias of the SCIAMACHY SWIR observations. The global bias was reduced to −11.9 ppbv compared to the unweighted approach, but σ increased to 20.8 ppbv.
Table 3 reveals a discrepancy in the global bias between the NDACC and TCCON of 6.2 ppbv for the unweighted case and 5.3 ppbv for distance weighting ($r^{-2}$). In general, the findings for both networks agree in many aspects with the conclusions by Borsdorff et al. [9]. The global bias b is larger in magnitude for sites affiliated with the NDACC than for those affiliated with the TCCON. For the NDACC and TCCON, we found b ≈ −14 ppbv and b = −7.4 to −9.1 ppbv, respectively, depending on the weighting, while Borsdorff et al. [9] reported values of −9.2 ppb and −1.2 ppb for the clear-sky retrievals for the NDACC and TCCON, respectively. For cloudy-sky retrievals, Borsdorff et al. [10] found a negative bias of b = −6.0 ppb, which is similar to our results for the TCCON (see Table 3).
To support the previous findings, it was important to confirm that the bias of the SCIAMACHY data with respect to the g-b reference did not exhibit a linear trend throughout the mission. Therefore, a bias trend t in ppbv/year was calculated from the annual biases b at each site using linear least-squares regression. The results are depicted in Figure 10. The global trend t is the average of the bias trends weighted by the standard error of the fit. In order to get a complete picture, all sites and all available years were initially investigated. It was found that the number of sites with a negative bias trend exceeds the number of sites with a positive bias trend. Note that the trend is within ±5 ppbv/year at most reference sites. In addition, for most sites no trend that is significant with respect to the accompanying standard error was identified. Most sites show a consistent standard error of about 5 ppbv/year in both the weighted and unweighted cases. Moreover, the magnitude of the standard error is of the order of (or even larger than) the calculated average trend at most sites (see Figure 10). The numbers in the figure indicate that the global trend of the bias is not significant for at least two cases. Hence, we decided that a linear trend correction of the results presented above was not required.

Table 3. The respective biases b (bold) and standard errors $s_e$ separated for NDACC and TCCON. The NDACC comparison reveals a bias that is significant with respect to the standard error of the means. This is also true if linear ($r^{-1}$) or quadratic ($r^{-2}$) inverse distance weighting is applied. For the TCCON, the bias is less significant and closer to the magnitude of the standard error. ppbv: parts per billion by volume.
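The bias-trend test described above can be sketched as an ordinary least-squares fit of the annual biases against time, with the slope's standard error derived from the residual variance. Weighting the site trends by their inverse squared standard error to form the global trend, and calling a trend significant when it exceeds its 1σ error, are assumptions; the text only states that the trends are weighted by the standard error of the fit.

```python
import numpy as np


def bias_trend(years, annual_biases):
    """OLS slope t (ppbv/year) of the annual biases and its standard error."""
    t = np.asarray(years, dtype=float)
    b = np.asarray(annual_biases, dtype=float)
    # centre the time axis so that slope and intercept are uncorrelated
    A = np.column_stack([t - t.mean(), np.ones_like(t)])
    coef = np.linalg.lstsq(A, b, rcond=None)[0]
    resid = b - A @ coef
    dof = len(t) - 2                      # two fitted parameters
    sigma2 = (resid @ resid) / dof        # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return coef[0], np.sqrt(cov[0, 0])


def global_trend(trends, std_errors):
    """Global trend as an inverse-variance weighted mean of site trends (assumed)."""
    t = np.asarray(trends, dtype=float)
    w = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    return np.sum(w * t) / np.sum(w)


def is_significant(trend, std_error):
    """Assumed criterion: a trend is significant if it exceeds its 1-sigma error."""
    return abs(trend) > std_error
```

For example, hypothetical annual biases of 0, 2, 1, and 3 ppbv in four consecutive years give t = 0.8 ppbv/year with a standard error of about 0.42 ppbv/year.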

Spatio-Temporal Weighting
In general, the findings suggest that accounting for the spatial mismatch by inverse distance weighting does not necessarily require an additional temporal penalty term (accounting for the temporal mismatch) in order to obtain representative results for comparison. Figure 11 shows the effects of spatio-temporal weighting on the 90-day averages for Toronto in 2003. Overall, the outcome demonstrates that temporal weighting has only minor effects on the accuracy and should in general only be regarded as an option in areas where data quality is degraded and where averaging over a larger number of observations in the space and/or time domain therefore becomes necessary.
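One way to add a temporal penalty on top of the spatial weights is to multiply the two. This section does not specify the functional form of the temporal weighting; the exponential penalty exp(−|Δt|/τ) and the time scale `tau_days` below are hypothetical choices made only for illustration. Setting `tau_days=None` reduces the sketch to pure spatial weighting.

```python
import numpy as np


def spatio_temporal_mean(values, distances_km, dt_days, power=2,
                         tau_days=30.0, floor_km=1.0):
    """Weighted mean with spatial weights r**(-power) multiplied by a
    hypothetical temporal penalty exp(-|dt| / tau_days).

    tau_days=None switches the temporal penalty off (spatial weighting only);
    floor_km is an assumed minimum distance.
    """
    x = np.asarray(values, dtype=float)
    r = np.maximum(np.asarray(distances_km, dtype=float), floor_km)
    w = r ** (-float(power))
    if tau_days is not None:
        dt = np.abs(np.asarray(dt_days, dtype=float))
        w = w * np.exp(-dt / tau_days)
    return np.sum(w * x) / np.sum(w)
```

Because CO varies slowly in time, a long τ (a weak temporal penalty) changes such an average only slightly, which is consistent with the minor effects of temporal weighting reported for Figure 11.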

Multiannual Averages
Measurements in the SWIR channel suffered from decontamination procedures (see Section 3.1), which were often executed around the turn of the year. Since least-squares fitting of the intercept and slope is sensitive to outliers, the months December and January were excluded from the calculation. In about 40% of the cases, the intercept of the s-b retrievals matched the respective g-b intercept within its standard deviation, and in around 60% of the cases a positive slope of the g-b measurements was matched by a positive slope of the s-b observations. Figure 12 shows two sites in the Southern Hemisphere where both intercept and slope coincide within the standard deviation of the g-b reference.
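The intercept and slope comparison can be sketched as a straight-line least-squares fit that skips December and January. The independent variable of the fit is not stated in this section; the sketch assumes the month index of multiannual monthly means, and the agreement test (s-b value within the g-b 1σ range) is one reading of "matched within the standard deviation".

```python
import numpy as np

# Months dropped because of decontamination around the turn of the year.
EXCLUDED_MONTHS = (12, 1)


def fit_line(months, values, excluded=EXCLUDED_MONTHS):
    """OLS intercept and slope of xCO versus month index, skipping excluded
    months; returns ((intercept, slope), (s_intercept, s_slope))."""
    m = np.asarray(months)
    keep = ~np.isin(m, excluded)
    x = m[keep].astype(float)
    y = np.asarray(values, dtype=float)[keep]
    A = np.column_stack([np.ones_like(x), x])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ coef
    cov = (resid @ resid) / (len(x) - 2) * np.linalg.inv(A.T @ A)
    return coef, np.sqrt(np.diag(cov))


def agrees(sb_value, gb_value, gb_sigma):
    """True if the s-b estimate lies within the 1-sigma range of the g-b estimate."""
    return abs(sb_value - gb_value) <= gb_sigma
```

The sign comparison of the slopes mentioned in the text corresponds to checking `np.sign(sb_slope) == np.sign(gb_slope)` on the fitted values.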
In Figure 13, s-b CO columns are compared to the Zugspitze (NDACC) and Garmisch (TCCON) reference sites. These g-b sites are of particular interest because of their small horizontal but large vertical separation (see Table 1). The permanently higher CO values at Garmisch might be an effect of the planetary boundary layer, since observations at Zugspitze are not influenced by it for most of the year. Notice that the 500-km circle (from which the SCIAMACHY measurements were taken) was centered at Zugspitze in both cases, because the distance between the two sites (≈8 km) is small compared to the sampling area.
Figure 13a shows the comparison for the unweighted and weighted xCO averages. The results demonstrate that some fraction of the SCIAMACHY xCO bias seen at Garmisch and Zugspitze is likely an effect of spatial mismatch, i.e., representation error, and should therefore not be considered solely a retrieval or instrument flaw. Figure 13b examines how the squared inverse distance weighting is affected by variations in the size of the sampling area. It clearly reveals that the offset becomes smaller the closer to the g-b site the measurements are taken. In this case, the weighted average of SCIAMACHY observations within 500 km shows the smallest offset with respect to the g-b sites. However, the CO annual cycle becomes apparent in the 1000-km and 2000-km cases but is not visible in the 500-km circle. The weighting partly reveals the annual cycle at the reference sites; however, the amplitude is low and a slightly negative offset remains. Note that the standard deviation of the s-b measurements is included in the legend.

Time Series
For the NDACC and TCCON stations listed in Table 1, Figures 14 and 15 show the time series of quadratic inverse distance-weighted monthly-mean dry-air CO column mixing ratios (CMRs). Both figures demonstrate that the large scatter of the BIRRA retrievals is mainly caused by the noise in the SCIAMACHY observations. The standard deviation of individual months sometimes even exceeds a typical mean CO concentration. The results underline that strong reflection of solar radiation at the Earth's surface or at clouds yields a higher SNR and hence reduces the scatter.

Mission Averaged Global CO
The 2003-2011 averaged CO product is shown in Figure 16, with the corresponding errors in Figure 17. In order to obtain representative CO mixing ratios on a regular global grid, averaging over the complete validation period was required. Besides increased CO concentrations over China, parts of India, and Indonesia due to pollution and wildfires, Figure 16 also reveals valuable information on the global transport of pollution in the Earth's atmosphere. For example, the outflow of CO over the Atlantic Ocean due to biomass burning in central Africa becomes clearly visible. However, the gradients in the CO columns appear less pronounced, for several reasons. One is that the averaging smooths the CO peaks that occur during the wildfire seasons over central Africa; this effect is less pronounced in regions where high CO concentrations prevail throughout the year (see the eastern parts of China). Another is that the filter criteria also accept scenes where optically thick clouds extend into the mid-troposphere, which provide good SNRs but obscure the high CO concentrations in the lower troposphere and boundary layer. This effect is most pronounced in the region of the Intertropical Convergence Zone (ITCZ).
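Averaging the mission data onto the 1° × 1° grid of Figure 16 can be sketched as a per-cell arithmetic mean. This is an illustrative assumption: the gridding used for the figure may apply weights or additional filtering, and longitudes are assumed to lie in [−180°, 180°).

```python
import numpy as np


def grid_mean(lat, lon, values, res_deg=1.0):
    """Per-cell arithmetic mean of scattered observations on a regular
    lat/lon grid; returns (mean, count), with NaN in empty cells."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    v = np.asarray(values, dtype=float)
    nlat, nlon = int(round(180.0 / res_deg)), int(round(360.0 / res_deg))
    # row index counted from the South Pole, column index from 180 deg W
    i = np.clip(np.floor((lat + 90.0) / res_deg).astype(int), 0, nlat - 1)
    j = np.floor((lon + 180.0) / res_deg).astype(int) % nlon
    total = np.zeros((nlat, nlon))
    count = np.zeros((nlat, nlon))
    np.add.at(total, (i, j), v)      # unbuffered, so repeated cells accumulate
    np.add.at(count, (i, j), 1.0)
    with np.errstate(invalid="ignore"):
        mean = total / count         # 0/0 -> NaN for empty cells
    return mean, count
```

The `count` array is worth keeping alongside the mean: cells with few accepted observations carry large random errors, which is why the mission-long average, rather than shorter periods, was required for a representative global map.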

Discussion
The BIRRA CO total column product from the SCIAMACHY Channel 8 spectra was validated for the entire mission. In general, the results suggest that weighting can mitigate part of the representation errors that come into play when comparing satellite-averaged CO data to averages at g-b reference sites.
A global negative bias of about −10 ppbv and a standard deviation of 15-20 ppbv (depending on the averaging approach and the g-b reference network used, see Table 3) was found for the SCIAMACHY xCO. Both networks perform direct-sunlight observations using FTS. Hence, the fact that SCIAMACHY measurements acquired in remote areas within the subtropical bands tend to exhibit smaller negative biases than observations in the mid-latitudes suggests that part of the s-b bias is caused by cloud shielding of the lower troposphere and an insufficient representation of the CO a priori profile in those regions (see Figures 7-9). In the case of Izana, a site located at about 2.4 km above mean sea level, the effect of cloud shielding is small; hence, the cloud-induced bias is reduced.
The full-mission analysis by Borsdorff et al. [10] also included cloudy-sky retrievals and found similar results for the bias, which was estimated to be b = −6 ppb. Another feature common to both studies is the larger negative bias for the NDACC compared to the TCCON (see Table 3). The differences might arise from the fact that the two networks observe in different spectral regions and use different retrieval approaches.

Weighted Averages
In the validation of s-b data against a g-b reference, it is necessary to find averages of both data sets that are representative of one another. Early validation studies (e.g., Sussmann and Buchwitz [23]) pointed out that, even without correction, the statistical benefit of reducing scatter by increasing the ensemble of averaged pixels outweighs the effect of including less representative pixels. However, a thorough validation effort should also take the inherent representation issue into account. In particular, the variability of CO induces regional gradients, which the validation methodology should account for when comparing two independent systems.
The results demonstrate that assuming xCO to be less representative with increasing distance from the reference is reasonable, and that distance-weighted averages reduce the negative bias for the 2003-2011 period at most reference sites (see Figures 7 and 9). All averaging techniques agree that the bias is not significant with respect to the standard deviation at most g-b locations for the 2003-2011 period (see Figures 7-9). Moreover, the methods agree on the larger standard error of the mean bias in the Southern Hemisphere. It also became evident that choosing larger sampling areas requires shorter temporal averaging intervals to be chosen for a given noise level (see Figure 4). The findings give confidence that the BIRRA retrievals are consistent on the global scale and over the time period of the SCIAMACHY mission.
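The trade-off between sampling area and averaging interval can be made explicit with a back-of-the-envelope estimate. It is a sketch under idealized assumptions: a uniform density ρ of accepted s-b pixels per unit area and time, independent random errors with single-observation noise σ₁, a circular sampling area of radius R, an averaging interval T, and no weighting or representation error.

```latex
N = \rho\,\pi R^{2}\,T, \qquad
s_e = \frac{\sigma_1}{\sqrt{N}}
\quad\Longrightarrow\quad
T = \frac{\sigma_1^{2}}{s_e^{2}\,\rho\,\pi R^{2}} \propto R^{-2}.
```

Under these assumptions, halving the radius requires a four times longer averaging interval to keep the same standard error. With σ₁ = 150 ppbv (within the 100 to 200 ppbv noise range reported in the Conclusions) and a target $s_e$ of 10 ppbv, about N = (150/10)² = 225 accepted pixels are needed per average.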

Temporal Averages
Firstly, it is important to note that temporally close measurements at two observation sites do not necessarily imply similar concentrations, since spatial gradients are much more pronounced than temporal ones (i.e., CO varies more smoothly in time). The concentration of CO in rural as well as urban areas exhibits only weak variations within a monthly or seasonal period. Because of these weak temporal fluctuations, any temporal weighting of the averages has to be combined with a penalty term based on the spatial distance between the measurements of the independent observing systems. This penalty term accounts for the decreasing representativeness of the measurements due to the spatial mismatch, even if they were observed at the same time. Therefore, applying a stronger penalty to spatial mismatch than to temporal mismatch is regarded as a reasonable approach. It is also important to note that, since spatial weighting only considers the distance of the measurements from the reference without taking directional gradients into account, it is expected to deliver the best results when the CO concentration is isotropically distributed around a g-b site. However, even when the distribution around the reference is not perfectly isotropic, weighted averages seem to deliver more representative results than simple averaging without weighting (see Figure 13).

Multiannual Averages
Figures 2 and 3 clearly demonstrate that the later years (2006 onward) of the SCIAMACHY dataset suffered from low SNRs and may therefore require a modified approach for comparison. The multiannual averages presented in Section 3.2.4 proved useful for further reducing random errors. Although the multiannual averages cannot eliminate the negative bias completely (see Figure 13) and reveal disagreements in slope at most sites, the results improve for the majority of sites compared to the single-year analysis.

Spatio-Temporal Averages
Section 3.2.3 shows that incorporating temporal weighting in addition to distance weighting is not crucial, since CO columns are fairly constant within a month. Hence, temporally closer observations do not necessarily improve representativeness (in contrast to the spatially persistent CO gradients). However, if stronger weighting of the spatial mismatch is deemed useful (e.g., in areas of strong spatial gradients), one might consider including a (less pronounced) penalty term that accounts for the temporal mismatch. Quadratic spatial weighting, applied in conjunction with temporal averages beyond 30 days, should perhaps be considered (see Figure 11).

CO from TIR vs SWIR Observations
As an alternative to the SWIR range, atmospheric remote sensing of CO can exploit the rotational band in the microwave [38][39][40] or the fundamental band in the TIR range. Several s-b nadir-viewing instruments observe CO in the TIR: the Atmospheric Infrared Sounder (AIRS) [41], the Cross-track Infrared Sounder (CrIS) [42,43], the Infrared Atmospheric Sounding Interferometer (IASI) [44][45][46], the Measurement of Pollution in the Troposphere (MOPITT) instrument [47], and the Tropospheric Emission Spectrometer (TES) [48]. Initially, MOPITT retrievals used the TIR range only, but Worden et al. [49] demonstrated that multispectral retrievals exploiting both the near-infrared (NIR, 0.8-3 µm) and TIR channels increase the sensitivity to CO.
A distinct advantage of the SWIR band over the TIR band is its almost uniform sensitivity down to the Earth's surface. On the other hand, CO absorption in the TIR is significantly stronger: the vertical optical depth $\tau_{\mathrm{CO}}$ of the CO fundamental band around 4.6 µm is about a factor of 100 larger than that of the first overtone band around 2.3 µm (see, e.g., the Atmospheric Infrared Spectrum Atlas, http://eodg.atm.ox.ac.uk/ATLAS/). Moreover, the methane optical depth is larger than the carbon monoxide optical depth in the SWIR range, whereas $\tau_{\mathrm{CH}_4} < \tau_{\mathrm{CO}}$ in the TIR range at around 2100 $\mathrm{cm}^{-1}$.
The accuracy of CO columns estimated from TIR observations is typically 10% (see, e.g., [41,50]), i.e., much better than for SCIAMACHY: de Laat et al. [51] and Kopacz et al. [52] found that the random noise-related instrument error of a single CO total column measurement is large, typically 10-100% or larger [9], and Tangborn et al. [53] stated that total errors generally range from 20% to 100% for observations tagged as 'good'. The smaller noise-related error of TIR retrievals has been confirmed by the analysis of AIRS observations using Column EstimatoR Vertical Infrared Sounding of the Atmosphere (CERVISA), a variant of BIRRA based on Schwarzschild's equation appropriate for the thermal infrared range [54].

Conclusions
Dry-air CO total column estimates from 2.3 µm reflectance measurements by SCIAMACHY from 2003-2011 have been validated against 18 stations of the g-b NDACC and TCCON. While the NDACC observations covered the entire SCIAMACHY mission period, the TCCON observations could only be used to validate the CO product in the later phase of the mission. It was found that, after postprocessing of the BIRRA retrieval output, the actual xCO noise still varied significantly between sites, ranging from around 100 ppbv up to 200 ppbv. This is similar to findings by other authors cited in Section 1.5.2; therefore, averaging individual s-b CO observations was essential in order to validate the product. Here, a distance-weighted approach was chosen, to our knowledge for the first time, to compare CO mixing ratios from spatially distributed satellite observations to point-like g-b measurements. The approach proved reasonable for validating CO from s-b sensors. The results also suggest that part of the discrepancy between the BIRRA-retrieved columns and g-b observations arises from representation errors, an inadequate description of the a priori CO profile, and cloud shielding. In an upcoming study we aim to use model data as a reference in

Figure 1. World map created by Feist [16] showing stations affiliated with ground-based (g-b) observing networks that routinely measure trace gases such as CO in the mid-infrared (MIR) and short-wave infrared (SWIR) ranges. The background color scheme also provides some information on the ground reflectance in the SWIR range (e.g., the variations of the reflectance over the continents and the contrast between land and oceans).

Table 1. The Network for the Detection of Atmospheric Composition Change (NDACC) and Total Carbon Column Observing Network (TCCON) g-b Fourier transform infrared (FTIR) stations used in this validation. The last two columns indicate the time span of the g-b data used for the comparison to the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY).

Figure 2. The SCIAMACHY and g-b reference site dry air column mixing ratios of carbon monoxide (xCO).The plot includes the column mixing ratios (CMRs) after postprocessing within 500 km of the reference sites (a) Izana, (b) Zugspitze, (c) Toronto and (d) Lauder from 2003 to 2011.The large scatter of the individual SCIAMACHY CO columns is mainly caused by measurement noise, while averaging of observations within 500 km of the reference site has only a minor contribution.

Figure 3 .Figure 4 .
Figure 3. Histogram of the errors of the CO columns (defined in Equation (20)), represented as the unnormalized error probability function (see also Section 4.2.1 in Gimeno García et al. [8]). The data comprise the global fraction of CO columns that survived postprocessing of the BIRRA retrieval output. The dashed lines indicate the respective medians. Note the different ranges of the x-axis. (a) The first and last years of nominal SCIAMACHY operation, covering the complete time span; (b) The occurrence frequencies of total column errors for the years 2004 through 2007 indicate a substantial degradation of the quality of the CO product from 2006 onward (blue and magenta dashed lines). The years 2003 through 2005, however, show similar and significantly smaller errors.

Figure 5. xCO averages for circle- and square-shaped sampling areas. (a) Averages of SCIAMACHY xCO values calculated for various distances from the reference site Jungfraujoch in 2003; (b) xCO averages within given latitudes and longitudes of the g-b site. Here, [4° × 4°] designates a square-shaped area extending ±2° in latitude and longitude from the reference site. SCIAMACHY observations within the circle-shaped sampling areas show better agreement in most cases in 2003. Note the different ranges of the y-axis.

Figure 6. xCO monthly averages around the collocated NDACC and TCCON sites in 2007. (a) SCIAMACHY xCO values calculated for various distances from the reference site in Bremen. SCIAMACHY measurements within 1000 km show the smallest bias b; (b) xCO averages with respect to Izana in 2003. At this site, SCIAMACHY observations within 500 km of the reference site show the smallest overall bias with respect to the TCCON.

Figure 7. Mean bias of NDACC (blue) and TCCON (purple) stations with collocated SCIAMACHY CO retrievals from 2003 to 2011. The bias is the average of the monthly-mean differences weighted by the standard deviations. The total (2003-2011) bias per site was subsequently calculated from the annual weighted biases. (a) Global bias b and respective global standard error σ. The dashed gray line indicates b; (b) The standard error of the means $s_e$.

Figure 8 .Figure 9 .
Figure 8. Mean bias, standard deviation, and standard error as in Figure 7, but with monthly-mean averages of SCIAMACHY CO weighted according to the inverse distance from the reference site. (a) Global monthly-mean bias b and respective standard error σ; (b) The global standard error of the mean biases $s_e$ is the average of all standard errors of the means.

Figure 10. Linear bias trend t of the mean biases b for the classical (unweighted) and weighted approaches. No significant bias trend is identified on the global scale; however, a few sites do show a significant negative bias trend.

Figure 11. The 90-day averages within 2000 km of the Toronto truthing site in 2003. In (a) only spatial weighting was applied in the calculation of averages, while in (b) both spatial and temporal weighting were used. δ designates the offset between the SCIAMACHY and Toronto xCO averages.

Figure 12. Comparison of intercept and slope of s-b and g-b xCO. The years 2004 through 2011 were included in the analysis for Lauder depicted in (a). Results for Wollongong that include data from 2008-2011 are shown in (b).

Figure 13. The effect of distance weighting on SCIAMACHY xCO with respect to the Garmisch (2007-2011) and Zugspitze (2003-2011) reference sites. The solid and dashed lines show third-degree polynomial fits to the individual observations, and $\sigma \equiv \mathrm{xCO}^{\mathrm{sb}}_{e}$. (a) Weighting of measurements within a given distance leads to better agreement with the g-b references; (b) The effect of weighting partly reveals the annual cycle at the reference sites. However, the amplitude is low and a slightly negative offset remains. Note that the standard deviation of the s-b measurements is included in the legend.

Figure 16 .Figure 17 .
Figure 16. Dry-air CO column-averaged mixing ratios in ppbv over land and the oceans. The values are averaged from January 2003 to December 2011 on a 1° × 1° latitude/longitude grid. Measurements of clear-sky scenes and optically thick cloud conditions are included.