A Data-Driven Assessment of Biosphere-Atmosphere Interaction Impact on Seasonal Cycle Patterns of XCO 2 Using GOSAT and MODIS Observations

Using measurements of the column-averaged CO2 dry air mole fraction (XCO2) from GOSAT and biosphere parameters, including normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), leaf area index (LAI), gross primary production (GPP), and land surface temperature (LST) from MODIS, this study proposes a data-driven approach to assess the impacts of terrestrial biosphere activities on the seasonal cycle pattern of XCO2. A unique global land mapping dataset of XCO2 with a resolution of 1◦ by 1◦ in space, and three days in time, from June 2009 to May 2014, which facilitates the assessment at a fine scale, is first produced from GOSAT XCO2 retrievals. We then conduct a statistical fitting method to obtain the global map of seasonal cycle amplitudes (SCA) of XCO2 and NDVI, and implement correlation analyses of seasonal variation between XCO2 and the vegetation parameters. As a result, the spatial distribution of XCO2 SCA decreases globally with latitude from north to south, which is in good agreement with that of simulated XCO2 from CarbonTracker. The spatial pattern of XCO2 SCA corresponds well to the vegetation seasonal activity revealed by NDVI, with a strong correlation coefficient of 0.74 in the northern hemisphere (NH). Some hotspots in the subtropical areas, including Northern India (with SCA of 8.68 ± 0.49 ppm on average) and Central Africa (with SCA of 8.33 ± 0.25 ppm on average), shown by satellite measurements, but missed by model simulations, demonstrate the advantage of satellites in observing the biosphere–atmosphere interactions at local scales. Results from correlation analyses between XCO2 and NDVI, EVI, LAI, or GPP show a consistent spatial distribution, and NDVI and EVI have stronger negative correlations over all latitudes. This may suggest that NDVI and EVI can be better vegetation parameters in characterizing the seasonal variations of XCO2 and its driving terrestrial biosphere activities. We, furthermore, present the global distribution of phase lags of XCO2 compared to NDVI in seasonal variation, which, to our knowledge, is the first such map derived from a completely data-driven approach using satellite observations. The impact of retrieval error of GOSAT data on the mapping data, especially over high-latitude areas, is further discussed. Results from this study provide reference for better understanding the distribution of the strength of carbon sink by terrestrial ecosystems and utilizing remote sensing data in assessing the impact of biosphere–atmosphere interactions on the seasonal cycle pattern of atmospheric CO2 columns.


Introduction
Understanding the interactions between terrestrial ecosystem and atmospheric CO 2 column is significant to understand the variations of greenhouse gases in our atmosphere.The terrestrial biosphere is a large and persistent carbon sink [1].Every year about 1/3 of the anthropogenic CO 2 emission from fossil fuels combustion, cement production, and land-use change is absorbed by the terrestrial ecosystems [2].The CO 2 uptake and release induced by terrestrial biosphere activities essentially contribute to the seasonal cycle of atmospheric CO 2 column [3], which is, however, weakly related to fossil emission [2] and ocean fluxes [4].The CO 2 seasonal cycle amplitude (SCA) is mainly determined by the difference between the dominant CO 2 uptake during the growing seasons and CO 2 release during the autumn and winter seasons [5,6].To understand the underlying mechanisms, surface measurements and satellite observations have been widely used in analyzing the seasonality of atmospheric CO 2 [7][8][9][10][11][12][13].
CO 2 data from current surface monitoring networks have been crucial in gaining important insights of the mechanisms of CO 2 seasonal variations [7][8][9][10].Seasonal cycle of CO 2 at high northern hemisphere (NH) latitudes has been found to be dominantly induced by terrestrial activities, especially within tundra and boreal forest biomes using data from long-term surface measurements and atmosphere transport models [10].Variations in the strength of carbon sinks over North America can be well monitored by measurements from ground sites [7].An enhanced strength of the seasonal exchange of CO 2 , which is substantially larger than current model simulations in northern extratropical land ecosystems, especially in boreal forest, can be found from ground-based and aircraft-based observations [9].Climate-vegetation-carbon cycle feedback at high latitudes and the latitudinal gradient of the increasing CO 2 SCA can be precisely observed by a total carbon column-observing network (TCCON) [8].However, in global scale, it is still difficult to capture the full picture with detailed variations of biosphere-atmospheric CO 2 interaction by current ground measurement networks.With the advantages of global coverage and high-density measurements, satellite observations have the potential to improve our understandings in both global and local scales.
Results from validation with TCCON measurements show that XCO 2 retrievals from the Greenhouse gases Observing SATellite (GOSAT) well capture the XCO 2 seasonal cycle [14].Spatial distributions of SCA, derived from gridded data over 10 • by 10 • in space, are consistent among satellite observations from GOSAT and the SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY) and model simulations from Monitoring Atmospheric Composition and Climate (MACC 13.1) and CarbonTracker (CT2013b) at the global scale [15].In addition, the latitudinal gradient of biosphere-atmosphere CO 2 interactions has been revealed with satellite observations [11,13,16].However, geographic distribution of the magnitude of biosphere-atmospheric CO 2 exchange, which drives the atmospheric CO 2 seasonal cycle, and its underlying mechanisms are still highly uncertain [17,18].Analysis at finer scale of the interaction using biophysical parameters and XCO 2 retrievals from satellite observations is still lacking.One of the challenges is the irregular distribution and gaps of the XCO 2 retrievals in space and time, which result from certain constraints, such as cloud coverage and satellite observation modes [7].As a solution, a global mapping XCO 2 dataset derived from GOSAT XCO 2 retrievals, which has been demonstrated to be accurate in characterizing the spatio-temporal variations of XCO 2 at the global scale [19,20], can be made to investigate the detailed seasonal cycle pattern of XCO 2 .
On the other hand, satellite observations of terrestrial biosphere have been widely used in studying the terrestrial vegetation activities resulting in terrestrial carbon uptake and release.Vegetation indices and parameters, such as normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), leaf area index (LAI), and gross primary production (GPP), indicate the strength of photosynthesis activity related to terrestrial CO 2 uptake [21][22][23], while land surface temperature (LST) has an impact on the respiration rate and evaporation of terrestrial biosphere [24].For a better understanding of where and how strong the biosphere's seasonal activities influence the seasonal cycle pattern of XCO 2 , this study proposes a data-driven approach to assess the impacts from terrestrial biosphere-atmosphere interactions on the seasonal cycle pattern of XCO 2 using satellite retrievals of XCO 2 observed by GOSAT and parameters related to terrestrial biosphere activities observed by the Moderate Resolution Imaging Spectroradiometer (MODIS) from June 2009 to May 2014.To use the satellite XCO 2 retrievals, which are irregularly distributed in space and time and have many gaps, a data-driven mapping approach based on spatio-temporal geostatistics is applied to fill the gaps and generate a gridded, mapping XCO 2 dataset [19,20] with a resolution of 1 • by 1 • in space, and three days in time (hereafter referred to as GM-XCO 2 ).The satellite observations of terrestrial biosphere, including NDVI, EVI, LAI, GPP, and LST from MODIS, are collected and re-gridded into the same spatial resolution with the mapping XCO 2 dataset.
In Section 2, we describe all the used data and methods.Results of XCO 2 SCA from GOSAT, correlation analysis of seasonal cycle between XCO 2 and terrestrial biosphere parameters, and seasonal cycle phase lags between XCO 2 and biosphere activities indicated by NDVI are presented in Section 3. Discussions on the uncertainty of GM-XCO 2 , and possible impact of LST on seasonal variation of XCO 2 on tropical land are presented in Section 4, followed by conclusions in Section 5.

Datasets Used in the Study
Satellite observations from GOSAT and MODIS, model simulations from CarbonTracker, and ground-based measurements for validation from TCCON for 5 years from 1 June 2009 to 31 May 2014 were collected.The specifications of the used datasets are shown in Table 1.To highlight the differences of seasonal cycle over various land cover types, we selected nine different regions of interest where there are distinctive correlations of seasonal cycle between XCO 2 and terrestrial biosphere parameters.Figure 1 shows the locations of these regions overlapped with land cover map.Table 2 gives detailed specifications of these regions.In addition, XCO 2 data from 12 TCCON sites (shown in Figure 1) and model simulations are obtained to investigate the uncertainty of GM-XCO 2 datasets used in this study.[19,20] based on spatio-temporal geostatistics.A detail description and accuracy assessment of the global mapping method can be found in Zeng et al. [20].Unlike Zeng et al. [20], in this study, when modeling the spatio-temporal trend during the mapping process, we use the original ACOS-GOSAT XCO 2 data instead of model simulations.In this way, the method makes full use of information inside the data and is therefore a completely data-driven mapping approach.The resulted mapping XCO 2 data, GM-XCO 2 , are gridded dataset and regular in space and time with spatial resolution of 1 • by 1 • and temporal resolution of 3 days.To screen out data with high uncertainty, mapping data with prediction standard deviation [19] larger than 2.0 ppm are not used in this study.Accuracy assessment of this mapping data is presented in Section 4.1.With high resolution in space and time, this unique global land mapping dataset of XCO 2 make it possible for both spatial and temporal comparison analysis at a fine scale with other satellite-observed parameters.As described in Zeng et al. [19], the mapping result and its data uncertainty from this data-driven gap-filling method rely heavily on the original XCO 2 retrievals.Therefore, any possible retrieval artifacts due to retrieval errors will be propagated into the mapping data.More description on the mapping data uncertainty will be discussed in Section 4.

Terrestrial Biosphere Parameters from MODIS
The satellite observations related to terrestrial biosphere activities and land surface temperature used are obtained from MODIS onboard the Terra or Aqua.These data include MYD13C2-NDVI, MYD13C2-EVI, MCD15A2-LAI, MOD17A2-GPP and MOD11C3-LST.NDVI and EVI are the vegetation indices positively related to the amount of the biomass and strength of vegetation photosynthesis [28], LAI indicates the state of plant growth [30], GPP represents the ability of plants to capture carbon [29], and LST affects the soil and plant respiration rates and ecosystem photosynthesis [33].It has been shown that, in the tropical areas, the net effect of increasing LST is increasing atmospheric CO 2 [34].A detailed description of all the used datasets is shown in Table 1.

XCO 2 Simulations from Carbon-Tracker
Model simulations of CO 2 used for comparison are from CarbonTracker CT2015 [27], which has been adopted for comparison in several previous studies [11,13,20].In this study, the XCO 2 value is calculated from the CO 2 profile data within the local time around 13:00, using a pressure-averaged method as described in Conner et al. [52].The XCO 2 data from CT2015 is hereafter referred to as CT-XCO 2 .When comparing GM-XCO 2 and CT-XCO 2 , the difference due to averaging the kernel effect [49] is not considered here, since the difference of XCO 2 between applying and not applying the averaging kernel smoothing on model simulations is less than 0.1% [53].

Calculation of Seasonal Cycle Amplitude of XCO 2
The SCA of XCO 2 is determined by CO 2 uptake and release by terrestrial biosphere.The temporal variation of XCO 2 can be divided into two components, long-term yearly increases mainly caused by anthropogenic emissions [3] and inherent seasonal cycle mainly resulted from photosynthesis and respiration by the terrestrial biosphere [23].Fitting formulas for the XCO 2 seasonal cycle have been proposed in many different forms [13,14,20,54,55].In this study, we use the typical one, adopted by Keeling et al. [54] and Thoning et al. [55], to statistically fit the seasonal cycle and yearly increases of XCO 2 , which is given by the Equation (1) below, where k 0 is the XCO 2 at the start time, k 1 is the temporal growth rate of XCO 2 , and the annual harmonic functions, ∑ 2 i=1 (a i cos(2πit) + b i sin(2πit)), describe the seasonal variation of XCO 2 .The harmonic functions include an annual seasonal cycle and semi-annual seasonal cycle [56].We fit Equation ( 1) to the five-year time series of GM-XCO 2 .However, the inter-annual variations in the seasonal cycle shape is not included in the fitted curves.In this study, the yearly increase of XCO 2 in each grid is removed to obtain the seasonal GM-XCO 2 data (hereafter referred to as dXCO 2 ), which are used in analyzing the seasonality in the following sections.The SCA is calculated as the difference between maximum and minimum value of dXCO 2 from five years GM-XCO 2 data from Jun 2009 to May 2014.An example of fitting GM-XCO 2 using Equation (1) in a selected grid (46.5 • N, 111.5 • E) is shown in Figure 2.

311).
In the bottom panel, the seasonal GM-XCO 2 (dXCO 2 ) is calculated by removing the yearly increases from the temporal variation of GM-XCO 2 .The fitting residual is the difference between GM-XCO 2 and the fitting.Monthly-averaged seasonal GM-XCO 2 is the monthly mean of dXCO 2 , which is used in Section 3.2.

Correlating Analysis of Seasonal Variation between XCO 2 and Biosphere Parameters
The seasonal variation of XCO 2 is driven by the seasonal imbalance between CO 2 uptake and release related to ecosystem photosynthesis and respiration [13], which are the outcomes of the underlying biosphere-atmospheric interactions.However, it is still not clear how much of the seasonal variation of XCO 2 can be explained by surface biosphere parameters at global scale with fine resolutions, especially from a data-driven perspective.We use the GM-XCO 2 dataset and terrestrial biosphere parameters, including NDVI, EVI, LAI, GPP, and LST data, and implement correlation analyses on their seasonal cycles.Pearson correlation coefficients (PCCs) in seasonal variation between XCO 2 and each of the parameters (NDVI, EVI, LAI, GPP, and LST) for the terrestrial biosphere are calculated for each grid using monthly mean values, respectively.XCO 2 data less than 350 ppm are excluded; NDVI and EVI larger than 0 and valid LST, GPP, and LAI were used.The spatial distribution of PCC between the monthly mean of dXCO 2 and the biosphere parameters was used to quantify the impact of biosphere activities on the seasonal cycle pattern of XCO 2 .

Spatial Pattern of Seasonal Cycle Amplitude of XCO 2 and NDVI
XCO 2 SCA is an effective indicator of the strength of seasonal variation of XCO 2 , which is related to the increase and drawdown of CO 2 in the atmosphere [13].The spatial variation of SCA has been shown by Lindqvist et al. [14] using ground measurements from sparse TCCON stations.Further work by Kulawik et al. [15] using satellite retrievals has shown the global variation of SCA in 10 • by 10 • grids based on a traditional binning method, which, however, does not consider the detailed variation information within each grid.Based on our global land mapping data, Figure 3 shows the global land distribution of XCO 2 SCA derived from seasonal cycle of GM-XCO 2 from June 2009 to May 2014 using Equation (1).This is, to our knowledge, the first reported global map of CO 2 SCA at fine scale derived from real measurements.The goodness-of-fit evaluated by coefficient of determination (R 2 ) between mapping and fitting XCO 2 , as described in Section 2.2, is shown in Figure S1.We can see that R 2 for all fittings is larger than 0.9, indicating that the fittings well capture the seasonal cycle of GM-XCO 2 .This global map of XCO 2 SCA clearly shows the latitude gradient decrease from north to south that is consistent with previous studies using ground measurements and binned satellite observations [15,35,57].In the southern hemisphere (SH), the XCO 2 SCA are mostly lower than 4.0 ppm except some areas in Southern Africa, where the amplitude is above 4.0 ppm.It can be seen from this SCA map, moreover, that the range border of greater than 8 ppm extended over the high latitudes mostly with boreal forest in the NH.Moreover, the tropical and subtropical areas in India and Central Africa show abnormal SCA values larger than 8 ppm, which have not been seen previously and can be detected from our mapping dataset with fine spatial details.  1) to the 5-year time series of GM-XCO 2 and obtaining the annual mean of differences between peaks and troughs after subtracting the linear trend of the yearly increase.Two contour lines for amplitude of 8.0 ppm and 4.0 ppm are outlined with solid and dotted black lines, respectively.
To investigate how biosphere activities drive the spatial pattern of XCO 2 , we correlated the XCO 2 SCA, as shown in Figure 3, with NDVI SCA, an indicator of the strength of vegetation photosynthesis and its resulting carbon uptake.Corresponding to Figure 3, Figure 4 presents the NDVI SCA overlapped with amplitude contour lines of GM-XCO 2 shown in Figure 3, which is derived in the same way as XCO 2 SCA for the same period.Comparing Figure 3 with Figure 4, it can be seen that the overall spatial distribution of XCO 2 SCA shows a very similar pattern as that of NDVI, with a high correlation coefficient of 0.69 for global land and 0.74 for the NH land.Both the SCAs of XCO 2 and NDVI generally decrease with latitude from north to south.Especially, the amplitude of XCO 2 has the largest value ranging from 8.0 ppm up to 12 ppm over high latitude in the NH, corresponding well to the highest amplitude of NDVI in these areas, which is likely caused by the strong CO 2 uptake during the growing season by boreal forest (Figure 1) [13,58].The SCA of XCO 2 changes mostly from 4.0 ppm to 8.0 ppm over the mid-latitude area in the NH.It is remarkable to find that large amplitudes up to 8 ppm are presented over the Northern Indian area (8.68 ± 0.49 ppm) and some hotspots in Central Africa (8.33 ± 0.25 ppm), whereas the NDVI SCAs in these areas are relatively low.These areas, partly shown as CnI and GcA in Figure 1, are probably due to the CO 2 release from soil respiration of terrestrial biosphere in Northern India region or the large wild fires in the central Africa area.In addition, the temporal variation of anthropogenic emissions on the XCO 2 SCA in Northern India may partly contribute to these SCA anomalies.Further discussion is presented in Section 4.3.

Correlation between XCO 2 and Biosphere Parameters
In this section, correlation analyses are conducted, respectively, between seasonal variations of GM-XCO 2 data and several biophysical parameters associated with biosphere activities, including NDVI, EVI, LAI and GPP, to explore the underlying biosphere-atmospheric coupling process that drives the XCO 2 seasonality.Figure 5 shows the global distribution of the Pearson Correlation Coefficients (PCCs) with a significant level (p-value < 0.01) between monthly-averaged dXCO 2 and NDVI, EVI, LAI, and GPP, respectively, from June 2009 to May 2014 in each grid.Due to the lack of available data, the tropical forests in Amazon and Southeast Asia and the Tibetan Plateau are not considered here.
All of the PCCs between seasonal cycles of XCO 2 and seasonal variations of NDVI, EVI, LAI, and GPP, shown in Figure 5, present similar spatial patterns and are mostly negatively correlated, except for the bare land areas (shown in Figure 1).Especially, the seasonal cycle of XCO 2 show stronger and significant negative correlations with the biosphere parameters over Northeast Asia around 45 • N than those over the other areas.Among the indices of NDVI, EVI, LAI, and GPP, XCO 2 shows stronger negative correlation with NDVI and EVI than those with LAI and GPP globally.In the NH high-latitude region, the correlation coefficients are −0.69,−0.68, −0.66, and −0.59 for NDVI, EVI, LAI, and GPP, respectively.They are −0.74,−0.69, −0.64, and −0.67 in the NH low-latitude region.It is known that LAI and GPP are the vegetation parameters derived from semi-empirical biophysical models, which likely induce more uncertainty than NDVI and EVI that are directly retrieved from MODIS-observed radiance.Moreover, it can be seen that the PCCs between XCO 2 and LAI are slightly weaker than those from other indices in the low latitudes, which is likely because LAI is an indicator of vegetation biophysical structure rather than biochemical activities [59,60].The PCCs from EVI, moreover, are slightly higher than that from NDVI over Northeast China.This difference may be related to the saturation effect of NDVI in high biomass regions, while EVI data does not become saturated as easily as NDVI by using the non-saturated NIR band for high biomass regions [61].
We selected six regions of interest with a PCC higher than 0.5 distributed across multiple latitudes as shown in Figure 6 to investigate the correlation of seasonal variations of XCO 2 and NDVI from June 2009 to May 2014.These six regions of interest are shown in Figure 1, including FnC: forest in Northern Canada covered by 63% of forest, FeR: forest in Eastern Russia covered by 86% of forest, CwR: crop in Western Russia covered by 70% of cropland, CnC: crop in Northern China covered by 58% of cropland with mixed 18% grassland and shrub-land, GcA: grass in Central Africa covered by 36% of grassland mixed with 30% of forest, and CnI: crop in Northern India covered by 89% of cropland.It can be seen from Figure 6 that XCO 2 decreases sharply when NDVI increases quickly during the growing season in summer.A stronger negative correlation in the forest region (−0.68 in FeR) compared to the cropland region (−0.50 in CwR) at the same latitude zone can be found.PCCs between seasonal variations of XCO 2 and NDVI is up to 0.8 over GcA, CnI, and CnC.The rainy season over GcA boosts the grass growth and promote the CO 2 uptake.Strong correlation presented in CnI and CnC, imply that the seasonal crop farming from planting, growing to harvesting may have a large impact on the seasonal variation of CO 2 uptake and release.Moreover, from Figure 6, we can see that there is an offset between the time of XCO 2 troughs and NDVI peaks.The time delay, as lags of month, over the six regions of interest are also indicated in Figure 6.Comparing with phase of NDVI, the phase of XCO 2 is generally delayed by a lag of about 0-2 months, which generally agrees with results from Table 3 in [62] by Olsen et al., which shows the delayed time between seasonal amplitudes of XCO 2 and boundary layer CO 2 to be within 2 months [62], with generally a greater difference in high latitude regions compared to low ones in NH.This offset is probably due to the time delay in XCO 2 response to terrestrial biosphere activities associated with the mass of CO 2 in the column and vertical mixing and convection of CO 2 from the surface to the column profile [58,62,63].In this study, we define the time delay to be the lags of months between XCO 2 and NDVI that make their negative correlation the strongest.We can see the delayed time over the three regions (FnC, FeR, and CwR) in high latitude regions are over one month, and larger than the other three regions at low latitudes (CnC, GcA, and CnI), probably indicating a relatively slow vertical mixing due to very large changes of the CO 2 mass at high latitudes with strong biosphere absorption.The global pattern of this phase delay effect is elaborated in Section 3.3.7a, which is the averaged phase lag for five years of GM-XCO 2 and NDVI data.This map of phase delay effect, to our knowledge, is the first such map derived from a completely data-driven approach using satellite observations.Generally, the phase delay in southern hemisphere is either zero or one, probably due to the much smaller change of CO 2 mass in the column that needs time to be well mixed in column.In the NH mid-and high-latitude regions, interestingly, the pattern of the phase delay is similar to the seasonal cycle pattern derived from GM-XCO 2 shown in Figure 3 and from NDVI shown in Figure 4.The larger amplitudes, such as North America and North Eurasia boreal forest regions, correspond to a smaller time delay, suggesting that the strength of vertical mixing is much stronger in these areas, even though the biosphere-atmosphere exchange in these areas is also large.As shown in [64] by Huang et al., the strength of vertical mixing in boreal and arctic regions is not as strong as that in the tropics, which may lead to stronger phase delay effect in high latitude regions.Moreover, XCO 2 has a large footprint, which is supposed to have contributions from different oceanic and land regions.However, satellite observations of XCO 2 , such as from ACOS-GOSAT, are more sensitive to CO 2 variations within the boundary layer [65], which allows us to explore the biosphere-atmosphere interactions by correlating seasonal cycles of satellite-observed XCO 2 and surface biosphere parameters.
Figure 7b shows the PCC between seasonal variation of XCO 2 and NDVI if the phase delay effect is excluded.We can see that, over most of the regions, the PCC is stronger than −0.8, particularly for most areas in North America where PCCs are around −0.6 before excluding the effect.In addition, the negative correlation over some area of boreal forest, most areas of grassland and forest in Central Africa have been increased to stronger than −0.9.These results show that the correlation between seasonal cycles of XCO 2 and NDVI may be underestimated if this phase lag effect is not removed.Furthermore, it suggests that this phase lag effect associated with atmospheric transport should be considered when assessing the impact of biosphere-atmosphere interactions on seasonal cycle patterns of XCO 2 .

Evaluation Using Cross-Validation
Cross-validation is widely used in accuracy assessments for statistical models [20,66].In this study, we used cross-validation based on the Monte Carlo sampling technique adopted by Zeng et al. [20].The Monte Carlo method is carried out by randomly sampling 5% of the data that will be left out in cross-validation, and repeating this process 100 times.Please refer to Zeng et al. [20] for technical details.From the outputs of cross-validation, two datasets, the predicted dataset and the original observation dataset, are obtained.The distribution of observed values and predicted values of XCO 2 is shown in Figure 8.We can see the predicted XCO 2 well agrees with observed XCO 2 with high R 2 of 0.91, indicating a very strong correlation between the original ACOS-GOSAT data and the predictions.The mean absolute prediction error (MAPE), which is the averaged prediction error, is 0.85 ppm, indicating that the averaged absolute error for each prediction is less than 1 ppm.Specifically, 70% of the prediction errors are less than 1 ppm.These statistics from cross-validation study suggest that the used mapping method based on spatio-temporal geostatistics is effective and precise in generating global land mapping of XCO 2 from original retrievals.The uncertainty in the mapping XCO 2 data comes from a combination of the uncertainty from the mapping method and the uncertainty propagated from satellite retrieval error.This latter source of data uncertainty is discussed in the following section by comparing with TCCON.

Verification with TCCON
TCCON has been widely applied in verification of XCO 2 from satellite observations [14,20,36].In this study, we perform a comparison between TCCON XCO 2 data for 12 sites and the XCO 2 time series reconstructed from the global land mapping XCO 2 data.For each TCCON site, we use all available data within the period from June 2009 to May 2014.The averaging kernel, as described in [67], for comparing ACOS-GOSAT and TCCON data is used for this comparison.Time series of TCCON XCO 2 , GM-XCO 2 , and ACOS-GOSAT XCO 2 within 500 km of every TCCON site are shown in Figure S2.Table 3 shows the statistical results in terms of the number of coincident data pairs, bias, and SCA difference between the mapping dataset and TCCON.The SCA of TCCON XCO 2 shown in Table 3 is derived in the same way as SCA of GM-XCO 2 , as shown in Figure S2 which depicts the seasonal variations of XCO 2 from ACOS-GOSAT retrievals, mapping data, and TCCON measurements.
Figure S2 compares the seasonal variation of observed and mapping XCO 2 and TCCON measurements.We can see that the mapping XCO 2 data go through the original satellite observations and the mapping data well capture the seasonal variations of the observations.From Table 3, we can see that the overall absolute difference of SCA between GM-XCO 2 and TCCON is 0.4 ppm.The SCAs from GM-XCO 2 are systematically shallower than that from TCCON for all sites in continental Europe, which is consistent with conclusions from previous studies [14,15] of XCO 2 SCA using original GOSAT-ACOS retrievals, due to retrieval biases that were evaluated in previous studies [14,15].The SCAs of GM-XCO 2 are well consistent to that of TCCON (SCA difference < 0.7 ppm) at Park Falls, Lamont and Wollongong with long-term measurement that covers all the five years as shown in Figure S1.The largest absolute different SCA are 1.15 ppm in Darwin, 1.75 ppm at Orleans, and 1.38 ppm at Karlsruhe.It should be noted that Darwin is a coastal site of SH, whose SCA is small and has a combined contribution from local, oceanic, and adjacent land regions.From the comparison results, we can see that the mapping dataset can generally capture the seasonal characteristics of XCO 2 .However, the mapping result and its data uncertainty from this data-driven gap-filling method rely heavily on the original XCO 2 retrievals.As a result, the retrieval artifacts on original XCO 2 retrievals are propagated into the mapping dataset [19], primarily leading to the differences between the seasonal cycles of GM-XCO 2 and TCCON.As shown in Chevallier et al. [68], evidence of systematic retrieval errors can be found over dark surfaces of the high-latitude lands, particularly in the boreal region.These retrieval errors, which propagate into the mapping data, may play a role in the difference when compared to TCCON as shown in Table 3 and may lead to underestimation of XCO 2 SCA in the boreal region as shown in Figure 3.However, these possible errors (about 1 ppm for Continental Europe) are not likely to have a substantial impact on the SCA patterns, as we can see in Figure 3.The retrieval errors on XCO 2 retrievals may have a smaller impact on the correlation analysis between seasonal cycles between GM-XCO 2 and biosphere parameters, as shown throughout this paper, since it is mostly the phase of seasonal cycles that determine the PCC value instead of absolute values of XCO 2 .

Comparison of Seasonal Cycle between GM-XCO 2 and Model Simulations
As suggested by previous studies [11,27,53,58], model simulations can well reproduce the large-scale features of atmospheric CO 2 .On the other hand, satellite observations make direct measurements of XCO 2 and, therefore, be able to capture the local variability and a more detailed characteristic of XCO 2 variation compared to model simulations.Therefore, we compare the SCA of GM-XCO 2 to that of simulated XCO 2 from CarbonTracker in order to evaluate the satellite measurements and the mapping dataset.Figure 9 shows the global distribution of XCO 2 SCA derived from the CarbonTracker (CT2015) data, overlapped with the contour lines of GM-XCO 2 amplitudes shown in Figure 3. Comparing Figure 3 with Figure 9, it can be clearly seen that the spatial pattern and latitudinal gradient of GM-XCO 2 SCA is in good agreement with CT-XCO 2 , especially in the high-latitude NH areas where the contour lines from GM-XCO 2 amplitude match well with the changes of SCA from CT-XCO 2 .Moreover, the phenomenon of higher SCA over India and Central Africa than surrounding areas can be seen in SCA maps from both GM-XCO 2 and CT-XCO 2 .The SCA difference map of GM-XCO 2 and CT-XCO 2 is shown in Figure S3.Over the NH high-latitude area, the GM-XCO 2 SCA is generally lower than that of CT-XCO 2 , which agrees with the results, as shown in Table 3, that GM-XCO 2 SCA is generally lower than TCCON sites at the same area (Bremen, Karlsruhe, Orleans, Garmisch, and Bialystok).Over the low latitude area, GM-XCO 2 SCA is generally higher than that of CT-XCO 2 , especially over the areas within the overlapped GM-XCO 2 contour lines of 8 ppm and 4 ppm (Such as India and Southern Africa).There are two possibilities for these local inconsistencies: (1) the retrieve error of satellite observations may be large over these areas because of the impacts of thin clouds and disturbance of wildfire [57,69]; (2) the uncertainty of simulations by current models [8,70] may be large over these areas because of the sparseness of the ground-based measures and airborne observations used for assimilations to constrain the local CO 2 fluxes [20,66].This comparison shows that satellite observations of XCO 2 with appropriate analysis can be used to assess the earth system modeling coupled with carbon flux estimates, as also suggested by Hammerling et al. [71].We can conclude that the GM-XCO 2 SCA is generally consistent with that of CT-XCO 2 on a large scale, even though the discrepancies over some local areas need further investigation.4, the SCA for each grid is calculated by fitting its XCO2 time series using Equation ( 1) and obtaining the annual mean of differences between peaks and troughs after subtracting the linear trend.The two overlapped contour lines are from the XCO2 SCA (8.0 ppm in the solid black line and 4.0 ppm in the dotted black line) as shown in Figure 3.

The Possible Impact of Retrieval Density in High Latitude Regions
As shown in Figure 10, there are large gaps of satellite XCO 2 retrievals in high latitude regions, especially during the winter season due to snow cover in these regions and very small signal noise ratio in observation.For analyzing the XCO 2 SCA, the availability of retrievals around the peak and trough of the cycle is important.Fortunately, the ACOS-GOSAT XCO 2 retrievals cover the peak and trough of the XCO 2 cycle, as indicated by CT2015 simulations, suggesting that our data-driven mapping method can generally reproduce the SCAs in these regions from the satellite retrievals.However, at high latitude boreal regions, the ACOS-GOSAT retrievals may also be affected by systematic retrieval errors [14,68], so the SCA values at these high latitude regions, as shown in Figure 3, should be interpreted with caution.Further validations should be carried out with more TCCON data available in the future.

Possible Impact of LST on Seasonal Variation of XCO 2 on Tropical Land
Abnormal results are shown over some regions (e.g., CnI, GcA, FsA, and CeB) in the tropical area (30 • N-30 • S), where large XCO 2 SCA can be seen from both GM-XCO 2 and CT-XCO 2 , even the former is slightly larger than the latter, and high PCCs exist for NDVI and EVI, as shown in Figure 5.These regions are covered by vegetation, such as grass, forest, and crop mixed with bare land, where the interaction between CO 2 uptake and release from vegetation and soil respiration likely has impact on the seasonal cycle of CO 2 .As indicated from previous studies on CO 2 emission from soils, the soil respiration in the tropic and subtropical areas is much larger than high latitudes [72,73] and, therefore, likely has non-negligible impact on the XCO 2 seasonal pattern.LST, a remotely sensed terrestrial parameter, is effective in depicting the land surface temperature distribution at the global scale with long time-series.LST has been used as one of the critical parameters in estimating seasonal soil respiration over forests in the Midwest USA [74].Higher surface temperature is associated with stronger soil and plant respiration rates, but also possibly enhanced photosynthesis [33].However, the net effect of increasing LST in the tropical areas has been shown to result in increasing atmospheric CO 2 [34].In tropical areas, soil and plant respiration contribute a very important part to the seasonal variation of atmospheric CO 2 .The tropical carbon storage is very vulnerable to temperature change [34].Therefore, an investigation of the correlation between XCO 2 from satellite observation and LST can indicate the possible impact of LST on the seasonal variation of XCO 2 in the tropical areas, where the carbon storage has been shown to be sensitive to future temperature variabilities [34].
Figure 11 shows the correlation between seasonal XCO 2 and LST variation over the tropical areas.The results interestingly show the seasonal cycle of XCO 2 is highly positive correlated with seasonal variation of LST, which is opposite to the negative correlation with NDVI, as shown in Figure 5a.Especially, over Central Africa and Southern Asia, the PCC is generally larger than 0.70, indicating the possible impact of LST on the seasonal cycle of XCO 2 .Additionally, Figure 12 shows the detailed time series of the selected five regions of interest shown in Figure 1.The seasonal variation of XCO 2 is generally consistent with the variation of LST; especially, the variation around peak XCO 2 matches well with the variation around peak LST.Coincident with the maximum LST, terrestrial respiration and weakened photosynthesis could contribute to the maximum XCO 2 .As is shown in Sreenivas et al. [75], XCO 2 showed a positive correlation with temperature, and soil respiration and biomass burning may be an important terrestrial carbon source contributing to the CO 2 concentration in India.To further investigate the possible influence from anthropogenic emissions on the SCA anomaly, we found that the fossil fuel emission from bottom-up estimation in CnI [76,77] (shown in Figure S4) does show a seasonal cycle, which, however, has much less consistency than LST with XCO 2 around the peaks of the XCO 2 cycle.This may suggest that the SCA anomaly observed in CnI is partly contributed from fossil fuel emissions but dominantly contributed from LST variations around the spring period.The effect of terrestrial respiration in these regions need to be further investigated and directly observed LST could be an alternative parameter to its further investigation for lack of surface measurements.

Conclusions
The seasonal cycle of atmospheric CO 2 is mostly caused by CO 2 uptake and release from biosphere activity, which is profoundly connected to biosphere fluxes that determine the global net CO 2 sink.Satellite observations of XCO 2 by GOSAT, correlated with biosphere parameters from MODIS, not only expand the current in situ measurements tremendously, but also provide us with multi-parameter data of biosphere-atmosphere interaction to reveal the impacts of biosphere activity on the seasonal cycle pattern of XCO 2 .The data-driven analysis of atmospheric CO 2 coupled with biosphere parameters has the significant potential to improve our understanding of the global net CO 2 sink with massive increases of satellite data.In this study, we introduced a data-driven approach to assess biosphere-atmosphere interaction on the seasonal change of XCO 2 through collecting the global data, including XCO 2 from GOSAT, vegetation parameters of NDVI, EVI, LAI, and GPP, and land surface temperature from MODIS from the years 2009-2014.
The global land mapping dataset of XCO 2 (GM-XCO 2 ), derived from GOSAT XCO 2 retrievals, can reveal the spatial distribution of seasonal cycle characteristics of XCO 2 directly and at a global scale that were not involved before.Furthermore, GM-XCO 2 makes it possible to implement the correlation analyses, at fine scale, of seasonal variation between satellite observed XCO 2 and terrestrial biosphere parameters.We found that the seasonal cycle pattern of GM-XCO 2 is well corresponding to that from NDVI with high correlation coefficient of 0.69 for global land and 0.74 for the NH land.That indicate the significant influence from vegetation activities on the seasonal change of XCO 2 .Through the correlation analyses between GM-XCO 2 and NDVI, EVI, LAI, and GPP, it is found that NDVI and EVI have better correlations and spatial pattern with the seasonal variation of XCO 2 than LAI and GPP for stronger coefficients over all latitude zones.That is consistent with previous studies [78,79] that NDVI and EVI, directly retrieved from MODIS observed radiance, may be better parameters representing biosphere activities for constraining of the terrestrial CO 2 flux than LAI and GPP, which are obtained from semi-empirical biophysical models.Using XCO 2 and NDVI, moreover, we show the global distribution of time delay effect in terms of lags of months, which, to our knowledge, is the first such map derived from the completely data-driven approach using satellite observations.After removing the phase lags, PCC enhanced nearly −0.2 for most areas, especially for North America, where those were improved from weaker than −0.6 to stronger than −0.8.As a result, the phase lags between seasonal change of XCO 2 and the vegetation indices should be considered when assessing the impact of biosphere-atmosphere interaction on the seasonal cycle pattern of XCO 2 .
Considering the uncertainty of the GM-XCO 2 , evaluation with cross-validation, comparisons using TCCON measurements and CT-XCO 2 was conducted.The method presents a significant high R 2 of 0.91 between the predictions and observations in cross-validation.Results of evaluation with TCCON data demonstrate the sufficient accuracy of mapping XCO 2 with averaged absolute bias within 2 ppm over all selected 12 TCCON sites, and the similar SCA distribution between GM-XCO 2 and CT-XCO 2 indicating the availability of GM-XCO 2 in capturing the seasonal change of atmospheric CO 2 concentration, even though some discrepancy exists in the SCA estimation between GM-XCO 2 and CT-XCO 2 over low-latitude areas.Considering the strong terrestrial respiration, and its relationship with temperature over low-latitude zones, LST, an important remote sensing parameter, was used for correlation with the seasonal change of XCO 2 .Fortunately, high positive coefficients are shown in low-latitude area for LST and seasonal change of XCO 2 , which provide us a potential method of utilizing remote sensing parameters related to the photosynthesis and terrestrial respiration for better understanding the biosphere-atmosphere interaction for lack of surface measurements in the low-latitude areas.
Future studies, including improving the mapping XCO 2 data over the tropical areas, comparing the mapping XCO 2 with results from other models, and a detailed investigation of the discrepancies between observations and modeling, should be further explored.It is expected that the data-driven assessment developed in this study will be improved as more CO 2 observations are becoming available from newly-launched satellite, such as OCO-2, and upcoming satellites, such a as TanSat [80], as well as more remote sensing data on vegetation activities.

Supplementary Materials:
The following are available online at www.mdpi.com/2072-4292/9/3/251/s1. Figure S1.The global map of the coefficient of determination (R 2 ) between fitting-XCO 2 and GM-XCO 2 using Equation (1).R 2 in each grid represents the goodness-of-fit of the grid, such as one fitting shown in Figure 2. Figure S2: Temporal variation comparison for the 12 TCCON sites.As shown in these panels, the original ACOS-GOSAT XCO 2 retrievals within 500 km of the TCCON site are in gray dots.The TCCON data, smoothed by applying the ACOS-GOSAT averaging kernel, are indicated by blue dots.The data are chosen using coincidence criteria of within ±1 h of the GOSAT overpass time, and a three-day (one time-unit) mean is calculated for the comparison.The predicted TCCON site XCO 2 time series using the mapping approach are indicated by the red dots.Figure S3: SCA difference between averaged GM-XCO 2 and CT-XCO 2 in the mapping area from June 2009 to May 2014, which is calculated using GM-XCO 2 shown in Figure 3 minus corresponding values of CT-XCO 2 shown in Figure 9. Negative values indicate GM-XCO 2 SCA is lower than that of CT-XCO 2 , and positive values indicate that GM-XCO 2 is higher.

Figure 1 .
Figure 1.Regions of interest selected from global land region from 45 • S to 60 • N overlapped with land-cover data in 2009 obtained from Europe Space Agency (ESA, [25]).Nine different regions of interest are outlined as red rectangles, including Forest in Northern Canada (FnC), Forest in Eastern Russia (FeR), Cropland in Western Russia (CwR), Cropland in Northern China (CnC), Cropland in Northern India (CnI), Cropland in Eastern Brazil (CeB), Grassland in Central Africa (GcA), Mixed forest in Southern Africa (FsA) and Shrub-land in Northern Australia (SnA).Six regions in the Northern Hemisphere, including FnC, CwR, FeR, CnC, CnI, and GcA, are used for correlation analysis of seasonal cycle between NDVI and XCO 2 in Section 3.2, while five regions, including CnI, GcA, CeB, FsA, and SnA, are used for correlation analysis of seasonal variations between LST and XCO 2 in Section 4.2.Twelve selected sites of TCCON are marked with black dots and used for evaluating mapping data in Section 4.1.2.

1 GM 8 Figure 3 .
Figure 3. Spatial distribution of SCA of XCO 2 in global land areas derived from the GM-XCO 2 dataset from June 2009 to May 2014.The SCA for each grid is calculated by fitting Equation (1) to the 5-year time series of GM-XCO 2 and obtaining the annual mean of differences between peaks and troughs after subtracting the linear trend of the yearly increase.Two contour lines for amplitude of 8.0 ppm and 4.0 ppm are outlined with solid and dotted black lines, respectively.

1 Figure 4 .
Figure 4. Spatial distribution of NDVI SCA, which is calculated for each grid by fitting Equation (1) to the 5-year NDVI data and obtaining the annual-mean differences between peaks and troughs after subtracting the linear trend.The two overlapped contour lines are the XCO 2 SCA from GM-XCO 2 , with 8.0 ppm in solid black line and 4.0 ppm in dotted black line as shown in Figure3.

Figure 5 .
Figure 5. Pearson Correlation Coefficients (PCCs) between monthly mean of detrended GM-XCO 2 (dXCO2) and parameters for biosphere activities, including (a) NDVI, (b) EVI, (c) LAI, and (d) GPP from MODIS data products in each grid from June 2009 to May 2014.Only grids with significant levels of correlation (p-value < 0.01) are shown.In calculation of the correlation coefficient, all data are averaged by month.The overlapped lines are 8.0 ppm in solid black and 4.0 ppm in dotted black as shown in Figure 3.

Figure 7 .
Figure 7. (a) The global distribution of phase delay effect, in terms of lags of months, between seasonal variations of XCO 2 and NDVI, and (b) the correlation between XCO 2 and NDVI if the phase delay effect is excluded.Only grids with month lags of no more than three are shown.

Figure 8 .
Figure 8.The relationship between predicted XCO 2 and observed XCO 2 values in cross-validation of global land mapping of XCO 2 .The color grids represent the density of data distribution.The dotted line is derived from linear regression of predicted values of XCO 2 (Y) and the observed values of XCO 2 (X), which shows a significant linear relationship with R 2 equals 0.91 (p-value < 0.01) and good consistency of observed XCO 2 and predicted XCO 2 with MAPE equal to 0.85.The solid line shows the one-to-one line.

1 Figure 9 .
Figure 9. Global distribution of XCO2 SCA derived from CarbonTracker (CT2015) CT-XCO2 dataset from June 2009 to May 2014.Similar to Figure4, the SCA for each grid is calculated by fitting its XCO2 time series using Equation (1) and obtaining the annual mean of differences between peaks and troughs after subtracting the linear trend.The two overlapped contour lines are from the XCO2 SCA (8.0 ppm in the solid black line and 4.0 ppm in the dotted black line) as shown in Figure3.

Figure 10 .
Figure 10.XCO 2 from GOSAT-ACOS retrievals (black) and CarbonTracker (red) from June 2009 to May 2014 over mid-and high latitude of North Hemisphere.XCO 2 over latitude bands between 50 • -60 • N in (a) Eurasia and (c) North-America land regions, and between 40 -50 • N for the same two regions in (b,d), respectively.

Figure 11 .
Figure 11.Pearson correlation coefficients (PCCs) between the monthly values of detrended XCO 2 from GM-XCO 2 and the land surface temperature (LST) from the MODIS data products in each grid from June 2009 to May 2014 between 30 • S and 30 • N.Only grids with a significant level of correlation (p-value < 0.01) are shown.In calculation of correlation coefficient, all data are averaged by month.

Figure 12 .
Figure 12.Temporal variation of de-trended GM-XCO 2 (dXCO 2 , black lines) and LST (red lines) from June 2009 to May 2014 in five regions of interests.Vertical bars represent the range of ±1σ.
Figure S4: Monthly mean of detrended XCO 2 (dXCO 2 ) and bottom-up estimation of fossil fuel emissions (FFCO 2 ) from ODIAC (Open-source Data Inventory for Anthropogenic CO 2 ) and CDIAC (Carbon Dioxide information Analysis Center) over (a) cropland in Northern India (CnI) and (b) cropland in Northern China (CnC), two regions of interest as shown in Figure 1, from June 2009 to May 2014..

Table 1 .
Specifications of the used satellite observations from GOSAT and MODIS and model simulations from CarbonTracker spanning from 1 June 2009 to 31 May 2014.

Table 2 .
Specifications of the selected regions of interest including the location, major vegetation types and its area proportion, and averaged Pearson Correlation Coefficients (PCCs) of seasonal variation between GM-XCO 2 and NDVI or LST over each region of interest.Several regions of interests are selected for NDVI and LST, respectively.

1
Figure 6.Seasonal variations of monthly-averaged seasonal GM-XCO 2 (dXCO 2 , black lines) and NDVI (green lines) from June 2009 to May 2014 in six selected regions of interests as shown in Figures1 and 5a.Vertical error bars represent one standard deviation from its mean value.The PCC values given in the figure titles, between seasonal variations of GM-XCO 2 and NDVI, are averaged values for all grids within the region of interest.The time delay, in terms of lags of months, of the NDVI time series to maximize the negative correlation with XCO 2 is 1.6 months for FnC, with a minimum averaged PCC of −0.82.They are 1.0 month and −0.85 for FeR, 1.8 months and −0.83 for CwR, 0.6 months and −0.85 for CnC, 0.8 months and −0.86 for GcA, and 0.3 months and −0.81 for CnI.

Table 3 .
Statistics of comparison between GM-XCO 2 and TCCON data (smoothed by applying the ACOS-GOSAT averaging kernel), and their SCAs for the following 12 sites.Bias is calculated using GM-XCO 2 minus TCCON XCO 2 for each coincident data pair and averaged for each site.While almost impossible for surface measurements, the GM-XCO 2 and NDVI data in this study allow us to investigate the global distribution of phase delay effect in terms of lags of months.The results are shown in Figure