Multivariate spatial data fusion for very large remote sensing datasets

Global maps of total-column carbon dioxide (CO2) mole fraction (in units of parts per million) are important tools for climate research since they provide insights into the spatial distribution of carbon intake and emissions as well as their seasonal and annual evolutions. Currently, two main remote sensing instruments for total-column CO2 are the Orbiting Carbon Observatory-2 (OCO-2) and the Greenhouse gases Observing SATellite (GOSAT), both of which produce estimates of CO2 concentration, called profiles, at 20 different pressure levels. Operationally, each profile estimate is then convolved into a single estimate of columnaveraged CO2 using a linear pressure weighting function. This total-column CO2 is then used for subsequent analyses such as Level 3 map generation and colocation for validation. In principle, total-column CO2 in these applications may be more efficiently estimated by making optimal estimates of the vector-valued CO2 profiles and applying the pressure weighting function afterwards. These estimates will be more efficient if there is multivariate dependence between CO2 values in the profile. In this article, we describe a methodology that uses a modified Spatial Random Effects model to account for the multivariate nature of the data fusion of OCO-2 and GOSAT.We show that multivariate fusion of the profiles has improved mean squared error relative to scalar fusion of the column-averaged CO2 values from OCO-2 and GOSAT. The computations scale linearly with the number of data points, making it suitable for the typically massive remote sensing datasets. Furthermore, the methodology properly accounts for differences in instrument footprint, measurement-error characteristics, and data coverages. Disciplines Engineering | Science and Technology Studies Publication Details Nguyen, H., Cressie, N. & Braverman, A. (2017). Multivariate spatial data fusion for very large remote sensing datasets. Remote Sensing, 9 (2), 142-1-142-19. This journal article is available at Research Online: http://ro.uow.edu.au/eispapers/6545


Introduction
The monitoring of the global carbon cycle is an important component of atmospheric science due to carbon dioxide's impact on Earth's climate and biology.Carbon dioxide (CO 2 ) is one of the key inputs in photosynthesis, the process in which plants, algea, and cyanobacteria convert sunlight into chemical energy.It is also one of the by-products of respiration, the reverse process.In the atmosphere, CO 2 acts as a greenhouse gas, and its increase since the late 19th century is believed to be playing an important role in global warming [1].In water, CO 2 dissolves to form carbonic acid, which contributes to ocean acidification and poses a threat to food chains connected to the oceans.
A quantity often estimated in studying CO 2 in the carbon cycle is the column-averaged dry-air mole fraction, or more succinctly total-column CO 2 .Understanding its spatial distribution and temporal evolutions are important for understanding the carbon cycle and its implication as a determinant of climate.There are two remote sensing instruments-the Orbiting Carbon Observatory-2 (OCO-2) and the Greenhouse gases Observing SATellite (GOSAT)-that produce total-column CO 2 globally.Both instruments consist of spectrometers that measure that number of photons reaching the top of the atmosphere in several spectral bands.Since the amount of photons within a spectrum is a function of the CO 2 concentration within the atmosphere , the amount of CO 2 in the atmosphere can be inferred from photon counts across multiple spectra through a process called inverse modeling or optimal estimation [2].NASA's optimal-estimation algorithms for GOSAT and OCO-2 produce, at each observation location, CO 2 concentrations at 20 different atmospheric pressure levels; these 20 levels are then combined in a weighted average to form the column-averaged CO 2 estimate [3].It is this column-averaged CO 2 that is used in subsequent analyses such as Level 3 map generation, colocation, validation, flux inversion, and statistical data fusion.
In this paper, our new approach is to reverse the order and first perform data fusion on the CO 2 profiles before combining them to form an estimate of the column-averaged CO 2 .This accounts for the statistical dependence among the profile heights and should produce more efficient statistical inferences.This idea of leveraging statistical dependence to improve inferences has already been demonstrated in univariate and bivariate remote sensing applications.For instance, Nguyen, H. et al. [4] show that it is possible to obtain better inferences on geophysical processes by combining complementary and reinforcing data from multiple satellite instruments.This idea is used in [5] and subsequently in a non-spatial approach by [6] to obtain an estimate of the lower-atmosphere.In principle, a multivariate approach can improve upon the inferences for column-averaged CO 2 by working more closely with the retrieval processes for these two instruments.
In what follows, we leverage the computational efficiency provided by the Spatial Random Effects model [4,7] to solve a data-fusion problem where the data sources are massive and multivariate.In Section 2, we review the data sources used in this article.Section 3 gives the statistical methodology underlying Multivariate Spatial Data Fusion (MSDF).In Section 4, we apply this methodology to the GOSAT and OCO-2 Level 2 data from calendar-year 2015 and compare the predictions to independent data from the ground-based Total Carbon Column Observing Network (TCCON).Section 5 contains discussion and conclusions, including the extension of MSDF to spatio-temporal settings.

Remote Sensing Data Sources for Atmospheric CO 2
The Greenhouse gases Observing Satellite (GOSAT) was launched by Japan on 23 January 2009 as a joint venture by Japan's National Institute for Environmental Studies (NIES), the Japanese Space Agency (JAXA), and the Ministry of the Environment (MOE).It is a polar-orbiting satellite dedicated to the observation of column-averaged CO 2 and methane (CH 4 ), both major greenhouse gases, from space using spectra of reflected sunlight [8].GOSAT flies at approximately 665 kilometers (km) altitude, and it completes an orbit every 100 min.The satellite returns to the same observation location every three days [9].There are several 'retrievals' of the GOSAT raw-radiance data.In this paper, we make use of the NASA version, which was produced by the Atmospheric CO 2 Observations from Space (ACOS) team at the Jet Propulsion Laboratory.Hereafter, we will refer to this GOSAT dataset as ACOS data.
The Orbiting Carbon Observatory-2 (OCO-2) is NASA's first Earth remote sensing instrument dedicated to studying carbon dioxide's global distribution in the atmosphere.It was launched on 2 July 2014, and it uses three high-resolution grating spectrometers to acquire observations of the atmosphere.OCO-2 has a repeat cycle of sixteen days and a sampling rate of potentially one million observations per day, making it a high-density and high-resolution complement to GOSAT.The NASA retrieval algorithms for both instruments infer CO 2 concentrations in an atmospheric column from the observed spectra through optimal estimation [3].The outputs are available as 20-dimensional CO 2 profiles and column-averaged CO 2 concentrations.The latter is derived from the former using a pressure weighting function, which is a 20-dimensional vector of weights derived from local atmospheric conditions.A pressure weighting function is convolved with the 20-dimensional CO 2 vector in a linear combination to form the column-averaged estimate [10].
However, in principle, it is possible to fuse the 20-dimensional CO 2 data vectors from GOSAT and OCO-2 directly to obtain an optimal estimate of this vector profile.Then the column-averaged CO 2 value can be obtained by applying the pressure weighting function to the fused vector of CO 2 values.This multivariate-data-fusion approach has a distinct advantage when there is dependence down the profile, something that is expected for physical reasons due to atmospheric transport.
Attempts to apply spatial inferences on these datasets would have to deal with both the change-of-support issue and the massive data sizes in addition to the vector-valued nature of the observations.The first issue concerns the problem of inferring a spatial process at one resolution using data that are obtained at another resolution.Geophysical processes of interest are often assumed to be continuous and smooth, but most remote sensing satellites observe and record the relevant processes as pixel values, where each pixel corresponds to some area in the domain.These pixels are also called footprints, and in what follows the value observed over a footprint is assumed to be a spatial average of the true process over the area of the footprint plus a measurement error.When estimating the underlying processes, we need to properly account for the differences in the footprints' sizes, shapes, and orientations.Inferences that do not deal with this change-of-support issue appropriately are susceptible to a so-called "ecological fallacy," namely erroneous conclusions can occur when inferences drawn from data aggregated over units are assumed to apply to individual units, (e.g., [11]).

Multivariate Spatial Data Fusion
Spatial interpolation and statistical inference for massive data is an active area of research.Scalable spatial and spatio-temporal approaches in the recent literature include a hierarchical Bayesian spatio-temporal model with multiresolution wavelet basis functions and two data sources of different support [12]; science-based orthogonal eigenfunctions and multiresolution basis functions to capture residual dependencies [13]; modelling nonstationary covariance functions with multiresolutional wavelet models [14]; a hierarchical Bayesian model with FFT representation of Spatial Random Effects [15]; spectral parameterization of a spatial Poisson process [16]; approximate optimal prediction with dimension reduction through conditioning on a small set of space-filling locations [17]; a bivariate dynamic process-convolution model [18]; Fixed Rank Kriging based on the Spatial Random Effects model [7]; modelling with nonstationary covariance models using the discrete Fourier transform [19]; Fixed Rank Filtering and Fixed Ranked Smoothing based on the Kalman filter and the Spatio-Temporal Random Effects model [20]; and linking Gaussian fields and Gaussian Markov random fields using stochastic partial differential equations [21].
To our knowledge, there has been no attempt in the literature to combine multivariate profiles of considerable length (here, 20 atmospheric levels) from massive spatial datasets.The spatio-temporal methods in [5] are, in principle, generalizable to multiple profile levels, but a key parameter, the spatial covariance matrix of the Spatial Random Effects vector, increases quadratically in size with respect to the number of levels, thereby making it quickly infeasible.In this article, we modify the approach in [5] to use a three-dimensional (surface × height) spatial covariance model whose computational complexity does not depend on the number of atmospheric levels.
In this section, we introduce the data model and the process model of a hierarchical statistical model.The motivation and partial derivations for Multivariate Spatial Data Fusion (MSDF) are presented in Section 3.2.Multivariate spatial basis functions are constructed in Section 3.3.The model's parameters are estimated using the EM algorithm, which we describe in Section 3.4.

Data Model and Properties
Let Y(s, h) represent the CO 2 concentration at location s ∈ D and physical or geopotential height h ∈ H, where D and H represent the domain in the horizontal and vertical directions, respectively.Let the horizontal domain of interest be defined as ∪{A l ⊂ S : l = 1, . . ., N D }, which is made up of N D fine-scale, non-overlapping, Basic Areal Units (BAUs) {A l } with locations D ≡ {p l ∈ A l : l = 1, . . ., N D }, and S is the surface of a sphere that approximates Earth.Similarly, the vertical domain of interest is ∪{V m ⊂ R + : m = 1, . . ., N H }, which is a collection of non-overlapping Basic Vertical Units (BVUs) with H = {q m ∈ V m : m = 1, . . ., N H }. The BAUs and BVUs represent the smallest resolution at which we will make predictions with our model.
Suppose we have data at K different heights {h 1 , h 2 , . . ., h K } ⊂ H.For convenience, we will refer to these heights by their index k rather than their height h k (e.g., when referring to a dataset at height h 2 , we will simplify the notation and refer to the dataset with a superscript k = 2).We define the observations from instrument i at height k as an N (i,k) -dimensional vector, where A i,j is the j-th footprint of the i-th instrument, which is made up of an appropriate subset of BAUs {A l }.Here, for ease of exposition, we assume that there are two separate instruments to be fused (i.e., i = 1, 2), although in principle that number could be larger than 2. We assume that the observation at footprint A i,j and height h k is a spatial average of the true unobserved process Y(•, h k ) over the footprint, plus an instrument-and-height-specific Gaussian measurement-error process.That is, where u is a BAU in D, and (i,k) (•) is a measurement-error process that is potentially a function of the location, size, shape, and orientation of the footprint A i,j .Here, we assume that for a given i and j the measurement errors { (i,k) (•) : k = 1, . . ., K} are independent Gaussian random variables with standard deviations {σ (i,k) : k = 1, . . ., K}, where the measurement errors are independent of Y(•).
This measurement-error component in (1) may have a height-dependent non-zero mean that captures the instrument bias.Because many remote sensing instruments have multiplicative bias [5], we assume that the measurement-error process (i,k) (A i,j ) satisfies E( (i,k) (A i,j )) = c (i,k) E(Y(A i,j , h k )), where c (i,k) is a known multiplicative bias constant for the i-th instrument at the k-th height, and the case of zero bias is captured by c (i,k) = 0. We further assume that for i The underlying true process at height h is assumed to follow the form, where at height h, µ(•, h) is the large-scale trend process, ν(•, h) is the small-scale spatial-variability process, and ξ(•, h) is the fine-scale spatial-variability process.Equivalently, we can group Y(•, h) in ( 2) over all the BAUs in D into vector form as follows,

Y(h) = µ(h) + ν(h) + ξ(h); h ∈ H, where µ(h), ν(h), and ξ(h) are all N D -dimensional vectors, ν(h) and ξ(h) are mean-zero random vectors, and ν(h) is statistically independent of ξ(h).
We assume that the trend has the form µ(s, h) = t(s, h) α(h), and that the small-scale spatial variability term has the form ν(s, h) = S(s, h) η, which is the Spatial Random Effects (SRE) model [7].Then (2) can be written as, where, at height h, t(s, h) is a p-dimensional vector of known covariates at location s, α(h) is a p-dimensional vector of unknown regression coefficients, S(s, h) is a q-dimensional vector of given spatial basis functions (of location s and height h), and η is a q-dimensional Gaussian random vector with mean 0 and unknown variance-covariance matrix K. Similarly, Equation (3) can be stacked over the N D BAUs into vector form as follows: where T(h) and S(h) are the matrices [t(s Then the process can be further stacked over all BVU heights q m ∈ H, for m ∈ 1, . . ., N H , as follows: . . .
where the process vectors are N D N H -dimensional, the regression-coefficient vector is pN D -dimensional, and importantly the basis-function coefficient vector η remains q-dimensional.Recall that for a generally unknown q × q variance-covariance matrix parameter K, the basis-function coefficient vector is distributed as,

Multivariate Spatial Data Fusion (MSDF)
Having described the process model and its representation in terms of spatial basis functions, we now describe how to optimally combine (i.e., fuse) information when we have K-variate spatial data for two instruments, written here as {Z (k,i) : k = 1, . . ., K; i = 1, 2}.Notice that we have chosen to keep the discussion quite general in terms of fusing K-variate data from two different instruments; for our application, K = 20, but fusing sub-vectors of, for example, lower-atmosphere values where K < 20 is also possible.
We first concatenate all datasets at the same height and denote that dataset as the N (k) -dimensional vector Z (k) for k ∈ 1, . . ., K. The full data model for all heights h 1 , . . ., h K can be expressed as Z (1)  . . .
or equivalently, where the tilde "∼" on the components of ( 5) indicates that the corresponding term or process is aggregated over BAUs within the observations; see (1).For instance, the aggregated matrix S is made up of terms, where recall that A i,j is the j-th footprint of the i-th instrument.This procedure very effectively accounts for the change-of-support issue between the two instruments, taking advantage of the linearity of the SRE model.For more details, see [4].Given the formulation above, we can carry out spatial prediction of the process Y at all (BAU, BVU) combinations using a linear combination of the data Z ≡ {Z (k) : k = 1, . . ., K}.That is, our estimate of Y(s, h), the true process at location s and height h, is where a implicitly depends on s and h.Note that a is N-dimensional where N = ∑ K k=1 N (k) and 2) .Subject to the uniform unbiasedness constraint, we minimize, with respect to a, where the uniform unbiasedness constraint is 1-dimensional: T is an N × Kp matrix given by T (1) . . .0 . . . . . .
and B is a N × N matrix with the multiplicative bias coefficients down the diagonal.That is, B (1) . . .0 . . . . . .
The minimization of ( 7) with respect to a can be solved using the method of Lagrange multipliers.Let Σ ≡ cov(Z) and c ≡ cov(Z, Y(s, h)).Then, the solution â is Having derived the data-fusion coefficients â, we can produce the fused prediction and its prediction standard error at s ∈ D and h ∈ H, as follows, where â is given by (9).Computation of ( 9) and hence of ( 10) and ( 11) requires inversion of Σ, which is typically enormous.However, because of the SRE parameterization in (3), inversion of Σ can be computed exactly with linear computational complexity using the Sherman-Morrison-Woodbury formula [22].Due to the data model given by ( 5), the covariance matrix of the dataset has the following form: where V ≡ cov( ) and cov(ξ) = CE.The components of the fine-scale covariance term CE are both N × N matrices, and they are given by, E (1) . . .0 . . . . . . where matrix constructed from footprint overlaps.It is defined as follows, Under this parameterization, the inversion can be computed exactly using the Sherman-Morrison-Woodbury formula, where D ≡ CE + V.The procedure requires inversion of the N × N matrix D, which is typically very sparse, and inversion of K and (K −1 + S D −1 S), both of which are q × q.Since q is the number of spatial basis functions and is chosen by the user, it is typically much smaller than N, which results in very substantial speed-ups.The overall computational complexity for MSDF is O(Nq 2 ), where N, the total number of observations, is much larger than the number of basis functions q.Furthermore, the model in (3) does not assume isotropy or stationarity in the data.Together, the scalability and flexibility of MSDF makes it well suited for the typically massive and heterogeneous data found in remote sensing applications.
The covariance parameters K and {(σ . ., K} are typically unknown and have to be estimated from the data.The maximum likelihood estimates of these parameters are analytically intractable, so the preferred method is to obtain them iteratively using the EM algorithm (see [5,23] for more details).We present the details of the EM algorithm for MSDF in Section 3.4.

Constructing the Spatial Basis Function S(s, h)
In the previous section we defined the term S(s, h) as a q-dimensional vector of basis functions at location s and height h.The model in (3) allows scalable computations as a consequence of this formulation, but in practice specifying and estimating a consistent three-dimensional (varying in space and height) covariance function is difficult.This problem is somewhat related to that of estimating the joint covariance between mid-tropospheric and column-averaged CO 2 mole fractions as described in [5].There, they specified a covariance model in which the number of basis function centers q varied linearly with respect to the number of processes P they considered.That is, q = c • P, where c ≈ 300 and P = 2 in that case.However, P = 20 in our CO 2 -profile application, which for c = 300 yields q = 6000, a number that is too large for fast computations.
We take advantage of the fact that CO 2 concentration is a function of height and assume that the vector of spatio-temporal basis functions S(s, h) can be written as,

S(s, h) = τ(s, h) ⊗ B(s),
where B(s) is a b-dimensional multi-resolutional "horizontal" basis expansion over latitudes and longitudes (e.g., bisquare basis functions, wavelets, etc.), τ(s, h) is an r-dimensional "vertical" basis expansion, and ⊗ denotes the Kronecker product.The dimension b typical ranges between 100 and 300, while r is much smaller and ranges between 4 and 10.Note that the horizontal basis functions in B(s) are independent of height, while the vertical basis functions change depending on the location s and the height h.There is a good physical reason for this, as we explain below.
For B(s), we choose to use three resolutions of bisquare basis functions based on the Discrete Global Grid (for more detail, see [24]).The basis functions are chosen to be multi-resolutional, which enables the methodology to better capture spatial dependence at different scales [7].For τ(s, h), we choose to use cubic B-splines with exterior knots placed at the surface and the top of the atmosphere (for more details, see [25]).The interior knots should naturally be placed at the boundary of the atmospheric layers (planetary boundary layers, troposphere, mesospheres, etc.), where a change in the behavior of CO 2 is expected as the height transitions between different atmospheric regimes.
OCO-2 and GOSAT have 20 different levels {h k : k = 1, . . ., 20}, which range from the surface (0 km) to the middle of the stratosphere (approximately 32 km).Consequently, when combining profiles from these instruments, we use two geophysical transition points (planetary boundary layer height and tropopause height) as the interior knots between 0 km and 32 km.
The tropopause is the height at which the atmospheric temperature transitions from decreasing with altitude to increasing with altitude.Below the troposphere is the planetary boundary layer (PBL), which is another important transition point; it is the upper boundary of the the lowest layer of the troposphere where wind is influenced by friction.The meteorological and atmospheric regime changes substantially at these two vertical transitions, hence our decision to place the two interior cubic-spline knots there.In our application, we make the simplifying assumption that the PBL height is constant at 1 km across all latitudes.For the tropopause height we use data from the v5 AIRS Level 3 product.The tropopause height typically depends on latitude, longitude, and time, but we simplified the calculations by averaging the AIRS product across 1 degree latitude bands and using it in the calculations below.
Since we have two exterior knots and two interior knots, the resulting cubic-spline vertical basis functions have dimension five.More specifically, given a four-dimensional knot vector t(s) = (t 1 (s), t 2 (s), t 3 (s), t 4 (s)) , which are the surface, PBL, tropopause, and top-of-atmosphere heights, respectively, the vertical basis functions are τ(s, h) ≡ (B 1,3,s (h), . . ., B 5,3,s (h)), where The horizontal basis functions B(s) have approximately 300 basis functions, so q ≈ 300 × 5 = 1500, and the resulting variance-covariance matrix K of the SRE model is approximately 1500 × 1500.Hence, the dimension of the problem has been reduced even further from around 6000 to 1500, which is now computationally feasible.

EM Algorithm for Parameter Estimation
To apply MSDF to actual data, we need to estimate the parameters θ ≡ {α, K, (σ (k) ξ ) 2 ; k = 1, . . ., K}.We estimate them using the EM algorithm in a like manner to the estimation found in [5].There, we define η and ξ as "missing data".Let θ [b] denote the vector of parameter values at the b-th iteration.Using the current value of the parameter vector, θ = θ [b] , Katzfuss and Cressie [23] give the following conditional expectations and covariance matrices for the missing data: where and Σ [b] ≡ SK [b] S + D [b] .The updating equations for the parameters are: where for j ≥ i is the sub-block of the square matrix A consisting of all elements of A whose row and column indices both belong to the set given by the sequence of successive integers {i, . . ., j}; and k = 1, . . ., K.

Application to CO 2 Data from ACOS and OCO-2
In this section we present a demonstration of our MSDF methodology for fusing the ACOS (GOSAT) and OCO-2 profile data of CO 2 and compare it to independent validation data.Unfortunately, ground-based CO 2 profile data are mostly limited to aircraft flights or balloon campaigns, (e.g., [26]), which tend to be sparse in both spatial and temporal coverage.Because of this difficulty, we make use of column-averaged CO 2 data from the Total Carbon Column Observing Network (TCCON) instead.These are a set of about 30 globally distributed ground-based stations that record daily observations of "total-column" (i.e., column-averaged) CO 2 .While this dataset does not provide a CO 2 profile, we can convert our profile estimates at these TCCON locations into the corresponding column-averaged CO 2 values for comparison.To assess the impact of accounting for the vertical dependence in the data-fusion algorithm, we also interpolate the already-column-averaged univariate CO 2 data from ACOS and OCO-2 to the TCCON sites using two competing methods: (1) a simple coincidence criterion (SCC); and (2) a form of kriging modified to accommodate massive datasets (Spatial Statistical Data Fusion or SSDF; [4]).In what follows, we compare column-averaged MSDF, SCC, and SSDF estimates relative to the "ground-truth" TCCON values.

Overview of ACOS, OCO-2, and TCCON Data
The Greenhouse gases Observing Satellite (GOSAT) was launched on 23 January 2009 and the OCO-2 satellite was launched on 2 July 2014.As of August 2016, both instruments are still operational.For the period of comparison, we make use of OCO-2 and ACOS (i.e., GOSAT) Level 2 data between 1 January and 31 December 2015.We use the v3.5 release of the ACOS data, which are available from the Goddard Data and Information Services Center.Following recommendations from the ACOS Data User's Guide, we apply the recommended data-screening procedure to eliminate potentially bad retrievals [27].For the OCO-2 data, we use version 7.0, which is also available from the Goddard Data and Information Services Center .The OCO-2 data have a set of data-quality flags called Warn Levels [28], and for this exercise we only use data with Warn Levels less than 15.This particular filter eliminated about 25% of the converged OCO-2 data.For more information on both datasets, see [29,30].
The Total Carbon Column Observing Network (TCCON) consists of ground-based Fourier Transform Spectrometers that record direct solar spectra in the near-infrared.These spectra are then used to retrieve column-averaged abundances of atmospheric constituents including CO 2 , CH 4 , N 2 O, HF, CO, and H 2 O, which are directly comparable with the near-infrared column-averaged measurements from space-based instruments [31].Whereas ACOS and OCO-2 retrievals are susceptible to variability resulting from contamination by optically thick clouds and aerosols that were missed by the cloud-screening process [10], TCCON makes direct observation of the solar disk and hence is less sensitive to errors from scattered light [32].
The TCCON sites sample in a diverse range of atmospheric states, which include tropical and polar, continental and maritime, polluted and clean, providing a valuable validation link between the high-coverage, medium-precision measurements from satellites (e.g., GOSAT and OCO-2) and a network of ground-based high-precision station measurements [31].TCCON's column-averaged CO 2 data are in turn validated against integrated aircraft profiles [ [33][34][35][36] and have a precision and accuracy of ∼0.8 ppm [36].We obtained TCCON data at all sites that were operational in 2015 via the TCCON Data Archive (see [37]).There were 27 of these sites, but only 14 produced data on days when both GOSAT and OCO-2 were operational in 2015.Their locations are presented in Figure 1.Fortunately, these 14 sites are fairly well distributed globally and include both continental and maritime regions as well as spanning a range of diverse geographical and ecological regimes.For our demonstration, we obtained daily global ACOS and OCO-2 Level 2 data (using both the profile and column-averaged CO 2 estimates) and fused them to produce MSDF and SSDF estimates of column-averaged CO 2 at TCCON locations that have at least one ACOS or OCO-2 retrieval within 1000km on the same day.These two estimates are compared to each other and to a third estimate, the simple coincidence criterion (SCC) estimate, at each TCCON site.The SCC estimate is defined as the unweighted average of all ACOS and OCO-2 retrievals within 1000 km of said location, which could be considered as an "ad hoc" fusion.SCC and SSDF both operate on scalar column-averaged CO 2 concentrations from ACOS and OCO-2, while MSDF operates on the 20-dimensional CO 2 profiles for ACOS and OCO-2.The MSDF profile estimate is convolved into a column-averaged CO 2 value using a linear combination pressure weighting function that is a function of the local specific humidity and the local acceleration due to gravity (see Appendix A of [10]).We did not account for temporal dependence in the observed data, treating all the data within the same day as if they were observed at a fixed time point.
These three estimates (SCC, SSDF, and MSDF) were then compared to the daily median CO 2 values from the corresponding TCCON sites.Since both the scalar (SCC, SSDF) and profile (MSDF) fusions are spatial only, our research illustrates most clearly whether the extra effort involved in doing a multivariate data fusion is worth the effort.
Both the GOSAT and OCO-2 instruments have long periods of observation punctuated with brief periods of non-operation for reasons of maintenance, thermal control, or spacecraft maneuvers.For each day in 2015, we performed data fusion and estimated column-averaged CO 2 at the TCCON sites if and only if both instruments produced observations on that day.There were 258 days in 2015 in which both instruments were operationally making observations.Typically, there are about 70,000 observations from OCO-2 per day, while GOSAT has about 2000 observations per day.Figure 2 illustrates the spatial patterns of each of the instruments on a particular day, 3 March 2015.Both maps show thin stripes of data running in a north-south direction, which are indicative of their polar, sun-synchronous orbits.Both instruments notably have sections of their observational swath missing, which is likely due to the retrieval process failing because of contamination from clouds.Note that OCO-2's and GOSAT's observational patterns are both reinforcing and complementary.In regions such as the southern oceans, their overlapping coverage can contribute to lower uncertainties in the estimates.In other places such as the high-latitude Siberian tundra, one instrument (in this case, GOSAT) is able to make observations while the other cannot.However, in the North American continent, the opposite is true.
The root mean squared error (RMSE) of column-averaged CO 2 from OCO-2 is between 1.5-3.5 ppm over land and 1.5-2.5 ppm over ocean [38].This figure includes the bias, which is typically between 1 and 2 ppm, and the variance.In a more recent study, Wunch, D. et al. [39] compared OCO-2 bias-corrected OCO-2 measurements against TCCON and found the RMSE to be typically under 1.5 ppm.For simplicity, we use a conservative constant value of 1.5 ppm for the measurement error's standard deviation.There is no corresponding validation study for the ACOS v3.5 data due to its recent production, so we assume that its measurement-error standard deviation is also 1.5 ppm.
To the best of our knowledge, there has been no attempt in the literature to quantify the biases and measurement-error variance of the individual height-specific CO 2 concentrations for either OCO-2 or ACOS.To obtain rough estimates of the height-specific measurement-error variance, {(σ we first compute the empirical variance of the CO 2 -concentration data at each height.That is, we obtain an instrument-specific vector of empirical variances v (i) ≡ (v (i,1) , . . ., v (i,K) ) , where v (i,k) is the empirical variance of the elements in Z (i,k) .Similarly, we also compute a vector of the average pressure weighting function over the entire dataset, and we call it ĥ.We then average the retrieval covariance matrix across all retrieved values to produce the mean posterior covariance matrix Q(i) .This matrix is then 'normalized' to produce QN as follows, where I Q,i is a diagonal matrix with the same diagonal elements as Q(i) .This produces a correlation matrix Q(i) N with 1's along the diagonal and inter-height correlations in the off-diagonal elements.
We then specify that the measurement error {(σ (i,k) ) 2 } is approximately equal to the diagonal of the matrix , where I v,i is a diagonal matrix with v (i) along the diagonal, and β (i) is a constant for the i-th instrument and is chosen such that β (i) ĥ I 1/2 v,i

Q(i)
N I 1/2 v,i ĥ = (1.5) 2 .For GOSAT and ACOS, we calculated this coefficient to be 0.36 and 0.33, respectively.Erring on the conservative side, we chose 0.40 as the coefficient for both instruments.Essentially, this procedure assigns about 40% of the marginal variability of CO 2 at each height as the corresponding measurement-error variance.Regarding the bias, the OCO-2 instrument team uses a linear bias correction scheme that estimates a bias in column-averaged CO 2 at each location as a function of the footprint index, the difference between the retrieved and the a priori surface pressure; the relative abundance of coarse aerosols; and the variation in the retrieved profile from that assumed in the prior [28].The coefficients of this bias-correction model are estimated by comparison to TCCON data and by using the so-called southern hemisphere approximation, which assumes that the entire region from −25 to −60 latitude has minimal and negligible variation in the signal for column-averaged CO 2 [40].ACOS v3.5 has a similar linear bias correction scheme for column-averaged CO 2 [27].These bias-correction schemes make it straightforward to remove the bias from the GOSAT and OCO-2 column-averaged CO 2 values before fusing them with SSDF.However, at present it is not clear how to translate this linear bias-correction scheme in column-averaged CO 2 into biases at each of the 20-dimensional CO 2 concentrations, as is required for applying MSDF.Since we are principally concerned with understanding the effect of "vertical" CO 2 dependence on inferences (i.e., MSDF versus SSDF), we opted not to apply bias correction on ACOS and OCO-2 data in our comparisons.We see later that we can detect this bias in the predictions, which is consistent for both MSDF and SSDF.
For the horizontal basis functions, we make use of multi-resolutional bisquare basis functions.Specifically, we use three resolutions whose centers are the centers of Level 1, 2, and 3, respectively, of an Icosahedral Snyder Equal Area Aperture 3 (ISEA3) grid, which is generated from the Discrete Global Grid software [24].The three resolutions have 32, 92, and 272 basis centers, respectively, for a total of 396 functions.These basis centers are used for both MSDF and SSDF; and for the former a set of vertical basis functions based on cubic B-splines are also used, as discussed in Section 3.3.At each height, the EM algorithm's initial values for the fine-scale and the small-scale parameters are set at 10% and 90%, respectively, of the difference between the empirical variance minus the measurement-error variance (see Supplementary Materials of [5] for more details).
Having performed the data fusions, we compare SCC, SSDF, and MSDF predictions to the TCCON daily median column-averaged CO 2 values, of which there are 1055 separate values.Table 1 displays the Mean Squared Error (MSE), Mean, and Variance for the prediction errors.Looking at the MSE, we see that SSDF's and MSDF's are substantially smaller than SCC's value of 7.84 ppm 2 , indicating that the principled framework underlying the statistical data fusions does capitalize on spatial dependence to improve the MSE.For a better understanding of the improvement, we also looked into the Mean and Variance in the bottom rows of Table 1.SCC has the worst Mean, also called bias, value of 1.49 ppm, while SSDF and MSDF have about the same average bias.This is not surprising, as we did not correct for the bias before carrying out data fusion.MSDF has a slightly larger bias than SSDF (1.36 ppm versus 1.31 ppm), but this is probably due to noise not signal.
While SSDF and MSDF do not differ very much with regard to the bias, they are different for the prediction-error variance.SSDF has a variance of 5.33 ppm 2 (standard deviation = 2.31 ppm), while MSDF comes with a slightly improved variance of 5.03 ppm 2 (standard deviation = 2.24 ppm), or about a 6% reduction in prediction-error variance.We note that MSDF's MSE value of 6.87 ppm 2 (RMSE = 2.62 ppm) is consistent with OCO-2 validation studies showing an RMSE of 1.5-3.5 ppm over land and 1.5-2.5 ppm over ocean [38].Based on variance and MSE, the general ranking of the methods from best to worst is MSDF, followed by SSDF, followed by SCC.
The metrics in Table 1 are averaged across the 14 TCCON sites, and in Figure 3 we explore the improvement in detail through boxplots of the prediction errors at the individual sites.From this figure, the distribution of the errors within any particular TCCON site tends to have the same mean for both methods, which is consistent with the results of Table 1.One interesting feature from Figure 3 is that MSDF has fewer outliers overall than SSDF, indicated on the boxplots by dots extending far past the "whiskers."This is especially noticable over Ascension Island, Lamont, Lauder, and Sodankyla.Overall, Figure 3 shows what statistical theory suggests.Data fusion of column-averaged CO 2 based on MSDF does not improve upon the bias of an estimate, but it does reduce its variability.

Discussion
The results in Section 4 show that SSDF is able to improve upon SCC by accounting for horizontal spatial dependence.This is because SCC (defined as an average of all data points falling within a certain radius) is in essence a linear estimator of the form a scc Z XCO2 , where a scc is a vector of averaging weights and Z XCO2 is a vector of all the observed total-column CO 2 .SSDF is also a linear estimator of the form a ssd f Z XCO2 , but here the weights are computed such that the expected squared error relative to the truth is minimized (for more detail, see [4]).Since SCC is a linear estimator, its prediction error should always be larger than or equal to that of the best linear unbiased estimator SSDF.Similarly, we can view SSDF as a restricted linear estimator on the much larger space of the profile data.To see this, let x(s) be the 20-dimensional CO 2 profile at location s, and let Z * = (x(s 1 ) , x(s 2 ) , . . ., x(s N ) ) be the (20N)-dimensional vector of stacked profiles.Without loss of generality, assume that we are considering a dataset from one instrument with no bias.The total column CO 2 value, χ(s), is given by where h(s) = (h 1 (s), . . ., h 20 (s)) is the pressure weighting vector of coefficients of the profile values, x(s).
In both SSDF and MSDF, we try to compute the vector of kriging coefficients that minimizes the expected squared error at a prediction location s 0 .For MSDF, this involves computing the estimates with the unbiasedness constraint 1 a k = 1, where a k = (a k,1 , a k,2 , . . ., a k,20N ) is (20N)-dimensional.
For SSDF, this involves minimizing the prediction error for the following estimate: x(s 1 ) x(s 2 ) . . .
Comparing ( 22) and ( 23), we see that MSDF has a prediction advantage because it incorporates the true pressure weighting function at the prediction location s 0 in its computation, taking advantage of the definition in (21).SSDF, on the other hand, has to 'interpolate' the effect of that pressure weighting function from nearby observations.Even if we remove the advantage from the knowledge of h(s 0 ), MSDF still has better theoretical performance.We can see this by setting the pressure weighting function to be constant.That is, we assume that h(s) = h = (h 1 , . . ., h 20 ) .Plugging h into ( 22) and ( 23) and multiplying the first two terms on the right hand side results in the following simplifications: where a m and a s are both (20N)-dimensional with the following forms: . . .
where λ(l) = 1 + ((l − 1) mod 20) and γ(l) = l/20 , where x the smallest integer greater than or equal to x. Examining (24), we see that both SSDF and MSDF can be seen as linear combinations of the elements in the (20N)-dimensional vector Z * .However, SSDF is minimizing the error over the space spanned by the N-dimensional a * , which does not account for interaction effects among the 20 profile levels, while MSDF is minimizing the expected error over a much larger space spanned by {a i : i = 1, . . ., 20}.On average, MSDF should return smaller error than SSDF.Another way to interpret this is that SSDF is accounting for horizontal spatial dependence, and MSDF improves upon SSDF through the further incorporation of vertical dependence.
Yet another way to understand the improvement in prediction-error variance is by comparing the scalar estimates to the MSDF prediction, with regard to the latter's ability to dampen more efficiently the large variability of the retrieved CO 2 at specific heights.Retrieving CO 2 concentrations at 20 different heights from photon counts is a difficult process and, in particular, the OCO-2 retrieval algorithm that yields the profile vector of CO 2 concentrations has large variability in the estimates of individual components.While the empirical measurement-error standard deviation of the column-averaged CO 2 , compared to validation data, is about 2-3 ppm, the standard deviations of the individual components, particularly those at lower heights, can be much larger.We demonstrated this by collecting all retrieved profiles from OCO-2 within 100 km of the Darwin, Australia TCCON site and compared them to the fused estimates from MSDF at the same location.In Figure 4, we give the boxplots of CO 2 concentration as a function of height for both the OCO-2 values and the MSDF predicted values.Recall that the total-column CO 2 values are obtained by applying a pressure weighting function to the 20-dimensional profiles.The OCO-2 retrieved profiles and the MSDF profiles have the same mean within each pressure level, so we can expect that the total-column CO 2 produced from these two datasets would have the same means.However, statistical theory suggests that MSDF estimates will have much smaller variability compared to the OCO-2 profile data, and hence, in principle its total-column CO 2 should have smaller variance.This is supported by the results in Table 1, indicating that our methodology is able to take advantage of the vertical dependence in addition to the horizontal dependence to dampen the inherently large "measurement error" in the retrieved profile vectors.
Another practical implication of Figure 4 is that the smoothed estimates, with their much-reduced variability, are more conducive to studies of CO 2 trends and evolution at specific heights.In this article we have motivated MSDF in terms of deriving a better total column CO 2 , and we have demonstrated this by optimally estimating the entire profile at once and then applying a linear combination to the profile to retrieve the total column CO 2 .In principle, however, the CO 2 concentrations from MSDF are potentially useful in and of themselves.Typically, CO 2 concentrations at individual heights are overlooked in carbon-cycle analysis due to their large variability and the low degrees of freedom in the vertical direction.Hence, scientific analyses predominantly focus on total column CO 2 , even though in many applications it is the height-specific behaviors that are of primary interest.For instance, the exchange of CO 2 between the atmosphere and Earth's surface is a critical part of the global carbon cycle and an important determinant of future climate [41].Of particular interest to climate scientists is the global distribution of CO 2 flux, or the net amount of CO 2 exchanged between the atmosphere and the terrestrial biomes (plants, oceans, etc.) per unit of time.This could be approximated by selecting the CO 2 concentrations corresponding to the surface from the MSDF CO 2 profiles, and then differencing with respect to time.Statistical theory suggests that this idea should work, but its feasibility and robustness will need to be explored in further research.

Conclusions
This article examines a methodology for estimating column-averaged CO 2 from two remote sensing instruments-GOSAT and OCO-2.Both instruments produce multivariate profile CO 2 values and a column-averaged CO 2 value that is a linear combination of the profile with coefficients given by a deterministic pressure weighting function.Derivative uses of the CO 2 data such as Level 3 map generation or colocation for instrument validation tend to make use of the column-averaged CO 2 field.We show here that there is information in the profile that is potentially lost when the individual profiles are first convolved with the pressure weighting function.In the Application section, we show that performing multivariate data fusion on the profile first, and then convolving the fused profile prediction into the univariate column-averaged CO 2 value afterwards, has smaller mean squared error.We note that this result is in line with the conventional wisdom that OCO-2 has between 1 and 2 degrees of freedom along the vertical column.The process of convolving the column vector into a total-column scalar value is a lossy process, and the improvement we observed in the Application section is due to our multivariate methodology being better able to capitalize on the extra degrees of freedom along the column in the spatial interpolation process.
The datasets in this application are fairly large, totaling about 70,000 observations per day.This is not a problem for MSDF, which has computational complexity that is linear with respect to the data size.Furthermore, MSDF does not assume isotropy or stationarity in the data, which makes it appropriate for a wide range of applications.Note that having large amounts of data yields robust estimates of the spatial-dependence parameters through the EM algorithm.Hence, the methodology is well suited for data-rich applications, where the dependence on massive amounts of data is a strength rather than a weakness.This is particularly relevant to remote sensing, where a typical instrument returns thousands to hundreds of thousand of observations per day, and future instruments will be able to return millions of observations per day.
One direction for further research is incorporating temporal dependence into the model.We could allow the parameter η to evolve in discrete time steps according to a first-order autoregressive process as demonstrated in [5,20].In principle, that approach can be applied to the MSDF algorithm, but we anticipate that the main challenge would be the need to estimate the significantly larger propagation and innovation matrices.Further research is needed to balance the computational needs of a spatio-temporal MSDF against robustness of the data fusion.

Figure 1 .
Figure 1.Locations of the 14 TCCON sites that were used to compare to column-averaged, data-fused CO 2 values.

Figure 4 .
Figure 4. Boxplots of CO 2 concentrations for MSDF predictions and colocated OCO-2 retrieved profiles at Darwin.Heights on the horizontal axis are the index k from height h k ; k = 1, . . ., 20.