1. Introduction
Carbon dioxide accounts for approximately 0.04% of the total atmospheric composition [1] and, as a greenhouse gas, ranks second only to water vapor. Its concentration is now 50% higher than in the pre-industrial era, trapping heat in the atmosphere. Because carbon dioxide has a long atmospheric lifetime, temperatures will continue to increase in the coming years [2]. Since the reform and opening-up, China’s economic development has accelerated; the country is now the world’s second-largest economy and a leader in green economy technology, with expanding global influence. In 2020, based on the inherent requirements of promoting sustainable development and the responsibility of building a community with a shared future for mankind, China announced the goals of carbon peaking and carbon neutrality [3]. To achieve these “double carbon” goals, carbon sources and sinks must be accurately monitored and evaluated.
Currently, carbon dioxide data are obtained from ground-based observations, space-based observations, and model simulations. Because ground-based sites are scarce globally, it is difficult to describe the spatial distribution of regional carbon dioxide concentrations from them, and the reliability of model-simulated concentrations is uncertain. Satellite observations have become one of the most effective means of monitoring global greenhouse gas emissions with high spatial and temporal resolution [4], and have improved human insight into the global distribution of carbon dioxide [5]. However, the spatial resolution of satellite products is relatively low: 30 km, 10 km, 7 km, 2.5 km, 1.5 km, and 2.5 km for SCIAMACHY (Scanning Imaging Absorption Spectrometer for Atmospheric CHartographY, European Space Agency (ESA), Paris, France), GOSAT (Greenhouse Gases Observing Satellite), GOSAT-2, GOSAT-GW (Greenhouse Gases and Water Cycle Observation satellite), OCO-2, and TanSat (Global Carbon Dioxide Monitoring Science Experiment Satellite), respectively [6]. The number of valid pixels is also low. In our study area, the monthly average proportion of valid OCO-2 pixels in 2018 was only 0.49%. In addition, the different satellite CO2 products lack physical consistency owing to differences in their inversion algorithms. For example, SCIAMACHY uses the Differential Optical Absorption Spectroscopy (DOAS) algorithm, the Weighting Function Modified DOAS (WFM-DOAS) algorithm, and the Band-Enhanced Sensitivity Differential (BESD) algorithm; TanSat uses the ACOS (Atmospheric CO2 Observations from Space) XCO2 inversion algorithm; and OCO-2 uses a combination of high-resolution spectroscopic measurements and advanced inversion techniques.
Many studies have reconstructed and simulated regional or global atmospheric CO2 concentrations by modeling the relationship between satellite-observed atmospheric CO2 concentrations and environmental variables. To solve the problem of missing values in single-source satellite CO2 products, the main current methods are kriging spatiotemporal interpolation [7,8,9], regression [10], artificial neural networks [11], and high-precision surface modeling [12]. Both kriging spatiotemporal interpolation and high-precision surface modeling are sensitive to the number and distribution of valid observations: the more valid observations there are and the more evenly they are distributed, the better the simulation. In areas with sparse valid observations, the interpolation uncertainty is large. The interpolated data have strong continuity, but they are smooth and lack local detail. Current methods for fusing multi-source CO2 data are relatively simple, for example, multi-source data averaging [13,14]. When multi-source data are averaged at a given spatial resolution, the global spatial coverage of the fused dataset can be improved by 20% and the temporal resolution by two to three times. For example, several months of GOSAT and SCIAMACHY data were integrated, but missing pixels and spatial discontinuities remained [13,14]. In the future, more ground-based observations are needed to integrate multi-source satellite-derived data, especially data from China’s Carbon Satellite (TanSat) [15], and to study fusion methods and theories for multi-source products.
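The multi-source averaging idea mentioned above can be illustrated with a minimal Python sketch. It is not the procedure of the cited studies [13,14]: the function names grid_mean and fuse_by_averaging, the grid edges, and the toy values are hypothetical and serve only to show how soundings from two sensors might be binned onto a common grid and averaged cell by cell, with empty cells left as missing.

```python
import numpy as np


def grid_mean(lat, lon, value, lat_edges, lon_edges):
    """Average point soundings inside each grid cell; empty cells become NaN."""
    total, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges], weights=value)
    count, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)


def fuse_by_averaging(grids):
    """Cell-wise mean over several gridded products, ignoring missing cells."""
    stack = np.stack(grids)
    valid = np.isfinite(stack)
    count = valid.sum(axis=0)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)


# Toy example (all values are made up): two sensors on a 2 x 2 grid.
edges = [0.0, 1.0, 2.0]
sensor_a = grid_mean(
    np.array([0.5, 0.5, 1.5]),        # latitudes
    np.array([0.5, 0.5, 0.5]),        # longitudes
    np.array([400.0, 402.0, 410.0]),  # concentrations in ppm
    edges,
    edges,
)
sensor_b = np.array([[403.0, 408.0], [np.nan, np.nan]])
fused = fuse_by_averaging([sensor_a, sensor_b])
print(fused)
```

A cell that no sensor observed stays missing after fusion, which is consistent with the remaining gaps and spatial discontinuities noted for the averaged GOSAT and SCIAMACHY data.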
The spatial distribution of CO2 was found to be related to the vegetation index, net primary productivity, leaf area index, atmospheric temperature, and wind [5,10,11,16]. Research results have shown that land surface parameters can be used to simulate atmospheric CO2 concentrations [5]. Environmental variables, especially spatial pollution data, are closely related to human health. Linking the two through advanced modeling reinforces the motivation for improved CO2 reconstruction [17].
In the past 100 years, changes in vegetation and CO2 concentration have led to a trend of aridity in some areas of East Asia, especially in the Huai River Basin, Shandong Peninsula, and Yunnan. To resolve the problem of missing values in single-source satellite products, this study used a multiple environmental variable regression analysis method to reconstruct spatially continuous atmospheric CO2 concentration data for the provinces of the Huai River Basin, China, in 2018. First, several environmental variables, such as vegetation coverage, relative humidity, evapotranspiration, temperature, and wind, were selected. The correlation between the OCO-2 CO2 concentration and each environmental variable was then evaluated. Variables with high correlation were retained, and collinearity analysis was conducted to determine the regression variables. Finally, the OCO-2 CO2 concentration and the environmental variables were resampled to a fine resolution, a regression model was constructed, and the spatially complete CO2 concentration in the study area was obtained.
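The workflow outlined above (correlation screening, collinearity analysis, regression, prediction over all pixels) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this study: the data are synthetic, the correlation threshold r_min = 0.3 and the variance inflation factor (VIF) limit vif_max = 10 are assumed values, VIF is assumed as the collinearity diagnostic, ordinary least squares is assumed as the regression form, and all function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_vars)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((X[:, j] - X[:, j].mean()) ** 2).sum())
        out[j] = 1.0 / max(1.0 - r2, 1e-12)
    return out


def select_predictors(X, y, names, r_min=0.3, vif_max=10.0):
    """Keep variables with |r| >= r_min, then drop the highest-VIF one until all pass."""
    keep = [i for i in range(X.shape[1]) if abs(pearson_r(X[:, i], y)) >= r_min]
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= vif_max:
            break
        keep.pop(worst)
    return [names[i] for i in keep], keep


def fit_ols(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


# Synthetic stand-in data (assumed, not the study's data).
rng = np.random.default_rng(0)
n_pixels = 20000
lai = rng.normal(size=n_pixels)
fvc = lai + 0.05 * rng.normal(size=n_pixels)  # nearly collinear with lai
t = rng.normal(size=n_pixels)
v = rng.normal(size=n_pixels)                 # unrelated to the target
X_all = np.column_stack([lai, fvc, t, v])
names = ["LAI", "FVC", "T", "V"]
truth = 410.0 - 2.0 * lai + 1.5 * t + 0.2 * rng.normal(size=n_pixels)

# Only 1% of pixels carry a satellite value; fit there, predict everywhere.
valid = rng.choice(n_pixels, size=200, replace=False)
selected, cols = select_predictors(X_all[valid], truth[valid], names)
coef = fit_ols(X_all[valid][:, cols], truth[valid])
xco2_full = predict(coef, X_all[:, cols])
rmse = float(np.sqrt(np.mean((xco2_full - truth) ** 2)))
print(selected, round(rmse, 3))
```

In this sketch the model is fitted only on pixels that have a satellite value and then applied to every pixel where the environmental variables exist, which is what turns a sparse product into a spatially complete field.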
This study aimed to construct a regression model based on multiple environmental variables and to obtain a spatially continuous monthly carbon dioxide concentration dataset for the study area in 2018. The remainder of this paper is organized as follows. Section 2 provides the details of the datasets used in this study and the dataset preprocessing procedures, and describes the regression model based on multiple environmental variables that was developed in this study. Section 3 demonstrates the implementation of the developed methods for the OCO-2 products. The final section summarizes the conclusions of our study.
4. Conclusions
In this study, we constructed a multiple regression model based on the OCO-2 CO2 concentration products and multiple environmental variables, namely LAI, FVC, NPP, T, U, ET, and RH, to reconstruct spatially complete monthly CO2 concentration data at fine resolution for the provinces of the Huai River Basin in 2018. Among the environmental variables, LAI, FVC, T, U, ET, NPP, and RH were strongly correlated with the CO2 concentration, whereas V was weakly correlated. The enhanced data better reflect the spatiotemporal patterns of CO2 in the study region. The proportion of valid pixels increased from less than 1% to over 90%, giving near-complete coverage of the study region, so the reconstructed CO2 concentration data of the Huai River Basin have better spatial completeness. In addition to this advantage in spatial coverage, the comparison of local variances shows that the local variance of the reconstructed dataset is significantly higher than that of the original data, indicating that the reconstructed dataset contains more local detail.
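The two evaluation metrics mentioned above, the proportion of valid pixels and the local variance, can be sketched in Python as follows. This is a minimal sketch under assumptions: the 3 × 3 moving window, the use of population variance, the treatment of missing pixels as NaN, and the function names are illustrative choices, not the definitions used in this study.

```python
import numpy as np


def valid_fraction(grid):
    """Share of pixels holding a finite value (missing pixels are NaN)."""
    return float(np.isfinite(grid).mean())


def local_variance(grid, size=3):
    """Population variance in a size x size moving window (size must be odd).

    NaN pixels are ignored; the result is NaN where a window holds fewer
    than two valid values.
    """
    pad = size // 2
    padded = np.pad(grid.astype(float), pad, mode="constant", constant_values=np.nan)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    flat = windows.reshape(grid.shape[0], grid.shape[1], size * size)
    valid = np.isfinite(flat)
    count = valid.sum(axis=-1)
    mean = np.where(valid, flat, 0.0).sum(axis=-1) / np.maximum(count, 1)
    sq_dev = np.where(valid, (flat - mean[..., None]) ** 2, 0.0).sum(axis=-1)
    return np.where(count >= 2, sq_dev / np.maximum(count, 1), np.nan)


# Toy grids (values are made up): a sparse product and a gap-free one.
original = np.full((4, 4), np.nan)
original[1, 2] = 409.5
reconstructed = 409.0 + 0.1 * np.arange(16, dtype=float).reshape(4, 4)

print(valid_fraction(original), valid_fraction(reconstructed))
print(np.nanmean(local_variance(reconstructed)))
```

With a product as sparse as the toy original grid, most windows hold fewer than two valid values and their local variance is undefined, so comparisons of local variance are only meaningful where both datasets have enough valid neighbors.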
The reconstructed atmospheric CO2 concentration showed an increasing trend from summer (June, July, and August) to winter (November, December, and January). The spatial integrity, accuracy, and spatial structure of the reconstructed dataset were evaluated based on the proportion of valid pixels, local variance, and reference data. The results show the value of extending discrete satellite observations of atmospheric CO2 concentrations to spatiotemporally continuous datasets. The regional atmospheric CO2 concentration map produced in this study can serve as a baseline map for studying regional climate change and the carbon cycle in terrestrial ecosystems. Future work will focus on using machine learning, artificial intelligence, and multi-source remote sensing data to estimate a spatiotemporally complete CO2 concentration dataset with high spatiotemporal resolution.