An Assessment of Anthropogenic CO2 Emissions by Satellite-Based Observations in China

Carbon dioxide (CO2) is the most important anthropogenic greenhouse gas, and its concentration in the atmosphere has been increasing rapidly because of growing anthropogenic CO2 emissions. Quantifying anthropogenic CO2 emissions is essential for evaluating measures to mitigate climate change. Satellite-based measurements of greenhouse gases have greatly advanced the monitoring of atmospheric CO2 concentration. In this study, we propose an approach for estimating anthropogenic CO2 emissions in China with an artificial neural network, using the column-averaged dry air mole fraction of CO2 (XCO2) derived from observations of the Greenhouse gases Observing SATellite (GOSAT). First, we use annual XCO2 anomalies (dXCO2) derived from XCO2, together with anthropogenic emission data for 2010–2014, as the training dataset to build a General Regression Neural Network (GRNN) model. Second, we apply the model to the annual dXCO2 for 2015 to estimate the corresponding emissions and validate the estimates against ODIAC emissions. As a result, the estimated emissions show a significant positive correlation with ODIAC CO2 emissions, especially in areas with high anthropogenic CO2 emissions. Our results indicate that satellite-observed XCO2 data can be used with machine learning to estimate anthropogenic CO2 emissions at the regional scale. The developed method can estimate a carbon emission inventory in a data-driven way. In particular, the estimation accuracy is expected to improve further when the method is combined with other satellite data sources related to CO2 uptake and emissions.


Introduction
Atmospheric carbon dioxide (CO2) is the most significant anthropogenic greenhouse gas (GHG), and its global atmospheric concentration has increased from 280 ppm in the preindustrial era to more than 400 ppm at present [1]. The increase in atmospheric CO2 is known to be one of the factors inducing global warming and plays an important role in climate change. Anthropogenic CO2 emissions, 70% of which come from fossil fuel combustion and industrial activities [2], are the main driver of the increase in atmospheric CO2 concentration. If global warming continues to increase at the current rate, it is likely to reach 1.5 °C between 2030 and 2052, causing more climate extremes [3]. Moreover, atmospheric CO2 concentration will continue to increase as rapid industrialization around the world demands enormous amounts of energy. To slow this increase, many countries are making efforts to reduce CO2 emissions, which requires an efficient and reliable way to monitor CO2 emissions and to evaluate the effectiveness of emission reduction policies.
Over the past 20 years, advances in satellite observing technology, in particular the development of highly accurate sensors, have made satellite-based measurements of greenhouse gases an effective way of monitoring atmospheric constituents. Such measurements are also becoming a major data source for detecting changes in atmospheric CO2 concentration at regional and global scales [4][5][6][7]. Compared with ground-based observations, satellite-retrieved CO2 concentrations have global coverage and consistent observation characteristics, and can therefore better reveal the spatio-temporal variation of atmospheric CO2 concentration. The GHG observing satellites currently in orbit include the Greenhouse gases Observing SATellite (GOSAT) from Japan, the Orbiting Carbon Observatory 2 (OCO-2) from the USA and TanSat from China, which have provided column-averaged dry air mole fraction of CO2 (XCO2) data since 2009 [8][9][10].
Many previous studies have indicated that XCO2 retrieved from satellite observations can detect changes in CO2 concentration induced by anthropogenic emissions [11][12][13]. Anthropogenic emissions are expected to increase XCO2 by about 4 ppm around power plants [11]. With the multi-year XCO2 datasets available from GOSAT and OCO-2, anthropogenic CO2 enhancements have been quantified by removing the background concentration. Megacities such as Los Angeles and Beijing, and high-density urban regions such as the eastern USA and the Beijing-Tianjin-Hebei area in northern China, have been reported to show enhancements of about 2 ppm [14][15][16]. These studies mainly obtained regional CO2 enhancements relative to the background using empirical conversion factors. By correlating OCO-2 observations with emission inventories, XCO2 has been shown to be positively correlated with anthropogenic CO2 emissions [17]. This correlation implies that satellite-based observations can be used to quantitatively assess anthropogenic CO2 emissions by detecting XCO2 enhancements. Estimating anthropogenic emissions from satellite-based observations is a data-driven method that differs from the conventional method of compiling emission inventories, and it can support investigations of carbon emissions. Satellite observations can detect CO2 changes over strong anthropogenic sources such as megacities and high-density urban areas, and can therefore monitor CO2 emissions effectively. However, these studies mostly focus on CO2 enhancements induced by anthropogenic emissions through regional contrast, and quantitatively estimating the magnitude of anthropogenic CO2 emissions from XCO2 data remains a challenge.
This data-driven approach, as an additional way of quantifying anthropogenic CO2 emissions, can help policymakers obtain more information for evaluating the effects of CO2 emission reductions at both regional and global scales.
In this paper, we propose a method that uses satellite-based observations to assess anthropogenic CO2 emissions, with the aim of assisting the routine national investigation of carbon emissions. We focus on mainland China as the study area because it is a major national emitter of CO2 [18]. We extract XCO2 anomalies (dXCO2) from the XCO2 dataset obtained from GOSAT observations. The anomalies are found to be significantly correlated with anthropogenic CO2 emissions from emitting sources such as power plants. We further introduce an artificial neural network (ANN) algorithm to construct an estimation model for anthropogenic CO2 emissions based on the changes in atmospheric CO2 concentration derived from satellite observations.

Data and

[19]. To ensure high reliability of the data, only data over land acquired in high-gain mode are used, after screening and correction of systematic bias as described in the ACOS Level 2 Standard Product and Lite Data Product Data User's Guide, v7.3 [20]. ACOS XCO2 retrievals have an error standard deviation of 1.48 ppm when compared with ground-based measurements from the Total Carbon Column Observing Network (TCCON) [21]. Figure 1 shows the total counts of XCO2 data points over the 6 years from January 2010 to December 2015 and their temporal variation in the study area. Both an annual increase in XCO2 and its seasonal variation can be clearly seen.
However, XCO2 data are irregularly distributed and have many gaps in space and time (Figure 1a) because of the limitations of the GOSAT observation mode, clouds and data screening. To investigate the spatio-temporal changes in XCO2, we generate a mapped XCO2 dataset in which these gaps are filled by kriging interpolation based on a spatio-temporal geostatistical model [22][23][24]. The mapped XCO2 dataset mainly covers the Chinese mainland, from 18° N to 57° N and from 65° E to 148° E, with 0.5° × 0.5° grid cells and a 3-day time interval from 1 January 2010 to 31 December 2015. To match the 1° × 1° ODIAC emission dataset, we resampled all data used in this paper to a spatial resolution of 1° × 1°. This mapped dataset is hereafter referred to as Mapping-XCO2.
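The gap filling and resampling steps above can be illustrated with a minimal NumPy sketch. It is not the spatio-temporal geostatistical model of [22][23][24]: it assumes an ordinary kriging predictor with an exponential covariance and a simple space-time metric distance, in which `sill`, `corr_len`, `time_scale` and `nugget` are hypothetical parameters. It also assumes that the 0.5° to 1° resampling is a 2 × 2 block mean, which the text does not specify.

```python
import warnings

import numpy as np


def exp_covariance(h, sill=1.0, corr_len=1.0):
    """Exponential covariance model C(h) = sill * exp(-h / corr_len)."""
    return sill * np.exp(-h / corr_len)


def spacetime_distance(a, b, time_scale):
    """Distances between rows (lat, lon, t) of a and b.

    Time is scaled by the hypothetical time_scale factor so that one
    covariance model can be applied to space and time together.
    """
    d = a[:, None, :] - b[None, :, :]
    d[..., 2] *= time_scale
    return np.sqrt((d ** 2).sum(axis=-1))


def ordinary_kriging(obs_xyt, obs_val, target_xyt, sill=1.0, corr_len=1.0,
                     time_scale=0.1, nugget=0.0):
    """Ordinary kriging estimate at each target point (illustrative only)."""
    obs_xyt = np.asarray(obs_xyt, dtype=float)
    target_xyt = np.asarray(target_xyt, dtype=float)
    n = len(obs_val)
    # Kriging matrix: observation covariances bordered by the
    # unbiasedness constraint (weights must sum to 1).
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exp_covariance(
        spacetime_distance(obs_xyt, obs_xyt, time_scale), sill, corr_len)
    lhs[:n, :n] += nugget * np.eye(n)
    lhs[n, n] = 0.0
    # Right-hand side: covariances between observations and targets.
    rhs = np.ones((n + 1, len(target_xyt)))
    rhs[:n, :] = exp_covariance(
        spacetime_distance(obs_xyt, target_xyt, time_scale), sill, corr_len)
    weights = np.linalg.solve(lhs, rhs)[:n, :]  # drop the Lagrange multiplier
    return weights.T @ np.asarray(obs_val, dtype=float)


def block_mean_2x2(grid):
    """Average 2 x 2 blocks of 0.5-degree cells into 1-degree cells, ignoring NaNs."""
    ny, nx = grid.shape
    trimmed = grid[: ny - ny % 2, : nx - nx % 2]
    blocks = trimmed.reshape(ny // 2, 2, nx // 2, 2)
    with warnings.catch_warnings():
        # Blocks with no data at all stay NaN without emitting a warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(blocks, axis=(1, 3))
```

With a zero nugget the predictor honours the observations exactly, and because the weights sum to one, a spatially constant XCO2 field is reproduced unchanged.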

Anthropogenic Emission Data
We collected two bottom-up datasets of anthropogenic CO2 emissions. One is the Open-source Data Inventory for Anthropogenic Carbon dioxide (ODIAC) for the same years as the XCO2 dataset used in this study. The other is the CARbon Monitoring for Action (CARMA) power plant database for 2009. The specifications of these data are described in Table 1.

Table 1. Specifications of the anthropogenic emission datasets.
Item | ODIAC | CARMA
Producer | Center for Global Environmental Research, National Institute for Environmental Studies | Center for Global Development
Reference | Oda, T et al. [25] | Wheeler, D et al. [26]
Other entries in Table 1: the Environmental Protection Agency and Department of Energy; the International Atomic Energy Agency.

The ODIAC emissions data product is a global 1° × 1° gridded monthly fossil fuel CO2 emission inventory, developed from country-level fossil fuel CO2 emission estimates, fuel consumption statistics, satellite-observed nightlight data and point source information (geographical locations and emission intensities) from the CARMA power plant database (ODIAC2015a, available at http://db.cger.nies.go.jp/dataset/ODIAC/). The global nightlight data were used as a geo-referenced spatial proxy to determine the spatial extent of anthropogenic emissions from line and diffuse (area) sources (e.g., road traffic and residential or commercial fuel consumption) [25]. The ODIAC gridded emission fields, defined on a global rectangular (latitude × longitude) grid, are remapped to match the grid resolution of each simulation domain.
Additionally, CO2 emissions from power plants, one of the dominant CO2 emitting sources, are collected for the study area from the Carbon Monitoring for Action database (CARMA, available at http://carma.org/plant). We convert both emission datasets to tonnes (t) and take their base-10 logarithm (referred to as lgE) to facilitate the calculation.
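The unit harmonization and lgE transform can be sketched as follows. The unit labels and the table of conversion factors are hypothetical, and setting cells with zero or negative emission to NaN is an assumption, because the text does not say how such cells were handled.

```python
import numpy as np

# Hypothetical unit labels mapped to the factor that converts them to tonnes.
TONNES_PER_UNIT = {"kg": 1e-3, "t": 1.0, "kt": 1e3, "Mt": 1e6}


def to_lgE(emission, unit="t"):
    """Convert emissions to tonnes and return lgE = log10(emission in t).

    Cells with zero or negative emission have no logarithm and are set to
    NaN (an assumption; the study does not describe this case).
    """
    e = np.asarray(emission, dtype=float) * TONNES_PER_UNIT[unit]
    lg = np.full_like(e, np.nan)
    np.log10(e, out=lg, where=e > 0)  # only positive cells are transformed
    return lg
```

For example, a cell emitting 2 Mt corresponds to lgE ≈ 6.3.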

Methodology
The method for estimating anthropogenic CO2 emissions includes three major steps, as shown in Figure 2.
First, we enhance the CO2 signal from anthropogenic emissions in XCO2, as described in Section 2.3.1. Second, we train a GRNN with the 2010–2014 XCO2 and ODIAC data to obtain the estimation model for anthropogenic emissions, as described in detail in Section 2.3.2. Third, we estimate the 2015 anthropogenic emissions with the GRNN model from the 2015 XCO2 data and validate them against the 2015 ODIAC data.

Variable of XCO2 Used for Estimation of Anthropogenic Emission
XCO2 comprises CO2 emitted by anthropogenic activities, fluxes from the terrestrial biosphere, CO2 transported by atmospheric wind fields [27,28] and the regional background CO2. To enhance the CO2 signal from anthropogenic emissions, we derive an annual anomaly by removing the regional background signal and then averaging over each year, using the following equation proposed by Hakkarainen et al. [17]:
$$\mathrm{dXCO_2}(grid,t) = \mathrm{XCO_2}(grid,t) - \mathrm{MXCO_2}(t) \qquad (1)$$
where dXCO2(grid,t) is the deviation from the regional background for each grid cell at time t, the 3-day time unit of the Mapping-XCO2 data; XCO2(grid,t) is the XCO2 of that grid cell at time t from the Mapping-XCO2 data; and MXCO2(t) is the median XCO2 over all grid cells in the study region at time t, calculated from the Mapping-XCO2 data with 0.5° × 0.5° grid cells. Finally, we use the annual mean of dXCO2(grid,t) for each year from 2010 to 2015 to estimate anthropogenic emissions. Taking the annual mean can remove the local seasonal variation and simultaneously reduce the effect of atmospheric transport [17].
From the 2010–2015 Mapping-XCO2 data, we computed monthly and annual averages of dXCO2 for each 1° × 1° grid cell, producing monthly and annual dXCO2 datasets for 2010 to 2015. The annual dXCO2 dataset and the ODIAC data are used in the following analysis.
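The background removal of Hakkarainen et al. [17], dXCO2(grid, t) = XCO2(grid, t) − MXCO2(t), followed by annual averaging, takes only a few lines of NumPy. The sketch assumes that the Mapping-XCO2 values are stored as an array of shape (time, lat, lon) with NaN outside the study region, together with a parallel array giving the calendar year of each 3-day step; this layout and the function names are hypothetical.

```python
import numpy as np


def dxco2_anomaly(xco2):
    """dXCO2(grid, t) = XCO2(grid, t) - MXCO2(t).

    xco2 has shape (n_time, n_lat, n_lon); MXCO2(t) is the median over all
    grid cells of the study region at time step t (NaN cells are ignored).
    """
    median_t = np.nanmedian(xco2, axis=(1, 2))
    return xco2 - median_t[:, None, None]


def annual_mean(dxco2, years):
    """Average the 3-day anomalies of each calendar year, cell by cell."""
    years = np.asarray(years)
    unique_years = np.unique(years)
    means = np.stack([np.nanmean(dxco2[years == y], axis=0) for y in unique_years])
    return unique_years, means
```

Monthly means follow the same pattern, with a month label in place of the year label.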

Estimation of Anthropogenic CO2 Emission by Neural Network Development
Because XCO2 variations are forced by anthropogenic emissions and by exchanges between the atmosphere and the ocean and the terrestrial biosphere [27,28], the relationship between XCO2 and emissions includes both linear and non-linear components. Here we adopt a General Regression Neural Network (GRNN) algorithm [29] to represent the non-linear mapping between the independent variable (dXCO2 in this study) and the dependent variable (CO2 emissions in this study). A GRNN estimates the function directly from the training samples and can approximate any arbitrary function between the input and output vectors. As the number of training samples increases, the GRNN estimate converges to the optimal regression surface and its estimation error approaches zero. The GRNN model we use has four layers: an input layer, a hidden layer, a summation layer and a decision layer (Figure 3; [30,31]). In the input layer, each neuron corresponds to one independent variable, and the values of the independent variables are standardized. The standardized values are then passed to the neurons of the hidden layer, where each neuron stores the values of the independent variables and the dependent variable of one training sample and computes a scalar function. The summation layer has two neurons: the denominator summation unit sums the weights coming from the hidden neurons, and the numerator summation unit sums the weights multiplied by the actual target value of each hidden neuron. Finally, the decision layer divides the value accumulated in the numerator summation unit by the value in the denominator summation unit and uses the result as the predicted value of the target dependent variable [32].
Sensors 2019, 19, x FOR PEER REVIEW 6 of 12
Following the usual procedure for developing a neural network, we standardize all independent and dependent training variables so that all training data in the input layer have the same order of magnitude. Each hidden neuron i then computes the scalar function
$$\theta_i = \exp\left[-\frac{\sum_{j=1}^{p}\left(x_j - x_{ij}\right)^2}{2\sigma^2}\right] \qquad (2)$$
where p denotes the dimension of the variable vector x_i, and σ is the spread parameter, whose optimal value is determined by minimizing the root mean square error (RMSE) between the training data and the predicted values of the dependent variable.
The weight of the denominator neuron is set to 1.0. For a given training set, the GRNN training algorithm has only one adjustable parameter, σ. Here we use "the holdout method" [29] to optimize σ; see [29] for details. The predicted target dependent variable, the ODIAC CO2 emission, is defined by Equation (3):
$$\hat{Y}(\mathbf{x}) = \frac{\sum_{i=1}^{n} y_i\,\theta_i}{\sum_{i=1}^{n} \theta_i} \qquad (3)$$
where θ_i, the value calculated with the scalar function in hidden neuron i, is weighted by the corresponding training sample value y_i and passed to the numerator neuron, and n is the number of training samples.
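To make the four-layer structure and Eq. (3) concrete, the following is a minimal NumPy sketch of a GRNN, not the authors' implementation. It assumes z-score standardization of inputs and target using training statistics, and a Gaussian scalar function exp(−||x − x_i||² / (2σ²)) in each hidden neuron. Any constant factor in that function cancels in the ratio of Eq. (3). The class and method names are hypothetical.

```python
import numpy as np


class GRNN:
    """Minimal General Regression Neural Network (after Specht, 1991)."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma  # spread parameter, the only adjustable parameter

    def fit(self, x, y):
        y = np.asarray(y, dtype=float)
        x = np.asarray(x, dtype=float).reshape(len(y), -1)
        # Input layer: z-score standardization with training statistics.
        self.x_mean, self.x_std = x.mean(axis=0), x.std(axis=0)
        self.x_std[self.x_std == 0] = 1.0
        self.y_mean = y.mean()
        self.y_std = y.std() if y.std() > 0 else 1.0
        # Hidden layer: one neuron per training sample stores (x_i, y_i).
        self.x_train = (x - self.x_mean) / self.x_std
        self.y_train = (y - self.y_mean) / self.y_std
        return self

    def predict(self, x):
        xq = np.asarray(x, dtype=float).reshape(-1, self.x_train.shape[1])
        xq = (xq - self.x_mean) / self.x_std
        d2 = ((xq[:, None, :] - self.x_train[None, :, :]) ** 2).sum(axis=-1)
        # Subtracting each row's minimum rescales all theta_i of that row by
        # the same factor, which cancels in the ratio and avoids underflow.
        theta = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * self.sigma ** 2))
        numerator = theta @ self.y_train  # summation unit weighted by y_i
        denominator = theta.sum(axis=1)   # summation unit with weights 1.0
        # Decision layer: Eq. (3), then undo the target standardization.
        return numerator / denominator * self.y_std + self.y_mean
```

With a small σ the model behaves almost like a nearest-neighbour lookup, and larger values produce a smoother regression surface. This is why σ has to be tuned.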

Estimated Anthropogenic Emissions by GRNN
We use the annual dXCO2 dataset and ODIAC CO2 emission data from 2010 to 2014, a total of 5415 samples, as the training dataset to build a GRNN model for estimating anthropogenic emissions. Applying "the holdout method" described in Section 2.3, we obtain an optimized spread parameter of σ = 0.1. We then apply the GRNN model to the annual dXCO2 data for 2015 to predict the target dependent variable, i.e., anthropogenic emissions in the same units as the ODIAC CO2 emissions.
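As we read Specht's description [29], the holdout method is a leave-one-out procedure. Each training sample is predicted by a GRNN built from all the other samples, and the σ giving the lowest RMSE is kept. The sketch below assumes that reading, a Gaussian kernel and z-score standardization. The candidate grid passed to the hypothetical `select_sigma` function is illustrative and is not the search actually used to obtain σ = 0.1.

```python
import numpy as np


def holdout_rmse(xs, ys, sigma):
    """Leave-one-out RMSE of a GRNN on standardized inputs xs (n, p) and target ys (n,)."""
    d2 = ((xs[:, None, :] - xs[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude sample i when predicting sample i
    # Row-wise shift for numerical stability; it cancels in the ratio.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    theta = np.exp(-d2 / (2.0 * sigma ** 2))
    pred = theta @ ys / theta.sum(axis=1)
    return float(np.sqrt(np.mean((pred - ys) ** 2)))


def select_sigma(x, y, candidates):
    """Return the candidate sigma with the lowest holdout RMSE, plus all RMSEs."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    x_std = np.where(x.std(axis=0) > 0, x.std(axis=0), 1.0)
    xs = (x - x.mean(axis=0)) / x_std
    ys = (y - y.mean()) / (y.std() if y.std() > 0 else 1.0)
    rmses = [holdout_rmse(xs, ys, s) for s in candidates]
    return candidates[int(np.argmin(rmses))], rmses
```

Because the sample being predicted is excluded, a very small σ cannot simply memorize the training data. This is what makes the leave-one-out RMSE a usable criterion for σ.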
Figure 4 shows the CO2 emissions estimated from the annual dXCO2 and the actual ODIAC CO2 emissions in 2015. Comparing Figure 4a with Figure 4b, the spatial pattern of the emissions estimated from satellite-based observations closely resembles that of the ODIAC emissions. However, the estimated emissions are spatially smoother than the actual emissions, mainly because the kriging procedure smooths the CO2 signals from strong anthropogenic point sources, and the 10 km footprint of each GOSAT observation smooths the signals further. The estimated emissions are generally lower than the ODIAC emissions. Figure 5 presents the differences between them and the corresponding histogram.
Figure 5a shows that the difference between the estimated CO2 emissions and ODIAC emissions mostly lies between −5 Mt and 5 Mt, which accounts for 91% of all grid cells, and 71% of the grid cells have differences between −1 Mt and 1 Mt. Grid cells with low ODIAC emissions, in the range of 1–10^4 t/yr (Figure 4b), are generally underestimated by the satellite-based estimate (shown in yellow in Figure 5a). These cells are mostly located in semi-arid grasslands and forests in the northern areas, as shown in the land use map in Figure 5c. This underestimation implies that the emissions estimated from dXCO2 are highly uncertain in areas of low anthropogenic emission, likely because the CO2 uptake by the biosphere still remains in dXCO2. Moreover, the estimated emissions are much lower than ODIAC emissions in the areas around big cities such as Beijing, Shanghai and Guangzhou. This underestimation reflects the smoothing of the estimated emissions, likely because the spatial resolution of GOSAT observations (10 km) is not sufficient to detect point-source emissions. On the other hand, the estimated emissions are generally larger than ODIAC emissions in south-eastern China, where there are many anthropogenic emitting sources (Figure 8). This general overestimation is likely because the large emitting sources raise the CO2 concentration over nearby non-emitting areas through atmospheric transport.
Finally, comparing the satellite-based estimates of CO2 emissions with ODIAC emissions for all grid cells (Figure 6), we find a significant correlation, with a coefficient of determination (R²) of 0.65 (p < 0.01).
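The difference statistics and the coefficient of determination reported above can be computed from two co-registered grids with a short sketch. It assumes that both grids are in t/yr and that R² is the squared Pearson correlation of the linear values. The text does not state whether the comparison was made in linear or lgE units. The p-value is omitted here; `scipy.stats.pearsonr`, for example, would provide it. The function name is hypothetical.

```python
import numpy as np

T_PER_MT = 1e6  # tonnes per megatonne


def difference_summary(estimated_t, odiac_t):
    """Summarize estimated-minus-ODIAC differences per grid cell (inputs in t/yr)."""
    est = np.asarray(estimated_t, dtype=float)
    ref = np.asarray(odiac_t, dtype=float)
    ok = np.isfinite(est) & np.isfinite(ref)  # skip cells missing in either grid
    est, ref = est[ok], ref[ok]
    diff_mt = (est - ref) / T_PER_MT
    r = np.corrcoef(est, ref)[0, 1]
    return {
        "share_within_5Mt": float(np.mean(np.abs(diff_mt) <= 5.0)),
        "share_within_1Mt": float(np.mean(np.abs(diff_mt) <= 1.0)),
        "r_squared": float(r ** 2),
    }
```
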

Discussion of Correlation between Retrieved XCO2 and Anthropogenic Emissions
A previous study has shown that clustered XCO2 changes derived from GOSAT observations have a correlation coefficient of 0.5 with anthropogenic emissions. This correlation is stronger than that obtained from single XCO2 grid values, because each atmospheric CO2 measurement is an instantaneous snapshot of the real atmosphere [33]. That clustering analysis was based on the original XCO2 data.
We bin the ODIAC emissions in intervals of 0.3 in lgE (Figure 7a), using the mean emission calculated from the annual emissions during 2010–2015, and then analyze the correlation between the mean emission and the mean dXCO2 within each bin. The bin-mean dXCO2 shows a significant positive correlation with ODIAC emissions: the coefficient of determination (R²) reaches 0.82 for all data (Figure 7b), and above 10^4 t/yr dXCO2 shows a strong positive linear correlation with emissions, with R² reaching 0.95 (red line in Figure 7b). In regions with emissions lower than 10^4 t/yr, dXCO2 is almost unchanged. These results imply that satellite observations of atmospheric CO2 could be used to estimate regional anthropogenic emissions in regions with larger anthropogenic CO2 emissions. Additionally, we overlay the CARMA power plant dataset on the mean of the annual dXCO2 during 2010–2015 (Figure 8a). High dXCO2 values correspond to high densities of power plants, especially in northeast China. We sum the emissions of all power plants within each grid cell of the Mapping-XCO2 dataset, bin the gridded power plant emissions in intervals of 0.3 in lgE, and analyze the correlation between the mean power plant emission and the mean dXCO2 within each bin (Figure 8a). dXCO2 shows a strong positive linear correlation with power plant emissions above 10^6 t (blue dots), and the grid cells they represent coincide with areas of high dXCO2. The resulting R² of 0.59 is lower than that of the regional statistics. Power plant emissions lower than 10^5.5 t show only a weak linear correlation with dXCO2 because of the influence of biospheric CO2 uptake.
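The gridding and binning steps above can be sketched as follows. Several details are assumptions rather than documented choices: bin edges are placed at multiples of 0.3 in lgE, bin means are taken over lgE rather than linear emissions, and power plants are summed into a 1° grid whose origin (18° N, 65° E) and 39 × 83 shape follow from the study-area extent. All function names are hypothetical.

```python
import numpy as np


def plants_to_grid(lat, lon, emis_t, lat0=18.0, lon0=65.0, res=1.0, shape=(39, 83)):
    """Sum point-source emissions (t) into res-degree cells of the study domain."""
    lat, lon = np.asarray(lat, dtype=float), np.asarray(lon, dtype=float)
    emis_t = np.asarray(emis_t, dtype=float)
    i = np.floor((lat - lat0) / res).astype(int)
    j = np.floor((lon - lon0) / res).astype(int)
    inside = (i >= 0) & (i < shape[0]) & (j >= 0) & (j < shape[1])
    grid = np.zeros(shape)
    # np.add.at accumulates correctly when several plants share a cell.
    np.add.at(grid, (i[inside], j[inside]), emis_t[inside])
    return grid


def binned_means(lgE, dxco2, width=0.3):
    """Mean lgE and mean dXCO2 within bins of fixed lgE width."""
    lgE = np.asarray(lgE, dtype=float)
    d = np.asarray(dxco2, dtype=float)
    ok = np.isfinite(lgE) & np.isfinite(d)
    lgE, d = lgE[ok], d[ok]
    idx = np.floor(lgE / width).astype(int)  # bin edges at multiples of width
    bins = np.unique(idx)
    mean_e = np.array([lgE[idx == b].mean() for b in bins])
    mean_d = np.array([d[idx == b].mean() for b in bins])
    return mean_e, mean_d


def r_squared(x, y):
    """Squared Pearson correlation between bin means."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)
```

Restricting the fit to bins with `mean_e >= 4` mirrors the 10^4 t/yr threshold discussed above. For the power plant analysis, the gridded sums would first be converted to lgE before binning.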

Figure 8a also shows that dXCO2 in the western desert area of Xinjiang has high values even though anthropogenic emissions there are much lower (Figure 4b). This is likely a result of the uncertainty of ACOS XCO2 retrievals over deserts, as indicated by Bie et al. [34].

Conclusions
In this paper, to support the verification of bottom-up inventories of anthropogenic emissions, we apply a machine learning method for estimating anthropogenic CO2 emissions to the gap-filled ACOS XCO2 dataset over mainland China derived from GOSAT observations. The annual emission signatures, represented by dXCO2, are enhanced by removing the background XCO2 from the 2010–2015 XCO2 data. We then use the annual averaged dXCO2 from 2010 to 2014 to build an estimation model of anthropogenic emissions with an artificial neural network approach. The model is verified by estimating the 2015 emissions and comparing them with the ODIAC emissions. Lastly, we quantify the correlation between the annual dXCO2 and the magnitude of anthropogenic emissions. Our results indicate that anthropogenic emissions can be estimated at the regional scale from changes in XCO2, especially in regions with larger emissions. However, the estimates are relatively more uncertain in capturing the CO2 signals of areas with low or no anthropogenic emissions and of point emitting sources. Biospheric CO2 uptake and transport by wind fields affect the estimation when the annual dXCO2 is used, and the spatial and temporal observation mode of GOSAT and the fast mixing of atmospheric CO2 also limit the detection of point emitting sources.
Our study demonstrates that XCO2 derived from satellite observations can reveal the spatial patterns of underlying anthropogenic emissions. The estimation of anthropogenic emissions is expected to improve considerably as more XCO2 data become available from multiple satellites such as OCO-2, OCO-3, GOSAT-2 and TanSat. Moreover, ancillary satellite remote sensing data related to CO2 uptake and emission could be combined to constrain the estimation model developed in this study, such as gross primary production (GPP), industrial heat sources from the VIIRS (Visible Infrared Imaging Radiometer Suite) Nightfire product for point sources, and night lights from the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS). This data-driven approach based on satellite observations offers the possibility of rapid updates of anthropogenic CO2 emissions and provides a new way of investigating anthropogenic emissions to support the implementation of regional carbon emission reductions.