Spatiotemporal Analysis of XCO 2 and Its Relationship to Urban and Green Areas of China’s Major Southern Cities from Remote Sensing and WRF-Chem Modeling Data from 2010 to 2019

: Monitoring CO 2 concentrations is believed to be an effective measure for assisting in the control of greenhouse gas emissions. Satellite measurements compensate for the sparse and uneven spatial distribution of ground observation stations, allowing for the collection of a wide range of CO 2 concentration data. However, satellite monitoring’s spatial coverage remains limited. This study ﬁlls the knowledge gaps of column-averaged dry-air mole fraction of CO 2 (XCO 2 ) products retrieved from the Greenhouse Gases Observing Satellite (GOSAT) and Orbiting Carbon Observatory Satellite (OCO-2) based on the normalized output of atmospheric chemical models, WRF-Chem, in Southern China during 2010–2019. Hefei (HF)/Total Carbon Column Observing Network (TCCON), Lulin (LLN)/World Data Centre for Greenhouse Gases (WDCGG) station observations were used to validate the results of void ﬁlling with an acceptable accuracy for spatiotemporal analysis ( R = 0.96, R 2 = 0.92, RMSE = 2.44 ppm). Compared to the IDW (inverse distance weighting) and Kriging (ordinary Kriging) interpolation methods, this method has a higher validation accuracy. In addition, spatiotemporal distributions of CO 2 , as well as the sensitivity of CO 2 concentration to the urban built-up areas and urban green space areas in China’s major southern cities during 2010–2019, are discussed. The approximate annual average concentrations have gradually increased from 388.56 to 414.72 ppm, with an annual growth rate of 6.73%, and the seasonal cycle presents a maximum in spring and a minimum in summer or autumn from 2010 to 2019. CO 2 concentrations have a strong positive correlation with the impervious area to city area ratio, while anomaly values of the impervious area to urban green area ratio occurred in individual cities. The experimental ﬁndings demonstrate the viability of the study hypothesis that combines remote sensing data with the WRF-Chem model to produce a local area dataset with high spatial resolution and an extracted urban unit from statistical data.


Introduction
Monitoring the spatiotemporal distribution of greenhouse gases aids in achieving carbon neutrality by measuring their concentrations in the atmosphere. CO 2 is the most abundant of the three major greenhouse gases (CO 2 , CH 4 , N 2 O, H 2 O, etc.) and contributes the most to the greenhouse effect. Compared to ground-based observations, satellitebased observations provide a larger spatial coverage. A number of satellites and sensors capable of monitoring CO 2 existed before the launch of GHG satellites, although they were not specifically designed to measure CO 2 concentrations; for example, IMG on ADEOS   [1], AIRS on AQUA, SCIAMACHY on ENVISAT, and IASI on METOP-A [2]. Then, countries around the world successfully launched global greenhouse gas monitoring satellites to provide data for carbon monitoring and analysis   [2]. For example, the first global greenhouse gas monitoring satellite, GOSAT (Greenhouse Gases Observing Satellite) was launched by Japan in 2009; OCO-2 (Orbiting Carbon Observatory) was successfully launched by the United States in 2014; OCO-3 was launched in May 2019, using the same instrument as used in OCO-2, and the instrument was mounted to the International Space Station (ISS); China launched the third greenhouse gas monitoring satellite (TanSat) in 2016; and Japan launched GOSAT-2 in 2018. With the improvement of sensor technology and advancements in retrieval algorithms, the accuracy of atmospheric CO 2 concentrations retrieved from satellites can reach around 1 ppm Mustafa et al., 2020; D.   [3][4][5].
A substantial number of studies have mapped or analyzed spatiotemporal CO 2 concentrations based on retrieval data from satellites. For example, Hammerling et al. used GOSAT inversion data to estimate the column-averaged dry-air mole fraction of CO 2 (XCO 2 ) in global land areas   [6]. Yang et al. performed the atmospheric CO 2 concentration inversion algorithm (IAPCAS) based on satellite observations on the Chinese Carbon Satellite (TanSat) data, and generated the first global carbon dioxide maps produced from TanSat measurements (D.   [7]. The analysis revealed that the CO 2 concentration in the northern hemisphere was significantly higher than that in the southern hemisphere with a seasonal trend of decline. Sheng et al. mapped a global spatiotemporally continuous XCO 2 dataset (Mapping-XCO 2 ) using the XCO 2 retrievals from April 2009 to December 2020 from GOSAT and OCO-2   [8]. Constructing carbon dioxide map datasets can help us better understand the spatiotemporal variations of CO 2 and the driving factors of the variations, allowing us to support actions for emissions reduction and control. In addition, a number of scholars have studied the analysis of the spatial and temporal distribution of CO 2 and its influencing factors, such as regional vegetation and land cover, climate types, temperature, precipitation, wind speed and direction, anthropogenic emissions and energy consumption, the economy situation, and so on ( [9][10][11][12][13][14][15]. Among these, land cover conversion is a comprehensive factor reflecting human activities. However, satellite CO 2 retrieval datasets still have some limitations of retrievability, reducing the spatial continuity of historical CO 2 datasets. On the one hand, clouds, aerosol scattering, temperature, and other atmospheric scattering factors may affect the validity of retrievals; on the other hand, the coverage of the observations is insufficient, resulting in the uncertainty of data loss area. Yoshida et al. points out that only about 3% of the earth's surface was covered by valid observations for GOSAT retrieval analysis from July 2009 to January 2010 (Yoshida et al., 2011) [16]. In terms of data distribution, OCO-2 and GOSAT satellite data show a banded distribution on the global scale, which can be sufficiently to analyze the distribution of CO 2 on the global scale. However, for a local scale, the data for valid observation points we can capture is very limited. The best way to obtain a high quality of CO 2 maps is to perform void filling. In the case of sparse data coverage, the traditional interpolation methods are not suitable, for the general interpolation methods have a high dependence on the quantity of input data. Therefore, to fill missing data gaps requires a more efficient approach. Yue et al. verified that HASM (high-accuracy surface modeling) is an alternative approach to filling voids on XCO 2 surfaces from satellites, better than the approach of inverse distance weighting (IDW) and ordinary Kriging (OK) (Yue et al., 2015) [17]. With the increasing application of atmospheric chemical models, such as WRF-Chem and GEOS-Chem, these studies provide an approach to combine atmospheric models with remote sensing observations to fill the missing data, indicating that accuracy conducted by this method is superior than other geostatistical methods. Liu et al. employed HASM, using the output of CO 2 concentration from WRF-Chem as the driving field and the retrieval of XCO 2 data from GOSAT as the precision control condition, to obtain a high-precision XCO 2 field   [18]. Zhang [20].
Over the past few decades, the integration of atmospheric models with satellite products has attracted widespread interest from the scientific community in a variety of fields (Giannaros et al., 2022;Rizza et al., 2023) [21,22]. Carbon monitoring satellites can help track carbon emissions on a global scale, but mapping carbon concentrations in a local area is still to be explored, which requires a higher-precision, more-precise scale analysis. The combination of remote sensing data and WRF-Chem models through appropriate data assimilation methods helps to overcome the spatial and temporal limitations in existing studies. A single city's historical concentration data sheet would be more straightforward to comprehend than a map dataset and more helpful for making mitigation decisions, because a city is a basic administrative territorial entity. There are currently quite few studies on spatial and temporal analysis of XCO 2 at the city scale. Therefore, this paper chose the major cities in Southern China to perform a case study.
In this paper, a spatial and temporal analysis of XCO 2 and one of its main influencing factors in China's major southern cities was carried out from 2010 to 2019. For better geostatistical analysis results, we fill the missing data using both satellite data and the output of atmospheric model simulations. Therefore, this study aims to complete the void filling with a more general method and conduct the historical dataset of XCO 2 in Southern China from 2010 to 2019-a dataset to support a spatiotemporal analysis for twenty-seven major citiesand a discussion on its influencing factor about land conversion. In detail, GOSAT and OCO-2 satellite observations of the atmospheric column concentration of carbon dioxide were used as basic data, and the CO 2 concentrations output from WRF-chem as the driving factor to build a Random Forest algorithm.

Study Area
The study area include twenty-seven major cities located in Southern China (16 • N-32 • N, 106 • E-122 • E). These cities include municipalities directly under the central government, provincial capitals, special economic zones, and other typical representative cities (Table 1). Their distribution is even, with developed cities along the east coast and less-developed cities in the southwest ( Figure 1).

XCO2 Data from Satellite Observation
This study is based on GOSAT and OCO-2 satellite observations of XCO2 data, including XCO2 datasets released by the GOSAT project and GES DISC/NASA (OCO-2). In detail, we chose the GOSAT FTS SWIR L2 data observations (Science Data Operations System Jet Propulsion Laboratory, 2019) [23] from January 2010 to December 2019 and the OCO-2 XCO2 observations (Science Data Operations System, 2020) [24] from September 2014 to December 2019, and filtered them using the standard criteria described with the product specifications. Combining these two satellite monitors can improve coverage of valid data. The website containing OCO-2 data is https://disc.gsfc.nasa.gov/datasets/OCO2_L2_Lite_FP_10r (accessed on 1 January 2023) and the website containing GO-SAT data is https://disc.gsfc.nasa.gov/datasets/ACOS_L2_Lite_FP 9r (accessed on 1 January 2023).

Input Data for WRF-Chem
The CO2 concentrations output from WRF-chem was used as driving factor for this study's void filling. The input data for WRF-Chem in this paper mainly consist of weather conditions, anthropogenic emissions, initial fields, and boundary conditions of CO2.

XCO 2 Data from Satellite Observation
This study is based on GOSAT and OCO-2 satellite observations of XCO 2 data, including XCO 2 datasets released by the GOSAT project and GES DISC/NASA (OCO-2). In detail, we chose the GOSAT FTS SWIR L2 data observations (Science Data Operations System Jet Propulsion Laboratory, 2019) [23] from January 2010 to December 2019 and the OCO-2 XCO 2 observations (Science Data Operations System, 2020) [24] from September 2014 to December 2019, and filtered them using the standard criteria described with the product specifications. Combining these two satellite monitors can improve coverage of valid data. The website containing OCO-2 data is https://disc.gsfc.nasa.gov/datasets/OCO2 _L2_Lite_FP_10r (accessed on 1 January 2023) and the website containing GOSAT data is https://disc.gsfc.nasa.gov/datasets/ACOS_L2_Lite_FP9r (accessed on 1 January 2023).

Input Data for WRF-Chem
The CO 2 concentrations output from WRF-chem was used as driving factor for this study's void filling. The input data for WRF-Chem in this paper mainly consist of weather conditions, anthropogenic emissions, initial fields, and boundary conditions of CO 2 .
The meteorological input data is FNL (Final Operational Global Analysis) data from the National Center for Climate Prediction (NCEP), which is a global grid data with a spatial resolution of 1 • × 1 • and temporal resolution of 6 h (National Centers For Environmental Prediction/National Weather Service/NOAA/U.S. Department Of Commerce, 2000) [25]. EDGAR (Emission Database for Global Atmospheric Research) and FFDAS (The Fossil Fuel Data Assimilation System) are used to process anthropogenic emissions (Crippa et al., 2019) [26]. EDGAR provides emission inventories of a variety of global atmospheric pollutants from 1990 to 2018; FFDAS is a global product with a spatial resolution of 0.1 • × 0.1 • , providing anthropogenic flux changes per hour from 2009 to 2015. The initial fields and boundary conditions of CO 2 is from the Carbon Tracker Dataset, a near-realtime gridded product provided by the NOAA Earth System Research Laboratory (ESRL). Carbon Tracker contains three-dimensional CO 2 concentration (ppm) with three-hour time resolution and a spatial resolution of 3 • × 2 • (Jacobson et al., 2020) [27]. In this paper, an ARIMA time series (Autoregressive Integrated Moving Average model) prediction model is used to process all of the missing input data, and details about this method are in Section 2.3.1. The website containing FNL data is https://rda.ucar.edu/datasets/ds083.2/# access (accessed on 1 January 2023); the website containing EDGAR data is https://edgar. jrc.ec.europa.eu/dataset_ap50 (accessed on 1 January 2023); the website containing FFDAS data is https://ffdas.rc.nau.edu/Publications.html (accessed on 1 January 2023); and the website containing Carbon Tracker data is https://gml.noaa.gov/ccgg/carbontracker/CT2 019B (accessed on 1 January 2023).

XCO 2 Data of Ground Observation
It is essential to validate the XCO 2 fields for understanding the quality of datasets from different satellites. The Total Carbon Column Observing Network (TCCON) and the World Data Centre for Greenhouse Gases (WDCGG) have provided an important validation resource for assessing the accuracy of XCO 2 surfaces from the satellites and have been applied to a series of validation studies (Buchwitz et [30] is https://tccondata.org (accessed on 1 January 2023) and the website containing WDCGG data (Re3data.Org, 2012) [31] is https://gaw.kishou.go.jp (accessed on 1 January 2023).

Landcover Dataset (CLCD)
The distribution of CO 2 concentration is closely related to how people use the land; therefore, landcover data is used to investigate the changing cause of CO 2 concentration. CLCD (annual China Land Cover Dataset) is an annual land cover dataset with 30 m resolution and its dynamics in China from 1990 to 2020 produced by Wuhan University, with nine classifications of Cropland, Forest, Shrub, Grassland, Water, Snow/Ice, Barren, Impervious, and Wetland (J. Yang and Huang, 2021) [32]. The website containing the data is https://doi.org/10.5281/zenodo.5816591 (accessed on 1 January 2023). The accuracy of the CLCD was first assessed by an independent sample of visual interpretations and, overall, the accuracy of the CLCD was found to be 76.45% < OA < 82.51%, with an average OA of 79.30 ± 1.99% (J. Yang and Huang, 2021) [32].

ARIMA Time Series
The WRF-Chem input data from EDGAR, FFDAS, and Carbon Tracker do not cover the entire research period (2010-2019), so missing data must be processed. They follow the seasonal law, so the seasonal time series model is preferable. The ARIMA model is a wellknown and widely used time series model that predicts future changes based on past and present values by describing the autocorrelation of time (t)-dependent random variables (Contreras et al., 2003) [33]. When the original sequence contains seasonal variation trends, the Arima product seasonal model is used. As a result, the ARIMA time series model is used to predict the WRF-Chem model's missing input data.
ARIMA (p, d, q) is obtained by combining the autoregressive model (AR), the moving average model (MA) and the differential method. Autoregressive models (AR) describe the relationship between current and historical values, using the variable's own historical time data to make predictions about itself. Moving average models (MA) are linear combinations that depend only on historical white noise, where the current values of the time series are not related to the historical values. Combining AR (p) with MA (q) yields a general autoregressive moving average model, ARMA (p, q). ARIMA can obtain the original series by inversion on the basis of the ARMA model, which can be widely used for a non-stationary time series. The specific parameters used by this method are set to autoregressive coefficient (AR_Order) = 3, moving average coefficient (MA_Order) = 2, difference order (d) = 2. ables (Contreras et al., 2003) [33]. When the original sequence contains seasonal variation trends, the Arima product seasonal model is used. As a result, the ARIMA time series model is used to predict the WRF-Chem model's missing input data. ARIMA (p, d, q) is obtained by combining the autoregressive model (AR), the moving average model (MA) and the differential method. Autoregressive models (AR) describe the relationship between current and historical values, using the variable's own historical time data to make predictions about itself. Moving average models (MA) are linear combinations that depend only on historical white noise, where the current values of the time series are not related to the historical values. Combining AR (p) with MA (q) yields a general autoregressive moving average model, ARMA (p, q). ARIMA can obtain the original series by inversion on the basis of the ARMA model, which can be widely used for a nonstationary time series. The specific parameters used by this method are set to autoregressive coefficient (AR_Order) = 3, moving average coefficient (MA_Order) = 2, difference order (d) = 2.

Void Filling Workflow
Based on GOSAT and OCO-2 satellite observations and the output of WRF-Chem model, a simulation dataset of XCO2 covering southern China (16° N-32° N, 106° E-122° E) from January 2010 to December 2019 was proposed, using the WRF-Chem model results (WRF-Chem XCO2) as independent variables and the satellite observations as dependent variables for the Random Forest algorithm. The dataset, components, and workflow involved in the operation of void filling are shown in Figure 2. WRF-Chem is the most widely used atmospheric chemical model, and its parameters are appropriate for local simulation. The output of the WRF-Chem model demonstrates the variation in gas concentration affected by terrain, weather, and emission source, but the estimation of gas concentration does not agree with the actual concentration; therefore, it is necessary to normalize their gas estimates into 0-1 and combine the satellite observations of XCO2 with a Random Forest interpolation method to complete the interpolation process. Then, the final XCO2 simulation dataset can be obtained. WRF-Chem is the most widely used atmospheric chemical model, and its parameters are appropriate for local simulation. The output of the WRF-Chem model demonstrates the variation in gas concentration affected by terrain, weather, and emission source, but the estimation of gas concentration does not agree with the actual concentration; therefore, it is necessary to normalize their gas estimates into 0-1 and combine the satellite observations of XCO 2 with a Random Forest interpolation method to complete the interpolation process. Then, the final XCO 2 simulation dataset can be obtained.
WRF-Chem version 3.9.1.1 was selected as the CO 2 transport model (Grell et al., 2005) [34]. The model domain is centered at 25 • N, 113.5 • E, with a 50 km × 50 km horizontal resolution and an hourly output on the Lambert projection. In the vertical directions, the model has 50 terrain-following levels, from the surface to the upper boundary located at 50 hPa. The chosen physical parameterization schemes are listed in Table 2. Chem_opt = 101, a RADM2 Chemistry option using the KPP library in WRF-Chem, was chosen as the chemical configuration of the model. The output of this option contains a CO 2 simulation. The FNL data was used as meteorological initial and boundary conditions of WRF-Chem. The initial and boundary conditions of CO 2 concentrations of WRF-Chem were interpolated from CO 2 mole fraction products of Carbon Tracker, version 2019 (CT2019). The anthropogenic emissions is from EDGAR and FFDAS datasets; the time resolution and coverage period of the two datasets are different, so both are considered, and the missing year data are processed by time series method. WRF-Chem was performed 120 times, starting at the first day of each month and ending at the last day, and then the average monthly total number of CO 2 columns in each outcome document were normalized. The results needed from WRF-Chem are spatial differences for CO 2 at the same time, so we normalize the daily average CO 2 into 0-1 in different days to eliminate the effect of differences in simulation results between different time periods. In this study, the Random Forest spatial interpolation method (Sekulić et al., 2020) [35] is used to complete the assimilation simulation. Random Forest spatial interpolation improves the spatial interpolation results by adding the spatial horizontal and vertical coordinates and target variables of a number of neighboring sample points as separate attributes to the Random Forest. The essence is to transform the missing spatial data into a non-linear regression problem based on a Random Forest prediction model. The Random Forest model is quite representative of machine learning algorithm, building a prediction model through variable sampling, forming multiple decision trees, then classifying the objects in turn, and finally summarizing the classification information of each decision tree (Biau and Fr, 2012) [36]. The mode category in all prediction categories is the object category predicted by Random Forest, which greatly improves prediction accuracy. The normalized WRF-Chem results were presented as independent variables, CO 2 concentrations as dependent variables, valid satellite observation data as train datasets, and unknown regions as test datasets (the unknown test data will be generated after the Random Forest prediction) when running the Random Forest prediction model, completing the entire void filling process.

Geo-Statistical Analysis Methods
The monthly and annual average CO 2 concentrations of twenty-seven major cities in Southern China were calculated to carry out the geo-statistical analysis. Considering the periodic variation of CO 2 concentration in the study area, a curve fitting model is introduced to fit the curve of XCO 2 and Equation (1) is suitable to fit the continuous-growing CO 2 concentrations.
where y is XCO 2 in ppm; t is the time since the start date in years; A 1~A4 are the sine and cosine wave coefficients that determine the seasonal cycle of XCO 2 ; A 5 is the background concentration for the start date; and A 6 is the linear change of XCO 2 over time.

Correlation Analysis
In order to better reflect the degree of correlation between carbon dioxide and urban land use, this paper selects the correlation analysis method to calculate the CO 2 concentration and urban expansion rate, respectively, the correlation coefficient between CO 2 concentration, and the ratio of urban built-up area and urban green space area. Here, we use two ratios to perform the calculation: the impervious area to city area ratio and the impervious area to urban green area ratio.
The change of the proportion of built-up land area in a city can represent the urbanization rate of the city.
Urbanization rate = I impervious I city area (2) where I impervious and I city area stand for area of impervious and area of the city. In addition to urban built-up areas, urban green space may also have an impact on atmospheric CO 2 , so the additional consideration of urban green area is driven by this factor. To calculate the correlation coefficient between the analysis factors and the annual average carbon dioxide concentration value, the ratio of impervious area to urban green area of 27 cities is used as the analysis index. The calculation formula is as follows: where I impervious and I urban green space stand for area of impervious and area sum of forest, shrubs, grassland, and wetland, respectively.
The correlation coefficient (R) refers to the Pearson correlation coefficient, which is used to measure the degree of linear correlation of two paired continuous variables.
where cov(X,Y) is the covariance of X and Y; var(X) and var(Y) represent the variance of X and Y, respectively. The coefficient of determination (R 2 ) is a commonly used measure of precision in linear models and ANOVA, representing the percentage of variance in the model where the dependent variable can be explained by the independent variable; that is, R 2 shows how well the data fit the regression model, and the closer R 2 is to 1, the better the fit. It is calculated as the ratio of the regression sum of squares to the total sum of squares.
The root mean square error (RMSE) represents the degree of difference between the predicted and observed values in the model, and a larger value indicates a larger difference between the predicted and observed values, which is obtained by calculating the average model residuals as follows: where n is the number of samples,ŷ i refers to the predicted value of the observation in the dataset, y i refers to the measured value of the ith observation in the dataset, and y refers to the average of the measured values.

Ground Validation of Simulation Results
In order to evaluate the accuracy of XCO 2 , the simulation results of XCO 2 are compared with ground observation data. The ground observation data are provided by TCCON and WDCGG, including HF and LLN (Figure 1).
The ground validation results are shown in Figure 3. Three indices are used for measuring precision, including correlation coefficient (R), coefficient of determination (R 2 ), and root mean square error (RMSE), and the calculation of these statistical indicators is described in Section 2.
where n is the number of samples, refers to the predicted value of the observation in the dataset, refers to the measured value of the ith observation in the dataset, and refers to the average of the measured values.

Ground Validation of Simulation Results
In order to evaluate the accuracy of XCO2, the simulation results of XCO2 are compared with ground observation data. The ground observation data are provided by TCCON and WDCGG, including HF and LLN (Figure 1).
The ground validation results are shown in Figure 3. Three indices are used for measuring precision, including correlation coefficient (R), coefficient of determination (R 2 ), and root mean square error (RMSE), and the calculation of these statistical indicators is described in Section 2.3.5. The general validation result is R = 0.96, R 2 = 0.92, and RMSE = 2.44 ppm. The frequency of the difference values (∆ ) are basically a normal distribution with the majority of values being small. In this article, simulation results were compared with two interpolation methods, inverse distance weighting (IDW) and ordinary Kriging, to demonstrate the superiority of the void filling method (Figure 4). The general validation result of IDW is R 2 = 0.91 and RMSE = 2.61 ppm, and the general validation result of Kriging is R 2 = 0.89 and RMSE = 2.94 ppm. Each site's accuracy is less precise than the results obtained through simulation. Given these validation results, we assume the void filling based on the WRF-Chem model is more reliable. In this article, simulation results were compared with two interpolation methods, inverse distance weighting (IDW) and ordinary Kriging, to demonstrate the superiority of the void filling method (Figure 4). The general validation result of IDW is R 2 = 0.91 and RMSE = 2.61 ppm, and the general validation result of Kriging is R 2 = 0.89 and RMSE = 2.94 ppm. Each site's accuracy is less precise than the results obtained through simulation. Given these validation results, we assume the void filling based on the WRF-Chem model is more reliable.

Spatial and Temporal Trend of XCO2 in 2010-2019
The annual average XCO2 map of the simulation data is shown in Figure 5, and the annual average CO2 concentrations of the twenty-seven cities are calculated in Table 3. It is discovered that the CO2 distribution has a characteristic of spatial aggregation. Furthermore, CO2 may accumulate in the atmosphere over time, so the area covered by high con-

Spatial and Temporal Trend of XCO 2 in 2010-2019
The annual average XCO 2 map of the simulation data is shown in Figure 5, and the annual average CO 2 concentrations of the twenty-seven cities are calculated in Table 3. It is discovered that the CO 2 distribution has a characteristic of spatial aggregation. Furthermore, CO 2 may accumulate in the atmosphere over time, so the area covered by high concentrations expands. An analysis of the spatial increment of XCO 2 ( Figure 6) showed that the highest increment occurred at locations where average concentrations increased by nearly 24 parts per million over a 10-year period. With an exception in 2016, the average annual concentration increase was in the range of 1-4 ppm, while the 2015-2016 increase was 2.62-4.78 ppm, higher than the other time periods.     Based on the simulation dataset, monthly average CO2 concentration of the twentyseven cities was calculated (Figure 7). They have the same spatial and temporal trend, an increasing trend with seasonal fluctuations. The approximate annual average concentrations have gradually increased from 390 to 410 ppm. Meanwhile, the statistical results show a seasonal oscillation with lower concentrations in summer and autumn and higher concentrations in winter and spring, with the low trough typically occurring in August and September and the high peak typically occurring in April and May. Based on the simulation dataset, monthly average CO 2 concentration of the twentyseven cities was calculated (Figure 7). They have the same spatial and temporal trend, an increasing trend with seasonal fluctuations. The approximate annual average concentrations have gradually increased from 390 to 410 ppm. Meanwhile, the statistical results show a seasonal oscillation with lower concentrations in summer and autumn and higher concentrations in winter and spring, with the low trough typically occurring in August and September and the high peak typically occurring in April and May. Based on the simulation dataset, monthly average CO2 concentration of the twent seven cities was calculated (Figure 7). They have the same spatial and temporal trend, a increasing trend with seasonal fluctuations. The approximate annual average concentr tions have gradually increased from 390 to 410 ppm. Meanwhile, the statistical resu show a seasonal oscillation with lower concentrations in summer and autumn and high concentrations in winter and spring, with the low trough typically occurring in Augu and September and the high peak typically occurring in April and May. This temporal pattern of CO2 had been confirmed by Keeling (Keeling et al.,197 [40]; the fitting coefficients for each city are shown in the Table 4, and all the fitting con dences are greater than 95%. The general fitting result of the monthly average concentr tion is  This temporal pattern of CO 2 had been confirmed by Keeling (Keeling et al., 1976) [40]; the fitting coefficients for each city are shown in the Table 4, and all the fitting confidences are greater than 95%. The general fitting result of the monthly average concentration is A 1 = 2.09, A 2 = 0.51, A 3 = −0.39, A 4 = 0.35, A 5 = 388.56, A 6 = 2.52, displayed in the red line in Figure 7, representing the approximate CO 2 concentration in Southern China over the last ten years and the temperate trend of CO 2 concentration.  The result shows that initial high concentrations (A 5 ) were concentrated in large cities along the eastern seaboard.
Although the trend of CO 2 growth is the same, there are slight differences in the growth rate and fluctuation of CO 2 among cities (Figure 8). The CO 2 growth rate of capital cities in the same province tends to be higher than that of other cities. Among the 27 cities, Fuzhou had the highest growth rate (K = 2.567) and Kunming the lowest (K = 2.399). In the study area, the growth rate of the central and eastern region (Shanghai, Nanjing, etc.) is higher than that of the southwest region (Kunming, Qujing, Guiyang). This is basically in line with the level of urban economic development in China. Cities with lower latitudes (Ganzhou, Hengyang, Nanning, Guilin, Guangzhou, Shenzhen, etc.) have less seasonal fluctuation, for the seasonal amplitude grows with higher A 1−4 . The annual temperature fluctuation is smaller in the low-latitude region, which indicates that the seasonal variation of CO 2 is mainly caused by temperature.

The Sensitivity of XCO2 to Urban Land Use
In order to develop the application analysis of the dataset, this section introduced one of the important factors affecting the increase of CO2 concentration to conduct a brief analysis. In this section, the response of CO2 trends to land-use change is analyzed, and the direct correlation between CO2 concentration and land-use change is discussed. Car- Figure 8. CO 2 growth rates in major southern cities.

The Sensitivity of XCO 2 to Urban Land Use
In order to develop the application analysis of the dataset, this section introduced one of the important factors affecting the increase of CO 2 concentration to conduct a brief analysis. In this section, the response of CO 2 trends to land-use change is analyzed, and the direct correlation between CO 2 concentration and land-use change is discussed. Carbon dioxide emissions come from a variety of sources. It can be produced by animal and plant respiration, urban electricity and heating, manufacturing and construction production, transportation, and other sectors, but these factors are ultimately reflected in the use of urban land. Therefore, in order to better understand the causes of aforementioned CO 2 changes, the landcover dataset, CLCD, is chosen as the analysis data in this section to examine the relationship between carbon dioxide concentration changes and changes in urban land use over time.
The correlation coefficient between CO 2 concentration and the ratio of urban built-up area and urban green space area shows in Figure 9. The detailed charts of the data for years are in the Appendix A. Overall, the correlation coefficient between XCO 2 and the urbanization rate of all 27 cities was greater than 0.95, showing a strong correlation. Among them, Kunming has the highest level of correlation, while Wuhu has the lowest.
Geographies 2023, 3, FOR PEER REVIEW 15 test this hypothesis, forest, shrubs, grassland, and wetland in CLCD were selected to represent urban green land areas. The impervious field represents the built-up area of a city.
To calculate the correlation coefficient between the analysis factors and the annual average carbon dioxide concentration value, the ratio of impervious area to urban green area of 27 cities is used as the ratio index. In theory, the increase of urban green space will slow down the increase of CO2; however, it did not turn out that way. The above calculation results show that the ratio index of most cities is increasing year by year, and the correlation coefficient with XCO2 is greater than 0.9, except for Shanghai, Wuhan, and Chengdu (RShanghai = 0.6830, RWuhan = −0.1668, RChengdu = 0.5582). From the result of 3.2, the growth rate of CO2 in these cities did not slow down with the increase of urban green space area. As environmental issues become more prominent, some cities also pay attention to the development of urban greening while in the process of urban expansion. For example, Wuhan's green space area grows year after year, so its correlation coefficient R is negatively correlated.

Ground Validation and Simulation
In general, the satellite observation data and the ground station data have good consistency. The monitored values from HF/TCCON ground-based remote sensing stations are closer to the total column values of the troposphere than those monitored by satellite observations. Ground monitoring stations LLN/WDCGG measure surface concentration. As a result, RMSE for HF has higher reference value among the validated parameters, whereas LLN has worse accuracy. Studies that compared the observation data between satellite-based and ground-based methods found some stations to have discrepancies  The primary source of urban CO 2 emissions is urban built-up area, while the primary carbon sink is urban green space area. With the development of urbanization, the built-up area of cities continues to expand, while green space may be compressed, resulting in the continuous increase of atmospheric carbon dioxide that is difficult to be absorbed. To test this hypothesis, forest, shrubs, grassland, and wetland in CLCD were selected to represent urban green land areas. The impervious field represents the built-up area of a city. To calculate the correlation coefficient between the analysis factors and the annual average carbon dioxide concentration value, the ratio of impervious area to urban green area of 27 cities is used as the ratio index.
In theory, the increase of urban green space will slow down the increase of CO 2 ; however, it did not turn out that way. The above calculation results show that the ratio index of most cities is increasing year by year, and the correlation coefficient with XCO 2 is greater than 0.9, except for Shanghai, Wuhan, and Chengdu (R Shanghai = 0.6830, R Wuhan = −0.1668, R Chengdu = 0.5582). From the result of 3.2, the growth rate of CO 2 in these cities did not slow down with the increase of urban green space area. As environmental issues become more prominent, some cities also pay attention to the development of urban greening while in the process of urban expansion. For example, Wuhan's green space area grows year after year, so its correlation coefficient R is negatively correlated.

Ground Validation and Simulation
In general, the satellite observation data and the ground station data have good consistency. The monitored values from HF/TCCON ground-based remote sensing stations are closer to the total column values of the troposphere than those monitored by satellite observations. Ground monitoring stations LLN/WDCGG measure surface concentration. As a result, RMSE for HF has higher reference value among the validated parameters, whereas LLN has worse accuracy. Studies that compared the observation data between satellite-based and ground-based methods found some stations to have discrepancies (Boesch et al., 2011;Butz et al., 2011;Crisp et al., 2012;Wennberg et al., 2011) [41][42][43][44].
Although affected by aerosol scattering, temperature, and other atmospheric scattering factors, studies have indicated that when controlling the XCO 2 accuracy within 4 ppm, global continuous observation can fill in the deficiency of ground observation (Hungershoefer et al., 2010; Liu et al., 2013) [45,46]. More than 90% of simulated data have an error of less than 4 ppm. Compared to the IDW and Kriging interpolation methods, this void filling method has a higher validation accuracy.

Trends and Factors of XCO 2 in 2010-2019
The spatial resolution for the simulation data is 0.25 • × 0.25 • while other datasets of the same type are usually 1 • × 1 • , so it has a more refined spatial scale than other studies   [8]. Rising CO 2 concentrations in the atmosphere could have a catastrophic global impact on climate change. The general growth trend of the approximate annual average (390-410 ppm) is roughly consistent with the global statistics reported by IPCC (IPCC, 2021) [47]. According to the time series fitting model, the evaluated annual average concentrations of XCO 2 we investigated have gradually increased from 388.56 to 414.72 ppm, with a growth rate of 6.73% and average annual growth rate 2.616 ppm per year. The World Meteorological Organization's greenhouse gas bulletin estimates that the average global level of carbon dioxide in 2019 was between 410.5 and 410.7 ppm (Barrie and Braathen, n.d.) [48]. Carbon dioxide concentrations in Southern China are higher than the global average. The background concentration in 2010 was around 388.56 ppm; if it continues at this rate, it will have surpassed 515 ppm in 2060. In general, CO 2 levels in Southern China are in line with global trends but rising more quickly. Therefore, reducing carbon emissions is a critical task.
The seasonal cycle presents a maximum in spring and a minimum in summer or autumn. The seasonal fluctuations of XCO 2 are primarily affected by biosphere activities and anthropogenic emissions. In winter, vegetation activity decreases, as does CO 2 absorption capacity. At this time, the demand for urban heating increases, and a large amount of carbon-containing fossil fuels are burned, causing XCO 2 to rise in the winter. In spring, vegetation activity begins to recover. Since photosynthesis has not reached its maximum level, and soil decomposition and vegetation respiration are aided by temperature rise, photosynthesis is still weaker than vegetation and soil respiration; therefore, XCO 2 continues to rise until it reaches its peak in spring. Summertime XCO 2 levels are the lowest of the year because vegetation growth is vigorous and CO 2 absorption by photosynthesis is high, as evidenced by the seasonal trend.
China's urbanization construction has achieved remarkable results in terms of the economy, population, resource costs, and so on over the last ten years. Chinese cities' urbanization by country is divided into three stages: the stable stage, the rapid advancing stage, and the start-up stage (WEI et al., 2021) [49]. Among the 27 cities analyzed in this paper, the four cities with the highest correlation with the impervious area to city area ratio are cities at the rapid advancing stage (Kunming City, Changsha City, Hefei City, Hengyang City), while the three great cities at the stable stage (Shanghai, Shenzhen, Guangzhou) have lower degrees of correlation. In general, the strongest correlation between CO 2 growth rates and the urbanization rates appears in regional central cities at the rapid advancing stage such as Changsha, Wuhan, Suzhou, and Kunming. Cities of this type are at a stage of rapid development and attached more importance to the development of secondary sectors of the economy in 2010-2019; thus, the increase in CO 2 is even greater (Liu et al., 2016) [50]. Cities with a medium degree of correlation between CO 2 growth rates and the urbanization rates are cities at the stable stage, such as Shanghai, Shenzhen, and Guangzhou. These cities are at the industrial transformation stage because they are ahead of other cities in the urbanization process. They are also more aware of the importance of environmental protection. Cities with the lowest correlation between CO 2 growth rates and the urbanization rates appears in local central cities, such as Wuhu. This type of city is not a provincial city in its own province, and its level of development is lower among the major southern cities.
The release of CO 2 into the atmosphere is performed through anthropogenic activities such as combustion of fossil fuels and land-use change. According to the mechanism of carbon dioxide absorption and emission, it is generally considered that the concentration of CO 2 will be lower in areas with high vegetation cover. The main causes of the rise in CO 2 concentration in recent years have been the growth of cities and the degradation of urban green space. However, in places that are constructing more urban green space, carbon dioxide levels have not dropped. Along with improving the capacity of carbon sinks, the decrease of emission sources should receive more emphasis in the process of achieving carbon neutrality. This demonstrates that the increasing of carbon sinks in a single city is insufficient to have a significant impact on the city's CO 2 concentration, for the atmospheric environment is interconnected. As a result, it is unfavorable to separate cities as a unit to reduce emissions or perform carbon transfer in the process of carbon neutrality. To reduce emissions, an effective way must be integrated across the region's environment, as well as joint surrounding-city common macro-policy guidance.

Deficiency and Prospect
After a large number of experiments, the results of this study are consistent with global carbon statistics reports (IEA, 2019; IPCC, 2021) [47,51], indicating that the changing trend of CO 2 illustrated in this paper is basically reliable. However, some uncertainties remains, such as a small amount of cloud cover, aerosol scattering, temperature, and other factors affecting the quality of column CO 2 retrieval (Hungershoefer et al., 2010) [45]. As a result, we filtered the data in accordance with the data product user guide to reduce the impact of this uncertainty on the results (Osterman, 2020) [52]. The choice of model variables may also bring some uncertainties. Limited by some conditional factors, there are some deficiencies in the parameter setting of WRF-Chem in this paper-only anthropogenic emissions were considered for the input data for WRF-Chem, ignoring other emissions such as biogenic or wildfire emissions. Human activities release more carbon dioxide into the atmosphere than natural processes can remove (Dunn et al., 2020) [53]. Therefore, the simulation results of this paper have a reasonable analysis. However, theoretically, adding the biogenic carbon fluxes by using VPRM (Gourdji et al., 2022) [54] coupled with the WRF-Chem model, or with other terrestrial biosphere models, can improve the accuracy of the simulation. Therefore, the main target of future research is to correct the data uncertainty of this study due to the lack of biological emission sources. In addition, the AI (artificial intelligence algorithm) can be considered instead of the WRF-Chem model in this study. In recent years, with the rapid development of AI technology, atmospheric science itself carries the property of big data, and the research of AI technology extending to the field of atmospheric science has increased significantly. For example, Google released the MetNet neural network model to predict the weather, which can predict the precipitation of the United States by inputting radar and satellite information, and the calculation time is only a few seconds (Sønderby et al., 2020) [55]; in addition, NVIDIA developed a Fourier-based neural network prediction model FourCastNet to achieve a 10-day, 0.25 • high-resolution weather forecast, and the test set results show that the model has strong forecasting capability for wind speed and precipitation (Kurth et al., 2022) [56]. In terms of speed, it is 45,000 times faster than traditional numerical models for weather forecasting. Therefore, the use of AI algorithms instead of the WRF-Chem atmospheric chemistry model simulation in this study, which can achieve both faster algorithmic simulations and solve various complex problems that are difficult to give deterministic solutions directly by mathematical and physical models, can be considered as a next step.

Conclusions
Monitoring greenhouse gas concentrations has become an important part of mitigating global warming. If satellite observations are fully utilized, an accurate spatial and temporal dataset of CO 2 emission concentrations can be obtained while filling in the missing data using the output of atmospheric model simulation. GOSAT and OCO-2 satellite observations of the atmospheric column concentration of carbon dioxide were used as basic data, and the CO 2 concentrations output from WRF-chem as the driving factor to build a Random Forest algorithm; as a result we obtained a monthly 0.25 • mapping dataset of XCO 2 in Southern China from 2010-2019. Then, for each city, the monthly average CO 2 concentration to conclude their spatial and temporal trend was conducted. To learn more about the causes in the study area and study period, the sensitivity of carbon dioxide concentration to the urban built-up area and urban green space area was examined year by year. The main conclusions of this paper are as follows: (a) The validation result is R = 0.96, R 2 = 0.92, RMSE = 2.44 ppm. Despite this, more than 90% of the validation data have an error of less than 4 ppm, which is acceptable for XCO 2 spatial and temporal analysis. Compared to the IDW and Kriging interpolation methods, this void filling method has a higher validation accuracy.
(b) The approximate annual average concentrations of XCO 2 we investigated have gradually increased from 388.5 to 414.72 ppm, with a growth rate of 6.7%, and the seasonal cycle presents a maximum in spring and a minimum in summer or autumn.
(c) There is strong correlation between the increasing trend of CO 2 and the expansion of urban land, while in the cities where the area of urban green space increased do not have slower CO 2 growth rates.